Machine Learning (10701) Fall 2008 Final Exam

Professor: Eric Xing
Date: December 8, 2008

- There are 9 questions in this exam (18 pages including this cover sheet).
- Questions are not equally difficult.
- This exam is open book and open notes. Computers, PDAs, and cell phones are not allowed.
- You have three hours.
- Good luck!

1 Assorted Questions [20 points]

1. (True or False, 2 pts) PCA and Spectral Clustering (such as Andrew Ng's) perform eigendecomposition on two different matrices. However, the sizes of these two matrices are the same.
   Solutions: F

2. (True or False, 2 pts) The dimensionality of the feature map generated by a polynomial kernel (e.g., $K(x, y) = (1 + x \cdot y)^d$) is polynomial with respect to the power $d$ of the polynomial kernel.
   Solutions: T

3. (True or False, 2 pts) Since classification is a special case of regression, logistic regression is a special case of linear regression.
   Solutions: F

4. (True or False, 2 pts) For any two variables $x$ and $y$ having joint distribution $p(x, y)$, we always have $H[x, y] \geq H[x] + H[y]$, where $H$ is the entropy function.
   Solutions: F

5. (True or False, 2 pts) The Markov blanket of a node $x$ in a graph with vertex set $X$ is the smallest set $Z$ such that $x \perp X \setminus \{Z \cup x\} \mid Z$.
   Solutions: T

6. (True or False, 2 pts) For some directed graphs, moralization decreases the number of edges present in the graph.
   Solutions: F

7. (True or False, 2 pts) The $L_2$ penalty in ridge regression is equivalent to a Laplace prior on the weights.
   Solutions: F

8. (True or False, 2 pts) There is at least one set of 4 points in $\mathbb{R}^3$ that can be shattered by the hypothesis set of all 2D planes in $\mathbb{R}^3$.
   Solutions: T

9. (True or False, 2 pts) The log-likelihood of the data will always increase through successive iterations of the expectation-maximization algorithm.
   Solutions: F

10. (True or False, 2 pts) One disadvantage of Q-learning is that it can only be used when the learner has prior knowledge of how its actions affect its environment.
    Solutions: F

2 Support Vector Machine (SVM) [10 pts]

1. Properties of Kernel

1.1. (2 pts) Prove that the kernel $K(x_i, x_j)$ is symmetric, where $x_i$ and $x_j$ are the feature vectors for the $i$-th and $j$-th examples.
     Hint: Your proof will not be longer than 2 or 3 lines.
     Solutions: Let $\Phi(x_i)$ and $\Phi(x_j)$ be the feature maps for $x_i$ and $x_j$, respectively. Then we have
     $K(x_i, x_j) = \Phi(x_i)^\top \Phi(x_j) = \Phi(x_j)^\top \Phi(x_i) = K(x_j, x_i)$.

1.2. (4 pts) Given $n$ training examples $x_i$ ($i = 1, \dots, n$), the kernel matrix $A$ is an $n \times n$ square matrix, where $A(i, j) = K(x_i, x_j)$. Prove that the kernel matrix $A$ is positive semidefinite.
     Hints: (1) Remember that an $n \times n$ matrix $A$ is positive semidefinite iff for any $n$-dimensional vector $f$, we have $f^\top A f \geq 0$. (2) For simplicity, you can prove this statement just for the following particular kernel function: $K(x_i, x_j) = (1 + x_i x_j)^2$. ...
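The two kernel properties in questions 1.1 and 1.2 can be checked numerically for the particular kernel named in the hint. The sketch below is not part of the exam or its solutions. It assumes scalar inputs and uses the explicit feature map Phi(x) = (1, sqrt(2) x, x^2), which is one standard choice that reproduces (1 + x_i x_j)^2; the function names `kernel_matrix` and `feature_map` are hypothetical, chosen only for illustration.

```python
import numpy as np


def kernel_matrix(x):
    """Kernel matrix A[i, j] = (1 + x_i * x_j)**2 for scalar inputs x_i."""
    x = np.asarray(x, dtype=float)
    return (1.0 + np.outer(x, x)) ** 2


def feature_map(x):
    """Assumed explicit feature map Phi(x) = (1, sqrt(2) x, x^2), one row per example."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), np.sqrt(2.0) * x, x ** 2], axis=1)


rng = np.random.default_rng(0)
x = rng.normal(size=6)          # six scalar training examples (arbitrary test data)
A = kernel_matrix(x)
Phi = feature_map(x)

# 1.1: symmetry, K(x_i, x_j) = K(x_j, x_i)
is_symmetric = np.allclose(A, A.T)

# The kernel equals an inner product of feature vectors: A = Phi Phi^T
matches_feature_map = np.allclose(A, Phi @ Phi.T)

# 1.2: positive semidefiniteness, checked two ways
min_eigenvalue = np.linalg.eigvalsh(A).min()   # should be >= 0 up to round-off
f = rng.normal(size=6)                         # an arbitrary test vector
quad_form = f @ A @ f                          # f^T A f
norm_sq = np.sum((Phi.T @ f) ** 2)             # ||sum_i f_i Phi(x_i)||^2

print(is_symmetric, matches_feature_map, min_eigenvalue, quad_form, norm_sq)
```

Under these assumptions the quadratic form f^T A f equals the squared norm of sum_i f_i Phi(x_i), which is why it cannot be negative. A numerical check on random data only illustrates the statement; it does not replace the proof the question asks for.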