# final2008f-solution - Machine Learning (10-701 Fall 2008...


Machine Learning (10-701), Fall 2008, Final Exam

Professor: Eric Xing

Date: December 8, 2008

- There are 9 questions in this exam (18 pages including this cover sheet).
- Questions are not equally difficult.
- This exam is open book and open notes. Computers, PDAs, and cell phones are not allowed.
- You have three hours.
- Good luck!

## 1 Assorted Questions [20 points]

1. (True or False, 2 pts) PCA and spectral clustering (such as Andrew Ng's) perform eigendecomposition on two different matrices. However, the sizes of these two matrices are the same.

    Solution: F

2. (True or False, 2 pts) The dimensionality of the feature map generated by a polynomial kernel (e.g., $K(x, y) = (1 + x \cdot y)^d$) is polynomial with respect to the power $d$ of the polynomial kernel.

    Solution: T

3. (True or False, 2 pts) Since classification is a special case of regression, logistic regression is a special case of linear regression.

    Solution: F

4. (True or False, 2 pts) For any two variables $x$ and $y$ having joint distribution $p(x, y)$, we always have $H[x, y] \ge H[x] + H[y]$, where $H$ is the entropy function.

    Solution: F

5. (True or False, 2 pts) The Markov blanket of a node $x$ in a graph with vertex set $X$ is the smallest set $Z$ such that $x \perp X \setminus (Z \cup \{x\}) \mid Z$.

    Solution: T

6. (True or False, 2 pts) For some directed graphs, moralization decreases the number of edges present in the graph.

    Solution: F

7. (True or False, 2 pts) The $L_2$ penalty in ridge regression is equivalent to a Laplace prior on the weights.

    Solution: F

8. (True or False, 2 pts) There is at least one set of 4 points in $\mathbb{R}^3$ that can be shattered by the hypothesis set of all 2D planes in $\mathbb{R}^3$.

    Solution: T

9. (True or False, 2 pts) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm.

    Solution: F

10. (True or False, 2 pts) One disadvantage of Q-learning is that it can only be used when the learner has prior knowledge of how its actions affect its environment.

    Solution: F

## 2 Support Vector Machine (SVM) [10 pts]

1. Properties of Kernel

    1.1. (2 pts) Prove that the kernel $K(x_i, x_j)$ is symmetric, where $x_i$ and $x_j$ are the feature vectors for the $i$th and $j$th examples. Hint: your proof will not be longer than 2 or 3 lines.

    Solution: Let $\Phi$ be the feature map, so that $\Phi(x_i)$ and $\Phi(x_j)$ are the images of $x_i$ and $x_j$, respectively. Then

    $$K(x_i, x_j) = \Phi(x_i)^\top \Phi(x_j) = \Phi(x_j)^\top \Phi(x_i) = K(x_j, x_i).$$

    1.2. (4 pts) Given $n$ training examples $x_i$ ($i = 1, \dots, n$), the kernel matrix $A$ is an $n \times n$ square matrix, where $A(i, j) = K(x_i, x_j)$. Prove that the kernel matrix $A$ is positive semi-definite. Hints: (1) Remember that an $n \times n$ matrix $A$ is positive semi-definite iff for any $n$-dimensional vector $f$, we have $f^\top A f \ge 0$. (2) For simplicity, you can prove this statement just for the following particular kernel function: $K(x_i, x_j) = (1 + x_i \cdot x_j)^2$. ...
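
For true/false statement 1 in Section 1, the key's answer (F) comes down to matrix shapes: PCA eigendecomposes a $d \times d$ covariance matrix, while spectral clustering eigendecomposes an $n \times n$ matrix built from pairwise affinities. The sketch below is illustrative only; it assumes hypothetical random data, the Ng-Jordan-Weiss normalized affinity $D^{-1/2} W D^{-1/2}$, and a Gaussian bandwidth `sigma = 1.0` chosen arbitrarily.

```python
import numpy as np

# Hypothetical toy data: n = 50 examples, each with d = 3 features.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))

# PCA eigendecomposes the d x d sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (n - 1)

# Ng-Jordan-Weiss style spectral clustering eigendecomposes an n x n matrix
# built from pairwise Gaussian affinities (bandwidth sigma is an assumption).
sigma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-sq_dists / (2 * sigma**2))
np.fill_diagonal(W, 0.0)  # zero self-affinity, as in the NJW algorithm
D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
L = D_inv_sqrt @ W @ D_inv_sqrt

pca_vals, _ = np.linalg.eigh(cov)
spec_vals, _ = np.linalg.eigh(L)
print(cov.shape, L.shape)  # (3, 3) (50, 50)
```

The two matrices coincide in size only in the special case $n = d$, so the statement does not hold in general.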
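
For true/false statement 2 in Section 1, one way to see why the key says T: expanding $(1 + x \cdot y)^d$ for $x, y \in \mathbb{R}^n$ produces one feature per monomial of total degree at most $d$, and there are $\binom{n+d}{d}$ such monomials. For a fixed input dimension $n$ this is a polynomial of degree $n$ in $d$. The sketch below, a check under that reading and not part of the original key, counts the monomials by brute force and compares against the closed form; the helper names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def poly_feature_dim(n: int, d: int) -> int:
    """Monomials of total degree <= d in n variables, i.e. the dimension
    of the explicit feature map of (1 + x . y)^d on R^n."""
    return comb(n + d, d)

def count_monomials(n: int, d: int) -> int:
    # A degree-k monomial is a multiset of k variable indices.
    return sum(
        sum(1 for _ in combinations_with_replacement(range(n), k))
        for k in range(d + 1)
    )

# Fixed n = 3: the dimension grows like a cubic in d, not exponentially.
print([poly_feature_dim(3, d) for d in range(1, 6)])  # [4, 10, 20, 35, 56]
```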
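
For true/false statement 4 in Section 1, the inequality actually runs the other way: joint entropy is subadditive, $H[x, y] \le H[x] + H[y]$, with equality exactly when $x$ and $y$ are independent. The numeric check below uses a hypothetical $2 \times 2$ joint distribution in which $x$ and $y$ are dependent.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical joint p(x, y): rows index x, columns index y (dependent).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

H_xy = entropy(p_xy.ravel())
H_x = entropy(p_xy.sum(axis=1))  # marginal of x
H_y = entropy(p_xy.sum(axis=0))  # marginal of y

# Subadditivity: H[x, y] <= H[x] + H[y].
print(round(H_xy, 4), round(H_x + H_y, 4))  # 1.7219 2.0
```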
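
For true/false statement 6 in Section 1, moralization keeps every edge of the DAG (dropping its direction) and only ever adds edges between co-parents of a common child, so the edge count cannot go down. Because a DAG never contains both $a \to b$ and $b \to a$, no two directed edges collapse into one undirected edge. A minimal sketch, assuming a hypothetical parent-set encoding of the graph:

```python
from itertools import combinations

def moralize(parents: dict[str, set[str]]) -> set[frozenset[str]]:
    """Undirected edge set of the moral graph of a DAG.

    `parents` maps each node to the set of its parents (hypothetical format).
    """
    edges: set[frozenset[str]] = set()
    for child, pa in parents.items():
        # Keep every original edge, dropping its direction.
        for p in pa:
            edges.add(frozenset((p, child)))
        # "Marry" every pair of co-parents of the same child.
        for a, b in combinations(sorted(pa), 2):
            edges.add(frozenset((a, b)))
    return edges

# Hypothetical v-structure A -> C <- B, plus C -> D.
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
num_directed = sum(len(pa) for pa in dag.values())
moral = moralize(dag)
print(num_directed, len(moral))  # 3 4
```

When the co-parents are already adjacent (e.g. `A -> B` as well), the marriage edge already exists and the count stays the same rather than shrinking.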
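
For true/false statement 7 in Section 1, the $L_2$ penalty corresponds to a Gaussian prior, not a Laplace prior (a Laplace prior gives the $L_1$, lasso, penalty). With noise variance $\sigma^2$ and prior $w \sim \mathcal{N}(0, \tau^2 I)$, the MAP estimate minimizes $\|y - Xw\|^2 + \lambda \|w\|^2$ with $\lambda = \sigma^2 / \tau^2$. The sketch below checks this on hypothetical data by confirming that the ridge solution zeroes the gradient of the Gaussian-prior negative log-posterior; all numbers are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 4
X = rng.normal(size=(n, p))
w_true = np.array([1.5, -2.0, 0.0, 0.5])  # hypothetical true weights
sigma2 = 0.25  # noise variance of y given x, w (assumed)
tau2 = 1.0     # variance of the Gaussian prior w ~ N(0, tau2 * I) (assumed)
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Ridge regression: argmin ||y - Xw||^2 + lam * ||w||^2 with lam = sigma2 / tau2.
lam = sigma2 / tau2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Gradient of the negative log-posterior under the Gaussian prior:
# ||y - Xw||^2 / (2 sigma2) + ||w||^2 / (2 tau2) + const.
grad = -X.T @ (y - X @ w_ridge) / sigma2 + w_ridge / tau2

print(np.allclose(grad, 0.0, atol=1e-8))  # True: ridge = Gaussian-prior MAP
```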
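
For true/false statement 8 in Section 1, any 4 affinely independent points in $\mathbb{R}^3$ can be shattered by planes (the VC dimension of halfspaces in $\mathbb{R}^3$ is 4). The sketch below checks all $2^4 = 16$ labelings for one hypothetical choice of points by solving a $4 \times 4$ linear system for $(w, b)$ so that $w \cdot x_i + b$ has the desired sign. Using distinct target magnitudes is a device of this sketch to keep $w \neq 0$, so each classifier is a genuine plane.

```python
from itertools import product
import numpy as np

# Hypothetical point set: the origin and the three unit vectors in R^3.
# These four points are affinely independent, which is what makes this work.
points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# Append a constant 1 so a plane is (w, b) and x is labelled sign(w . x + b).
# For affinely independent points this 4 x 4 matrix is nonsingular.
A = np.hstack([points, np.ones((len(points), 1))])

def shatters(A: np.ndarray) -> bool:
    m = A.shape[0]
    # Distinct magnitudes mean the targets are never all equal, so w != 0.
    mags = np.arange(1.0, m + 1.0)
    for labels in product([-1.0, 1.0], repeat=m):
        y = np.array(labels)
        wb = np.linalg.solve(A, y * mags)  # w . x_i + b = y_i * mags_i
        w = wb[:-1]
        if np.allclose(w, 0.0) or not np.array_equal(np.sign(A @ wb), y):
            return False
    return True

print(shatters(A))  # True
```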
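
For true/false statement 9 in Section 1, EM guarantees only that the log-likelihood never decreases; it need not strictly increase. The sketch below runs EM for a hypothetical two-component 1-D Gaussian mixture on synthetic data. From a generic start the log-likelihood rises and then levels off; from a symmetric start (two identical components) EM sits at a fixed point and the log-likelihood does not change at all. The data, initializations, and helper names are assumptions for illustration.

```python
import numpy as np

def weighted_densities(x, pi, mu, var):
    # n x K matrix with entries pi_k * N(x_i | mu_k, var_k).
    return pi * np.exp(-((x[:, None] - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

def log_likelihood(x, pi, mu, var):
    return float(np.log(weighted_densities(x, pi, mu, var).sum(axis=1)).sum())

def em_step(x, pi, mu, var):
    # E-step: responsibilities r[i, k] = p(z_i = k | x_i).
    dens = weighted_densities(x, pi, mu, var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: closed-form updates of mixture weights, means and variances.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

def run_em(x, pi, mu, var, iters=50):
    lls = [log_likelihood(x, pi, mu, var)]
    for _ in range(iters):
        pi, mu, var = em_step(x, pi, mu, var)
        lls.append(log_likelihood(x, pi, mu, var))
    return lls

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 200)])

# Generic start: non-decreasing log-likelihood that flattens near convergence.
lls = run_em(x, np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0]))

# Symmetric start: identical components form a fixed point of EM.
flat_lls = run_em(x, np.array([0.5, 0.5]), np.full(2, x.mean()), np.full(2, x.var()))

print(np.diff(lls).min() >= -1e-8, np.abs(np.diff(flat_lls)).max() < 1e-8)
```

The small tolerances absorb floating-point rounding; in exact arithmetic the first sequence is non-decreasing and the second is constant.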
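
For true/false statement 10 in Section 1, Q-learning is model-free: its update uses only sampled transitions $(s, a, r, s')$ and never needs the transition probabilities. The sketch below is a minimal tabular Q-learner on a hypothetical 5-state chain; the environment function is called only to draw samples, and the learner never reads its dynamics. The chain, rewards, and hyperparameters are all assumptions.

```python
import random

# Hypothetical environment: a 5-state chain, start in state 0, actions
# 0 (left) and 1 (right), reward 1 on reaching the terminal state 4.
N_STATES, GOAL = 5, 4

def env_step(s: int, a: int) -> tuple[int, float, bool]:
    # The dynamics live here; the learner only ever sees (s', r, done).
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def greedy(q_row: list[float]) -> int:
    # Index of the largest Q-value; ties go to action 0.
    return max(range(len(q_row)), key=lambda b: q_row[b])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection.
            a = rng.randrange(2) if rng.random() < eps else greedy(Q[s])
            s2, r, done = env_step(s, a)
            # Model-free update from the sampled transition only.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = [greedy(Q[s]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]: move right in every non-terminal state
```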
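
For Section 2, questions 1.1 and 1.2, the solution to 1.2 is cut off in this preview. One standard argument, offered here as a sketch and not as the official solution, writes the kernel as an inner product of explicit features, $A = \Phi \Phi^\top$, so that $f^\top A f = \|\Phi^\top f\|^2 \ge 0$. The check below assumes 2-D inputs, reads the hint's $x_i x_j$ as the inner product $x_i \cdot x_j$, and uses the explicit map $\phi(x) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1 x_2)$; the data and helper names are hypothetical.

```python
import numpy as np

def poly2_kernel(X: np.ndarray) -> np.ndarray:
    # Gram matrix A[i, j] = (1 + x_i . x_j)^2 for the rows x_i of X.
    return (1.0 + X @ X.T) ** 2

def poly2_features(X: np.ndarray) -> np.ndarray:
    # Explicit feature map for 2-D inputs (an assumption for illustration):
    # phi(x) = (1, sqrt2 x1, sqrt2 x2, x1^2, x2^2, sqrt2 x1 x2).
    x1, x2 = X[:, 0], X[:, 1]
    s = np.sqrt(2.0)
    return np.column_stack([np.ones_like(x1), s * x1, s * x2, x1**2, x2**2, s * x1 * x2])

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))  # hypothetical n = 8 training inputs in R^2
A = poly2_kernel(X)
Phi = poly2_features(X)

# A = Phi Phi^T, hence f^T A f = ||Phi^T f||^2 >= 0 for every f.
f = rng.normal(size=8)
quad = f @ A @ f
norm_sq = np.sum((Phi.T @ f) ** 2)

eigvals = np.linalg.eigvalsh(A)  # A is symmetric, as in question 1.1
print(np.allclose(A, Phi @ Phi.T), np.isclose(quad, norm_sq), eigvals.min() >= -1e-9 * eigvals.max())
```

Because the feature space here has 6 dimensions and there are 8 points, $A$ has rank at most 6, so its smallest eigenvalues are zero up to rounding; the relative tolerance accounts for that.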