Final Exam COSC 6342 Machine Learning
May 12, 2011, Version B

Your Name:
Your Student id:

Problem 1: Ensembles
Problem 2: Support Vector Machines
Problem 3: Belief Networks
Problem 4: Kernels
Problem 5: Reinforcement Learning
Problem 6: DBSCAN/K-Means
Problem 7: All Kinds of Questions
Problem 8: Comparing Classifiers
Problem 9: Machine Learning in General

Grade:

The exam is "open books and notes" and you have 115 minutes to complete the exam. The exam will count about 33% towards the course grade.
1) Ensemble Methods

a) One key problem of ensemble methods is obtaining diverse ensembles; what are the characteristics of a "diverse ensemble"?

The members of the ensemble make different kinds of errors.

b) What is the key idea of boosting? How does the boosting approach encourage the creation of diverse ensembles?

Boosting uses weighted sampling: it increases the weights of examples that were misclassified and decreases the weights of examples that were classified correctly. This encourages the generation of classifiers that correctly classify the examples that have mostly been misclassified in the past, leading to ensembles whose members make different kinds of errors.

c) The AdaBoost algorithm restarts if the accuracy of a classifier drops below 50%. Why?

Using base classifiers whose accuracy is below 50% leads to a drop in accuracy: the ensemble classifier performs worse than the base classifiers themselves, and the drop is larger if the base classifiers make different kinds of errors.

2) Support Vector Machines

a) Support vector machines maximize the width of the margin that separates the examples of two classes. What advantage does a classifier with a wide margin have over a classifier with a much smaller margin?

If the amount of noise affecting an example is less than half the margin of the SVM classifier, the example will still be classified correctly; therefore, larger margins make classifiers less sensitive to noise.

b) Non-linear support vector machines, which use kernels to map a dataset into a higher-dimensional space, are quite popular. What advantages do you see in using non-linear support vector machines over linear support vector machines?

There is a higher probability of finding a hyperplane in the higher-dimensional space that linearly separates the examples of the two classes, as there are many more possible hyperplanes in the higher-dimensional space; and even if linear
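The boosting weight update described in answer 1b can be sketched as follows. This is an illustrative sketch, not part of the exam: the function name `update_weights` and the standard AdaBoost formula alpha = 1/2 ln((1 - error)/error) are assumptions about how the reweighting is typically implemented.

```python
import math

def update_weights(weights, correct, error_rate):
    """Reweight examples after one boosting round (AdaBoost-style sketch).

    weights:    current example weights (sum to 1)
    correct:    booleans, True if the example was classified correctly
    error_rate: weighted error of the weak learner (must be below 0.5,
                which is why AdaBoost restarts when accuracy drops below 50%)
    """
    alpha = 0.5 * math.log((1 - error_rate) / error_rate)
    # Misclassified examples are multiplied by exp(+alpha) (weight grows),
    # correctly classified ones by exp(-alpha) (weight shrinks).
    new = [w * math.exp(-alpha if c else alpha)
           for w, c in zip(weights, correct)]
    total = sum(new)          # renormalize so the weights sum to 1 again
    return [w / total for w in new]

# Four equally weighted examples; the last one is misclassified.
w = update_weights([0.25] * 4, [True, True, True, False], 0.25)
# After the update the misclassified example carries half the total weight,
# so the next weak learner focuses on it.
```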