May 12, 2011
Your Student ID:
Problem 1 : Ensembles
Problem 2 : Support Vector Machines
Problem 3 : Belief Networks
Problem 4 : Kernels
Problem 5 : Reinforcement Learning
Problem 6 : DBSCAN/K-Means
Problem 7 : All Kinds of Questions
Problem 8 : Comparing Classifiers
Problem 9 : Machine Learning in General
The exam is “open books and notes” and you have 115 minutes to complete
the exam. The exam will count about 33% towards the course grade.
1) Ensemble Methods 
One key problem of ensemble methods is to obtain diverse ensembles; what are the
characteristics of a “diverse ensemble”? 
The members of the ensemble make different kinds of errors.
What is the key idea of boosting? How does the boosting approach encourage the
creation of diverse ensembles?
Boosting uses weighted sampling: it increases the weights of examples that
were misclassified and decreases the weights of examples that were classified
correctly. This encourages the generation of classifiers that focus on
examples which were mostly misclassified in the past, leading to the creation
of ensembles whose members make different kinds of errors.
The AdaBoost algorithm restarts if the accuracy of a base classifier drops
below 50%. Using ensembles whose base classifiers have below-50% accuracy
leads to a drop in accuracy: the ensemble classifier performs worse than the
base classifiers themselves, and the drop is larger if the base classifiers
make different kinds of errors.
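The weight update described above can be sketched as follows (a minimal illustration of the standard AdaBoost rule; the helper name `update_weights` and the example values are ours, not part of the exam):

```python
import math

def update_weights(weights, correct):
    """Hedged sketch of the AdaBoost weight update: misclassified
    examples get larger weights, correctly classified ones smaller.
    weights: list of example weights; correct: list of bools.
    Uses the standard rule with weighted error eps and
    alpha = 0.5 * ln((1 - eps) / eps)."""
    eps = sum(w for w, c in zip(weights, correct) if not c) / sum(weights)
    # AdaBoost requires base classifiers better than 50% accuracy
    assert 0 < eps < 0.5, "base classifier must beat 50% accuracy"
    alpha = 0.5 * math.log((1 - eps) / eps)
    new = [w * math.exp(-alpha if c else alpha)
           for w, c in zip(weights, correct)]
    z = sum(new)                     # normalize so the weights sum to 1
    return [w / z for w in new]

# four equally weighted examples, one misclassified
w = update_weights([0.25, 0.25, 0.25, 0.25], [True, True, True, False])
# the single misclassified example now carries half the total weight
```

After one update the next base classifier effectively sees the misclassified example as often as all three correctly classified ones combined, which is what drives the diversity of the ensemble.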
2) Support Vector Machines
a) Support vector machines maximize the width of the margin which separates the
examples of 2 classes. What advantage does a classifier with a wide margin have over a
classifier that has a much smaller margin? 
If the amount of noise with respect to an example is less than half of the
margin of the SVM classifier, the example will still be classified correctly;
therefore, having larger margins makes classifiers less sensitive to noise.
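This can be illustrated numerically (an illustrative sketch, not an SVM implementation; the fixed hyperplane and the noise magnitude are our assumptions):

```python
# Fixed separating hyperplane x1 = 0 with w = (1, 0), so ||w|| = 1,
# the margin width is 2/||w|| = 2, and its half-width is 1.
def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

w, b = (1.0, 0.0), 0.0
half_margin = 1.0               # = 1 / ||w||
x = (1.0, 0.0)                  # positive example on the margin boundary
noisy = (x[0] - 0.9, x[1])      # noise magnitude 0.9 < half_margin
# a perturbation smaller than half the margin cannot flip the prediction
assert predict(w, b, x) == predict(w, b, noisy) == 1
```

With a narrower margin, the same 0.9 perturbation could push the example across the hyperplane, which is exactly the noise sensitivity the answer refers to.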
b) Non-linear support vector machines, which use kernels to map a dataset into a
higher-dimensional space, are quite popular. What advantages do you see in using
non-linear support vector machines over linear support vector machines?
There is a higher probability of finding a hyperplane in the
higher-dimensional space which linearly separates the examples of the two
classes, as there are many more possible hyperplanes in the
higher-dimensional space; and even if linear
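The idea can be sketched with a toy dataset (the feature map phi(x) = (x, x*x), corresponding to a degree-2 polynomial kernel, and the example points are our illustration, not from the exam):

```python
# A 1-D dataset that no single threshold on x can separate:
pos = [-2.0, 2.0]    # class +1 lies at the extremes
neg = [-0.5, 0.5]    # class -1 lies in the middle

# Hypothetical feature map into 2-D, the idea behind a polynomial kernel
phi = lambda x: (x, x * x)

# In the higher-dimensional space the horizontal line x2 = 1
# linearly separates the two classes:
assert all(phi(x)[1] > 1 for x in pos)
assert all(phi(x)[1] < 1 for x in neg)
```

No hyperplane (here, a threshold) separates the classes in the original 1-D space, but after the map a linear separator exists, which is the advantage the answer describes.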