org/tutorials/dtree18.pdf)
Common Node Impurity Measures
wikicoursenote.com/w/index.php?title=Stat841&printable=yes 71/74 10/09/2013 Stat841 - Wiki Course Notes
Some common node impurity measures are:
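Misclassification error: $1 - \max_k \hat{p}_{mk}$
Gini index: $\sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk})$
Cross-entropy: $-\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$
where $\hat{p}_{mk}$ is the proportion of the training observations in node $m$ that belong to class $k$, and $K$ is the number of classes.
K-Nearest Neighbours Classification (http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm)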
K nearest neighbours is a very simple algorithm that classifies a point by a majority vote of its $k$ nearest points in the feature space: the object is assigned to the class most common among its $k$ nearest neighbours. Here $k$ is a positive integer, typically small, which is chosen by cross-validation. If $k = 1$, the object is simply assigned to the class of its nearest neighbour.
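The majority-vote rule described above can be sketched in plain Python as follows. This is a minimal illustration, not code from the course notes: the function names (`euclidean`, `knn_predict`, `choose_k_loocv`) and the toy data are hypothetical, distances are assumed Euclidean, and $k$ is picked by leave-one-out cross-validation, which is only one possible form of the cross-validation mentioned above. Vote ties in this sketch are broken deterministically, not at random.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, x, k):
    """Assign x to the most common class among its k nearest training points."""
    # Indices of the training points, ordered from nearest to farthest from x.
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # Note: most_common breaks vote ties by first-seen order, not at random.
    return votes.most_common(1)[0][0]


def choose_k_loocv(X, y, candidate_ks):
    """Return the candidate k with the fewest leave-one-out errors (hypothetical helper)."""
    best_k, best_errors = None, None
    for k in candidate_ks:
        errors = 0
        for i in range(len(X)):
            # Hold out point i and classify it using the remaining points.
            rest_X = X[:i] + X[i + 1:]
            rest_y = y[:i] + y[i + 1:]
            if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
                errors += 1
        if best_errors is None or errors < best_errors:
            best_k, best_errors = k, errors
    return best_k


if __name__ == "__main__":
    # Toy data: two well-separated clusters labelled "a" and "b".
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
    y = ["a", "a", "a", "b", "b", "b"]
    k = choose_k_loocv(X, y, [1, 3, 5])
    print(k, knn_predict(X, y, [0.15, 0.15], k))
```

On this toy data, $k = 5$ does poorly under leave-one-out because each held-out point is outvoted by the other cluster, which illustrates why $k$ is usually kept small.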
1. Ties are broken at random.
2. If we assume the features are real-valued, we can use the Euclidean distance in feature space.
3. Since the features may be measured in different units, we can standardize each feature to have mean zero and variance one.
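The practical points in the list above (random tie-breaking, Euclidean distance, standardized features) can be combined in a short sketch. It assumes real-valued features stored as lists of floats; the helper names are hypothetical, the means and standard deviations are estimated on the training data and reused for query points, and only ties in the vote count (not ties in distance) are randomized.

```python
import math
import random
from collections import Counter


def standardize(train_X):
    """Scale each feature to mean 0 and variance 1 using the training data.

    Returns (means, stds, scaled_rows). Population variance (divide by n) is
    used; a constant feature gets std 1.0 so we never divide by zero.
    """
    n, d = len(train_X), len(train_X[0])
    means = [sum(row[j] for row in train_X) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in train_X) / n
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    scaled = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in train_X]
    return means, stds, scaled


def apply_standardization(x, means, stds):
    """Scale a new query point with the training means and stds."""
    return [(xj - m) / s for xj, m, s in zip(x, means, stds)]


def knn_vote_random_ties(train_X, train_y, x, k, rng=random):
    """k-NN majority vote with Euclidean distance; vote ties broken at random."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    tied = [label for label, count in votes.items() if count == top]
    # rng.choice picks uniformly among the classes sharing the top vote count.
    return rng.choice(tied)


if __name__ == "__main__":
    # Hypothetical data: a height in centimetres and a score in [0, 1],
    # i.e. two features on very different scales.
    train_X = [[150.0, 0.9], [155.0, 0.8], [180.0, 0.2], [185.0, 0.1]]
    train_y = ["x", "x", "y", "y"]
    means, stds, Z = standardize(train_X)
    query = apply_standardization([160.0, 0.85], means, stds)
    print(knn_vote_random_ties(Z, train_y, query, 3))
```

Without standardization, the height column would dominate every Euclidean distance simply because its numbers are larger.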
Property[42] (http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm#Properties)
The K nearest neighbour algorithm has some strong theoretical guarantees. As the number of data points goes to infinity, the 1-nearest neighbour rule is guaranteed to yield an error rate no worse than twice the
Bayes error rate (the minimum achievable error rate given the distribution of the data). The K nearest neighbour rule is guaranteed to approach the Bayes error rate itself, provided
$k$ increases as a function of the number of data points $n$ (with $k \to \infty$ and $k/n \to 0$).
Boosting
Boosting (http://en.wikipedia.org/wiki/Boosting) algorithms are a class of machine learning meta-algorithms that can improve weak classifiers. If we have a weak classifier
that does only slightly better than random classification, then by assigning larger weights to the points it misclassifies and minimizing the new weighted cost function, we
will probably obtain a new classifier with lower error. This procedure can be repeated a finite number of times, and then a new classifier which is a weighted
aggregation of the generat...
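The reweighting idea above is the core of AdaBoost, one well-known boosting algorithm. Because the description in these notes is cut off, the following is only a sketch of discrete AdaBoost under stated assumptions, not necessarily the formulation used in the course: it assumes labels in $\{-1, +1\}$, uses one-feature decision stumps as the weak classifiers, and weights each stump by $\alpha = \tfrac{1}{2}\log\frac{1-\text{err}}{\text{err}}$. All function names are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Weak classifier: the stump (predict s if x[j] <= t else -s) with least weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for s in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (s if row[j] <= t else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best


def stump_predict(stump, row):
    _, j, t, s = stump
    return s if row[j] <= t else -s


def adaboost(X, y, rounds):
    """Discrete AdaBoost sketch; labels y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid log(1/0) for a perfect stump
        if err >= 0.5:
            break  # no better than random guessing: stop
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified points (y * h = -1) get heavier, correct ones lighter.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D problem: class +1 sits in a middle interval, so no single
    # threshold separates the classes.
    X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
    y = [-1, -1, 1, 1, -1, -1]
    model = adaboost(X, y, rounds=3)
    print([adaboost_predict(model, row) for row in X])
```

On this toy interval problem no single stump separates the classes (the best one misclassifies two points), whereas the weighted vote of three boosted stumps classifies all six training points correctly.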
 Winter '13