documents d_j in the training set is determined. The k most similar training
documents (neighbors) are selected. The proportion of neighbors having the
same class may be taken as an estimate of the probability of that class, and
the class with the largest proportion is assigned to document d_i. The optimal
number k of neighbors may be estimated from additional training data by
cross-validation.
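The procedure above can be sketched in a few lines of Python. The choice of cosine similarity over raw term-frequency vectors and the toy training set are illustrative assumptions, not prescribed by the survey:

```python
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (dicts)."""
    num = sum(f * b.get(t, 0) for t, f in a.items())
    den = (sum(f * f for f in a.values()) ** 0.5
           * sum(f * f for f in b.values()) ** 0.5)
    return num / den if den else 0.0

def knn_classify(doc, training, k=3):
    """Assign to `doc` the class with the largest proportion among its
    k most similar training documents (majority vote of the neighbors)."""
    vec = Counter(doc.lower().split())
    neighbors = sorted(training,
                       key=lambda td: cosine(vec, Counter(td[0].lower().split())),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Illustrative training set: (document text, class label) pairs.
training = [
    ("stock market shares trading", "finance"),
    ("bank loan interest rates",    "finance"),
    ("match goal team player",      "sports"),
    ("league season team coach",    "sports"),
]
print(knn_classify("the team wins the match", training, k=3))  # sports
```

Note that classifying one document requires a similarity computation against every training document, which is the computational drawback discussed below.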
Nearest neighbor classiﬁcation is a nonparametric method, and it can be shown
that for large data sets the error rate of the 1-nearest-neighbor classiﬁer is never
larger than twice the optimal (Bayes) error rate (Hastie et al. 2001). Several studies
have shown that k-nearest-neighbor methods perform very well in
practice (Joachims 1998). Their drawback is the computational effort during
classiﬁcation, where essentially the similarity of a document to all
documents of the training set has to be determined. Some extensions are
discussed in Se...

Band 20 – 2005, p. 33. Hotho, Nürnberger, and Paaß