To classify a new document d_i, its similarity to all documents d_j in the training set is determined. The k most similar training documents (neighbors) are selected. The proportion of neighbors belonging to a given class may be taken as an estimate of the probability of that class, and the class with the largest proportion is assigned to document d_i. The optimal number k of neighbors may be estimated from additional training data by cross-validation. Nearest neighbor classification is a nonparametric method, and it can be shown that for large data sets the error rate of the 1-nearest neighbor classifier is never larger than twice the optimal error rate (Hastie et al. 2001). Several studies have shown that k-nearest neighbor methods perform very well in practice (Joachims 1998). Their drawback is the computational effort during classification, since essentially the similarity of a document to all other documents of the training set has to be determined. Some extensions are discussed in Se...
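
The classification rule described above can be illustrated with a short sketch. The following Python code is a minimal, illustrative implementation of the k-nearest neighbor rule using cosine similarity over document vectors; the toy term-count vectors, class labels, and function names are hypothetical placeholders and are not taken from the survey.

# Minimal sketch of k-nearest neighbor text classification.
# Documents are assumed to be given as term-count vectors (numpy arrays);
# the vectorization step and the toy corpus below are illustrative only.
import numpy as np
from collections import Counter

def cosine_similarity(a, b):
    # Similarity of two document vectors; 0 if either vector is all zeros.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def knn_classify(query_vec, train_vecs, train_labels, k=3):
    # Determine the similarity of the query document d_i to every training
    # document d_j, select the k most similar neighbors, and return the class
    # with the largest proportion among them together with that proportion.
    sims = [cosine_similarity(query_vec, v) for v in train_vecs]
    neighbor_idx = np.argsort(sims)[-k:]
    votes = Counter(train_labels[i] for i in neighbor_idx)
    label, count = votes.most_common(1)[0]
    return label, count / k  # assigned class and its estimated probability

# Toy example with four training documents and two classes.
train_vecs = [np.array([3, 0, 1, 0]), np.array([2, 1, 0, 0]),
              np.array([0, 0, 2, 3]), np.array([0, 1, 1, 2])]
train_labels = ["sports", "sports", "politics", "politics"]
print(knn_classify(np.array([1, 0, 2, 2]), train_vecs, train_labels, k=3))

The sketch also makes the drawback mentioned above visible: each classification requires computing the similarity of the query document to every training document, so the cost grows linearly with the size of the training set.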