cation: Kernels that
imply a high-dimensional feature space show slightly better results in terms of
precision and recall, but they are subject to overfitting (Leopold & Kindermann

3.1.6 Classifier Evaluations

In recent years, text classifiers have been evaluated on a number of
benchmark document collections. The level of performance depends, of
course, on the document collection. Table 1 gives some representative
results achieved for the Reuters 20 newsgroups collection (Sebastiani 2002,
p. 38). Concerning the relative quality of classifiers, boosted trees, SVMs, and
k-nearest neighbors usually deliver top-notch performance, while naïve Bayes
and decision trees are less reliable.
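
The results in Table 1 are reported as F1-values. As a minimal sketch, assuming the usual definition of F1 as the harmonic mean of precision and recall (this section does not restate it), the measure for a single category could be computed as shown here. The function name and the toy labels are hypothetical and serve only as illustration; they are not taken from any benchmark.

```python
def precision_recall_f1(gold, predicted):
    """Return (precision, recall, F1) for one binary category.

    gold and predicted are equal-length sequences of booleans: True means
    the document belongs to (or was assigned to) the category.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    # Guard against division by zero when a count is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy labels, made up for illustration only.
gold = [True, True, True, False, False, True]
predicted = [True, False, True, True, False, True]
p, r, f1 = precision_recall_f1(gold, predicted)
print(p, r, f1)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which makes it a stricter single-number summary than their arithmetic mean.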

Method              F1-value
decision tree C4.5
boosted tree        0.878

Table 1: Performance of Different Classifiers for the Reuters collection

3.2 Clustering

Clustering methods can be used to find groups of documents with
similar content. The result of clustering is typically a p...