hest relative entropy
as an index term. Then mark again this document and all other documents
containing this term. Repeat this process until all documents are marked, then
unmark them all and start again. The process can be terminated when the
desired number of index terms has been selected. A more detailed discussion
of the benefits of this approach for clustering, in particular the reduction in
the number of words required to obtain good clustering performance, can be found
in Borgelt & Nürnberger (2004).
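The marking procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of Borgelt & Nürnberger (2004): the function name greedy_index_terms, the representation of documents as sets of words, and the precomputed score dictionary (standing in for the relative entropy of each word) are all hypothetical. Because the beginning of the description is not part of this excerpt, the sketch also assumes that each pass visits the unmarked documents in collection order and picks, from each, the best word that has not been selected yet.

```python
from typing import Dict, List, Set


def greedy_index_terms(
    docs: List[Set[str]],
    score: Dict[str, float],
    n_terms: int,
) -> List[str]:
    """Greedy index term selection by repeated marking passes.

    docs    -- each document given as the set of its words (assumption)
    score   -- hypothetical precomputed score per word, standing in
               for the relative entropy mentioned in the text
    n_terms -- desired number of index terms
    """
    selected: List[str] = []
    chosen: Set[str] = set()

    while len(selected) < n_terms:
        # Start of a pass: all documents are unmarked.
        marked = [False] * len(docs)
        progress = False  # becomes True if this pass selects a new term

        for i, doc in enumerate(docs):
            if marked[i]:
                continue
            # Words of this document that can still become index terms.
            candidates = [w for w in doc if w in score and w not in chosen]
            if not candidates:
                marked[i] = True  # nothing left to select from this document
                continue
            # Highest score wins; ties are broken alphabetically.
            best = min(candidates, key=lambda w: (-score[w], w))
            selected.append(best)
            chosen.add(best)
            progress = True
            # Mark this document and all others containing the term.
            for j, other in enumerate(docs):
                if best in other:
                    marked[j] = True
            if len(selected) == n_terms:
                return selected

        if not progress:
            break  # no unselected words remain, so stop early

    return selected


if __name__ == "__main__":
    example_docs = [{"apple", "banana"}, {"banana", "cherry"}, {"date"}]
    example_score = {"apple": 0.9, "banana": 0.5, "cherry": 0.7, "date": 0.2}
    print(greedy_index_terms(example_docs, example_score, 3))
```

Marking every document that contains a selected term ensures that each pass spreads the index terms over the whole collection rather than concentrating them on a few documents. The early exit when a pass selects nothing is an addition of this sketch, needed only so that the loop stops if fewer than n_terms distinct words are available.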
An index term selection method that is more appropriate if we have to learn a
classifier for documents is discussed in Sect. 3.1.1. This approach also considers
the word distributions within the classes.
2.2 The Vector Space Model
Despite its simple data structure, which uses no explicit semantic information, the vector space model enables very efficient analysis of huge document
collections. It was originally introduced for indexing and information retrieval