[...] error-prone, in practice stemming methods are frequently applied.
Stemming methods try to build the basic forms of words, i.e. strip the plural
's' from nouns, the 'ing' from verbs, or other affixes. A stem is a natural group
of words with equal (or very similar) meaning. After the stemming process,
every word is represented by its stem. A well-known rule-based stemming
algorithm was originally proposed by Porter (1980). He defined a
set of production rules to iteratively transform (English) words into their stems.
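
To make the idea of rule-based suffix stripping concrete, the following is a minimal sketch. It is not Porter's algorithm: the rule table `SUFFIX_RULES`, the function name `naive_stem`, and the minimum stem length of 3 are illustrative assumptions, and only a single rule is applied per word rather than an iterated rule set.

```python
# Illustrative suffix rules (assumed for this sketch, NOT the Porter 1980 rule set).
# Each entry maps a suffix to its replacement; earlier rules take priority.
SUFFIX_RULES = [
    ("sses", "ss"),  # classes -> class
    ("ies", "i"),    # queries -> queri
    ("ing", ""),     # indexing -> index
    ("s", ""),       # documents -> document
]

MIN_STEM_LENGTH = 3  # assumed threshold to avoid stripping very short words


def naive_stem(word: str) -> str:
    """Return a crude stem by applying the first matching suffix rule."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if not word.endswith(suffix):
            continue
        # Keep words like "glass" intact: a trailing "ss" is not a plural "s".
        if suffix == "s" and word.endswith("ss"):
            continue
        # Skip the rule if too little of the word would remain.
        if len(word) - len(suffix) < MIN_STEM_LENGTH:
            continue
        return word[: len(word) - len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for w in ["documents", "indexing", "classes", "glass", "sing"]:
        print(w, "->", naive_stem(w))
```

A production system would normally rely on an established stemmer implementation; the sketch only shows how ordered production rules map several surface forms to one stem.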
2.1.2 Index Term Selection

To further reduce the number of words used, indexing or keyword selection
algorithms can also be applied (see, e.g., Deerwester et al. (1990);
Witten et al. (1999)). In this case, only the selected keywords are used to describe
the documents. A simple method for keyword selection is to extract keywords
based on their entropy. E.g., for each word t in the vocabulary, the entropy as
defined by Lochbaum & Streeter (1989) can be c...
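
Entropy-based keyword selection can be sketched as follows. The text is cut off before the formula, so the weighting used here is an assumption, not necessarily the exact definition of Lochbaum & Streeter (1989): a normalized entropy weight W(t) = 1 + (1 / log2 |D|) * sum over documents d of P(d, t) * log2 P(d, t), where P(d, t) is the share of all occurrences of t that fall in document d. The function names `entropy_weights` and `select_keywords` are hypothetical.

```python
import math
from collections import Counter


def entropy_weights(docs):
    """Compute an assumed normalized entropy weight for every term.

    docs: list of documents, each given as a list of tokens.
    Returns a dict term -> weight in [0, 1]; 1 means the term occurs in
    a single document, 0 means it is spread evenly over all documents.
    """
    n_docs = len(docs)
    if n_docs < 2:
        raise ValueError("need at least two documents to normalize by log2(|D|)")

    per_doc = [Counter(doc) for doc in docs]  # term frequencies per document
    totals = Counter()                        # term frequencies over the corpus
    for counts in per_doc:
        totals.update(counts)

    weights = {}
    for term, total in totals.items():
        neg_entropy = 0.0
        for counts in per_doc:
            tf = counts.get(term, 0)
            if tf:
                p = tf / total                # P(d, t)
                neg_entropy += p * math.log2(p)
        weights[term] = 1.0 + neg_entropy / math.log2(n_docs)
    return weights


def select_keywords(docs, k):
    """Return the k terms with the highest weight (ties broken alphabetically)."""
    weights = entropy_weights(docs)
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:k]
```

Under this weighting, terms concentrated in few documents score close to 1 and are kept as keywords, while terms distributed uniformly across the collection score close to 0 and are discarded.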
This note was uploaded on 06/19/2011 for the course IT 2258 taught by Professor Aymenali during the Summer '11 term at Abu Dhabi University.