ror-prone, stemming methods are frequently applied in practice. Stemming methods try to build the basic forms of words, i.e., they strip the plural ’s’ from nouns, the ’ing’ from verbs, or other affixes. A stem is a natural group of words with equal (or very similar) meaning. After the stemming process, every word is represented by its stem. A well-known rule-based stemming algorithm was originally proposed by Porter (1980). He defined a set of production rules that iteratively transform (English) words into their stems.

2.1.2 Index Term Selection

To further decrease the number of words that should be used, indexing or keyword selection algorithms can also be applied (see, e.g., Deerwester et al. (1990); Witten et al. (1999)). In this case, only the selected keywords are used to describe the documents. A simple method for keyword selection is to extract keywords based on their entropy. For example, for each word t in the vocabulary, the entropy as defined by Lochbaum & Streeter (1989) can be c...
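
To illustrate the rule-based suffix stripping described in the stemming passage above, the following is a minimal sketch in Python. It is not Porter's algorithm: the rule list `SUFFIX_RULES`, the threshold `MIN_STEM_LENGTH`, and the function name `toy_stem` are assumptions chosen only for this example, and a real stemmer applies many more rules and conditions.

```python
# Toy suffix stripper: an illustration of rule-based stemming,
# NOT the Porter (1980) algorithm. All rules below are assumptions.

# Ordered (suffix, replacement) rules; the first applicable rule wins.
SUFFIX_RULES = [
    ("sses", "ss"),  # classes -> class
    ("ies", "i"),    # queries -> queri
    ("ss", "ss"),    # class   -> class (protects 'ss' from the plural rule)
    ("ing", ""),     # indexing -> index
    ("ed", ""),      # selected -> select
    ("s", ""),       # documents -> document
]

# Assumed guard: reject a rule if the remaining stem would be too short.
MIN_STEM_LENGTH = 3


def toy_stem(word: str) -> str:
    """Return a crude stem by applying the first suffix rule that fits."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    return word  # no rule applied


if __name__ == "__main__":
    for w in ["documents", "indexing", "classes", "class", "sing"]:
        print(w, "->", toy_stem(w))
```

The length guard keeps short words such as "sing" intact, but a sketch this small still over- and under-stems many words; that is one reason practical systems rely on a complete rule set such as Porter's.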
