

nguistic preprocessing is of limited value compared to the simple bag-of-words approach with basic preprocessing. The reason is that the co-occurrence of terms in the vector representation serves as an automatic disambiguation, e.g. for classification (Leopold & Kindermann 2002). Recently, some progress has been made by enhancing bag of words with linguistic features for text clustering and classification (Hotho et al. 2003; Bloehdorn & Hotho 2004).

Band 20 – 2005 29 Hotho, Nürnberger, and Paaß

3 Data Mining Methods for Text

One main reason for applying data mining methods to text document collections is to structure them. A structure can significantly simplify a user's access to a document collection. Well-known access structures are library catalogues and book indexes. However, the problem with manually designed indexes is the time required to maintain them. As a result, they are very often out of date and thus not usable for recent publications or frequently changing information sources like the World Wide Web. The existing methods fo...