

LDV-FORUM: A Brief Survey of Text Mining

... use index term selection methods as discussed above in Sect. 3.1.1.

Although this model is unrealistic due to its restrictive independence assumption, it yields surprisingly good classifications (Dumais et al. 1998; Joachims 1998). It may be extended in several directions (Sebastiani 2002).

Because manually labeling the documents of the training set is costly, some authors also use unlabeled documents for training. Assume that a small training set has established that word t_i is highly correlated with class L_c. If unlabeled documents show that word t_j is highly correlated with t_i, then t_j is also a good predictor for class L_c. In this way unlabeled documents may improve classification performance. Nigam et al. (2000) combined Expectation-Maximization (EM; Dempster et al. 1977) with a naïve Bayes classifier and were able to reduce the classification error by up to 30%.

3.1.3 Nearest Neighbor Classifier

Ins...
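
The idea from the naïve Bayes discussion above, that unlabeled documents can transfer evidence from a labeled word t_i to a co-occurring word t_j, can be illustrated with a small sketch. The code below is not the implementation of Nigam et al. (2000); it is a minimal multinomial naïve Bayes with a plain EM loop, and the toy corpus, the function names (`train_nb`, `posterior`, `em_naive_bayes`), the Laplace smoothing constant `alpha`, and the iteration count are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: fit multinomial naive Bayes from (possibly soft) class weights.

    docs    -- list of token lists
    weights -- one dict per document mapping class -> P(class | document)
    alpha   -- Laplace smoothing constant (an illustrative choice)
    """
    prior = {c: (sum(w[c] for w in weights) + 1.0) / (len(docs) + len(classes))
             for c in classes}
    cond = {}
    for c in classes:
        counts = Counter()
        for tokens, w in zip(docs, weights):
            for t in tokens:
                counts[t] += w[c]  # fractional counts for unlabeled documents
        total = sum(counts.values()) + alpha * len(vocab)
        cond[c] = {t: (counts[t] + alpha) / total for t in vocab}
    return prior, cond

def posterior(tokens, prior, cond):
    """E-step for one document: P(class | tokens), computed in log space."""
    log_p = {c: math.log(prior[c])
                + sum(math.log(cond[c][t]) for t in tokens if t in cond[c])
             for c in prior}
    m = max(log_p.values())
    z = sum(math.exp(v - m) for v in log_p.values())
    return {c: math.exp(v - m) / z for c, v in log_p.items()}

def em_naive_bayes(labeled, unlabeled, n_iter=10):
    """Train on labeled data, then alternate E- and M-steps over all documents."""
    classes = sorted({y for _, y in labeled})
    docs_l = [d for d, _ in labeled]
    vocab = sorted({t for d in docs_l + unlabeled for t in d})
    # Labeled documents keep their hard (0/1) class weights throughout.
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    prior, cond = train_nb(docs_l, hard, classes, vocab, alpha=1.0)
    for _ in range(n_iter):
        soft = [posterior(d, prior, cond) for d in unlabeled]      # E-step
        prior, cond = train_nb(docs_l + unlabeled, hard + soft,    # M-step
                               classes, vocab, alpha=1.0)
    return prior, cond

# Toy corpus (hypothetical): "striker" and "senate" never occur in labeled data.
labeled = [(["goal", "match"], "sport"), (["election", "vote"], "politics")]
unlabeled = [["goal", "striker"], ["striker", "match"],
             ["vote", "senate"], ["senate", "election"]]

prior, cond = em_naive_bayes(labeled, unlabeled)
print(posterior(["striker"], prior, cond))
print(posterior(["senate"], prior, cond))
```

In this toy setup the labeled set alone gives no evidence about "striker" or "senate"; after the EM iterations, their co-occurrence with labeled words in the unlabeled documents shifts the posterior toward the corresponding class.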

This note was uploaded on 06/19/2011 for the course IT 2258 taught by Professor Aymenali during the Summer '11 term at Abu Dhabi University.
