
… classification problem at hand. One commonly used ranking score is the information gain, which for a term t_j is defined as

IG(t_j) = \sum_{c=1}^{2} p(L_c) \log_2 \frac{1}{p(L_c)} \;-\; \sum_{m=0}^{1} p(t_j = m) \sum_{c=1}^{2} p(L_c \mid t_j = m) \log_2 \frac{1}{p(L_c \mid t_j = m)}    (8)

Here p(L_c) is the fraction of training documents belonging to class L_1 or L_2, p(t_j = 1) and p(t_j = 0) are the fractions of documents that do / do not contain term t_j, and p(L_c | t_j = m) is the conditional probability of class L_c given that term t_j is contained in the document (m = 1) or missing (m = 0). The score measures how useful t_j is for predicting L_1 from an information-theoretic point of view. We may determine IG(t_j) for all terms and remove those with very low information gain from the dictionary.
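To make Equation (8) concrete, the following Python sketch scores each term of a tiny two-class collection by its information gain and ranks the vocabulary; the function names, the toy corpus, and the set-of-terms document representation are illustrative assumptions, not part of the original text.

import math
from collections import Counter

def entropy(probabilities):
    # Shannon entropy in bits; zero-probability outcomes contribute nothing.
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0.0)

def information_gain(term, documents, labels):
    # IG(t_j) = H(L) - sum_m p(t_j = m) * H(L | t_j = m), as in Equation (8).
    n = len(documents)
    class_counts = Counter(labels)                  # counts per class L_c
    h_class = entropy(count / n for count in class_counts.values())

    conditional = 0.0
    for present in (0, 1):                          # m = 0 (absent), m = 1 (present)
        subset = [lab for doc, lab in zip(documents, labels)
                  if (term in doc) == bool(present)]
        if not subset:
            continue
        p_m = len(subset) / n                       # p(t_j = m)
        cond_counts = Counter(subset)               # counts of L_c given t_j = m
        conditional += p_m * entropy(c / len(subset) for c in cond_counts.values())
    return h_class - conditional

# Toy corpus (purely illustrative): documents as term sets, labels 0 / 1.
docs = [{"ball", "goal"}, {"ball", "match"}, {"stock", "market"}, {"market", "rate"}]
labels = [0, 0, 1, 1]
vocabulary = set().union(*docs)

scores = {t: information_gain(t, docs, labels) for t in vocabulary}
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {score:.3f}")

Terms at the bottom of such a ranking would be the candidates for removal from the dictionary.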
In the following sections we describe the most frequently used data mining methods for text categorization.

3.1.2 Naïve Bayes Classifier

Probabilistic classifiers start with the assumption that the words of a document d_i have been generated by a probabilistic mechanism …
