

bastiani (2002).

3.1.4 Decision Trees

Decision trees are classifiers consisting of a set of rules that are applied sequentially and finally yield a decision. They are best explained by looking at the training process, which starts with a comprehensive training set and follows a divide-and-conquer strategy. For a training set M of labelled documents, the word t_i that best predicts the class of the documents is selected, e.g. by the information gain (8). Then M is partitioned into two subsets: M_i^+, the documents containing t_i, and M_i^-, the documents without t_i. This procedure is applied recursively to M_i^+ and M_i^-. It stops when all documents in a subset belong to the same class L_c. The result is a tree of rules with an assignment to actual classes in the leaves.

Decision trees are a standard tool in data mining (Quinlan 1986; Mitchell 1997). They are fast and scalable both in the number of variables and in the size of the training set. For text mining, however, they have the drawbac...
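The recursive partitioning described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each document is a set of words paired with a class label, and that the information gain is the usual entropy-based definition (the text's equation (8) is not reproduced here). The names `entropy`, `information_gain`, `build_tree` and `classify`, as well as the sample documents, are hypothetical. One stopping rule is added beyond the text: if no word yields a positive gain, the subset becomes a leaf labelled with its majority class.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, term):
    """Entropy reduction obtained by splitting docs on the presence of term.

    Assumption: the standard entropy-based gain; docs is a list of
    (set_of_words, label) pairs.
    """
    with_term = [label for words, label in docs if term in words]
    without_term = [label for words, label in docs if term not in words]
    if not with_term or not without_term:
        return 0.0  # the term does not actually partition the set
    all_labels = with_term + without_term
    remainder = (
        len(with_term) * entropy(with_term)
        + len(without_term) * entropy(without_term)
    ) / len(all_labels)
    return entropy(all_labels) - remainder


def build_tree(docs):
    """Recursively split a non-empty training set M into M_i^+ and M_i^-.

    Returns a class label (leaf) or a tuple (term, subtree_plus, subtree_minus).
    """
    labels = [label for _, label in docs]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority  # all documents belong to the same class
    vocabulary = sorted(set().union(*(words for words, _ in docs)))
    best = max(vocabulary, key=lambda t: information_gain(docs, t), default=None)
    if best is None or information_gain(docs, best) <= 0.0:
        return majority  # added stopping rule: no word separates the classes
    plus = [d for d in docs if best in d[0]]       # M_i^+: documents containing t_i
    minus = [d for d in docs if best not in d[0]]  # M_i^-: documents without t_i
    return (best, build_tree(plus), build_tree(minus))


def classify(tree, words):
    """Apply the rules from the root down until a leaf (class label) is reached."""
    while isinstance(tree, tuple):
        term, plus, minus = tree
        tree = plus if term in words else minus
    return tree


# Hypothetical toy training set, for illustration only.
docs = [
    ({"goal", "match", "team"}, "sports"),
    ({"team", "coach", "win"}, "sports"),
    ({"election", "vote", "party"}, "politics"),
    ({"party", "minister", "vote"}, "politics"),
]
tree = build_tree(docs)
```

Each inner node of the returned tree tests one word, and each leaf holds a class, which mirrors the "tree of rules" in the text. Labels are assumed to be plain strings so that leaves can be told apart from inner nodes.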

This note was uploaded on 06/19/2011 for the course IT 2258 taught by Professor Aymenali during the Summer '11 term at Abu Dhabi University.
