bastiani (2002).
3.1.4 Decision Trees

Decision trees are classifiers consisting of a set of rules that are applied
sequentially until they yield a decision. They are best explained by the
training process, which starts from a comprehensive training set and follows a
divide-and-conquer strategy. For a training set M of labelled documents, the
word t_i that best predicts the class of the documents is selected, e.g.
according to the information gain (8). Then M is partitioned into two subsets:
M_i^+, the documents containing t_i, and M_i^-, the documents without t_i.
The procedure is applied recursively to M_i^+ and M_i^-, and it stops when all
documents in a subset belong to the same class L_c. The result is a tree of
rules whose leaves are assigned to actual classes.
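
The recursive partitioning described above can be sketched in a few lines of
Python. This is a minimal illustration, not the implementation used in the
cited work. It assumes that the information gain (8) is the usual
entropy-based definition, and the names entropy, information_gain, build_tree
and classify, the toy documents, and the extra stopping rule (stop when no
word yields a positive gain) are all hypothetical additions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(docs, term):
    """Entropy reduction from splitting docs on the presence of term.

    Assumed to correspond to the information gain criterion in the text.
    """
    with_term = [label for words, label in docs if term in words]
    without_term = [label for words, label in docs if term not in words]
    if not with_term or not without_term:
        return 0.0  # the term does not separate the documents at all
    labels = [label for _, label in docs]
    remainder = (len(with_term) * entropy(with_term)
                 + len(without_term) * entropy(without_term)) / len(docs)
    return entropy(labels) - remainder


def build_tree(docs):
    """Recursively partition docs, a list of (set_of_words, label) pairs."""
    labels = [label for _, label in docs]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority  # all documents share one class: emit a leaf
    vocabulary = sorted(set().union(*(words for words, _ in docs)))
    best = max(vocabulary, key=lambda t: information_gain(docs, t), default=None)
    # Assumption (not in the text): stop with a majority leaf if no word helps.
    if best is None or information_gain(docs, best) <= 0.0:
        return majority
    m_plus = [(w, c) for w, c in docs if best in w]        # documents with the word
    m_minus = [(w, c) for w, c in docs if best not in w]   # documents without it
    return {"term": best,
            "present": build_tree(m_plus),
            "absent": build_tree(m_minus)}


def classify(tree, words):
    """Follow the rules from the root until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["present"] if tree["term"] in words else tree["absent"]
    return tree


# Hypothetical toy training set: each document is a set of words plus a label.
docs = [
    ({"cheap", "pills", "offer"}, "spam"),
    ({"cheap", "offer", "now"}, "spam"),
    ({"meeting", "agenda", "now"}, "ham"),
    ({"project", "meeting", "notes"}, "ham"),
]
tree = build_tree(docs)
print(tree)
print(classify(tree, {"cheap", "watches"}))
```

Each inner node of the resulting dictionary tests a single word and each leaf
holds a class label, which mirrors the tree of rules described above. Ties
between equally good words are broken alphabetically here, an arbitrary choice
made only to keep the sketch deterministic.
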
Decision trees are a standard tool in data mining (Quinlan 1986; Mitchell
1997). They are fast and scalable both in the number of variables and the size of
the training set. For text mining, however, they have the drawbac...
This note was uploaded on 06/19/2011 for the course IT 2258 taught by Professor Aymenali during the Summer '11 term at Abu Dhabi University.