- Use the count matrix to make decisions
  o Multiway split
  o Two-way split
Continuous Attributes: Computing GINI Index

- Use binary decisions based on one value
  o Several choices for the splitting value
  o Number of possible splitting values = number of distinct values
- Each splitting value v has a count matrix associated with it
  o Class counts in each of the partitions, A < v and A >= v
- Simple method to choose the best v
  o For each v, scan the database to gather the count matrix and compute its Gini index
  o Computationally inefficient: repetition of work, O(n^2)
- For efficient computation, for each attribute:
  o Sort the attribute on its values
  o Linearly scan these values, each time updating the count matrix and computing the Gini index
  o Choose the split position that has the least Gini index
  o O(n log n), dominated by the sort
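The sort-and-scan procedure above can be sketched as follows. The `best_split` helper and its example values and labels are illustrative assumptions, not code from the course.

```python
# Sketch of the efficient split search: sort once, then scan candidate
# splits while updating the class counts incrementally.

def gini(counts):
    """Gini index of a list of class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

def best_split(values, labels, classes=("yes", "no")):
    """Return (split value, weighted Gini) for a continuous attribute."""
    pairs = sorted(zip(values, labels))          # O(n log n) sort
    right = {c: 0 for c in classes}
    for _, y in pairs:
        right[y] += 1
    left = {c: 0 for c in classes}
    n = len(pairs)
    best_v, best_g = None, float("inf")
    for i in range(1, n):                        # O(n) linear scan
        v, y = pairs[i - 1]
        left[y] += 1                             # update count matrix
        right[y] -= 1
        if pairs[i][0] == v:                     # split only between distinct values
            continue
        split = (v + pairs[i][0]) / 2
        g = (i / n) * gini(list(left.values())) \
            + ((n - i) / n) * gini(list(right.values()))
        if g < best_g:
            best_v, best_g = split, g
    return best_v, best_g
```

Keeping the count matrix incremental is what removes the repeated database scans of the naive O(n^2) method.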
Splitting based on classification error

- Measures the misclassification error made by a node
  o Maximum 0.5 (with two classes), when records are equally distributed among all classes, implying least interesting information
  o Minimum 0, when all records belong to one class, implying most interesting information: low degree of freedom and impurity, a pure (homogeneous) node
- Classification error at a node t: Error(t) = 1 - max_i P(i | t)
Comparing Attribute Selection Measures
1) Information gain: biased toward multivalued attributes
2) Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others
3) Gini index:
   a. biased toward multivalued attributes
   b. has difficulty when the number of classes is large
   c. tends to favor tests that result in equal-sized partitions and purity in both partitions
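For concreteness, the three measures being compared can all be computed from a node's class-count vector. This is a minimal sketch; the count vectors are made-up examples.

```python
import math

def gini(counts):
    """Gini index: 1 - sum of squared class probabilities."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy in bits, used by information gain."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def classification_error(counts):
    """Misclassification error: 1 - max class probability."""
    n = sum(counts)
    return 1.0 - max(counts) / n

# An evenly split two-class node maximizes all three measures;
# a pure node ([10, 0]) drives all three to 0.
```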
Stopping Criteria for Tree Induction

- Stop expanding a node when all the records belong to the same class
- Stop when the maximum tree depth has been reached
- Early termination (prepruning)
Determine the final tree size

- Use the minimum description length (MDL) principle
  o MDL measures use encoding techniques to define the best decision tree as the one that requires the fewest bits to both encode the tree and encode the exceptions to the tree
  o Main idea: the simplest solution is preferred
  o Has the least bias toward multivalued attributes
  o Every model provides a (lossless) encoding of our data
  o The model that gives the shortest encoding (best compression) of the data is the best; a short encoding implies regularities in the data
- Halt growth of the tree when the encoding is minimized
** Complex models describe the data in a lot of detail but imply a maximum description length (the model itself is expensive to describe)
** Simple models imply a minimum description length and are cheap to describe, but describing the data (the exceptions) becomes expensive
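The trade-off above can be illustrated with a toy cost function: total description length = bits to encode the tree + bits to encode its exceptions. The per-node bit cost and the exception-encoding scheme here are simplified assumptions, not the exact encodings used in MDL-based pruning.

```python
import math

def description_length(num_nodes, num_errors, num_records, bits_per_node=4):
    """Toy MDL cost: model bits plus bits to identify the exception records."""
    model_cost = num_nodes * bits_per_node
    # Roughly log2 C(n, e) bits to say which e of n records are exceptions.
    data_cost = math.log2(math.comb(num_records, num_errors)) if num_errors else 0.0
    return model_cost + data_cost

# Complex tree: many nodes, no exceptions -> expensive model, cheap data.
complex_dl = description_length(num_nodes=15, num_errors=0, num_records=100)
# Simple tree: few nodes, some exceptions -> cheap model, expensive data.
simple_dl = description_length(num_nodes=3, num_errors=10, num_records=100)
```

Whichever tree yields the smaller total is preferred under MDL.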
Extracting Classification Rules from the Tree

- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand
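The path-to-rule procedure above might look like the following sketch; the tree structure and attribute names (outlook, humidity) are hypothetical examples.

```python
# One rule per root-to-leaf path: attribute-value tests along the path
# form the conjunction, and the leaf supplies the class prediction.

def extract_rules(node, path=()):
    if "class" in node:                       # leaf: emit one rule
        cond = " AND ".join(path) or "TRUE"
        return [f"IF {cond} THEN class = {node['class']}"]
    rules = []
    for value, child in node["children"].items():
        rules += extract_rules(child, path + (f"{node['attribute']} = {value}",))
    return rules

tree = {
    "attribute": "outlook",
    "children": {
        "sunny": {"attribute": "humidity",
                  "children": {"high": {"class": "no"},
                               "normal": {"class": "yes"}}},
        "overcast": {"class": "yes"},
    },
}
```

Calling `extract_rules(tree)` yields one IF-THEN rule per leaf of this example tree.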
Avoiding Overfitting in Classification

- An induced tree may overfit the training data