Some classification trees were designed for categorical predictor variables and therefore work best when the predictors are categorical. Continuous predictors can frequently still be used by converting the
continuous variable to a set of ranges (binning). Some decision trees do not support continuous
response variables (i.e., will not build regression trees), in which case the response variables in the
training set must also be binned to output classes.
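As a sketch of that binning step, a continuous predictor can be mapped to a set of range labels. The bin edges and labels here are illustrative assumptions, not values from the text:

```python
import numpy as np

# Hypothetical example: convert a continuous predictor (age) into
# categorical bins so a tree that expects categorical inputs can use it.
ages = np.array([5, 17, 23, 41, 58, 72])
edges = [18, 35, 55]                        # assumed bin boundaries
labels = ["child", "young", "middle", "senior"]

bins = np.digitize(ages, edges)             # index 0..3 for each case
binned = [labels[i] for i in bins]
print(binned)   # ['child', 'child', 'young', 'middle', 'senior', 'senior']
```

The same edges must, of course, be reused when binning new cases at prediction time.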
Multivariate Adaptive Regression Splines (MARS)
In the mid-1980s one of the inventors of CART, Jerome H. Friedman, developed a method designed
to address its shortcomings.
The main disadvantages he wanted to eliminate were:
• Discontinuous predictions (hard splits).
• Dependence of all splits on previous ones.
• Reduced interpretability due to interactions, especially high-order interactions.
To this end he developed the MARS algorithm. The basic idea of MARS is quite simple, while the
algorithm itself is rather involved. Very briefly, the CART disadvantages are taken care of by:
• Replacing the discontinuous branching at a node with a continuous transition modeled by a
pair of straight lines. At the end of the model-building process, the straight lines at each
node are replaced with a very smooth function called a spline.
• Not requiring that new splits be dependent on previous splits.
Unfortunately, this means MARS loses the tree structure of CART and cannot produce rules. On the
other hand, MARS automatically finds and lists the most important predictor variables as well as the
interactions among predictor variables. MARS also plots the dependence of the response on each
predictor. The result is an automatic non-linear step-wise regression tool.
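The "pair of straight lines" at a knot described above can be written as the two hinge functions max(0, x - t) and max(0, t - x). A minimal sketch of fitting a one-knot model by least squares follows; the data, knot location, and coefficients are illustrative assumptions, not part of the MARS algorithm's actual knot search:

```python
import numpy as np

# Synthetic data with a kink at x = 4 (slope 2 on the left, 0.5 on the right).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.where(x < 4, 2.0 * x, 8.0 + 0.5 * (x - 4)) + rng.normal(0, 0.2, x.size)

t = 4.0                                   # candidate knot
B = np.column_stack([
    np.ones_like(x),                      # intercept
    np.maximum(0, x - t),                 # right hinge: max(0, x - t)
    np.maximum(0, t - x),                 # left hinge:  max(0, t - x)
])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
print(coef)                               # roughly [8.0, 0.5, -2.0]
```

The full algorithm searches over knot locations and variables, adding hinge pairs step by step, which is why the basic idea is simple while the procedure is involved.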
MARS, like most neural net and decision tree algorithms, has a tendency to overfit the training data.
This can be addressed in two ways. First, manual cross validation can be performed and the algorithm
tuned to provide good prediction on the test set. Second, there are various tuning parameters in the
algorithm itself that can guide internal cross validation.
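One such internal criterion in Friedman's formulation is generalized cross-validation (GCV), which penalizes the training error by an effective parameter count so that adding terms must earn its keep. The exact penalty convention below is one common choice and should be treated as an assumption:

```python
import numpy as np

def gcv(y, pred, n_terms, penalty=3.0):
    """GCV score for comparing candidate models without a held-out set.
    Effective parameters C(M) = n_terms + penalty * (n_terms - 1);
    the penalty is a tuning knob, typically in the range 2-3."""
    n = len(y)
    mse = np.mean((y - pred) ** 2)
    c = n_terms + penalty * (n_terms - 1)
    return mse / (1.0 - c / n) ** 2

# Same fit quality, more terms: the larger model gets a worse (higher) score.
y = np.zeros(50)
pred = np.full(50, 0.1)
print(gcv(y, pred, n_terms=1), gcv(y, pred, n_terms=5))
```

A model with the lowest GCV is preferred even if a larger model has slightly lower raw training error.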
Rule induction
Rule induction is a method for deriving a set of rules to classify cases. Although decision trees can
produce a set of rules, rule induction methods generate a set of independent rules which do not
necessarily (and are unlikely to) form a tree. Because the rule inducer is not forcing splits at each
level, and can look ahead, it may be able to find different and sometimes better patterns for
classification. Unlike trees, the rules generated may not cover all possible situations. Also unlike
trees, rules may sometimes conflict in their predictions, in which case it is necessary to choose which
rule to follow. One common method to resolve conflicts is to assign a confidence to rules and use the
one in which you are most confident. Alternatively, if more than two rules conflict, you may let them
vote, perhaps weighting their votes by the confidence you have in each rule.

© 1999 Two Crows Corporation

K-nearest neighbor and memory-based reasoning (MBR)
When trying to solve new problems, people often look at solutions to similar problems that they have
previously solved. K-nearest neighbor (k-NN) is a classification technique that uses a version of this
same method. It decides in which class to place a new case by examining some number — the “k” in
k-nearest neighbor — of the most similar cases or neighbors (Figure 8). It counts the number of cases
for each class, and assigns the new case to the same class to which most of its neighbors belong.

Figure 8. K-nearest neighbor. N is a new case. It would be assigned to the class X because the seven X's within the ellipse outnumber the two Y's.

The first thing you must do to apply k-NN is to find a measure of the distance between attributes in
the data and then calculate it. While this is easy for numeric data, categorical variables need special
handling. For example, what is the distance between blue and green? You must then have a way of
summing the distance measures for the attributes. Once you can calculate the distance between cases,
you then select the set of already classified cases to use as the basis for classifying new cases, decide
how large a neighborhood in which to do the comparisons, and also decide how to count the
neighbors themselves (e.g., you might give more weight to nearer neighbors than farther neighbors).
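The steps above can be sketched in a small classifier. The data, the simple overlap distance for categorical values (0 if equal, 1 if not), and the inverse-distance vote weighting are all illustrative assumptions, one reasonable set of choices among many:

```python
import math
from collections import defaultdict

# Each case: (numeric features, categorical features, class label).
train = [
    ((1.0, 5.0), ("blue",),  "X"),
    ((1.2, 4.8), ("blue",),  "X"),
    ((1.1, 5.2), ("green",), "X"),
    ((6.0, 1.0), ("green",), "Y"),
    ((5.8, 1.2), ("red",),   "Y"),
]

def distance(a, b):
    """Sum a Euclidean distance on numeric attributes with a simple
    overlap distance (0 if equal, 1 if not) on categorical ones."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[0], b[0])))
    cat = sum(x != y for x, y in zip(a[1], b[1]))
    return num + cat

def classify(case, k=3):
    nearest = sorted(train, key=lambda t: distance(case, t))[:k]
    votes = defaultdict(float)
    for t in nearest:
        votes[t[2]] += 1.0 / (distance(case, t) + 1e-9)  # nearer = heavier vote
    return max(votes, key=votes.get)

print(classify(((1.1, 5.0), ("blue",), None)))   # 'X'
```

In practice the numeric attributes should also be rescaled to comparable ranges before summing, or a single attribute with large values will dominate the distance.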
K-NN puts a large computational load on the computer because classifying each new case requires computing its distance to every stored case, so the work grows with the size of the training set. While it's a rapid process to apply a decision tree or neural net
to a new case, k-NN requires that a new calculation be made for each new case. To speed up k-NN,
frequently all the data is kept in memory. Memory-based reasoning usually refers to a k-NN classifier
kept in memory.
K-NN models are very easy to understand when there are few predictor variables. They are also
useful for building models that in...