Decision Trees
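[Figure: decision tree. The root node splits on Residence (<>NY vs. =NY); a second node splits on Age (< 35 vs. >=35); the leaves are labeled Buyer and Non-Buyer.]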
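The tree above can be read as a pair of nested tests. The following is a minimal Python sketch, not the course's own code: the function name `classify` is hypothetical, and the leaf assignment (non-NY residents are Non-Buyers, NY residents under 35 are Buyers, all other NY residents are Non-Buyers) is an assumption read off the order of the labels in the figure.

```python
def classify(residence: str, age: int) -> str:
    """Walk the two-level tree: split on residence first, then on age.

    The leaf labels are an assumed reading of the figure.
    """
    if residence != "NY":   # branch "<>NY"
        return "Non-Buyer"
    if age < 35:            # branch "=NY", then "< 35"
        return "Buyer"
    return "Non-Buyer"      # branch "=NY", then ">=35"


print(classify("NY", 28))
print(classify("NY", 35))
print(classify("CA", 28))
```

Each root-to-leaf path corresponds to one IF-THEN rule, which is why trees and rules are usually presented together.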
Trees and Rules
Goal: Classify or predict an outcome based on a set of predictors
(independent variables)
Partitions the predictor space into multiple simple regions
The output is a
Logistic regression
and some linear regression basics
Binary dependent variable: Odds
p: probability of an event occurring
1-p: probability of the event not occurring
Odds = p/(1-p)
Odds of winning are 1:3 => 1 win for every 3 losses (p = 0.25)
Odds = 0.25/(1-0.25
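The odds definition above can be checked numerically. This is a small illustrative sketch; the helper names `odds_from_probability` and `probability_from_odds` are hypothetical and chosen only for this example.

```python
def odds_from_probability(p: float) -> float:
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Invert the definition: p = odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# 1 win for every 3 losses: 1 win out of 4 outcomes.
p_win = 1 / (1 + 3)
odds_win = odds_from_probability(p_win)
print(p_win, odds_win)
print(probability_from_odds(odds_win))
```

Odds of 1:3 and a probability of 0.25 describe the same event; the two functions simply convert between the two scales.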
Data mining - Overview
Discovering meaningful patterns from large quantities of data
Process of maximizing the value of business data
Models
Input
x
MODEL
Output
y
y = f (x)
What is a model?
y = w1 x1 + w2 x2 + ... + wk xk
IF [(age < 35) and ( $30K< income< $
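The linear form y = w1 x1 + w2 x2 + ... + wk xk above is one concrete choice of f in y = f(x). A minimal Python sketch follows; the function name `linear_model` and the weight and input values are made up for illustration and do not come from the slides.

```python
from typing import Sequence


def linear_model(weights: Sequence[float], x: Sequence[float]) -> float:
    """Return y = w1*x1 + w2*x2 + ... + wk*xk for one input vector x."""
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same length")
    return sum(w * xi for w, xi in zip(weights, x))


# Illustrative values only (k = 3 predictors).
weights = [0.5, -1.0, 2.0]
x = [2.0, 1.0, 3.0]
y = linear_model(weights, x)
print(y)
```

Fitting the model means choosing the weights from data; the function above only applies weights that are already known.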
Performance assessment
Why evaluate?
Multiple methods are available to classify or predict
For each method, multiple choices are available for
settings
To choose the best model, we need to assess each
model's performance
Does performance measure used match the
Data mining - Intro
Data mining
Extracting useful information from large datasets
Process of exploration and analysis (by automatic and semi-automatic means) of
large quantities of data in order to discover meaningful patterns and rules
Process of discove
Data
- data exploration, transformations
- data reduction - PCA
Steps in Data Mining
1. Define/understand purpose
2. Obtain data (may involve random sampling)
3. Explore, clean, pre-process data
4. Reduce the data; if supervised DM, partiti
Random Forests
Bias, variance
[Figure: plot of bias^2, variance, and error]
Bias: how far a technique falls short of accurately modeling the problem
Variance: how much accuracy varies with different training data
Bias-variance tradeoff (lower-variance models tend to have higher bias, and vice versa)
http:
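The three quantities named above (bias^2, variance, error) can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not material from the slides: it estimates a known mean (`TRUE_MEAN`, a made-up value) from many simulated training sets and compares two hypothetical estimators, the plain sample mean and a deliberately shrunk version of it. For squared error measured against the true value, error = bias^2 + variance holds exactly for each estimator.

```python
import random
from statistics import fmean

TRUE_MEAN = 5.0          # assumed ground truth for the simulation
rng = random.Random(0)   # fixed seed so the run is repeatable


def draw_training_set(n: int = 10) -> list:
    """Simulate one training set of n noisy observations."""
    return [rng.gauss(TRUE_MEAN, 2.0) for _ in range(n)]


def decompose(estimates: list, truth: float) -> tuple:
    """Return (bias^2, variance, mean squared error) of the estimates."""
    mean_est = fmean(estimates)
    bias_sq = (mean_est - truth) ** 2
    variance = fmean([(e - mean_est) ** 2 for e in estimates])
    error = fmean([(e - truth) ** 2 for e in estimates])
    return bias_sq, variance, error


training_sets = [draw_training_set() for _ in range(2000)]
estimators = {
    "sample mean": [fmean(s) for s in training_sets],
    "shrunk mean": [0.5 * fmean(s) for s in training_sets],
}
results = {name: decompose(est, TRUE_MEAN) for name, est in estimators.items()}

for name, (bias_sq, variance, error) in results.items():
    print(f"{name}: bias^2={bias_sq:.3f} variance={variance:.3f} error={error:.3f}")
```

The shrunk estimator varies less from one training set to the next but is systematically off target, which is the tradeoff in miniature.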
Evaluating Performance: ROC, AUC
Evaluating classifiers
Model can output
discrete class value
continuous value (estimate of class membership probability)
thresholds applied to obtain class value
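The thresholding step above can be sketched in a few lines of Python. Everything here is illustrative: the function names `apply_threshold` and `confusion_counts`, the scores, and the convention that a score at or above the threshold maps to class "p" are assumptions, not taken from the slides. Each threshold yields one pair (false positive rate, true positive rate), which is one point on an ROC curve.

```python
def apply_threshold(scores, threshold=0.5):
    """Turn probability estimates into class labels 'p' / 'n'.

    Assumed convention: score >= threshold means class 'p'.
    """
    return ["p" if s >= threshold else "n" for s in scores]


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for labels 'p' / 'n'."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == "p":
            counts["TP" if a == "p" else "FP"] += 1
        else:
            counts["FN" if a == "p" else "TN"] += 1
    return counts


# Made-up example data: actual classes and model scores.
actual = ["p", "p", "n", "p", "n", "n"]
scores = [0.9, 0.6, 0.7, 0.4, 0.2, 0.1]

for threshold in (0.3, 0.5, 0.8):
    c = confusion_counts(actual, apply_threshold(scores, threshold))
    tpr = c["TP"] / (c["TP"] + c["FN"])  # true positive rate
    fpr = c["FP"] / (c["FP"] + c["TN"])  # false positive rate
    print(threshold, c, round(fpr, 3), round(tpr, 3))
```

Lowering the threshold labels more cases as "p", which raises the true positive rate and the false positive rate together.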
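[Table: confusion matrix. Columns: actual class (p, n); rows: predicted class (p, n); totals. The cell for predicted p and actual p holds the true positives.]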
Boosted trees
Boosting: Foundations and Algorithms, by Robert Schapire and Yoav Freund
(https://mitpress.mit.edu/sites/default/files/titles/content/boosting_foundations_algorithms/toc.html)
A Short Introduction to Boosting by Y Freund, R Schapire
(http:/www.si