AUC = 0.97

Evaluating probability models
Log Likelihood
Logarithm of the product of the probabilities the model assigned to each example.
The log likelihood the model assigns to the test data; the model is better the closer the log likelihood is to 0.
Log likelihood rescaled by the number of data points gives a rough average surprise per data point.
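As a minimal sketch (the labels and predicted probabilities below are invented for illustration), the log likelihood and its per-point average can be computed as:

```python
import math

def log_likelihood(y_true, p_pred):
    """Sum of the logs of the probabilities the model assigned
    to each observed outcome (1 = positive class, 0 = negative)."""
    return sum(math.log(p if y == 1 else 1.0 - p)
               for y, p in zip(y_true, p_pred))

# hypothetical labels and predicted probabilities of the positive class
y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.8, 0.7, 0.1]

ll = log_likelihood(y, p)     # always <= 0; closer to 0 is better
avg_surprise = ll / len(y)    # rough average surprise per data point
```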

Evaluating probability models
Log Likelihood
The spam model assigns a log likelihood of -134.9478, which is much better than the null model’s -306.8952.

Evaluating probability models
Deviance
The deviance is defined as -2*(logLikelihood-S), where S is a technical constant called “the log likelihood of the saturated model.”
The lower the residual deviance, the better the model.
We’re most concerned with differences of deviance, such as the difference between the null deviance and the model deviance.
In our case, this difference is -2*(-306.8952-S) - -2*(-134.9478-S) = 343.9.
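Because S cancels when the two deviances are subtracted, the difference needs only the two log likelihoods from the slide; a quick check:

```python
ll_null = -306.8952    # log likelihood of the null model
ll_model = -134.9478   # log likelihood of the spam model

# -2*(ll_null - S) - (-2*(ll_model - S)): the S terms cancel,
# leaving 2*(ll_model - ll_null)
delta_deviance = 2 * (ll_model - ll_null)
```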

Evaluating probability models
Akaike information criterion (AIC)
AIC = deviance + 2*numberOfParameters
AIC is deviance penalized for model complexity.
Useful for comparing models with different measures of complexity and variables with differing numbers of levels.
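A sketch of the AIC formula from the slide; the parameter counts below are hypothetical, since the slide does not state them:

```python
def aic(log_likelihood, n_params, s=0.0):
    """AIC = deviance + 2*numberOfParameters,
    with deviance = -2*(logLikelihood - s)."""
    return -2 * (log_likelihood - s) + 2 * n_params

# hypothetical comparison: a larger model must improve the log
# likelihood by enough to pay for its extra parameters
aic_small = aic(-140.0, n_params=3)       # 280 + 6
aic_large = aic(-134.9478, n_params=10)   # 269.8956 + 20
```

Here the smaller model wins despite its worse log likelihood, which is exactly the complexity penalty at work.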

Evaluating probability models
Entropy
A technical measure of information or surprise, measured in a unit called bits.
Conditional entropy is a measure that gives an indication of how good the prediction is on different categories, tempered by how often it predicts different categories.
Initial entropy is 0.97 bits per example, a lot of surprise. The conditional entropy is only 0.39 bits per example.
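Entropy and conditional entropy can be sketched with the standard library (the label sequences below are invented for illustration):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of category labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def conditional_entropy(predicted, actual):
    """Entropy of the actual labels within each predicted category,
    weighted by how often that category is predicted."""
    n = len(actual)
    h = 0.0
    for g in set(predicted):
        subset = [a for p, a in zip(predicted, actual) if p == g]
        h += (len(subset) / n) * entropy(subset)
    return h

# a 50/50 label mix carries 1 bit of surprise per example
h0 = entropy(["spam", "spam", "ham", "ham"])
# a perfect predictor leaves no remaining surprise
h1 = conditional_entropy(["spam", "spam", "ham", "ham"],
                         ["spam", "spam", "ham", "ham"])
```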

Evaluating ranking models
Ranking models
Given a set of examples, sort the rows or assign ranks to the rows.
Often trained by converting groups of examples into many pair-wise decisions.
Evaluation methods
Methods used for evaluating classifiers
Spearman’s rank correlation coefficient
The data mining concept of lift
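Spearman’s rank correlation compares the rank orders of two score lists; a stdlib-only sketch (with average ranks for ties):

```python
import math

def ranks(xs):
    """1-based ranks, averaging ranks across tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # average of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)
```

A perfect ranking scores +1, a perfectly reversed one -1; for example, comparing model scores against true relevance values.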

Evaluating clustering models
Hard to evaluate because clustering is unsupervised.
Create a 2-dimensional cluster plot to inspect results visually.

Evaluating clustering models
Distance

Evaluating clustering models
Distance metrics are good for checking the performance of clustering, but not always good for business needs.
Treat clusters as classifications or scores for model evaluation.
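One common distance-based check is the within-cluster sum of squares (WSS): smaller values mean tighter clusters. A sketch with made-up 2-D points:

```python
def wss(points, assignments, centers):
    """Total squared distance from each point to its cluster's center."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(pt, centers[a]))
        for pt, a in zip(points, assignments)
    )

# two well-separated hypothetical clusters in 2-D
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
assignments = [0, 0, 1, 1]
centers = {0: (0, 0.5), 1: (10, 10.5)}
tightness = wss(points, assignments, centers)
```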

Validating Models

Overfitting
An overfit model looks great on training data but performs poorly on new data.
Generalization error is significantly greater than training error.
Avoid it by preferring simpler models.

KDD Cup Example
The dataset has 230 facts about 50,000 credit card accounts and is about customer relationship management.
From these features, the goal was to predict account cancellation (called churn), the innate tendency to use new products and services (called appetency), and willingness to respond favorably to marketing pitches (called upselling).

KDD Cup Example
Build single-variable models
Use only one variable at a time before scaling up to general modeling.
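For a categorical variable, a single-variable model can simply predict the outcome rate observed for each level, falling back to the overall rate for unseen levels; a minimal sketch (the data and level names are invented):

```python
from collections import defaultdict

def single_variable_model(var_values, outcomes):
    """Predict the outcome rate seen for each level of one variable."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, y in zip(var_values, outcomes):
        sums[v] += y
        counts[v] += 1
    overall = sum(outcomes) / len(outcomes)
    rates = {v: sums[v] / counts[v] for v in sums}
    # unseen levels fall back to the overall outcome rate
    return lambda v: rates.get(v, overall)

# hypothetical training data: one variable's levels and churn outcomes
predict = single_variable_model(["a", "a", "b", "b"], [1, 0, 1, 1])
```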
