The Validation data is scored in detail as shown below. Scoring means using
the fitted model to classify each record of the data.
That was just one of the many techniques in XLMiner Classification Tree.
A typical Data Mining exercise involves several alte
Categorical prediction models produce a prediction score for each
example. The higher the score, the more likely the example is in a
One problem is deciding the threshold (classification score cut-off):
only examples with scores above t
"Lift ratio" indicates how much more likely we are to encounter a
cookbook transaction if we consider just those transactions where an
Italian cookbook and a Youthbook is purchased, as compared to the
entire population of transactions - it's the confiden
Lets get going with XLMiner.
Notice that XLMiner is as easy to use as Excel!
All we need to do is use the friendly menus. We follow just three simple steps
to fit a model and see the outputs!
Well create two partitions by choosing the records randomly.
Robustness of the algorithm: non-volatile algorithms which produce
consistent error rates - i.e. low standard deviation of error estimates - over
different training and test samples are better
Overall error rates (accuracy)
Class-specific error rates (acc
Validating models on test data is one way of assessing the expected
performance of models.
However, the real proof is in the pudding: tracking the actual
performance of the model in a real campaign!
You need to keep statistics of how many responses you a
Gains Tables and Charts give us an idea of the profitability of a
model, in terms of likely costs and revenues from using it.
Furthermore, Gains Tables and Charts allow us to easily see what
number or percentage of prospects we ought to mail in order to
Cross-validation is similar to jack-knifing, but intended for large data
sets: In cross-validation you leave out a group of observations each
time, instead of just a single observation as in jack-knifing.
Cross-validation is used for assessing the robust
Cost-based predictive models incorporate costs in order to give
predictive focus. The goal is simply to reduce the errors that matter
! To produce cost-based models we can use:
stratified sampling: create a disproportionate stratified
sample by sampling