averaged to give a better estimate of the true accuracy of the model built on all the data.
© 1999 Two Crows Corporation

Typically, the more general n-fold cross validation is used. In this method, the data is randomly
divided into n disjoint groups. For example, suppose the data is divided into ten groups. The first
group is set aside for testing and the other nine are lumped together for model building. The
model built on the 90% group is then used to predict the group that was set aside. This process is
repeated a total of 10 times as each group in turn is set aside, the model is built on the remaining
90% of the data, and then that model is used to predict the set-aside group. Finally, a model is
built using all the data. The mean of the 10 independent error rate predictions is used as the error
rate for this last model.
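The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the majority-class "model" and the 70/30 data set are hypothetical stand-ins for a real algorithm and real data.

```python
# A minimal sketch of n-fold cross validation, assuming a hypothetical
# majority-class "model" that stands in for any real learning algorithm.
import random

def majority_class(labels):
    # Stand-in for model building: always predict the most common training class.
    return max(set(labels), key=labels.count)

def cross_validation(labels, n_folds=10, seed=0):
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)                                   # randomly divide the data...
    folds = [indices[i::n_folds] for i in range(n_folds)]  # ...into n disjoint groups

    error_rates = []
    for test_fold in folds:
        # Lump the other n-1 groups together for model building.
        train = [labels[j] for f in folds if f is not test_fold for j in f]
        model = majority_class(train)
        # Use that model to predict the set-aside group.
        wrong = sum(1 for j in test_fold if labels[j] != model)
        error_rates.append(wrong / len(test_fold))

    # Finally, build a model on all the data; the mean of the n independent
    # error estimates is reported as its error rate.
    final_model = majority_class(labels)
    return final_model, sum(error_rates) / n_folds

model, error = cross_validation(["A"] * 70 + ["B"] * 30)
```

With this toy data every fold's training set still has "A" as its majority, so the mean error works out to the minority-class fraction, 0.3.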
Bootstrapping is another technique for estimating the error of a model; it is primarily used with
very small data sets. As in cross validation, the model is built on the entire dataset. Then
numerous data sets called bootstrap samples are created by sampling from the original data set.
After each case is sampled, it is replaced and a case is selected again until the entire bootstrap
sample is created. Note that records may occur more than once in the data sets thus created. A
model is built on this data set, and its error rate is calculated. This is called the resubstitution
error. Many bootstrap samples (sometimes over 1,000) are created. The final error estimate for
the model built on the whole data set is calculated by taking the average of the estimates from
each of the bootstrap samples.
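A sketch of this bootstrap procedure, again with the hypothetical majority-class model standing in for a real algorithm and hypothetical data:

```python
# A minimal sketch of bootstrap error estimation, following the procedure
# described above: many same-size samples drawn with replacement, the
# resubstitution error computed on each, and the results averaged.
import random

def majority_class(labels):
    # Hypothetical stand-in for any real model-building step.
    return max(set(labels), key=labels.count)

def bootstrap_error(labels, n_bootstrap=200, seed=0):
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    for _ in range(n_bootstrap):
        # Sample with replacement until the bootstrap sample is as large as the
        # original data set; records may appear more than once in each sample.
        sample = [labels[rng.randrange(n)] for _ in range(n)]
        model = majority_class(sample)
        wrong = sum(1 for y in sample if y != model)
        estimates.append(wrong / n)   # resubstitution error for this sample
    # The final error estimate is the average over all bootstrap samples.
    return sum(estimates) / n_bootstrap

estimate = bootstrap_error(["A"] * 70 + ["B"] * 30)
```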
Based upon the results of your model building, you may want to build another model using the
same technique but different parameters, or perhaps try other algorithms or tools. For example,
another approach may increase your accuracy. No tool or technique is perfect for all data, and it is
difficult if not impossible to be sure before you start which technique will work the best. It is
quite common to build numerous models before finding a satisfactory one.
6. Evaluation and interpretation.
a. Model Validation. After building a model, you must evaluate its results and interpret their
significance. Remember that the accuracy rate found during testing applies only to the data on
which the model was built. In practice, the accuracy may vary if the data to which the model
is applied differs in important and unknowable ways from the original data. More
importantly, accuracy by itself is not necessarily the right metric for selecting the best model.
You need to know more about the type of errors and the costs associated with them.
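This trade-off can be made concrete with a small sketch. The $10 value per correct answer, the per-class error costs, and the error counts below are taken from the confusion-matrix example worked through later in this section; the function and variable names are illustrative.

```python
# A sketch of cost-weighted model comparison; the dollar values and error
# counts come from the confusion-matrix example in this section.
ERROR_COST = {"A": 5, "B": 10, "C": 20}   # cost per incorrect prediction of each class

def net_value(n_correct, errors_by_predicted_class, value_per_correct=10):
    penalty = sum(ERROR_COST[c] * n for c, n in errors_by_predicted_class.items())
    return n_correct * value_per_correct - penalty

model_1 = net_value(123, {"A": 5, "B": 12, "C": 10})   # 82% accuracy -> $885
model_2 = net_value(118, {"A": 22, "B": 7, "C": 3})    # 79% accuracy -> $940
```

Here the less accurate model is worth $55 more to the organization, which is exactly the situation the following discussion illustrates.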
Confusion matrices. For classification problems, a confusion matrix is a very useful tool for
understanding results. A confusion matrix (Figure 9) shows the counts of the actual versus
predicted class values. It shows not only how well the model predicts, but also presents the
details needed to see exactly where things may have gone wrong. The following table is a
sample confusion matrix. The columns show the actual classes, and the rows show the
predicted classes. Therefore the diagonal shows all the correct predictions. In the confusion
matrix, you can see that our model predicted 38 of the 46 Class B’s correctly, but
misclassified 8 of them: two as Class A and six as Class C. This is much more informative
than simply telling us an overall accuracy rate of 82% (123 correct classifications out of 150
cases).

                       Actual
    Prediction    Class A   Class B   Class C
    Class A          45         2         3
    Class B          10        38         2
    Class C           4         6        40

Figure 9. Confusion matrix.

In particular, if there are different costs associated with different errors, a model with a lower
overall accuracy may be preferable to one with higher accuracy but a greater cost to the
organization due to the types of errors it makes. For example, suppose in the above confusion
matrix each correct answer had a value of $10, and each incorrect prediction of Class A had a
cost of $5, of Class B a cost of $10, and of Class C a cost of $20. Then the net value of the
matrix would be:

(123 * $10) – (5 * $5) – (12 * $10) – (10 * $20) = $885

But consider the following confusion matrix (Figure 10). The accuracy has dropped to 79%
(118/150). However, when we apply the costs from above to this confusion matrix, the net
value is:

(118 * $10) – (22 * $5) – (7 * $10) – (3 * $20) = $940

Figure 10. Another confusion matrix.

Thus, if you wanted to maximize the value of the model, you would be better off choosing the
less accurate model that has a higher net value.

The lift (gain) chart (Figure 11) is also a big help in evaluating the usefulness of a model. It
shows how responses (e.g., to a direct mail solicitation or a surgical treatment) are changed
by applying the model. This change ratio is called the lift. For example, instead of a 10%