averaged to give a better estimate of the true accuracy of the model built on all the data.

© 1999 Two Crows Corporation

Typically, the more general n-fold cross-validation is used. In this method, the data is randomly divided into n disjoint groups. For example, suppose the data is divided into ten groups. The first group is set aside for testing and the other nine are combined for model building. The model built on the 90% group is then used to predict the group that was set aside. This process is repeated a total of ten times: each group in turn is set aside, a model is built on the remaining 90% of the data, and that model is used to predict the set-aside group. Finally, a model is built using all the data. The mean of the ten independent error estimates is used as the error rate for this last model.

Bootstrapping is another technique for estimating the error of a model; it is primarily used with very small data sets. As in cross-validation, the model is built on the entire data set. Then numerous data sets called bootstrap samples are created by sampling with replacement from the original data set: after each case is sampled, it is replaced and another case is selected, until the bootstrap sample is the same size as the original. Note that records may occur more than once in the data sets thus created. A model is built on each bootstrap sample and its error rate is calculated; this is called the resubstitution error. Many bootstrap samples (sometimes over 1,000) are created. The final error estimate for the model built on the whole data set is calculated by averaging the estimates from the bootstrap samples.

Based upon the results of your model building, you may want to build another model using the same technique but different parameters, or perhaps try other algorithms or tools. For example, another approach may increase your accuracy.
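The two resampling schemes described above can be sketched with Python's standard library alone. The group and sample sizes match the text, but the function names are illustrative, not taken from any particular toolkit, and the model-building step itself is omitted:

```python
import random

def n_fold_groups(n_cases, n_folds=10, seed=0):
    """Randomly partition case indices into n_folds disjoint groups.

    Each group in turn serves as the set-aside test group while the
    remaining groups (about 90% of the data when n_folds=10) are
    lumped together for model building.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    # Strided slicing deals the shuffled indices out round-robin,
    # so group sizes differ by at most one case.
    return [indices[k::n_folds] for k in range(n_folds)]

def bootstrap_sample(data, seed=0):
    """Draw len(data) cases with replacement; records may repeat."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(len(data))]
```

For 100 cases, `n_fold_groups(100)` yields ten disjoint groups of ten indices each, and `bootstrap_sample` returns a sample the same size as its input, typically with duplicates.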
No tool or technique is perfect for all data, and it is difficult, if not impossible, to be sure before you start which technique will work best. It is quite common to build numerous models before finding a satisfactory one.

6. Evaluation and interpretation.

a. Model validation. After building a model, you must evaluate its results and interpret their significance. Remember that the accuracy rate found during testing applies only to the data on which the model was built. In practice, the accuracy may vary if the data to which the model is applied differs in important and unknowable ways from the original data. More importantly, accuracy by itself is not necessarily the right metric for selecting the best model. You need to know more about the types of errors and the costs associated with them.

Confusion matrices. For classification problems, a confusion matrix is a very useful tool for understanding results. A confusion matrix (Figure 9) shows the counts of the actual versus predicted class values. It shows not only how well the model predicts, but also presents the details needed to see exactly where things may have gone wrong. The following table is a sample confusion matrix. The columns show the predicted classes, and the rows show the actual classes. Therefore the diagonal shows all the correct predictions. In the confusion matrix, you can see that our model predicted 38 of the 46 Class B's correctly, but misclassified 8 of them: two as Class A and six as Class C. This is much more informative than simply telling us an overall accuracy rate of 82% (123 correct classifications out of 150 cases).

                     Prediction
                 Class A  Class B  Class C
        Class A     45       10        4
Actual  Class B      2       38        6
        Class C      3        2       40

Figure 9. Confusion matrix.
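As a concrete illustration (not part of the original text), a confusion matrix can be tallied and checked in a few lines of Python. Rows hold actual classes and columns hold predicted classes, matching Figure 9; the function name is ours:

```python
def confusion_matrix(actual, predicted, classes):
    """counts[i][j] = number of cases whose actual class is classes[i]
    and whose predicted class is classes[j]."""
    index = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        counts[index[a]][index[p]] += 1
    return counts

# Figure 9 as data: rows = actual, columns = predicted.
figure9 = [[45, 10, 4],
           [2, 38, 6],
           [3, 2, 40]]
correct = sum(figure9[i][i] for i in range(3))  # the diagonal: 123
total = sum(sum(row) for row in figure9)        # 150 cases
assert correct == 123 and total == 150          # accuracy = 82%
```

Summing the Class B row (2 + 38 + 6 = 46) recovers the 46 actual Class B cases discussed above, of which 8 fall off the diagonal.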
In particular, if there are different costs associated with different errors, a model with a lower overall accuracy may be preferable to one with higher accuracy but a greater cost to the organization due to the types of errors it makes. For example, suppose in the above confusion matrix each correct answer had a value of $10, and each incorrect prediction of Class A had a cost of $5, of Class B a cost of $10, and of Class C a cost of $20. Then the net value of the matrix would be:

(123 * $10) – (5 * $5) – (12 * $10) – (10 * $20) = $885.

But consider the following confusion matrix (Figure 10). The accuracy has dropped to 79% (118/150). However, when we apply the costs from above to this confusion matrix, the net value is:

(118 * $10) – (22 * $5) – (7 * $10) – (3 * $20) = $940.

                     Prediction
                 Class A  Class B  Class C
        Class A     40        6        2
Actual  Class B     12       38        1
        Class C     10        1       40

Figure 10. Another confusion matrix.

Thus, if you wanted to maximize the value of the model, you would be better off choosing the less accurate model that has the higher net value.

The lift (gain) chart (Figure 11) is also a big help in evaluating the usefulness of a model. It shows how responses (e.g., to a direct mail solicitation or a surgical treatment) are changed by applying the model. This change ratio is called the lift. For example, instead of a 10% response...
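The cost arithmetic above can be reproduced directly. This sketch (our illustration, not from the original document) credits each diagonal cell with the value of a correct answer and charges each off-diagonal cell the cost of the class that was wrongly predicted, i.e., per column:

```python
def net_value(matrix, gain_per_correct, cost_per_error):
    """Net value of a confusion matrix (rows = actual, cols = predicted).

    cost_per_error[j] is the cost of incorrectly predicting class j,
    so errors are charged by the column they fall in.
    """
    value = 0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i == j:
                value += gain_per_correct * count  # correct predictions
            else:
                value -= cost_per_error[j] * count  # misclassifications
    return value

fig9 = [[45, 10, 4], [2, 38, 6], [3, 2, 40]]
fig10 = [[40, 6, 2], [12, 38, 1], [10, 1, 40]]
costs = [5, 10, 20]  # cost of a wrong Class A, B, C prediction

assert net_value(fig9, 10, costs) == 885   # the more accurate model
assert net_value(fig10, 10, costs) == 940  # less accurate, higher value
```

Running this confirms the comparison in the text: the 79%-accurate model of Figure 10 is worth $55 more than the 82%-accurate model of Figure 9 under these costs.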
This note was uploaded on 01/19/2014 for the course STATS 315B taught by Professor Friedman during the Winter '08 term at Stanford.