DataMining

# note that beyond the 8th decile 80 the roi of the

This preview shows page 1. Sign up to view the full content.

This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: rate when a random 10% of the population is treated, the response rate of a scored 10% of the population is over 30%. The lift is 3 in this case. 100 90 Per cent response 80 70 60 Random Scored 50 40 30 20 10 0 1 2 3 4 5 6 7 8 9 10 Decile Figure 11. Lift chart. Another important component of interpretation is to assess the value of the model. Again, a pattern may be interesting, but acting on it may cost more than the revenue or savings it generates. The ROI (Return on Investment) chart in Figure 12 is a good example of how attaching values to a response and costs to a program can provide additional guidance to decision making. (Here, ROI is defined as ratio of profit to cost.) Note that beyond the 8th decile (80%), the ROI of the scored model becomes negative. It is at a maximum at the 2nd decile (20%). 100.00% 80.00% ROI 60.00% 40.00% Random 20.00% Scored 0.00% -20.00% -40.00% Decile Figure 12. ROI chart. © 1999 Two Crows Corporation 31 Alternatively you may want to look at the profitability of a model (profit = revenue minus cost), as shown in the following chart (Figure 13). 300 250 200 150 100 Profit Random 50 Profit Scored (50) 1 2 3 4 5 6 7 8 9 10 (100) (150) Figure 13. Profit Chart Note that in the example we’ve been using, the maximum lift (for the 10 deciles) was achieved at the 1st decile (10%), the maximum ROI at the 2nd decile (20%), and the maximum profit at the 3rd and 4th deciles. Ideally, you can act on the results of a model in a profitable way. But remember, there may be no practical means to take advantage of the knowledge gained. b. External validation. As pointed out above, no matter how good the accuracy of a model is estimated to be, there is no guarantee that it reflects the real world. A valid model is not necessarily a correct model. One of the main reasons for this problem is that there are always assumptions implicit in the model. For example, the inflation rate may not have been included as a variable in a model that predicts the propensity of an individual to buy, but a jump in inflation from 3% to 17% will certainly affect people’s behavior. Also, the data used to build the model may fail to match the real world in some unknown way, leading to an incorrect model. Therefore it is important to test a model in the real world. If a model is used to select a subset of a mailing list, do a test mailing to verify the model. If a model is used to predict credit risk, try the model on a small set of applicants before full deployment. The higher the risk associated with an incorrect model, the more important it is to construct an experiment to check the model results. 32 © 1999 Two Crows Corporation 7. Deploy the model and results. Once a data mining model is built and validated, it can be used in one of two main ways. The first way is for an analyst to recommend actions based on simply viewing the model and its results. For example, the analyst may look at the clusters the model has identified, the rules that define the model, or the lift and ROI charts that depict the effect of the model. The second way is to apply the model to different data sets. The model could be used to flag records based on their classification, or assign a score such as the probability of an action (e.g., responding to a direct mail solicitation). Or the model can select some records from the database and subject these to further analyses with an OLAP tool. Often the models are part of a business process such as risk analysis, credit authorization or fraud detection. In these cases the model is incorporated into an application. For instance, a predictive model may be integrated into a mortgage loan application to aid a loan officer in evaluating the applicant. Or a model might be embedded in an application such as an inventory ordering system that automatically generates an order when the forecast inventory levels drop below a threshold. The data mining model is often applied to one event or transaction at a time, such as scoring a loan application for risk. The amount of time to process each new transaction, and the rate at which new transactions arrive, will determine whether a parallelized algorithm is needed. Thus, while loan applications can easily be evaluated on modest-sized computers, monitoring credit card transactions or cellular telephone calls for fraud would require a parallel system to deal with the high transaction rate. When delivering a complex application, data mining is often only a small, albeit critical, part of the final product. For example, knowledge discovered through data mining may be combined with the knowledge of domain experts and applied to data in the database and incoming transactions. In a fraud detection system, known patterns of fraud may be combined with discovered patterns. When suspected cases of fraud are passed on to fraud investigators for evaluation, the investigators may need to access database records about other claims filed by the claimant as well as other claims in which the same doctors...
View Full Document

{[ snackBarMessage ]}

Ask a homework question - tutors are online