This preview shows page 1. Sign up to view the full content.
Unformatted text preview: rate when a random 10% of the population is treated, the response rate of a scored
10% of the population is over 30%. The lift is 3 in this case.
Per cent response 80
1 2 3 4 5 6 7 8 9 10 Decile Figure 11. Lift chart. Another important component of interpretation is to assess the value of the model. Again, a
pattern may be interesting, but acting on it may cost more than the revenue or savings it
generates. The ROI (Return on Investment) chart in Figure 12 is a good example of how
attaching values to a response and costs to a program can provide additional guidance to
decision making. (Here, ROI is defined as ratio of profit to cost.) Note that beyond the 8th
decile (80%), the ROI of the scored model becomes negative. It is at a maximum at the 2nd
decile (20%). 100.00%
80.00% ROI 60.00%
40.00% Random 20.00% Scored 0.00%
Decile Figure 12. ROI chart. © 1999 Two Crows Corporation 31 Alternatively you may want to look at the profitability of a model (profit = revenue minus
cost), as shown in the following chart (Figure 13). 300
100 Profit Random 50 Profit Scored (50) 1 2 3 4 5 6 7 8 9 10 (100)
(150) Figure 13. Profit Chart Note that in the example we’ve been using, the maximum lift (for the 10 deciles) was
achieved at the 1st decile (10%), the maximum ROI at the 2nd decile (20%), and the
maximum profit at the 3rd and 4th deciles.
Ideally, you can act on the results of a model in a profitable way. But remember, there may be
no practical means to take advantage of the knowledge gained.
b. External validation. As pointed out above, no matter how good the accuracy of a model is
estimated to be, there is no guarantee that it reflects the real world. A valid model is not
necessarily a correct model. One of the main reasons for this problem is that there are always
assumptions implicit in the model. For example, the inflation rate may not have been included
as a variable in a model that predicts the propensity of an individual to buy, but a jump in
inflation from 3% to 17% will certainly affect people’s behavior. Also, the data used to build
the model may fail to match the real world in some unknown way, leading to an incorrect
Therefore it is important to test a model in the real world. If a model is used to select a subset
of a mailing list, do a test mailing to verify the model. If a model is used to predict credit risk,
try the model on a small set of applicants before full deployment. The higher the risk
associated with an incorrect model, the more important it is to construct an experiment to
check the model results. 32 © 1999 Two Crows Corporation 7. Deploy the model and results. Once a data mining model is built and validated, it can be used in
one of two main ways. The first way is for an analyst to recommend actions based on simply
viewing the model and its results. For example, the analyst may look at the clusters the model has
identified, the rules that define the model, or the lift and ROI charts that depict the effect of the
The second way is to apply the model to different data sets. The model could be used to flag
records based on their classification, or assign a score such as the probability of an action (e.g.,
responding to a direct mail solicitation). Or the model can select some records from the database
and subject these to further analyses with an OLAP tool.
Often the models are part of a business process such as risk analysis, credit authorization or fraud
detection. In these cases the model is incorporated into an application. For instance, a predictive
model may be integrated into a mortgage loan application to aid a loan officer in evaluating the
applicant. Or a model might be embedded in an application such as an inventory ordering system
that automatically generates an order when the forecast inventory levels drop below a threshold.
The data mining model is often applied to one event or transaction at a time, such as scoring a
loan application for risk. The amount of time to process each new transaction, and the rate at
which new transactions arrive, will determine whether a parallelized algorithm is needed. Thus,
while loan applications can easily be evaluated on modest-sized computers, monitoring credit
card transactions or cellular telephone calls for fraud would require a parallel system to deal with
the high transaction rate.
When delivering a complex application, data mining is often only a small, albeit critical, part of
the final product. For example, knowledge discovered through data mining may be combined
with the knowledge of domain experts and applied to data in the database and incoming
transactions. In a fraud detection system, known patterns of fraud may be combined with
discovered patterns. When suspected cases of fraud are passed on to fraud investigators for
evaluation, the investigators may need to access database records about other claims filed by the
claimant as well as other claims in which the same doctors...
View Full Document
- Winter '08