... final data preparation step before building models
and the step where the most “art” comes in. There are four main parts to this step:
First you want to select the variables on which to build the model. Ideally, you would take all
the variables you have, feed them to the data mining tool and let it find those which are the
best predictors. In practice, this doesn’t work very well. One reason is that the time it takes to
build a model increases with the number of variables. Another reason is that blindly including
extraneous columns can lead to models with less rather than more predictive power.
The next step is to construct new predictors derived from the raw data. For example,
forecasting credit risk using a debt-to-income ratio rather than just debt and income as
predictor variables may yield more accurate results that are also easier to understand.
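As a minimal sketch of a derived predictor, the snippet below computes a debt-to-income ratio from raw debt and income values. The Applicant record, its field names, and the sample figures are hypothetical and chosen only for illustration; how you treat a zero or negative income is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Applicant:
    # Hypothetical record layout; the field names are illustrative only.
    debt: float
    income: float


def debt_to_income(applicant: Applicant) -> Optional[float]:
    """Return debt / income, or None when income is zero or negative."""
    if applicant.income <= 0:
        # The ratio is undefined here; leave it missing rather than guess.
        return None
    return applicant.debt / applicant.income


# Made-up applicants, including one with no reported income.
applicants = [
    Applicant(debt=12000.0, income=48000.0),
    Applicant(debt=30000.0, income=40000.0),
    Applicant(debt=5000.0, income=0.0),
]
ratios = [debt_to_income(a) for a in applicants]
```

The ratio would then be supplied to the modeling tool as one column, in place of or alongside the two raw columns.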
Next you may decide to select a subset or sample of your data on which to build models. If
you have a lot of data, using all of it may take too long or require buying a
bigger computer than you would like. Working with a properly selected random sample
usually results in no loss of information for most CRM problems. Given a choice of either
investigating a few models built on all the data or investigating more models built on a
sample, the latter approach will usually help you develop a more accurate and robust model.
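One way to draw such a sample is a simple random sample without replacement, sketched below. The function name draw_sample, the 5% fraction, and the fixed seed are assumptions made for the example; a "properly selected" sample for your problem may need stratification or other care that this sketch does not attempt.

```python
import random


def draw_sample(records, fraction, seed=0):
    """Return a simple random sample (without replacement) of the records."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    # Keep at least one record so a tiny fraction never yields an empty sample.
    k = max(1, round(len(records) * fraction))
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return rng.sample(records, k)


# Stand-in for 10,000 customer rows; real rows would be records, not integers.
customers = list(range(10_000))
sample = draw_sample(customers, 0.05)
```

Fixing the seed lets you rebuild the same sample later, which helps when comparing several models built on it.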
Last, you will need to transform variables in accordance with the requirements of the
algorithm you choose for building your model.
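Which transformations you need depends on the algorithm, and the text does not name any in particular. As an illustration only, the sketch below shows two common ones: rescaling a numeric variable to zero mean and unit variance, and turning a categorical variable into 0/1 indicator columns. The variable names and values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale numbers to zero mean and unit (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def one_hot(values):
    """Encode categories as 0/1 indicator columns, in sorted category order."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


# Made-up example columns.
incomes = [30000.0, 50000.0, 70000.0]
regions = ["north", "south", "north"]

scaled = standardize(incomes)
encoded, categories = one_hot(regions)
```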
5. Data mining model building. The most important thing to remember about model building
is that it is an iterative process. You will need to explore alternative models to find the one
that is most useful in solving your business problem. What you learn in searching for a good
model may lead you to go back and make some changes to the data you are using or even
modify your problem statement.
Most CRM applications are based on a protocol called supervised learning. You start with
customer information for which the desired outcome is already known. For example, you may
have historical data because you previously mailed to a list very similar to the one you are
using. Or you may have to conduct a test mailing to determine how people will respond to an
offer. You then split this data into two groups. On the first group you train or estimate your
model. You then test it on the remainder of the data. A model is built when the cycle of
training and testing is completed.
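The split into a training group and a test group can be sketched as follows. The 70/30 proportion, the fixed seed, and the made-up response history are assumptions for the example; the text does not prescribe a particular ratio.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical history of (customer_id, responded) pairs from an earlier mailing.
history = [(i, i % 10 == 0) for i in range(1000)]
train, test = train_test_split(history)
```

You would fit the model on train only and measure it on test, so that the evaluation uses customers the model has not seen.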
6. Evaluate your results
Perhaps the most overrated metric for evaluating your results is accuracy. Suppose you have
an offer to which only 1% of the people will respond. A model that predicts “nobody will
respond” is 99% accurate and 100% useless. Another measure that is frequently used is lift.
Lift measures the improvement achieved by a predictive model. However, lift does not take
into account cost and revenue, so it is often preferable to look at profit or ROI. Depending on
whether you choose to maximize lift, profit, or ROI, you will choose a different percentage of
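The accuracy example and the idea of lift can be worked through on a small made-up population. The sketch below assumes 1,000 prospects with a 1% response rate, and takes lift to mean the response rate among the top-scored fraction divided by the overall response rate, which is one common definition. The scores are hypothetical and were chosen so that 8 of the 10 responders land in the top 10%.

```python
def accuracy(actual, predicted):
    """Fraction of cases where the prediction matches the actual outcome."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def lift_at(actual, scores, fraction):
    """Response rate in the top-scored fraction divided by the overall rate."""
    n_top = max(1, round(len(actual) * fraction))
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


# 1,000 prospects, 10 of whom respond (1%).
actual = [1] * 10 + [0] * 990

# The "nobody will respond" model: right for the 990 non-responders only.
nobody = [0] * 1000
acc = accuracy(actual, nobody)

# Hypothetical scores: 8 responders and 92 non-responders fill the top 10%.
scores = [0.9] * 8 + [0.1] * 2 + [0.8] * 92 + [0.2] * 898
lift = lift_at(actual, scores, 0.10)
```

Here acc comes out at 0.99 even though the model finds no responders, while the scored model has a lift of about 8 in its top decile: mailing that decile reaches 8% responders against a 1% baseline.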