Ideally you would take all the variables you have


inal data preparation step before building models and the step where the most “art” comes in. There are four main parts to this step:

First, you want to select the variables on which to build the model. Ideally, you would take all the variables you have, feed them to the data mining tool, and let it find those which are the best predictors. In practice, this doesn’t work very well. One reason is that the time it takes to build a model increases with the number of variables. Another reason is that blindly including extraneous columns can lead to models with less rather than more predictive power.

The next step is to construct new predictors derived from the raw data. For example, forecasting credit risk using a debt-to-income ratio rather than just debt and income as predictor variables may yield more accurate results that are also easier to understand.

Next, you may decide to select a subset or sample of your data on which to build models. If you have a lot of data, using all of it may take too long or require buying a bigger computer than you would like. Working with a properly selected random sample usually results in no loss of information for most CRM problems. Given a choice of either investigating a few models built on all the data or investigating more models built on a sample, the latter approach will usually help you develop a more accurate and robust model.

Last, you will need to transform variables in accordance with the requirements of the algorithm you choose for building your model.

5. Data mining model building. The most important thing to remember about model building is that it is an iterative process. You will need to explore alternative models to find the one that is most useful in solving your business problem. What you learn in searching for a good model may lead you to go back and make some changes to the data you are using or even modify your problem statement.
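As a rough illustration of two of the data preparation steps described earlier (constructing a derived predictor and building on a random sample), here is a minimal Python sketch. The customer records, the field names `debt` and `income`, the helper names, and the 10% sampling fraction are all assumptions made up for the example; they do not come from the original text.

```python
import random

def add_debt_to_income(records):
    """Return copies of the records with a derived debt_to_income field."""
    derived = []
    for rec in records:
        row = dict(rec)
        # Guard against zero income; None marks the ratio as undefined.
        row["debt_to_income"] = rec["debt"] / rec["income"] if rec["income"] else None
        derived.append(row)
    return derived

def random_sample(records, fraction, seed=0):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)

# Hypothetical raw data: 1,000 customers with made-up debt and income values.
customers = [
    {"id": i, "debt": 1000.0 * (i % 7), "income": 20000.0 + 500.0 * i}
    for i in range(1, 1001)
]

prepared = add_debt_to_income(customers)
sample = random_sample(prepared, fraction=0.10)
print(len(sample), "of", len(prepared), "records selected for model building")
```

Fixing the seed makes it possible to rebuild several candidate models on exactly the same sample, which fits the iterative approach the text recommends.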
Most CRM applications are based on a protocol called supervised learning. You start with customer information for which the desired outcome is already known. For example, you may have historical data because you previously mailed to a list very similar to the one you are using. Or you may have to conduct a test mailing to determine how people will respond to an offer. You then split this data into two groups. On the first group you train or estimate your model. You then test it on the remainder of the data. A model is built when the cycle of training and testing is completed.

6. Evaluate your results. Perhaps the most overrated metric for evaluating your results is accuracy. Suppose you have an offer to which only 1% of the people will respond. A model that predicts “nobody will respond” is 99% accurate and 100% useless. Another measure that is frequently used is lift. Lift measures the improvement achieved by a predictive model. However, lift does not take into account cost and revenue, so it is often preferable to look at profit or ROI. Depending on whether you choose to maximize lift, profit, or ROI, you will choose a different percentage of yo...
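The contrast between accuracy and lift in step 6 can be made concrete with a small Python sketch. It assumes 1,000 customers with a 1% response rate, as in the example above. The model scores are invented for illustration, and lift is computed here as the response rate among the top-scored fraction divided by the overall response rate, which is one common definition and an assumption about what the text intends.

```python
def accuracy(actual, predicted):
    """Fraction of customers whose outcome was predicted correctly."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def lift_at(actual, scores, fraction):
    """Response rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(a for _, a in ranked[:k]) / k
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate

# 1,000 customers, 10 of whom respond (1%).
actual = [1] * 10 + [0] * 990

# A "model" that predicts nobody will respond.
nobody = [0] * 1000
naive_accuracy = accuracy(actual, nobody)

# Hypothetical scores: 8 responders and 92 non-responders score high,
# so the top 10% of the list contains 8 of the 10 responders.
scores = [0.9] * 8 + [0.1] * 2 + [0.8] * 92 + [0.2] * 898
top_decile_lift = lift_at(actual, scores, fraction=0.10)

print("accuracy of 'nobody responds':", naive_accuracy)
print("lift in the top 10% of scores:", top_decile_lift)
```

The naive model reaches 99% accuracy while identifying no responders at all, whereas the scored model concentrates most responders in the top tenth of the list. Neither number says anything about mailing cost or revenue per response, which is why the text suggests looking at profit or ROI.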

This note was uploaded on 11/25/2010 for the course CENG ceng taught by Professor Ceng during the Spring '10 term at Universidad Europea de Madrid.
