of the predictors, as long as there are no discontinuous steps. For example, suppose that
payment delinquency is a rather complicated function of income where the probability of delinquency
initially declines as income increases. It then turns around and starts to increase again for moderate
income, finally peaking before coming down again for higher income card-holders. In such a case, a
linear model may fail to see any relationship between income and delinquency due to the non-linear
behavior. GAM, using computer power in place of theory or knowledge of the functional form, will
produce a smooth curve, summarizing the relationship as described above. The most common
estimation procedure is backfitting. Instead of estimating large numbers of parameters as neural nets
do, GAM goes one step further and estimates a value of the output for each value of the input — one
point, one estimate. As with the neural net, GAM generates a curve automatically, choosing the
amount of complexity based on the data.
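The backfitting idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the procedure used by any particular package: the running-mean smoother, the window size, the synthetic data and the names `running_mean_smooth` and `backfit` are all hypothetical choices made for this example. Each predictor gets its own curve, stored as one estimate per data point, and the curves are refitted in turn against the residual left by the others.

```python
import math
import random


def running_mean_smooth(x, r, window=15):
    """Smooth r against x by averaging each point's nearest neighbours in x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = window // 2
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        neighbours = order[max(0, rank - half):rank + half + 1]
        fitted[i] = sum(r[j] for j in neighbours) / len(neighbours)
    return fitted


def backfit(columns, y, n_iter=20, window=15):
    """Fit y ~ alpha + sum_j f_j(x_j) by cycling over the predictors."""
    n = len(y)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in columns]
    for _ in range(n_iter):
        for j, x in enumerate(columns):
            # Partial residual: what is left after removing every other term.
            partial = [
                y[i] - alpha - sum(f[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            smooth = running_mean_smooth(x, partial, window)
            centre = sum(smooth) / n
            f[j] = [v - centre for v in smooth]  # keep each f_j centred at zero
    return alpha, f


# Synthetic data (an assumption for illustration): the response rises and
# falls with "income" and is U-shaped in a second predictor.
random.seed(0)
n = 300
income = [random.random() for _ in range(n)]
other = [random.random() for _ in range(n)]
y = [math.sin(2 * math.pi * a) + 4 * (b - 0.5) ** 2 + random.gauss(0, 0.1)
     for a, b in zip(income, other)]

alpha, (f_income, f_other) = backfit([income, other], y)
fitted = [alpha + f_income[i] + f_other[i] for i in range(n)]
mse_gam = sum((y[i] - fitted[i]) ** 2 for i in range(n)) / n
mse_mean = sum((v - alpha) ** 2 for v in y) / n
print(f"mean-only MSE {mse_mean:.3f}, additive fit MSE {mse_gam:.3f}")
```

Plotting `f_income` against `income` would show the smooth, non-monotonic curve that a straight-line fit cannot capture. A real GAM implementation would typically use a better smoother and choose the amount of smoothing from the data rather than fixing a window.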
If you were to build a model using one sample of data, and then build a new model using the same
algorithm but on a different sample, you might get a different result. After validating the two models,
you could choose the one that best met your objectives. Even better results might be achieved if you
built several models and let them vote, making a prediction based on what the majority recommended.
Of course, any interpretability of the prediction would be lost, but the improved results might be worth it.
This is exactly the approach taken by boosting, a technique first published by Freund and Schapire in
1996. In outline, boosting takes multiple random samples from the data and builds a classification
model for each. The training set for each new model is changed based on the results of the previous
models, so that cases they misclassified receive more emphasis. The final classification is the class
assigned most often by the models (a weighted vote in the AdaBoost variant). The exact algorithms for boosting have
evolved from the original, but the underlying idea is the same.
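As a rough sketch of the idea, the following plain-Python example implements an AdaBoost-style loop with one-split "decision stump" models. It is an illustration under assumptions, not the algorithm of any specific package: it reweights the training cases instead of drawing new random samples, it combines the models by a weighted vote, and the data set and the names `fit_stump`, `adaboost` and `predict` are hypothetical.

```python
import math
import random


def stump_predict(x, threshold, polarity):
    """A one-split model: predicts `polarity` at or above the threshold."""
    return polarity if x >= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=20):
    """Build a list of (vote weight, threshold, polarity) stumps."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        vote = 0.5 * math.log((1 - err) / err)
        ensemble.append((vote, threshold, polarity))
        # Misclassified cases gain weight, so the next stump focuses on them.
        weights = [w * math.exp(-vote * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(vote * stump_predict(x, t, p) for vote, t, p in ensemble)
    return 1 if score >= 0 else -1


# Synthetic data (an assumption): the positive class is a band in the middle,
# which no single stump can separate.
random.seed(1)
xs = [random.random() for _ in range(200)]
ys = [1 if 0.3 < x < 0.7 else -1 for x in xs]

ensemble = adaboost(xs, ys)
accuracy = sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)
single_err = fit_stump(xs, ys, [1.0 / len(xs)] * len(xs))[0]
print(f"best single stump accuracy {1 - single_err:.2f}, boosted accuracy {accuracy:.2f}")
```

The comparison printed at the end shows the point of the technique: each stump alone is a weak model, but the vote of many stumps, each trained with emphasis on earlier mistakes, can fit a pattern none of them captures individually.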
Boosting has become a very popular addition to data mining packages.
© 1999 Two Crows Corporation
Genetic algorithms
Genetic algorithms are not used to find patterns per se, but rather to guide the learning process of data
mining algorithms such as neural nets. Essentially, genetic algorithms act as a method for performing
a guided search for good models in the solution space.
They are called genetic algorithms because they loosely follow the pattern of biological evolution in
which the members of one generation (of models) compete to pass on their characteristics to the next
generation (of models), until the best (model) is found. The information to be passed on is contained
in “chromosomes,” which contain the parameters for building the model.
For example, in building a neural net, genetic algorithms can replace backpropagation as a way to
adjust the weights. The chromosome in this case would contain the weights. Alternatively, genetic
algorithms might be used to find the best architecture, and the chromosomes would contain the
number of hidden layers and the number of nodes in each layer.
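The first case, using a genetic algorithm in place of backpropagation, might look like the sketch below. Everything specific here is an assumption made for illustration: the tiny 2-2-1 network, the XOR training set, the population size, the single-point crossover, the Gaussian mutation and the names `forward`, `error` and `evolve`. The chromosome is simply the list of nine weights, and fitness is the network's squared error.

```python
import math
import random

# Training cases for XOR: (inputs, target).
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
N_WEIGHTS = 9  # 2 hidden nodes x (2 weights + bias) + output node (2 weights + bias)


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def forward(ch, x):
    """Run the 2-2-1 network whose weights are stored in chromosome ch."""
    h0 = sigmoid(ch[0] * x[0] + ch[1] * x[1] + ch[2])
    h1 = sigmoid(ch[3] * x[0] + ch[4] * x[1] + ch[5])
    return sigmoid(ch[6] * h0 + ch[7] * h1 + ch[8])


def error(ch):
    """Sum of squared errors over the training cases (lower is fitter)."""
    return sum((forward(ch, x) - target) ** 2 for x, target in XOR)


def evolve(pop_size=60, generations=200, mutation_sd=0.5, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)] for _ in range(pop_size)]
    history = []  # best error seen in each generation
    for _ in range(generations):
        pop.sort(key=error)
        history.append(error(pop[0]))
        parents = pop[:pop_size // 4]  # the fittest quarter gets to reproduce
        children = [pop[0][:]]         # elitism: the best model survives unchanged
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, N_WEIGHTS)  # single-point crossover
            child = mom[:cut] + dad[cut:]
            # Each weight mutates with probability 0.2.
            child = [w + rng.gauss(0, mutation_sd) if rng.random() < 0.2 else w
                     for w in child]
            children.append(child)
        pop = children
    best = min(pop, key=error)
    history.append(error(best))
    return best, history


best, history = evolve()
print(f"best error: first generation {history[0]:.3f}, last generation {history[-1]:.3f}")
```

Every generation evaluates the whole population, which is where the computational overhead comes from: this run scores the network 60 times per generation, for 200 generations, to tune only nine weights. To search over architectures instead, the chromosome would hold the number of hidden layers and nodes rather than the weights.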
While genetic algorithms are an interesting approach to optimizing models, they add a lot of
computational overhead.
THE DATA MINING PROCESS
Process Models
Recognizing that a systematic approach is essential to successful data mining, many vendor and
consulting organizations have specified a process model designed to guide the user (especially
someone new to building predictive models) through a sequence of steps that will lead to good results.
SPSS uses the 5A’s — Assess, Access, Analyze, Act and Automate — and SAS uses SEMMA —
Sample, Explore, Modify, Model, Assess.
Recently, a consortium of vendors and users consisting of NCR Systems Engineering Copenhagen
(Denmark), Daimler-Benz AG (Germany), SPSS/Integral Solutions Ltd. (England) and OHRA
Verzekeringen en Bank Groep B.V. (The Netherlands) has been developing a specification called
CRISP-DM — Cross-Industry Standard Process for Data Mining. CRISP-DM is similar to process
models from other companies including the one from Two Crows Corporation. As of September 1999,
CRISP-DM is a work in progress. It is a good start in helping people to understand the necessary
steps in successful data mining.
The Two Crows Process Model
The Two Crows data mining process model described below is derived from the Two Crows process
model discussed in the previous edition of this document, and also takes advantage of some insights from CRISP-DM.
Keep in mind that while the steps appear in a list, the data mining process is not linear — you will
inevitably need to loop back to previous steps. For example, what you learn in the “explore data” step
may require you to add new data to the data mining database. The initial models you build may
provide insights that lead you to create new variables.
The basic steps of data mining for knowledge discovery are:
1. Define business problem
2. Build data mining database
3. Explore data
4. Prepare data for modeling
5. Build model
6. Evaluate model
7. Deploy model and res...
This note was uploaded on 01/19/2014 for the course STATS 315B taught by Professor Friedman during the Winter '08 term at Stanford.