and lawyers were involved.
Model monitoring. You must, of course, measure how well your model performs once it is in use.
However, even when the model works well and you think you are finished, you must continue
to monitor its performance. Over time, all systems evolve. Salespeople
know that purchasing patterns change over time. External variables such as inflation rate may
change enough to alter the way people behave. Thus, from time to time the model will have to be
retested, retrained and possibly completely rebuilt. Charts of the residual differences between
forecasted and observed values are an excellent way to monitor model results. Such charts are
easy to use and understand, not computationally intensive, and could be built into the software
that implements the model. Thus, the system could monitor itself.

© 1999 Two Crows Corporation

SELECTING DATA MINING PRODUCTS
In evaluating data mining tools you must look at a whole constellation of features, described below.
You cannot put data mining tools into simple categories such as “high-end” versus “low-end” because
the products are too rich in functionality to divide along just one dimension.
There are three main types of data mining products. First are tools that are analysis aids for OLAP.
They help OLAP users identify the most important dimensions and segments on which they should
focus attention. Leading tools in this category include Business Objects Business Miner and Cognos
The next category includes the “pure” data mining products. These are horizontal tools aimed at data
mining analysts concerned with solving a broad range of problems. Leading tools in this category
include (in alphabetical order) IBM Intelligent Miner, Oracle Darwin, SAS Enterprise Miner, SGI
MineSet, and SPSS Clementine.
The last category is analytic applications which implement specific business processes for which data
mining is an integral part. For example, while you can use a horizontal data mining tool as part of the
solution of many customer relationship management problems, you can also buy customized
packages with the data mining embedded. However, even packaged solutions require you to build and
tune models that match your data. In some cases, the package requires a complete model development
phase that can take months.
The following discussion of product selection applies both to horizontal tools and to the data mining
component of analytic applications. But no matter how comprehensive the list of capabilities and
features you develop for describing a data mining product, nothing substitutes for actual hands-on
experience. While feature checklists are an essential part of the purchase decision, they can only rule
out products that fall short of your requirements. Actually using a product in a pilot project is
necessary to determine if it is the best match for your problem and your organization.
Depending on your particular circumstances — system architecture, staff resources, database size,
problem complexity — some data mining products will be better suited than others to meet your
needs. Evaluating a data mining product involves learning about its capabilities in a number of key
areas (below) that may not be addressed in standard marketing materials. The Two Crows publication
Data Mining ’99: Technology Report contains the responses of 24 vendors to detailed questionnaires
that cover these topics in depth for 26 leading products.
System architecture. Is the product designed to work on a stand-alone desktop machine or in a
client-server architecture? Note, however, that the size of the machine on which a product runs is not a reliable
indicator of the complexity of problems it can address. Very sophisticated products that can solve
complex problems and require skilled users may run on a desktop computer or on a large MPP
system in a client-server architecture.
Data preparation. Data preparation is by far the most time-consuming aspect of data mining.
Everything a tool can do to ease this process will greatly expedite model development. Some of
the functions that a product may provide include:
• Data cleanup, such as handling missing data or identifying integrity violations.
• Data description, such as row and value counts or distribution of values.
• Data transformations, such as adding new columns, performing calculations on existing
columns, grouping continuous variables into ranges, or exploding categorical variables into
• Data sampling for model building or for the creation of training and validation data sets.
• Selecting predictors from the space of variables, and identifying collinear columns.
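
The data preparation functions listed above can be illustrated with a minimal sketch. It assumes the data is held as a list of Python dictionaries, one per row, and every function name here is hypothetical, not taken from any product discussed in this report. A real tool would do the same work at far larger scale, and a Pearson correlation threshold is only one simple way to flag collinear columns.

```python
import math
import random
from collections import Counter


def count_missing(rows, column):
    """Data cleanup: count rows whose value for `column` is missing (None)."""
    return sum(1 for row in rows if row.get(column) is None)


def describe(rows, column):
    """Data description: row count and value counts for one column."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return {"rows": len(rows), "value_counts": Counter(values)}


def bin_value(x, upper_edges):
    """Data transformation: map a continuous value to a range index.

    `upper_edges` are ascending upper bounds; a value above the last
    edge falls into the final, open-ended range.
    """
    for index, edge in enumerate(upper_edges):
        if x <= edge:
            return index
    return len(upper_edges)


def split_train_validation(rows, validation_fraction=0.3, seed=0):
    """Data sampling: shuffle a copy of the rows and split it in two."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / (spread_x * spread_y)


def collinear_pairs(columns, threshold=0.95):
    """Predictor selection aid: list column pairs with |correlation| >= threshold.

    `columns` maps a column name to its list of numeric values.
    """
    names = sorted(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(columns[first], columns[second])) >= threshold:
                pairs.append((first, second))
    return pairs
```

In this sketch, a price column stored in both dollars and cents would be reported by collinear_pairs as a redundant pair, so one of the two could be dropped before model building.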
Data access. Some data mining tools require data to be extracted from target databases into an
internal file format, whereas others work directly against the native database. A data mining tool
will benefit from being able to directly access the data mart DBMS using the native SQL of the
database server, in order to maximize performance and take advantage of individual serv...