and lawyers were involved.

Model monitoring. You must, of course, measure how well your model has worked after you use it. However, even when you think you’re finished because your model works well, you must continually monitor the performance of the model. Over time, all systems evolve. Salespeople know that purchasing patterns change over time. External variables such as inflation rate may change enough to alter the way people behave. Thus, from time to time the model will have to be retested, retrained and possibly completely rebuilt. Charts of the residual differences between forecasted and observed values are an excellent way to monitor model results. Such charts are easy to use and understand, not computationally intensive, and could be built into the software that implements the model. Thus, the system could monitor itself.

© 1999 Two Crows Corporation

SELECTING DATA MINING PRODUCTS

Categories

In evaluating data mining tools you must look at a whole constellation of features, described below. You cannot put data mining tools into simple categories such as “high-end” versus “low-end” because the products are too rich in functionality to divide along just one dimension.

There are three main types of data mining products. First are tools that are analysis aids for OLAP. They help OLAP users identify the most important dimensions and segments on which they should focus attention. Leading tools in this category include Business Objects Business Miner and Cognos Scenario.

The next category includes the “pure” data mining products. These are horizontal tools aimed at data mining analysts concerned with solving a broad range of problems. Leading tools in this category include (in alphabetical order) IBM Intelligent Miner, Oracle Darwin, SAS Enterprise Miner, SGI MineSet, and SPSS Clementine.

The last category is analytic applications which implement specific business processes for which data mining is an integral part.
For example, while you can use a horizontal data mining tool as part of the solution of many customer relationship management problems, you can also buy customized packages with the data mining embedded. However, even packaged solutions require you to build and tune models that match your data. In some cases, the package requires a complete model development phase that can take months.

The following discussion of product selection applies both to horizontal tools and to the data mining component of analytic applications. But no matter how comprehensive the list of capabilities and features you develop for describing a data mining product, nothing substitutes for actual hands-on experience. While feature checklists are an essential part of the purchase decision, they can only rule out products that fall short of your requirements. Actually using a product in a pilot project is necessary to determine if it is the best match for your problem and your organization.

Basic capabilities

Depending on your particular circumstances (system architecture, staff resources, database size, problem complexity), some data mining products will be better suited than others to meet your needs. Evaluating a data mining product involves learning about its capabilities in a number of key areas (below) that may not be addressed in standard marketing materials. The Two Crows publication Data Mining ’99: Technology Report contains the responses of 24 vendors to detailed questionnaires that cover these topics in depth for 26 leading products.

System architecture. Is the product designed to work on a stand-alone desktop machine or in a client-server architecture? Note, however, that the size of the machine on which a product runs is not a reliable indicator of the complexity of problems it can address. Very sophisticated products that can solve complex problems and require skilled users may run on a desktop computer or on a large MPP system in a client-server architecture.

Data preparation.
Data preparation is by far the most time-consuming aspect of data mining. Everything a tool can do to ease this process will greatly expedite model development. Some of the functions that a product may provide include:

• Data cleanup, such as handling missing data or identifying integrity violations.
• Data description, such as row and value counts or distribution of values.
• Data transformations, such as adding new columns, performing calculations on existing columns, grouping continuous variables into ranges, or exploding categorical variables into dichotomous variables.
• Data sampling for model building or for the creation of training and validation data sets.
• Selecting predictors from the space of variables, and identifying collinear columns.

Data access. Some data mining tools require data to be extracted from target databases into an internal file format, whereas others will go directly into the native database. A data mining tool will benefit from being able to directly access the data mart DBMS using the native SQL of the database server, in order to maximize performance and take advantage of individual serv...
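
To make the data preparation functions listed above more concrete, here is a minimal sketch of three of them: grouping a continuous variable into ranges, exploding a categorical variable into dichotomous (0/1) variables, and sampling rows into training and validation sets. It is an illustration only and not taken from any product named in this document. The function names (bin_value, explode_categorical, split_rows), the range boundaries, and the 25% validation fraction are all assumptions chosen for the example.

```python
import random
from bisect import bisect_right


def bin_value(value, edges):
    """Group a continuous value into a range.

    `edges` is a sorted list of boundaries (hypothetical values).
    Returns 0 for values below the first edge, 1 for values from the
    first edge up to (not including) the second, and so on.
    """
    return bisect_right(edges, value)


def explode_categorical(values):
    """Explode one categorical column into dichotomous 0/1 columns,
    one new column per distinct category."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def split_rows(rows, validation_fraction=0.25, seed=0):
    """Randomly sample rows into (training, validation) lists.

    The fraction and seed are illustrative defaults, not recommendations.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


if __name__ == "__main__":
    ages = [25, 30, 45, 70]
    age_edges = [30, 50]  # assumed ranges: under 30, 30 to 49, 50 and over
    print([bin_value(a, age_edges) for a in ages])

    regions = ["east", "west", "east"]
    print(explode_categorical(regions))

    training, validation = split_rows(range(8))
    print(len(training), len(validation))
```

A tool that provides these functions would typically run them inside the product rather than in hand-written code; the sketch is only meant to show what each transformation does to the data.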