

RELATED READINGS

Data Mining ’99: Technology Report, Two Crows Corporation, 1999
M. Berry and G. Linoff, Data Mining Techniques, John Wiley, 1997
William S. Cleveland, The Elements of Graphing Data, revised, Hobart Press, 1994
Howard Wainer, Visual Revelations, Copernicus, 1997
R. Kennedy, Lee, Reed, and Van Roy, Solving Pattern Recognition Problems, Prentice Hall, 1998
U. Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy, Advances in Knowledge Discovery and Data Mining, MIT Press, 1996
Dorian Pyle, Data Preparation for Data Mining, Morgan Kaufmann, 1999
C. Westphal and T. Blaxton, Data Mining Solutions, John Wiley, 1998
Vasant Dhar and Roger Stein, Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997
Breiman, Friedman, Olshen, and Stone, Classification and Regression Trees, Wadsworth, 1984
J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1992

Introduction to Data Mining and Knowledge Discovery

INTRODUCTION

Data mining: In brief

Databases today can range in size into the terabytes, more than 1,000,000,000,000 bytes of data. Within these masses of data lies hidden information of strategic importance. But when there are so many trees, how do you draw meaningful conclusions about the forest?

The newest answer is data mining, which is being used both to increase revenues and to reduce costs. The potential returns are enormous. Innovative organizations worldwide are already using data mining to locate and appeal to higher-value customers, to reconfigure their product offerings to increase sales, and to minimize losses due to error or fraud.

Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions.
The first and simplest analytical step in data mining is to describe the data: summarize its statistical attributes (such as means and standard deviations), visually review it using charts and graphs, and look for potentially meaningful links among variables (such as values that often occur together). As emphasized in the section on THE DATA MINING PROCESS, collecting, exploring and selecting the right data are critically important.

But data description alone cannot provide an action plan. You must build a predictive model based on patterns determined from known results, then test that model on results outside the original sample. A good model should never be confused with reality (you know a road map isn’t a perfect representation of the actual road), but it can be a useful guide to understanding your business.

The final step is to empirically verify the model. For example, from a database of customers who have already responded to a particular offer, you’ve built a model predicting which prospects are likeliest to respond to the same offer. Can you rely on this prediction? Send a mailing to a portion of the new list and see what results you get.

Data mining: What it can’t do

Data mining is a tool, not a magic wand. It won’t sit in your database watching what happens and send you e-mail to get your attention when it sees an interesting pattern. It doesn’t eliminate the need to know your business, to understand your data, or to understand analytical methods.

Data mining assists business analysts with finding patterns and relationships in the data; it does not tell you the value of the patterns to the organization. Furthermore, the patterns uncovered by data mining must be verified in the real world. Remember that the predictive relationships found via data mining are not necessarily causes of an action or behavior.
For example, data mining might determine that males with incomes between $50,000 and $65,000 who subscribe to certain magazines are likely purchasers of a product you want to sell. While you can take advantage of this pattern, say by aiming your marketing at people who fit the pattern, you should not assume that any of these factors cause them to buy your product.

© 1999 Two Crows Corporation

To ensure meaningful results, it’s vital that you understand your data. The quality of your output will often be sensitive to outliers (data values that are very different from the typical values in your database), irrelevant columns or columns that vary together (such as age and date of birth), the way you encode your data, and the data you leave in and the data you exclude. Algorithms vary in their sensitivity to such data issues, but it is unwise to depend on a data mining product to make all the right decisions on its own.

Data mining will not automatically discover solutions without guidance. Rather than setting the vague goal, “Help improve the response to my direct mail solicitation,” you might use data mining to find the characteristics of people who (1) respond to your solicitation, or (2) respond AND make a large purchase. The patterns data mining finds for those two goals may be very different.

Although a good data mining tool shelters you from the intricacies of statistical techniques, it requires you to understand th...
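
The data issues named above (outliers, and columns that vary together such as age and date of birth) can be checked before any modeling. The following is a minimal sketch in Python, not a method taken from the report. It assumes the common 1.5 x IQR rule for flagging outliers and a Pearson correlation cutoff of 0.95 for flagging redundant columns. The table `customers`, the function names, and both thresholds are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import mean, quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.95):
    """List column pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical customer table; the values are invented for illustration.
customers = {
    "age": [25, 32, 38, 41, 47, 53, 60, 66],
    "birth_year": [1974, 1967, 1961, 1958, 1952, 1946, 1939, 1933],
    "income": [52000, 58000, 48000, 61000, 55000, 64000, 900000, 67000],
}

print("income outliers:", iqr_outliers(customers["income"]))
print("columns that vary together:", redundant_pairs(customers))
```

A check like this only flags candidates. Whether a flagged income is a data-entry error or a genuine high-value customer, and which of two correlated columns to keep, remains a judgment that depends on knowing the business and the data.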

This note was uploaded on 01/19/2014 for the course STATS 315B taught by Professor Friedman during the Winter '08 term at Stanford.
