1dmintro - 1 1 An Introduction to Data Mining Kurt...


An Introduction to Data Mining
Kurt Thearling, Ph.D.
www.thearling.com

Outline
- Overview of data mining
- What is data mining?
- Predictive models and data scoring
- Real-world issues
- Gentle discussion of the core algorithms and processes
- Commercial data mining software applications
- Who are the players?
- Review the leading data mining applications
- Presentation & Understanding
- Data visualization: More than eye candy
- Build trust in analytic results

Resources
- Good overview book:
  - Data Mining Techniques by Michael Berry and Gordon Linoff
- Web:
  - My web site (recommended books, useful links, white papers, …)
    > http://www.thearling.com
  - Knowledge Discovery Nuggets
    > http://www.kdnuggets.com
- DataMine Mailing List
  - [email protected]
  - send message “subscribe datamine-l”

A Problem...
- You are a marketing manager for a brokerage company
- Problem: Churn is too high
  > Turnover (after the six-month introductory period ends) is 40%
- Customers receive incentives (average cost: $160) when an account is opened
- Giving new incentives to everyone who might leave is very expensive (as well as wasteful)
- Bringing back a customer after they leave is both difficult and costly

A Solution
- One month before the introductory period ends, predict which customers will leave
- If you want to keep a customer who is predicted to churn, offer them something based on their predicted value
  > The ones that are not predicted to churn need no attention
- If you don’t want to keep the customer, do nothing
- How can you predict future behavior?
  - Tarot Cards
  - Magic 8 Ball
  - …

The Big Picture
- Lots of hype & misinformation about data mining out there
- Data mining is part of a much larger process
  - 10% of 10% of 10% of 10%
- Accuracy is not always the most important measure of data mining
- The data itself is critical
- Algorithms aren’t as important as some people think
- If you can’t understand the patterns discovered with data mining, you are unlikely to act on them (or convince others to act)

Defining Data Mining
- The automated extraction of predictive information from (large) databases
- Two key words:
  - Automated
  - Predictive
- Implicit is a statistical methodology
- Data mining lets you be proactive
  - Prospective rather than Retrospective

Goal of Data Mining
- Simplification and automation of the overall statistical process, from data source(s) to model application
- Changed over the years
  - Replace statistician -> Better models, less grunge work
  - 1 + 1 = 0
- Many different data mining algorithms / tools available
- Statistical expertise required to compare different techniques
- Build intelligence into the software

Data Mining Is…
- Decision Trees
- Nearest Neighbor Classification
- Neural Networks
- Rule Induction
- K-means Clustering
...
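The targeting rule from the churn scenario (attend only to customers predicted to churn, and only when they are worth keeping) can be sketched in a few lines of Python. This is a minimal illustration, not anything taken from the slides: the customer records, the churn scores, the 0.5 cutoff, and the choice to reuse the $160 average incentive as a flat retention offer are all assumptions made for the example.

```python
# Hypothetical illustration of the churn scenario. The customer records,
# churn scores, and the 0.5 threshold are made up for this sketch.
from dataclasses import dataclass

INCENTIVE_COST = 160.0   # average incentive cost quoted in the slides
CHURN_THRESHOLD = 0.5    # assumed cutoff on the model's churn score


@dataclass
class Customer:
    name: str
    churn_score: float      # assumed output of some predictive model, in [0, 1]
    predicted_value: float  # assumed future value of the customer, in dollars


def retention_offer(c: Customer) -> float:
    """Return the incentive budget to spend on this customer (0 means do nothing)."""
    if c.churn_score < CHURN_THRESHOLD:
        return 0.0  # not predicted to churn: needs no attention
    if c.predicted_value <= INCENTIVE_COST:
        return 0.0  # predicted to churn, but not worth the cost of keeping
    return INCENTIVE_COST  # predicted to churn and worth keeping


customers = [
    Customer("A", churn_score=0.9, predicted_value=1200.0),
    Customer("B", churn_score=0.8, predicted_value=50.0),
    Customer("C", churn_score=0.1, predicted_value=3000.0),
]

targeted_cost = sum(retention_offer(c) for c in customers)
blanket_cost = INCENTIVE_COST * len(customers)
print(f"targeted: ${targeted_cost:.0f}, blanket: ${blanket_cost:.0f}")
```

In this toy data only customer A receives an offer, so the targeted spend is a third of what a blanket incentive would cost. A real deployment would scale the offer to the predicted value rather than use a flat amount.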

This note was uploaded on 09/17/2009 for the course IT it771 taught by Professor Jenisha during the Fall '09 term at University of Advancing Technology.

