Chap2_Overview - Overview © Galit Shmueli and Peter Bruce...

Overview
© Galit Shmueli and Peter Bruce 2008
Data Mining for Business Intelligence
Shmueli, Patel & Bruce

Core Ideas in Data Mining
- Classification
- Prediction
- Association Rules
- Data Reduction
- Data Exploration
- Visualization

Supervised Learning
- Goal: predict a single “target” or “outcome” variable
- Training data: cases where the target value is known
- Scoring: apply the model to data where the target value is not known
- Methods: Classification and Prediction

Unsupervised Learning
- Goal: segment data into meaningful segments; detect patterns
- There is no target (outcome) variable to predict or classify
- Methods: Association rules, data reduction & exploration, visualization

Supervised: Classification
- Goal: predict a categorical target (outcome) variable
- Examples: purchase/no purchase, fraud/no fraud, creditworthy/not creditworthy…
- Each row is a case (customer, tax return, applicant)
- Each column is a variable
- Target variable is often binary (yes/no)

Supervised: Prediction
- Goal: predict a numerical target (outcome) variable
- Examples: sales, revenue, performance
- As in classification:
  - Each row is a case (customer, tax return, applicant)
  - Each column is a variable
- Taken together, classification and prediction constitute “predictive analytics”

Unsupervised: Association Rules
- Goal: produce rules that define “what goes with what”
- Example: “If X was purchased, Y was also purchased”
- Rows are transactions
- Used in recommender systems: “Our records show you bought X, you may also like Y”
- Also called “affinity analysis”

Unsupervised: Data Reduction
- Distillation of complex/large data into simpler/smaller data
- Reducing the number of variables/columns (e.g., principal components)
- Reducing the number of records/rows (e.g., clustering)

Unsupervised: Data Visualization
- Graphs and plots of data
- Histograms, boxplots, bar charts, scatterplots
- Especially useful to examine relationships between pairs of variables

Data Exploration
- Data sets are typically large, complex & messy
- Need to review the data to help refine the task
- Use techniques of Reduction and Visualization

The Process of Data Mining

Steps in Data Mining
1. Define/understand purpose
2. Obtain data (may involve random sampling)
3. Explore, clean, pre-process data
4. Reduce the data; if supervised DM, partition it
5. Specify task (classification, clustering, etc.)
6. Choose the techniques (regression, CART, neural networks, etc.)
7. Iterative implementation and “tuning”
8. Assess results: compare models
9. Deploy best model

Obtaining Data: Sampling...
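
Steps 2 and 4 of the data mining process above mention random sampling and, for supervised tasks, partitioning the data. The slides do not say how the split is made, so the following is only a minimal sketch, assuming a simple random split into a training set and a validation set; the 60/40 proportion, the seed, and the customer records are hypothetical choices for illustration.

```python
import random


def partition(records, train_frac=0.6, seed=1):
    """Randomly split records into a training set and a validation set.

    train_frac=0.6 and seed=1 are illustrative defaults (assumptions),
    not values taken from the slides.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    rng = random.Random(seed)   # seeded so the split can be reproduced
    shuffled = list(records)    # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical records: (customer_id, purchased?)
    customers = [(i, i % 3 == 0) for i in range(10)]
    train, valid = partition(customers)
    print(len(train), len(valid))  # 6 4
```

Models are fit on the training set and then compared on the validation set (step 8), so the validation records play the role of "new" data whose target the model has not seen.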
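
The Supervised Learning and Classification slides describe learning from cases whose target is known and then scoring cases whose target is not known. The slides do not tie this to a particular method, so as one illustration this sketch swaps in a 1-nearest-neighbor rule: a new case receives the label of the closest training case by Euclidean distance. The features (income, debt), the labels, and the function name `score_1nn` are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical training cases: ((income, debt) in $000s, known label).
training_cases = [
    ((60, 5), "creditworthy"),
    ((75, 10), "creditworthy"),
    ((20, 30), "not creditworthy"),
    ((25, 40), "not creditworthy"),
]


def score_1nn(train, new_features):
    """Label a new case with the label of its nearest training case.

    Uses Euclidean distance (math.dist). A 1-nearest-neighbor rule is an
    illustrative choice, not a method stated in the slides.
    """
    if not train:
        raise ValueError("training data is empty")
    # min() compares only the distances; ties go to the earlier case.
    _, label = min(train, key=lambda case: math.dist(case[0], new_features))
    return label


if __name__ == "__main__":
    # Score new applicants whose labels are not known.
    for applicant in [(70, 8), (22, 35)]:
        print(applicant, score_1nn(training_cases, applicant))
```

Each tuple is one row (a case) and each position in the feature tuple is one column (a variable), matching the row/column layout described on the Classification slide.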
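
The Association Rules slide describes rules of the form "if X was purchased, Y was also purchased," with rows as transactions. The slides do not say how such rules are scored; this sketch assumes the two commonly used measures, support (how often X and Y appear together) and confidence (how often Y appears given X). The baskets, the thresholds, and the helper names `rule_stats` and `pair_rules` are hypothetical.

```python
from itertools import permutations


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule 'if antecedent, then consequent'.

    support    = fraction of all transactions containing both item sets
    confidence = fraction of transactions containing the antecedent that
                 also contain the consequent
    """
    a, c = set(antecedent), set(consequent)
    n_total = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_both = sum(1 for t in transactions if (a | c) <= set(t))
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """List single-item 'if X then Y' rules meeting both (hypothetical) thresholds."""
    items = sorted({item for t in transactions for item in t})
    rules = []
    for x, y in permutations(items, 2):
        support, confidence = rule_stats(transactions, [x], [y])
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Hypothetical transactions: each row is one shopping basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

if __name__ == "__main__":
    for x, y, s, c in pair_rules(baskets):
        print(f"if {x} then {y}: support={s:.2f}, confidence={c:.2f}")
```

This brute-force version only checks one-item rules and scans every transaction per rule; it is meant to show what the measures compute, not to scale to large transaction databases.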

This note was uploaded on 11/09/2011 for the course MAR 08 taught by Professor Staff during the Spring '08 term at Youngstown State University.
