8/17/2012
Chapter 4 Dimension Reduction
Data Mining for Business
Intelligence
Shmueli, Patel & Bruce
Galit Shmueli and Peter Bruce 2010
Exploring the data
Statistical summary of data: common metrics
Average
Median
Minimum
Maximum
Standard deviation
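The metrics listed above can be computed with base R functions. The following is a minimal sketch; the vector x holds hypothetical values chosen only for illustration.

```r
# Hypothetical sample values, used only for illustration.
x <- c(12, 15, 11, 19, 15, 22, 14)

# One base R function per metric in the list above.
summary_stats <- c(
  average = mean(x),
  median  = median(x),
  minimum = min(x),
  maximum = max(x),
  sd      = sd(x)      # sample standard deviation (divides by n - 1)
)

print(round(summary_stats, 2))
```

For a data frame, summary() reports most of these per column in one call.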
K Nearest Neighbors
Supervised statistical learning
method
A simple classification method used to predict the class of a
categorical variable
Assign a new example to the class that is most common among its k
nearest neighbors in the training samples.
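The majority-vote rule can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the method's reference implementation: the function name knn_predict, the use of Euclidean distance, and the toy training data are all hypothetical choices.

```r
# Classify one new point by majority vote among its k nearest training points.
# Assumption: Euclidean distance; ties in the vote go to the first class
# in alphabetical order.
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  diffs   <- sweep(train_x, 2, new_x)      # subtract new_x from every row
  dists   <- sqrt(rowSums(diffs^2))        # distance to each training point
  nearest <- order(dists)[seq_len(k)]      # indices of the k closest points
  votes   <- table(train_y[nearest])       # class counts among the neighbors
  names(votes)[which.max(votes)]
}

# Hypothetical training samples: two predictors, two classes.
train_x <- matrix(c(1, 1,
                    1, 2,
                    2, 1,
                    6, 6,
                    7, 6,
                    6, 7), ncol = 2, byrow = TRUE)
train_y <- c("A", "A", "A", "B", "B", "B")

print(knn_predict(train_x, train_y, c(1.5, 1.5), k = 3))
print(knn_predict(train_x, train_y, c(6.5, 6.5), k = 3))
```

In practice the predictors are usually standardized first so that no single variable dominates the distance.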
S
Logistic Regression
Regression model where the dependent (output) variable is
categorical.
If a binary variable is a function of a continuous input variable,
logistic regression may be used to estimate the conditional
distribution u
Wage data Example
Set your working directory in RStudio to the directory that contains the file "wagedata.csv"
Read data from "wagedata.csv" into a dataframe called wagedf
wagedf = read.csv("wagedata.csv")
Inspect the structure of the data frame wagedf
s
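The read-and-inspect steps can be tried end to end with a stand-in file. This is a sketch under assumptions: the contents of "wagedata.csv" are not shown in this material, so the columns age and wage below are hypothetical, and str() and summary() are simply two common base R ways to inspect a data frame.

```r
# Stand-in file: the real "wagedata.csv" is not available here, so a small
# file with hypothetical columns (age, wage) is written to a temp folder.
csv_path <- file.path(tempdir(), "wagedata.csv")
write.csv(data.frame(age = c(25, 40, 33), wage = c(18.5, 31.0, 24.2)),
          csv_path, row.names = FALSE)

# With the real file, setwd() would point at the folder that contains it
# and read.csv("wagedata.csv") would be used instead of the full path.
wagedf <- read.csv(csv_path)

str(wagedf)             # column names, types, and first few values
print(summary(wagedf))  # per-column summary statistics
```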
Chapter 17 Smoothing Methods
Smoothing is data driven
Regression methods assume underlying
unchanging structure (linear, exponential,
polynomial)
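The contrast can be seen by putting a smoother next to a fitted trend. The sketch below assumes a centered moving average of width 3 as the smoother (the text above does not name a specific method) and uses a hypothetical series.

```r
# Hypothetical series, for illustration only.
y        <- c(10, 12, 11, 15, 14, 18, 17, 21, 20, 24)
time_idx <- seq_along(y)

# Data-driven: centered moving average of width 3. Each smoothed value
# depends only on nearby observations, so it adapts to local changes.
ma3 <- stats::filter(y, rep(1 / 3, 3), sides = 2)

# Model-driven: one linear structure imposed on the whole series.
trend <- fitted(lm(y ~ time_idx))

print(cbind(y, ma3 = as.numeric(ma3), trend = round(trend, 2)))
```

The first and last moving-average values are NA because a centered window of width 3 needs one neighbor on each side.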
Chapter 16 Regression Based Forecasting
Main ideas
Fit linear trend, time as predictor
Modify & use also for non-linear trends
Exponential
Polyn
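A linear trend and an exponential trend can both be fitted with lm() by using the time index as the predictor. The sketch below uses a hypothetical series; fitting the exponential trend as a linear model on log(y) is one common approach and is an assumption here.

```r
# Hypothetical series, for illustration only.
y        <- c(100, 108, 117, 125, 136, 147, 158, 171)
time_idx <- seq_along(y)

# Linear trend: y = b0 + b1 * time
linear_fit <- lm(y ~ time_idx)

# Exponential trend: log(y) = b0 + b1 * time, fitted on the log scale
exp_fit <- lm(log(y) ~ time_idx)

# Forecast the next two periods; back-transform the exponential forecasts.
future    <- data.frame(time_idx = 9:10)
linear_fc <- predict(linear_fit, newdata = future)
exp_fc    <- exp(predict(exp_fit, newdata = future))

print(rbind(linear = linear_fc, exponential = exp_fc))
```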
Chapter 15 Handling Time Series
Main ideas
Forecast future values of a time series
Distinction between forecasting (main focus) and
describing/exp
Chapter 14 Cluster Analysis
Clustering: The Main Idea
Goal: Form groups (clusters) of similar records
Used for segmenting markets into groups of
sim
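Forming groups of similar records can be sketched with base R. The example below assumes hierarchical clustering with average linkage on standardized variables, which is only one of several possible choices, and the records (income, spend) are hypothetical.

```r
# Hypothetical customer records, for illustration only.
records <- data.frame(
  income = c(20, 22, 21, 60, 62, 61),
  spend  = c(5, 6, 5, 30, 32, 31)
)

# Standardize so both variables contribute comparably to the distance.
scaled <- scale(records)

# Build the cluster tree from pairwise Euclidean distances,
# then cut it into two groups.
tree     <- hclust(dist(scaled), method = "average")
clusters <- cutree(tree, k = 2)

print(cbind(records, cluster = clusters))
```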
Chapter 13 Association Rules
What are Association Rules?
Study of what goes with what
Customers who bought X also bought Y
What symptoms go with
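A rule such as "customers who bought X also bought Y" is usually judged by its support and confidence. Those two measures are standard but are not defined in the text above, so treat their use here as an assumption. The transactions are hypothetical.

```r
# Hypothetical transactions, for illustration only.
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# TRUE for each transaction that contains the given item.
has <- function(item) {
  vapply(transactions, function(tr) item %in% tr, logical(1))
}

# Rule: bread -> milk
support_x  <- mean(has("bread"))                # share of baskets with bread
support_xy <- mean(has("bread") & has("milk"))  # share with bread and milk
confidence <- support_xy / support_x            # P(milk | bread)

print(c(support = support_xy, confidence = confidence))
```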
Decision Tree Inductive
Learning
Decision Trees
Given a set of training examples in the form of a set of attribute
values as inputs and classes as outputs, a tree is constructed such
that:
Each non-terminal node is an attribute.
Each arc from a node co
Artificial Neural
Networks
Feedforward Neural Network
[Figure: feedforward neural network with input nodes X1 and X2, a bias node (1), hidden nodes, and an output node producing the predicted output Y]
Weights associated with the connections
are iteratively adjusted during training
to decrease the prediction error.
Training stops
Chapter 11 Neural Nets
Basic Idea
Combine input information in a complex & flexible
neural net model
Model coefficients are continually tweaked in
Overview
Core Ideas in Data Mining
Classification
Prediction
Association Rules
Data Reduction
Data Exploration
Visualization
Super
9/14/2015
Chapter 3 Data Visualization
Graphs for Data Exploration
Basic Plots
Line Graphs
Bar Charts
Scatterplots
Distribution Plots
Boxplots
Histograms
Chapter 5 Evaluating Classification & Predictive Performance
Why Evaluate?
Multiple methods are available to classify or
predict
For each method,
4/6/2015
Chapter 6: Multiple Linear
Regression
Topics
Explanatory vs. predictive modeling with regression
Example: prices of Toyota Corollas
Fitting a pred
9/28/2015
Chapter 7 K-Nearest-Neighbor
Characteristics
Data-driven, not model-driven
Makes no assumptions about the data
Basic Idea
For a given re
6/17/2015
Bayesian Classifier
Naive Bayes Classifier
The Naive Bayes classifier is based on Bayes' theorem, named after Thomas Bayes (1702-1761).
Naive Bayes classifiers are a family of simple probabilistic classifiers base
Chapter 9 Classification and Regression Trees
Trees and Rules
Goal: Classify or predict an outcome based on a
set of predictors
The output is a set
Chapter 10 Logistic Regression
Logistic Regression
Extends idea of linear regression to situation
where outcome variable is categorical
Widely use
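A logistic regression with a categorical (here binary) outcome can be fitted in base R with glm(). The sketch below uses hypothetical data; the variable names hours and passed are illustrative only.

```r
# Hypothetical data: a continuous predictor and a binary (0/1) outcome.
hours  <- c(0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)
passed <- c(0,   0, 0,   0, 1,   0, 1,   1, 1,   1)

# family = binomial gives logistic regression (logit link by default).
fit <- glm(passed ~ hours, family = binomial)

# Predicted probabilities of the outcome being 1 for two new input values.
p <- predict(fit, newdata = data.frame(hours = c(1, 4)), type = "response")

print(coef(fit))
print(round(p, 3))
```

With type = "response" the predictions are probabilities between 0 and 1; a cutoff such as 0.5 would turn them into class labels.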
Introduction and Overview
Junping Sun
Data Mining
Data Mining:
Extracting useful information from large data sets. (Hand et al. 2001)
It is a process of non-trivial extraction of implicit, previously unknown, and
potentially useful infor