Chap3_Exploration - Chapter 3 Data Exploration and...

Chapter 3 – Data Exploration and Dimension Reduction
Data Mining for Business Intelligence (Shmueli, Patel & Bruce)
© Galit Shmueli and Peter Bruce 2008

Exploring the data
Statistical summary of data: common metrics
- Average
- Median
- Minimum
- Maximum
- Standard deviation
- Counts & percentages
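The metrics listed above can be computed with Python's standard library. This is a minimal sketch, not the tool used in the slides; the `values` and `flags` lists are hypothetical illustration data, not Boston Housing records.

```python
import statistics
from collections import Counter

# Hypothetical numeric variable (illustration only, not Boston Housing data).
values = [21.0, 24.5, 18.2, 33.4, 27.1, 18.2]
# Hypothetical 0/1 categorical variable for the same six records.
flags = [0, 0, 1, 0, 1, 0]

summary = {
    "average": statistics.mean(values),
    "median": statistics.median(values),
    "minimum": min(values),
    "maximum": max(values),
    "std_dev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
}

# Counts and percentages summarize the categorical variable.
counts = Counter(flags)
percentages = {level: 100 * n / len(flags) for level, n in counts.items()}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
for level in sorted(counts):
    print(f"level {level}: count={counts[level]}, percent={percentages[level]:.1f}")
```

Whether to use the sample or the population standard deviation (`statistics.pstdev`) is a choice the slides do not specify.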
Summary Statistics – Boston Housing

Correlations Between Pairs of Variables: Correlation Matrix from Excel

           PTRATIO    B          LSTAT      MEDV
PTRATIO    1
B          -0.17738   1
LSTAT      0.374044   -0.36609   1
MEDV       -0.50779   0.333461   -0.73766   1
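Each entry in a matrix like the one above is a pairwise Pearson correlation coefficient. The sketch below shows one way to compute such a matrix by hand; the helper names `pearson` and `correlation_matrix` and the three short columns are hypothetical, chosen so the results are easy to check.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical columns (illustration only): y rises with x, z falls with x.
data = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(data)
for row in data:
    print(row, [round(matrix[row][col], 3) for col in data])
```

The matrix is symmetric with ones on the diagonal, which is why the slide shows only the lower triangle.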
Summarize Using Pivot Tables
Counts & percentages are useful for summarizing categorical data

Count of MEDV
CHAS          Total
0             471
1             35
Grand Total   506

Boston Housing example:
- 35 neighborhoods border the Charles River (1)
- 471 neighborhoods do not (0)

Pivot Tables, cont.
Averages are useful for summarizing grouped numerical data
In Boston Housing example: compare average home values in neighborhoods that border the Charles River (1) and those that do not (0)

Average of MEDV
CHAS          Total
0             22.09
1             28.44
Grand Total   22.53
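The grand total in the averages table is not the plain mean of the two group averages; it is their count-weighted average. A quick check, using only the counts and the rounded averages shown in the two pivot tables (the variable names are illustrative):

```python
# Group counts and group averages of MEDV by CHAS, as read from the pivot tables.
counts = {0: 471, 1: 35}
averages = {0: 22.09, 1: 28.44}

total_count = sum(counts.values())
# Weight each group average by the number of neighborhoods in that group.
grand_average = sum(counts[g] * averages[g] for g in counts) / total_count

print(total_count)              # 506
print(round(grand_average, 2))  # 22.53
```

Because the inputs are already rounded to two decimals, agreement with the table beyond two decimals should not be expected.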
Pivot Tables, cont.
Group by multiple criteria: by # rooms and location
E.g., neighborhoods on the Charles with 6-7 rooms have an average house value of 25.92 ($000)

Average of MEDV   CHAS
RM                0       1       Grand Total
3-4               25.30           25.30
4-5               16.02           16.02
5-6               17.13   22.22   17.49
6-7               21.77   25.92   22.02
7-8               35.96   44.07   36.92
8-9               45.70   35.95   44.20
Grand Total       22.09   28.44   22.53
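Grouping by two criteria amounts to binning the numeric variable and then averaging within each (bin, category) pair. The following is a rough sketch of that idea; the `records` list is hypothetical and the one-room-wide bins with a closed lower edge are an assumption about how the room ranges were formed.

```python
import math
from collections import defaultdict

# Hypothetical (rooms, on_river, value) records; not Boston Housing data.
records = [
    (5.4, 0, 17.0),
    (5.9, 1, 22.0),
    (6.2, 0, 21.0),
    (6.8, 0, 23.0),
    (6.5, 1, 26.0),
    (7.3, 1, 44.0),
]

# Collect values under a (room bin, river flag) key.
groups = defaultdict(list)
for rooms, on_river, value in records:
    low = math.floor(rooms)  # assumed binning: 6.0 <= rooms < 7.0 falls in "6-7"
    groups[(f"{low}-{low + 1}", on_river)].append(value)

# Average within each cell of the pivot table.
pivot = {key: sum(vals) / len(vals) for key, vals in groups.items()}

for key in sorted(pivot):
    print(key, pivot[key])
```

Combinations with no records simply have no key in `pivot`, which corresponds to a blank cell in a pivot table.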

Graphs
Histograms
Histogram shows the distribution of the outcome variable (median house value)
Boston Housing example:
[Figure: Histogram of MEDV. x-axis: MEDV, bins labeled 5 to 50 in steps of 5; y-axis: Frequency, 0 to 180]
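A histogram is a count of values per bin. The sketch below tabulates those counts and prints a text version of the bars; the `values` list is hypothetical, and the half-open bins `[low, low + 5)` are an assumption, since the slide does not say how bin edges are treated.

```python
from collections import Counter

# Hypothetical values (illustration only, not Boston Housing data).
values = [12.0, 18.5, 21.0, 22.4, 23.9, 24.1, 31.0, 50.0]
bin_width = 5

def bin_edges(value, width):
    """Return the (low, high) edges of the half-open bin containing value."""
    low = int(value // width) * width
    return (low, low + width)

# Frequency of values in each bin.
freq = Counter(bin_edges(v, bin_width) for v in values)

for (low, high), n in sorted(freq.items()):
    print(f"[{low:>2}, {high:>2}): {'#' * n}")
```

Bins with no values are omitted from the printout; a plotting library would draw them as empty bars.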

Boxplots
Side-by-side boxplots are useful for comparing subgroups
Boston Housing Example: display distribution of outcome variable (MEDV) for neighborhoods on Charles (1) and not on Charles (0)
[Figure: Box Plot of MEDV by CHAS. x-axis: CHAS (0, 1); y-axis: Y Values, 0 to 60]
Box Plot
[Figure: annotated box plot. Labels: outliers, “max”, Quartile 3, mean, Median, Quartile 1, “min”, outlier]
- Top outliers defined as those above Q3+1.5(Q3-Q1)
- “max” is the maximum of non-outliers
- Analogous definitions for bottom outliers and for “min”
- Details may differ across software
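The outlier rule above can be sketched in a few lines. The function name `boxplot_summary` and the data are hypothetical, and the quartiles use the "inclusive" method of `statistics.quantiles`, which is only one of several conventions; as the slide notes, details differ across software.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr  # top outliers lie above this
    lower_fence = q1 - 1.5 * iqr  # bottom outliers lie below this
    non_outliers = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.mean(values),
        "max": max(non_outliers),  # "max" = largest non-outlier
        "min": min(non_outliers),  # "min" = smallest non-outlier
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }

# Hypothetical data with one unusually large value.
values = [10, 12, 14, 15, 16, 18, 20, 22, 50]
result = boxplot_summary(values)
print(result)
```

Here Q1 = 14 and Q3 = 20, so the upper fence is 20 + 1.5 × 6 = 29; the value 50 is flagged as an outlier and the upper whisker ends at 22.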

Correlation Analysis
Below: correlation matrix for portion of Boston Housing data
Shows correlation between variable pairs

         CRIM       ZN         INDUS      CHAS       NOX        RM
CRIM     1
ZN       -0.20047   1
INDUS    0.406583   -0.53383   1
CHAS     -0.05589   -0.0427    0.062938   1
NOX      0.420972   -0.5166    0.763651   0.091203   1
RM       -0.21925   0.311991   -0.39168   0.091251   -0.30219   1
Matrix Plot
Shows scatterplots for variable pairs
Example: scatterplots for 3 Boston Housing variables

This note was uploaded on 11/09/2011 for the course MAR 08 taught by Professor Staff during the Spring '08 term at Youngstown State University.
