Data Visualization
Section 1 Introduction
Section 2 Numerical Measurements for One Variable
    Numerical Measures for Location Parameter
    Numerical Measures for Scale Parameter
Section 3 Graphical Methods for One Variable
    Histogram
    Box Plot
    Density Plot
    Symmetry Plot
    Quantile Plot
    Normal Quantile Plot
    Univariate Procedure
Section 4 Numerical Measurements for Two Variables
    Pearson Product Moment Correlation
    Spearman Rank-Order Correlation
    Kendall's Tau-b Correlation Coefficient
    Hoeffding Dependence Coefficient
Section 5 Graphical Methods for Two Variables
    Scatter Plot
    Scatter Plot with Imposing Lines
    Logit Plot
Section 6 Graphical Methods for Multiple Variables
    Contour Plot
    Scatter Plot Matrices
    Rotating Three dimension Plot
Appendix 1 Robust Estimation
Appendix 2 Ozone Data
Appendix 3 Speed of Light Data
Appendix 4 SAS Code to Calculate Quantile
Appendix 5 SAS Code to Calculate Normal Quantile
Appendix 6 SAS MACRO to Calculate Logit and Log(Logit)
Appendix 7 References

Section 1 Introduction
Data visualization is an important technique in statistical data mining. A well-designed graph can say more than a million numbers. Visuals can be used to communicate statistical findings to audiences without any statistical knowledge (Wallgren, Wallgren, Persson, Jorner, and Haaland, 1996). They can also be used to discover information and knowledge hidden inside the data.
In this lecture, we focus on how to use visuals to discover hidden information and knowledge. However, we must emphasize that visualization methods are suggestive but not necessarily definitive. They can give the data miner a sense of what the data set looks like without overwhelming the miner with massive tables of numbing numbers.
It is possible that we end up detecting an effect (or effects) that is actually nothing more than random noise. Professional statisticians tend to emphasize this risk when they warn against the use of sophisticated statistical methods by novices. This section draws attention to these issues as a warning to the user, not to discourage the use of data mining tools in general or visualization tools specifically. The following example illustrates the limitations of data visualization methods.
Example 1 (Limitations of Visualization Methods)
The problem of mistaking random noise for a genuine effect is similar to the problem of overfitting in data mining. The following plot is based on twenty simulated random samples, each containing one hundred values drawn from a standard normal distribution (mean 0 and variance 1).
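The simulation behind Example 1 can be approximated with a short sketch. This is not the code used to produce the original plot: it is written in Python rather than the SAS used elsewhere in these notes, and the function names, the fixed seed, and the choice to print summary statistics instead of drawing a plot are all assumptions made for illustration.

```python
import random
import statistics


def simulate_samples(n_samples=20, n_values=100, seed=0):
    """Draw n_samples independent samples of n_values standard normal values."""
    # The fixed seed is an assumption, used only to make the run repeatable.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_values)]
            for _ in range(n_samples)]


def summarize(samples):
    """Return a (mean, standard deviation) pair for each sample."""
    return [(statistics.fmean(s), statistics.stdev(s)) for s in samples]


samples = simulate_samples()
summary = summarize(samples)

# Every sample comes from the same N(0, 1) distribution, so any apparent
# difference between the samples below is random noise, not a real effect.
for i, (mean, sd) in enumerate(summary, start=1):
    print(f"sample {i:2d}: mean = {mean:+.3f}, sd = {sd:.3f}")

means = [m for m, _ in summary]
print(f"spread of sample means: {min(means):+.3f} to {max(means):+.3f}")
```

Because all twenty samples share one distribution, the spread in the printed means shows how much variation pure noise can produce. A reader comparing the samples visually could easily mistake that variation for a genuine difference between groups.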
