# Lecture 2 - Data Visualization

- Section 1 Introduction
- Section 2 Numerical Measurements for One Variable
  - Numerical Measures for Location Parameter
  - Numerical Measures for Scale Parameter
- Section 3 Graphical Methods for One Variable
  - Histogram
  - Box Plot
  - Density Plot
  - Symmetry Plot
  - Quantile Plot
  - Normal Quantile Plot
  - Univariate Procedure
- Section 4 Numerical Measurements for Two Variables
  - Pearson Product Moment Correlation
  - Spearman Rank-Order Correlation
  - Kendall's Tau-b Correlation Coefficient
  - Hoeffding Dependence Coefficient
- Section 5 Graphical Methods for Two Variables
  - Scatter Plot
  - Scatter Plot with Imposing Lines
  - Logit Plot
- Section 6 Graphical Methods for Multiple Variables
  - Contour Plot
  - Scatter Plot Matrices
  - Rotating Three dimension Plot
- Appendix 1 Robust Estimation
- Appendix 2 Ozone Data
- Appendix 3 Speed of Light Data
- Appendix 4 SAS Code to Calculate Quantile
- Appendix 5 SAS Code to Calculate Normal Quantile
- Appendix 6 SAS MACRO to Calculate Logit and Log(Logit)
- Appendix 7 References

## Section 1 Introduction

Data visualization is an important technique in statistical data mining. One can say that a well-designed graph says more than a million numbers can. Visuals can be used to communicate statistical findings to audiences without any statistical knowledge (Anderson Wallgren, Britt Wallgren, Rolf Persson, Ulf Jorner, and Jan-Aage Haaland, 1996). Visuals can also be used to discover information and knowledge hidden inside the data. In this lecture, we focus on how to use visuals to discover hidden information and knowledge.

However, we must emphasize that visualization methods are suggestive but not necessarily definitive. They give the data miner a sense of what the data set looks like without overwhelming the miner with massive tables of numbing numbers. It is possible to end up detecting an effect (or effects) that is actually nothing more than random noise. Professional statisticians tend to emphasize this risk when they warn against the use of sophisticated statistical methods by novices. This section draws attention to these issues as a warning to the user, not to discourage the use of data mining tools in general or visualization tools specifically. The following example illustrates the limitations of visualization methods.

Example 1 (Limitations of Visualization Methods)

The problem of characterizing random noise as a genuine effect is similar to the problem of overfitting in data mining. The following plot is based on twenty simulated random samples, each containing one hundred values drawn from a standard normal distribution with mean 0 and variance 1.
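The simulation behind Example 1 can be sketched roughly as follows. This is a minimal illustration in Python, not the lecture's own code (the appendices indicate the course uses SAS). The function names `simulate_samples` and `summarize` and the fixed seed are assumptions made here for illustration. The sketch draws twenty samples of one hundred standard normal values and prints each sample's mean and standard deviation, so the purely random variation between samples can be inspected.

```python
import random
import statistics


def simulate_samples(n_samples=20, n_values=100, seed=0):
    """Draw n_samples independent samples of n_values each from N(0, 1).

    The seed is an assumption added for reproducibility; it is not
    part of the original example.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0) for _ in range(n_values)]
        for _ in range(n_samples)
    ]


def summarize(samples):
    """Return a (mean, standard deviation) pair for each sample."""
    return [(statistics.mean(s), statistics.stdev(s)) for s in samples]


if __name__ == "__main__":
    summaries = summarize(simulate_samples())
    for i, (m, sd) in enumerate(summaries, start=1):
        print(f"sample {i:2d}: mean = {m:+.3f}, sd = {sd:.3f}")
    means = [m for m, _ in summaries]
    print(f"smallest mean = {min(means):+.3f}, largest mean = {max(means):+.3f}")
```

Every sample comes from the same distribution, so any difference between the printed means or standard deviations is random noise. Reading such differences as genuine effects is exactly the pitfall the example warns about.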