STA2006 2007-2008 Term 2
Chapter 1: Exploratory Data Analysis (HT Section 6.1)

Exploratory Data Analysis (EDA) is the first step of a professional statistical data analysis. It consists of (1) plotting graphs such as histograms and box plots and (2) computing simple summary statistics such as the mean, mode, median, upper and lower quartiles, maximum and minimum, standard deviation, range and interquartile range. EDA is essential because:

1. It helps to identify input or measurement errors in the data.
2. It shows some characteristics of the population distribution, for example skewness and symmetry. (This assumes there are sufficient data to manifest these characteristics of the population.)
3. It provides some representations of the data (the measures of central tendency).
4. It provides some measures of variation.

1 Graphical Techniques

1.1 Histograms and Bar Charts

For continuous data, let a and b be the minimum and maximum of the variable, respectively, let M be the number of classes and let h be the class width. Then

$$h = \frac{b - a}{M}$$

M can be chosen freely. A useful rule of thumb is $M = [\log_2(n)]$, where n is the total number of observations and [x] is the largest integer not exceeding x (also known as the floor of x).

For discrete data, the distribution of the variable can be presented using a bar chart if the number of levels is small. For a variable with a large number of levels, a histogram is more appropriate. In that case, one should take a to be the minimum of the variable minus 0.5 and b to be the maximum plus 0.5.

Note that the y-axis can show either the frequency or the relative frequency, which is defined as

$$\text{Relative Frequency} = \frac{\text{Frequency}}{n}$$

1.2 Stem-and-leaf Diagram

A stem-and-leaf diagram is a quick way of drawing a histogram when the number of observations is small, say, when n is less than 50. For example, the resale prices (in thousands of US dollars) of 25 condominium units are listed as follows....
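Returning to the histogram construction of Section 1.1, the class count, class width and relative frequencies can be computed in a few lines. The following is a minimal sketch and not part of the original notes: the function names `class_count` and `histogram_table` and the sample values are hypothetical, and it assumes classes of equal width h = (b - a)/M, with an observation equal to b placed in the last class.

```python
import math


def class_count(n):
    """Rule of thumb from Section 1.1: M = floor(log2(n)), with at least one class."""
    return max(1, math.floor(math.log2(n)))


def histogram_table(data, discrete=False):
    """Return (a, h, frequencies, relative frequencies) for equal-width classes.

    Assumes the data are not all identical when discrete=False,
    since the class width h would then be zero.
    """
    n = len(data)
    M = class_count(n)
    a, b = min(data), max(data)
    if discrete:
        # For discrete data, widen the range by 0.5 on each side.
        a, b = a - 0.5, b + 0.5
    h = (b - a) / M  # class width
    freq = [0] * M
    for x in data:
        # Index of the class containing x; x == b falls in the last class.
        k = min(int((x - a) / h), M - 1)
        freq[k] += 1
    rel_freq = [f / n for f in freq]  # relative frequency = frequency / n
    return a, h, freq, rel_freq


# Hypothetical sample of n = 16 observations, for illustration only.
sample = [2, 3, 3, 5, 7, 8, 8, 9, 10, 12, 13, 15, 15, 16, 18, 20]
a, h, freq, rel_freq = histogram_table(sample)
print("lower limit:", a, "class width:", h)
print("frequencies:", freq)
print("relative frequencies:", rel_freq)
```

For a data set of n = 25 observations, such as the condominium example, the same rule of thumb gives M = 4 classes, since 2^4 = 16 ≤ 25 < 32.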
 Spring '11
 Ho