# sta 2006 0708_chapter_1 - STA2006 2007-2008 Term 2 Chapter...


STA2006 2007-2008 Term 2

## Chapter 1: Exploratory Data Analysis (HT Section 6.1)

Exploratory Data Analysis (EDA) is the first step of a professional statistical data analysis. It consists of (1) plotting graphs such as histograms and box plots and (2) computing simple summary statistics such as the mean, mode, median, upper and lower quartiles, maximum and minimum, standard deviation, range and inter-quartile range.

EDA is essential because:

1. It helps to identify input or measurement errors in the data.
2. It shows some characteristics of the population distribution, for example skewness and symmetry (assuming there are sufficient data to manifest these characteristics of the population).
3. It provides some representations of the data (the measures of central tendency).
4. It provides some measures of variation.

### 1 Graphical Techniques

#### 1.1 Histograms and Bar Charts

For continuous data, let $a$ and $b$ be the minimum and maximum of the variable, respectively, let $M$ be the number of classes and let $h$ be the class width. Then

$$h = \frac{b - a}{M}.$$

$M$ can be chosen freely. A useful rule of thumb is

$$M = [\log_2(n)],$$

where $n$ is the total number of observations and $[x]$ is the largest integer not exceeding $x$ (also known as the floor of $x$).

For discrete data, the distribution of the variable can be presented using a bar chart if the number of levels is small. For a variable with a large number of levels, a histogram is more appropriate. In that case, one should take $a$ to be the minimum of the variable minus 0.5 and $b$ to be the maximum plus 0.5.

Note that the y-axis can show either the frequency or the relative frequency, which is defined as

$$\text{Relative Frequency} = \frac{\text{Frequency}}{n}.$$

#### 1.2 Stem-and-leaf Diagram

A stem-and-leaf diagram is a quick way of drawing a histogram when the number of observations is small, say, when $n$ is less than 50. For example, the resale prices (in thousands of US dollars) of 25 condominium units are listed in the following....
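The class-width rule and the relative-frequency definition from Section 1.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `histogram_classes`, the `discrete` flag, and the sample data are hypothetical and do not come from the notes, and observations equal to the maximum are placed in the last class, a convention the notes do not specify.

```python
import math


def histogram_classes(data, num_classes=None, discrete=False):
    """Return (edges, frequencies, relative_frequencies) for equal-width classes.

    Hypothetical helper for illustration; not part of the course notes.
    """
    n = len(data)
    a, b = min(data), max(data)
    if discrete:
        # For discrete data the notes widen the range by 0.5 on each side.
        a, b = a - 0.5, b + 0.5
    if a == b:
        raise ValueError("all observations are equal; class width would be zero")
    if num_classes is None:
        # Rule of thumb from the notes: M = floor(log2(n)), kept at least 1.
        num_classes = max(1, math.floor(math.log2(n)))
    h = (b - a) / num_classes  # class width h = (b - a) / M

    frequencies = [0] * num_classes
    for x in data:
        # Assumed convention: the maximum value goes into the last class.
        k = min(int((x - a) / h), num_classes - 1)
        frequencies[k] += 1

    relative = [f / n for f in frequencies]  # Relative Frequency = Frequency / n
    edges = [a + i * h for i in range(num_classes + 1)]
    return edges, frequencies, relative


# Made-up sample data, used only to show the call.
sample = [1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0, 9.0]
edges, frequencies, relative = histogram_classes(sample)
print(edges, frequencies, relative)
```

With `discrete=True` the same function applies the half-unit widening described for discrete variables with many levels; passing `num_classes` explicitly overrides the rule of thumb, since the notes state that $M$ can be chosen freely.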