
# Class Notes 1


11-09-06 Class Notes

Data Set 1: {46, 49, 52, 57}

- Mean = (46 + 49 + 52 + 57)/4 = 51
- Standard deviation (s_x): s_x = 4.690

Data Set 2: {24, 34, 55, 91}

- Mean = 51
- Standard deviation = 29.631

Calculator keystrokes for Data Set 1:

- SHARP EL 531: MODE, STAT, SD, 46 M+, 49 M+, 52 M+, 57 M+, RCL (mean) gives 51, RCL s_x gives 4.690
- CASIO: MODE, MODE, SD, 46 M+, 49 M+, 52 M+, 57 M+, [2nd function] (upper left), S-var

## Exploratory Data Analysis (EDA)

John Tukey

### The Median

The median of a data set is found by putting the numbers into numerical order and finding:

i. The middle number if n is odd
ii. The average of the two middle numbers if n is even

The median is called the 50th percentile.

Tukey states that the median divides the data set into a lower half and a higher half. If n is odd, Tukey considers the median to belong to both the lower half and the upper half. Tukey's lower hinge is the median of the lower half, and the upper hinge is the median of the upper half.

- Lower hinge ≈ 25th percentile = Q1 = first quartile
- Upper hinge ≈ 75th percentile = Q3 = third quartile
- Tukey's h-spread = upper hinge - lower hinge ≈ interquartile range = Q3 - Q1

The interquartile range helps protect against outliers.

### Application (to data sets from worksheet)

Question 2 (stem-and-leaf notation)

- n = 45
- mean = 87.04444
- s_x = 18.764
- median (23rd number) = 85
- mean > median, which reflects the positive skew.

NB: These numbers are meaningful for large data sets but are relatively meaningless for small data sets.

- Lower hinge (12th number) = 74
- Upper hinge (12th number from end) = 99
- Tukey's h-spread = 99 - 74 = 25 ≈ interquartile range

### Tukey's Fences

- The lower inner fence is: lower hinge minus 1.5 * h-spread
- The upper inner fence is: upper hinge plus 1.5 * h-spread

NB: According to the professor, 1.5 is an arbitrary number, but Tukey claimed that, after spending 10 years in the computer lab, he found it to be a good number for his purposes.

A number in the data set is called a Tukey outlier if it is either greater than the upper inner fence or less than the lower inner fence.
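The mean, standard deviation, and hinge calculations above can be checked with a short Python sketch. It relies on the standard `statistics` module; the helper names `summarize` and `tukey_hinges` are illustrative and not from the notes. Data Sets 1 and 2 come from the notes. The hinges are computed for Data Set 1 only as a small demonstration, since the worksheet data for Question 2 is not reproduced here.

```python
import statistics

data_set_1 = [46, 49, 52, 57]
data_set_2 = [24, 34, 55, 91]


def summarize(data):
    """Return (mean, sample standard deviation) of a list of numbers."""
    # statistics.stdev divides by n - 1, matching the calculator's s_x.
    return statistics.mean(data), statistics.stdev(data)


def tukey_hinges(data):
    """Return (lower hinge, upper hinge) using Tukey's rule.

    When n is odd, the median belongs to both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2          # size of each half (median included if n is odd)
    lower_half = xs[:half]
    upper_half = xs[n - half:]
    return statistics.median(lower_half), statistics.median(upper_half)


mean_1, sd_1 = summarize(data_set_1)
mean_2, sd_2 = summarize(data_set_2)
print(f"Data Set 1: mean = {mean_1}, s_x = {sd_1:.3f}")   # mean = 51, s_x = 4.690
print(f"Data Set 2: mean = {mean_2}, s_x = {sd_2:.3f}")   # mean = 51, s_x = 29.631

lower_hinge, upper_hinge = tukey_hinges(data_set_1)
h_spread = upper_hinge - lower_hinge
print(f"Hinges: {lower_hinge}, {upper_hinge}; h-spread = {h_spread}")  # 47.5, 54.5; 7.0
```

Both data sets have the same mean, yet the standard deviations differ by a factor of about six, which is why a measure of spread is reported alongside the mean.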
For the data in Question 2:

- Lower inner fence = 74 - 1.5 * 25 = 36.5, so there are no low outliers.
- Upper inner fence = 99 + 1.5 * 25 = 136.5, so 138 is an outlier.

Potential exam question: why do statisticians care about finding outliers?

Outlying data are fairly often incorrect data. Even when they are correct, statisticians sometimes choose to exclude them from their analyses because they distort the results. For this reason, "median" income has replaced "average" income when a city's incomes are reported.

Range = highest number minus lowest number

= 138 - 51 = 87

### Tukey's Adjacent Values

- Tukey's lower adjacent value = lowest non-outlier in the data set
- Tukey's upper adjacent value = highest non-outlier in the data set

### Tukey's Five-Point Summary

The five-point summary:

- Lower adjacent value
- Lower hinge
- Median
- Upper hinge
- Upper adjacent value

### Box-and-Whisker Plot

See class handout page for example.

The boundaries of the box are the lower and upper hinges; the mark in the middle of the box is the median; the whiskers extend out to the upper and lower adjacent values; outliers are indicated by asterisks.
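The fences, outliers, adjacent values, and five-point summary described in these notes can be sketched in Python as follows. The function names `inner_fences` and `five_point_summary` and the list `sample` are hypothetical: the worksheet data is not available here, so the sample values are made up purely for illustration. The fence check at the end uses the hinges 74 and 99 reported for Question 2.

```python
import statistics


def inner_fences(lower_hinge, upper_hinge, k=1.5):
    """Return (lower inner fence, upper inner fence) for the given hinges."""
    h_spread = upper_hinge - lower_hinge
    return lower_hinge - k * h_spread, upper_hinge + k * h_spread


def five_point_summary(data, k=1.5):
    """Return Tukey's five-point summary and the list of Tukey outliers."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2                      # median joins both halves if n is odd
    lower_hinge = statistics.median(xs[:half])
    upper_hinge = statistics.median(xs[n - half:])
    low_fence, high_fence = inner_fences(lower_hinge, upper_hinge, k)

    outliers = [x for x in xs if x < low_fence or x > high_fence]
    inside = [x for x in xs if low_fence <= x <= high_fence]
    summary = (
        inside[0],                # lower adjacent value (lowest non-outlier)
        lower_hinge,
        statistics.median(xs),
        upper_hinge,
        inside[-1],               # upper adjacent value (highest non-outlier)
    )
    return summary, outliers


# Hypothetical data, not from the worksheet.
sample = [1, 10, 11, 12, 13, 14, 15, 16, 40]
summary, outliers = five_point_summary(sample)
print(summary)    # (10, 11, 13, 15, 16)
print(outliers)   # [1, 40]

# Fences for Question 2, using the hinges given in the notes.
print(inner_fences(74, 99))   # (36.5, 136.5)
```

In a box-and-whisker plot of `sample`, the box would span 11 to 15, the whiskers would reach 10 and 16, and 1 and 40 would be drawn as asterisks.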