L06_220_six

Course: ECO 220, Fall 2009
School: Toledo


Describing Data: Measures of Variability & Relative Standing
Part 2 of 4
Lecture 6, Sections 4.2, 4.3

Measures of Variability
Summarize data variability with statistics:
- Range: the difference between the largest and smallest observation
  - How is the range a measure of variability?
  - How many observations in the data are used to calculate this sample statistic?
  - Is it sensitive to outliers?
- Variance
- Standard Deviation

Range
[Figure: two density histograms, n: 60 each]
- Which histogram shows data with a larger range?
- Approximately, what is the sample range for each?

Variance
Variance: the sum of the squared deviations from the mean divided by the degrees of freedom

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Which is a parameter? A statistic?

Breaking Down Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

- Numerator: total sum of squares (TSS)

  $$TSS = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

  - Obs. far from mean increase TSS a lot
  - TSS & $s^2$ if $x_i = 5$ for all $i$?
- Denominator: degrees of freedom (df, $\nu$), $\nu = n - 1$
  - Only $n - 1$ free obs. left after calculating the mean
  - Why n in denominator?

Standard Deviation
Standard deviation (s.d.): the square root of the variance

$$\sigma = \sqrt{\sigma^2} \qquad s = \sqrt{s^2}$$

- s.d. is reported more often than variance
- s.d. is measured in the same units as the original variable, whereas variance is measured in units squared
- Ex: In a sample of 450 people, income has variance of 100 million dollars-squared and s.d. of \$10,000

[Figure: four density histograms of X, each with n: 900, mean: 0.0, sd: 1.0]
- Panel 1: 67% within 1 s.d., 95% within 2 s.d., 100% within 3 s.d.
- Panel 2: 59% within 1 s.d., 100% within 2 s.d., 100% within 3 s.d.
- Panel 3: 80% within 1 s.d., 95% within 2 s.d., 98% within 3 s.d.
- Panel 4: 88% within 1 s.d., 95% within 2 s.d., 98% within 3 s.d.

Empirical Rule
If sample from a normal population then:
- About 68.3% of all obs. within 1 s.d. of mean
- About 95.4% of all obs. within 2 s.d. of mean
- About 99.7% of all obs. within 3 s.d. of mean
Empirical Rule only applies if normal
Why does the Empirical Rule say "about"?

[Figure: density histogram of X, n: 60, mean: 6.0, sd: 1.0; % of obs. in interval from (mean - #*sd) to (mean + #*sd): 63.3%, 93.3%, 100%]
- How are the percentages calculated?
- Does the Empirical Rule hold?
- In general does s.d. = bin width?

Chebysheff's Theorem
At least $1 - 1/k^2$ of obs. lie within k s.d.s of the mean for $k > 1$
- At least 75% of obs. lie within 2 s.d. of mean: $1 - 1/2^2 = 3/4$
- At least 89% of obs. lie within 3 s.d. of mean: $1 - 1/3^2 = 8/9$
- Can be applied to all samples no matter how population is distributed
- More general?
- What about within one s.d.?

Measures of Variability & Relative Standing
Summarize data variability with statistics:
- Range
- Variance & Standard Deviation
- Coefficient of Variation (HW: Study CV on your own)
Measures of relative standing also give a sense of the distribution of the data:
- Percentiles

Percentiles, Median & Interquartile Range
- Percentile: the Pth percentile is the value at the cut-off between the bottom P% & top (100 - P)% of obs.
  - Ex: 90th percentile of checkout time is 20 minutes
  - The 50th percentile is the median
- Interquartile range: 75th percentile minus 25th percentile
  - Measures spread of middle observations
  - Central tendency, variability, relative standing?
  - Sensitive to outliers?

Trips   Freq.   Percent     Cum.
    0     294     35.85    35.85
    1      76      9.27    45.12
    2      66      8.05    53.17
    3      58      7.07    60.24
    4      47      5.73    65.98
    5      47      5.73    71.71
    6      36      4.39    76.10
    7      30      3.66    79.76
    8      28      3.41    83.17
    9      15      1.83    85.00
   10       9      1.10    86.10
   11      16      1.95    88.05
   12      25      3.05    91.10
   13       9      1.10    92.20
   14       5      0.61    92.80
   15       9      1.10    93.90
   16       5      0.61    94.51
   17       6      0.73    95.24
   18       4      0.49    95.73
   19       1      0.12    95.85
   20       3      0.37    96.22
   21       2      0.24    96.46
   22       4      0.49    96.95
   23       1      0.12    97.07
   24       4      0.49    97.56
   25       2      0.24    97.80
   26       4      0.49    98.29
   27       2      0.24    98.54
   28       3      0.37    98.90
   30       1      0.12    99.02
   34       1      0.12    99.15
   35       1      0.12    99.27
   36       1      0.12    99.39
   41       1      0.12    99.51
   43       1      0.12    99.63
   44       1      0.12    99.76
   45       1      0.12    99.88
   50       1      0.12   100.00
Total     820    100.00

Reading STATA Output
What is the median?

. summarize Number_of_Trips, detail;
Number_of_Trip...
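One way to check a reading of the Trips frequency table is to recompute the percentiles directly from the counts. The sketch below is in Python rather than Stata, and the `percentile` helper is a hypothetical name that uses one simple convention (the smallest value whose cumulative count covers at least P% of the observations). Statistical packages may interpolate or average neighbouring values, so their output can differ for some data sets.

```python
from statistics import median

# Trips -> frequency, copied from the frequency table in the lecture.
freq = {
    0: 294, 1: 76, 2: 66, 3: 58, 4: 47, 5: 47, 6: 36, 7: 30, 8: 28, 9: 15,
    10: 9, 11: 16, 12: 25, 13: 9, 14: 5, 15: 9, 16: 5, 17: 6, 18: 4,
    19: 1, 20: 3, 21: 2, 22: 4, 23: 1, 24: 4, 25: 2, 26: 4, 27: 2, 28: 3,
    30: 1, 34: 1, 35: 1, 36: 1, 41: 1, 43: 1, 44: 1, 45: 1, 50: 1,
}
n = sum(freq.values())  # total number of observations


def percentile(p):
    """Smallest value whose cumulative count covers at least p% of obs.

    Assumed convention for illustration only; other conventions exist.
    """
    cum = 0
    for value in sorted(freq):
        cum += freq[value]
        if cum * 100 >= p * n:  # integer comparison avoids rounding issues
            return value
    raise ValueError("p must be between 0 and 100")


# Expand the table into one entry per observation to cross-check the median.
observations = [value for value, count in sorted(freq.items())
                for _ in range(count)]

iqr = percentile(75) - percentile(25)
print("n =", n)
print("median (50th percentile) =", percentile(50), median(observations))
print("25th, 75th percentiles =", percentile(25), percentile(75))
print("interquartile range =", iqr)
```

The cumulative-count loop mirrors how the "Cum." column is read by eye: scan down until the running total first reaches the target percentage.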

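The variance, standard deviation and Chebysheff's Theorem slides can be tried out on a small sample. The following is a minimal Python sketch, not part of the lecture: the function names and the `data` values are invented for illustration, and the sample formulas use the n - 1 denominator from the slides.

```python
import math


def sample_variance(xs):
    """s^2 = TSS / (n - 1), where TSS is the total sum of squares."""
    n = len(xs)
    mean = sum(xs) / n
    tss = sum((x - mean) ** 2 for x in xs)
    return tss / (n - 1)


def sample_sd(xs):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


def share_within(xs, k):
    """Fraction of observations within k s.d. of the sample mean."""
    mean = sum(xs) / len(xs)
    sd = sample_sd(xs)
    inside = [x for x in xs if abs(x - mean) <= k * sd]
    return len(inside) / len(xs)


# Hypothetical sample, invented for illustration (not from the lecture).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("range =", max(data) - min(data))
print("s^2 =", sample_variance(data))
print("s =", sample_sd(data))

# Compare the observed share within k s.d. to Chebysheff's lower bound.
for k in (2, 3):
    bound = 1 - 1 / k ** 2
    print(f"k={k}: observed {share_within(data, k):.3f}, bound {bound:.3f}")

# Units check from the income example: variance in dollars-squared,
# s.d. back in dollars.
print("sd of income =", math.sqrt(100_000_000))
```

Replacing `data` with a constant sample such as `[5, 5, 5, 5]` is a quick way to explore the slide's question about TSS and s^2 when every observation is equal.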
