# Are saying something about the variability of the



are saying something about the variability of the distribution. The following histograms (Figure 10.3) illustrate the distinction between central tendency and variability.

[Figure 10.3: Histograms illustrating variability in central tendency and variability, Section 10.3. Panels: Low Center - Low Variability; Low Center - High Variability; High Center - Low Variability; High Center - High Variability. The horizontal axis runs from 0 to 150 in each panel.]

The same trends can be illustrated using density curves (Figure 10.4).
CHAPTER 10. PROPERTIES OF DATA

[Figure 10.4: Densities illustrating variability in central tendency and variability, Section 10.3. Panels: Low Center - Low Variability; Low Center - High Variability; High Center - Low Variability; High Center - High Variability. The horizontal axis runs from 0 to 150 in each panel.]

We saw in the last section a number of ways to express central tendency numerically: the mean, the median and the trimmed mean. The interquartile range (IQR) can be used as a measure of variability, but the most common measure of variability in data is the sample variance.

For observations $X_1, X_2, \ldots, X_n$, the formula for the (sample) variance is

$$S_n^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{n - 1}$$

where $\bar{X}_n$ is the sample mean (note that we have now added the subscript $n$ to the notation $\bar{X}$). Intuitively, to calculate $S_n^2$ we take an observation $X_i$ and subtract the mean $\bar{X}_n$, giving the signed distance between $X_i$ and $\bar{X}_n$. Note, however, that this can be positive or negative depending on whether $X_i$ is above or below $\bar{X}_n$. If we square the distance, then the result will never be negative. We do this for each observation $X_i$, then sum the results. Finally we divide by $n - 1$ (we’ll see why we don’t divide by $n$ after some probability theory).

Example 10.1. Suppose we want to calculate the sample variance of the following data.

50.3 52.5 58.6 62.9 64.0

We first need to calculate the mean.

$$\bar{X}_5 = \frac{50.3 + 52.5 + 58.6 + 62.9 + 64.0}{5} = 57.66.$$

The calculations needed are done in the following table.

| $i$ | $X_i$ | $X_i - \bar{X}_5$ | $(X_i - \bar{X}_5)^2$ |
|---|---|---|---|
| 1 | 50.3 | -7.36 | 54.1696 |
| 2 | 52.5 | -5.16 | 26.6256 |
| 3 | 58.6 | 0.94 | 0.8836 |
| 4 | 62.9 | 5.24 | 27.4576 |
| 5 | 64.0 | 6.34 | 40.1956 |
| total | | | 149.3320 |

To complete the calculation we have from the table

$$\sum_{i=1}^{5} (X_i - \bar{X}_5)^2 = 149.332$$

so that, with $n = 5$,

$$S_n^2 = \frac{149.332}{5 - 1} \approx 37.3$$

which gives the variance. There are various ways to calculate $S_n^2$, given by

$$
\begin{aligned}
S_n^2 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{n - 1} && (10.1) \\
&= \frac{\sum_{i=1}^{n} X_i^2 - n \bar{X}_n^2}{n - 1} && (10.2) \\
&= \frac{\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2}{n - 1} && (10.3)
\end{aligned}
$$

We already used formula (10.1). Formulae (10.2) and (10.3) tend to be simpler to use. To see this we’ll repeat the last calculation using (10.2).
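Before repeating the calculation, it may help to see why the three forms agree. The following is a brief sketch of the algebra, using only the definition of the sample mean, $\sum_{i=1}^{n} X_i = n \bar{X}_n$; it is added here as a supporting step and is not part of the original derivation.

$$
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X}_n)^2
&= \sum_{i=1}^{n} X_i^2 - 2 \bar{X}_n \sum_{i=1}^{n} X_i + n \bar{X}_n^2 \\
&= \sum_{i=1}^{n} X_i^2 - 2 n \bar{X}_n^2 + n \bar{X}_n^2 \\
&= \sum_{i=1}^{n} X_i^2 - n \bar{X}_n^2 .
\end{aligned}
$$

Dividing by $n - 1$ turns the numerator of (10.1) into that of (10.2). Writing $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ gives $n \bar{X}_n^2 = \frac{1}{n} \left( \sum_{i=1}^{n} X_i \right)^2$, which is the numerator of (10.3).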
Example 10.2. To repeat the calculation of the previous example using formula (10.2), we can use much the same technique, except that the step of subtracting $\bar{X}_n$ from each $X_i$ is eliminated.

| $i$ | $X_i$ | $X_i^2$ |
|---|---|---|
| 1 | 50.3 | 2530.09 |
| 2 | 52.5 | 2756.25 |
| 3 | 58.6 | 3433.96 |
| 4 | 62.9 | 3956.41 |
| 5 | 64.0 | 4096.00 |
| total | | 16772.71 |

From the table we have

$$\sum_{i=1}^{5} X_i^2 = 16772.71.$$

We already know that $\bar{X}_5 = 57.66$ and that $n = 5$. Then, substituting into formula (10.2) gives

$$S_n^2 = \frac{16772.71 - 5 \times 57.66^2}{5 - 1} \approx 37.3$$

which is the same value previously obtained.
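The two worked examples can also be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names `variance_definition`, `variance_shortcut` and `variance_sum_form` are illustrative choices corresponding to formulae (10.1), (10.2) and (10.3), and `statistics.variance` from the Python standard library is used only as an independent cross-check.

```python
import statistics


def variance_definition(xs):
    """Formula (10.1): sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Formula (10.2): (sum of squares - n * mean^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)


def variance_sum_form(xs):
    """Formula (10.3): (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


# Data from Examples 10.1 and 10.2.
data = [50.3, 52.5, 58.6, 62.9, 64.0]

for f in (variance_definition, variance_shortcut, variance_sum_form):
    print(f.__name__, round(f(data), 1))

# Independent check using the standard library's sample variance.
print("statistics.variance", round(statistics.variance(data), 1))
```

All three functions should agree with the hand calculation to the precision shown. One caveat worth keeping in mind: in floating-point arithmetic, (10.2) and (10.3) subtract two large, nearly equal numbers, so they can lose accuracy when the mean is large relative to the spread of the data; (10.1) is usually the safer choice in code even though it is more work by hand.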