are saying something about the variability of the distribution.
The following histograms (Figure 10.3) illustrate the distinction between central tendency and
variability.
Figure 10.3: Histograms illustrating variability in central tendency and variability, Section 10.3. Panels (horizontal axis from 0 to 150 in each): Low Center, Low Variability; Low Center, High Variability; High Center, Low Variability; High Center, High Variability.
The same trends can be illustrated using density curves (Figure 10.4).
CHAPTER 10.
PROPERTIES OF DATA
140
Figure 10.4: Densities illustrating variability in central tendency and variability, Section 10.3. Panels (horizontal axis from 0 to 150 in each): Low Center, Low Variability; Low Center, High Variability; High Center, Low Variability; High Center, High Variability.
We saw in the last section a number of ways to numerically express central tendency using the mean, median and trimmed mean. The interquartile range (IQR) can be used as a measure of variability. But the most common measure of variability in data is the sample variance.
The formula for the (sample) variance of $X_1, X_2, \ldots, X_n$ is
$$S_n^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{n - 1}$$
where $\bar{X}_n$ is the sample mean (note that we have now added the subscript $n$ to the notation $\bar{X}$).
Intuitively, to calculate $S_n^2$ we take an observation $X_i$ and subtract the mean $\bar{X}_n$, giving the deviation of $X_i$ from $\bar{X}_n$. Note, however, that this can be positive or negative depending on whether $X_i$ is above or below $\bar{X}_n$. If we square the deviation, then the result will never be negative. We do this for each observation $X_i$, then sum the results. Finally we divide by $n - 1$ (we'll see why we don't divide by $n$ after some probability theory).
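The steps just described can be sketched in a few lines of Python. This is only an illustration: the function name `sample_variance` and the small data set are assumptions made here, not part of the text.

```python
def sample_variance(data):
    """Sample variance S_n^2, using the n - 1 divisor described above."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n                      # sample mean
    deviations = [x - mean for x in data]     # signed: may be negative
    squared = [d ** 2 for d in deviations]    # squaring removes the sign
    return sum(squared) / (n - 1)             # divide by n - 1, not n


# Hypothetical data: the mean is 4, the deviations are -2, 0, 2,
# the squared deviations sum to 8, and 8 / (3 - 1) = 4.
print(sample_variance([2, 4, 6]))
```

Python's standard library function `statistics.variance` also uses the $n - 1$ divisor, so it can serve as a cross-check.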
Example 10.1. Suppose we want to calculate the sample variance of the following data.
50.3   52.5   58.6   62.9   64.0
We first need to calculate the mean.
$$\bar{X}_5 = \frac{50.3 + 52.5 + 58.6 + 62.9 + 64.0}{5} = 57.66.$$
The calculations needed are done in the following table. Entries are rounded to two decimal places; the total is computed from the unrounded values.

i        $X_i$     $X_i - \bar{X}_5$     $(X_i - \bar{X}_5)^2$
1        50.3      -7.36                 54.17
2        52.5      -5.16                 26.63
3        58.6       0.94                  0.88
4        62.9       5.24                 27.46
5        64.0       6.34                 40.20
total                                    149.33

To complete the calculation we have from the table
$$\sum_{i=1}^{5} (X_i - \bar{X}_5)^2 = 149.33$$
so that, with $n = 5$,
$$S_n^2 = \frac{149.33}{5 - 1} = 37.3$$
which gives the variance.
∎
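As a check on the hand calculation, the following Python sketch rebuilds the columns of the table for the data of Example 10.1. The variable names are illustrative assumptions; small differences in the last printed digit can arise from rounding individual entries.

```python
data = [50.3, 52.5, 58.6, 62.9, 64.0]
n = len(data)
mean = sum(data) / n

# One row per observation: index, value, deviation, squared deviation.
rows = [(i, x, x - mean, (x - mean) ** 2) for i, x in enumerate(data, start=1)]
for i, x, dev, sq in rows:
    print(f"{i}  {x:5.1f}  {dev:6.2f}  {sq:6.2f}")

total = sum(sq for _, _, _, sq in rows)   # sum of squared deviations
variance = total / (n - 1)                # divide by n - 1
print(f"mean = {mean:.2f}, total = {total:.2f}, variance = {variance:.1f}")
```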
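There are various ways to calculate $S_n^2$, given by
\begin{align}
S_n^2 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{n - 1} \tag{10.1} \\
&= \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}_n^2}{n - 1} \tag{10.2} \\
&= \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1} \tag{10.3}
\end{align}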
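One way to see that the three forms agree is to expand the square in the numerator of (10.1). The sketch below uses only the definition of the sample mean, $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, which gives $\sum_{i=1}^{n} X_i = n\bar{X}_n$.

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X}_n)^2
  &= \sum_{i=1}^{n} X_i^2 - 2\bar{X}_n \sum_{i=1}^{n} X_i + n\bar{X}_n^2 \\
  &= \sum_{i=1}^{n} X_i^2 - 2n\bar{X}_n^2 + n\bar{X}_n^2 \\
  &= \sum_{i=1}^{n} X_i^2 - n\bar{X}_n^2 .
\end{align*}
```

Dividing by $n - 1$ gives (10.2). Substituting $n\bar{X}_n^2 = \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2$ then gives (10.3).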
We already used formula (10.1). Formulae (10.2) and (10.3) tend to be simpler to use. To see this
we’ll repeat the last calculation using (10.2).
Example 10.2. To repeat the calculation of the previous example using formula (10.2), we can use much the same technique, except that a step is eliminated whereby we subtract $\bar{X}_n$ from $X_i$.

i        $X_i$     $X_i^2$
1        50.3      2530.09
2        52.5      2756.25
3        58.6      3433.96
4        62.9      3956.41
5        64.0      4096.00
total              16772.71
From the table we have
$$\sum_{i=1}^{5} X_i^2 = 16772.71.$$
We already know $\bar{X}_5 = 57.66$ and that $n = 5$. Then, substituting into formula (10.2) gives
$$S_n^2 = \frac{16772.71 - 5 \times 57.66^2}{5 - 1} = 37.3$$
which is the same value previously obtained.
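The agreement between the formulas can also be checked numerically. The sketch below implements (10.1), (10.2) and (10.3) as three separate Python functions; the names `variance_10_1`, `variance_10_2` and `variance_10_3` are hypothetical labels chosen here to match the equation numbers.

```python
def variance_10_1(data):
    """Formula (10.1): sum of squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def variance_10_2(data):
    """Formula (10.2): sum of squares minus n times the squared mean."""
    n = len(data)
    mean = sum(data) / n
    return (sum(x ** 2 for x in data) - n * mean ** 2) / (n - 1)


def variance_10_3(data):
    """Formula (10.3): sum of squares minus (sum squared) / n."""
    n = len(data)
    total = sum(data)
    return (sum(x ** 2 for x in data) - total ** 2 / n) / (n - 1)


data = [50.3, 52.5, 58.6, 62.9, 64.0]
results = [f(data) for f in (variance_10_1, variance_10_2, variance_10_3)]
for value in results:
    print(f"{value:.1f}")
```

Formulas (10.2) and (10.3) save a step by hand, but on a computer they subtract two large, nearly equal numbers, which can lose precision in floating-point arithmetic when the mean is large relative to the spread. For that reason (10.1) is often the safer choice in code.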