STAT 2513. Statistics for Biological and Health Sciences
§1.2 Numerical Summary for Univariate Variable
Peng Zeng (Auburn University, Summer 2019)
This section discusses how to summarize data with numbers. The two most important
summaries of data for a quantitative variable are those for the center and the spread.
Appropriate choices for these measures are discussed here, as well as the effects of a linear
transformation of the data on these measures. This section also explains the five-number
summary and its graphical representation, the boxplot. Finally, the percentile is explained
as a measure of relative standing.
Measures of Center
In a class of 15 students, the scores of exam 1 are:
55, 60, 65, 68, 72, 64, 83, 63, 66, 72, 95, 62, 83, 58, 72.
How would you describe the center of these numbers?
A typical approach is to describe the center of data by the mean. The word “mean”
is what we call the “average” in Statistics. Calculate the sample mean $\bar{x}$ by the formula
\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n},
\]
where $n$ denotes the sample size, or the number of individuals in a dataset. Here $n = 15$,
and $x_i$ denotes the score for the $i$th student. For the example, the mean is
\[
\bar{x} = \frac{55 + 60 + 65 + 68 + 72 + 64 + 83 + 63 + 66 + 72 + 95 + 62 + 83 + 58 + 72}{15} = 69.2.
\]
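The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the names `scores`, `sample_mean`, and `x_bar` are chosen here for illustration and do not come from the course notes.

```python
# Exam 1 scores for the 15 students, as listed in the text.
scores = [55, 60, 65, 68, 72, 64, 83, 63, 66, 72, 95, 62, 83, 58, 72]


def sample_mean(values):
    """Return the sum of the values divided by the sample size n."""
    n = len(values)  # sample size
    return sum(values) / n


x_bar = sample_mean(scores)
print(len(scores))  # 15
print(sum(scores))  # 1038
print(x_bar)        # 69.2
```

Python's standard library also provides `statistics.mean`, which should give the same value for these data.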
Another approach for describing the center is to calculate the median. The word
“median” roughly means “middle number”, that is, we sort the data and then pick the
number in the middle. Below are the data sorted in ascending order.
55, 58, 60, 62, 63, 64, 65, 66, 68, 72, 72, 72, 83, 83, 95.
The middle number is exactly the 8th score, because seven scores are smaller than it and
seven scores are larger than it. Hence the median is 66, which is the 8th score.
Suppose, for example, that there are only $n = 4$ scores. There is no single “middle number”.
In this case, we average the two middle numbers. For example, the sample median of 55, 58,
60, 62 is (58 + 60)/2 = 59. In summary, the sample median is the middle number, provided
there is one, or the average of the two middle numbers.
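The rule for the sample median, covering both an odd and an even sample size, can be sketched in Python as follows. The function name `sample_median` is an illustrative choice, not something defined in the notes.

```python
def sample_median(values):
    """Middle number of the sorted data, or the average of the two middle numbers."""
    ordered = sorted(values)  # sort in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: there is a single middle number.
        return ordered[mid]
    # Even sample size: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [55, 60, 65, 68, 72, 64, 83, 63, 66, 72, 95, 62, 83, 58, 72]
print(sample_median(scores))            # 66
print(sample_median([55, 58, 60, 62]))  # 59.0
```

With 15 scores the function returns the 8th sorted score, and with 4 scores it averages the 2nd and 3rd. The standard library function `statistics.median` follows the same convention.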
Care has to be taken when interpreting a measure of center. Average life expectancy, for
example, has substantially increased during the last two decades in the United States. This
poses problems for the long-term financing of Social Security. Based on this information one
might find an increase of minimum retirement age to be the obvious solution. It turns out,
however, that life expectancy has increased much more for wealthy people than for the poor.
Should the poor, who depend on Social Security, have to work longer because the wealthy,
who don’t need it, live longer? More generally, the fact that some procedure or decision is
best in the “typical” case does not necessarily make it the best choice in a particular case.
Measures of Spread
How spread apart is the data? One idea for measuring the spread of the data is to calculate
the average distance from its center. For the data:
55, 58, 60, 62, 63, 64, 65, 66, 68, 72, 72, 72, 83, 83, 95
the center, as measured by the sample mean, is 69.2. So we subtract 69.2 from each score
and obtain
-14.2, -11.2, -9.2, -7.2, -6.2, -5.2, -4.2, -3.2, -1.2, 2.8, 2.8, 2.8, 13.8, 13.8, 25.8.
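The subtraction step can be reproduced with a short Python sketch. The variable names are illustrative, and the rounding to one decimal place is an assumption made here only to hide floating-point noise in the printed values.

```python
# Exam scores sorted in ascending order, as in the text.
scores = [55, 58, 60, 62, 63, 64, 65, 66, 68, 72, 72, 72, 83, 83, 95]

# Center of the data, measured by the sample mean (69.2 for these scores).
x_bar = sum(scores) / len(scores)

# Deviation of each score from the mean, rounded to one decimal place.
deviations = [round(x - x_bar, 1) for x in scores]
print(deviations)
```

Scores below the mean give negative deviations and scores above it give positive ones.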