1.2 Parameters and Statistics
A population parameter is a (typically unknown) numerical constant that describes the population of interest.
A sample statistic is a numerical summary of data that comes from the sample.
Typically, we use Greek symbols to denote population parameters and the "hat" symbol to denote the corresponding sample statistic.
For example, the population mean is denoted by $\mu$ and the sample mean is denoted by $\hat{\mu} = \bar{x}$.
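To make the distinction concrete, here is a minimal sketch that treats a short list as the entire population and compares its mean with the mean of a random sample. The population values, the sample size of 4, and the names `population`, `mu`, and `mu_hat` are assumptions chosen only for illustration; in practice the population is rarely fully observed, which is why the parameter is usually unknown.

```python
import random
import statistics

# Hypothetical finite population (values chosen only for illustration).
population = [12, 15, 9, 20, 18, 11, 14, 16, 13, 17]

# Population parameter: the mean over every member of the population.
mu = statistics.mean(population)

# Sample statistic: the mean of a random sample drawn from the population.
rng = random.Random(0)              # fixed seed so the draw is repeatable
sample = rng.sample(population, 4)  # sample of size n = 4, without replacement
mu_hat = statistics.mean(sample)    # this is x-bar, our estimate of mu

print("population mean (mu):", mu)
print("sample mean (mu_hat = x_bar):", mu_hat)
```

The parameter `mu` is a fixed constant, while `mu_hat` changes with the sample that happens to be drawn.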
1.3 Summary Statistics
Let $x_1, x_2, \ldots, x_n$ denote $n$ observations sampled from a population.
1.3.1 Measures of Location
• The mode of the data is the most frequently encountered value. Note that data can have multiple modes. Data with two modes are called bimodal and data with three are called trimodal.
• The mean of the data is calculated by taking the arithmetic average of the data. We use the equation
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
• The median of the data is found by sorting the data in increasing order and then choosing the observation that divides the data into two equal parts. If $n$ is odd, the index of the sorted data that accomplishes this is $(n + 1)/2$. If $n$ is even, the median is found by taking the average of the numbers in positions $n/2$ and $n/2 + 1$.
• The $p$th percentile is the value that divides the sorted data such that $p$% of the data are less than that value and $(100 - p)$% of the data are greater than it. To find this value, first sort the data. Then, compute the quantity $(p/100)(n + 1)$. If this value is an integer, then the data point in this position is the $p$th percentile. Otherwise, take the average of the data points in the positions immediately to the left and to the right of this number.
Note that the 50th percentile is the median. Other important percentiles are the 25th and 75th. Together, the 25th, 50th, and 75th percentiles make up the first, second, and third quartiles, respectively.
Note: Some statistical packages use different methods of calculating percentiles, such as using a weighted average instead of the simple average used above. Thus, results from computers may differ from the results you obtain by hand, but they should be close.
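The rules above can be sketched in Python as follows. This is a minimal illustration of the hand-calculation conventions in these notes, not how any particular statistical package computes them; the function names and the small data set are assumptions made for the example.

```python
import math
from collections import Counter

def modes(data):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def mean(data):
    """Arithmetic average: (1/n) * sum of the observations."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    xs = sorted(data)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]            # 1-based position (n + 1)/2
    return (xs[n // 2 - 1] + xs[n // 2]) / 2   # 1-based positions n/2 and n/2 + 1

def percentile(data, p):
    """p-th percentile using the (p/100)(n + 1) position rule from the notes."""
    xs = sorted(data)
    n = len(xs)
    pos = p * (n + 1) / 100                    # 1-based position, may be fractional
    if pos < 1 or pos > n:
        raise ValueError("position falls outside the data for this p and n")
    lower = math.floor(pos)
    if pos == lower:                           # integer position: take that data point
        return xs[lower - 1]
    return (xs[lower - 1] + xs[lower]) / 2     # average of the left and right neighbors

# Hypothetical data, chosen only for illustration.
data = [7, 3, 9, 3, 5, 9, 1]
print(modes(data), mean(data), median(data), percentile(data, 25))
```

With these conventions, `percentile(data, 50)` agrees with `median(data)` for both odd and even $n$.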
Example 1.3.1.
The following values of fracture stress (in megapascals) were measured for a sample of 24 mixtures of
hot mixed asphalt (HMA).
 30   75   79   80   80  105  126  138
149  179  179  191  223  232  232  236
240  242  245  247  254  274  384  470
1. Find the mode of the data.
2. Find the mean of the data.
3. Find the 25th percentile (first quartile) of the data.
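One way to check hand calculations for this example is the short sketch below. It applies the $(p/100)(n + 1)$ percentile rule from Section 1.3.1 directly; the variable names are assumptions made for illustration, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Fracture stress values (MPa) from Example 1.3.1.
stress = [30, 75, 79, 80, 80, 105, 126, 138,
          149, 179, 179, 191, 223, 232, 232, 236,
          240, 242, 245, 247, 254, 274, 384, 470]

n = len(stress)

# 1. Mode(s): every value tied for the highest frequency.
mode_values = statistics.multimode(stress)

# 2. Mean: sum of the observations divided by n.
x_bar = sum(stress) / n

# 3. 25th percentile via the (p/100)(n + 1) position rule.
xs = sorted(stress)
pos = 25 * (n + 1) / 100            # fractional here, so average the two neighbors
left = int(pos)                     # 1-based position just below pos
q1 = (xs[left - 1] + xs[left]) / 2

print("modes:", mode_values)
print("mean:", round(x_bar, 2))
print("first quartile:", q1)
```

Because packages may use a different percentile convention, a library routine could report a slightly different first quartile than `q1`.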
1.3.2 Measures of Spread
• The sample variance is a measure of spread based on the sum of squared deviations from the center (as measured by the mean), divided by $n - 1$:
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
An equivalent formula, which is often easier to use, is
$$s^2 = \frac{1}{n - 1} \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right)$$
• The sample standard deviation is simply the square root of the sample variance. That is,
$$s = \sqrt{s^2} = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
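As a quick check that the two variance formulas above agree, the following sketch computes both on a small data set and takes the square root to obtain the standard deviation. The function names and the data values are assumptions chosen for illustration.

```python
import math
import statistics

def sample_variance(data):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """Equivalent form: (sum of x_i^2 - n * x_bar^2) / (n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    return (sum(x * x for x in data) - n * x_bar ** 2) / (n - 1)

def sample_std(data):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 5, 7, 8]
print(sample_variance(data), sample_variance_shortcut(data), sample_std(data))
print(statistics.variance(data))  # standard-library value, also uses n - 1
```

The shortcut form is convenient by hand, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.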
Question: Why should we take the square root?