7. Expectation, Averages, Variability
7.1 Summarizing Data on Random Variables
When we return midterm tests, someone almost always asks what the average was. While we could
list out all marks to give a picture of how students performed, this would be tedious. It would also
give more detail than could be immediately digested. If we summarize the results by telling a class
the average mark, students immediately get a sense of how well the class performed. For this reason,
“summary statistics” are often more helpful than giving full details of every outcome.
To illustrate some of the ideas involved, suppose we were to observe cars crossing a toll bridge and record the number, $X$, of people in each car. Suppose that in a small study[10] data on 25 cars were collected.
We could list out all 25 numbers observed, but a more helpful way of presenting the data would be in terms of the frequency distribution below, which gives the number of times (the “frequency”) each value of $X$ occurred.
X    Frequency
1    6
2    8
3    5
4    3
5    2
6    1
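The text does not list the 25 individual observations, so the sketch below rebuilds a hypothetical list `cars` that matches the frequency table (the counts come from the text; the order of the observations is an assumption) and tallies it back into a frequency distribution with `collections.Counter`. The helper name `frequency_distribution` is illustrative only.

```python
from collections import Counter

# Hypothetical raw data: 25 observations of X (people per car), built to match
# the frequency table in the text. The order of the observations is made up.
cars = [1] * 6 + [2] * 8 + [3] * 5 + [4] * 3 + [5] * 2 + [6] * 1

def frequency_distribution(outcomes):
    """Return (value, frequency) pairs, sorted by value."""
    counts = Counter(outcomes)
    return sorted(counts.items())

freq = frequency_distribution(cars)
for x, f in freq:
    print(x, f)
```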
We could also draw a frequency histogram of these frequencies:
Figure 7.1: Frequency Histogram (frequency of each value X = 1, ..., 6; vertical axis 0 to 8)

[10] "Study without desire spoils the memory, and it retains nothing that it takes in." Leonardo da Vinci

Frequency distributions or histograms are good summaries of data because they show the variability in the observed outcomes very clearly. Sometimes, however, we might prefer a single-number summary. The most common such summary is the average, or arithmetic mean, of the outcomes. The mean of $n$ outcomes $x_1, \dots, x_n$ for a random variable $X$ is $\sum_{i=1}^{n} x_i / n$, and is denoted by $\bar{x}$.
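The definition $\bar{x} = \sum_{i=1}^{n} x_i / n$ translates directly into code. The sketch below is a minimal illustration; `sample_mean` is a hypothetical helper name, and `cars` is the same assumed 25-car list rebuilt from the frequency table (observation order made up).

```python
def sample_mean(outcomes):
    """Arithmetic mean x-bar = (x_1 + ... + x_n) / n of a non-empty sample."""
    if not outcomes:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(outcomes) / len(outcomes)

# Hypothetical raw data matching the frequency table (order is arbitrary).
cars = [1] * 6 + [2] * 8 + [3] * 5 + [4] * 3 + [5] * 2 + [6] * 1
print(sample_mean(cars))  # 2.6
```

Python's standard library offers `statistics.mean` for the same calculation; the hand-written version simply mirrors the formula.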
The arithmetic mean for the example above can be calculated as

$$\bar{x} = \frac{(6 \times 1) + (8 \times 2) + (5 \times 3) + (3 \times 4) + (2 \times 5) + (1 \times 6)}{25} = \frac{65}{25} = 2.60$$
That is, there was an average of 2.6 persons per car. A set of observed outcomes $x_1, \dots, x_n$ for a random variable $X$ is termed a sample in probability and statistics. To reflect the fact that this is the average for a particular sample, we refer to it as the sample mean. Unless somebody deliberately “cooked” the study, we would not expect to get precisely the same sample mean if we repeated it another time. Note also that $\bar{x}$ is not in general an integer, even though $X$ is.
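The 2.6 figure was computed straight from the frequency table, weighting each value of $X$ by its frequency. A minimal sketch of that grouped calculation, using exact fractions so the result 65/25 is visible, follows; the dictionary `frequencies` and the function `mean_from_frequencies` are illustrative names, not from the text. The reduced fraction 13/5 has denominator 5, which confirms that $\bar{x}$ is not an integer even though every $x_i$ is.

```python
from fractions import Fraction

# Frequency table from the text: value of X -> number of cars observed.
frequencies = {1: 6, 2: 8, 3: 5, 4: 3, 5: 2, 6: 1}

def mean_from_frequencies(freq):
    """Mean of a sample given as {value: frequency}: sum(x * f) / sum(f)."""
    total = sum(x * f for x, f in freq.items())  # sum of all outcomes (65 here)
    n = sum(freq.values())                       # sample size (25 here)
    return Fraction(total, n)                    # exact; reduces automatically

xbar = mean_from_frequencies(frequencies)
print(xbar, float(xbar))  # 13/5 2.6
```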
Two other common summary statistics are the median and mode.
Definition 12 The median of a sample is a value such that half the results are below it and half above it, when the results are arranged in numerical order.
If these 25 results were written in order, the 13th outcome would be a 2, so the median is 2. By convention, if there is an even number of observations, we take the value halfway between the two middle values.
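The median definition, including the even-sample convention, can be sketched as follows. The function name `median` and the list `cars` (the assumed 25-car sample rebuilt from the frequency table) are illustrative. With $n = 25$ the middle value is the 13th, which sits at index 12 after sorting.

```python
def median(outcomes):
    """Middle value of the sorted sample, or the value halfway between
    the two middle values when the sample size is even."""
    ordered = sorted(outcomes)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # n = 25: the 13th value (index 12)
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle pair

# Hypothetical raw data matching the frequency table (order is arbitrary).
cars = [1] * 6 + [2] * 8 + [3] * 5 + [4] * 3 + [5] * 2 + [6] * 1
print(median(cars))          # 2
print(median([1, 2, 3, 4]))  # 2.5
```

Python's `statistics.median` follows the same halfway convention for samples of even size.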
Definition 13 The mode of the sample is the value which occurs most often. In this case the mode is 2.
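Counting occurrences gives the mode directly. The definition speaks of "the value", but a sample can have two or more values tied for the highest frequency; as a design choice not taken from the text, the sketch below returns all tied values. The function name `modes` is hypothetical, and `cars` is the same assumed 25-car list.

```python
from collections import Counter

def modes(outcomes):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(outcomes)
    if not counts:
        raise ValueError("the mode of an empty sample is undefined")
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)

# Hypothetical raw data matching the frequency table (order is arbitrary).
cars = [1] * 6 + [2] * 8 + [3] * 5 + [4] * 3 + [5] * 2 + [6] * 1
print(modes(cars))  # [2]
```

The standard library's `statistics.multimode` (Python 3.8 and later) behaves similarly, returning the tied values in order of first appearance rather than sorted.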
 Spring '07