‐ This is true even if the word “average” is not in the question (e.g., “Do college graduates earn more money than non‐college graduates?”)
• This is also true for many relational RQs
‐ E.g., is there an association between mother’s education level and students’ academic performance? Or does smoking cause lung cancer?
Most of our RQs are about the “average” attitude/behavior and the average effect

10/04/2019
Typically, we measure “average” attitude/behavior with the mean
1. Mean
• The sum total of observed values divided by the number of observations
• Normally, the “mean value” of your sample gives you a sense of what the sample is like
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
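The definition above (the sum of the observed values divided by the number of observations) can be sketched in a few lines of Python. The function name `sample_mean` and the three input values are illustrative assumptions, not part of the slides.

```python
def sample_mean(values):
    """Return the sum of the observed values divided by their count."""
    if len(values) == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)


# Hypothetical observations, chosen only to show the arithmetic.
observations = [2, 4, 9]
result = sample_mean(observations)  # (2 + 4 + 9) / 3
print(result)  # 5.0
```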
But sometimes the mean gives you a poor sense of the “average” value
The “Bill Gates walks into a bar” example
Name      Income
Tom       $32,000
Larry     $36,000
Susan     $39,000
Paul      $41,000
Marcus    $45,000
Randy     $50,000
Sandy     $57,000
Tim       $60,000
Pam       $65,000
Jack      $80,000
What is the average (mean) annual income for the group? $50,500
Name          Income
Tom           $32,000
Larry         $36,000
Susan         $39,000
Paul          $41,000
Marcus        $45,000
Randy         $50,000
Sandy         $57,000
Tim           $60,000
Pam           $65,000
Jack          $80,000
Bill Gates    $1,000,000,000
What is the new average (mean) income for the group? About $91 million ($1,000,505,000 / 11 = $90,955,000)
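The effect of one extreme value on the mean can be checked directly. This is a minimal sketch using the incomes listed in the bar example; the variable names are illustrative. Working through the arithmetic, the mean moves from $50,500 to $90,955,000 once the eleventh person joins the group.

```python
from statistics import mean

# Annual incomes from the bar example, in dollars.
incomes = {
    "Tom": 32_000, "Larry": 36_000, "Susan": 39_000, "Paul": 41_000,
    "Marcus": 45_000, "Randy": 50_000, "Sandy": 57_000, "Tim": 60_000,
    "Pam": 65_000, "Jack": 80_000,
}

mean_before = mean(list(incomes.values()))

# Add the newcomer with the income figure used on the slide.
with_gates = {**incomes, "Bill Gates": 1_000_000_000}
mean_after = mean(list(with_gates.values()))

print(f"Mean before: ${mean_before:,.0f}")
print(f"Mean after:  ${mean_after:,.0f}")
```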
Hence we need an alternative measure: the median comes to the rescue
Median
• Middle value when the data are arranged in ascending or descending order
• With an even number of data points: the average of the two middle values
• Also called the 50th percentile
*Another situation where you can use the median but not the mean is when you work with rank‐order data (ordinal‐scaled data such as income groups)
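The two cases in the definition (odd and even number of data points) can be sketched as follows. The function below is a hand-written illustration, and the input values are hypothetical; Python's standard library offers `statistics.median` for real use.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    if len(values) == 0:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 4]))     # odd count: the middle value, 4
print(median([7, 1, 4, 2]))  # even count: (2 + 4) / 2 = 3.0
```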

Back to the “Bill Gates walks into a bar” example
Name      Income
Tom       $32,000
Larry     $36,000
Susan     $39,000
Paul      $41,000
Marcus    $45,000
Randy     $50,000
Sandy     $57,000
Tim       $60,000
Pam       $65,000
Jack      $80,000
Median income of the group
Median = ($45,000 + $50,000)/2 = $47,500
Name          Income
Tom           $32,000
Larry         $36,000
Susan         $39,000
Paul          $41,000
Marcus        $45,000
Randy         $50,000
Sandy         $57,000
Tim           $60,000
Pam           $65,000
Jack          $80,000
Bill Gates    $1,000,000,000
Median income of the new group
Median = $50,000
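The same comparison can be reproduced with the standard-library `statistics.median`. This is a sketch on the bar example's incomes with illustrative variable names: the median shifts only from $47,500 to $50,000 when the extreme income is added.

```python
from statistics import median

# Annual incomes from the bar example, in dollars.
incomes = [32_000, 36_000, 39_000, 41_000, 45_000,
           50_000, 57_000, 60_000, 65_000, 80_000]

median_before = median(incomes)                   # (45,000 + 50,000) / 2
median_after = median(incomes + [1_000_000_000])  # 6th of 11 sorted values

print(median_before, median_after)
```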
Application: why doesn’t average (mean) income represent the “average Joe’s” income?
Close to 70% of households make less than the average (mean) household income
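The household-income data behind this slide are not reproduced here, so the sketch below reuses the bar example instead to show how most of a group can sit below its own mean. The variable names are illustrative.

```python
from statistics import mean

# Incomes from the bar example, including the extreme value (dollars).
incomes = [32_000, 36_000, 39_000, 41_000, 45_000, 50_000,
           57_000, 60_000, 65_000, 80_000, 1_000_000_000]

group_mean = mean(incomes)
below = [x for x in incomes if x < group_mean]
share_below = len(below) / len(incomes)

print(f"{len(below)} of {len(incomes)} incomes ({share_below:.0%}) are below the mean")
```

In this small group, 10 of the 11 incomes fall below the mean, because a single very large value pulls the mean upward.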
Metric variables: outliers
• Outlier: an observation that is well outside the expected range of values
• Outliers are typically spotted by examining the distribution of the variable
• You need to be very careful with outliers because they can distort your mean calculation
• There is no single scientific rule for how far is too far out; it is mostly a judgement call
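One common rule of thumb for examining a distribution flags values that lie more than 1.5 interquartile ranges beyond the quartiles. This rule is an assumption brought in for illustration, not something the slides prescribe, and the multiplier remains a judgement call. The sketch applies it to the bar example's incomes.

```python
from statistics import quantiles

# Incomes from the bar example, including the extreme value (dollars).
incomes = [32_000, 36_000, 39_000, 41_000, 45_000, 50_000,
           57_000, 60_000, 65_000, 80_000, 1_000_000_000]

# Quartiles using the default method of statistics.quantiles.
q1, _, q3 = quantiles(incomes, n=4)
iqr = q3 - q1

FENCE_MULTIPLIER = 1.5  # conventional choice; changing it is a judgement call
lower_fence = q1 - FENCE_MULTIPLIER * iqr
upper_fence = q3 + FENCE_MULTIPLIER * iqr

# Values outside the fences are candidates for a closer look, not automatic deletions.
flagged = [x for x in incomes if x < lower_fence or x > upper_fence]
print(flagged)
```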

Metric data outliers: commute time to Caulfield

Statistics (Q6)
N Valid           107
N Missing         0
Mean              38.21
Median            30.00
Std. Deviation    58.319
Minimum           5
Maximum           600
Q6
Value   Frequency   Percent   Valid Percent   Cumulative Percent
5       5           4.7       4.7             4.7
8       1           0.9       0.9             5.6
10      6           5.6       5.6             11.2
15      10          9.3       9.3             20.6
20      16          15.0      15.0            35.5
25      8           7.5       7.5             43.0
30      18          16.8      16.8            59.8
32      1           0.9       0.9             60.7
35      4           3.7       3.7             64.5
38      1           0.9       0.9             65.4
40      10          9.3       9.3             74.8
45      6           5.6       5.6             80.4
50      2           1.9       1.9             82.2
53      2           1.9       1.9             84.1
60      11          10.3      10.3            94.4
70      2           1.9       1.9             96.3
90      2           1.9       1.9             98.1
120     1           0.9       0.9             99.1
600     1           0.9       0.9             100.0
Total   107         100.0     100.0
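The cumulative percent column also shows where the median (the 50th percentile) sits: it is the first value at which the cumulative percent reaches 50. The sketch below rebuilds that column from the Q6 frequencies. The variable names are illustrative, and the shortcut assumes the halfway point does not fall exactly between two different values, which holds for these 107 responses.

```python
# Frequency table for Q6 (commute time): value -> number of respondents.
frequencies = {
    5: 5, 8: 1, 10: 6, 15: 10, 20: 16, 25: 8, 30: 18, 32: 1, 35: 4, 38: 1,
    40: 10, 45: 6, 50: 2, 53: 2, 60: 11, 70: 2, 90: 2, 120: 1, 600: 1,
}

n = sum(frequencies.values())

running = 0
fiftieth_percentile = None
for value in sorted(frequencies):
    running += frequencies[value]
    cumulative_pct = 100 * running / n
    print(f"{value:>4} {frequencies[value]:>4} {cumulative_pct:6.1f}")
    # The first value whose cumulative percent reaches 50 marks the median.
    if fiftieth_percentile is None and cumulative_pct >= 50:
        fiftieth_percentile = value

print("50th percentile:", fiftieth_percentile)
```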
Example: “average” commute time to Caulfield
The mean is very sensitive to the existence of outlier values, but the median is not

          Including outlier   Excluding outlier
Mean      38.2                32.9
Median    30.0                30.0
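The comparison can be reproduced from the Q6 frequency table by expanding it into one observation per respondent and then dropping the single response of 600. This is a sketch with illustrative variable names; treating 600 as the outlier follows the slide.

```python
from statistics import mean, median

# Frequency table for Q6 (commute time): value -> number of respondents.
frequencies = {
    5: 5, 8: 1, 10: 6, 15: 10, 20: 16, 25: 8, 30: 18, 32: 1, 35: 4, 38: 1,
    40: 10, 45: 6, 50: 2, 53: 2, 60: 11, 70: 2, 90: 2, 120: 1, 600: 1,
}

# Expand the table into one observation per respondent.
times = [value for value, count in frequencies.items() for _ in range(count)]
without_outlier = [t for t in times if t != 600]

for label, data in [("Including outlier", times),
                    ("Excluding outlier", without_outlier)]:
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```

Removing one response out of 107 lowers the mean by more than five units while the median stays at 30.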

• You need to make changes to the collected data for your research
‐ For example, to create new ways to group your respondents for