Lesson 8: More on Shapes and the Normal Distribution
Motivation and objective: The shape of data is an important aspect of statistical analysis. It
tells us which measures of center and spread are best to use and which analysis procedures can be
used. In this lesson, we will discuss 1) some more ideas relating to measures of center and spread,
2) how shape influences which measures of center and spread are best to use to summarize the
data, 3) what a density curve is and the properties of density curves, and 4) the normal density
curve (one of the most common shapes of data) and the properties of the normal density curve.
More on Measures of Spread and Center
1. Effect of shifting or rescaling data: Go to page 6-4 in the Lesson Book: Learn About Changing the Baseline
2. How shape influences which measures of center and spread to use: Go to page 6-3 in the Lesson Book: Work with Dotplots to Compare Centers and Spreads
Notes, comments, and other information:
In previous lessons, some ActivStats tutorials mentioned the word resistant.
Question:
List 2 situations when the median is a better measure of the center than the mean.
Answer:
1) when there are outliers and 2) when the data are skewed. The mean is affected
more than the median by outliers and skewed data. With skewed data and outliers,
the mean is “pulled” in the direction of the skewness or outliers. Because the
median is not affected (or not affected as much), we say the median is resistant to
skewed data and outliers.
Example 8.1:
Calculate the mean and median of the following 5 numbers: 1, 1, 1, 1, 1000. Notice how the one
outlier (1000) affects the mean (mean = 200.8) while it does not affect the median at all (median =
1). The median does a better job of describing the center of the data, while the mean does not
describe any value in this data set.
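The arithmetic in Example 8.1 can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only and are not part of the lesson materials.

```python
from statistics import mean, median

# The five values from Example 8.1; 1000 is the single outlier.
data = [1, 1, 1, 1, 1000]

mean_with_outlier = mean(data)      # (1 + 1 + 1 + 1 + 1000) / 5
median_with_outlier = median(data)  # middle value of the sorted data

# Drop the outlier to see how much each statistic moves.
trimmed = [x for x in data if x != 1000]
mean_without_outlier = mean(trimmed)
median_without_outlier = median(trimmed)

print("with outlier:    mean =", mean_with_outlier, " median =", median_with_outlier)
print("without outlier: mean =", mean_without_outlier, " median =", median_without_outlier)
```

Removing the outlier moves the mean from 200.8 down to 1, while the median stays at 1. That is what it means for the median to be resistant.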
A statistic is resistant if it is not affected by unusual values or skewness.
Question:
Describe a situation when neither the median nor the mean should be used to
describe the center of the data.
Answer:
when the data are symmetric and bimodal, such as this:
Example 8.2:
Suppose there are 6 people in a room with ages 8, 9, 10, 39, 40, and 41. Calculate the mean and the
median. Clearly, neither the mean (mean = 24.5) nor the median (median = 24.5) is representative
of any age in the room. These ages form a bimodal shape, with the first mode representing children
and the second mode representing adults. Furthermore, this bimodal distribution is symmetric. (Note
how the mean and median are equal.) With such a shape, neither the mean nor the median should
be used to describe the center, as these summary measures could mislead people into thinking that
most people in the room are around 25 years old. A bimodal distribution often indicates that an
additional categorical variable is influencing the analysis. In this case, that variable is age
group, with ages categorized into children and adults. So, instead of analyzing everyone together, it is
best to report a mean (or median) for the children (mean = median = 9 years old) and a mean (or
median) for the adults (mean = median = 40 years old).
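The grouping idea in Example 8.2 can be sketched the same way. The cutoff of 18 years used to separate children from adults is an assumption made for illustration; the lesson does not state a cutoff. The code again assumes Python's standard `statistics` module.

```python
from statistics import mean, median

# Ages of the six people in Example 8.2.
ages = [8, 9, 10, 39, 40, 41]

# Overall summaries land in the gap between the two modes.
overall_mean = mean(ages)
overall_median = median(ages)

# ASSUMPTION: 18 is a hypothetical cutoff for "adult"; it is not given in the lesson.
ADULT_CUTOFF = 18
children = [a for a in ages if a < ADULT_CUTOFF]
adults = [a for a in ages if a >= ADULT_CUTOFF]

print("overall:  mean =", overall_mean, " median =", overall_median)
print("children: mean =", mean(children), " median =", median(children))
print("adults:   mean =", mean(adults), " median =", median(adults))
```

The overall mean and median are both 24.5, an age that nobody in the room has. The per-group summaries (9 for the children, 40 for the adults) describe each cluster well.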
