Section 1.2
Describing Distributions with Numbers
Recall: Exploratory Data Analysis
• Use some statistical tools to describe some of the main features of the dataset. This process is known as exploratory data analysis.
• Begin by examining each variable by itself (Chapter 1). Then look at relationships between variables (Chapter 2).
• Begin by visualizing important features with graphs (Section 1.1). Then focus on specifics by using numerical summaries (Section 1.2).
Recall: Describing Distributions
In Section 1.1, we defined some aspects of the overall pattern of the distribution of a quantitative variable and how to identify them on a graph. In Section 1.2, we will specify some numerical descriptions.
• Shape
  • Graphical: Symmetric or skewed
  • Numerical?
• Center
  • Graphical: Histogram peak
  • Numerical: Mean and median
• Spread
  • Graphical: Extent of histogram classes
  • Numerical: Standard deviation, range, quartiles
Measuring Center: Mean
• The mean is the arithmetic average of a distribution. The mean is the most commonly used measure of center.
• To find the mean of a set of observations, compute the sum of their values and divide by the number of observations.
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
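As a minimal sketch of the formula for the mean, the following Python snippet adds up a list of observations and divides by n. The function name `mean` and the five sample values are assumptions chosen only for illustration; they are not data from this section.

```python
def mean(observations):
    """Return the arithmetic mean: (x_1 + x_2 + ... + x_n) / n."""
    n = len(observations)
    if n == 0:
        raise ValueError("mean requires at least one observation")
    return sum(observations) / n


# Hypothetical observations, chosen only for illustration.
x = [2, 4, 4, 5, 10]
x_bar = mean(x)
print(x_bar)  # 25 / 5 = 5.0
```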
Notation
• The ∑ symbol is the Greek capital letter sigma. In mathematics, it is commonly used to represent summation (add all elements together).
• The subscripts i = 1, 2, . . . , n on the x_i are simply a way of identifying each individual distinctly. The subscripts do not necessarily indicate any kind of order.
• The overbar in \bar{x} is used to symbolize the mean of x, the variable of interest. Often, the mean is referred to as “x-bar.”
Measuring Center: Median
• The median M of a distribution is the number such that half of the observations are smaller and half are larger. This can be thought of as the midpoint of the distribution.
• How do we find the median?
  1. Arrange the observations from smallest to largest.
  2. If the number of observations n is odd, then the median is the center observation in the ordered list. Count (n + 1)/2 observations from the bottom to find it.
  3. If the number of observations n is even, then the median is the average of the center two observations in the ordered list. Count n/2 and (n/2) + 1 observations from the bottom to find these.
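The three steps for finding the median can be sketched in Python as follows. The function name `median` and the small sample lists are illustrative assumptions; note that the counting positions in the steps are 1-based, while Python lists are 0-based, so each position is shifted by one when indexing.

```python
def median(observations):
    """Return the median M by following the three-step rule."""
    # Step 1: arrange the observations from smallest to largest.
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    if n % 2 == 1:
        # Step 2: n is odd, so take observation number (n + 1) / 2.
        position = (n + 1) // 2
        return ordered[position - 1]
    # Step 3: n is even, so average observations n/2 and (n/2) + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Hypothetical observations, chosen only for illustration.
print(median([7, 1, 5]))     # ordered: 1, 5, 7 -> 5
print(median([7, 1, 5, 3]))  # ordered: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```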
Example: 2005-06 Tuition
2005-06 Resident Tuition at Land Grant Universities

University         Tuition ($)    University         Tuition ($)
Iowa State          5,634         Connecticut         7,912
SUNY Binghamton     5,840         Michigan State      7,923
SUNY Buffalo        5,863         Ohio State          8,082
SUNY Albany         5,880         UC Davis            8,129
Wisconsin           6,220         Illinois            8,670
Texas A&M           6,234         Minnesota           8,822
Purdue              6,458         Michigan            9,213
Indiana             7,112         Massachusetts       9,456
Virginia            7,370         Vermont            10,748
California          7,434         Penn State         11,024
Texas               7,438
Measuring Center: Tuition
[Figure: histogram of 2005-06 resident tuition. Horizontal axis: Tuition ($), 6000 to 12000. Vertical axis: Count, 0 to 6.]
• What is the mean for the tuition dataset?
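One way to check the answer is to apply the mean formula, and the median rule for odd n, to the 21 tuition values from the example. The sketch below assumes the values are transcribed correctly from the tuition table; the variable names are illustrative, and `statistics.median` from the Python standard library stands in for sorting and counting by hand.

```python
import statistics

# 2005-06 resident tuition ($), transcribed from the tuition table.
tuition = [
    5634, 5840, 5863, 5880, 6220, 6234, 6458, 7112, 7370, 7434, 7438,
    7912, 7923, 8082, 8129, 8670, 8822, 9213, 9456, 10748, 11024,
]

n = len(tuition)
total = sum(tuition)
x_bar = total / n                # mean: sum of the values divided by n
M = statistics.median(tuition)   # median: middle value of the ordered list

print(n, total)         # 21 161462
print(round(x_bar, 2))  # 7688.67
print(M)                # 7438
```

Because n = 21 is odd, the median is observation number (21 + 1)/2 = 11 in the ordered list, which is the Texas value of 7,438.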
 Spring '08
 ABBEY