HOLLOMAN’S AP STATISTICS APS NOTES 01, PAGE 1 OF 13
Exploratory Data Analysis
Variables
Definition
The big idea of statistics is that we have a question about some large group (the population)
that can be answered through measurement. The characteristic that we measure is called the
variable. Perhaps we want to know the average mass of a kumquat: mass is the variable.
Perhaps we want to know the proportion of pink VWs in the U.S.: color is the variable. The
things that we measure (kumquats, VWs) are called individuals. The collection of all
individuals is called the population.
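As a minimal sketch of the two examples above, the following Python snippet computes an average mass and a proportion from a handful of measured individuals. The data values and the names kumquat_masses_g and vw_colors are invented for illustration; they are not from the notes.

```python
from statistics import mean

# Hypothetical measurements (invented for illustration, not from the notes).
# Each entry is one individual (a kumquat); the variable is mass in grams.
kumquat_masses_g = [18.0, 20.0, 22.0, 20.0]

# Each entry is one individual (a VW); the variable is color.
vw_colors = ["pink", "blue", "white", "pink", "black"]

# Average of a quantitative variable.
average_mass = mean(kumquat_masses_g)

# Proportion of individuals whose color is "pink".
pink_proportion = vw_colors.count("pink") / len(vw_colors)

print(average_mass)
print(pink_proportion)
```

In both cases the individuals are the things in the list, and the variable is the characteristic recorded for each one.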
Quantitative vs. Qualitative
Variables come in two basic categories, quantitative and qualitative (this isn't the only way
to classify variables, just the only distinction that's important to us).
Quantitative variables measure quantities: mass, time, charge, number, length, etc.
Qualitative variables measure qualities: color, flavor, opinions, etc.
Discrete vs. Continuous
Quantitative variables can be broken down into two further categories, discrete and continuous.
Discrete variables have gaps in their possible values: they can only take on discrete (certain)
values. The set of integers (ℤ) is an example of a discrete set. Discrete variables will almost
always measure the number of some thing: the number of houses, the number of people, the
number of cars, etc.
Continuous variables have no gaps in their possible values. The set of real numbers (ℝ) is an
example of a continuous set. Continuous variables will typically measure physical phenomena:
mass, length, volume, ratio, etc.
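The distinction can be illustrated with a rough check, sketched here in Python. The helper is_count_like and both data lists are hypothetical. Testing for whole numbers is only a hint, not a definition: a continuous measurement that happens to be rounded to whole units would be misclassified.

```python
# Hypothetical observations, invented for illustration.
houses_per_street = [12, 7, 30, 7]        # counts of things: discrete
kumquat_masses_g = [18.2, 20.75, 19.0]    # physical measurements: continuous


def is_count_like(values):
    """Return True if every value is a whole number.

    This is only a rough hint that a variable is a discrete count;
    it says nothing certain about the variable itself.
    """
    return all(float(v).is_integer() for v in values)


print(is_count_like(houses_per_street))
print(is_count_like(kumquat_masses_g))
```

A street cannot have 7.5 houses, so the values between 7 and 8 are a gap. A mass can fall anywhere between 18 g and 19 g, so there is no gap.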
The Distribution of a Variable
Definition
The distribution of a variable is a list (or chart, or picture) that shows what values
the variable can take and how often it takes each value. Most of the calculations
that we'll make this year depend on knowing some things about the distributions of various
variables.
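One simple form of such a list is a frequency table. The sketch below builds one in Python with collections.Counter. The variable (number of siblings) and the data are invented for illustration.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 10 individuals.
siblings = [0, 1, 1, 2, 1, 0, 3, 1, 2, 1]

# How often each value occurs.
counts = Counter(siblings)
n = len(siblings)

# The distribution: each observed value paired with its relative frequency.
distribution = {value: counts[value] / n for value in sorted(counts)}

for value, share in distribution.items():
    print(f"{value}: {counts[value]} times ({share:.0%})")
```

The table lists the values the variable took and how often it took each one. Note that it only shows values that actually appear in the data.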
Main Points
There are three main features of a distribution that we want to know: center, spread, and shape.
The center can be described as the typical value of the variable, or the most common value,
or... well, there are lots of ways to say this. More on this later.
The spread can be described as the range of possible values: how wide is the distribution?
Again, there are many ways to say this, and again, more on this later.
The shape is a feature that can only be seen. There are two categories of shape that are
important to us: symmetric and skew. Symmetric is self-explanatory. Skew means that one end
is larger (taller) than the other. The side that is smaller (the longer, lower tail) is the
direction of the skew. For example, the distribution in Figure 1 is skew right, while Figure 2
shows a fairly symmetric distribution.
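The three features can be estimated numerically. The Python sketch below is one possible way, not the method of these notes. It uses the mean and median for center and the range for spread. For shape it compares mean and median, a rough heuristic brought in here as an assumption. The data are invented.

```python
from statistics import mean, median

# Hypothetical data with most values low and one far to the right.
values = [1, 2, 2, 3, 3, 3, 4, 10]

# Center: two common ways to describe the typical value.
center_mean = mean(values)
center_median = median(values)

# Spread: the width of the distribution, from smallest to largest value.
value_range = max(values) - min(values)

# Shape (rough heuristic, an assumption of this sketch): a long tail on
# one side tends to pull the mean toward that side, away from the median.
if center_mean > center_median:
    shape_hint = "skew right"
elif center_mean < center_median:
    shape_hint = "skew left"
else:
    shape_hint = "roughly symmetric"

print(center_mean, center_median, value_range, shape_hint)
```

The heuristic can be fooled by unusual data, so a picture of the distribution remains the more reliable way to judge shape.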