STA309 Exam I
Examining distributions with graphs:
•
Variables:
the characteristics of the individuals.
Individuals may be people or objects.
A variable can take different values for different individuals.
o
Quantitative variables:
take numerical values for which arithmetic makes sense.
o
Categorical variables:
record which category a person or thing falls into.
•
Distributions:
Describe a variable's pattern of variation: which values it takes and how often they occur.
•
Categorical variables:
The distribution lists the categories and gives either the count or the percent of individuals in each category.
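As a small illustration of the count-or-percent idea, here is a minimal Python sketch. The function name `categorical_distribution` and the class-year data are made up for this example; they are not from the course.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, percent)} for a non-empty list of labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

# Hypothetical data: class year for eight students.
years = ["Fr", "So", "So", "Jr", "Sr", "So", "Fr", "Jr"]
dist = categorical_distribution(years)
for cat, (n, pct) in dist.items():
    print(f"{cat}: count={n}, percent={pct:.1f}%")
```

The counts add up to the number of individuals, and the percents add up to 100.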
•
Quantitative variables
o
Histograms:
Divide the range of the data into classes of equal width, then count the number of observations in each class. (Data layout: rows = cases; columns = variables.)
Use Excel's histogram only to judge shape; the values it reports can be very wrong, especially for the 1st interval.
Skewness: (+) right skewed; (-) left skewed; magnitude > 1 is pretty skewed.
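The two histogram steps (equal-width classes, then counts) can be sketched in Python as below. This is only an illustration: the helper `class_counts` and the scores are hypothetical, and putting the maximum value in the last class is one common convention, not necessarily the one used in class. It assumes the data values are not all equal.

```python
def class_counts(data, num_classes):
    """Count observations in num_classes equal-width classes over the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_classes          # assumes hi > lo
    counts = [0] * num_classes
    for x in data:
        # Each class is [left, right); the maximum is placed in the last class.
        i = min(int((x - lo) / width), num_classes - 1)
        counts[i] += 1
    edges = [lo + k * width for k in range(num_classes + 1)]
    return edges, counts

# Hypothetical exam scores.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 84, 88, 91, 100]
edges, counts = class_counts(scores, 4)
for k, n in enumerate(counts):
    # Text histogram: one star per observation in the class.
    print(f"{edges[k]:.0f} to {edges[k + 1]:.0f}: {'*' * n}")
```

Changing `num_classes` changes the picture, which is why the choice of classes matters when judging shape.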
o
Stem plots:
Separate each observation into a stem (all but the final digit) and a leaf (the final digit). Write the stems from top (smallest) to bottom (largest). Write each leaf to the right of its stem, in increasing order.
Advantages of stem plots: used for small data sets; quicker to make; present more detailed information.
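The stem plot recipe above can be sketched in Python for non-negative whole numbers. The function `stemplot` and the data are hypothetical; listing empty stems between the smallest and largest stem is a design choice made here so that gaps stay visible.

```python
from collections import defaultdict

def stemplot(data):
    """Return stem plot lines for non-negative integers."""
    leaves = defaultdict(list)
    for x in sorted(data):                   # sorting puts leaves in increasing order
        leaves[x // 10].append(x % 10)       # stem = all but final digit, leaf = final digit
    lines = []
    # Stems run from smallest (top) to largest (bottom), including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

# Hypothetical small data set.
lines = stemplot([34, 12, 47, 31, 15, 40, 34])
print("\n".join(lines))
```

Because every leaf is an actual digit, the original values can be read back off the plot, which a histogram does not allow.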
o
Time plots:
Illustrate measurements taken over time. X-axis: time; Y-axis: measurement. Used to describe trends, seasonality, fluctuations, and cycles.
•
Examining distributions:
Look for the overall pattern and striking deviations from the pattern (such as outliers).
o
Use histograms and stem plots to describe:
Shape:
Symmetric: the right and left sides are approximately mirror images (don't expect perfection in real data).
Skewed: one tail extends farther than the other.
Right skewed: long right tail (quite common).
Bimodal: two humps (ex. Old Faithful's eruptions).
Describing distributions with numbers:
•
Comparing median and mean:
The mean is influenced by extreme observations and the median is not.
The median is said to be "resistant" because it is not affected much by outliers.
Symmetric: mean ≈ median; otherwise, the median is the better measure of center.
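A quick numerical check of the mean-versus-median comparison, using Python's standard `statistics` module. The salary figures are invented for illustration.

```python
from statistics import mean, median

salaries = [30, 32, 35, 36, 40]      # hypothetical salaries, in $1000s
with_outlier = salaries + [400]      # add one extreme observation

print("without outlier:", mean(salaries), median(salaries))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

With these made-up numbers the mean jumps from 34.6 to 95.5 when the single extreme value is added, while the median only moves from 35 to 35.5.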
•
Measuring the Spread:
o
Quartiles:
The pth percentile of a distribution is the value such that p percent of the observations fall at or below it. The median is just the 50th percentile, the first quartile (Q1) is the 25th percentile, and the third quartile (Q3) is the 75th percentile.
IQR = range of the middle half of the data: IQR = Q3 - Q1.
The IQR is not affected much by outliers.
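The quartiles and IQR can be sketched in Python as follows. Note the assumption: the notes do not say how to locate Q1 and Q3, so this sketch uses the "median of each half" convention (the overall median is left out of both halves when n is odd). Software packages, Excel included, may use other rules and give slightly different quartiles. The function name and data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-each-half convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                # observations below the median's position
    upper = xs[half + (n % 2):]      # observations above it (median excluded if n is odd)
    return median(lower), median(xs), median(upper)

data = [2, 4, 5, 7, 8, 10, 11, 13, 40]    # hypothetical; 40 is an outlier
q1, m, q3 = quartiles(data)
iqr = q3 - q1
print("Q1 =", q1, "median =", m, "Q3 =", q3, "IQR =", iqr)
print("range =", max(data) - min(data))
```

Replacing the outlier 40 with 14 leaves Q1, Q3, and the IQR unchanged in this example, while the full range shrinks from 38 to 12.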
Standard deviation:
The standard deviation is the most common measure of spread.
The standard deviation has the same units as the original measurements.
Standard deviation is strongly influenced by outliers.
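To see how strongly one outlier moves the standard deviation, here is a short Python sketch. It assumes the usual sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes; the notes themselves do not give the formula. The heights are invented.

```python
from statistics import stdev

heights_cm = [160, 165, 170, 175, 180]     # hypothetical heights, in cm
s = stdev(heights_cm)                      # sample SD, divides by n - 1 (assumed definition)
s_outlier = stdev(heights_cm + [250])      # same data plus one extreme value

print(f"s without outlier: {s:.2f} cm")
print(f"s with outlier:    {s_outlier:.2f} cm")
```

Both results are in centimeters, the same units as the data, and the single extreme value makes the standard deviation several times larger.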
•
Summarizing a Distribution:
 Spring '07
 Gemberling
