Categorical Variables #s are just labels and values are arbitrary
Quantitative Variable values are measured in units (income, height, weight, age)
Ordinal values have an order, so they are more than arbitrary labels, but they are not quite quantitative
Context Ideally tells Who What How Where When Why
Data systematically recorded information, whether numbers or labels, together
with its context
Case an individual about whom or which we have data
Variable holds information about the same characteristic for many cases
Chap. 3
Simpson’s Paradox when averages are combined across different groups, the overall
result can appear to contradict the result within each group
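A small sketch of Simpson’s Paradox. The counts below are made up for illustration (they are not from the course), and the names `data`, `within`, and `overall` are hypothetical. Treatment A does better inside each group, yet B looks better once the groups are lumped together, because the group sizes are very unequal.

```python
# Hypothetical (successes, attempts) counts, invented for illustration only.
data = {
    "A": {"group 1": (9, 10), "group 2": (30, 100)},
    "B": {"group 1": (80, 100), "group 2": (2, 10)},
}

def overall_rate(groups):
    """Success rate after pooling all groups together."""
    total_successes = sum(s for s, _ in groups.values())
    total_attempts = sum(n for _, n in groups.values())
    return total_successes / total_attempts

# Success rate within each group, per treatment.
within = {
    name: {g: s / n for g, (s, n) in groups.items()}
    for name, groups in data.items()
}
# Success rate per treatment, ignoring the groups.
overall = {name: overall_rate(groups) for name, groups in data.items()}

for name in data:
    print(name, within[name], round(overall[name], 3))
```

A wins in group 1 (0.90 v. 0.80) and in group 2 (0.30 v. 0.20), but overall B wins (about 0.745 v. 0.355).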
Frequency table lists the categories in a categorical variable and gives the counts or
%s of observations of each category.
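A minimal sketch of a frequency table, assuming the data are a plain list of category labels. The `colors` list and the function name `frequency_table` are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Map each category to (count, percent of all observations)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical categorical data.
colors = ["red", "blue", "red", "green", "red", "blue", "red", "blue"]
table = frequency_table(colors)

for category, (count, percent) in table.items():
    print(f"{category:<6} {count:>3} {percent:6.1f}%")
```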
Distribution gives the possible values of the variable and the relative frequency of each value
Area Principle each data value should be represented by the same amount of area
Contingency table displays counts and percentages of individuals falling into
named categories on 2 or more variables.
Marginal distribution In a contingency table, the distribution of either variable
alone. The counts or percentages are the totals found in the margins of the table.
Conditional distribution the distribution of a variable, restricting the Who to
consider only a smaller group of individuals.
Independence variables are independent if the conditional distribution of one
variable is the same for each category of the other.
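The contingency-table terms above can be sketched in a few lines. The table is hypothetical (counts invented so the two variables come out independent), and the helper names are illustrative. With real data the conditional distributions are rarely exactly equal, so "the same" usually means "about the same".

```python
import math

# Hypothetical contingency table: rows are one variable, columns the other.
table = {
    "men":   {"yes": 30, "no": 20},
    "women": {"yes": 60, "no": 40},
}

def row_totals(table):
    """Marginal distribution (as counts) of the row variable."""
    return {row: sum(cols.values()) for row, cols in table.items()}

def column_totals(table):
    """Marginal distribution (as counts) of the column variable."""
    totals = {}
    for cols in table.values():
        for col, n in cols.items():
            totals[col] = totals.get(col, 0) + n
    return totals

def conditional(table, row):
    """Distribution of the column variable, restricted to one row."""
    total = sum(table[row].values())
    return {col: n / total for col, n in table[row].items()}

def looks_independent(table, tol=1e-9):
    """True if every row has the same conditional distribution (within tol)."""
    dists = [conditional(table, row) for row in table]
    first = dists[0]
    return all(
        math.isclose(d[col], first[col], abs_tol=tol)
        for d in dists for col in first
    )

print(row_totals(table))            # margins for the row variable
print(column_totals(table))         # margins for the column variable
print(conditional(table, "men"))    # 60% yes, 40% no
print(looks_independent(table))     # True for these made-up counts
```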
Chap. 4
Histogram uses adjacent bars to show the distribution of values of a quantitative
variable. Each bar represents the frequency of values falling in an interval of
values.
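A rough sketch of how a histogram's bars are counted, assuming equal-width intervals that include their left endpoint. The `ages` data, the bin settings, and the function name are made up for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:      # values outside the bins are ignored
            counts[index] += 1
    return counts

# Hypothetical quantitative data.
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 58]
start, width = 20, 10
counts = histogram_counts(ages, start, width, n_bins=4)

# Text version of the display: one bar per interval.
for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo}-{lo + width}: {'#' * c}")
```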
Stem and Leaf shows quantitative data values in a way that sketches the
distribution of the data.
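A minimal stem-and-leaf sketch, assuming non-negative whole numbers with the tens as the stem and the ones digit as the leaf. The `scores` are made up and `stem_and_leaf` is an illustrative name.

```python
def stem_and_leaf(values):
    """Group sorted values by stem (tens) and collect their leaves (ones digits)."""
    stems = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems.setdefault(stem, []).append(leaf)
    return stems

# Hypothetical data.
scores = [52, 67, 61, 75, 78, 71, 83, 88, 64]
display = stem_and_leaf(scores)

# Print every stem between the smallest and largest, even if it has no leaves,
# so gaps in the distribution stay visible.
for stem in range(min(display), max(display) + 1):
    leaves = "".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```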
Dotplot graphs a dot for each case against a single axis.
Shape to describe the shape, look for
single v. multiple modes
symmetry v. skewness
outliers, clusters, or gaps
Mode a hump or local high point in the shape of the distribution of a variable.
Uniform roughly flat distribution
Chap. 5
IQR difference between the quartiles
IQR = Upper Q - Lower Q
IQR a reasonable summary of spread, but because it uses only the 2 quartiles, it
ignores much of the information about how individual values vary.
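A short sketch of the IQR using Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software, so the numbers may not match a hand calculation exactly; the data and the function name `iqr` are made up. The second data set shows what the IQR ignores: changing the largest value does not change it at all.

```python
import statistics

def iqr(values):
    """Upper quartile minus lower quartile (Python's default quartile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data; the second list only changes the largest value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000]

print(iqr(data))
print(iqr(with_outlier))  # same IQR: the extreme value is ignored
```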
Variance the sum of squared deviations from the mean, divided by count minus 1
*Formula on Back
Standard Deviation the square root of the variance
*Formula on Back
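The two definitions above, written out as a sketch. The data are made up, and `sample_variance` and `sample_sd` are illustrative names; the standard library's `statistics.variance` uses the same count-minus-1 divisor, so it serves as a check.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data))        # 32 / 7
print(sample_sd(data))
print(statistics.variance(data))    # library check, same divisor
```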
Center the mean or median
Median the middle value with half of the data above and half below.
Mean the sum of all the data values divided by the count
Spread standard deviation, IQR, and range.
Range Range = Max - Min
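A quick sketch of mean, median, and range on a skewed data set. The `incomes` values (in thousands) are made up for illustration. One very large value pulls the mean and the range far up, while the median barely notices it.

```python
import statistics

# Hypothetical incomes in thousands; one value is far larger than the rest.
incomes = [30, 32, 35, 38, 40, 45, 300]

mean = sum(incomes) / len(incomes)      # sum of the values divided by the count
median = statistics.median(incomes)     # middle value of the sorted data
data_range = max(incomes) - min(incomes)

print(f"mean   = {mean:.1f}")
print(f"median = {median}")
print(f"range  = {data_range}")
```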
5# Summary the min, the max, Q1, Q3, and the median
Boxplot displays the 5# summary as a central box with whiskers.
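A sketch of the 5# summary, the numbers a boxplot is drawn from. It assumes Python's default quartile method in `statistics.quantiles`, which may differ slightly from the textbook's rule; the data and the name `five_number_summary` are made up.

```python
import statistics

def five_number_summary(values):
    """Return the min, Q1, median, Q3, and max of the data."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "max": max(values),
    }

# Hypothetical data.
data = [12, 15, 17, 20, 22, 25, 30]
summary = five_number_summary(data)

for name, value in summary.items():
    print(f"{name:<6} {value}")
```

The box runs from Q1 to Q3 with a line at the median; the whiskers reach out toward the min and max.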
SHAPE, CENTER, and SPREAD
If shape is skewed, report the median & IQR. The fact that the mean and median
don’t agree is a sign that the distribution may be skewed.
If shape is symmetric report the mean and standard deviation and possibly the
 Fall '06
 VELLEMANP