Chapter 2

Categorical/Quantitative variables

Frequency tables

Marginal distribution

Conditional distribution

Bar graph

Pie chart
Categorical variable

Outcomes fall into different categories; summarized with counts or percentages (%)

Nominal: no natural ordering (e.g., arts vs. science)

Ordinal: has a clear ordering (e.g., grades A vs. B vs. C vs. F)
Quantitative variable

Outcomes are measured on a numerical scale, so summaries such as the mean and median make sense
Frequency table / Relative frequency table

Displays all categories of a single categorical variable with their counts (frequencies) or relative frequencies (proportions or percentages)

The relative frequencies (percentages) should add up to 100%, apart from rounding
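A frequency table and a relative frequency table can be built together by counting each category and dividing by the number of observations. The sketch below is a minimal illustration; the student data and the helper name `frequency_table` are hypothetical, made up for this example.

```python
from collections import Counter

# Hypothetical sample (made-up data): faculty of 10 students
faculties = ["arts", "science", "science", "arts", "business",
             "science", "arts", "science", "business", "science"]

def frequency_table(values):
    """Hypothetical helper: return {category: (count, relative frequency in %)}."""
    counts = Counter(values)
    n = len(values)
    return {cat: (c, 100 * c / n) for cat, c in counts.items()}

table = frequency_table(faculties)
for category, (count, pct) in table.items():
    print(f"{category:<10}{count:>4}{pct:>8.1f}%")

# The relative frequencies add up to 100% (up to floating-point rounding)
total_pct = sum(pct for _, pct in table.values())
print(f"Total: {total_pct:.1f}%")
```

Here science appears 5 times out of 10 (50%), arts 3 times (30%) and business 2 times (20%).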
Bar Chart

Each bar represents a category; its height shows the count or percentage
Pie Chart

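Each slice shows a category's share of the whole; can get confusing when there are many categories or slices of similar size!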
Contingency Table

Shows the relationship between 2 categorical variables

Displays two categorical variables simultaneously

Marginal total → subtotal of each row and each column

Grand total → total of all observations (sum of the row totals = sum of the column totals)
*note: always compare percentages, not counts, because the group totals are usually not equal!
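The marginal totals and the grand total of a contingency table can be computed by summing rows and columns. The sketch below uses a hypothetical two-way table (faculty vs. "plays a sport", made-up counts) stored as nested dicts; the layout is an assumption for illustration only.

```python
# Hypothetical two-way table (made-up counts): faculty vs. plays a sport
table = {
    "arts":    {"yes": 20, "no": 30},
    "science": {"yes": 60, "no": 90},
}
columns = ["yes", "no"]

# Marginal totals: subtotal of each row and of each column
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(cells[col] for cells in table.values()) for col in columns}

# Grand total: every observation counted once
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())  # rows and columns must agree

print(f"{'':<10}" + "".join(f"{c:>6}" for c in columns) + f"{'total':>7}")
for row, cells in table.items():
    print(f"{row:<10}" + "".join(f"{cells[c]:>6}" for c in columns) + f"{row_totals[row]:>7}")
print(f"{'total':<10}" + "".join(f"{col_totals[c]:>6}" for c in columns) + f"{grand_total:>7}")
```

Science has three times as many "yes" answers as arts (60 vs. 20), but it also has three times as many students (150 vs. 50), so the raw counts alone are misleading; this is why percentages should be compared.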
Marginal distribution

Displays the distribution of one variable only (uses just one set of marginal totals, either the row totals or the column totals, from the two-way table)
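As a sketch, the marginal distribution of the column variable is each column total expressed as a percentage of the grand total. The table below is the same kind of hypothetical, made-up faculty vs. "plays a sport" data.

```python
# Hypothetical two-way table (made-up counts): faculty vs. plays a sport
table = {
    "arts":    {"yes": 20, "no": 30},
    "science": {"yes": 60, "no": 90},
}

grand_total = sum(sum(cells.values()) for cells in table.values())

# Column totals: add up each column over all rows
col_totals = {}
for cells in table.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

# Marginal distribution of "plays a sport": column totals as % of the grand total
marginal = {col: 100 * total / grand_total for col, total in col_totals.items()}
print(marginal)  # {'yes': 40.0, 'no': 60.0}
```

The faculty variable is ignored entirely; only one set of subtotals is used.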
Conditional distribution

Displays the distribution of one variable among only the observations that satisfy a condition on the other variable (e.g., within a single row or column)
*note: No relationship → the conditional percentages are the same for every group
A relationship → the conditional percentages differ between groups
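The rule above can be sketched by computing the conditional distribution within each row and checking whether the row percentages match. Both tables are hypothetical (made-up counts), and the helper names `conditional_by_row` and `shows_relationship` are illustrative only.

```python
def conditional_by_row(table):
    """For each row, give each column's count as a % of that row's total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * count / row_total for col, count in cells.items()}
    return result

def shows_relationship(table, tol=1e-9):
    """True if the conditional (row) percentages differ between rows."""
    cond = list(conditional_by_row(table).values())
    first = cond[0]
    return any(abs(pcts[col] - first[col]) > tol for pcts in cond[1:] for col in first)

# Hypothetical counts (made up): faculty vs. plays a sport
same_pcts = {"arts": {"yes": 20, "no": 30}, "science": {"yes": 60, "no": 90}}
diff_pcts = {"arts": {"yes": 20, "no": 30}, "science": {"yes": 90, "no": 60}}

print(conditional_by_row(same_pcts))   # both rows: 40% yes, 60% no
print(shows_relationship(same_pcts))   # False
print(conditional_by_row(diff_pcts))   # arts 40% yes, science 60% yes
print(shows_relationship(diff_pcts))   # True
```

The tolerance only absorbs floating-point error; with real data the percentages are rarely identical, so in practice one looks for differences that are noticeably large.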
Simpson’s Paradox

A trend appears in several different groups of data but disappears or reverses when these groups are combined
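A small numeric example makes the paradox concrete. The success counts below are hypothetical, chosen only so that treatment A wins in each subgroup while B wins after the subgroups are pooled.

```python
# Hypothetical success counts (made up) for two treatments in two subgroups:
# (successes, trials)
data = {
    "subgroup X": {"A": (9, 10),   "B": (80, 100)},
    "subgroup Y": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    return successes / trials

# Within each subgroup, A has the higher success rate
for group, arms in data.items():
    a, b = rate(*arms["A"]), rate(*arms["B"])
    print(f"{group}: A = {a:.0%}, B = {b:.0%}")

# Combine the subgroups by adding successes and trials
combined = {}
for arms in data.values():
    for arm, (s, n) in arms.items():
        cs, cn = combined.get(arm, (0, 0))
        combined[arm] = (cs + s, cn + n)

a_all, b_all = rate(*combined["A"]), rate(*combined["B"])
print(f"combined: A = {a_all:.0%}, B = {b_all:.0%}")  # A = 35%, B = 75%: the trend reverses
```

The reversal happens because A was mostly used in subgroup Y, where success is rare, while B was mostly used in subgroup X, where success is common.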
Chapter 3

Histogram

Distributions of data

Stem and leaf plots

Box plot

mean/median

Measures of spread

Variance

Standard deviation (SD)

Interquartile range (IQR)

Five-number summary

Outlier formula
Histogram

Graphical display of the distribution of a quantitative variable

Frequency (count) is on the y-axis; the x-axis shows the bins (intervals) of values
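Behind every histogram is a count of observations per bin. The sketch below assumes equal-width bins of the form [start, start + width) and uses hypothetical exam scores; the helper `histogram_counts` is illustrative, and the text bars are printed sideways, so bar length plays the role of the y-axis frequency.

```python
def histogram_counts(data, start, width, n_bins):
    """Count observations in bins [start + k*width, start + (k+1)*width).
    Observations outside all bins are ignored."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Hypothetical exam scores (made up)
scores = [52, 58, 61, 64, 67, 68, 71, 73, 74, 75, 77, 79, 82, 85, 91]

counts = histogram_counts(scores, start=50, width=10, n_bins=5)
for k, c in enumerate(counts):
    lo = 50 + 10 * k
    print(f"[{lo}, {lo + 10}): {'*' * c}  ({c})")
```

The tallest bin is [70, 80) with 6 scores, so this made-up distribution is unimodal.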
Stem and leaf display

Useful display of quantitative data

Better for smaller data sets

Stem: one or more leading digits

Leaf: single trailing digit
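For non-negative integers, the stem is every digit except the last and the leaf is the last digit, which is integer division and remainder by 10. A minimal sketch with hypothetical scores; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return the lines of a stem-and-leaf display for non-negative integers.
    Stem = all digits except the last; leaf = the last (trailing) digit."""
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # include empty stems
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical exam scores (made up)
scores = [52, 58, 61, 64, 67, 68, 71, 73, 74, 75, 77, 79, 82, 85, 91]
for line in stem_and_leaf(scores):
    print(line)
```

Unlike a histogram, every original value can be read back from the display (e.g., stem 7 with leaf 3 is 73).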
Describing the distribution of quantitative data
1. Shape:

Unimodal, bimodal, multimodal (number of peaks)

Symmetric or skewed?

Skewed to the right: positively skewed; long right-hand tail

Skewed to the left: negatively skewed; long left-hand tail

Any outliers? (unusually large or small observations)
2. Center
3. Spread
Measures of Center
Mean

Sum of all observations divided by the number of observations: x̄ = (x1 + x2 + ... + xn) / n
Median

Middle value of the ordered data

Divides data into two equal parts
**Odd n (1, 3, 5, ...): the median is the middle observation, at position (n + 1)/2 in the ordered data
**Even n (2, 4, 6, ...): the median is the average of the two middle observations (positions n/2 and n/2 + 1)
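The mean and the odd/even median rule can be sketched directly; the data sets are small made-up examples. Note that Python lists are 0-based, so position (n + 1)/2 corresponds to index n // 2.

```python
def mean(data):
    """Sum of the observations divided by their number."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:                        # odd n: position (n + 1)/2 = index n // 2
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2    # even n: positions n/2 and n/2 + 1

odd = [7, 1, 3, 9, 5]        # sorted: 1 3 5 7 9
even = [4, 8, 1, 6]          # sorted: 1 4 6 8
print(mean(odd), median(odd))    # 5.0 5
print(mean(even), median(even))  # 4.75 5.0
```

Python's standard library `statistics.mean` and `statistics.median` give the same results for these inputs.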
Measures of Spread
Range

Range = maximum − minimum
Interquartile range (IQR)

The pth percentile (0 < p < 100)

p% of the observations fall below it, and (100 − p)% fall above it

Quartiles (Q1, Q2, Q3)

Q1 = first quartile = 25th percentile

Q2 = second quartile = 50th percentile = median

Q3 = third quartile = 75th percentile

IQR = Q3 − Q1
*note: Q1 is the median of the bottom half of the data; Q3 is the median of the top half
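A sketch of the "median of each half" method for quartiles follows. Textbooks and software use slightly different quartile conventions; this example assumes that, for odd n, the overall median is left out of both halves, so other tools may report somewhat different Q1 and Q3. The scores are hypothetical.

```python
def median(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def quartiles(data):
    """Q1, Q2, Q3 by the 'median of each half' method.
    Assumption: for odd n the overall median is left out of both halves."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]            # bottom half
    upper = xs[(n + 1) // 2 :]      # top half
    return median(lower), median(xs), median(upper)

# Hypothetical exam scores (made up), n = 15
scores = [52, 58, 61, 64, 67, 68, 71, 73, 74, 75, 77, 79, 82, 85, 91]
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 64 73 79 15
```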
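Variance

Sample variance: sum of squared deviations from the mean, divided by n − 1: s² = Σ(xi − x̄)² / (n − 1)
Standard deviation (SD)

Square root of the variance: s = √s²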
Properties of Variance and SD

Both are nonnegative

The more spread out the observations, the bigger the variance and SD

SD has the same units as the observations (variance is in squared units)

If observations are all equal, both variance and SD are equal to 0
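The sample variance and SD can be sketched from their definitions, which also lets the properties above be checked: both are nonnegative, and both are 0 when all observations are equal. The data sets are made up, and the n − 1 divisor assumes the sample (not population) variance.

```python
import math

def sample_variance(data):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """SD = square root of the variance (same units as the data)."""
    return math.sqrt(sample_variance(data))

spread_out = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, squared deviations sum to 32
all_equal = [5, 5, 5, 5]
print(sample_variance(spread_out), sample_sd(spread_out))
print(sample_variance(all_equal), sample_sd(all_equal))   # both 0.0
```

Python's `statistics.variance` and `statistics.stdev` also use the n − 1 divisor.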
The 5-number summary and boxplots

Includes: minimum, Q1, Q2 (median), Q3, maximum

A boxplot provides a graphical display of the five-number summary
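The five numbers a boxplot draws can be collected in one function. This sketch reuses the "median of each half" quartile convention (overall median excluded from both halves when n is odd), which is an assumption since conventions vary; the function name and the scores are hypothetical.

```python
def median(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum); quartiles as medians of the lower and
    upper halves, leaving the overall median out of both halves when n is odd."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[: n // 2])
    q3 = median(xs[(n + 1) // 2 :])
    return xs[0], q1, median(xs), q3, xs[-1]

# Hypothetical exam scores (made up)
scores = [52, 58, 61, 64, 67, 68, 71, 73, 74, 75, 77, 79, 82, 85, 91]
summary = five_number_summary(scores)
print(summary)  # (52, 64, 73, 79, 91)
lo, q1, q2, q3, hi = summary
print("range =", hi - lo, " IQR =", q3 - q1)
```

In the boxplot, the box runs from Q1 (64) to Q3 (79) with a line at the median (73).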

Identifying outliers in box plots:
