-Samples produce different b0 and b1, but the sampling distribution model of the regression slope is centered at β1 (the slope of the idealized regression line)
-Standardize slopes by subtracting the model mean and dividing by SE; the result follows Student’s t with n – 2 df:
t (df = n – 2) = (b1 – β1) / SE(b1)
Usual H0: β1 = 0, because if slope = 0, there is no linear association btwn 2 variables
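The t-statistic for the slope can be sketched in a few lines. This is a minimal illustration, not a full regression routine: the function name `slope_t_statistic`, the parameter `beta1_null`, and the data are made up for the example, and SE(b1) is computed with the standard formula s_e / sqrt(Σ(x – x̄)²), which the notes do not spell out.

```python
import math

def slope_t_statistic(x, y, beta1_null=0.0):
    """Return (b0, b1, se_b1, t, df) for the least-squares line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # sample slope
    b0 = y_bar - b1 * x_bar        # sample intercept
    # s_e: standard deviation of the residuals, using n - 2 degrees of freedom
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))
    se_b1 = s_e / math.sqrt(sxx)   # standard error of the slope
    t = (b1 - beta1_null) / se_b1  # standardized slope
    return b0, b1, se_b1, t, n - 2

# Made-up illustration data (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, se_b1, t, df = slope_t_statistic(x, y)
print(f"b1 = {b1:.3f}, SE(b1) = {se_b1:.3f}, t = {t:.3f}, df = {df}")
```

The t value is then compared against Student’s t with n – 2 df (from a table or software) to get the P-value.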
CI for regression slope: b1 ± t* (df = n – 2) × SE(b1)
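As a rough sketch of the interval arithmetic: the function name and the numbers below are hypothetical, and t* is passed in by hand because it has to be looked up for df = n – 2 (3.182 is the usual 95% two-sided table value for df = 3).

```python
def slope_confidence_interval(b1, se_b1, t_star):
    """CI for the slope: b1 +/- t* x SE(b1). t_star must match df = n - 2."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin

# Hypothetical values: b1 = 0.6, SE(b1) = 0.283, n = 5 so df = 3
low, high = slope_confidence_interval(0.6, 0.283, 3.182)
print(f"95% CI for slope: ({low:.3f}, {high:.3f})")
```

If the interval contains 0, the data are consistent with H0: β1 = 0 at the matching significance level.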
***Regression estimates Rate of Change; it CANNOT TELL CAUSATION***
SE increases, t-score decreases (less significant results)
-Very low P-value means the association you see in the data is unlikely to have occurred by chance if H0 were true, so reject H0
-Can also predict the mean y-value for all cases with a given x, OR the y-value for a particular case
**MORE PRECISION PREDICTING MEANS** (the difference is all in the SE): the farther from the center of our data, the LESS precise; SE for an INDIVIDUAL predicted value is LARGER than SE for the MEAN (extra variability)
Predicting for a new individual (not part of original data set): “x sub new” = xv
ŷv = b0 + b1 xv
CI for mean predicted value: ŷv ± t* (df = n – 2) × SE(μv)
“mean y value for all with that value for x”
Narrower CI and smaller SE
Prediction Interval for individual: ŷv ± t* (df = n – 2) × SE(ŷv)
“exact y value for particular individual with that x”
Wider interval and larger SE
A 95% prediction interval has a 95% chance of capturing the true y value of a randomly selected individual with the given x-value
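The two intervals differ only in their SE, which a short sketch makes concrete. The SE formulas used here are the standard textbook ones and are an assumption, since the notes do not write them out: SE(μv)² = SE(b1)²·(xv – x̄)² + s_e²/n, and SE(ŷv)² adds one more s_e² for individual variability. The function name, data, and t* value are made up for illustration.

```python
import math

def mean_ci_and_prediction_interval(x, y, x_new, t_star):
    """Return (y_hat, ci_for_mean, prediction_interval) at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e2 = sse / (n - 2)              # residual variance, n - 2 df
    se_b1_sq = s_e2 / sxx             # SE(b1) squared
    y_hat = b0 + b1 * x_new
    # Both SEs grow as x_new moves away from x_bar (less precise far from center)
    se_mean = math.sqrt(se_b1_sq * (x_new - x_bar) ** 2 + s_e2 / n)
    # Individual prediction adds the residual variance s_e^2 (extra variability)
    se_indiv = math.sqrt(se_b1_sq * (x_new - x_bar) ** 2 + s_e2 / n + s_e2)
    ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)
    pi = (y_hat - t_star * se_indiv, y_hat + t_star * se_indiv)
    return y_hat, ci, pi

# Made-up data; t* = 3.182 is the 95% table value for df = 3
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, ci, pi = mean_ci_and_prediction_interval(x, y, x_new=4, t_star=3.182)
print(f"y-hat = {y_hat:.2f}")
print(f"CI for mean:         ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"Prediction interval: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval always comes out wider than the CI for the mean at the same xv.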
Watch OUT
1. High influence points & outliers
2. Extrapolation
3. Make sure errors are Normal
4. Watch out for plot thickening (residual spread that changes with x)
5. Don’t fit linear regression to data that aren’t straight
Sample (statistics): Latin Letters | Quantity    | Population (parameters): Greek Letters
ybar                               | Mean        | μ
s                                  | Stand Dev   | σ
r                                  | Correlation | ρ
phat                               | Proportion  | p
Categorical Data – Frequency Tables, Bar & Pie charts, Contingency Tables
Quantitative Data – Histograms, Stem & leaf, dot plot, boxplot, scatterplots
Marginal Distribution – distribution of either variable alone; the counts or percentages are the totals found in the margins (last row / column) of the table
Data – information w/ a context (W’s of data: Who, What; When, Where, Why good to have)
Who – called “cases”
MAKE A PICTURE w/ data
Relative Frequencies / Proportions depend on whether taken from column total, row total, or grand total (marginal total)
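A small sketch shows how the same cell gives three different proportions depending on the total used. The table, its labels, and its counts are hypothetical.

```python
# Hypothetical contingency table of counts (rows: class year, columns: answer)
table = {
    "Junior": {"Yes": 30, "No": 20},
    "Senior": {"Yes": 10, "No": 40},
}

# Marginal totals: sums across each row and down each column
row_totals = {row: sum(cols.values()) for row, cols in table.items()}
col_totals = {}
for cols in table.values():
    for col, count in cols.items():
        col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(row_totals.values())

count = table["Junior"]["Yes"]
pct_of_row = count / row_totals["Junior"]  # share of Juniors who said Yes
pct_of_col = count / col_totals["Yes"]     # share of Yes answers from Juniors
pct_of_all = count / grand_total           # share of everyone

# Marginal distribution of class year (row totals over grand total)
marginal_rows = {row: total / grand_total for row, total in row_totals.items()}

print(pct_of_row, pct_of_col, pct_of_all)
print(marginal_rows)
```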
5 Number Summary – min, Q1, median, Q3, max
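A minimal sketch using Python's standard `statistics` module. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice here is an assumption and may not match a hand calculation exactly; the data are made up.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

data = [1, 3, 5, 7, 9, 11, 13]  # made-up values
low, q1, median, q3, high = five_number_summary(data)
print(low, q1, median, q3, high)
print("IQR =", q3 - q1)
```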
Measures of Spread – Standard Dev, IQR
Measures of Position – Mean, Median, Quartiles
Independence – in a contingency table, when the distribution of one variable is the same for all categories of another
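The independence idea can be checked by comparing conditional distributions row by row. This is only a sketch with hypothetical tables and function names; real sample data rarely match exactly, so in practice a formal test is used rather than an equality check.

```python
import math

def conditional_distribution(row):
    """Turn one row of counts into proportions of that row's total."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}

def looks_independent(table, tol=1e-9):
    """True if every row has the same conditional distribution (within tol)."""
    dists = [conditional_distribution(row) for row in table.values()]
    first = dists[0]
    return all(
        math.isclose(d[c], first[c], abs_tol=tol)
        for d in dists[1:]
        for c in first
    )

# Hypothetical tables: same Yes/No split in both rows vs. different splits
same_split = {"Junior": {"Yes": 30, "No": 20}, "Senior": {"Yes": 60, "No": 40}}
diff_split = {"Junior": {"Yes": 30, "No": 20}, "Senior": {"Yes": 10, "No": 40}}
print(looks_independent(same_split))
print(looks_independent(diff_split))
```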
Simpson’s paradox – when averages are taken across different groups (not related), they may appear contradictory: a trend seen within each group can reverse when the groups are combined
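A made-up example makes the reversal concrete. All counts below are invented for illustration: option A has the higher success rate in both the easy and the hard group, yet B looks better overall because most of B's attempts were in the easy group.

```python
# Invented (successes, attempts) counts by group, for illustration only
results = {
    "A": {"easy": (9, 10), "hard": (30, 100)},
    "B": {"easy": (80, 100), "hard": (2, 10)},
}

def group_rate(option, group):
    """Success rate for one option within one group."""
    successes, attempts = results[option][group]
    return successes / attempts

def overall_rate(option):
    """Success rate for one option with the groups pooled together."""
    total_successes = sum(s for s, _ in results[option].values())
    total_attempts = sum(n for _, n in results[option].values())
    return total_successes / total_attempts

for group in ("easy", "hard"):
    print(group, group_rate("A", group), group_rate("B", group))
print("overall", round(overall_rate("A"), 3), round(overall_rate("B"), 3))
```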
CHANGING CENTER – adding a constant to each value adds the same amount to Mean, Median, and Quartiles, but DOES NOT change Stand Dev or IQR
CHANGING SCALE – multiplying each data value by a constant multiplies the measures of position (Mean, Median, Quartiles) by that constant, and the measures of spread (Stand Dev, IQR) by its absolute value
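Both rules are easy to verify numerically. The sketch below uses made-up data and Python's standard `statistics` module; the helper name `summarize` and the `method="inclusive"` quartile convention are choices made for the example.

```python
import statistics

def summarize(values):
    """Mean, median, standard deviation, and IQR of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": median,
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }

data = [2, 4, 4, 5, 10]  # made-up values
base = summarize(data)
shifted = summarize([v + 10 for v in data])  # changing center: add a constant
scaled = summarize([v * 2 for v in data])    # changing scale: multiply by a constant

print("original:", base)
print("plus 10: ", shifted)  # mean and median move by 10; sd and IQR unchanged
print("times 2: ", scaled)   # mean, median, sd, and IQR all double
```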
HISTOGRAMS:
Each column in a histogram is a bin; it covers a range of values, and its height shows the number of cases in that range
1. Shape (General Trend) – a. Unimodal / Bimodal / Multimodal