Stat 201 Exam 2 Topics List, Spring 2010
My answers/thoughts to her guide are in italics. If there's any incorrect information, email me at [email protected]
Chapter 8: Linear Regression
Four conditions for valid regression
o Quantitative variables condition
  • Regression does not work for categorical data.
o Straight enough condition
  • Ensure linearity, meaning the scatterplot shows a roughly straight-line association between the two variables. Lower p-values and higher R-squared values indicate a line that fits the data better.
o Outlier condition
  • Make sure outliers don't have any influence (changing the fit line noticeably) or leverage (an x value far from the other x values) on the fit line.
o "Does the plot thicken?" condition
  • Not referenced a lot in the slides, but the implied purpose is to check the residuals to back up your original conclusions from the scatterplot. The spread of the residuals should stay roughly the same across all x values.
What is special about the regression ("least squares") line compared to any other line drawn through the data
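o The "least squares" line is special because it makes the sum of the squared residuals as small as possible; no other line drawn through the data gives a smaller sum. Its residuals also add to zero (or as close as possible).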
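To make the "least squares" idea concrete, here is a minimal Python sketch. The data points and the function names (`fit_least_squares`, `residuals`, `sum_sq`) are made up for illustration and are not from the course. It compares the least squares line with another line that also passes through the point of means: both have residuals that add to zero, but the least squares line has the smaller sum of squared residuals.

```python
# Hypothetical data, invented only for illustration (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def fit_least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


def residuals(b0, b1, xs, ys):
    """Residual = actual y minus predicted y-hat, one per data point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_sq(values):
    """Sum of squared values."""
    return sum(v ** 2 for v in values)


b0, b1 = fit_least_squares(xs, ys)
ls_res = residuals(b0, b1, xs, ys)

# A different line through (x-bar, y-bar) = (3, 4): slope 1.0, intercept 1.0.
other_res = residuals(1.0, 1.0, xs, ys)

print("least squares line: b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("sum of squared residuals (least squares):", round(sum_sq(ls_res), 3))
print("sum of squared residuals (other line):   ", round(sum_sq(other_res), 3))
```

The comparison line was chosen on purpose so that its residuals also sum to zero, which shows that the sum-to-zero property alone does not single out the regression line.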
Given JMP output, write out the regression model with actual variable names
o For example, if predicted fat = 6.8 + 0.97 * protein is the model, your answer would be: For every additional gram of protein, the model says you'll have 0.97 additional grams of fat. It also states that when there are 0 grams of protein, you'll have 6.8 grams of fat.
Interpret regression coefficients (b0 and b1)
o Using the formula above, ŷ = b0 + b1 * x, you can see that b0 is the y-intercept (the predicted y when x is 0) and b1 is the slope of the regression model (the predicted change in y for each one-unit increase in x).
Know when b0 has no logical interpretation
o For example, GasMileage = 4.2 + 2.87 * FuelAdditive(mL). You could say that for every additional mL of fuel additive, you'll get an additional 2.87 mpg; however, would it make sense to say that you only get 4.2 mpg with no fuel additive? The model I've used is completely made up, but the idea is the same: b0 has no logical interpretation whenever the value it gives at x = 0 doesn't reflect a possible scenario.
Know the difference between y and yhat
o Actual observed values = y; predicted values = ŷ, or yhat
Be able to use a regression equation to make an estimate of y for a given value of x
o In the fat & protein regression model, let's say we're told a sandwich has 10 g of protein. Therefore ŷ = 6.8 + 0.97 * 10 = 16.5. According to this regression model, we believe a sandwich with 10 g of protein will have 16.5 g of fat.
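The same estimate can be checked with a few lines of Python. This is only a sketch: the function name `predict_fat` is made up here, and the coefficients 6.8 and 0.97 are the ones from the fat and protein model in this guide.

```python
def predict_fat(protein_g, b0=6.8, b1=0.97):
    """Predicted fat in grams from the guide's model: 6.8 + 0.97 * protein."""
    return b0 + b1 * protein_g


estimate = predict_fat(10)
print(f"Predicted fat for 10 g of protein: {estimate:.1f} g")
```

Calling `predict_fat(0)` gives back the intercept, 6.8 g, which is the model's prediction for a sandwich with no protein.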
Be able to calculate and interpret a residual
o Residuals are calculated by subtracting the predicted value from the actual value, or y - ŷ. If the residual is positive, it is said the model underestimated the value. If the residual is negative, it is said the model overestimated the value.
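As a quick worked example, the sketch below computes a residual as actual minus predicted and reports whether the model underestimated or overestimated. The observed value of 18 g of fat is hypothetical, chosen only for illustration, and the function names `residual` and `describe` are made up; the predicted value comes from the fat and protein model in this guide.

```python
def residual(actual, predicted):
    """Residual = actual (observed) value minus predicted value."""
    return actual - predicted


def describe(res):
    """Say how the model did, based on the sign of the residual."""
    if res > 0:
        return "underestimated"
    if res < 0:
        return "overestimated"
    return "exact"


predicted_fat = 6.8 + 0.97 * 10   # model prediction for 10 g of protein
actual_fat = 18.0                 # hypothetical observed value, not real data

res = residual(actual_fat, predicted_fat)
print(f"residual = {res:.1f} g, so the model {describe(res)} the fat")
```

With these numbers the residual is positive, so the model underestimated the fat in this made-up sandwich.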
Be able to interpret a residuals plot and spot problems
o Problems in a residuals plot may be outliers or trends. This alternate view of a scatterplot can show trends or problems not visible before. The hope in these residual plots is to find nothing.
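To see what a "trend" in the residuals looks like, here is a small Python sketch. The data are made up and deliberately curved (y = x squared), and the function names are hypothetical. A straight line is fit by least squares and the residuals are listed in order of x, which is the information a residuals plot would display.

```python
# Hypothetical, deliberately curved data (y = x squared), for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]


def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(xs, ys)

# Residual = actual minus predicted, listed in order of x.
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, res):
    print(f"x = {x}: residual = {r:+.1f}")
```

The residuals come out positive at both ends and negative in the middle. That U-shaped pattern is a trend, and it signals that a straight line is not the right model for these data. A healthy residuals plot would show no pattern at all.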