Stat 250 Gunderson Lecture Notes
Chapters 3 and 14: Regression Analysis
The invalid assumption that correlation implies cause is probably among the two or
three most serious and common errors of human reasoning.
-- Stephen Jay Gould, The Mismeasure of Man
Describing and assessing the significance of relationships between variables is very important
in research. We will first learn how to do this when both variables are quantitative.
Quantitative variables have numerical values that can be ordered according to those values.
We will study the material from Chapters 3 and 14 together, merging the two chapters into one
overall discussion of these ideas.
Main idea
We wish to study the relationship between two quantitative variables.
Generally one variable is the ____RESPONSE____ variable, denoted by y.
This variable measures the outcome of the study and is also called the ____DEPENDENT____
variable (thought to depend on x).
The other variable is the ____EXPLANATORY____ variable, denoted by x.
It is the variable that is thought to explain the changes we see in the response variable. The
explanatory variable is also called the ____INDEPENDENT____ variable.
The first step in examining the relationship is to use a graph (a scatterplot) to display the
relationship. We will look for an overall pattern and see if there are any departures from this
overall pattern.
If a linear relationship appears to be reasonable from the scatterplot, we will take the next step
of finding a model (an equation of a line) to summarize the relationship. The resulting equation
may be used for predicting the response for various values of the explanatory variable. If
certain assumptions hold, we can assess the significance of the linear relationship and construct
confidence intervals for our estimates and predictions.
Let's begin with an example that we will carry throughout our discussions.
Graphing the Relationship: Exam 2 versus Final Scores
How well does the exam 2 score for a Stats 350 student predict their final exam score?
Below are the scores for a random sample of n = 6 students from a previous term.
Exam 2 Score      33  65  44  64  60  40
Final Exam Score  53  80  78  93  88  58
Response (dependent) variable y = FINAL EXAM SCORE.
Explanatory (independent) variable x = EXAM 2 SCORE.
Step 1: Examine the data graphically with a scatterplot.
Add the points to the scatterplot below:
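If you would rather draw the plot by computer than by hand, the following is a minimal sketch using only the Python standard library. The function name text_scatterplot and the grid size are our own choices for illustration, not part of the course materials, and the character grid only approximates where each point falls.

```python
# Scores copied from the table above (n = 6 students).
exam2 = [33, 65, 44, 64, 60, 40]   # x: explanatory variable
final = [53, 80, 78, 93, 88, 58]   # y: response variable


def text_scatterplot(x, y, width=40, height=12):
    """Return a rough character-grid scatterplot of y against x."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        # Scale each point to a column (left to right) and a row (top = largest y).
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - yi) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    return "\n".join("|" + "".join(line).rstrip() for line in grid)


plot = text_scatterplot(exam2, final)
print("Final exam score (y)")
print(plot)
print("+" + "-" * 40 + "  Exam 2 score (x)")
```

Each * marks one student. Points that rise from the lower left to the upper right suggest a positive association.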
Interpret the scatterplot in terms of ...
  overall form (does the average pattern look like a straight line, or is it curved?)
  direction of association (positive or negative)
  strength of association (how much do the points vary around the average pattern?)
  any deviations from the overall form? None here!
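Direction and strength can also be summarized with a single number. A common choice, assumed here because these notes have not introduced it at this point, is Pearson's correlation coefficient r: its sign gives the direction, and values near -1 or +1 indicate a strong linear association. A minimal sketch for our six students (the helper name pearson_r is ours):

```python
import math

# Scores copied from the table above (n = 6 students).
exam2 = [33, 65, 44, 64, 60, 40]   # x: explanatory variable
final = [53, 80, 78, 93, 88, 58]   # y: response variable


def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(exam2, final)
print(f"r = {r:.3f}")   # r = 0.889
```

The positive sign agrees with the upward pattern in the scatterplot. As the opening quote warns, a large r describes association only and says nothing about cause.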
Describing a Linear Relationship with a Regression Line
Regression analysis is the area of statistics used to examine the relationship between a
quantitative response variable and one or more explanatory variables. A key element is the
estimation of an equation that describes how, on average, the response variable is related to
the explanatory variables. A regression equation can also be used to make predictions.
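To make this concrete, here is a minimal sketch that estimates a line for the exam data and uses it for one prediction. It assumes the line is fitted by ordinary least squares, which this part of the notes has not yet specified. The helper name least_squares_line and the exam 2 score of 50 are hypothetical choices for illustration.

```python
# Scores copied from the exam example (n = 6 students).
exam2 = [33, 65, 44, 64, 60, 40]   # x: explanatory variable
final = [53, 80, 78, 93, 88, 58]   # y: response variable


def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar   # the fitted line passes through (x_bar, y_bar)
    return intercept, slope


b0, b1 = least_squares_line(exam2, final)
print(f"y-hat = {b0:.2f} + {b1:.4f} x")   # y-hat = 21.67 + 1.0457 x

# Predicted final exam score for a hypothetical exam 2 score of 50,
# which lies inside the observed range of 33 to 65.
predicted = b0 + b1 * 50
print(f"predicted final score: {predicted:.2f}")   # predicted final score: 73.95
```

Under this assumption the slope says that, on average, each additional exam 2 point goes with about 1.05 more points on the final exam. Predictions are most trustworthy for x values within the range of the data.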