LINEAR REGRESSION
You learned how a scatterplot can be used to determine whether there is a relationship between two quantitative variables X and Y. In the simplest examples, the points in the scatterplot suggest that X and Y have a linear relationship, and the correlation quantifies the strength and direction of that linear relationship. In linear regression, we describe the linear relationship with an equation that specifies how the value of the dependent (or response) variable Y depends linearly on the independent (explanatory, or predictor) variable X. The regression equation can then be used to predict the value of Y for a given specified value of X.
Suppose that we want to predict a person's height from his or her arm span. In this case, the height Y is the response and the arm span X is the predictor. We saw that there is a strong positive linear relationship between X and Y. The next step is to model the relationship with the equation of a line.
Least-Squares Regression Equation and Line: We seek the equation of the line that best fits the data points in the least-squares sense. The fit is determined by the degree to which the points deviate from the line in the vertical (Y) direction. Think of the vertical deviation of a point from the line as a fitting or prediction error, and the square of that deviation as a squared error. The sum of the squared errors (SSE) then serves as a measure of fit of the line to the data; the smaller the SSE, the better the fit. Therefore, we seek the line with the smallest possible SSE, and that is the least-squares regression line. (It turns out that there is only one such line.) Calculus techniques are used to find the formulas for the slope and y-intercept that determine this unique regression line.
Note: The fitting errors for the least-squares line are called residuals. So a residual is Y − Ŷ. It can be shown that the sum of the residuals is always 0. But the sum of the squares of the residuals is precisely the SSE, which is only 0 when all the points fall exactly on a straight line.
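The two facts above can be checked numerically. The following sketch (using a small hypothetical data set, not the arm-span data) fits the least-squares line and confirms that the residuals sum to essentially zero:

```python
# Minimal sketch with hypothetical data: fit the least-squares line
# and check that the residuals sum to (essentially) zero.
import statistics

x = [60.0, 62.5, 65.0, 67.5, 70.0]   # hypothetical arm spans
y = [61.0, 63.0, 64.5, 68.0, 69.5]   # hypothetical heights

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Slope and intercept via the least-squares formulas.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

# Residuals: observed minus predicted (Y - Y-hat).
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(sum(residuals), 10))   # zero up to floating-point rounding
```

The SSE for this fitted line, sum(e**2 for e in residuals), is positive whenever the points do not all lie exactly on one line.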
From the data, we calculate the slope b and the y-intercept a of the line using these two formulas:

b = r (S_Y / S_X)   (the slope).
a = Ȳ − b X̄   (the y-intercept).

The least-squares regression equation is: Ŷ = a + b X.
To predict the value of Y for a given specified value of X, simply plug the value of X into the regression equation and calculate Ŷ.
Example: Recall that for the height and arm span data, the correlation was .972 (more precisely, .97178). The table below displays the means and standard deviations of the two variables.

Descriptive Statistics
           Mean      Std. Deviation   N
HEIGHT     64.6525   8.35181          40
ARMSPAN    63.2875   8.90846          40

So the slope is b = .97178 × (8.35181 / 8.90846) = .91106 ≈ .911.
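The slope computation (and the intercept that would follow from a = Ȳ − bX̄, which the excerpt does not show) can be reproduced directly from the summary statistics in the table:

```python
# Reproduce the example's slope from the summary statistics:
# b = r * (S_Y / S_X), with height as Y and arm span as X.
r = 0.97178
s_height = 8.35181    # S_Y (response: HEIGHT)
s_armspan = 8.90846   # S_X (predictor: ARMSPAN)
mean_height = 64.6525
mean_armspan = 63.2875

b = r * (s_height / s_armspan)
print(round(b, 5))               # 0.91106, matching the text

# The intercept would follow from a = Ybar - b*Xbar (not shown in the excerpt).
a = mean_height - b * mean_armspan
print(f"Y-hat = {a:.3f} + {b:.3f}X")
```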
Spring '08, sugahara