Regression
Scatterplot…now what?
We next attempt to establish some relationship between the explanatory variable X and the response variable Y. A straight line of the form Y = mX + c, where m = gradient and c = y-intercept, will be used to relate the variables X and Y. Having an equation relating X and Y would allow us to make predictions for the response Y given some value of X.
It is only natural that our prediction for the response at the mean of x should be the mean of y, and so our curve should pass through the centre (centroid) of the data set, (x̄, ȳ).
Fitting a straight line to data is called Linear Regression Analysis, and the straight line is referred to as the regression line.
Least Squares Regression (LSR) is a computational method that finds the "best fit" curve (model) that passes through the centroid (x̄, ȳ) of the data set while having the smallest error between this estimated curve and the actual data points.
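As a concrete illustration, here is a minimal Python sketch of a least squares fit for a straight line. The data values are made up for illustration; the slope formula used is the standard least squares estimate, and the intercept is chosen so that the line passes through the centroid (x̄, ȳ).

```python
# Small made-up data set for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
x_bar = sum(x) / n          # centroid x-coordinate, x̄
y_bar = sum(y) / n          # centroid y-coordinate, ȳ

# Least squares slope: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)

# Intercept chosen so the line passes through the centroid (x̄, ȳ).
c = y_bar - m * x_bar

print("gradient m =", m, " y-intercept c =", c)
```

Note that by construction m·x̄ + c = ȳ, so the fitted line always passes through the centroid, as stated above.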
To minimize error, the curve/line must pass as close as possible to points on the scatterplot. This
“closeness” is measured by looking at the residuals.
(Note: Linear Regression (straight lines) is the focus in this course.)
The difference between the observed data value of y and our predicted value for y, known as ŷ, is called the residual, denoted by e. There is a residual for each y in our data.

Residual: e = y − ŷ
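To make the residual idea concrete, here is a short Python sketch. The fitted line (m, c) and the data points are made-up values for illustration; for each point, the residual is the observed y minus the predicted ŷ = mx + c.

```python
# Made-up fitted line and data, for illustration only.
m, c = 2.0, 0.5
x = [1.0, 2.0, 3.0]
y = [2.4, 4.6, 6.4]

# Predicted values ŷ from the line, one per data point.
y_hat = [m * xi + c for xi in x]

# Residuals e = y − ŷ, one per observed y.
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
print(residuals)
```

A point above the line gives a positive residual, and a point below the line gives a negative residual.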
 Fall '10
 LYNNDUDLEY