Now this brings us to linear regression. This is a way of
1. Determining if there is a linear relationship between 2 variables
2. Determining what this relationship is
Correlation only tells us that there is a relation.
Regression also tells us what that relation is.
It does not, however, tell us whether there is causality.
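To make the difference between correlation and regression concrete, here is a minimal Python sketch. The data and the helper name pearson_r are made up for illustration and are not part of the lecture. Both samples lie exactly on a straight line, so both have the same correlation, yet the relationship itself (the slope) is very different. Correlation alone cannot tell them apart.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: the same X values with two very different slopes.
xs = [1, 2, 3, 4, 5]
ys_steep = [10 * x for x in xs]   # Y = 10X
ys_flat = [0.5 * x for x in xs]   # Y = 0.5X

r_steep = pearson_r(xs, ys_steep)
r_flat = pearson_r(xs, ys_flat)
print(r_steep, r_flat)  # both are (up to rounding) a perfect correlation
```

Regression is what recovers the slopes 10 and 0.5 that distinguish the two cases.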
Linear regression is a way of drawing a straight line through the data and
finding the “best fit”.
By itself it cannot determine causality!
So for 2 variables X and Y, we estimate:
Y = a + bX + e
Where a is a constant (the intercept, or the value of Y when X is zero)
b is the coefficient on X, or the slope of the linear relationship
e is the residual, or the amount by which each actual value differs from its
predicted value
Diagrammatically, we draw a straight line through the scattergram and
compare the predicted values with the actual values; the difference between
them is the error term.
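The comparison of predicted and actual values can be sketched in Python as follows. The data points and the candidate line (a = 1, b = 2) are made-up values chosen only for illustration, and the function name residuals is hypothetical.

```python
def residuals(xs, ys, a, b):
    """Return (predicted, errors) for the candidate line Y = a + bX."""
    predicted = [a + b * x for x in xs]                 # Yc for each X
    errors = [y - yc for y, yc in zip(ys, predicted)]   # e = Y - Yc
    return predicted, errors

# Hypothetical observations and a hypothetical candidate line.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 10]
predicted, errors = residuals(xs, ys, a=1.0, b=2.0)

print(predicted)  # [3.0, 5.0, 7.0, 9.0]
print(errors)     # [0.0, 0.0, -1.0, 1.0]
```

A positive error means the actual point lies above the line, and a negative error means it lies below.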
We want to minimize these deviations from the line so that:
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (Y_i - Y_{c,i})$$
is minimized, where $Y_c$ is the predicted or calculated value of Y for the value
of X.
This is usually done by squaring the error terms and minimizing the sum of
the squares – this is usually called the least squares method.
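As a rough sketch of the least squares method for one X variable, the Python below uses the standard closed-form result: the slope b is the sum of cross-deviations divided by the sum of squared deviations of X, and the intercept a makes the line pass through the point of means. These formulas are not derived in the notes above, and the data and function names are assumptions for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared errors for Y = a + bX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept: the line passes through (mean X, mean Y)
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared differences between actual and predicted Y."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 10]

a, b = least_squares(xs, ys)
sse_fit = sum_squared_errors(xs, ys, a, b)
sse_guess = sum_squared_errors(xs, ys, 1.0, 2.0)  # an arbitrary other line
print(a, b, sse_fit, sse_guess)
```

For this made-up data the fitted line is approximately Y = 0.5 + 2.2X, and its sum of squared errors (about 1.8) is smaller than that of the arbitrary line Y = 1 + 2X (2.0), as the method requires.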
(Why do we use squared values to minimize?
What this does is put more weight on the larger errors.
We will not prove it, but it can be proven that, under the standard
assumptions about the error term, the least-squares method gives the best
linear unbiased estimate of the relationship.
It also lets us invoke the Pythagorean rule for right-angled triangles.)
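The point about weighting larger errors can be seen with a small Python example. The two sets of errors below are invented for illustration: they have the same total absolute error, but squaring penalizes the set containing one large error far more heavily.

```python
def sum_abs(errors):
    """Total absolute error."""
    return sum(abs(e) for e in errors)

def sum_sq(errors):
    """Total squared error."""
    return sum(e ** 2 for e in errors)

# Hypothetical error sets with the same total absolute size.
even_errors = [1, 1, 1, 1]    # four small misses
lumpy_errors = [0, 0, 0, 4]   # one large miss

print(sum_abs(even_errors), sum_abs(lumpy_errors))  # 4 4
print(sum_sq(even_errors), sum_sq(lumpy_errors))    # 4 16
```

Under the squared criterion, a line that misses one point badly scores worse than a line that misses several points slightly.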
So we have our computed line for the relationship between Y and X.
If Yc
 Winter '11
 sinclair
 Regression Analysis, linear relationship, yc, independent variable Yc
