Regression
We shall be looking at regression solely as a descriptive statistic: which line lies 'closest' to a
given set of points? 'Closest' is defined as minimizing the sum of the squared y (vertical) distances of
the points from the regression line (which is more fully called the least squares regression line). We shall
not derive the formula, merely present it and then use it. Data are given as a set of points in the plane, i.e.,
as ordered pairs of x and y values.
• Formulæ
• Example
Formulæ
• xbar = (*sum* x(i)) / n. This is just the mean of the x values.
• ybar = (*sum* y(i)) / n. This is just the mean of the y values.
• SS_xx = *sum* (x(i) - xbar)^2. This is sometimes written as SS_x (_ denotes a subscript following).
• SS_yy = *sum* (y(i) - ybar)^2. This is sometimes written as SS_y.
• SS_xy = *sum* (x(i) - xbar)(y(i) - ybar)
• b_1 = (SS_xy)/(SS_xx)
• b_0 = (ybar) - (b_1) × (xbar)
• The least squares regression line is:
yhat (lowercase y with a caret/circumflex) = (b_0) + (b_1) × x
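The formulæ above translate directly into code. A minimal sketch in Python (the function name `least_squares_line` is my own, not from the notes):

```python
def least_squares_line(points):
    """Return (b_0, b_1) for the least squares line yhat = b_0 + b_1 * x."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n   # mean of the x values
    ybar = sum(y for _, y in points) / n   # mean of the y values
    ss_xx = sum((x - xbar) ** 2 for x, _ in points)
    ss_xy = sum((x - xbar) * (y - ybar) for x, y in points)
    b_1 = ss_xy / ss_xx          # slope
    b_0 = ybar - b_1 * xbar      # intercept
    return b_0, b_1

# Applied to the example data set below:
b_0, b_1 = least_squares_line([(1, 1), (2, 3), (4, 6), (5, 6)])
# b_1 = 13/10 = 1.3 and b_0 = 4 - 1.3 × 3 = 0.1, up to floating-point rounding
```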
Example
What is the least squares regression line for the data set {(1,1), (2,3), (4,6), (5,6)}?
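The preview cuts the solution off here; carrying the formulæ above through this data set by hand (my own arithmetic, easily checked against the definitions):

```latex
\begin{align*}
\bar{x} &= \frac{1+2+4+5}{4} = 3, & \bar{y} &= \frac{1+3+6+6}{4} = 4 \\
SS_{xx} &= (-2)^2 + (-1)^2 + 1^2 + 2^2 = 10 \\
SS_{xy} &= (-2)(-3) + (-1)(-1) + (1)(2) + (2)(2) = 13 \\
b_1 &= \frac{SS_{xy}}{SS_{xx}} = \frac{13}{10} = 1.3, & b_0 &= \bar{y} - b_1\bar{x} = 4 - (1.3)(3) = 0.1
\end{align*}
```

So the least squares regression line is yhat = 0.1 + 1.3 × x.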