LINEAR REGRESSION

You learned how a scatterplot can be used to determine whether there is a relationship between two quantitative variables X and Y. In the simplest examples, the points in the scatterplot suggest that X and Y have a linear relationship, and the correlation quantifies the strength and direction of that linear relationship. In linear regression, we describe the linear relationship with an equation that specifies how the value of the dependent (or response) variable Y depends linearly on the independent (explanatory, or predictor) variable X. The regression equation can then be used to predict the value of Y for a given specified value of X.

Suppose that we want to predict a person's height from his or her arm span. In this case, the height Y is the response and the arm span X is the predictor. We saw that there is a strong positive linear relationship between X and Y. The next step is to model the relationship with the equation of a line.

Least-Squares Regression Equation and Line: We seek the equation of the line that best fits the data points in the least-squares sense. The fit is determined by the degree to which the points deviate from the line in the vertical (Y) direction. Think of the vertical deviation of a point from the line as a fitting or prediction error, and the square of the deviation as a squared error. The sum of the squared errors (SSE) then serves as a measure of the fit of the line to the data; the smaller the SSE, the better the fit. Therefore, we seek the line with the smallest possible SSE, and that line is the least-squares regression line. (It turns out that there is only one such line.) Calculus techniques are used to find the formulas for the slope and y-intercept that determine this unique regression line.

Note: The fitting errors for the least-squares line are called residuals. So a residual is Y − Ŷ, the observed value minus the predicted value. It can be shown that the sum of the residuals is always 0.
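To make the idea concrete, here is a minimal sketch in Python of the quantities just described: predicted values, residuals, and the SSE. The small data set is made up purely for illustration; it is not the height/arm span data from the course.

```python
import numpy as np

# Hypothetical (x, y) data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit the least-squares line Y-hat = a + b*X
# (np.polyfit returns coefficients from highest degree down: slope, intercept)
b, a = np.polyfit(x, y, 1)

y_hat = a + b * x            # predicted values (Y-hat)
residuals = y - y_hat        # fitting errors: Y minus Y-hat
sse = np.sum(residuals**2)   # sum of squared errors, the quantity minimized

# The residuals of the least-squares line sum to 0 (up to rounding error),
# and no other line through the data can achieve a smaller SSE.
print(residuals.sum(), sse)
```

Note that while the residuals sum to 0, the SSE is positive unless every point lies exactly on the line.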
But the sum of the squares of the residuals is precisely the SSE, which is 0 only when all the points fall exactly on a straight line.

From the data, we calculate the slope b and the y-intercept a of the line using these two formulas:

    b = r (S_Y / S_X)   (the slope)
    a = Ȳ − b X̄         (the y-intercept)

The least-squares regression equation is: Ŷ = a + bX. To predict the value of Y for a given specified value of X, simply plug the value of X into the regression equation and calculate Ŷ.

Example: Recall that for the height and arm span data, the correlation was .972 (more precisely, .97178). The table below displays the means and standard deviations of the two variables.

Descriptive Statistics
            Mean      Std. Deviation    N
HEIGHT      64.6525   8.35181           40
ARMSPAN     63.2875   8.90846           40
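The slope and intercept can be computed directly from the summary statistics above. The sketch below uses the correlation, means, and standard deviations quoted in the example; the arm span of 62 inches plugged in at the end is a made-up input chosen only to illustrate prediction.

```python
# Summary statistics taken from the example in the text
r = 0.97178                      # correlation between arm span and height
x_bar, s_x = 63.2875, 8.90846    # ARMSPAN mean and standard deviation
y_bar, s_y = 64.6525, 8.35181    # HEIGHT mean and standard deviation

b = r * (s_y / s_x)      # slope:       b = r (S_Y / S_X)
a = y_bar - b * x_bar    # y-intercept: a = Y-bar - b * X-bar

def predict_height(arm_span):
    """Plug an arm span (X) into the regression equation to get Y-hat."""
    return a + b * arm_span

# Predicted height for a hypothetical arm span of 62 inches
print(b, a, predict_height(62.0))
```

Because the slope is less than 1, a person whose arm span is one inch above the mean is predicted to be somewhat less than one inch above the mean height; this is the familiar "regression toward the mean" effect.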
This note was uploaded on 07/08/2008 for the course STAT 1 taught by Professor Sugahara during the Spring '08 term at University of California, Berkeley.