Copyright 2018 president fellows of harvard college

Copyright © 2018 President & Fellows of Harvard College. This document is authorized for use only by Pradeepan parthiban and may not be reproduced, posted, or transmitted without the express permission of Harvard Business School.
BUSINESS ANALYTICS MODULE 4: Single Variable Linear Regression

We use regression analysis for two primary purposes:
• Studying the magnitude and structure of a relationship between two variables.
• Forecasting a variable based on its relationship with another variable.

The structure of the single variable linear regression line is $\hat{y} = a + bx$.
• $\hat{y}$ is the expected value of $y$, the dependent variable, for a given value of $x$.
• $x$ is the independent variable, the variable we are using to help us predict or better understand the dependent variable.
• $a$ is the y-intercept, the point at which the regression line intersects the vertical axis. This is the value of $\hat{y}$ when the independent variable, $x$, is set equal to 0.
• $b$ is the slope, the average change in the dependent variable $y$ as the independent variable $x$ increases by one.

The true relationship between two variables is described by the equation $y = \alpha + \beta x + \varepsilon$, where $\varepsilon$ is the error term $y - \hat{y}$. The idealized equation that describes the true regression line is $y = \alpha + \beta x$.

We determine a point forecast by entering the desired value of $x$ into the regression equation. We must be extremely cautious about using regression to forecast for values outside of the historically observed range of the independent variable ($x$-values).

Instead of predicting a single point, we can construct a prediction interval, an interval around the point forecast that is likely to contain, for example, the actual selling price of a house of a given size. The width of a prediction interval varies based on the standard deviation of the regression (the standard error of the regression), the desired level of confidence, and the location of the $x$-value of interest in relation to the historical values of the independent variable.

It is important to evaluate several metrics in order to determine whether a single variable linear regression model is a good fit for a data set, rather than looking at single metrics in isolation.
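To make the regression line, point forecast, and prediction interval described above concrete, here is a minimal Python sketch. The house sizes and prices are hypothetical and do not come from the module. The interval uses the standard textbook formula for a single variable regression, which the summary does not spell out, so treat it as an assumption. The critical t-value is passed in by hand (3.182 is the t-table value for 95% confidence with 3 degrees of freedom) because the Python standard library has no t-distribution.

```python
import math

# Hypothetical data (not from the source): house sizes in square feet
# and selling prices in thousands of dollars.
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 270, 310, 390, 430]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx           # slope
    a = y_bar - b * x_bar   # y-intercept
    return a, b


def prediction_interval(xs, ys, x0, t_crit):
    """Return (low, high) around the point forecast at x0.

    t_crit is the two-sided critical t-value for n - 2 degrees of
    freedom, supplied by the caller (for example, from a t-table).
    """
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # standard error of the regression
    # The (x0 - x_bar) term widens the interval away from the mean of x.
    margin = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    forecast = a + b * x0
    return forecast - margin, forecast + margin


a, b = fit_line(sizes, prices)
forecast_2200 = a + b * 2200  # point forecast inside the observed range

T_CRIT_95 = 3.182  # t-table value: 95% confidence, 3 degrees of freedom
near_mean = prediction_interval(sizes, prices, 2000, T_CRIT_95)
near_edge = prediction_interval(sizes, prices, 3000, T_CRIT_95)

print(f"y_hat = {a:.1f} + {b:.3f}x")
print(f"forecast at 2200 sq ft: {forecast_2200:.1f}")
print(f"95% interval at 2000 sq ft: {near_mean[0]:.1f} to {near_mean[1]:.1f}")
print(f"95% interval at 3000 sq ft: {near_edge[0]:.1f} to {near_edge[1]:.1f}")
```

With this data the interval at 3000 sq ft is wider than the one at 2000 sq ft, because 2000 is the mean of the observed sizes and 3000 sits at the edge of the observed range.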
$R^2$ measures the percent of total variation in the dependent variable, $y$, that is explained by the regression line.

$$R^2 = \frac{\text{Variation explained by the regression line}}{\text{Total variation}} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}, \qquad 0 \le R^2 \le 1$$

For a single variable linear regression, $R^2$ is equal to the square of the correlation coefficient.

In addition to analyzing $R^2$, we must test whether the relationship between the dependent and independent variable is significant and whether the linear model is a good fit for the data. We do this by analyzing the p-value (or confidence interval) associated with the independent variable and the regression’s residual plot.
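The following Python sketch illustrates the R-squared definition above on a small hypothetical data set (the numbers are not from the module). It computes R-squared as the regression sum of squares divided by the total sum of squares, compares it with the squared correlation coefficient, and lists the residuals that a residual plot would display. The p-value is left out because the Python standard library does not provide a t-distribution.

```python
import math

# Hypothetical data (not from the source): house sizes in square feet
# and selling prices in thousands of dollars.
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 270, 310, 390, 430]

n = len(sizes)
x_bar = sum(sizes) / n
y_bar = sum(prices) / n

sxx = sum((x - x_bar) ** 2 for x in sizes)
syy = sum((y - y_bar) ** 2 for y in prices)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes, prices))

# Least-squares line y_hat = a + b*x
b = sxy / sxx
a = y_bar - b * x_bar

fitted = [a + b * x for x in sizes]
residuals = [y - f for y, f in zip(prices, fitted)]  # y - y_hat

total_ss = syy                                         # total variation
regression_ss = sum((f - y_bar) ** 2 for f in fitted)  # explained variation
residual_ss = sum(e ** 2 for e in residuals)           # unexplained variation

r_squared = regression_ss / total_ss
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient

print(f"R^2 = {r_squared:.4f}")
print(f"r^2 = {r ** 2:.4f}")
print("residuals:", [round(e, 2) for e in residuals])
```

The two printed values agree, matching the statement that R-squared equals the squared correlation coefficient in the single variable case. Plotting the residuals against the sizes would give the residual plot. Five points are far too few to judge any pattern in it, so the list here only shows what gets plotted.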
