10/6/2020
Multiple Linear Regression
Wenyuan Tang
Lecture 10

Multiple linear regression

The multiple linear regression model takes the form
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon,$$
where $X_j$ represents the $j$th predictor and $\beta_j$ quantifies the association between that variable and the response. We interpret $\beta_j$ as the average effect on $Y$ of a one-unit increase in $X_j$, holding all other predictors fixed.

Given the training data $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i = (x_{i1}, \ldots, x_{ip})$, define
$$\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip} \right)^2.$$
Again, we use the least squares approach to estimate the coefficients. Solving the optimization problem
$$\min_{\beta_0, \beta_1, \ldots, \beta_p} \mathrm{RSS},$$
we obtain the least squares coefficient estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$.

Since the RSS always decreases as more variables are added to the model, the $R^2$ always increases as more variables are added. If we use the $R^2$ to select the best model, we will always end up with a model involving all of the variables (overfitting). The problem is that a high $R^2$ indicates a model with a low training error, whereas we wish to choose a model that has a low test error. Therefore, we introduce the adjusted $R^2$ statistic:
$$\text{adjusted } R^2 = 1 - \frac{\mathrm{RSS}/(n - p - 1)}{\mathrm{TSS}/(n - 1)}.$$
The adjusted $R^2$ is always smaller than $R^2$, and it can be negative. The adjusted $R^2$ increases only when the decrease in RSS (due to the inclusion of a new variable) is more than one would expect to see by chance. In theory, the model with the largest adjusted $R^2$ will have only correct variables and no noise variables.
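As a minimal sketch (not part of the lecture), the least squares fit and the $R^2$ versus adjusted $R^2$ comparison can be computed directly with NumPy; the simulated data and true coefficients below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated training data: n observations, p = 2 informative predictors.
n, p = 200, 2
X = rng.normal(size=(n, p))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

def fit_ls(X, y):
    """Least squares estimates: minimize RSS with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def r2_stats(X, y, beta):
    """Return (R^2, adjusted R^2) for a fitted model."""
    A = np.column_stack([np.ones(len(X)), X])
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    n, p = X.shape
    r2 = 1 - rss / tss
    adj_r2 = 1 - (rss / (n - p - 1)) / (tss / (n - 1))
    return r2, adj_r2

beta = fit_ls(X, y)
r2, adj_r2 = r2_stats(X, y, beta)

# Adding a pure noise column can only raise R^2 (RSS cannot increase),
# but the adjusted R^2 may fall, penalizing the useless variable.
X_noise = np.column_stack([X, rng.normal(size=n)])
beta_noise = fit_ls(X_noise, y)
r2_noise, adj_r2_noise = r2_stats(X_noise, y, beta_noise)

print(beta)  # estimates close to the true coefficients [3.0, 1.5, -2.0]
```

This illustrates both claims above: $R^2$ never decreases when a variable is added, while the adjusted $R^2$ is always the smaller of the two.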
Extensions of the linear model

The assumptions in the standard linear regression model can be relaxed.

Removing the additive assumption. Consider the standard linear regression model with two variables:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon.$$
In addition to $X_1$ and $X_2$, which represent the main effects, we can extend the model by including an interaction term $X_1 X_2$, which represents the interaction effect:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon.$$
It is sometimes the case that an interaction term has a very small $p$-value, but the associated main effects do not. The hierarchical principle states that if we include an interaction in a model, we should also include the main effects, even if the $p$-values associated with their coefficients are not significant.

Removing the linear assumption. We can extend the linear model to accommodate nonlinear relationships through transformations of the quantitative inputs, such as log, square root, or square. For example, we can capture a quadratic relationship by using polynomial regression:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon,$$
which is still a linear model by considering $X_1 = X$ and $X_2 = X^2$.
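Both extensions amount to adding constructed columns to the design matrix and then running ordinary least squares as before. A small sketch (simulated data and true coefficients are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Polynomial regression: a quadratic relationship is still linear in the
# coefficients once we treat X1 = x and X2 = x^2 as separate predictors.
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=n)
A = np.column_stack([np.ones(n), x, x**2])
beta_poly, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta_poly)  # estimates close to the true [1.0, 2.0, -1.5]

# Interaction term: include the product column x1 * x2 alongside the
# main effects x1 and x2 (the hierarchical principle keeps both).
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y2 = 0.5 + x1 + x2 + 2.0 * x1 * x2 + rng.normal(scale=0.3, size=n)
A2 = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_int, *_ = np.linalg.lstsq(A2, y2, rcond=None)
print(beta_int)  # estimates close to the true [0.5, 1.0, 1.0, 2.0]
```

The fitted coefficient on the product column estimates $\beta_3$, the interaction effect; the fit itself is ordinary least squares on an augmented design matrix.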