September 9, 2010

The assumptions of linear regression

Introduction

Multiple regression is used to estimate relationships between a response (dependent) variable and one or more explanatory (independent) variables. The multiple regression model with $k$ predictor variables,

$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{NID}(0, \sigma^2),$$

is summarized in matrix notation as

$$y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathrm{NID}(0, \sigma^2),$$

where $y$ is an $(n \times 1)$ vector for the response variable, $X$ is an $(n \times (k+1))$ matrix of $k$ predictor variables and a column of 1's for the intercept, $\beta$ is a $((k+1) \times 1)$ vector of unknown population parameters, and $\varepsilon$ is an $(n \times 1)$ vector of population errors, or departures of data points from the prediction equation. The matrices are as follows:

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{kn} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$

If we were to regress the $y$-vector against the X's for the whole population, we would obtain the population prediction equation

$$\hat{y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki},$$

where $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are all population parameters. The parameter $\beta_1$ in this equation is shorthand notation for the relationship between $y$ and $X_1$, controlling for $X_2, \ldots, X_k$. In brief:

$$\beta_1 = \beta_{yX_1 \mid X_2, X_3, \ldots, X_k} = \beta_{y1 \mid 2, 3, \ldots, k}.$$

Each element of the error vector $\varepsilon$ is the difference between the observed value $y_i$ and the value of $y$ predicted by the equation, $\hat{y}_i$; that is, $\varepsilon_i = y_i - \hat{y}_i$.

One characteristic of population parameters is that the values $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are constant for a specific population (i.e., a closed block of people, communities or organizations at a fixed point in time). This is not true of samples. If we select a sample from the population, the resulting regression prediction equation is of the form

$$\hat{y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki},$$

or equivalently in matrix notation, $\hat{y} = Xb$, where $\hat{y}$ is an $(n \times 1)$ vector of predicted values of $y$ and the vector of errors has elements $e_i = y_i - \hat{y}_i$. But this time, because we have a sample from the population rather than the whole population, the elements of the vector $b^T = [b_0 \; b_1 \; b_2 \; \cdots \; b_k]$ are statistics used to estimate the parameters $\beta^T = [\beta_0, \beta_1, \beta_2, \ldots, \beta_k]$ ....
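The distinction between the fixed population parameters and the sample statistics that estimate them can be illustrated numerically. The following is a minimal sketch, not the course's own code: it assumes ordinary least squares as the estimator (the preview cuts off before naming one), and the parameter values, sample size, and helper names (`design_matrix`, `fit_ols`, `draw_sample`) are hypothetical choices made for illustration.

```python
import numpy as np


def design_matrix(predictors):
    """Build the (n x (k + 1)) matrix X: a column of 1's, then the k predictors."""
    n = predictors.shape[0]
    return np.column_stack([np.ones(n), predictors])


def fit_ols(X, y):
    """Return b, the least-squares estimate of beta (assumed estimator)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b


rng = np.random.default_rng(0)

# Hypothetical population parameters (beta_0, beta_1, beta_2) and error s.d.
beta = np.array([2.0, 0.5, -1.0])
sigma = 1.0


def draw_sample(n):
    """Simulate n observations from y = X beta + eps, eps ~ NID(0, sigma^2)."""
    predictors = rng.normal(size=(n, beta.size - 1))
    X = design_matrix(predictors)
    eps = rng.normal(0.0, sigma, size=n)
    return X, X @ beta + eps


# First sample: estimate b, then form predicted values and sample errors.
X, y = draw_sample(200)
b = fit_ols(X, y)
y_hat = X @ b   # y-hat = X b
e = y - y_hat   # e_i = y_i - y-hat_i

# A second sample from the same population gives a different b,
# while beta itself has not changed.
b_second = fit_ols(*draw_sample(200))

print("beta     :", beta)
print("b        :", b)
print("b_second :", b_second)
```

Running the sketch prints the fixed `beta` next to two different estimates, which is the sense in which the elements of `b` are statistics rather than constants.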
This note was uploaded on 02/18/2012 for the course STAT 404 taught by Professor Staff during the Spring '08 term at Iowa State.