September 9, 2010

The assumptions of linear regression

Introduction

Multiple regression is used to estimate relationships between a response (dependent) variable and one or more explanatory (independent) variables. The multiple regression model with $k$ predictor variables,

$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{NID}(0, \sigma^2),$$

is summarized in matrix notation as

$$y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathrm{NID}(0, \sigma^2),$$

where $y$ is an $(n \times 1)$ vector for the response variable, $X$ is an $(n \times (k+1))$ matrix of $k$ predictor variables and a column of 1s for the intercept, $\beta$ is a $((k+1) \times 1)$ vector of unknown population parameters, and $\varepsilon$ is an $(n \times 1)$ vector of population errors, or departures of data points from the prediction equation. The matrices are as follows:

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{kn} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$

If we were to regress the $y$ vector against the $X$s for the whole population, we would obtain the population prediction equation

$$\hat{y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki},$$

where $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are all population parameters. The parameter $\beta_1$ in the context of this equation is shorthand notation for the relationship between $y$ and $X_1$, controlling for $X_2, \ldots, X_k$. In brief:

$$\beta_1 = \beta_{yX_1 \mid X_2, X_3, \ldots, X_k} = \beta_{y1 \mid 2, 3, \ldots, k}.$$

The vector of errors, $\varepsilon$, is the difference between the observed value of $y_i$ and the value of $y$ predicted by the equation ($\hat{y}_i$); that is, $\varepsilon_i = y_i - \hat{y}_i$.

One characteristic of population parameters is that the values $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are constant for a specific population (i.e., a closed block of people, communities or organizations at a fixed point in time). This is not true of samples. If we select a sample from the population, the resulting regression prediction equation is of the form

$$\hat{y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki},$$

or equivalently, in matrix notation, $\hat{y} = Xb$, where $\hat{y}$ is an $(n \times 1)$ vector of predicted values of $y$ and the vector of errors has elements $e_i = (y_i - \hat{y}_i)$. But this time, because we have a sample from the population rather than the whole population, the elements of the vector $b^T = [b_0 \; b_1 \; b_2 \; \cdots \; b_k]$ are statistics used to estimate the parameters $\beta^T = [\beta_0, \beta_1, \beta_2, \ldots, \beta_k]$ ....
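The distinction between fixed population parameters and sample statistics can be illustrated with a small simulation. The following is a minimal sketch, not part of the original notes: it assumes NumPy, a hypothetical parameter vector `beta` with k = 2 predictors, normally distributed predictor values, and least squares as the estimation method. The helper names `draw_sample` and `fit_ols` are invented for illustration.

```python
import numpy as np


def draw_sample(rng, n, beta, sigma):
    """Simulate y = X beta + eps with independent normal errors, eps ~ N(0, sigma^2)."""
    k = len(beta) - 1
    predictors = rng.normal(size=(n, k))           # assumed predictor values
    X = np.column_stack([np.ones(n), predictors])  # n x (k + 1), column of 1s for the intercept
    eps = rng.normal(0.0, sigma, size=n)           # n x 1 vector of population errors
    y = X @ beta + eps
    return X, y


def fit_ols(X, y):
    """Return the least-squares estimates b, the predictions X b, and the errors e."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    e = y - y_hat
    return b, y_hat, e


rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0, -0.5])  # hypothetical population parameters (intercept first)

# Two independent samples of n = 50 drawn from the same population
X1, y1 = draw_sample(rng, 50, beta, sigma=1.0)
X2, y2 = draw_sample(rng, 50, beta, sigma=1.0)

b1, y_hat1, e1 = fit_ols(X1, y1)
b2, y_hat2, e2 = fit_ols(X2, y2)

print("population beta:", beta)
print("sample 1 b:     ", b1)
print("sample 2 b:     ", b2)
```

The population vector `beta` is the same for both draws, while `b1` and `b2` differ because each is computed from a different sample. In both fits the observed response decomposes as the predicted value plus the sample error.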