September 9, 2010

The assumptions of linear regression

Introduction

Multiple regression is used to estimate relationships between a response (dependent) variable and one or more explanatory (independent) variables. The multiple regression model with $k$ predictor variables,

$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{NID}(0, \sigma^2),$$

is summarized in matrix notation as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim \mathrm{NID}(\mathbf{0}, \sigma^2),$$

where $\mathbf{y}$ is an $(n \times 1)$ vector for the response variable, $\mathbf{X}$ is an $(n \times (k+1))$ matrix containing a column of 1s for the intercept and the $k$ predictor variables, $\boldsymbol{\beta}$ is a $((k+1) \times 1)$ vector of unknown population parameters, and $\boldsymbol{\varepsilon}$ is an $(n \times 1)$ vector of population errors, that is, departures of the data points from the prediction equation. The matrices are as follows:

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{kn} \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}.$$

If we were to regress the $\mathbf{y}$ vector on the $X$s for the whole population, we would obtain the population prediction equation

$$\hat{y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki},$$

where $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are all population parameters. The parameter $\beta_1$ in this equation is shorthand notation for the relationship between $y$ and $X_1$, controlling for $X_2, \ldots, X_k$. In brief:

$$\beta_1 = \beta_{yX_1 \cdot X_2 X_3 \ldots X_k} = \beta_{y1 \cdot 23 \ldots k}.$$

The vector of errors, $\boldsymbol{\varepsilon}$, is the difference between the observed value $y_i$ and the value of $y$ predicted by the equation, $\hat{y}_i$; that is, $\varepsilon_i = y_i - \hat{y}_i$.

One characteristic of population parameters is that the values $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are constant for a specific population (i.e., a closed block of people, communities, or organizations at a fixed point in time). This is not true of samples. If we select a sample from the population, the resulting regression prediction equation has the form

$$\hat{y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki},$$

or equivalently, in matrix notation, $\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}$, where $\hat{\mathbf{y}}$ is an $(n \times 1)$ vector of predicted values of $y$ and the vector of errors (residuals) has elements $e_i = y_i - \hat{y}_i$. This time, however, because we have a sample from the population rather than the whole population, the elements of the vector $\mathbf{b}^{T} = [b_0 \; b_1 \; b_2 \; \cdots \; b_k]$ are statistics used to estimate the parameters $\boldsymbol{\beta}^{T} = [\beta_0, \beta_1, \beta_2, \ldots, \beta_k]$. ....
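The distinction between fixed population parameters $\boldsymbol{\beta}$ and sample statistics $\mathbf{b}$ can be illustrated numerically. The following is a minimal Python/NumPy sketch, not part of the original notes. It assumes hypothetical population values $\boldsymbol{\beta} = [1.0, 2.0, -0.5]^T$ with $\sigma = 1$ and normally distributed predictors, and it uses ordinary least squares (`numpy.linalg.lstsq`) as the estimator of $\mathbf{b}$, which is an assumption here since this excerpt does not say how $\mathbf{b}$ is computed. It builds $\mathbf{X}$ with a leading column of 1s, forms $\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}$ and $e_i = y_i - \hat{y}_i$, and then draws a second sample from the same population. The helper names `design_matrix`, `fit_ols`, and `draw_sample` are illustrative only.

```python
import numpy as np

# Hypothetical population (not from the notes): beta = [beta_0, beta_1, beta_2], sigma.
BETA = np.array([1.0, 2.0, -0.5])
SIGMA = 1.0
rng = np.random.default_rng(0)


def design_matrix(predictors):
    """Prepend a column of 1s (for the intercept) to an (n x k) array of
    predictors, giving the (n x (k + 1)) matrix X."""
    predictors = np.asarray(predictors, dtype=float)
    if predictors.ndim == 1:
        predictors = predictors[:, np.newaxis]
    return np.column_stack([np.ones(predictors.shape[0]), predictors])


def fit_ols(X, y):
    """Ordinary least-squares estimate b, fitted values y_hat = X b,
    and residuals e = y - y_hat."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    return b, y_hat, y - y_hat


def draw_sample(n):
    """Draw one sample of size n from y = X beta + eps, eps ~ NID(0, sigma^2)."""
    X = design_matrix(rng.normal(size=(n, BETA.size - 1)))
    y = X @ BETA + rng.normal(scale=SIGMA, size=n)
    return X, y


# Two independent samples from the same (fixed) population.
X, y = draw_sample(200)
b, y_hat, e = fit_ols(X, y)

X_second, y_second = draw_sample(200)
b_second, _, _ = fit_ols(X_second, y_second)

print("b (sample 1):", np.round(b, 3))
print("b (sample 2):", np.round(b_second, 3))
print("sum of residuals:", round(float(e.sum()), 10))
```

Because $\mathbf{X}$ contains the column of 1s, the least-squares residuals sum to (numerically) zero and are orthogonal to every column of $\mathbf{X}$. Both printed $\mathbf{b}$ vectors should lie near the hypothetical $\boldsymbol{\beta}$ without equaling it or each other, which is the sense in which $\mathbf{b}$ varies across samples while $\boldsymbol{\beta}$ stays constant.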
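What "the relationship between $y$ and $X_1$, controlling for $X_2, \ldots, X_k$" means for a sample coefficient can be made concrete with the Frisch-Waugh-Lovell theorem, a standard least-squares identity that these notes do not themselves invoke: the multiple-regression coefficient $b_1$ equals the slope from regressing the part of $y$ not explained by the intercept and the other predictors on the part of $X_1$ not explained by them. The sketch below assumes a hypothetical two-predictor population, $y = 1 + 2X_1 - 0.5X_2 + \varepsilon$ with $X_1$ correlated with $X_2$, and for comparison also fits $y$ on $X_1$ alone. The helper `residualize` is a hypothetical name introduced for illustration.

```python
import numpy as np


def residualize(v, Z):
    """Residuals from the least-squares regression of v on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef


# Hypothetical population: X1 is correlated with X2.
rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# Full multiple regression of y on [1, X1, X2].
X = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Remove the intercept and X2 from both y and X1, then regress residual on residual.
Z = np.column_stack([np.ones(n), x2])
y_res = residualize(y, Z)
x1_res = residualize(x1, Z)
b1_partial = (x1_res @ y_res) / (x1_res @ x1_res)

# Simple regression of y on X1 alone, ignoring X2.
X_simple = np.column_stack([np.ones(n), x1])
b_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

print("b1, multiple regression:  ", round(float(b_full[1]), 4))
print("b1, residual on residual: ", round(float(b1_partial), 4))
print("slope of y on X1 alone:   ", round(float(b_simple[1]), 4))
```

The first two printed values agree to floating-point precision. The simple-regression slope differs because, with $X_2$ omitted, it absorbs part of the effect of the correlated $X_2$; that difference is what "controlling for" removes.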