
Handout 3 Assumptions of regression



September 9, 2010

The assumptions of linear regression

Introduction

Multiple regression is used to estimate relationships between a response (dependent) variable and one or more explanatory (independent) variables. The multiple regression model with $k$ predictor variables,

$$y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i, \qquad \varepsilon_i \sim \mathrm{NID}(0, \sigma^2),$$

is summarized in matrix notation as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim \mathrm{NID}(0, \sigma^2),$$

where NID means "normally and independently distributed", $\mathbf{y}$ is an $(n \times 1)$ vector for the response variable, $\mathbf{X}$ is an $(n \times (k+1))$ matrix of $k$ predictor variables and a column of 1's for the intercept, $\boldsymbol{\beta}$ is a $((k+1) \times 1)$ vector of unknown population parameters, and $\boldsymbol{\varepsilon}$ is an $(n \times 1)$ vector of population errors, or departures of data points from the prediction equation. The matrices are as follows:

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & X_{11} & X_{21} & \cdots & X_{k1} \\ 1 & X_{12} & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1n} & X_{2n} & \cdots & X_{kn} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$

If we were to regress the $\mathbf{y}$ vector on the $X$'s for the whole population, we would obtain the population prediction equation

$$\hat{y}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki},$$

where $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are all population parameters. In the context of this equation, the parameter $\beta_1$ is shorthand notation for the relationship between $y$ and $X_1$, controlling for $X_2, \dots, X_k$. In brief:

$$\beta_1 = \beta_{yX_1 \mid X_2, X_3, \dots, X_k} = \beta_{y1 \mid 2, 3, \dots, k}.$$

The vector of errors, $\boldsymbol{\varepsilon}$, contains the differences between each observed value $y_i$ and the value of $y$ predicted by the equation, $\hat{y}_i$; that is, $\varepsilon_i = y_i - \hat{y}_i$.

One characteristic of population parameters is that the values $\beta_0, \beta_1, \beta_2, \dots, \beta_k$ are constant for a specific population (i.e., a closed block of people, communities or organizations at a fixed point in time). This is not true of samples. If we select a sample from the population, the resulting regression prediction equation is of the form

$$\hat{y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki},$$
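The model setup and the "controlling for" reading of $\beta_1$ can be checked numerically. The sketch below assumes NumPy and uses made-up values for $n$, $k$, $\boldsymbol{\beta}$ and $\sigma$ (none of them come from the handout). It builds $\mathbf{X}$ with a leading column of 1's, draws each $\varepsilon_i$ independently from $N(0, \sigma^2)$, and forms $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$; 1-D arrays stand in for the $(n \times 1)$ column vectors. It then illustrates the partial-coefficient idea with the Frisch–Waugh–Lovell result, a standard theorem that the handout does not itself mention: the multiple-regression estimate $b_1$ equals the slope obtained after removing the intercept and $X_2, \dots, X_k$ from both $y$ and $X_1$.

```python
import numpy as np


def ols(design, target):
    """Least-squares coefficients for target regressed on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


rng = np.random.default_rng(seed=1)

# Hypothetical settings for illustration only: n observations, k = 3 predictors.
n, k = 500, 3
beta = np.array([2.0, 0.5, -1.0, 0.25])  # beta_0, beta_1, ..., beta_k
sigma = 1.5                              # error standard deviation

# Predictors X_1 ... X_k, with X_1 made to correlate with X_2 so that the
# simple slope of y on X_1 alone differs from the partial coefficient.
z = rng.normal(size=(n, k))
predictors = z.copy()
predictors[:, 0] += 0.8 * z[:, 1]

# Design matrix X: a column of 1's for the intercept, then X_1 ... X_k.
X = np.column_stack([np.ones(n), predictors])

# Errors drawn independently from N(0, sigma^2), and y = X beta + eps.
eps = rng.normal(loc=0.0, scale=sigma, size=n)
y = X @ beta + eps

# Sample statistics b_0 ... b_k from this one sample.
b = ols(X, y)

# Partial out the intercept and X_2 ... X_k from both y and X_1, then
# regress the y residuals on the X_1 residuals (no intercept is needed,
# because both residual vectors already have mean zero).
others = np.delete(X, 1, axis=1)  # drop the X_1 column
r_y = y - others @ ols(others, y)
r_x1 = X[:, 1] - others @ ols(others, X[:, 1])
b1_partial = (r_x1 @ r_y) / (r_x1 @ r_x1)

print(X.shape, y.shape)              # (500, 4) (500,)
print(np.isclose(b1_partial, b[1]))  # True
```

The agreement between `b1_partial` and `b[1]` is exact up to floating-point rounding, which is one concrete way to read $\beta_{y1 \mid 2, 3, \dots, k}$ as "the relationship between $y$ and $X_1$ once $X_2, \dots, X_k$ have been held constant".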

or equivalently, in matrix notation, $\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}$, where $\hat{\mathbf{y}}$ is an $(n \times 1)$ vector of predicted values of $y$, and the vector of errors has elements $e_i = y_i - \hat{y}_i$. But this time, because we have a sample from the population rather than the whole population, the elements of the vector $\mathbf{b}^T = [b_0, b_1, b_2, \dots, b_k]$ are statistics used to estimate the parameters $\boldsymbol{\beta}^T = [\beta_0, \beta_1, \beta_2, \dots, \beta_k]$. Similarly, $e_i = y_i - \hat{y}_i$ defines a vector of errors based on the sample statistics $b_0, b_1, b_2, \dots, b_k$ rather than on the population parameters $\beta_0, \beta_1, \beta_2, \dots, \beta_k$.

The relationship between the statistics ($b_0, b_1, b_2, \dots, b_k$) and the parameters ($\beta_0, \beta_1, \beta_2, \dots, \beta_k$) is important. If we returned to the population and selected a second sample, we would get a second, different prediction equation

$$\hat{y}'_i = b'_0 + b'_1 X_{1i} + b'_2 X_{2i} + \dots + b'_k X_{ki}$$

with a second, different vector of errors, $e'_i = y_i - \hat{y}'_i$. If we drew a 3rd
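The point that each sample yields its own statistics can be simulated. The sketch below assumes NumPy, a hypothetical population with $k = 2$ predictors and made-up values for $\boldsymbol{\beta}$, $\sigma$ and $n$, and it assumes ordinary least squares (via `np.linalg.lstsq`) as the way $\mathbf{b}$ is computed, since this part of the handout does not say how the statistics are obtained. The helper names `draw_sample` and `fit` are illustrative only. Each draw produces a different $\mathbf{b}$, $\hat{\mathbf{y}} = \mathbf{X}\mathbf{b}$ and $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$, while $\boldsymbol{\beta}$ stays fixed.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical population: parameters beta_0, beta_1, beta_2 and error SD sigma.
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.0
n = 50  # observations per sample


def draw_sample(rng, n):
    """Draw one sample (X, y) from y = X beta + eps, eps ~ NID(0, sigma^2)."""
    predictors = rng.normal(size=(n, len(beta) - 1))
    X = np.column_stack([np.ones(n), predictors])  # column of 1's for the intercept
    y = X @ beta + rng.normal(scale=sigma, size=n)
    return X, y


def fit(X, y):
    """Return the OLS statistics b, the predictions y_hat = X b, and the errors e = y - y_hat."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    return b, y_hat, y - y_hat


# First sample: statistics b and errors e.
X_s1, y_s1 = draw_sample(rng, n)
b, y_hat, e = fit(X_s1, y_s1)

# Second sample from the same population: statistics b' and errors e'.
X_s2, y_s2 = draw_sample(rng, n)
b_prime, y_hat_prime, e_prime = fit(X_s2, y_s2)

# Many more samples: each gives its own b, and the b's scatter around beta.
b_draws = np.array([fit(*draw_sample(rng, n))[0] for _ in range(2000)])

print("beta:          ", beta)
print("b (sample 1):  ", np.round(b, 3))
print("b' (sample 2): ", np.round(b_prime, 3))
print("mean b (2000): ", np.round(b_draws.mean(axis=0), 3))
```

`np.linalg.lstsq` is used rather than forming $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly because it is the numerically safer route to the same least-squares solution. Because the model includes an intercept, each sample's errors $e_i$ sum to zero, and the sample $\mathbf{b}$ vectors average out close to the fixed $\boldsymbol{\beta}$ over many draws.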