EXST7015 Fall 2011 Lecture 08 - Statistical Techniques II
James P. Geaghan - Copyright 2011

Solving the matrix equations we get

    B = (X'X)⁻¹X'Y

This equation is the matrix algebra solution for a simple linear regression. Note that there is no such thing as matrix "division"; the solution requires multiplying by the inverse matrix. As with the algebraic values, if we multiply the B values (the B matrix) by the X values, we get the predicted values:

    XB = X(X'X)⁻¹X'Y = Yhat vector

What do we need to know about these matrix calculations? We need to know that the solution using matrix algebra involves the same values as for the simple linear regression. We need to know that (X'X)⁻¹ is a key component of this solution. We need to know that the predicted values require the matrix segment X(X'X)⁻¹X' times the Y vector (its MAIN DIAGONAL is the "hat diag" used below).

Why? We can get the matrices from SAS, but we want to understand what we have. (X'X)⁻¹ is a key component not only of the solution for the regression coefficients, but also of the variance-covariance matrix. The main diagonal of the X(X'X)⁻¹X' matrix is a diagnostic that we will use (hat diag). But the most important reason for using matrices is that the solution for simple linear and multiple regression is the same. Basically, matrix algebra is the ONLY way to solve multiple regressions.

So, what do we get from SAS? If the options XPX and I are placed on the MODEL statement, we get the X'X matrix and the (X'X)⁻¹ matrix. For the simple linear regression that we saw for the tree weights and diameters, these options produce the following output.

Model Cross-products X'X X'Y Y'Y

               INTERCEP        DBH             WEIGHT
    INTERCEP   47              289.2           17359
    DBH        289.2           1981.98         142968.3
    WEIGHT     17359           142968.3        13537551

X'X Inverse, Parameter Estimates, and SSE

               INTERCEP        DBH             WEIGHT
    INTERCEP   0.2082694963    -0.030389579    -729.3963003
    DBH        -0.030389579    0.004938832     178.56371409
    WEIGHT     -729.3963003    178.56371409    670190.7322

In the cross-products table, the first two rows and columns of numbers contain the X'X matrix, which holds the values $n$, $\sum_{i=1}^{n} X_i$, and $\sum_{i=1}^{n} X_i^2$. The last column holds X'Y (the values $\sum_{i=1}^{n} Y_i$ and $\sum_{i=1}^{n} X_i Y_i$), and the last value is Y'Y ($\sum_{i=1}^{n} Y_i^2$).

In the X'X inverse table, the first two rows and columns of numbers contain the (X'X)⁻¹ matrix, and the value in the third row and third column is the SSE. The other values are b0 and b1.

You will be responsible only for knowing where the 6 intermediate values are for simple linear regression, and where to find the (X'X)⁻¹ matrix.
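The six intermediate values in the printout are enough to reproduce everything else in it. Below is a minimal sketch in Python with numpy (an illustrative stand-in for the SAS computation; the variable names are mine) that rebuilds X'X and X'Y from those sums and solves for B and the SSE:

```python
import numpy as np

# The six intermediate values from the SAS printout above
# (tree weight vs. DBH, n = 47 observations).
n, sum_x, sum_x2 = 47, 289.2, 1981.98              # n, sum(X), sum(X^2)
sum_y, sum_xy, sum_y2 = 17359, 142968.3, 13537551  # sum(Y), sum(XY), sum(Y^2)

# Assemble X'X and X'Y from the sums and cross-products.
XtX = np.array([[n,     sum_x],
                [sum_x, sum_x2]])
XtY = np.array([sum_y, sum_xy])

# B = (X'X)^-1 X'Y -- no matrix "division"; multiply by the inverse.
XtX_inv = np.linalg.inv(XtX)
B = XtX_inv @ XtY

# SSE = Y'Y - B'X'Y, the value in the corner of the inverse printout.
SSE = sum_y2 - B @ XtY

print(XtX_inv)  # ~ [[0.20827, -0.03039], [-0.03039, 0.00494]]
print(B)        # ~ [-729.3963, 178.5637]  (b0, b1)
print(SSE)      # ~ 670190.73
```

The printed values match the SAS output above: the (X'X)⁻¹ block, b0 = -729.3963, b1 = 178.5637, and SSE = 670190.73.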
Multiple Regression with matrix algebra

The only difference between simple linear regression and multiple regression is that multiple regression has several independent variables (Xi variables); therefore the X'X matrix will be larger. For a simple linear regression, X'X is 2×2. For a 3-factor multiple regression (X1, X2, X3 and an intercept), the X'X matrix will be 4×4. For example, with n = 7 observations,

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \\ Y_7 \end{bmatrix}, \qquad
X = \begin{bmatrix}
1 & X_{11} & X_{21} & X_{31} \\
1 & X_{12} & X_{22} & X_{32} \\
1 & X_{13} & X_{23} & X_{33} \\
1 & X_{14} & X_{24} & X_{34} \\
1 & X_{15} & X_{25} & X_{35} \\
1 & X_{16} & X_{26} & X_{36} \\
1 & X_{17} & X_{27} & X_{37}
\end{bmatrix}$$

The values contained in the X'X matrix include all of the sums, sums of squares and cross-products for all of the Xi variables (plus n in the upper left hand corner). With all sums running over $i = 1, \ldots, n$:

$$X'X = \begin{bmatrix}
n & \sum X_{1i} & \sum X_{2i} & \sum X_{3i} \\
\sum X_{1i} & \sum X_{1i}^2 & \sum X_{1i}X_{2i} & \sum X_{1i}X_{3i} \\
\sum X_{2i} & \sum X_{1i}X_{2i} & \sum X_{2i}^2 & \sum X_{2i}X_{3i} \\
\sum X_{3i} & \sum X_{1i}X_{3i} & \sum X_{2i}X_{3i} & \sum X_{3i}^2
\end{bmatrix}, \qquad
X'Y = \begin{bmatrix} \sum Y_i \\ \sum X_{1i}Y_i \\ \sum X_{2i}Y_i \\ \sum X_{3i}Y_i \end{bmatrix}, \qquad
Y'Y = \sum Y_i^2$$

Basically, everything works the same in multiple regression as in simple linear regression if we use matrix algebra. For the multiple regression the solution is still given by B = (X'X)⁻¹X'Y. The predicted values are still given by XB = X(X'X)⁻¹X'Y = Yhat vector. The residuals are given by the vector e = Y − Yhat = Y − X(X'X)⁻¹X'Y. The main diagonal of the X(X'X)⁻¹X' matrix is a diagnostic that we will use (hat diag).

One last piece of the puzzle: we will need some estimates of variances and covariances. As usual, these will all involve the MSE (Mean Square Error). We need all of the variances and covariances for the regression coefficients. These are obtained by multiplying the (X'X)⁻¹ matrix by the MSE. The resulting matrix contains all of the variances and covariances for the regression coefficients. The variances of the regression coefficients are on the main diagonal, and the square root of these values gives the standard errors used for confidence intervals and tests of the bi values. The (X'X)⁻¹ matrix can be obtained from SAS; when multiplied by the MSE it gives the variance-covariance matrix, which can also be obtained from SAS.

A note on the assumption of independence. We assume that the ei values are independent of each other. We assume that the ei are independent of the Yhat values (b0 + b1 Xi). But WE DO NOT ASSUME THAT THE VARIOUS REGRESSION COEFFICIENTS ARE INDEPENDENT OF EACH OTHER. The calculations do not assume zero covariance; they employ the variance-covariance matrix.

Summary (see Appendix 5 for sweepout example)

Matrix solutions to regression begin with the matrices X'X, X'Y and Y'Y. The values in these matrices are the sums, sums of squares and cross-products, just as with simple linear regression. The normal equations are solved just as for SLR; the solution is B = (X'X)⁻¹X'Y. Using matrix algebra we can obtain predicted values and residuals, diagnostics, variances and covariances, and all of the other values needed for testing and interpretation of the multiple regression. A worked sketch follows below.
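To make the summary concrete, here is a minimal sketch in Python with numpy (again a stand-in for SAS, and with made-up data, since this section gives no multiple-regression dataset) computing the solution, the hat diagonal, and the variance-covariance matrix MSE·(X'X)⁻¹:

```python
import numpy as np

# Hypothetical data: 7 observations, three X variables plus an
# intercept, so X'X is 4x4 as described above. Seed is arbitrary.
rng = np.random.default_rng(2011)
X = np.column_stack([np.ones(7), rng.normal(size=(7, 3))])  # columns: 1, X1, X2, X3
Y = rng.normal(size=7)

XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T @ Y          # B = (X'X)^-1 X'Y, exactly as in SLR
Yhat = X @ B                   # XB = X(X'X)^-1 X'Y
e = Y - Yhat                   # residual vector e = Y - Yhat

# Hat diagonal: main diagonal of X(X'X)^-1 X' (the "hat diag" diagnostic).
hat_diag = np.diag(X @ XtX_inv @ X.T)

# Variance-covariance matrix of the coefficients: MSE * (X'X)^-1.
MSE = (e @ e) / (len(Y) - X.shape[1])   # SSE / (n - 4 parameters)
var_cov = MSE * XtX_inv
se_b = np.sqrt(np.diag(var_cov))        # standard errors of b0..b3
```

Note that var_cov generally has nonzero off-diagonal entries: the coefficient estimates covary, which is exactly why the calculations employ the full variance-covariance matrix rather than assuming the coefficients are independent.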