Statistical Techniques II Page 29
Solving the matrix equations we get B = (X'X)^{-1} X'Y.
This equation is the matrix algebra solution for a simple linear regression. Note that there is no
such thing as matrix “division”. The solution requires multiplying by the inverse matrix.
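As a brief sketch of where this solution comes from (assuming the usual least-squares normal equations, which this excerpt refers to but does not display, and assuming X'X has an inverse), multiplying both sides by the inverse plays the role that division would play in ordinary algebra:

```latex
\begin{aligned}
X'X\,B &= X'Y \\
(X'X)^{-1}X'X\,B &= (X'X)^{-1}X'Y \\
B &= (X'X)^{-1}X'Y
\end{aligned}
```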
As with the algebraic values, if we multiply the B values (B matrix) by the X values we get the
predicted values.
XB = X(X'X)^{-1} X'Y = Yhat vector
What do we need to know about these matrix calculations?
We need to know that the solution to the problem using matrix algebra involves the same
values as for the simple linear regression.
We need to know that (X'X)^{-1} is a key component of this solution.
We need to know that the predicted values require the matrix segment X(X'X)^{-1}X' times the Y
vector (note its MAIN DIAGONAL).
Why? We can get the matrices from SAS, but we want to understand what we have. (X'X)^{-1} is
a key component, not only of the solution for the regression coefficients, but also of the
variance-covariance matrix. The main diagonal of the X(X'X)^{-1}X' matrix is a diagnostic that we
will use (hat diag).
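A minimal sketch of the X(X'X)^{-1}X' matrix and its main diagonal, using a small made-up data set (the five x and y values and the helper name `hat` are illustrative assumptions, not part of the notes). It relies only on the closed-form inverse of a 2×2 matrix:

```python
# Hypothetical toy data (not from the notes): five (x, y) pairs.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Elements of X'X for a model with an intercept: n, sum(x), sum(x^2).
sx = sum(x)
sxx = sum(xi * xi for xi in x)

# Closed-form inverse of the 2x2 matrix [[n, sx], [sx, sxx]].
det = n * sxx - sx * sx
inv = [[sxx / det, -sx / det],
       [-sx / det, n / det]]


def hat(i, j):
    """Element (i, j) of H = X (X'X)^-1 X', where row i of X is [1, x[i]]."""
    return (inv[0][0] + inv[0][1] * x[j]
            + x[i] * (inv[1][0] + inv[1][1] * x[j]))


# Predicted values: Yhat = H Y.
yhat = [sum(hat(i, j) * y[j] for j in range(n)) for i in range(n)]

# Main diagonal of H (the "hat diag" values).
hat_diag = [hat(i, i) for i in range(n)]

print("Yhat     :", [round(v, 3) for v in yhat])
print("hat diag :", [round(v, 3) for v in hat_diag])
print("sum diag :", round(sum(hat_diag), 6))
```

The diagonal values sum to the number of fitted coefficients (2 here, for b0 and b1), and the observations farthest from the mean of x have the largest values.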
But the most important reason for using matrices is that the solutions for simple linear and
multiple regression are the same. Basically, matrix algebra is the ONLY practical way to solve
multiple regressions.
So, what do we get from SAS? If the options XPX and I are placed on the model statement, we can get
the X'X matrix and the (X'X)^{-1} matrix.
For the simple linear regression that we saw for the tree weights and diameters, these options produce
the following output.
Model Crossproducts X'X X'Y Y'Y

X'X           INTERCEP            DBH         WEIGHT
INTERCEP            47          289.2          17359
DBH              289.2        1981.98       142968.3
WEIGHT           17359       142968.3       13537551

X'X Inverse, Parameter Estimates, and SSE

                  INTERCEP             DBH          WEIGHT
INTERCEP      0.2082694963    -0.030389579    -729.3963003
DBH           -0.030389579     0.004938832    178.56371409
WEIGHT        -729.3963003    178.56371409     670190.7322

The first two rows and columns of numbers contain the X'X matrix, which has the values for
ΣXi, ΣXi^2 and n (all sums run over i = 1 to n).

X'X           INTERCEP            DBH
INTERCEP            47          289.2
DBH              289.2        1981.98

The last column has X'Y (the values for ΣYi and ΣXiYi) and the last value is Y'Y (ΣYi^2).

James P. Geaghan  Copyright 2011

Model Crossproducts X'X X'Y Y'Y

X'X             WEIGHT
INTERCEP         17359
DBH           142968.3
WEIGHT        13537551

In the X'X inverse matrix section, the first two rows and columns of numbers contain the (X'X)^{-1}
matrix and the value in the third row and third column is the SSE. The other values are b0 and b1.
                  INTERCEP             DBH          WEIGHT
INTERCEP      0.2082694963    -0.030389579    -729.3963003
DBH           -0.030389579     0.004938832    178.56371409
WEIGHT        -729.3963003    178.56371409     670190.7322

You will be responsible only for knowing where the 6 intermediate values are for simple linear
regression, and where to find the (X'X)^{-1} matrix.
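The link between the two SAS tables can be checked by hand. The sketch below takes the six intermediate values printed in the Model Crossproducts table and recomputes (X'X)^{-1}, b0, b1 and the SSE from them; the variable names are illustrative assumptions. The computation gives negative off-diagonal elements in the inverse and a negative intercept.

```python
# The six intermediate values from the "Model Crossproducts" table.
n, sum_x, sum_xx = 47, 289.2, 1981.98   # X'X entries
sum_y, sum_xy = 17359, 142968.3         # X'Y entries
sum_yy = 13537551                       # Y'Y

# Closed-form inverse of the 2x2 X'X matrix.
det = n * sum_xx - sum_x ** 2
inv = [[sum_xx / det, -sum_x / det],
       [-sum_x / det, n / det]]

# B = (X'X)^-1 X'Y
b0 = inv[0][0] * sum_y + inv[0][1] * sum_xy
b1 = inv[1][0] * sum_y + inv[1][1] * sum_xy

# SSE = Y'Y - B'X'Y
sse = sum_yy - (b0 * sum_y + b1 * sum_xy)

print("(X'X)^-1 =", inv)
print("b0 =", b0, " b1 =", b1)
print("SSE =", sse)
```

The results agree, up to rounding, with the values in the X'X Inverse, Parameter Estimates, and SSE table.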
Multiple Regression with matrix algebra
The only difference between simple linear regression and multiple regression is that multiple
regression has several independent variables (Xi variables).
Therefore the X'X matrix will be larger. For a simple linear regression, X'X is 2×2. For a 3 factor
multiple regression (X1, X2, X3 and an intercept) the X'X matrix will be 4×4.

\[
Y = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \\ Y_7 \end{bmatrix}
\qquad
X = \begin{bmatrix}
1 & X_{11} & X_{21} & X_{31} \\
1 & X_{12} & X_{22} & X_{32} \\
1 & X_{13} & X_{23} & X_{33} \\
1 & X_{14} & X_{24} & X_{34} \\
1 & X_{15} & X_{25} & X_{35} \\
1 & X_{16} & X_{26} & X_{36} \\
1 & X_{17} & X_{27} & X_{37}
\end{bmatrix}
\]

The values contained in the matrix include all of the sums, sums of squares and crossproducts for all
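of the Xi variables (plus n in the upper left hand corner).

\[
X'X = \begin{bmatrix}
n & \sum_{i=1}^{n} X_{1i} & \sum_{i=1}^{n} X_{2i} & \sum_{i=1}^{n} X_{3i} \\
\sum_{i=1}^{n} X_{1i} & \sum_{i=1}^{n} X_{1i}^2 & \sum_{i=1}^{n} X_{1i}X_{2i} & \sum_{i=1}^{n} X_{1i}X_{3i} \\
\sum_{i=1}^{n} X_{2i} & \sum_{i=1}^{n} X_{1i}X_{2i} & \sum_{i=1}^{n} X_{2i}^2 & \sum_{i=1}^{n} X_{2i}X_{3i} \\
\sum_{i=1}^{n} X_{3i} & \sum_{i=1}^{n} X_{1i}X_{3i} & \sum_{i=1}^{n} X_{2i}X_{3i} & \sum_{i=1}^{n} X_{3i}^2
\end{bmatrix}
\qquad
X'Y = \begin{bmatrix}
\sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_{1i}Y_i \\ \sum_{i=1}^{n} X_{2i}Y_i \\ \sum_{i=1}^{n} X_{3i}Y_i
\end{bmatrix}
\qquad
Y'Y = \sum_{i=1}^{n} Y_i^2
\]

Basically everything works the same in multiple regression as in simple linear regression if we use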
matrix algebra. For the multiple regression the solution is still given by B = (X'X)^{-1} X'Y.
The predicted values are still given by XB = X(X'X)^{-1} X'Y = Yhat vector.
The residuals are given by the vector e = Y - X(X'X)^{-1} X'Y, so each ei = Yi - Yhat_i.
The main diagonal of the X(X'X)^{-1}X' matrix is a diagnostic that we will use (hat diag).

One last piece of the puzzle. We will need some estimates of variances and covariances. As usual
these will all involve the MSE (Mean Square Error).
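The multiple regression formulas above, together with the MSE, can be sketched in a few lines. Everything here is an assumption made for illustration: the data are made up (7 observations and 3 X variables, matching the layout of the matrices shown earlier), and `transpose`, `matmul` and `inverse` are small helper functions written for this example. In practice SAS or a linear algebra library would do this work.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the matrix with the identity: [A | I].
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular; X'X has no inverse")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


# Hypothetical data (not from the notes): 7 observations, 3 X variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0]
x3 = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
y = [4.1, 5.4, 9.2, 7.3, 11.2, 15.4, 15.9]

# Design matrix with a leading column of 1s for the intercept.
X = [[1.0, a, b, c] for a, b, c in zip(x1, x2, x3)]
Y = [[v] for v in y]

Xt = transpose(X)
XtX = matmul(Xt, X)                 # 4x4 matrix of sums and crossproducts
XtX_inv = inverse(XtX)
B = matmul(XtX_inv, matmul(Xt, Y))  # B = (X'X)^-1 X'Y
Yhat = matmul(X, B)                 # predicted values XB
resid = [obs[0] - fit[0] for obs, fit in zip(Y, Yhat)]

n, p = len(X), len(X[0])
sse = sum(e * e for e in resid)
mse = sse / (n - p)                 # error df = n - (number of coefficients)

print("X'X =", XtX)
print("B   =", [round(b[0], 4) for b in B])
print("MSE =", round(mse, 4))
```

The same code handles simple linear regression if X has only two columns; only the size of X'X changes.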
We need all of the variances and covariances for the regression coefficients. These variances
and covariances are obtained by multiplying the (X'X)^{-1} matrix by the MSE. The resulting
matrix contains all of the variances and covariances for the regression coefficients. The
variances of the regression coefficients are on the main diagonal. The square roots of these values
give the standard errors used for confidence intervals and testing of the bi values.
The (X'X)^{-1} matrix can be obtained from SAS. When multiplied by the MSE value this gives
the variance-covariance matrix. This can also be obtained from SAS.
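As a rough illustration with the tree data, the sketch below multiplies (X'X)^{-1} by the MSE and takes square roots of the main diagonal. It assumes the error degrees of freedom are n - 2 (two estimated coefficients), and the DBH value `x0 = 6.0` is a hypothetical choice used only to show the covariance term at work. The off-diagonal element of (X'X)^{-1} is negative, as inverting the printed X'X matrix confirms.

```python
import math

# Values taken from the SAS output for the tree data.
n = 47
xtx_inv = [[0.2082694963, -0.030389579],
           [-0.030389579, 0.004938832]]
sse = 670190.7322

# MSE = SSE / (n - p), with p = 2 estimated coefficients (b0 and b1).
mse = sse / (n - 2)

# Variance-covariance matrix of the coefficients: MSE * (X'X)^-1.
cov_b = [[mse * v for v in row] for row in xtx_inv]

# Standard errors are the square roots of the main diagonal.
se_b0 = math.sqrt(cov_b[0][0])
se_b1 = math.sqrt(cov_b[1][1])

# Variance of the estimated mean at a hypothetical DBH value x0.
# The covariance term is included; the coefficients are not independent.
x0 = 6.0
var_mean = cov_b[0][0] + 2 * x0 * cov_b[0][1] + x0 ** 2 * cov_b[1][1]

print("MSE          =", round(mse, 3))
print("se(b0)       =", round(se_b0, 4))
print("se(b1)       =", round(se_b1, 4))
print("cov(b0, b1)  =", round(cov_b[0][1], 3))
print("var(mean|x0) =", round(var_mean, 3))
```

Dropping the covariance term would overstate the variance of the estimated mean here, because the covariance between b0 and b1 is negative.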
A note on the assumption of independence.
We assume that the ei values are independent of each other.
We assume that the ei are independent of the Yhat values (b0 + b1 Xi).
But WE DO NOT ASSUME THAT THE VARIOUS REGRESSION COEFFICIENTS ARE
INDEPENDENT OF EACH OTHER.
Calculations do not assume zero covariance; they employ the variance-covariance matrix.

Summary (see Appendix 5 for sweepout example)
Matrix solutions to regression begin with the matrices X'X, X'Y and Y'Y. The values in these
matrices are the sums, sums of squares and crossproducts, just as with simple linear regression.
The normal equations are solved just as for SLR. The solution is B = (X'X)^{-1} X'Y.
Using matrix algebra we can obtain predicted values and residuals, diagnostics, variances and
covariances and all of the other values needed for testing and interpretation of the multiple
regression.