EXST7015 Fall2011 Appendices (4, 5)

Statistical Techniques II — Matrix Algebra Handout (Part 1)
Appendix 4 Supplemental, Page 165

A. MATRIX STRUCTURE AND NOTATION

1) A matrix is a rectangular arrangement of numbers, usually denoted by a capital letter:

       [ 1  3 ]          [ 4  2  4 ]
   A = [ 7  9 ]      D = [ 1  6  0 ]
                         [ 3  0  5 ]
                         [ 2  3  0 ]

2) The dimensions of a matrix are given by the number of rows and columns in the matrix (i.e. the dimensions are r by c). For the matrices above, A is 2 by 2 and D is 4 by 3.

3) The individual elements of a matrix are referred to by specifying the row and column in which they occur. Lower case letters are used to represent individual elements, and should match the upper case letter used to denote the matrix. For example, individual elements from matrices A and D above can be referred to as

   a11 = 1     a21 = 7     d12 = 2     d22 = 6

B. TYPES OF MATRICES

1) Square matrix - the number of rows and columns are equal. Matrix A above is a square matrix (2 by 2); matrix D is not (4 by 3). A symmetric matrix is an important variation of the square matrix. In a symmetric matrix, the value in position "ij" equals the value in position "ji" (where i ≠ j). For example, if c31 = 5 then c13 is also 5.

2) Scalar - a single number can be thought of as a 1 by 1 matrix and is called a scalar.

3) Vector - a single column or single row of numbers is called a vector. The dimensions of a row vector are (1 by c), where "c" is the number of columns, and the dimensions of a column vector are (r by 1), where "r" is the number of rows.

4) Identity matrix - this special square matrix consists of all ones on the main diagonal, or principal diagonal, and zeros in all the off-diagonal positions. The following are examples of identity matrices:

       [ 1  0  0 ]          [ 1  0  0  0 ]
   E = [ 0  1  0 ]      F = [ 0  1  0  0 ]
       [ 0  0  1 ]          [ 0  0  1  0 ]
                            [ 0  0  0  1 ]

   The diagonal matrix is a generalization of the identity matrix. A diagonal matrix can have any value on the main diagonal, but also has zeros in the off-diagonal positions.

James P. Geaghan - Copyright 2011
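The structure and notation above can be mirrored directly in code. A minimal sketch in plain Python (matrices as nested lists of rows; the helper names `dims` and `identity` are ours, not part of the handout):

```python
# A matrix as a list of rows; dimensions are (rows, columns).
A = [[1, 3],
     [7, 9]]          # 2 by 2 (square)

D = [[4, 2, 4],
     [1, 6, 0],
     [3, 0, 5],
     [2, 3, 0]]       # 4 by 3 (not square)

def dims(M):
    """Return (r, c): the number of rows and columns of M."""
    return (len(M), len(M[0]))

# Element a_ij is row i, column j (1-based in the handout, 0-based here).
a11 = A[0][0]   # 1
d22 = D[1][1]   # 6

def identity(n):
    """n by n identity matrix: ones on the main diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]
```

The only translation needed from the handout's notation is the index shift: a11 in the text is `A[0][0]` in 0-based code.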
C. MATRIX TRANSPOSE

The transpose of a matrix is a new matrix such that the rows of the original matrix become the columns of the transpose. The transpose is denoted with the same letter as the original matrix followed by a prime (e.g. the transpose of X is X').

       [ 4  2  4 ]
   D = [ 1  6  0 ]      D' = [ 4  1  3  2 ]
       [ 3  0  5 ]           [ 2  6  0  3 ]
       [ 2  3  0 ]           [ 4  0  5  0 ]

D. MATRIX ADDITION AND SUBTRACTION

Matrices to be added or subtracted must be of the same dimensions. Each element of the first matrix (aij) is added to (or subtracted from) the corresponding element of the second matrix (bij).

       [ 1 -2 ]        [ 1  2 ]             [ 1+1  -2+2 ]   [ 2  0 ]
   A = [ 3  4 ]    B = [ 1  4 ]    A + B  = [ 3+1   4+4 ] = [ 4  8 ]
       [ 9  0 ]        [-4  4 ]             [ 9-4   0+4 ]   [ 5  4 ]

E. MATRIX MULTIPLICATION

Multiplication by a scalar - in this type of multiplication each element of the matrix is simply multiplied, element by element, by the scalar value.

       [ 1 -2 ]                          [ 1 -2 ]   [  7 -14 ]
   A = [ 3  4 ]    B = [7]    A*B = 7 *  [ 3  4 ] = [ 21  28 ]
       [ 9  0 ]                          [ 9  0 ]   [ 63   0 ]

Element by element multiplication - matrix multiplication is not usually done by matching each ijth element of one matrix with the corresponding ijth element of the second matrix. This is called elementwise multiplication; it is not the normal mode of matrix multiplication and should not be used unless specifically requested.

The standard method of matrix multiplication requires that the number of columns in the first matrix equal the number of rows in the second matrix. If the first matrix is (r1 by c1) and the second is (r2 by c2), then in order to multiply the matrices, c1 must equal r2. The resulting matrix will have the dimensions (r1 by c2). Multiplication is accomplished by summing the cross products of each row of the first matrix with each column of the second matrix.
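The transpose, addition, and scalar-multiplication rules above are each one line of plain Python. A minimal sketch (function names are ours; the D and A values are the handout's examples):

```python
def transpose(M):
    """Rows of M become the columns of M'."""
    return [list(col) for col in zip(*M)]

def add(A, B):
    """Element-by-element sum; A and B must have the same dimensions."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mult(k, M):
    """Multiply every element of M by the scalar k."""
    return [[k * m for m in row] for row in M]

D = [[4, 2, 4], [1, 6, 0], [3, 0, 5], [2, 3, 0]]
Dt = transpose(D)            # 3 by 4: rows of D become columns

A = [[1, -2], [3, 4], [9, 0]]
seven_A = scalar_mult(7, A)  # [[7, -14], [21, 28], [63, 0]]
```

Note that `transpose` applied twice returns the original matrix, which is a quick sanity check on any implementation.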
       [ 1 -2 ]
   A = [ 3  4 ]      X = [ 1 -2 ]
       [ 9  0 ]          [ 3  4 ]

Since A is 3 rows by 2 columns, and X is 2 by 2, the number of columns of the first matrix equals the number of rows of the second matrix, and the matrices may be multiplied.

          [ 1 -2 ]             [ (1*1)+(-2*3)   (1*-2)+(-2*4) ]   [ -5 -10 ]
   A*X =  [ 3  4 ] * [ 1 -2 ] = [ (3*1)+(4*3)    (3*-2)+(4*4) ] = [ 15  10 ]
          [ 9  0 ]   [ 3  4 ]   [ (9*1)+(0*3)    (9*-2)+(0*4) ]   [  9 -18 ]

The new dimensions for the product of A * X are

   (3 by 2) * (2 by 2)  -->  the inner dimensions (2 and 2) must be equal;
                             the outer dimensions (3 by 2) are the dimensions of the product.

Note that though we can multiply A * X, we could not have done the multiplication the other way (i.e. X * A), since the dimensions would not have matched. That is, we could pre-multiply by A, but could not pre-multiply by X.

F. SIMPLE MATRIX INVERSION (2 by 2 matrix only)

Matrices are not "divided", but may be inverted. Instead of "dividing" A by B, one would multiply A by the inverse of B. The inverse of a (2 by 2) matrix is given by

   A = [ a  b ]        A^-1 =      1        [  d  -b ]
       [ c  d ]               -----------   [ -c   a ]
                              a*d - b*c

The scalar value resulting from the calculation "(ad) - (bc)" is called the determinant. The matrix cannot be inverted unless the inverse of the determinant exists (is defined). It will not exist in a case such as the one below, since (1/0) is not defined.

   A = [ 1  4 ]      Determinant of A = (1)(8) - (2)(4) = 0
       [ 2  8 ]

This occurs in regression when two variables are linearly related. An example of the inversion of a 2 by 2 matrix is given below.

   B = [ 2  3 ]    B^-1 =        1         [  4  -3 ]   1 [  4  -3 ]   [  0.8  -0.6 ]
       [ 1  4 ]           (2)(4) - (1)(3)  [ -1   2 ] = - [ -1   2 ] = [ -0.2   0.4 ]
                                                        5

Note that a matrix times its inverse (i.e. B*B^-1) results in an identity matrix. By definition, the inverse of a matrix G is a matrix which, when multiplied by G, produces an identity matrix: G*G^-1 = I.
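Both the row-by-column product and the 2 by 2 inverse formula above can be sketched in a few lines of plain Python (function names are ours; the numeric examples are the handout's):

```python
def matmul(A, B):
    """Standard matrix product: columns of A must equal rows of B.
    Element (i, j) sums the cross products of row i of A and column j of B."""
    assert len(A[0]) == len(B), "dimensions do not conform"
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]]; the determinant ad - bc must be nonzero."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("singular matrix: determinant is 0")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, -2], [3, 4], [9, 0]]
X = [[1, -2], [3, 4]]
AX = matmul(A, X)            # [[-5, -10], [15, 10], [9, -18]]

B = [[2, 3], [1, 4]]
B_inv = inverse_2x2(B)       # [[0.8, -0.6], [-0.2, 0.4]]
```

Passing the singular matrix [[1, 4], [2, 8]] to `inverse_2x2` raises an error, which is exactly the situation the handout describes for linearly related regressors.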
G. SIMPLE LINEAR REGRESSION

Solving a simple linear regression with matrices requires the same values used for an algebraic solution from summation notation formulas. These are

   ΣXi,  n,  ΣYi,  ΣXi²,  ΣYi²,  ΣXiYi     (sums over i = 1 to n)

where n is the size of the sample of data. To obtain these values in matrix form we start with the matrix equivalent of the individual values of X and Y, the raw data matrices.

       [ 1  X1 ]        [ Y1 ]
       [ 1  X2 ]        [ Y2 ]
       [ 1  X3 ]        [ Y3 ]
   X = [ 1  X4 ]    Y = [ Y4 ]
       [ 1  X5 ]        [ Y5 ]
       [ 1  X6 ]        [ Y6 ]
       [ 1  X7 ]        [ Y7 ]

The column of ones is necessary, and represents the intercept. Omitting this column would force the regression through the origin. The next step in the calculations is to obtain the X'X, X'Y and Y'Y matrices. These calculations provide the sums of squares and cross products.

   X'X = [ n     ΣXi  ]      X'Y = [ ΣYi   ]      Y'Y = [ ΣYi² ]
         [ ΣXi   ΣXi² ]            [ ΣXiYi ]

The regression coefficients, b0 and b1, are then given by B = (X'X)^-1 X'Y, where

   (X'X)^-1 =        1          [  ΣXi²  -ΣXi ]
              --------------    [ -ΣXi    n   ]
              n*ΣXi² - (ΣXi)²

and since the determinant of X'X is n*ΣXi² - (ΣXi)² = n*Sxx, where Sxx is the corrected sum of squares of X,

   (X'X)^-1 = [  ΣXi²/(n*Sxx)   -ΣXi/(n*Sxx) ]
              [ -ΣXi/(n*Sxx)       n/(n*Sxx) ]
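The sums above and the 2 by 2 inverse are all that B = (X'X)^-1 X'Y requires. A minimal sketch, using a small hypothetical data set of our own (n = 7; any data would do), with the summation-notation answer computed alongside as a check:

```python
# Hypothetical data (n = 7) to illustrate B = (X'X)^-1 X'Y.
X = [1, 2, 3, 4, 5, 6, 7]
Y = [3, 5, 4, 6, 8, 7, 9]
n = len(X)

sum_x  = sum(X)
sum_y  = sum(Y)
sum_x2 = sum(x * x for x in X)
sum_xy = sum(x * y for x, y in zip(X, Y))

# X'X = [[n, Sum X], [Sum X, Sum X^2]],  X'Y = [Sum Y, Sum XY].
# Applying the 2 by 2 inverse formula and multiplying by X'Y:
det = n * sum_x2 - sum_x ** 2               # determinant = n * Sxx
b0 = (sum_x2 * sum_y - sum_x * sum_xy) / det
b1 = (n * sum_xy - sum_x * sum_y) / det

# The same answer from the corrected-sums (summation notation) formulas:
Sxx = sum_x2 - sum_x ** 2 / n
Sxy = sum_xy - sum_x * sum_y / n
b1_alg = Sxy / Sxx
b0_alg = sum_y / n - b1_alg * (sum_x / n)
```

The two routes agree, which is the handout's point: the matrix formula reproduces the familiar algebraic solution.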
The regression coefficients can then be obtained by

   (X'X)^-1 X'Y = [  ΣXi²/(n*Sxx)   -ΣXi/(n*Sxx) ] [ ΣYi   ]
                  [ -ΣXi/(n*Sxx)       n/(n*Sxx) ] [ ΣXiYi ]

                = [ (ΣXi²ΣYi - ΣXiΣXiYi)/(n*Sxx) ]   [ Ybar - b1*Xbar ]   [ b0 ]
                  [ (nΣXiYi - ΣXiΣYi)/(n*Sxx)    ] = [ Sxy / Sxx      ] = [ b1 ]

where Sxy = ΣXiYi - (ΣXi)(ΣYi)/n is the corrected cross product.

The remaining calculation usually needed to complete the complement of calculations for the simple linear regression is the sum of squared deviations, or error term. The matrix formula is

   SSE = Y'Y - B'X'Y = ΣY² - [b0 b1] [ ΣY  ]  = ΣY² - (b0*ΣY + b1*ΣXY) = UCSSTotal - UCSSReg
                                     [ ΣXY ]

These calculations produce the same algebraic equations for b0, b1, and SSE that are given in most statistics texts. The advantage of using the matrix version of the formulas is that the matrix equations given above will work equally well for multiple regression with two or more independent variables.

The ANOVA table calculated with matrix formulas is

   Source       Uncorrected d.f.   Sum of Squares    Corrected d.f.   Sum of Squares
   Regression   2                  B'X'Y             1                B'X'Y - CF
   Error        n-2                Y'Y - B'X'Y       n-2              Y'Y - B'X'Y
   Total        n                  Y'Y               n-1              Y'Y - CF

where the correction factor is calculated as usual, CF = (ΣY)²/n = n*Ybar².

The value for R² is calculated as

   R² = SSRegression / SSTotal = (B'X'Y - CF) / (Y'Y - CF)

and is often expressed as a percent. Note that this calculation employs corrected sums of squares for both SSRegression and SSTotal.

The Mean Squares (MS) for the SSRegression and SSError are calculated by dividing the SS (corrected sums of squares) by their d.f. (degrees of freedom). The test of the hypothesis [H0: β1 = 0] is then calculated as

   F = MSRegression / MSError = [(B'X'Y - CF)/dfReg] / [(Y'Y - B'X'Y)/dfError]

or
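The SSE, correction factor, R² and F calculations above can be sketched with the same hypothetical 7-point data set used earlier (the coefficients 16/7 and 13/14 are the exact B = (X'X)^-1 X'Y solution for these data):

```python
# Hypothetical 7-point data set; SSE, CF, R^2 and F from matrix quantities.
X = [1, 2, 3, 4, 5, 6, 7]
Y = [3, 5, 4, 6, 8, 7, 9]
n = len(X)
b0, b1 = 16 / 7, 13 / 14            # exact coefficients for these data

sum_y  = sum(Y)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))

BtXtY = b0 * sum_y + b1 * sum_xy    # B'X'Y, uncorrected SS regression
SSE   = sum_y2 - BtXtY              # Y'Y - B'X'Y
CF    = sum_y ** 2 / n              # correction factor (Sum Y)^2 / n
R2    = (BtXtY - CF) / (sum_y2 - CF)
MSReg = (BtXtY - CF) / 1            # 1 d.f. for regression (corrected)
MSE   = SSE / (n - 2)
F     = MSReg / MSE
```

Note that R² uses the corrected sums of squares in both numerator and denominator, exactly as the text specifies.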
   t = (b1 - 0) / Sb1 = √F

where Sb1 is obtained from the VARIANCE-COVARIANCE matrix.

The VARIANCE-COVARIANCE matrix is calculated from the (X'X)^-1 matrix,

   (X'X)^-1 = [ c00  c01 ]
              [ c10  c11 ]

where the cij values are called Gaussian multipliers. The VARIANCE-COVARIANCE matrix is then calculated from this matrix by multiplying by the MSError.

   MSE * (X'X)^-1 = [ MSE*c00  MSE*c01 ]
                    [ MSE*c10  MSE*c11 ]

The individual values then provide the variances and covariances such that

   MSE*c00 = Variance of b0 = VAR(b0)
   MSE*c11 = Variance of b1 = VAR(b1), so Sb1 = √(MSE*c11)
   MSE*c01 = MSE*c10 = Covariance of b0 and b1 = COV(b0,b1)

It is important to note that the variances and covariances calculated from (X'X)^-1 are for the bi (the βi estimates), not for the Xi values. Also, COV(b0,b1) ≠ COV(X0,X1).

REFERENCE: Goodnight, J. H. 1978. The Sweep Operator: Its importance in statistical computing. In: Proc. Eleventh Annual Symposium on the INTERFACE, Gallant, A. R. and T. M. Gerig (eds.), Inst. of Statistics, N. C. State University, Raleigh, N. C.

Application of matrix procedures to multiple regression first requires calculation of the X'X, X'Y and Y'Y matrices, where for dependent variable Y and independent variables X1, X2 and X3 (a three-factor multiple regression) these matrices are

         [ n     ΣX1i     ΣX2i     ΣX3i    ]         [ ΣYi    ]
   X'X = [ ΣX1i  ΣX1i²    ΣX1iX2i  ΣX1iX3i ]   X'Y = [ ΣX1iYi ]   Y'Y = [ ΣYi² ]
         [ ΣX2i  ΣX1iX2i  ΣX2i²    ΣX2iX3i ]         [ ΣX2iYi ]
         [ ΣX3i  ΣX1iX3i  ΣX2iX3i  ΣX3i²   ]         [ ΣX3iYi ]

As with the simple linear regression, these sums, sums of squares and cross products are required by any method of fitting multiple regression.
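Building these cross-product matrices from raw data columns is mechanical. A minimal sketch (the function name and the tiny two-factor data set are ours, purely for illustration):

```python
def cross_products(xcols, y):
    """Build X'X, X'Y and Y'Y from raw data columns.
    xcols: list of predictor columns; the intercept column of ones is added here."""
    n = len(y)
    cols = [[1] * n] + [list(c) for c in xcols]   # X with leading column of ones
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    XtY = [sum(c * yi for c, yi in zip(ci, y)) for ci in cols]
    YtY = sum(yi * yi for yi in y)
    return XtX, XtY, YtY

# Tiny hypothetical two-factor example (n = 4):
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
y  = [3, 4, 8, 9]
XtX, XtY, YtY = cross_products([x1, x2], y)
# XtX[0][0] is n, XtX[1][1] is Sum X1^2, XtX[1][2] is Sum X1*X2, and so on.
```

Note that X'X is symmetric (the "ij" cross product equals the "ji" one), which is a quick check on any hand calculation.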
Once these values are obtained, application of formulas for an algebraic solution is relatively easy for a two-factor model. However, matrix procedures are more easily expanded to more than two independent variables than are summation notation formulas.

The inversion technique we will use is called the sweepout technique, and it requires the application of "row operations". Row operations consist of (1) multiplying any row by a scalar value, and (2) adding or subtracting any row from any other row. These are the only operations required to complete the sweepout technique after the matrices have been obtained and augmented. Obtaining a maximum of information from the technique requires reducing the X'X matrix one column at a time to an identity matrix. However, the values of the regression coefficients, error sum of squares and inverse matrix will be correct even if the row operations are not applied in a column-by-column reduction. By "sweeping" out each column of the X'X matrix one by one to obtain an identity matrix, the sequentially adjusted sums of squares can also be obtained.

This requires augmenting the X'X matrix with the X'Y matrix and an identity matrix prior to applying the row operations. The complete augmented matrix is given below. The matrix has separate sections that are recognizable as matrices seen earlier. This type of sectioned matrix is called a partitioned matrix.

   [ X'X     X'Y   I ]   --row operations-->   [ I   B     (X'X)^-1 ]
   [ (X'Y)'  Y'Y   0 ]                         [ 0   SSE   -B'      ]

Sections of the matrix may be left off if less information is required. For example, if only the regression coefficients are needed, then the sweepout technique need be applied only to the matrix

   [ X'X  X'Y ]   --row operations-->   [ I  B ]

and if only the inverse is required, the only matrix needed is

   [ X'X  I ]   --row operations-->   [ I  (X'X)^-1 ]

The regression coefficients and sum of squares error can be obtained by sweeping out the matrix

   [ X'X     X'Y ]   --row operations-->   [ I   B   ]
   [ (X'Y)'  Y'Y ]                         [ 0   SSE ]

If the above matrix is swept out column by column, then it will also provide the sequentially adjusted sums of squares. Only the use of the complete augmented matrix provides the inverted X'X matrix necessary to obtain the variance-covariance matrix, confidence limits and other types of sums of squares.

The technique will be illustrated with an example using data from Snedecor and Cochran (1981; ex. 17.2.1). The example will employ the complete augmented matrix. The original data matrices are

         [ 17     188.2    700    ]         [ 1295    ]
   X'X = [ 188.2  3602.78  8585.1 ]   X'Y = [ 16203.8 ]   Y'Y = [ 103075 ]
         [ 700    8585.1   31712  ]         [ 54081   ]

The augmented matrix to be swept is then

   [ 17      188.2    700      1295      1  0  0 ]
   [ 188.2   3602.78  8585.1   16203.8   0  1  0 ]
   [ 700     8585.1   31712    54081     0  0  1 ]
   [ 1295    16203.8  54081    103075    0  0  0 ]

The first step in the sweepout technique is to multiply through the first row by the inverse of 17. This will result in a value of 1 in the first row, first column. A multiple of this new first row is then subtracted from each of the other rows (2, 3 and 4). The multiplier should be such that value(i,1) - [value(1,1)*multiplier] = 0 for i ≠ 1. The multiplier which accomplishes this is simply value(i,1), since the new value(1,1) is unity (1). Therefore, every value(i,j) will be processed in the same way. The calculations would be

   for row 2:  value(2,j) - (value(1,j) * 188.2)
   for row 3:  value(3,j) - (value(1,j) * 700)
   for row 4:  value(4,j) - (value(1,j) * 1295)

After applying these transformations we obtain the following matrix.

COLUMN 1 SWEEP

   [ 1   11.0706   41.1765   76.1765    0.05882   0  0 ]
   [ 0   1519.30   835.688   1867.39  -11.0706    1  0 ]
   [ 0   835.688   2888.47   757.471  -41.1765    0  1 ]
   [ 0   1867.39   757.471   4426.47  -76.1765    0  0 ]

At this point the effect of X0 (the intercept) has been removed from the model. The value replacing Y'Y is 4426.47.
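The row operations just described can be sketched as a short sweep routine in plain Python (the function name is ours; the augmented matrix is the Snedecor and Cochran ex. 17.2.1 matrix from the text):

```python
def sweep_column(M, k):
    """Sweep column k of augmented matrix M using row operations:
    divide row k by its pivot M[k][k], then subtract a multiple of the new
    row k from every other row so column k becomes an identity column."""
    piv = M[k][k]
    M[k] = [v / piv for v in M[k]]
    for i in range(len(M)):
        if i != k:
            mult = M[i][k]
            M[i] = [v - mult * w for v, w in zip(M[i], M[k])]

# Complete augmented matrix [ X'X  X'Y  I ] over [ (X'Y)'  Y'Y  0 ]:
M = [
    [17.0,    188.2,    700.0,    1295.0,   1.0, 0.0, 0.0],
    [188.2,   3602.78,  8585.1,   16203.8,  0.0, 1.0, 0.0],
    [700.0,   8585.1,   31712.0,  54081.0,  0.0, 0.0, 1.0],
    [1295.0,  16203.8,  54081.0,  103075.0, 0.0, 0.0, 0.0],
]
for k in range(3):        # sweep out each column of X'X in turn
    sweep_column(M, k)

b0, b1, b2 = M[0][3], M[1][3], M[2][3]   # about 66.4654, 1.2902, -0.1110
SSE = M[3][3]                            # about 2101.291
```

After the three sweeps, the X'Y column holds the coefficients, the bottom-right Y'Y position holds SSE, and the former null row holds the negatives of the coefficients, just as the partitioned-matrix diagram promises.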
This is the corrected sum of squares of Y (i.e. Y'Y was 103075, and has now been corrected for the mean, yielding 4426.47). The sweepout now proceeds to the second column. A value of 1 is needed in the second row, second column to proceed with the development of the identity matrix. This is obtained by multiplying through the second row by the inverse of the value presently in that position (i.e. 1519.30). Then the appropriate multiple of the new row 2 is subtracted from each of the other rows. Note that the first column remains unchanged, since the value subtracted is always a multiple of zero.

COLUMN 2 SWEEP

   [ 1  0  35.08709    62.5694    0.13949   -0.00729  0 ]
   [ 0  1  0.550050    1.22911   -0.00729    0.00066  0 ]
   [ 0  0  2428.800   -269.686  -35.0871    -0.55005  1 ]
   [ 0  0  -269.686   2131.236  -62.5694    -1.22911  0 ]

The sweep then proceeds with the third column. Once again a value of 1 is required in row 3, column 3, and all rows other than row 3 will have a multiple of row 3 subtracted from them.

COLUMN 3 SWEEP

   [ 1  0  0   66.4654    0.646369   0.000660  -0.014446 ]
   [ 0  1  0   1.29019    0.000660   0.000783  -0.000226 ]
   [ 0  0  1  -0.11104   -0.014446  -0.000226   0.000412 ]
   [ 0  0  0   2101.291  -66.46541  -1.290191   0.11104  ]

Once this swept-out matrix has been obtained, most commonly desired calculations follow easily. Some of these results are discussed below. There are also several checks which can be done on the calculations. As the matrix is swept out, the null matrix (the matrix of zeroes in the original augmented matrix) is replaced by the negative values of the regression coefficients if the calculations have been done correctly. As a second check, the product of the original X'X matrix and its inverse should produce an identity matrix (i.e. X'X * (X'X)^-1 = I).

REGRESSION COEFFICIENTS

The regression coefficients are produced during the sweepout, replacing the X'Y matrix.
The model for the analysis above is

   Yhat = b0 + b1*X1i + b2*X2i
   Yhat = 66.4654 + 1.2902*X1i - 0.1110*X2i

SEQUENTIALLY ADJUSTED SUMS OF SQUARES

As each column is swept out, the sums of squares are "adjusted" for the factor removed. The first sweep adjusts for the intercept (i.e. the n on the diagonal of X'X), so the reduction in the Y'Y value is the correction factor, or the adjustment for the mean.

   e.g.  C.F. = 103075 - 4426.470 = 98648.530

The second sweep adjusts for the second term in the X matrix, usually X1, and the reduction in the error term is the sum of squares attributable to X1 (given that X0 is already in the model).

   e.g.  SS(X1|X0) = 4426.470 - 2131.236 = 2295.234

The third sweep adjusts for X2, and the reduction in the sum of squares is attributable to X2 (given that X0 and X1 are already in the model).

   e.g.  SS(X2|X0 X1) = 2131.236 - 2101.291 = 29.945

Finally, the remaining sum of squares is the error sum of squares, SSE = 2101.291.

Note that since the variables are adjusted sequentially, the sums of squares obtained are dependent on the order in which the variables are entered. That is, if we had entered X2 first and X1 second, the sums of squares attributable to these two variables would not be the same as the results obtained above. Only the correction factor would be the same (since the intercept would have been entered first in both models).

Each adjustment of the sum of squares takes one degree of freedom. The residual sum of squares has (n-k) degrees of freedom, where n is the number of observations and k is the number of sweeps, i.e. the number of columns in the X'X matrix. The mean square error is then

   MSE = SSE / (n-k) = 2101.291 / (17-3) = 150.092

PARTIAL SUMS OF SQUARES

Since the sequentially adjusted sums of squares are dependent on the order in which the variables are entered, another value of interest is the partial sum of squares, or the uniquely attributable sum of squares.
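The sequential sums of squares are just successive differences of the Y'Y value after each sweep. A minimal sketch using the values from the example above:

```python
# Y'Y value after each sweep of the example (original, col 1, col 2, col 3).
yy_after = [103075.0, 4426.470, 2131.236, 2101.291]

CF          = yy_after[0] - yy_after[1]   # 98648.530, adjustment for the mean
SS_X1_given = yy_after[1] - yy_after[2]   # 2295.234 = SS(X1 | X0)
SS_X2_given = yy_after[2] - yy_after[3]   # 29.945   = SS(X2 | X0, X1)
SSE         = yy_after[3]                 # what remains after all sweeps
MSE         = SSE / (17 - 3)              # n = 17 observations, k = 3 columns swept
```

The differences necessarily sum back to the original uncorrected Y'Y, which is a convenient arithmetic check.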
This is simply the sum of squares that would be accounted for by each variable if it had been entered into the model in last place. This value could be obtained by reversing the sweep operation and observing the change in the sum of squares as each variable is swept back into the model. The only change in the sum of squares when variable Xk is swept back into the model is bk²/ckk, so this calculation will give the partial SS due to variable Xk without actually doing all the calculations necessary to reverse the sweepout technique. The elements (ckk) are obtained from the (X'X)^-1 matrix and are called Gaussian multipliers.

The partial SS due to X2 above does not change, since it was the variable in the last position. The partial SS due to X1 would be calculated as

   SS(X1|X0 X2) = (1.29019)² / (0.000783) = 2125.913

VARIANCE-COVARIANCE MATRIX

Another major result of the sweepout technique is the inverse of the X'X matrix. Multiplying this matrix by the mean square error (MSE) gives the variance-covariance matrix of the regression coefficients.

   VarCov = MSE * (X'X)^-1

                       [  0.64637   0.00066  -0.01445 ]   [ 97.0149   0.0990  -2.1683 ]
           = 150.092 * [  0.00066   0.00078  -0.00023 ] = [  0.0990   0.1175  -0.0340 ]
                       [ -0.01445  -0.00023   0.00041 ]   [ -2.1683  -0.0340   0.0618 ]

so Var(b0) = 97.0149, Var(b1) = 0.1175, Var(b2) = 0.0618, Cov(b1,b2) = -0.0340, etc.

The variance-covariance matrix can also be used to obtain confidence intervals about estimates of Yhat for particular values of X1 and X2. The most versatile approach is to use matrix algebra in these calculations. The equation is

   S²Yhat = MSE * (L (X'X)^-1 L')

where L is a vector of values for X corresponding to Yhat. It may also be a vector of hypothesized X values for which a variance is needed.
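The partial-SS shortcut bk²/ckk is one line per variable. A sketch using the coefficients and Gaussian multipliers from the example above:

```python
# Partial (fully adjusted) SS for a variable: b_k^2 / c_kk,
# with b's and c's taken from the swept-out example matrix.
b1, c11 = 1.29019, 0.000783
partial_SS_X1 = b1 ** 2 / c11       # about 2125.9 = SS(X1 | X0, X2)

b2, c22 = -0.11104, 0.000412
partial_SS_X2 = b2 ** 2 / c22       # about 29.9; matches its sequential SS,
                                    # since X2 was entered last anyway
```

Squaring b2 removes its sign, so a negative coefficient contributes a positive sum of squares, as it must.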
For example, if we wish to predict the response (Yhat) and its variance when X1 = 4 and X2 = 24, first we would calculate the response,

   Yhat = 66.4654 + 1.2902*X1i - 0.1110*X2i = 66.4654 + 1.2902(4) - 0.1110(24) = 68.9622

Using L = [ 1  4  24 ] (note that a 1 is included for the intercept), the variance of the estimate is then

                              [  0.64637   0.00066  -0.01445 ] [ 1  ]
   S²Yhat = 150.092 [1 4 24]  [  0.00066   0.00078  -0.00023 ] [ 4  ]  = 24.6782
                              [ -0.01445  -0.00023   0.00041 ] [ 24 ]

and the standard error is √24.6782 = 4.9677.

The sweepout technique is not the only method of matrix inversion. However, its application to the augmented matrix described above is a relatively simple and versatile method of obtaining most of the results commonly desired from a multiple regression analysis.

REFERENCE: Goodnight, J. H. 1978. The Sweep Operator: Its importance in statistical computing. In: Proc. Eleventh Annual Symposium on the INTERFACE, Gallant, A. R. and T. M. Gerig (eds.), Inst. of Statistics, N. C. State University, Raleigh, N. C.

Statistical Techniques II — Matrix Applications
Appendix 5 Sweepout Example

Three-factor multiple regression from Snedecor and Cochran (1967), table 13.10.1, page 405.

   Y  = estimated plant-available phosphorus in the soil (20 C)
   X1 = inorganic phosphorus
   X2 = organic phosphorus soluble in K2CO3 and hydrolyzed by hypobromite
   X3 = organic phosphorus soluble in K2CO3 and NOT hydrolyzed by hypobromite

All least squares regression analyses start with the same three matrices.
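The prediction and its variance can be sketched directly from the quantities above (the rounded (X'X)^-1 entries are taken from the example, so the computed variance agrees with the handout's 24.6782 only to about a tenth):

```python
# Variance of a predicted value: S^2 = MSE * (L (X'X)^-1 L'), with L = [1, 4, 24].
MSE = 150.092
XtX_inv = [
    [ 0.646369,  0.000660, -0.014446],
    [ 0.000660,  0.000783, -0.000226],
    [-0.014446, -0.000226,  0.000412],
]
L = [1.0, 4.0, 24.0]            # leading 1 for the intercept

y_hat = 66.4654 + 1.2902 * 4 - 0.1110 * 24                            # 68.9622

v  = [sum(XtX_inv[i][j] * L[j] for j in range(3)) for i in range(3)]  # (X'X)^-1 L'
s2 = MSE * sum(L[i] * v[i] for i in range(3))                         # about 24.7
se = s2 ** 0.5                                                        # about 4.97
```

The same three lines handle any hypothesized L vector, which is why the matrix form is described as the most versatile approach.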
       [ 1   0.4   53   158 ]        [  64 ]
       [ 1   0.4   23   163 ]        [  60 ]
       [ 1   3.1   19    37 ]        [  71 ]
       [ 1   0.6   34   157 ]        [  61 ]
       [ 1   4.7   24    59 ]        [  54 ]
       [ 1   1.7   65   123 ]        [  77 ]
       [ 1   9.4   44    46 ]        [  81 ]
       [ 1  10.1   31   117 ]        [  93 ]
   X = [ 1  11.6   29   173 ]    Y = [  93 ]
       [ 1  12.6   58   112 ]        [  51 ]
       [ 1  10.9   37   111 ]        [  76 ]
       [ 1  23.1   46   114 ]        [  96 ]
       [ 1  23.1   50   134 ]        [  77 ]
       [ 1  21.6   44    73 ]        [  93 ]
       [ 1  23.1   56   168 ]        [  95 ]
       [ 1   1.9   36   143 ]        [  54 ]
       [ 1  26.8   58   202 ]        [ 168 ]
       [ 1  29.9   51   124 ]        [  99 ]

         [ 18     215      758      2214   ]         [ 1463    ]
   X'X = [ 215    4321.02  10139.5  27645  ]   X'Y = [ 20706.2 ]   Y'Y = [ 131299 ]
         [ 758    10139.5  35076    96598  ]         [ 63825   ]
         [ 2214   27645    96598    307894 ]         [ 187542  ]

Create a fully augmented matrix of the form

   [ X'X     X'Y   I ]
   [ (X'Y)'  Y'Y   0 ]

The resulting matrix contains

   X0    X1      X2      X3     | X'Y   | c0  c1  c2  c3
   n     ΣX1     ΣX2     ΣX3    | ΣY    | 1   0   0   0
   ΣX1   ΣX1²    ΣX1X2   ΣX1X3  | ΣX1Y  | 0   1   0   0
   ΣX2   ΣX1X2   ΣX2²    ΣX2X3  | ΣX2Y  | 0   0   1   0
   ΣX3   ΣX1X3   ΣX2X3   ΣX3²   | ΣX3Y  | 0   0   0   1
   ΣY    ΣX1Y    ΣX2Y    ΣX3Y   | ΣY²   | 0   0   0   0

Numerically, for this problem the matrix is

   X0      X1        X2       X3       | X'Y     | c0  c1  c2  c3
   18      215       758      2214     | 1463    | 1   0   0   0
   215     4321.02   10139.5  27645    | 20706.2 | 0   1   0   0
   758     10139.5   35076    96598    | 63825   | 0   0   1   0
   2214    27645     96598    307894   | 187542  | 0   0   0   1
   1463    20706.2   63825    187542   | 131299  | 0   0   0   0

The first step in the sweepout technique (divide row 1 by value(1,1) = 18) produces

   1       11.944444  42.111111  123      | 81.277778 | 0.055556  0   0   0
   215     4321.02    10139.5    27645    | 20706.2   | 0         1   0   0
   758     10139.5    35076      96598    | 63825     | 0         0   1   0
   2214    27645      96598      307894   | 187542    | 0         0   0   1
   1463    20706.2    63825      187542   | 131299    | 0         0   0   0

and after sweeping out the first column (subtracting a multiple of row 1 from all other rows) we have

   1   11.944444     42.111111    123     | 81.277778   |  0.055556   0   0   0
   0   1752.964444   1085.611111  1200    | 3231.477778 | -11.944444  1   0   0
   0   1085.611111   3155.777778  3364    | 2216.444444 | -42.111111  0   1   0
   0   1200          3364         35572   | 7593        | -123        0   0   1
   0   3231.477778   2216.444444  7593    | 12389.61111 | -81.277778  0   0   0

We start the second column sweep by dividing row 2 by value(2,2) = 1752.964444,

   1   11.944444     42.111111    123       | 81.277778   |  0.055556   0        0   0
   0   1             0.6193       0.684555  | 1.843436    | -0.006814   0.00057  0   0
   0   1085.611111   3155.777778  3364      | 2216.444444 | -42.111111  0        1   0
   0   1200          3364         35572     | 7593        | -123        0        0   1
   0   3231.477778   2216.444444  7593      | 12389.61111 | -81.277778  0        0   0

and finish sweeping the second column to obtain

   1   0   34.713915     114.823375   | 59.258959   |  0.136943    -0.006814   0   0
   0   1   0.6193        0.684555     | 1.843436    | -0.006814     0.00057    0   0
   0   0   2483.458674   2620.839842  | 215.189831  | -34.713915   -0.6193     1   0
   0   0   2620.839842   34750.53439  | 5380.87679  | -114.823375  -0.684555   0   1
   0   0   215.189831    5380.87679   | 6432.588616 | -59.258959   -1.843436   0   0

The third column sweep starts with (divide row 3 by 2483.458674)

   1   0   34.713915     114.823375   | 59.258959   |  0.136943    -0.006814   0          0
   0   1   0.6193        0.684555     | 1.843436    | -0.006814     0.00057    0          0
   0   0   1             1.055318     | 0.086649    | -0.013978    -0.000249   0.000403   0
   0   0   2620.839842   34750.53439  | 5380.87679  | -114.823375  -0.684555   0          1
   0   0   215.189831    5380.87679   | 6432.588616 | -59.258959   -1.843436   0          0

and after being swept out produces

   1   0   0   78.189139    | 56.251024   |  0.622176    0.001843   -0.013978   0
   0   1   0   0.030996     | 1.789774    |  0.001843    0.000725   -0.000249   0
   0   0   1   1.055318     | 0.086649    | -0.013978   -0.000249    0.000403   0
   0   0   0   31984.71367  | 5153.782984 | -78.189139  -0.030996   -1.055318   1
   0   0   0   5153.782984  | 6413.942579 | -56.251024  -1.789774   -0.086649   0

Finally the fourth column in the X'X matrix is started (divide row 4 by 31984.71367),

   1   0   0   78.189139    | 56.251024   |  0.622176    0.0018428  -0.0139781   0
   0   1   0   0.030996     | 1.789774    |  0.0018428   0.0007249  -0.0002494   0
   0   0   1   1.055318     | 0.086649    | -0.0139781  -0.0002494   0.0004027   0
   0   0   0   1            | 0.161133    | -0.002445   -0.0000010  -0.0000330   0.000031
   0   0   0   5153.782984  | 6413.942579 | -56.251024  -1.7897741  -0.0866492   0

and swept out; the final result is

   1   0   0   0   | 43.652198   |  0.813316    0.0019185  -0.0113982  -0.002445
   0   1   0   0   | 1.78478     |  0.0019185   0.0007249  -0.0002483  -0.0000010
   0   0   1   0   | -0.083397   | -0.0113982  -0.0002483   0.0004375  -0.0000330
   0   0   0   1   | 0.161133    | -0.002445   -0.0000010  -0.0000330   0.000031
   0   0   0   0   | 5583.499658 | -43.652198  -1.7847797   0.0833971  -0.161133

The resulting matrix is of the form

   [ I  B    (X'X)^-1 ]
   [ 0  SSE  -B'      ]

and contains the values

   1   0   0   0   | b0   | c00  c01  c02  c03
   0   1   0   0   | b1   | c10  c11  c12  c13
   0   0   1   0   | b2   | c20  c21  c22  c23
   0   0   0   1   | b3   | c30  c31  c32  c33
   0   0   0   0   | SSE  | -b0  -b1  -b2  -b3

The solution to the regression equation is then

   Yi = 43.652 + 1.785*X1i - 0.083*X2i + 0.161*X3i + e

The sums of squares are given by the sequential reduction in the Y'Y matrix:

   MATRIX        Y'Y VALUE    INTERPRETATION OF THE     DIFFERENCE FROM   INTERPRETATION OF
                              REPLACEMENT VALUE         PREVIOUS VALUE    THE DIFFERENCE
   Original      131299       ΣY² (uncorrected)
   Col 1 sweep   12389.6111   ΣY²-(ΣY)²/n = SSY|X0      118909.3889       (ΣY)²/n = C.F.
   Col 2 sweep   6432.5886    SSY|X0,X1                 5957.0225         SeqSS X1
   Col 3 sweep   6413.9426    SSY|X0,X1,X2              18.6460           SeqSS X2
   Col 4 sweep   5583.4997    SSY|X0,X1,X2,X3 = SSE     830.4429          SeqSS X3

Partial sums of squares, or fully adjusted sums of squares, are given by

   PARTIAL SS = bk² / ckk

   Partial SSX1 = b1²/c11 = 1.78478² / 0.0007249  = 4394.1523
   Partial SSX2 = b2²/c22 = 0.08340² / 0.0004375  = 15.8979
   Partial SSX3 = b3²/c33 = 0.16113² / 0.00003127 = 830.4429

Recall that I number the positions in the X'X matrix differently, from k = 0, 1, ..., p (where p is the number of parameters excluding the intercept) instead of starting at 1 as in other matrices. This is done in order to be able to associate the matrix position with the regression coefficient subscript.

James P. Geaghan - Copyright 2011
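The entire Appendix 5 example can be reproduced with the same short sweep routine sketched earlier, applied to the fully augmented 5 by 9 matrix (the function name is ours; all numbers are the handout's):

```python
def sweep_column(M, k):
    """Divide row k by its pivot, then clear column k from all other rows."""
    piv = M[k][k]
    M[k] = [v / piv for v in M[k]]
    for i in range(len(M)):
        if i != k:
            mult = M[i][k]
            M[i] = [v - mult * w for v, w in zip(M[i], M[k])]

# Fully augmented matrix for the three-factor example:
# rows X0, X1, X2, X3, Y; columns X'X | X'Y | identity / zeros.
M = [
    [18.0,    215.0,    758.0,    2214.0,   1463.0,   1.0, 0.0, 0.0, 0.0],
    [215.0,   4321.02,  10139.5,  27645.0,  20706.2,  0.0, 1.0, 0.0, 0.0],
    [758.0,   10139.5,  35076.0,  96598.0,  63825.0,  0.0, 0.0, 1.0, 0.0],
    [2214.0,  27645.0,  96598.0,  307894.0, 187542.0, 0.0, 0.0, 0.0, 1.0],
    [1463.0,  20706.2,  63825.0,  187542.0, 131299.0, 0.0, 0.0, 0.0, 0.0],
]
seq_yy = []
for k in range(4):                 # sweep each column of X'X in turn
    sweep_column(M, k)
    seq_yy.append(M[4][4])         # Y'Y value after each column sweep

b = [M[i][4] for i in range(4)]    # about [43.652, 1.785, -0.083, 0.161]
SSE = M[4][4]                      # about 5583.50
```

The `seq_yy` list reproduces the sequential-reduction column of the table above, so the correction factor and the sequential SS for X1, X2 and X3 fall out as successive differences.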

This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang, J. during the Fall '08 term at LSU.
