EXST7015 Fall 2011 Lect07 - Statistical Techniques II
Curvilinear Residual Patterns

Transformations of Yi, like log transformations, will affect homogeneity of variance. The raw data should actually appear nonhomogeneous. Transformations of Xi will fit curves but will not affect the homogeneity of variance. Polynomials assume homogeneous variance and will not adjust variance.

[Figure: residual plots of Yi against Xi illustrating the three cases above.]

Intrinsically Linear (Curvilinear) Regression Example

Remember our SLR example about the amount of wood harvested from trees, predicted on the basis of DBH (diameter at breast height)? Recall that it looked a little curved, and maybe even had nonhomogeneous variance? Let's take another look at that model.

Typically, morphometric relationships (between parts of an organism) are best fitted with models with both Log(Y) and Log(X). Examples:
  Fish length - scale length
  Fish total length - fish fork length
  Crab width - crab length
  Fish length - fish weight, etc.

Statistics quote: There are three kinds of lies: lies, damned lies, and statistics. Benjamin Disraeli (1804-1881)

James P. Geaghan - Copyright 2011

Here we have tree diameter and tree weight. Let's try a log-log model (using natural logs). Since we are fitting a linear measurement to a volumetric or weight measurement, I expect the following relationship to apply: 1 g = 1 cm^3 in the metric system if the material has the same density as water (specific gravity = 1). In other words, I expect

  Wood weight = b0 * (wood length)^3
  log(Wood weight) = log(b0) + 3*log(wood length)

Note that this line will go through the origin, no problem there. The model is

  Yi = b0 * Xi^b1 * ei

The power term should be approximately 3, so we will test the coefficient of DBH against a value of 3. See computer output (Appendix 3).

The residual plot for the log-log model appears to show no curvature, no nonhomogeneous variance, no obvious outliers, and no significant departure from the normal distribution.
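The log-log fit and the test of the DBH coefficient against 3 can be sketched in Python. The data below are hypothetical stand-ins (the course data live in Appendix 3, which is not reproduced here); the variable names are mine, not from the notes.

```python
import numpy as np

# Hypothetical DBH (cm) and wood weight (kg) values, for illustration only
dbh    = np.array([10.0, 12.5, 15.0, 17.5, 20.0, 25.0, 30.0])
weight = np.array([40.0, 80.0, 130.0, 210.0, 310.0, 600.0, 1050.0])

# Log-log (natural log) model: log(weight) = log(b0) + b1*log(dbh)
x = np.log(dbh)
y = np.log(weight)
n = len(x)

# Ordinary least squares on the TRANSFORMED data (assumptions apply here)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()

# Test H0: b1 = 3, the geometric expectation for a length-weight relation
resid  = y - (b0 + b1 * x)
mse    = np.sum(resid**2) / (n - 2)
se_b1  = np.sqrt(mse / np.sum((x - x.mean())**2))
t_stat = (b1 - 3.0) / se_b1   # compare to t with n-2 df
```

Note that the test is against 3, not against 0, which is why the hypothesized value is subtracted in the numerator of the t statistic.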
In short, it is much improved, probably fits better than the linear model, and it is interpretable. Geometrically, the model for a tree trunk should approximate a cylinder or cone, depending on the taper in the bole:

  Weight = C * (specific gravity) * pi * (D/2)^2 * H

where C = 1 for a cylinder and C = 1/3 for a cone.

Curvilinear Regression Notes and Summary

For transformed models, the usual regression assumptions must be met for the transformed model, not the raw data (homogeneity, normality, etc.). Estimates, hypothesis tests, and confidence intervals would be calculated for the transformed model. The estimates and limits can then be detransformed.

A wide range of biometrics situations call for established curvilinear models. These include exponential growth, mortality, morphometric models, instrument standardization, some other growth models (power models and quadratics have been used), and recruitment models. Check the literature in your field to see what models are used.

Statistics quote: "In earlier times, they had no statistics, and so they had to fall back on lies." -- Stephen Leacock

Matrix Algebra (see Appendix 4)

We will not be doing our regressions with matrix algebra, except that the computer does employ matrices. In fact, there is really no other way to do the basic calculations. You will be responsible for knowing about matrices only to the extent that PROC REG or PROC GLM produces information. This is primarily the initial and final matrices.

So, what is a matrix? A matrix is a rectangular arrangement of numbers. The matrix is usually denoted by a capital letter. For example:

  A = | 1  3 |        D = | 4  2  0 |
      | 7  9 |            | 1  4  3 |
                          | 3  6  5 |
                          | 2  0  0 |

The dimensions of a matrix are given by the number of rows and columns in the matrix (i.e., the dimensions are r by c).
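The cylinder/cone weight relation above can be checked numerically. A minimal sketch, assuming the formula Weight = C * (specific gravity) * pi * (D/2)^2 * H with density of water taken as 1000 kg/m^3; the function name `bole_weight` is mine, not from the notes.

```python
import math

def bole_weight(diameter_m, height_m, specific_gravity=1.0, shape="cylinder"):
    """Approximate bole weight in kg: C * sg * 1000 * pi * (D/2)^2 * H,
    with C = 1 for a cylinder and C = 1/3 for a cone (assumed formula)."""
    c = 1.0 if shape == "cylinder" else 1.0 / 3.0
    volume = c * math.pi * (diameter_m / 2.0) ** 2 * height_m   # m^3
    return specific_gravity * 1000.0 * volume                    # kg

# A 0.3 m diameter, 10 m bole at specific gravity 0.5:
w_cyl  = bole_weight(0.3, 10.0, specific_gravity=0.5, shape="cylinder")
w_cone = bole_weight(0.3, 10.0, specific_gravity=0.5, shape="cone")
# The cone weighs exactly one third of the cylinder, as C = 1/3 implies.
```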
For the matrices above, A is 2 by 2 and D is 4 by 3.

For a simple linear regression the matrices of initial interest would be the data matrices: a Y matrix of values of the dependent variable and an X matrix of values of the independent variable. The X matrix also has a column of ones added to fit the intercept.

      | Y1 |        | 1  X1 |
      | Y2 |        | 1  X2 |
      | Y3 |        | 1  X3 |
  Y = | Y4 |    X = | 1  X4 |
      | Y5 |        | 1  X5 |
      | Y6 |        | 1  X6 |
      | Y7 |        | 1  X7 |

As with our algebraic calculations we need some intermediate values: sums, sums of squares, and cross-products. These are obtained by first calculating a transpose matrix for both X and Y. This is simply the matrix turned on its side, so the rows of the original matrix become the columns of the transpose. These are denoted X' and Y'.

  X' = | 1   1   1   1   1   1   1  |
       | X1  X2  X3  X4  X5  X6  X7 |

  Y' = | Y1  Y2  Y3  Y4  Y5  Y6  Y7 |

We now calculate 3 matrices: X'X, Y'Y, and X'Y. This requires matrix multiplication. The results of these 3 calculations are:

  X'X = | n    ΣXi  |     Y'Y = [ ΣYi² ]     X'Y = | ΣYi   |
        | ΣXi  ΣXi² |                              | ΣXiYi |

with all sums running from i = 1 to n. Notice that the contents of these 3 matrices are the same as the values we used for the algebraic solution: n, ΣXi, ΣXi², ΣYi, ΣYi², ΣXiYi.

Normal equations - when the equations needed to solve a simple linear regression are derived, the result is two equations with two unknowns that must be resolved. These are called the normal equations. The normal equations are:

  b0*n   + b1*ΣXi  = ΣYi
  b0*ΣXi + b1*ΣXi² = ΣXiYi

If you solve these algebraically, you get the two equations we use to solve for b0 and b1.
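The cross-product matrices above can be built directly in Python, which makes the correspondence with the algebraic sums explicit. The small data set here is hypothetical (n = 7, matching the notes' layout).

```python
import numpy as np

# Hypothetical data, n = 7 as in the notes' example matrices
X1 = np.array([2.0, 3.0, 5.0, 6.0, 8.0, 9.0, 11.0])
Y  = np.array([4.1, 5.0, 7.2, 7.9, 10.3, 11.1, 13.2])

# Design matrix: a column of ones (for the intercept) beside the X values
X = np.column_stack([np.ones_like(X1), X1])

XtX = X.T @ X     # [[n, sum(Xi)], [sum(Xi), sum(Xi^2)]]
XtY = X.T @ Y     # [sum(Yi), sum(Xi*Yi)]
YtY = Y @ Y       # sum(Yi^2)

# The matrix products reproduce the familiar algebraic sums:
assert np.isclose(XtX[0, 0], len(X1))          # n
assert np.isclose(XtX[0, 1], X1.sum())         # sum of Xi
assert np.isclose(XtX[1, 1], (X1**2).sum())    # sum of Xi^2
assert np.isclose(XtY[1], (X1 * Y).sum())      # sum of Xi*Yi
```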
When expressed as matrices this factors out to

  | n    ΣXi  | | b0 |   | ΣYi   |
  | ΣXi  ΣXi² | | b1 | = | ΣXiYi |

In simple matrix notation, the X'X matrix times the B matrix (vector) equals X'Y: (X'X)B = X'Y. As with the algebraic equations we need to solve for B (i.e., b0 and b1). If we do this with algebra, we get the usual equations.

Solving the matrix equations we get

  B = (X'X)⁻¹X'Y

This equation is the matrix algebra solution for a simple linear regression. Note that there is no such thing as matrix "division"; the solution requires multiplying by the inverse matrix. As with the algebraic values, if we multiply the B values (B matrix) by the X values we get the predicted values:

  XB = X(X'X)⁻¹X'Y = the Yhat vector

What do we need to know about these matrix calculations?
  We need to know that the solution to the problem using matrix algebra involves the same values as for the simple linear regression.
  We need to know that (X'X)⁻¹ is a key component of this solution.
  We need to know that the predicted values require the matrix segment X(X'X)⁻¹X' times the Y vector (MAIN DIAGONAL).

Why? We can get the matrices from SAS, but we want to understand what we have. (X'X)⁻¹ is a key component, not only of the solution for the regression coefficients, but also of the variance-covariance matrix. The main diagonal of the X(X'X)⁻¹X' matrix is a diagnostic that we will use (hat diag). But the most important reason for using matrices is that the solutions for simple linear and multiple regression are the same. Basically, matrix algebra is the ONLY way to solve multiple regressions.

So, what do we get from SAS? If the options XPX and I are placed on the MODEL statement, we can get the X'X matrix and the (X'X)⁻¹ matrix. For the simple linear regression that we saw for the tree weights and diameters, these options produce the following output.
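The solution B = (X'X)⁻¹X'Y and the hat matrix X(X'X)⁻¹X' can both be computed in a few lines. A sketch with hypothetical data; it also checks that the matrix slope agrees with the usual algebraic formula.

```python
import numpy as np

# Hypothetical data (not the course tree data), n = 7
X1 = np.array([2.0, 3.0, 5.0, 6.0, 8.0, 9.0, 11.0])
Y  = np.array([4.1, 5.0, 7.2, 7.9, 10.3, 11.1, 13.2])
X  = np.column_stack([np.ones_like(X1), X1])

XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ (X.T @ Y)            # [b0, b1] = (X'X)^-1 X'Y

# Predicted values: Yhat = X B = X (X'X)^-1 X' Y
H = X @ XtX_inv @ X.T              # the "hat" matrix
Yhat = H @ Y
leverage = np.diag(H)              # hat diag: the leverage diagnostic

# Same slope from the algebraic formula, for comparison
b1_alg = ((X1 - X1.mean()) * (Y - Y.mean())).sum() / ((X1 - X1.mean())**2).sum()
assert np.isclose(B[1], b1_alg)
```

A useful property: the diagonal of H sums to the number of fitted parameters (here 2), which is why unusually large individual hat-diagonal values flag high-leverage observations.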
Model Cross-products X'X X'Y Y'Y

  X'X        INTERCEP    DBH        WEIGHT
  INTERCEP   47          289.2      17359
  DBH        289.2       1981.98    142968.3
  WEIGHT     17359       142968.3   13537551

X'X Inverse, Parameter Estimates, and SSE

  X'X        INTERCEP        DBH             WEIGHT
  INTERCEP   0.2082694963    -0.030389579    -729.3963003
  DBH        -0.030389579    0.004938832     178.56371409
  WEIGHT     -729.3963003    178.56371409    670190.7322

In the cross-products table, the first two rows and columns of numbers contain the X'X matrix, which has the values for n, ΣXi, and ΣXi². The last column has X'Y (the values for ΣYi and ΣXiYi), and the last value is Y'Y (ΣYi²).

In the X'X inverse section, the first two rows and columns of numbers contain the (X'X)⁻¹ matrix, and the value in the third row and third column is the SSE. The other values are b0 and b1.

You will be responsible only for knowing where the 6 intermediate values are for simple linear regression, and where to find the (X'X)⁻¹ matrix.

Multiple Regression with matrix algebra

The only difference between simple linear regression and multiple regression is that multiple regression has several independent variables (Xi variables). Therefore the matrix X'X will be larger. For a simple linear regression, X'X is 2×2. For a 3-factor multiple regression (X1, X2, X3 and an intercept) the X'X matrix will be 4×4.
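As a check on where the pieces live, the entire inverse table can be recovered from the cross-products table alone. This sketch uses only the numbers printed in the SAS output above (n = 47 trees) and reproduces b0, b1, and the SSE.

```python
import numpy as np

# Cross-products copied from the SAS XPX output above
XtX = np.array([[47.0,    289.2],
                [289.2, 1981.98]])
XtY = np.array([17359.0, 142968.3])
YtY = 13537551.0

XtX_inv = np.linalg.inv(XtX)
b = XtX_inv @ XtY          # parameter estimates [b0, b1]
SSE = YtY - b @ XtY        # residual SS: Y'Y - b'X'Y

# These reproduce the values in the inverse table:
#   b0 ≈ -729.3963, b1 ≈ 178.5637, SSE ≈ 670190.73
```

This is exactly the block structure of the printed table: (X'X)⁻¹ in the upper-left 2×2, the estimates in the border column, and the SSE in the corner.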
      | Y1 |        | 1  X11  X21  X31 |
      | Y2 |        | 1  X12  X22  X32 |
      | Y3 |        | 1  X13  X23  X33 |
  Y = | Y4 |    X = | 1  X14  X24  X34 |
      | Y5 |        | 1  X15  X25  X35 |
      | Y6 |        | 1  X16  X26  X36 |
      | Y7 |        | 1  X17  X27  X37 |
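The point of the wider X matrix is that nothing else changes: the same expression B = (X'X)⁻¹X'Y solves the multiple regression. A sketch with hypothetical 3-predictor data (the coefficients used to generate Y are my own choices, for illustration):

```python
import numpy as np

# Hypothetical data: 7 observations, 3 predictors plus an intercept
rng = np.random.default_rng(0)
X1 = rng.uniform(0, 10, 7)
X2 = rng.uniform(0, 10, 7)
X3 = rng.uniform(0, 10, 7)
Y  = 2.0 + 1.5*X1 - 0.5*X2 + 0.25*X3 + rng.normal(0, 0.1, 7)

X = np.column_stack([np.ones(7), X1, X2, X3])   # 7 x 4 design matrix
XtX = X.T @ X                                    # now 4 x 4 instead of 2 x 2
B = np.linalg.inv(XtX) @ (X.T @ Y)               # [b0, b1, b2, b3]

# Same answer as a dedicated least-squares routine
assert np.allclose(B, np.linalg.lstsq(X, Y, rcond=None)[0])
```

In practice `np.linalg.lstsq` (or an equivalent QR-based solver) is preferred over forming the explicit inverse, but the explicit form mirrors the derivation in the notes.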
James P. Geaghan - Copyright 2011

This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang,j during the Fall '08 term at LSU.