EXST7015 Fall2011 Appendices (3)


Statistical Techniques II
Intrinsically Linear Regression
Appendix 3: Annotated SAS example

***********************************************;
*** Data from Freund & Wilson (1993)        ***;
*** TABLE 8.24 : ESTIMATING TREE WEIGHTS    ***;
***********************************************;
options ps=256 ls=132 nocenter nodate nonumber;
data one; infile cards missover;
   TITLE1 'EXST7015: Estimating tree weights from other morphometric variables';
   input ObsNo Dbh Height Age Grav Weight ObsID $;
   ******** label ObsNo  = 'Original observation number'
                  Dbh    = 'Diameter at breast height (inches)'
                  Height = 'Height of the tree (feet)'
                  Age    = 'Age of the tree (years)'
                  Grav   = 'Specific gravity of the wood'
                  Weight = 'Harvest weight of the tree (lbs)'
                  ObsId  = 'Identification letter added to dataset';
   lweight = log(weight);
   ldbh = log(DBH);
cards;
NOTE: The data set WORK.ONE has 47 observations and 9 variables.
NOTE: DATA statement used:
      real time           0.06 seconds
      cpu time            0.06 seconds
run;
;
proc print data=one; TITLE2 'Raw data print'; run;
NOTE: There were 47 observations read from the data set WORK.ONE.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used:
      real time           0.03 seconds
      cpu time            0.03 seconds

EXST7015: Estimating tree weights from other morphometric variables
Raw data print

Obs  ObsNo   Dbh  Height  Age   Grav  Weight  ObsID  lweight     ldbh
  1      1   5.7      34   10  0.409     174    a    5.15906  1.74047
  2      2   8.1      68   17  0.501     745    b    6.61338  2.09186
  3      3   8.3      70   17  0.445     814    c    6.70196  2.11626
  .
  .   (see previous handout for full listing)
  .
 45     45   8.0      61   13  0.508     614    S    6.41999  2.07944
 46     46   5.2      47   13  0.432     194    T    5.26786  1.64866
 47     47   3.7      33   13  0.389      66    U    4.18965  1.30833

options ls=99 ps=55; TITLE2 'Scatter plot';
proc plot data=one; plot weight*Dbh=obsid;
run;
options ps=256 ls=132;
NOTE: There were 47 observations read from the data set WORK.ONE.
NOTE: The PROCEDURE PLOT printed page 2.
NOTE: PROCEDURE PLOT used:
      real time           0.01 seconds
      cpu time            0.01 seconds

James P. Geaghan - Copyright 2011

EXST7015: Estimating tree weights from other morphometric variables
Scatter plot

Plot of Weight*Dbh. Symbol is value of ObsID.

[Line-printer scatter plot: Weight (vertical axis, 0 to 1800) against Dbh (horizontal axis, 3 to 13).]

NOTE: 8 obs hidden.

proc reg data=one lineprinter; ID ObsID;
   TITLE2 'Simple linear regression';
   model Weight = Dbh / CLB;
   output out=next1 p=yhat r=e;
run;
NOTE: 47 observations read.
NOTE: 47 observations used in computations.
options ls=99 ps=55;
   plot residual.*predicted.=obsid / VREF=0;
run;
options ps=256 ls=132;
NOTE: The data set WORK.NEXT1 has 47 observations and 11 variables.
NOTE: The PROCEDURE REG printed pages 3-4.
NOTE: PROCEDURE REG used:
      real time           0.10 seconds
      cpu time            0.10 seconds

EXST7015: Estimating tree weights from other morphometric variables
Simple linear regression

The REG Procedure
Model: MODEL1
Dependent Variable: Weight

                      Analysis of Variance
                              Sum of         Mean
Source             DF        Squares       Square    F Value    Pr > F
Model               1        6455980      6455980     433.49    <.0001
Error              45         670191        14893
Corrected Total    46        7126171

Root MSE          122.03740    R-Square    0.9060
Dependent Mean    369.34043    Adj R-Sq    0.9039
Coeff Var          33.04198

                      Parameter Estimates
                 Parameter     Standard
Variable    DF    Estimate        Error    t Value    Pr > |t|     95% Confidence Limits
Intercept    1  -729.39630     55.69366     -13.10      <.0001    -841.56910   -617.22350
Dbh          1   178.56371      8.57640      20.82      <.0001     161.28996    195.83747

Recall the original SLR.
The fit didn't seem too bad except for the negative intercept. The apparent curvature and possible lack of homogeneity of the residuals also indicated possible problems. There were even some possible outliers. The residuals, however, appeared to be normal (Pr < W = 0.3544).

[Line-printer residual plot: RESIDUAL (vertical axis, -300 to 400) against Predicted Value of Weight (horizontal axis, -100 to 1500). Symbol is value of ObsID.]

proc reg data=one lineprinter; ID ObsID;
   TITLE2 'Simple linear regression on logarithms';
   model lWeight = lDbh / CLB;
   output out=next2 p=yhat r=le;
   TEST lDBH = 3;
run;
options ls=99 ps=55;
   plot residual.*predicted.=obsid / VREF=0;
run;
options ps=256 ls=132;
NOTE: The data set WORK.NEXT2 has 47 observations and 11 variables.
NOTE: The PROCEDURE REG printed pages 6-8.
NOTE: PROCEDURE REG used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The REG Procedure
Model: MODEL1
Dependent Variable: lweight

                      Analysis of Variance
                              Sum of         Mean
Source             DF        Squares       Square    F Value    Pr > F
Model               1       35.94979     35.94979    1236.37    <.0001
Error              45        1.30846      0.02908
Corrected Total    46       37.25825

Root MSE            0.17052    R-Square    0.9649
Dependent Mean      5.49466    Adj R-Sq    0.9641
Coeff Var           3.10337

                      Parameter Estimates
                 Parameter     Standard
Variable    DF    Estimate        Error    t Value    Pr > |t|     95% Confidence Limits
Intercept    1     0.55219      0.14275       3.87      0.0004       0.26469      0.83970
ldbh         1     2.79854      0.07959      35.16      <.0001       2.63824      2.95884

The model fits well. The $R^2$ value is actually higher than the value for the SLR. Recall that the original power model (before transformation) was $Y_i = b_0 X_i^{b_1} e_i$. After taking logarithms we actually fitted a simple linear regression, $\log(Y_i) = \log(b_0) + b_1 \log(X_i) + \log(e_i)$, so the fitted intercept is the log of the original $b_0$. After the model is fitted we can back-transform it, taking the antilog of the intercept and putting the model back in its original form. Since the antilog of 0.55219 is $e^{0.55219} = 1.7371$, the model in the original form is $Y_i = 1.7371 X_i^{2.79854}$. In this form the line will graph as a curve.

The value of the slope is not too different from the hypothesized value of 3. However, the test of the slope against 3 (below) shows a significant difference.

       Test 1 Results for Dependent Variable lweight
                            Mean
Source             DF     Square    F Value    Pr > F
Numerator           1    0.18629       6.41    0.0149
Denominator        45    0.02908

So how about the residual plot? Recall that for the linear model it showed curvature and possibly nonhomogeneous variance. The plots now look good: no curvature, good homogeneous spread, and no outliers.
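The back-transformation and the test of the slope against 3 described above can be checked by hand from the printed estimates. The following is a minimal Python sketch, not part of the original SAS program; the variable names and the function `predict_weight` are illustrative assumptions, and the numeric inputs are copied from the PROC REG output for the log model. It relies on the fact that an F test with one numerator degree of freedom equals the square of the corresponding t statistic.

```python
import math

# Values copied from the PROC REG output for the log-log model.
b0_log = 0.55219   # intercept on the log scale
b1 = 2.79854       # slope, i.e. the exponent of Dbh in the power model
se_b1 = 0.07959    # standard error of the slope (45 error df)

# Back-transform the intercept: antilog of the log-scale intercept.
b0 = math.exp(b0_log)

def predict_weight(dbh):
    """Predicted weight (lbs) from the back-transformed power model."""
    return b0 * dbh ** b1

# Test H0: slope = 3. With one numerator df, F is the square of t.
t_stat = (b1 - 3.0) / se_b1
f_stat = t_stat ** 2

print(f"back-transformed intercept: {b0:.4f}")
print(f"t = {t_stat:.3f}, F = {f_stat:.2f}")
print(f"predicted weight at Dbh = 8.0 in: {predict_weight(8.0):.1f} lbs")
```

The squared t statistic should agree with the F value in the Test 1 table up to rounding of the printed estimates.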
[Line-printer residual plot: RESIDUAL (vertical axis, -0.6 to 0.4) against Predicted Value of lweight (horizontal axis, 4.00 to 7.75). Symbol is value of ObsID.]

proc univariate data=next2 normal plot; var le;
   TITLE3 'Residual analysis (log model)'; run;
NOTE: The PROCEDURE UNIVARIATE printed page 9.
NOTE: PROCEDURE UNIVARIATE used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

EXST7015: Estimating tree weights from other morphometric variables
Simple linear regression on logarithms
Residual analysis (log model)

The UNIVARIATE Procedure
Variable: le (Residual)

                            Moments
N                          47    Sum Weights                47
Mean                        0    Sum Observations            0
Std Deviation      0.16865578    Variance           0.02844477
Skewness           -0.3773174    Kurtosis           0.45701405
Uncorrected SS     1.30845959    Corrected SS       1.30845959
Coeff Variation             .    Std Error Mean     0.02460097

Residuals from the transformed model were also tested for normality. The hypothesis of normality is not rejected for these results (Pr < W = 0.5634).
                   Tests for Normality
Test                  --Statistic---    -----p Value------
Shapiro-Wilk          W     0.979294    Pr < W      0.5634
Kolmogorov-Smirnov    D     0.128993    Pr > D      0.0483
Cramer-von Mises      W-Sq  0.069887    Pr > W-Sq  >0.2500
Anderson-Darling      A-Sq  0.396238    Pr > A-Sq  >0.2500

          Extreme Observations
------Lowest------      -----Highest-----
    Value    Obs           Value    Obs
-0.457352     18        0.227337      3
-0.337451     16        0.234177     37
-0.329823     19        0.267333     40
-0.263905      1        0.268304     12
-0.237736      5        0.363138     28

Stem Leaf          #
   3 6             1
   3
   2 77            2
   2 1133          4
   1 9             1
   1 0123          4
   0 556668        6
   0 013334        6
  -0 333332210     9
  -0 8             1
  -1 443000        6
  -1 5             1
  -2 40            2
  -2 6             1
  -3 43            2
  -3
  -4
  -4 6             1
     ----+----+----+----+
Multiply Stem.Leaf by 10**-1

[Line-printer box plot and normal probability plot of the residuals: vertical scale -0.475 to 0.375, normal quantiles -2 to +2.]
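The residuals listed under Extreme Observations can be tied back to the fitted log model for the rows that appear in the raw data print. The sketch below is an illustration in Python rather than SAS; the helper name `residual` is an assumption, and the inputs are the printed (rounded) estimates and data values, so agreement with the SAS residuals is only expected to about four decimal places.

```python
# Fitted log-log model from the PROC REG output: lweight = b0_log + b1 * ldbh
b0_log = 0.55219
b1 = 2.79854

def residual(lweight, ldbh):
    """Observed minus predicted log weight for one tree."""
    return lweight - (b0_log + b1 * ldbh)

# (Obs, lweight, ldbh) copied from the raw data print.
rows = [
    (1, 5.15906, 1.74047),   # ObsID a, listed among the lowest residuals
    (3, 6.70196, 2.11626),   # ObsID c, listed among the highest residuals
]

for obs, lweight, ldbh in rows:
    print(f"Obs {obs}: residual = {residual(lweight, ldbh):.4f}")
```

Observation 1 falls below the fitted line and observation 3 above it, matching their positions in the Lowest and Highest columns.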

This note was uploaded on 12/29/2011 for the course EXST 7015 taught by Professor Wang,j during the Fall '08 term at LSU.
