
$R^2$ is not useful for non-linear fits. $R^2$ is really only useful for linear fits in which the estimated regression line is free to have a non-zero intercept. The reason is that $R^2$ is really a comparison between two models. For example, refer back to the length-weight relationship examined earlier. In the linear fit case, the two models being compared are

$$\log(\text{weight}) = \log(b_0) + \text{error}$$

vs.

$$\log(\text{weight}) = \log(b_0) + b_1 \log(\text{length}) + \text{error}$$

and so $R^2$ is a measure of the improvement provided by the regression line. [In actual fact, it is a one-to-one transform of the test that $\beta_1 = 0$, so why not use that statistic directly?]

In the non-linear fit case, the two models being compared are:

$$\text{weight} = 0 + \text{error}$$

vs.

$$\text{weight} = b_0 \, \text{length}^{b_1} + \text{error}$$

The model weight = 0 is silly, and so $R^2$ is silly. Hence, the $R^2$ values reported are really all for linear fits; it is just that sometimes the actual linear fit is hidden (for example, when a power law is fit as a straight line on the log scale).

Not defined in generalized least squares. There are more complex fits that don't assume equal variance around the regression line. In these cases, $R^2$ is again not defined.

© 2015 Carl James Schwarz 954 2015-08-20
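To make the "comparison between two models" concrete, here is a minimal Python sketch (standard library only) for the linear fit on the log scale. The length and weight values are hypothetical, invented only for illustration, and the helper names `ols_line` and `sse` are our own. It computes $R^2$ against the intercept-only model and checks that the F statistic for $\beta_1 = 0$ is the one-to-one transform $F = (n-2)R^2/(1-R^2)$. As an analogue of the "weight = 0" baseline (applied here to the log-scale linear fit, not to a non-linear fit), it also computes $R^2$ against a zero baseline.

```python
import math


def ols_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def sse(y, fitted):
    """Error sum of squares of a model with the given fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical length and weight values, for illustration only.
length = [10, 12, 15, 18, 22, 25]
weight = [11, 20, 36, 60, 115, 165]
x = [math.log(v) for v in length]
y = [math.log(v) for v in weight]
n = len(y)

# Full model: log(weight) = log(b0) + b1*log(length) + error
log_b0, b1 = ols_line(x, y)
sse_full = sse(y, [log_b0 + b1 * xi for xi in x])

# Reduced model: log(weight) = intercept + error (the comparison behind the usual R^2)
ybar = sum(y) / n
sse_mean = sse(y, [ybar] * n)
r2 = 1 - sse_full / sse_mean

# F statistic for H0: beta_1 = 0, computed from the two sums of squares ...
f_from_ss = (sse_mean - sse_full) / (sse_full / (n - 2))
# ... is a one-to-one (increasing) function of R^2.
f_from_r2 = (n - 2) * r2 / (1 - r2)

# Zero baseline: log(weight) = 0 + error, the kind of model the text calls silly.
sse_zero = sse(y, [0.0] * n)
r2_vs_zero = 1 - sse_full / sse_zero

print(f"R^2 = {r2:.4f}; F from SS = {f_from_ss:.2f}; F from R^2 = {f_from_r2:.2f}")
print(f"R^2 against the zero baseline = {r2_vs_zero:.4f}")
```

Because $\sum y_i^2 = \sum (y_i - \bar{y})^2 + n\bar{y}^2$, the zero-baseline value can never be smaller than the usual $R^2$, whatever the data.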
CHAPTER 14. CORRELATION AND SIMPLE LINEAR REGRESSION

Cannot be used with different transformations of $Y$. $R^2$ cannot be used to compare models that are fit to different transformations of the $Y$ variable. For example, many people try fitting a model to $Y$ and to $\log(Y)$ and choose the model with the higher $R^2$. This is not appropriate because the response is measured on different scales in the two fits, so the D terms are no longer comparable between the two models.

Cannot be used for non-nested models. $R^2$ cannot be used to compare models with different sets of $X$ variables unless one model is nested within the other (i.e., all of the $X$ variables in the smaller model also appear in the larger model). So using $R^2$ to compare a model with $X_1$, $X_3$, and $X_5$ to a model with $X_1$, $X_2$, and $X_4$ is not appropriate, as these two models are not nested. In these cases, AIC should be used to select among models.

## 14.5 A no-intercept model: Fulton's Condition Factor K

It is possible to fit a regression line that has an intercept of 0, i.e., one that goes through the origin. Most computer packages have an option to suppress the fitting of the intercept. The biggest 'problem' lies in interpreting some of the output: some of the statistics produced are misleading for these models. As this varies from package to package, please seek advice when fitting such models.

The following is an example of where such a model may be sensible. Not all fish within a lake are identical. How can a single summary measure be developed to represent the condition of fish within a lake? In general, the relationship between fish weight and length follows a power law:

$$W = aL^b$$

where $W$ is the observed weight, $L$ is the observed length, and $a$ and $b$ are coefficients relating length to weight. The usual assumption is that heavier fish of a given length are in better condition than lighter fish. Condition indices are a popular summary measure of the condition of the population.
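The no-intercept fit described at the start of this section can be sketched in a few lines of Python. This is a generic illustration, not the condition-factor calculation itself; the data and the function names `fit_through_origin` and `r2_pair` are hypothetical. The least-squares slope through the origin is $b_1 = \sum x_i y_i / \sum x_i^2$. The sketch then computes $R^2$ two ways: against the baseline $y = 0$ (the "uncentered" version, which is what, for example, R's `summary.lm` and statsmodels report when the intercept is omitted) and against the usual intercept-only baseline. This is one way the output of a no-intercept fit can mislead.

```python
def fit_through_origin(x, y):
    """Least-squares slope for the no-intercept model y = b1*x + error."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx


def r2_pair(x, y, b1):
    """Return (uncentered, centered) R^2 for the fitted line y_hat = b1*x."""
    n = len(y)
    ybar = sum(y) / n
    sse = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_about_zero = sum(yi * yi for yi in y)           # baseline model: y = 0 + error
    ss_about_mean = sum((yi - ybar) ** 2 for yi in y)  # baseline model: y = b0 + error
    return 1 - sse / ss_about_zero, 1 - sse / ss_about_mean


# Hypothetical data, chosen so that y barely changes with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 3.1, 3.0, 3.2, 2.8]

b1 = fit_through_origin(x, y)
uncentered, centered = r2_pair(x, y, b1)
print(f"slope = {b1:.4f}")
print(f"R^2 vs y = 0: {uncentered:.3f}; R^2 vs mean of y: {centered:.1f}")
```

With these values the uncentered $R^2$ is about 0.81, while the centered value is far below zero, because the line forced through the origin fits worse than simply using the mean of $y$.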