• $R^2$ is not useful for non-linear fits. $R^2$ is really only useful for linear fits with the estimated
regression line free to have a non-zero intercept. The reason is that $R^2$ is really a comparison
between two types of models. For example, refer back to the length-weight relationship examined
earlier.
In the linear fit case, the two models being compared are
$$\log(\text{weight}) = \log(b_0) + \text{error}$$
vs.
$$\log(\text{weight}) = \log(b_0) + b_1 \log(\text{length}) + \text{error}$$
and so $R^2$ is a measure of the improvement with the regression line. [In actual fact, it is a 1-1
transform of the test that $\beta_1 = 0$, so why not use that statistic directly?] In the non-linear fit case,
the two models being compared are:
$$\text{weight} = 0 + \text{error}$$
vs.
$$\text{weight} = b_0 \times \text{length}^{b_1} + \text{error}$$
The model $\text{weight} = 0$ is silly and so $R^2$ is silly.
Hence, the $R^2$ values reported are really all for linear fits; it is just that sometimes the actual linear
fit is hidden.
• Not defined in generalized least squares. There are more complex fits that don’t assume equal
variance around the regression line. In these cases, $R^2$ is again not defined.
© 2015 Carl James Schwarz, 2015-08-20

CHAPTER 14. CORRELATION AND SIMPLE LINEAR REGRESSION
• Cannot be used with different transformations of $Y$. $R^2$ cannot be used to compare models
that are fit to different transformations of the $Y$ variable. For example, many people try fitting a
model to $Y$ and to $\log(Y)$ and choose the model with the highest $R^2$. This is not appropriate as
the $D$ terms are no longer comparable between the two models.
• Cannot be used for non-nested models. $R^2$ cannot be used to compare models with different
sets of $X$ variables unless one model is nested within another model (i.e. all of the $X$ variables in
the smaller model also appear in the larger model). So using $R^2$ to compare a model with $X_1$, $X_3$,
and $X_5$ to a model with $X_1$, $X_2$, and $X_4$ is not appropriate as these two models are not nested. In
these cases, AIC should be used to select among models.
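The claim that $R^2$ compares an intercept-only model against the regression line, and that it is a 1-1 transform of the test that $\beta_1 = 0$, can be checked numerically. The following is a minimal sketch using only the Python standard library. The length and weight values are made up for illustration, and the helper names (`sse`, `fit_line`) are hypothetical, not taken from any package.

```python
import math

def sse(y, fitted):
    """Sum of squared differences between observed and fitted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, Sxx)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1, sxx

# Hypothetical lengths and weights, invented for this illustration only.
log_length = [math.log(v) for v in (20, 25, 30, 35, 40, 45)]
log_weight = [math.log(v) for v in (95, 170, 310, 470, 720, 1010)]
n = len(log_length)

# Model 1: log(weight) = log(b0) + error  (fitted value is the mean).
mean_lw = sum(log_weight) / n
sse_null = sse(log_weight, [mean_lw] * n)

# Model 2: log(weight) = log(b0) + b1 * log(length) + error.
b0, b1, sxx = fit_line(log_length, log_weight)
sse_line = sse(log_weight, [b0 + b1 * x for x in log_length])

# R^2 is the proportional reduction in SSE from model 1 to model 2.
r2 = 1 - sse_line / sse_null

# t-statistic for the test that the slope is zero.
s2 = sse_line / (n - 2)
t_stat = b1 / math.sqrt(s2 / sxx)

# The same information recovered from R^2 alone: F = (n - 2) R^2 / (1 - R^2).
f_from_r2 = (n - 2) * r2 / (1 - r2)

print(f"R^2 = {r2:.4f}, t^2 = {t_stat ** 2:.4f}, F from R^2 = {f_from_r2:.4f}")
```

For a straight-line fit with one predictor, the squared $t$-statistic for the slope and the $F$ value computed from $R^2$ agree up to rounding, which is the sense in which $R^2$ carries no information beyond the test itself.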
14.5 A no-intercept model: Fulton’s Condition Factor $K$
It is possible to fit a regression line that has an intercept of 0, i.e., goes through the origin. Most computer
packages have an option to suppress the fitting of the intercept.
The biggest ‘problem’ lies in interpreting some of the output – some of the statistics produced are
misleading for these models. As this varies from package to package, please seek advice when fitting
such models.
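One way output can mislead is that an $R^2$ for a line through the origin depends on which baseline model it is compared against. The following is a minimal sketch with made-up data that fits the no-intercept slope by least squares and computes $R^2$ against two baselines. Which of the two a given package reports is not assumed here; check its documentation.

```python
# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares slope for the no-intercept model y = b1 * x + error.
b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
residuals = [yi - b1 * xi for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in residuals)

# Two possible baseline models, giving two different "R^2" values.
ybar = sum(y) / len(y)
ss_about_mean = sum((yi - ybar) ** 2 for yi in y)  # baseline: y = mean + error
ss_about_zero = sum(yi ** 2 for yi in y)           # baseline: y = 0 + error

r2_vs_mean = 1 - sse / ss_about_mean
r2_vs_zero = 1 - sse / ss_about_zero

print(f"slope = {b1:.4f}")
print(f"R^2 vs mean-only baseline = {r2_vs_mean:.4f}")
print(f"R^2 vs zero baseline      = {r2_vs_zero:.4f}")
```

The zero-baseline version can never be smaller than the mean-baseline version, because the sum of squares about zero is at least as large as the sum of squares about the mean. A high reported $R^2$ for a no-intercept fit may therefore reflect only the choice of baseline.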
The following is an example of where such a model may be sensible.
Not all fish within a lake are identical. How can a single summary measure be developed to represent
the condition of fish within a lake?
In general, the relationship between fish weight and length follows a power law:
$$W = aL^b$$
where $W$ is the observed weight, $L$ is the observed length, and $a$ and $b$ are coefficients relating length
to weight. The usual assumption is that heavier fish of a given length are in better condition than
lighter fish. Condition indices are a popular summary measure of the condition of the population.