are in the model. Such tests may lead to the exclusion of certain variables from the model. On the other hand, other variables, such as horsepower x₄, may be important and should be included. Then the model needs to be extended so that its predictive capability is increased.
The model in Eq. (1.9) is quite simple and should provide a useful starting point for our modeling. Of course, we do not know the values of the model coefficients, nor do we know whether the functional representation is appropriate. For that we need data. One must keep in mind that there are only 38 observations and that one cannot consider models that contain too many unknown parameters. A reasonable strategy starts with simple parsimonious models such as the one specified here and then checks whether this representation is capable of explaining the main features of the data. A parsimonious model is simple in its structure and economical in terms of the number of unknown parameters that need to be estimated from data, yet capable of representing the key aspects of the relationship. We will say more on model building and model checking in subsequent chapters. The introduction in this chapter is only meant to raise these issues.
1.3 A GENERAL MODEL
In all of our examples, we have looked at situations in which a single response variable y is modeled as

    y = μ + ε                                        (1.10a)

The deterministic component μ is written as

    μ = β₀ + β₁x₁ + β₂x₂ + · · · + βₚxₚ              (1.10b)
where x₁, x₂, . . . , xₚ are p explanatory variables. We assume that the explanatory variables are “fixed”, that is, measured without error. The parameter
βᵢ (i = 1, 2, . . . , p) is interpreted as the change in μ when changing xᵢ by one unit while keeping all other explanatory variables the same.
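As a small numeric illustration of this interpretation (with made-up coefficient values, since the true ones are unknown), changing one explanatory variable by one unit while holding the others fixed changes μ by exactly the corresponding coefficient:

```python
# Hypothetical coefficients, purely for illustration: increasing x_2 by one
# unit, holding x_1 and x_3 fixed, changes mu by beta_2.
def mu(x, beta0, betas):
    """Deterministic component: mu = beta0 + sum_i beta_i * x_i."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))

beta0, betas = 2.0, [0.5, -1.2, 3.0]   # assumed values, not estimates
x = [10.0, 4.0, 7.0]
x_bumped = [10.0, 5.0, 7.0]            # x_2 increased by one unit

delta = mu(x_bumped, beta0, betas) - mu(x, beta0, betas)
print(delta)   # equals beta_2 = -1.2, up to floating-point rounding
```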
The random component ε is a random variable with zero mean, E(ε) = 0, and variance V(ε) = σ² that is constant for all cases and that does not depend on the values of x₁, x₂, . . . , xₚ. Furthermore, the errors for different cases, εᵢ and εⱼ, are assumed independent. Since the response y is the sum of a deterministic and a random component, we find that E(y) = μ and V(y) = σ².
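These assumptions are easy to check by simulation. The following sketch (with assumed parameter values, not estimates from any data set) generates many responses for one fixed case and confirms that the sample mean and variance are close to μ and σ²:

```python
import numpy as np

# A minimal simulation of the model y = mu + eps in Eqs. (1.10a)-(1.10b),
# with assumed coefficients and fixed explanatory variables.
rng = np.random.default_rng(0)

beta0, beta = 1.0, np.array([2.0, -0.5])   # assumed parameter values
x = np.array([3.0, 4.0])                   # one fixed case: x1 = 3, x2 = 4
mu = beta0 + beta @ x                      # deterministic component, here 5.0
sigma = 0.8                                # error standard deviation

# Independent errors with E(eps) = 0 and V(eps) = sigma^2.
eps = rng.normal(loc=0.0, scale=sigma, size=100_000)
y = mu + eps

print(y.mean())   # close to E(y) = mu = 5.0
print(y.var())    # close to V(y) = sigma^2 = 0.64
```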
We refer to the model in Eq. (1.10) as linear in the parameters. To explain the idea of linearity more fully, consider the following four models with deterministic components:
    i.   μ = β₀ + β₁x
    ii.  μ = β₀ + β₁x₁ + β₂x₂                        (1.11)
    iii. μ = β₀ + β₁x + β₂x²
    iv.  μ = β₀ + β₁ exp(β₂x)
Models (i)–(iii) are linear in the parameters since the derivatives of μ with respect to the parameters βᵢ, ∂μ/∂βᵢ, do not depend on the parameters. Model (iv) is nonlinear in the parameters since the derivatives ∂μ/∂β₁ = exp(β₂x) and ∂μ/∂β₂ = β₁x exp(β₂x) depend on the parameters.
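This derivative criterion can also be checked numerically. The sketch below (using assumed parameter values for illustration) approximates ∂μ/∂β₁ by a central difference for models (ii) and (iv): for the linear model the derivative is the same at any parameter values, while for the nonlinear model it changes with β₂.

```python
import math

# Finite-difference check of linearity in the parameters for models (ii)
# and (iv) of Eq. (1.11).

def mu_ii(b0, b1, b2, x1, x2):
    return b0 + b1 * x1 + b2 * x2       # model (ii): linear in the betas

def mu_iv(b0, b1, b2, x):
    return b0 + b1 * math.exp(b2 * x)   # model (iv): nonlinear in the betas

def d_db1(mu, b0, b1, b2, *xs, h=1e-6):
    """Central-difference approximation of d(mu)/d(beta1)."""
    return (mu(b0, b1 + h, b2, *xs) - mu(b0, b1 - h, b2, *xs)) / (2 * h)

x1, x2, x = 2.0, 3.0, 1.0
# Model (ii): d(mu)/d(beta1) = x1 = 2.0, no matter what the betas are.
print(d_db1(mu_ii, 0.0, 1.0, 1.0, x1, x2))
print(d_db1(mu_ii, 5.0, -3.0, 7.0, x1, x2))
# Model (iv): d(mu)/d(beta1) = exp(beta2 * x), which changes with beta2.
print(d_db1(mu_iv, 0.0, 1.0, 0.0, x))   # exp(0) = 1
print(d_db1(mu_iv, 0.0, 1.0, 1.0, x))   # exp(1), about 2.718
```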
The model in Eqs. (1.10a) and (1.10b) can be extended in many different ways. First, the functional relationship may be nonlinear, and we may consider a model such as that in Eq. (1.11iv) to describe the nonlinear pattern. Second, we may suppose that V(y) = σ²(x) is a function of the explanatory variables.
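A quick simulation illustrates this second extension. Here we assume, purely for illustration, that the error standard deviation grows proportionally with x, so that V(y) = σ²(x) = (0.5x)²; the spread of the responses is then larger at larger values of x:

```python
import numpy as np

# Sketch of a model whose error variance depends on the explanatory
# variable: sigma(x) = 0.5 * x is an assumption made for this example.
rng = np.random.default_rng(1)

beta0, beta1 = 1.0, 2.0
n = 200_000
x_small, x_large = 1.0, 4.0

y_small = beta0 + beta1 * x_small + rng.normal(0, 0.5 * x_small, n)
y_large = beta0 + beta1 * x_large + rng.normal(0, 0.5 * x_large, n)

print(y_small.var())   # near (0.5 * 1)^2 = 0.25
print(y_large.var())   # near (0.5 * 4)^2 = 4.0
```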