(d) Analyze the residuals from this multiple regression. Are there
any patterns of interest?
(e) One of the owners is troubled by the equation because the
intercept is not zero (that is, no items sold should result in $0
gross sales). Explain to this owner why this isn’t a problem.
11.32 Architectural firm billings. A summary of firms engaged in commercial architecture in the Indianapolis, Indiana, area provides firm characteristics including total annual billing and the number of architects, engineers, and staff employed in the firm.¹⁰ Consider developing a model to predict total billing. Data file: ARCHITECT
(a) Using numerical and graphical summaries, describe the distribution of total billing and the number of architects, engineers, and staff.
(b) For each of the 6 pairs of variables, use graphical and numerical summaries to describe the relationship.
(c) Carry out a multiple regression. Report the fitted regression equation and the value of the regression standard error $s$.
(d) Analyze the residuals from the multiple regression. Are there
any concerns?
(e) The firm HCO did not report its total billing but employs 3 architects, 1 engineer, and 17 staff members. What is the predicted total billing for this firm?
11.2 Inference for Multiple Regression
To move from using multiple regression for data analysis to inference in the multiple
regression setting, we need to make some assumptions about our data. These assumptions
are summarized in the form of a statistical model. As with all the models that we have
studied, we do not require that the model be exactly correct. We only require that it be
approximately true and that the data do not severely violate the assumptions.
Recall that the simple linear regression model assumes that the mean of the response variable $y$ depends on the explanatory variable $x$ according to a linear equation

$$\mu_y = \beta_0 + \beta_1 x$$

For any fixed value of $x$, the response $y$ varies Normally around this mean and has a standard deviation $\sigma$ that is the same for all values of $x$.
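As a rough illustration of what the simple linear regression model says, the sketch below computes the mean response at a few values of x and the range that holds roughly the middle 95% of responses (mean plus or minus two standard deviations, by the 68-95-99.7 rule for Normal distributions). The parameter values `BETA0`, `BETA1`, and `SIGMA` and the function names are hypothetical, chosen only for this example; they do not come from any data set in the text.

```python
# Hypothetical parameter values, chosen only for illustration.
BETA0 = 2.0   # intercept beta_0
BETA1 = 0.5   # slope beta_1
SIGMA = 1.5   # standard deviation of y about its mean, the same for every x


def mean_response(x):
    """Population mean of y at a given x: mu_y = beta_0 + beta_1 * x."""
    return BETA0 + BETA1 * x


def middle_95_range(x):
    """Approximate range holding the middle 95% of responses at this x.

    Uses the 68-95-99.7 rule: mean plus or minus 2 standard deviations.
    """
    mu = mean_response(x)
    return (mu - 2 * SIGMA, mu + 2 * SIGMA)


for x in (0, 10, 20):
    low, high = middle_95_range(x)
    print(f"x = {x:>2}: mean = {mean_response(x):.1f}, "
          f"about 95% of y in [{low:.1f}, {high:.1f}]")
```

The mean shifts with x, but the width of each range stays at 4σ, which is what the assumption of a common standard deviation amounts to.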
In the multiple regression setting, the response variable $y$ depends on not one but $p$ explanatory variables, denoted by $x_1, x_2, \ldots, x_p$. The mean response is a linear function of the explanatory variables:

$$\mu_y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
This expression is the population regression equation. We do not observe the mean response, because the observed values of $y$ vary about their means. We can think of subpopulations of responses, each corresponding to a particular set of values for all the explanatory variables $x_1, x_2, \ldots, x_p$. In each subpopulation, $y$ varies Normally with a mean given by the population regression equation. The regression model assumes that the standard deviation $\sigma$ of the responses is the same in all subpopulations.
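The idea of subpopulations can be made concrete with a small simulation. The sketch below assumes p = 3 explanatory variables and made-up values for the coefficients and for σ (`BETAS` and `SIGMA` are hypothetical, not estimates from any data in the text). It draws many responses at two fixed sets of x-values and compares the sample mean and standard deviation in each subpopulation with the values the model specifies.

```python
import random
import statistics

# Hypothetical population parameters (beta_0, beta_1, beta_2, beta_3), p = 3.
BETAS = [1.0, 2.0, -0.5, 0.25]
SIGMA = 2.0  # assumed to be the same in every subpopulation


def mean_response(xs):
    """mu_y = beta_0 + beta_1*x_1 + ... + beta_p*x_p for one set of x-values."""
    if len(xs) != len(BETAS) - 1:
        raise ValueError("expected one x-value per slope coefficient")
    return BETAS[0] + sum(b * x for b, x in zip(BETAS[1:], xs))


def sample_subpopulation(xs, n, rng):
    """Draw n responses from the Normal subpopulation defined by xs."""
    mu = mean_response(xs)
    return [rng.gauss(mu, SIGMA) for _ in range(n)]


rng = random.Random(42)  # fixed seed so repeated runs give the same draws
for xs in [(2, 1, 4), (5, 0, 8)]:
    ys = sample_subpopulation(xs, 5000, rng)
    print(f"x = {xs}: mu_y = {mean_response(xs):.2f}, "
          f"sample mean = {statistics.mean(ys):.2f}, "
          f"sample sd = {statistics.stdev(ys):.2f}")
```

Individual draws scatter around the mean, so the population regression equation itself is never observed directly. With many draws, the sample mean in each subpopulation should land close to its own μ_y while the sample standard deviations should both be close to the common σ.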

Multiple linear regression model
To form the multiple regression model, we combine the population regression equation with assumptions about the form of the variation of the observations about their mean.