CHAPTER 12: MULTIPLE LINEAR REGRESSION & CORRELATION
12.1 MULTIPLE LINEAR REGRESSION AND CORRELATION ANALYSIS
As we mentioned in Chapter 7, we can use more than one independent variable to estimate the value of the dependent variable and, in this way, attempt to increase the accuracy of the estimate. This process is called multiple linear regression and correlation analysis. In fact, multiple linear regression analysis is merely an extension of simple linear regression analysis and is used for testing hypotheses about the relationship between a dependent variable (response variable) and two or more independent variables (predictors), and for prediction. For example, the selling price of a home may be modeled as a function of the number of rooms, the size of the surrounding lot, and the total square footage of the house. Also, we can use a customer's age, gender, income level, type of residence, etc., to predict how much they will spend on an automobile.
In general, the dependent variable is designated by Y, while the k quantitative independent variables are designated sequentially by X_1, X_2, …, and X_k. The general descriptive form of a multiple linear regression equation for the population is then given by the following formula:
Y_j = β_0 + β_1 X_j1 + β_2 X_j2 + β_3 X_j3 + … + β_k X_jk + ε_j
where:
Y_j = the value of the j-th observation of the variable Y, with j = 1, 2, …, n
β_0 = the population's regression constant
β_h = the population's regression coefficient for each independent variable X_h, where h = 1, 2, …, k
k = the number of independent variables
ε_j = (Greek letter epsilon) the random error term, or residual
Based on the sample data, the least-squares multiple linear regression equation is written as

ŷ = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + … + b_k x_k
The constants b_1, b_2, b_3, …, and b_k are called partial regression coefficients. They indicate the change in the estimated value of the dependent variable for a unit change in one of the independent variables, when the other independent variables are held constant. The constant b_0 is the y-intercept, the value of ŷ when all the x_h's are zero; and x_1, x_2, x_3, etc. are respectively the values of the independent variables X_1, X_2, X_3, etc.
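As a sketch of how the least-squares estimates b_0, b_1, …, b_k can be computed in practice, the example below fits the home-price model mentioned earlier using NumPy. The data set is hypothetical (the room counts, square footages, and prices are illustrative, not taken from the text), and only two predictors are used for brevity:

```python
import numpy as np

# Hypothetical sample: selling price (Y, in $1000s) modeled on
# number of rooms (x1) and square footage in hundreds (x2).
X = np.array([
    [6, 14],
    [7, 16],
    [5, 11],
    [8, 20],
    [6, 15],
    [7, 18],
], dtype=float)
y = np.array([210.0, 245.0, 180.0, 290.0, 215.0, 260.0])

# Prepend a column of ones so the intercept b0 is estimated
# along with the partial regression coefficients b1 and b2.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: b = [b0, b1, b2].
b, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Predicted selling price ŷ for a 7-room, 1,700-sq-ft house.
y_hat = b[0] + b[1] * 7 + b[2] * 17
```

Here b[1] estimates the change in price for one additional room with square footage held constant, mirroring the interpretation of a partial regression coefficient given above.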
Dr. LOHAKA – QBA 2305 CHAPT. 12: MULTIPLE REGRESSION & CORRELATION
Page 118