the income variable to explain family consumption expenditure. But unfortunately, information on family
wealth generally is not available. Therefore,
we may be forced to omit the wealth variable from our model despite its
great theoretical relevance in ex
β2^3 Xi + ui
2.8. What is meant by an intrinsically linear regression model? If β2 in exercise 2.7d were 0.8, would it be a
linear or nonlinear regression model?
*2.9. Consider the following nonstochastic models (i.e., models without the stochastic error term). A
32 PART ONE: SINGLE-EQUATION REGRESSION MODELS
16
Subtract from the current year's CPI the CPI from the previous year, divide the difference by the previous
year's CPI, and multiply the result by 100. Thus, the inflation rate for
Canada for 1974 is [(45.240
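The calculation described in this footnote can be sketched in a few lines of Python. The function names and the CPI figures below are hypothetical, chosen only to illustrate the arithmetic. They are not taken from the text's data table.

```python
def inflation_rate(cpi_current, cpi_previous):
    """Percentage change in the CPI from the previous year to the current year."""
    return (cpi_current - cpi_previous) / cpi_previous * 100


def inflation_series(cpi_values):
    """Year-over-year inflation rates for a chronologically ordered CPI list."""
    return [
        inflation_rate(current, previous)
        for previous, current in zip(cpi_values, cpi_values[1:])
    ]


# Hypothetical CPI readings for three consecutive years (not from the text).
example_cpi = [100.0, 105.0, 110.25]
rates = inflation_series(example_cpi)
```

A series of n CPI values yields n - 1 inflation rates, because the first year has no previous year to compare against.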
Companies, 2004
CHAPTER TWO: TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS 47
our theory is not strong enough to suggest what other variables might be
included, why introduce more variables? Let ui represent all other variables.
Of course, we should
2.3 THE MEANING OF THE TERM LINEAR
Since this text is concerned primarily with linear models like (2.2.2), it is essential to know what the term
linear really means, for it can be interpreted
in two different ways.
Linearity in the Variables
The first and
For an illuminating account, see Albert T. Somers, The U.S. Economy Demystified: What
the Major Economic Statistics Mean and their Significance for Business, D.C. Heath, Lexington,
Mass., 1985.
13
In the social sciences too sometimes one can have a control
time period shown in the table?
TABLE 1.3 EXCHANGE RATES FOR SEVEN COUNTRIES: 1977-1998
Year Canada France Germany Japan Sweden Switzerland U.K.
1977 1.063300 4.916100 2.323600 268.6200 4.480200 2.406500 1.744900
1978 1.140500 4.509100 2.009700 210.3900 4.
14
We will explore these data
(with additional explanatory variables) in Chapter 3 (Example 3.3, p. 91).
Plotting the (conditional) mean wage against education, we obtain the
picture in Figure 2.6. The regression curve in the figure shows how mean
wages v
therefore difficult to study the interfirm differences on these items.
Because of all these and many other problems, the researcher should
always keep in mind that the results of research are only as good as the
quality of the data. Therefore, if in given s
For instance, in assessing the impact of obesity on blood pressure, the researcher would want to collect
data while holding constant the eating, smoking, and drinking habits of the people in order to minimize
the influence of
these variables on blood pres
FIGURE 2.5 Sample and population regression lines.
the PRF based on the SRF is at best an approximate one. This approximation is shown diagrammatically
in Figure 2.5.
For X = Xi, we have one (sample) observation Y = Yi. In terms of the
SRF, th
is the difference between the stochastic error term and the residual, ûi?
2.4. Why do we need regression analysis? Why not simply use the mean value
of the regressand as its best value?
2.5. What do we mean by a linear regression model?
2.6. Determine whether
conditional expectation function (CEF) or population regression function (PRF) or population regression
(PR) for short. It states merely that
the expected value of the distribution of Y given Xi is functionally related to Xi.
In simple terms, it tells how the mean o
of ui (conditional upon the given X's) are zero.
From the previous discussion, it is clear that (2.2.2) and (2.4.2) are equivalent
forms if E(ui | Xi) = 0.
9
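The equivalence can be spelled out in one step. The sketch below assumes that the stochastic specification writes Yi as its conditional mean plus ui, and that (2.2.2) is the linear form E(Y | Xi) = β1 + β2 Xi. Neither equation is reproduced in this passage, so both forms are assumptions here.

```latex
\begin{align*}
Y_i &= E(Y \mid X_i) + u_i \\
E(Y_i \mid X_i) &= E\bigl[E(Y \mid X_i) \mid X_i\bigr] + E(u_i \mid X_i) \\
                &= E(Y \mid X_i) + E(u_i \mid X_i)
\end{align*}
% E(Y | X_i) is a constant once X_i is fixed, so its expectation is itself.
% Because E(Y_i | X_i) and E(Y | X_i) are the same quantity, the last line
% can hold only if
\[
E(u_i \mid X_i) = 0 .
\]
```

With E(Y | Xi) = β1 + β2 Xi, the first line becomes Yi = β1 + β2 Xi + ui, so the two forms agree exactly when the conditional mean of ui is zero.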
But the stochastic specification (2.4.2) has the
advantage that it clearly shows that there are other varia
are not likely to be the same.
Now, analogously to the PRF that underlies the population regression
line, we can develop the concept of the sample regression function (SRF)
to represent the sample regression line. The sample counterpart of (2.2.2)
may be w
FIGURE 2.3 Linear-in-parameter functions.
TABLE 2.3 LINEAR REGRESSION MODELS
                               Model linear in variables?
Model linear in parameters?    Yes     No
Yes                            LRM     LRM
No                             NLRM    NLRM
Note: LRM = linear regression model
      NLRM = nonlinear regression model
2.4 STOCHAS
to fit the scatters reasonably well: SRF1 is based on the first sample, and
SRF2 is based on the second sample. Which of the two regression lines represents the true population
regression line? If we avoid the temptation of
looking at Figure 2.1,
8
See App. A for a brief discussion of the properties of the expectation operator E. Note that
E(Y | Xi), once the value of Xi is fixed, is a constant.
9
As a matter of fact, in the method of
5
In the present example the PRL is a straight line, but it could be a curve (see Figure 2.3).
say, $140, we get the answer $101 (the conditional mean). To put it differently, if we ask the question,
What is the best (mean) prediction of weekly
expenditur
takes the value 3. Therefore, E(Y | X = 3) = β1 + 9β2, which is obviously linear in β1 and β2. All the models
shown in Figure 2.3 are thus linear regression models, that is, models linear in the parameters.
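Linearity in the parameters can be checked numerically. The sketch below fixes X at 3 and tests whether the conditional mean is additive in (β1, β2). The function names and parameter values are hypothetical. The second function, in which β2 enters squared, is included only as a contrast case.

```python
def mean_quadratic_in_x(beta1, beta2, x):
    """E(Y | X = x) = beta1 + beta2 * x**2: nonlinear in x, linear in the betas."""
    return beta1 + beta2 * x**2


def mean_squared_parameter(beta1, beta2, x):
    """Contrast case: beta2 enters squared, so the model is nonlinear in beta2."""
    return beta1 + beta2**2 * x


def is_additive_in_parameters(model, x, a=(1.0, 2.0), b=(0.5, -3.0)):
    """Check model(a + b) == model(a) + model(b) at a fixed x.

    Additivity is one necessary property of a function that is linear in its
    parameters; the parameter vectors a and b are arbitrary test values.
    """
    combined = model(a[0] + b[0], a[1] + b[1], x)
    separate = model(a[0], a[1], x) + model(b[0], b[1], x)
    return abs(combined - separate) < 1e-12


linear_case = is_additive_in_parameters(mean_quadratic_in_x, x=3)
nonlinear_case = is_additive_in_parameters(mean_squared_parameter, x=3)
```

With X held at 3, the first function reduces to β1 + 9β2 and passes the check. The contrast function fails it, because squaring β2 breaks additivity.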
Now consider the model E(Y | Xi) = β1 + β2^2 Xi. Now suppose X = 3; then
Table 2.1 we have given the mean, or average, weekly consumption expenditure corresponding to each
of the 10 levels of income. Thus, corresponding to the weekly income level of $80, the mean
consumption expenditure is
$65, while corresponding to the incom
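The conditional means described here are simply group averages of expenditure within each income level. The sketch below shows the computation on made-up observations. The numbers are illustrative and are not claimed to be the entries of Table 2.1, although the $80 group is chosen to average $65, as stated in the text.

```python
from collections import defaultdict


def conditional_means(observations):
    """Return {x: mean of y over all observations with that x}."""
    groups = defaultdict(list)
    for income, expenditure in observations:
        groups[income].append(expenditure)
    return {
        income: sum(values) / len(values)
        for income, values in sorted(groups.items())
    }


# Hypothetical (weekly income, weekly consumption expenditure) pairs.
sample = [
    (80, 55), (80, 60), (80, 65), (80, 70), (80, 75),
    (100, 65), (100, 70), (100, 74), (100, 80), (100, 85), (100, 88),
]
means = conditional_means(sample)
```

Each entry of `means` plays the role of E(Y | X = Xi) for one income level. Plotting these values against income would trace out the population regression line if the data were the full population.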
2.8 SUMMARY AND CONCLUSIONS
1. The key concept underlying regression analysis is the concept of
the conditional expectation function (CEF), or population regression
function (PRF). Our objective