ECON 301 - Introduction to Econometrics I
METU - Department of Economics February 2014 LECTURE 01
INTRODUCTION AND ASSUMPTIONS OF
CLASSICAL LINEAR REGRESSION MODEL (CLRM)
Outline of today’s lecture:
1. What is Econometrics?
2. Types of Econometrics
3. Simple Linear Regression Model
4. Linearity Issue
5. Stochastic Nature of Linear Regression Model
6. Sources of Disturbance Term
7. Assumptions of Classical Linear Regression Model (CLRM)

1. What is Econometrics?
Literally interpreted, econometrics means “economic measurement.” Although measurement is an important part of econometrics, the
scope of econometrics is much broader, as can be seen from the
following quotations:
o . . . econometrics may be defined as the quantitative analysis of actual economic
phenomena based on the concurrent development of theory and observation,
related by appropriate methods of inference.1
o Econometrics may be defined as the social science in which the tools of economic
theory, mathematics, and statistical inference are applied to the analysis of
economic phenomena.2
o Econometrics is concerned with the empirical determination of economic laws.3

1 P. A. Samuelson, T. C. Koopmans, and J. R. N. Stone, “Report of the Evaluative Committee for Econometrica,” Econometrica, vol. 22, no. 2, April 1954, pp. 141–146.
2 Arthur S. Goldberger, Econometric Theory, John Wiley & Sons, New York, 1964, p. 1.
3 H. Theil, Principles of Econometrics, John Wiley & Sons, New York, 1971, p. 1.
Instructor: Dr. Ozan ERUYGUR
Lecture Notes e-mail: [email protected]

2. Types of Econometrics
Econometrics may be divided into two broad categories: theoretical econometrics and applied econometrics. In each category, one can approach the subject in the classical (frequentist) or Bayesian tradition. In this course, the emphasis is on the classical approach.

3. Simple Linear Regression Model
In econometrics we deal exclusively with stochastic relations.
o Stochastic is a term for random or uncertain.
o It is the opposite of deterministic.
The simplest form of stochastic relation between two variables X and Y is called a simple linear regression model.
o $Y_t = \beta_0 + \beta_1 X_t + u_t$, $t = 1, \dots, n$
where
Y is the dependent variable,
X is the independent (explanatory) variable4,
u is the stochastic disturbance term, or error term,
$\beta_0$ and $\beta_1$ are the regression parameters, which are unknown.
Subscript t refers to the t-th observation.
4 The terms regressor and covariate are also used for X; Y is also called the regressand.

The values of the variables X and Y are observable. However, the values of u are not observable.
Observations on X and Y can be made over time, in which case we
speak of having “time series” data.
o For example we may have data on Turkey’s GDP over 30
years (data collected over a period of time).
Or they can be made over individuals or groups of individuals, firms or groups of firms, countries or groups of countries, objects, geographical areas, etc., in which case we speak of having “cross-section data”.
o For example we may have data on several countries’ GDP for one year (data collected at one point in time).
Hence, the subscript t may refer to the t-th year (quarter, month, day, etc.) or to the t-th individual or group (such as countries: Turkey, Germany, France, USA, Japan, etc.).
Data of both kinds can be combined to obtain “pooled time series and cross-section data” or simply “pooled data”.
o For example we may have data on 20 countries’ GDP over 30
years.
o There is a special type of pooled data, the panel or
longitudinal data, also called micropanel data, in which the
same cross-sectional unit (say, a family or firm) is surveyed
over time (Gujarati, p.23).
Hence, panel data is a special case of “pooled data”. The phrase "pooled data" is more general. It implies that the data have both a time-series and a cross-sectional dimension, but it includes the case where the individuals (or sections) are not the same.
o For example, some individuals drop out over time and are replaced by others.

4. Linearity Issue
$Y_t = \beta_0 + \beta_1^3 X_t + u_t$ is nonlinear in the parameter $\beta_1$. This model is an example of a nonlinear (in the parameter) regression model.

5. Stochastic Nature of Linear Regression Model
The stochastic nature of the regression model implies that for every
value of X there is a whole probability distribution of values of Y.
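The idea that one value of X comes with a whole distribution of Y values can be illustrated with a small simulation. The following is only a sketch under assumptions that are not part of the lecture: the parameter values (100, 0.6), the disturbance standard deviation (50), and the choice of a normal distribution for u are all hypothetical and were picked purely for illustration.

```python
import random
import statistics

# Hypothetical values, chosen only for illustration (not from the lecture).
BETA_0 = 100.0   # intercept
BETA_1 = 0.6     # slope
SIGMA_U = 50.0   # standard deviation of the disturbance u


def draw_y(x, rng):
    """Draw one Y from Y = beta_0 + beta_1 * X + u, with u ~ N(0, SIGMA_U^2)."""
    u = rng.gauss(0.0, SIGMA_U)  # normality is an assumption of this sketch
    return BETA_0 + BETA_1 * x + u


rng = random.Random(0)  # fixed seed so the sketch is reproducible
x_fixed = 1100.0        # X is held at one value for every draw
ys = [draw_y(x_fixed, rng) for _ in range(10_000)]

systematic_part = BETA_0 + BETA_1 * x_fixed
print("systematic part beta_0 + beta_1 * X:", systematic_part)
print("mean of simulated Y:", round(statistics.fmean(ys), 1))
print("std. dev. of simulated Y:", round(statistics.pstdev(ys), 1))
```

Although X never changes, the simulated Y values differ from draw to draw; their average stays close to the systematic part, and their spread reflects the spread of u.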
Uncertainty concerning Y arises because of the presence of the
stochastic disturbance u which, being random, imparts randomness
to Y.
o For example, consider a production function of a firm.
o Suppose that output depends in some specified way on the
quantity of labor (L) in accordance with firm’s technology.
$\text{Output}_t = \beta_0 + \beta_1 \text{Labor}_t + u_t$
This function may apply in the short run when the quantities of other factors are fixed.
o In general, the same quantity of labor will lead to different
quantities of output because of variations in weather, human
performance, frequency of machine breakdowns, and many
other factors.
o Output, which is the dependent variable in this case, will
depend not only on the quantity of labor input, but also on the
large number of random causes, which we summarize in the form of the stochastic disturbance (u).

6. Sources of Disturbance Term
The existence of the disturbance term is justified in three main ways.
(1) Omission of the influence of chance events [Not Omitted Variable Case!] Although income might be the major
determinant of the level of consumption, it is not the only
determinant. Other variables, such as the interest rate or liquid
asset holdings may have a systematic influence on consumption.
Their omission constitutes one type of specification error: the
nature of the economic relationship is not correctly specified. In
addition to these systematic influences, however, are
innumerable less systematic influences (chance events) such as
weather variations, taste changes, earthquakes, epidemics,
strikes, etc. Although some of these variables may have a
significant impact on consumption and thus should definitely be
included in the specified relationship, many have only a very
slight, irregular influence: the disturbance is often viewed as
representing the net influence of a large number of such small
and independent causes (Kennedy, p.3).
In any economic model, we want to include all the important
and relevant explanatory variables in the model, so the error
term u is a ‘‘storage bin’’ for unobservable and/or unimportant
factors affecting level of consumption. As such, it adds noise
that masks the relationship between X (income) and Y
(consumption) (Griffith and Hill, p. 48).
Note that if we have omitted some important factor or made any other serious specification error (the omitted variable case), then the assumption $E(u_t) = 0$ that we will see in the next section will be violated, which will have serious consequences (Griffith and Hill, p. 48).
(2) Measurement error in Y [not in X!] It may be the case that
the dependent variable Y cannot be measured accurately, either
because of data collection difficulties or because it is inherently
unmeasurable and a proxy variable must be used in its stead.
The disturbance term can in these circumstances be thought of
as representing this measurement error. On the other hand, errors in measuring the independent
variable(s) X (as opposed to the dependent variable Y)
create a serious econometric problem and hence this
situation cannot be stated as a source of the existence of
the disturbance term.
For example, it may be that the dependent variable Z is exactly related to X:
$Z_t = \beta_0 + \beta_1 X_t$
but we observe $Y_t = Z_t + u_t$, where $u_t$ denotes the measurement error. Thus:
$Y_t = \beta_0 + \beta_1 X_t + u_t$
However, note that this justification is less satisfactory since (as we will see later) measurement error in X, as stated above, can cause considerable problems.
(3) Human Indeterminacy Human behavior is such that actions
taken under identical circumstances will differ in a random way.
The disturbance term can be thought of as representing this
inherent randomness in human behavior.
In other words, people are not predictable in their behavior. There is an element of unpredictability in all human behavior. We might interpret our model as combining a systematic or deterministic component ($\beta_0 + \beta_1 X_t$) with a random or stochastic component ($u_t$) that represents the unpredictable element in the relationship.
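Returning to source (2), the contrast between measurement error in Y and measurement error in X can be seen in a small simulation. This is a hedged sketch, not part of the lecture: the true parameters, the range of X, the error standard deviations, and the helper name `ols_slope_intercept` are all hypothetical. It relies on the standard simple-regression OLS formulas and on the well-known result that purely random error in X pulls the estimated slope toward zero.

```python
import random


def ols_slope_intercept(xs, ys):
    """Simple-regression OLS estimates: returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


rng = random.Random(1)        # fixed seed for reproducibility
beta_0, beta_1 = 100.0, 0.6   # hypothetical true parameters
n = 20_000

x_true = [rng.uniform(1000.0, 1500.0) for _ in range(n)]
z = [beta_0 + beta_1 * x for x in x_true]   # exact relation Z = beta_0 + beta_1 * X

# Case (a): Z is observed with error as Y; the error plays the role of u.
y_obs = [zi + rng.gauss(0.0, 50.0) for zi in z]
_, slope_y_error = ols_slope_intercept(x_true, y_obs)

# Case (b): the dependent variable is exact, but X is observed with error.
x_obs = [x + rng.gauss(0.0, 150.0) for x in x_true]
_, slope_x_error = ols_slope_intercept(x_obs, z)

print("true slope:", beta_1)
print("estimated slope, error in Y:", round(slope_y_error, 3))
print("estimated slope, error in X:", round(slope_x_error, 3))
```

In case (a) the estimated slope stays close to the true value, which is why measurement error in Y can be absorbed into the disturbance term. In case (b) the estimate falls well below the true slope, which illustrates why error in X is treated as a separate econometric problem.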
To sum up, in the social sciences, controlled experiments are usually
not possible. For example, an economist cannot hold Turkish national
income constant for several years while examining the effect of
interest rate on investment. Since the economist cannot neutralize
extraneous influences by holding them constant, the best alternative is
to take them explicitly into account, by regressing Y on X and the extraneous factors (Wonnacott, 1979, p.23).

7. Assumptions of Classical Linear Regression Model (CLRM)
The following assumptions are important in CLRM5.
Assumption 1 (A1): Linear regression model [the regression
model is linear in the parameters although it may be nonlinear in
variables]
In the simple regression model, the value of $Y_t$ for each value of $X_t$ is
$Y_t = \beta_0 + \beta_1 X_t + u_t$
Assumption 2 (A2): X values are fixed in repeated sampling, or X values must be independent of the error term: $Cov(X_t, u_t) = 0$.
In many practical cases, assuming that $X_t$ is fixed across samples is not just the simplest assumption; it is also the most realistic assumption6.
Keeping the value of income X fixed, say, at level 1,100 TL, we draw at random a family and observe its consumption expenditure (Y) as, say, 640 TL. Still keeping X at 1,100 TL, we draw at random another family and observe its Y value as 580 TL. In each of these drawings (i.e. repeated sampling), the value of X is fixed at 1,100 TL.
This is the sense in which we are assuming the X’s are fixed in repeated sampling (or, across samples).
Assumption 3 (A3): Zero mean value of disturbance $u_t$ [$E(u_t) = 0$], which is equivalent to assuming that
$E(Y \mid X_t) = \beta_0 + \beta_1 X_t$
o In other words, the expected value of the disturbance term for any X should be 0.
5 Gujarati, D. (2009) Basic Econometrics, 5th Edition, McGraw-Hill, pp. 62-68.
6 Murray, M. P. (2006) Econometrics: A Modern Introduction, Pearson International, pp. 41-42.

Assumption 4 (A4): Homoscedasticity, i.e. constant variance of $u_t$ [$Var(u_t) = \sigma^2$].
o Population variance of the disturbance term ( ut ) should be
constant for all observations (Homoscedasticity).
o If this condition is not satisfied, the OLS regression
coefficients will be inefficient.
o In terms of our production example, this assumption
implies that the variation in output is the same whether the
quantity of labor is 20, 100, or any other number of units.

Assumption 5 (A5): No autocorrelation7, or zero covariance between $u_t$ and $u_s$ [i.e., $Cov(u_t, u_s) = 0$ for $t \neq s$]. This assumption can be made stronger by assuming that the values of $u_t$ are all statistically independent8.
o This condition states that there should be no systematic
association between the values of the disturbance term in
any two observations.
o If this condition is not satisfied, OLS will again give
inefficient estimates.
o This assumption implies that higher-than-expected output today should not lead to higher (or lower) than expected output tomorrow.
o Recall that the correlation between X and Y is given by
$\rho = \dfrac{Cov(X, Y)}{\sigma_X \sigma_Y}$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, respectively. Therefore, if the correlation between X and Y is zero ($\rho = 0$), it implies that $Cov(X, Y) = 0$. As a result, no autocorrelation implies that $Cov(u_t, u_s) = 0$ where $t \neq s$.9
o Note that the covariance between X and Y is given by
$Cov(X, Y) = E\{[X - E(X)][Y - E(Y)]\}$
Therefore, no autocorrelation implies zero covariance:
$Cov(u_t, u_s) = E\{[u_t - E(u_t)][u_s - E(u_s)]\} = 0$
From Assumption 3, we know that $E(u_t) = 0$ and $E(u_s) = 0$. Thus, the no autocorrelation assumption implies that
$Cov(u_t, u_s) = E\{[u_t - 0][u_s - 0]\} = E(u_t u_s) = 0$

7 No autocorrelation means the correlation between any $u_t$ and $u_s$ ($t \neq s$) is zero.
8 The value which the disturbance term assumes in one period does not depend on the value which it assumed in any other period.
9 Recall that a zero value of the covariance indicates no linear dependence between X and Y.

Assumption 6 (A6): The number of observations (T) must be greater than the number of parameters to be estimated10 (T > k+1), and there must be no exact linear relationship between the independent variables (no perfect multicollinearity).
o Although this is stated as an assumption, it can easily be
checked, so that it need not be assumed.

10 There are k independent variables and one intercept term, hence: k+1.
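One way such a check for Assumption 6 might look is sketched below. This is only an illustration under assumptions: the function names `matrix_rank` and `satisfies_a6` and the small data lists are hypothetical, and the rank is computed with a plain Gaussian elimination and a fixed numerical tolerance. The idea is that the data matrix, made of a column of ones for the intercept plus the k regressors, must have more rows than columns (T > k+1) and full column rank (no exact linear relationship).

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (a list of row lists) via Gaussian elimination."""
    m = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Pick the row at or below `rank` with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
        if rank == n_rows:
            break
    return rank


def satisfies_a6(regressors):
    """regressors: list of k lists, each holding T observations on one X."""
    k = len(regressors)
    t_obs = len(regressors[0])
    # Data matrix: a column of ones for the intercept plus the k regressors.
    design = [[1.0] + [col[t] for col in regressors] for t in range(t_obs)]
    enough_obs = t_obs > k + 1
    full_rank = matrix_rank(design) == k + 1
    return enough_obs and full_rank


# Made-up numbers, used only to exercise the check.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [2.0 * v + 1.0 for v in x1]   # exact linear function of x1 and the intercept

print(satisfies_a6([x1, x2]))   # True
print(satisfies_a6([x1, x3]))   # False: perfect multicollinearity
print(satisfies_a6([x1[:2]]))   # False: T = 2 is not greater than k + 1 = 2
```

The fixed tolerance is a simplification; with real data, near-exact linear relationships need more careful numerical treatment than this sketch provides.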
ECON 301 - Introduction to Econometrics I
METU - Department of Economics March 2014 LECTURE 02
POPULATION REGRESSION FUNCTION – SAMPLE
REGRESSION FUNCTION
Outline of today’s lecture:
1. Population Regression Function (PRF) and Sample Regression Function (SRF)
A. Homoscedasticity Case
B. Heteroscedasticity Case
2. Gauss-Markov Assumptions
3. Sampling Distribution and Repeated Sampling
4. The Sampling Distribution of $\hat{\beta}_i$

1. Population Regression Function (PRF) and Sample Regression Function (SRF)
Suppose we have a small community of 24 families (This is our
population). We’re interested in studying the relationship between
their weekly disposable income (X) and consumption (Y). We want to
estimate the average consumption expenditures of this population
(population mean of consumption expenditures) at different income
levels.
The 24 families can be grouped into four income groups. Each
family within a group has the same disposable income [Hence, X’s
are fixed!].
Note that this is the entire population, not a sample.

A. Homoscedasticity Case
We have a small community of 24 families (This is our population, hence the population size is N=24).
Table 1 Consumption Expenditures for Different Levels of Disposable Income (Population, N=24) – Homoscedasticity Case

Columns: Disposable Income (X_t); Consumption (Y_t); Average Consumption E(Y|X=X_t); Disturbances u_t = Y_t − E(Y|X=X_t); Conditional Mean of u_t, E(u_t|X_t); Conditional Variance of u_t, Var(u_t|X_t).

X_t    Y_t                                  E(Y|X=X_t)   u_t                               E(u_t|X_t)   Var(u_t|X_t)
1100   580, 640, 700, 820, 880, 940         760          -180, -120, -60, 60, 120, 180     0            16800
1200   640, 700, 760, 880, 940, 1000        820          -180, -120, -60, 60, 120, 180     0            16800
1300   700, 760, 820, 940, 1000, 1060       880          -180, -120, -60, 60, 120, 180     0            16800
1400   760, 820, 880, 1000, 1060, 1120      940          -180, -120, -60, 60, 120, 180     0            16800

In every income group the conditional variance is the same, $Var(u_t \mid X_t) = \sigma^2$, computed as
$[(-180-0)^2 + (-120-0)^2 + (-60-0)^2 + (60-0)^2 + (120-0)^2 + (180-0)^2] / 6 = 16800$

Below we plot these data points on a diagram.
The “bold” square dots are the actual observations.
The Conditional Mean or Conditional Expectation is given as
follows:
$E(Y \mid X = X_t)$
$E(Y \mid X = X_t)$ may be regarded as the average or expected consumption expenditure of households with the given disposable income X.
In the figure, the “blank” square dots are the conditional means.

Figure 1 Consumption Expenditures for Different Levels of Disposable Income (Population) – Homoscedasticity Case

We might anticipate that consumption will be linearly related to disposable income. This is an initial assumption of our estimation. We could narrow this functional form to:
$E(Y \mid X = X_t) = \beta_0 + \beta_1 X_t$
or simply
$E(Y) = \beta_0 + \beta_1 X_t$
This is the (linear) Population Regression Function (PRF)1: $\beta_0 + \beta_1 X_t$. Notice that here the PRF is the straight line connecting these conditional means. Clearly, consumption expenditures “on average” increase with disposable income.

1 Or, True Regression Function.

Geometrically, the Population Regression Function (PRF) is
simply the locus of average consumption expenditures of our
population at different income levels2.
Analytically, the equation of the PRF3 can be found using any two data points. For example, $E(Y \mid X = 1100) = 760$ at $X = 1100$ and $E(Y \mid X = 1200) = 820$ at $X = 1200$ can be used:
$\beta_0 + \beta_1 \cdot 1100 = 760$
$\beta_0 + \beta_1 \cdot 1200 = 820$
Subtracting the first equation from the second gives $100 \beta_1 = 60$, so $\beta_1 = 0.6$ and $\beta_0 = 760 - 0.6 \cdot 1100 = 100$. Hence the equation of the PRF is as follows:
$E(Y \mid X = X_t) = 100 + 0.6 X_t$

Role of Disturbance
The PRF tells us the “average” consumption expenditures for a given level of household income. But we know that any “particular” household is unlikely to be on this function since there are stochastic deviations from these average consumption expenditure levels at different levels of income.
Consumption expenditures at different income levels can be expressed, in terms of the PRF, as (Data Generating Process, DGP):
$Y_t = E(Y \mid X = X_t) + u_t$
where $u_t$ is a random variable with mean 0 (disturbance).
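The numbers in the homoscedasticity case can be recomputed directly from the population in Table 1. The sketch below is illustrative only; the variable names are hypothetical, and the largest consumption value in the 1100 group is taken to be 940, which is the value implied by the listed conditional mean of 760 and the listed disturbance of 180.

```python
from statistics import fmean, pvariance

# Table 1 population (homoscedasticity case): income level -> consumption values.
# The largest value in the 1100 group is taken as 940, the value implied by
# the listed conditional mean (760) and the listed disturbance (+180).
population = {
    1100: [580, 640, 700, 820, 880, 940],
    1200: [640, 700, 760, 880, 940, 1000],
    1300: [700, 760, 820, 940, 1000, 1060],
    1400: [760, 820, 880, 1000, 1060, 1120],
}

# Conditional means E(Y | X = x): one point of the PRF per income level.
cond_mean = {x: fmean(ys) for x, ys in population.items()}

# PRF through two of the conditional means: solve for beta_1, then beta_0.
beta_1 = (cond_mean[1200] - cond_mean[1100]) / (1200 - 1100)
beta_0 = cond_mean[1100] - beta_1 * 1100

# Disturbances u_t = Y_t - E(Y | X = X_t), with their conditional mean and
# (population) variance for each income group.
disturbances = {x: [y - cond_mean[x] for y in ys] for x, ys in population.items()}
for x, u in disturbances.items():
    print(x, cond_mean[x], u, fmean(u), pvariance(u))

print("beta_0 =", round(beta_0, 6), "beta_1 =", round(beta_1, 6))
```

Because this is the whole population rather than a sample, the variance divides by 6 (`pvariance`), not by 5. The same line through the first two conditional means also passes through the remaining two, which is what makes the PRF linear in this example.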
2 Or, “th...
