ECON2206/ECON3290: Week 12
Overview
Dr. Rachida Ouysse
School of Economics
UNSW ECON2206/ECON3290

Overview of this course material
We discussed estimation issues under a set of model assumptions.
The benchmark model: linear regression under the Gauss-Markov assumptions.
The population model for the conditional mean of $y_i$ given $x_i$, $E(y_i \mid x_i) = m(x_i)$, is assumed to be linear in $x_i$:

$$y_i = x_i'\beta + u_i \qquad (1)$$

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i \qquad (2)$$

Assumption 1: Linearity in $\beta$. If it does not hold, the model is misspecified. The model can still be nonlinear with respect to $x_i$.
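It is possible to test for misspecification with the RESET test (see W. Ch. 9). First estimate

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u \qquad (3)$$

to obtain the fitted values $\hat{y}$, and then, in the augmented model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + u, \qquad (4)$$

test the null hypothesis

$$H_0: \delta_1 = \delta_2 = 0.$$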
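The two-step RESET procedure can be sketched in Python with NumPy. This is a minimal illustration on simulated data, not Stata's implementation: the helper names `ols` and `reset_test`, the sample size and both data-generating processes are hypothetical choices, and `X` is assumed to already contain a column of ones.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and the sum of squared residuals (SSR)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid


def reset_test(X, y):
    """RESET F statistic using yhat^2 and yhat^3 as added regressors.

    X must already include a column of ones, so k = X.shape[1] - 1.
    Returns (F, df1, df2); under H0 the statistic is F(2, n - k - 3).
    """
    n, p = X.shape
    k = p - 1
    beta, ssr_r = ols(X, y)                 # model (3): restricted under H0
    yhat = X @ beta
    X_aug = np.column_stack([X, yhat**2, yhat**3])
    _, ssr_ur = ols(X_aug, y)               # model (4): augmented model
    df2 = n - k - 3
    F = ((ssr_r - ssr_ur) / 2) / (ssr_ur / df2)
    return F, 2, df2


# Hypothetical simulated data: one regressor, n = 500.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
u = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

y_linear = 1 + 2 * x + u                    # linear specification is correct
y_quad = 1 + 2 * x + 1.5 * x**2 + u         # linear specification is misspecified

F_lin, df1, df2 = reset_test(X, y_linear)
F_quad, _, _ = reset_test(X, y_quad)
print(f"df = ({df1}, {df2}); F (linear DGP) = {F_lin:.2f}; F (quadratic DGP) = {F_quad:.2f}")
```

With a quadratic term in the true model, the statistic for the linear specification lands far above the 5% critical value of $F_{2,496}$ (about 3.0); under the correctly specified linear model it behaves like a draw from that distribution.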
Under the null, the RESET statistic is distributed as $F_{2,\,n-k-3}$.

Assumption 2: Random Sampling
We have a random sample of $n$ observations, $\{(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i) : i = 1, 2, \ldots, n\}$, following the population model in equation (2).
Nonrandom sampling causes the OLS estimator to be biased and inconsistent. When does random sampling fail?
Sampling from a truncated population induces systematic bias due to nonrandom sampling.
Censoring systematically assigns a fixed value to a nontrivial proportion of the population: therefore the sample is nonrandom!
Selection bias arises when dealing with choice variables. Example: ability and years of schooling. More able people may systematically choose higher levels of education, so studies of the return to education that use a wage equation are affected by such bias.
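The effect of truncation can be illustrated with a small NumPy simulation (a hypothetical population with true slope 2 and $u$ independent of $x$). Keeping only observations with $y < 1$ (truncation on the dependent variable) pulls the OLS slope toward zero, whereas keeping only observations with $x < 0$ (selection on a regressor) leaves it essentially unchanged, because $E(u \mid x) = 0$ still holds in that subsample.

```python
import numpy as np


def slope(x, y):
    """Simple-regression OLS slope: sample cov(x, y) / sample Var(x)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


# Hypothetical population: y = 1 + 2x + u with u independent of x.
rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1 + 2 * x + u

b_full = slope(x, y)                      # random sample

keep_x = x < 0                            # sample selected on the regressor
b_select_x = slope(x[keep_x], y[keep_x])

keep_y = y < 1                            # population truncated on y
b_truncate_y = slope(x[keep_y], y[keep_y])

print(f"random: {b_full:.3f}, selected on x: {b_select_x:.3f}, truncated on y: {b_truncate_y:.3f}")
```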
Time series data are special and require a different definition of randomness.

Gauss-Markov: Finite sample properties of OLS
The population parameter $\beta$ in the population model (2) is identified if and only if:
1. $E(x_i u_i) = 0$. A sufficient condition for this is the zero conditional mean (ZCM) assumption $E(u_i \mid x_i) = 0$.
2. $E(x_i x_i')$ is invertible: there is no PERFECT collinearity.
Under ZCM, random sampling and no perfect collinearity, the OLS estimator $\hat{\beta}$ is unbiased.
If in addition we assume homoscedasticity, $E(u_i^2 \mid x_i) = \sigma^2$, then OLS is BLUE according to the Gauss-Markov theorem.
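A minimal Monte Carlo sketch of these finite-sample results, assuming a hypothetical design with two correlated regressors and errors that satisfy ZCM and homoscedasticity: averaged over many samples, the OLS estimates from the normal equations are close to the true $\beta$. The last lines show what perfect collinearity does: the regressor matrix loses rank, so $X'X$ cannot be inverted.

```python
import numpy as np

# Hypothetical design: true coefficients and sample size chosen for illustration.
rng = np.random.default_rng(2)
n, reps = 50, 5000
beta = np.array([1.0, 0.5, -2.0])               # (beta0, beta1, beta2)

estimates = np.empty((reps, 3))
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)          # correlated with x1, not perfectly
    u = rng.normal(size=n)                      # E(u | x) = 0, Var(u | x) = 1
    X = np.column_stack([np.ones(n), x1, x2])
    y = X @ beta + u
    # OLS from the normal equations: beta_hat = (X'X)^{-1} X'y
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

mean_beta_hat = estimates.mean(axis=0)          # averages close to beta
print("average OLS estimates:", mean_beta_hat.round(3))

# Perfect collinearity: adding x3 = 2 * x1 leaves X with rank 3 instead of 4,
# so X'X is singular and beta is not identified.
X_collinear = np.column_stack([X, 2 * X[:, 1]])
print("rank of X with x3 = 2*x1:", np.linalg.matrix_rank(X_collinear))
```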
Under heteroscedasticity, use weighted least squares or heteroscedasticity-robust standard errors.

Large sample properties of OLS
If the errors are normally distributed (conditional on the regressors), then the exact sampling distribution of the OLS estimator is also normal.
If we cannot assume normality, we use a large sample and appeal to:
the law of large numbers: OLS is consistent;
the central limit theorem: OLS is asymptotically normally distributed.
Note: consistency of OLS requires only a weaker assumption than ZCM: $E(u) = 0$ and $E(x_i u_i) = 0$ (that is, $\mathrm{cov}(x_i, u_i) = 0$).
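For the slope in the simple regression of $y$ on $x_1$:

$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\mathrm{cov}(u, x_1)}{\mathrm{Var}(x_1)}$$

Large sample tests: the LM test and the Wald test.
Under the Gauss-Markov assumptions, OLS estimators have the smallest asymptotic variances among a certain class of consistent estimators.

OLS under deviations from the Gauss-Markov theorem
Misspecification: depending on its form, OLS could still be useful.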
If the omitted variables are irrelevant: OLS is unbiased and more precise.
If the omitted variables are relevant but uncorrelated with all of the regressors: OLS is unbiased but less efficient.
However, if the omitted variable is correlated with a regressor, ZCM fails, so OLS is biased and inconsistent.
Including an irrelevant variable in the regression generally leads to an inefficient estimator.
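The inconsistency from omitting a relevant regressor that is correlated with $x_1$ follows from the plim formula with the omitted term absorbed into the error, $v = \beta_2 x_2 + u$: $\operatorname{plim} \tilde{\beta}_1 = \beta_1 + \beta_2\,\mathrm{cov}(x_1, x_2)/\mathrm{Var}(x_1)$. The NumPy sketch uses hypothetical coefficients (intercept set to zero for simplicity) and shows that the short-regression slope settles on this value, not on $\beta_1$, as $n$ grows.

```python
import numpy as np


def slope(x, y):
    """Simple-regression OLS slope: sample cov(x, y) / sample Var(x)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


# Hypothetical long model: y = beta1*x1 + beta2*x2 + u, with cov(x1, x2) = 0.5.
rng = np.random.default_rng(3)
beta1, beta2 = 1.0, 2.0
short_slopes = {}
for n in (100, 10_000, 1_000_000):
    x1 = rng.normal(size=n)                    # Var(x1) = 1
    x2 = 0.5 * x1 + rng.normal(size=n)         # cov(x1, x2) = 0.5
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    short_slopes[n] = slope(x1, y)             # short regression: x2 omitted

predicted_plim = beta1 + beta2 * 0.5 / 1.0     # beta1 + beta2 * cov(x1, x2) / Var(x1)
for n, b in short_slopes.items():
    print(f"n = {n:>9,}: short-regression slope = {b:.3f} (plim = {predicted_plim})")
```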
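$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \qquad (5)$$

$$y = \beta_0 + \beta_1 x_1 + v \qquad (6)$$

$$\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_1 (1 - R_1^2)} \qquad (7)$$

$$\mathrm{Var}(\tilde{\beta}_1) = \frac{\sigma^2 + \beta_2^2 \mathrm{Var}(x_2)}{SST_1 (1 - R_1^2)} \qquad (8)$$

Here $\hat{\beta}_1$ is the OLS estimator from the long model (5) and $\tilde{\beta}_1$ the estimator from the short model (6), in which $v = \beta_2 x_2 + u$ and $x_2$ is uncorrelated with $x_1$.

Nonrandom sampling: OLS is most probably no longer useful.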
If sample selection is based on a regressor: OLS is still fine.
If sample selection is endogenous: OLS is biased and inconsistent. There is selection bias due to the failure of the zero conditional mean assumption.

Heteroscedasticity: OLS is still unbiased and consistent, but not efficient.
The usual OLS standard errors are not valid for inference because they are biased (they can understate the estimation error). As a result, test statistics (including the t ratio) are misleading and incorrect.
Two options:
1. Correct the OLS standard errors using heteroscedasticity-robust standard errors, and use het-robust statistics for hypothesis tests and confidence intervals;
2. Use FGLS (WLS), which is more efficient than OLS under heteroscedasticity.
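The first option, together with the LM version of the BP test, can be sketched with NumPy on simulated data (a hypothetical design with $\mathrm{Var}(u \mid x) = x^2$). The robust standard errors use the White (HC0) sandwich formula; statistical packages may apply a degrees-of-freedom rescaling on top of it, so their defaults can differ slightly. The BP LM statistic is $n R^2$ from regressing the squared OLS residuals on the regressors, asymptotically $\chi^2_k$ under the null of homoscedasticity.

```python
import numpy as np

# Hypothetical simulated data with Var(u | x) = x**2 (heteroscedastic errors).
rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(1.0, 5.0, size=n)
u = x * rng.normal(size=n)
y = 1 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])
k = X.shape[1] - 1

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Usual OLS standard errors: valid only under homoscedasticity.
sigma2_hat = resid @ resid / (n - k - 1)
se_usual = np.sqrt(np.diag(sigma2_hat * XtX_inv))

# White (HC0) robust standard errors: (X'X)^{-1} [sum_i uhat_i^2 x_i x_i'] (X'X)^{-1}
meat = (X * resid[:, None] ** 2).T @ X
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# Breusch-Pagan LM test: regress uhat^2 on the regressors, LM = n * R^2,
# asymptotically chi-squared(k) under H0: homoscedasticity.
u2 = resid**2
gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
r2_aux = 1 - np.sum((u2 - X @ gamma) ** 2) / np.sum((u2 - u2.mean()) ** 2)
lm_bp = n * r2_aux

print("usual SE: ", se_usual.round(4))
print("robust SE:", se_robust.round(4))
print(f"BP LM = {lm_bp:.1f} (compare with chi-squared({k}) critical values)")
```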
If FGLS is used, then $R^2$ is not useful as a measure of goodness of fit. We can test for heteroscedasticity with the BP test (LM or F version) and the White test.

Inference
When using large sample arguments for inference, the standard errors and the tests are asymptotic.
You must always state the null hypothesis of each test you use and the distribution of the test statistic under the null.
For example:
the BP and White tests use different definitions of heteroscedasticity. This defines the null of each test and the corresponding auxiliary regression;
you must understand the limitations of each test: RESET is not useful for directly comparing two specifications;
tests of nested and nonnested models: what can we do if the dependent variable appears in different functional forms?
Large sample tests such as the LM test can be used to test linear restrictions: understand how this is done.
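A minimal NumPy sketch of the three statistics for the exclusion restrictions $H_0: \beta_2 = \beta_3 = 0$ ($q = 2$) on simulated data with hypothetical coefficients. The LM statistic is $n R^2$ from regressing the restricted residuals on all regressors; the Wald statistic here uses the usual (non-robust) OLS variance estimate, in which case it equals $q \times F$ exactly.

```python
import numpy as np


def ols_fit(X, y):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


# Hypothetical simulated data: x3 is irrelevant, x2 has a small effect.
rng = np.random.default_rng(5)
n = 400
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.8 * x1 + 0.2 * x2 + rng.normal(size=n)

X_ur = np.column_stack([np.ones(n), x1, x2, x3])   # unrestricted model, k = 3
X_r = X_ur[:, :2]                                   # restricted under H0: beta2 = beta3 = 0
k, q = 3, 2

beta_ur, e_ur = ols_fit(X_ur, y)
_, e_r = ols_fit(X_r, y)
ssr_ur, ssr_r = e_ur @ e_ur, e_r @ e_r

# F statistic
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# Wald statistic with the usual (non-robust) variance estimate
R = np.zeros((q, k + 1))
R[0, 2] = 1.0                                       # picks beta2
R[1, 3] = 1.0                                       # picks beta3
V = ssr_ur / (n - k - 1) * np.linalg.inv(X_ur.T @ X_ur)
Rb = R @ beta_ur
W = Rb @ np.linalg.solve(R @ V @ R.T, Rb)

# LM statistic: regress restricted residuals on ALL regressors, LM = n * R^2
_, e_aux = ols_fit(X_ur, e_r)
r2_aux = 1 - (e_aux @ e_aux) / np.sum((e_r - e_r.mean()) ** 2)
LM = n * r2_aux

print(f"F = {F:.3f}, W = {W:.3f}, q*F = {q * F:.3f}, LM = {LM:.3f}")
```

W and LM are then compared with $\chi^2_q$ critical values; F with $F_{q,\,n-k-1}$ critical values.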
LM and Wald statistics are asymptotically chi-squared distributed, and $W = q \times F$, where $q$ is the number of restrictions.

Data issues and solutions
Omitted variables: one solution is to use a proxy variable.
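What are the requirements for a proxy to work?
Measurement error: at best an inefficient estimator (error in the dependent variable), at worst a biased estimator (error in an explanatory variable).
Attenuation bias: with classical errors in an explanatory variable, the OLS slope is biased toward zero.
Models with measurement error, or errors-in-variables models, are a special case of models with endogenous regressors, that is, models where ZCM fails.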
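Classical measurement error can be illustrated with a NumPy simulation under hypothetical variances $\mathrm{Var}(x^*) = \mathrm{Var}(e) = 1$. Error in the regressor attenuates the slope toward $\beta_1 \mathrm{Var}(x^*)/(\mathrm{Var}(x^*) + \mathrm{Var}(e))$, here half of $\beta_1$; error in the dependent variable leaves the slope consistent but noisier.

```python
import numpy as np


def slope(x, y):
    """Simple-regression OLS slope: sample cov(x, y) / sample Var(x)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


rng = np.random.default_rng(6)
n = 200_000
beta1 = 2.0                                      # hypothetical true slope
x_star = rng.normal(0.0, 1.0, size=n)            # true regressor, Var(x*) = 1
y = 1 + beta1 * x_star + rng.normal(size=n)

x_obs = x_star + rng.normal(0.0, 1.0, size=n)    # regressor measured with error, Var(e) = 1
y_obs = y + rng.normal(0.0, 2.0, size=n)         # dependent variable measured with error

b_true = slope(x_star, y)                        # about beta1
b_error_in_x = slope(x_obs, y)                   # attenuated: about beta1 * 1 / (1 + 1)
b_error_in_y = slope(x_star, y_obs)              # still about beta1, but less precise
print(f"true x: {b_true:.3f}, error in x: {b_error_in_x:.3f}, error in y: {b_error_in_y:.3f}")
```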
Can you use a proxy to solve the problem?

Time series data
Very special: why?
We can still get unbiased estimates, but only under a very restrictive assumption: strict exogeneity.
To be more realistic we drop this assumption. We lose unbiasedness but keep consistency (under contemporaneous exogeneity, with stationary and weakly dependent series)!
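A standard model that violates strict exogeneity is the AR(1) model $y_t = \rho y_{t-1} + u_t$, because the regressor $y_{t-1}$ depends on past errors. This NumPy Monte Carlo sketch (hypothetical $\rho = 0.5$, standard normal errors, regression with an intercept) shows the OLS estimate of $\rho$ biased downward when $T = 20$ but close to $\rho$ when $T = 2000$.

```python
import numpy as np


def simulate_ar1(rho, T, rng, burn=100):
    """Simulate y_t = rho * y_{t-1} + u_t with N(0, 1) errors, dropping a burn-in."""
    u = rng.normal(size=T + burn)
    y = np.empty(T + burn)
    y[0] = u[0]
    for t in range(1, T + burn):
        y[t] = rho * y[t - 1] + u[t]
    return y[burn:]


def ols_ar1(y):
    """OLS slope from regressing y_t on a constant and y_{t-1}."""
    y_lag, y_now = y[:-1], y[1:]
    xc = y_lag - y_lag.mean()
    return (xc @ (y_now - y_now.mean())) / (xc @ xc)


rng = np.random.default_rng(7)
rho = 0.5                                   # hypothetical true AR coefficient
mean_small = np.mean([ols_ar1(simulate_ar1(rho, 20, rng)) for _ in range(5000)])
mean_large = np.mean([ols_ar1(simulate_ar1(rho, 2000, rng)) for _ in range(200)])
print(f"mean estimate, T = 20: {mean_small:.3f}; T = 2000: {mean_large:.3f}; true rho = {rho}")
```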
An issue specific to time series data: spurious regression due to common trends in the variables.

What I hope you take from this course: the bottom line
Applied work is not as easy as writing a Stata command and pressing ENTER!
A good foundation for sound and robust empirical work is a full understanding of:
1. The research question: it always helps to have a theoretical model (economic, financial, social, ...) so that you are able to tell a story. What information do you need?
2. Understand your data: the source (is it reliable?) and the survey questions. What are the limitations?
3. Understand what assumptions lie behind the estimation methods you are using. Are these assumptions reasonable for the data you have?
4. You get an estimator: it is not the truth. In fact there is no true model, only approximations of the truth (if it exists).
5. The story you tell from the data is only as good as the model you are using, which is only valid if the assumptions it imposes are plausible.
6. It is always good to check the help of the statistical software you use to see what the default options are.

Last but not least:
Final Examination: 09 November, 08:45AM to 11:00AM. Location: Derby Rm, Oaks Lawn Marquee, Galaxy Rm, Randwick Racecourse!
The final exam is cumulative: all material in lectures, tutorials and assignments is examinable.
The exam is TWO hours: FOUR problem-solving questions.
Formulae and statistical tables are provided.
Bring a calculator.
Some practice questions will be posted.
Office hours: Monday 07 November, 10:00AM to 12:30PM. Check BB for additional announcements.

THANKS and
GOOD LUCK IN YOUR EXAMS...
This note was uploaded on 02/29/2012 for the course ECON 2206 taught by Professor Yang during the One '11 term at University of New South Wales.