

ECON2206/ECON3290: Week 12 Overview
Dr. Rachida Ouysse
School of Economics, UNSW

Overview of this course material

We discussed estimation issues under a set of model assumptions.

The benchmark model is linear regression under the Gauss-Markov assumptions. The population model for the conditional mean of $y_i$ given $x_i$, $E(y_i \mid x_i) = m(x_i)$, is assumed to be linear in $x_i$:

$$
\begin{aligned}
y_i &= x_i\beta + u_i && (1)\\
    &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i && (2)
\end{aligned}
$$

Assumption 1: Linearity in $\beta$

If this is not correct, we have model misspecification.
The model can still be nonlinear with respect to $x_i$.
It is possible to test for misspecification with the RESET test (see W. Ch. 9). First estimate

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u, \qquad (3)$$

get the fitted values $\hat{y}$, and then test, in the augmented model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + u, \qquad (4)$$

the null hypothesis $H_0: \delta_1 = \delta_2 = 0$. Under the null, the RESET statistic is distributed as $F_{2,\,n-k-3}$.

Assumption 2: Random Sampling

We have a random sample of $n$ observations, $\{(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i) : i = 1, 2, \ldots, n\}$, following the population model in equation (2).
Nonrandom sampling can cause the OLS estimator to be biased and inconsistent.
When does random sampling fail?
- Truncation: sampling from a truncated population induces systematic bias because the sample is not random.
- Censoring: a fixed value is systematically assigned to a nontrivial proportion of the population, so the sample is nonrandom.
- Selection bias: arises when dealing with choice variables. Example: ability and years of schooling. More able people may systematically choose a higher level of education. Studies of the return to education that use a wage equation are affected by this bias.
- Time series data are special and require a different definition of randomness.

Gauss-Markov: Finite sample properties of OLS

The population parameter in the population model (2) is identified if and only if:
1. $E(x_i' u_i) = 0$.
A sufficient condition for this is the zero conditional mean (ZCM) assumption: $E(u_i \mid x_i) = 0$.
2. $E(x_i' x_i)$ is invertible: there is no PERFECT collinearity.

Under these assumptions, with the zero conditional mean condition holding, the OLS estimator $\hat{\beta}$ is unbiased.
If in addition we assume homoscedasticity, $E(u_i^2 \mid x_i) = \sigma^2$, then OLS is BLUE according to the Gauss-Markov theorem.
Under heteroscedasticity: use weighted least squares or heteroscedasticity-robust standard errors.

Large sample properties of OLS

If the errors are normally distributed, then the exact distribution of the OLS estimator is also normal.
If we cannot assume normality, use a large sample and invoke:
- the law of large numbers: OLS is consistent;
- the central limit theorem: OLS is asymptotically normally distributed.

Note: consistency of OLS only requires a weaker assumption, $E(u_i) = 0$ and $E(x_i' u_i) = 0$ (that is, $\operatorname{Cov}(x_i, u_i) = 0$). In the simple regression case,

$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\operatorname{Cov}(x_1, u)}{\operatorname{Var}(x_1)}.$$

Large sample tests: the LM test and the Wald test.
Under the Gauss-Markov assumptions, OLS estimators have the smallest asymptotic variances among the class of linear estimators.

OLS under deviations from the Gauss-Markov assumptions

Misspecification: depending on its form, OLS could still be useful.
- If the omitted variables are irrelevant: OLS is unbiased and more precise.
- If the omitted variables are relevant but uncorrelated with all of the regressors: OLS is unbiased but less efficient.
- If the omitted variable is correlated with a regressor: the ZCM assumption fails, so OLS is biased and, in general, inconsistent.
- The inclusion of an irrelevant variable in the regression generally leads to an inefficient estimator.

Consider the long and short regressions

$$
\begin{aligned}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u && (5)\\
y &= \beta_0 + \beta_1 x_1 + v && (6)
\end{aligned}
$$

The variance of the OLS slope on $x_1$ in (5) is

$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_1\,(1 - R_1^2)}, \qquad (7)$$

where $SST_1$ is the total sample variation in $x_1$ and $R_1^2$ is the R-squared from regressing $x_1$ on the other regressors. When the relevant variable $x_2$ is omitted, as in (6), its contribution $\beta_2 x_2$ moves into the error term and the variance becomes

$$\operatorname{Var}(\tilde{\beta}_1) = \frac{\sigma^2 + \beta_2^2 \operatorname{Var}(x_2)}{SST_1\,(1 - R_1^2)}. \qquad (8)$$

Nonrandom sampling: OLS is most probably not useful anymore.
- If sample selection is based on a regressor: OLS is still fine.
- If the sample selection is endogenous: OLS is biased and inconsistent.
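The contrast between selection on a regressor and endogenous selection can be illustrated with a small simulation. This is only a sketch under assumed values: the data-generating process $y = 1 + 2x + u$ with standard normal $x$ and $u$, the cutoffs $x > 0$ and $y > 1$, the sample size, and the helper name `ols_slope` are all hypothetical choices made for illustration and do not come from the course material.

```python
import numpy as np


def ols_slope(x, y):
    """Return the OLS slope from regressing y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


# Assumed data-generating process (illustrative only): y = 1 + 2x + u
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u

# Full random sample
slope_full = ols_slope(x, y)

# Selection based on the regressor: keep observations with x > 0
keep_x = x > 0
slope_sel_x = ols_slope(x[keep_x], y[keep_x])

# Endogenous selection: keep observations with y > 1 (depends on u)
keep_y = y > 1.0
slope_sel_y = ols_slope(x[keep_y], y[keep_y])

print(f"full sample:      {slope_full:.3f}")
print(f"selected on x:    {slope_sel_x:.3f}")
print(f"selected on y:    {slope_sel_y:.3f}")
```

Under these assumptions the first two slopes should be close to the true value of 2, while the slope from the sample selected on $y$ is pulled toward zero: keeping only large values of $y$ makes the retained errors correlated with $x$, so the zero conditional mean condition no longer holds in the selected sample.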
Endogenous sample selection produces selection bias because the zero conditional mean assumption fails.

Heteroscedasticity: OLS is still unbiased and consistent but not efficient.
The usual OLS standard errors are not valid for inference: they are biased and often understate the estimation error. As a result, test statistics (including the t-ratio) are misleading.
Two options:
1. Correct the OLS standard errors using heteroscedasticity-robust standard errors, and use heteroscedasticity-robust statistics for hypothesis testing and confidence intervals.
2. Use FGLS (WLS), which is more efficient than OLS under heteroscedasticity when the variance function is correctly specified. If you use FGLS, then $R^2$ is not useful as a measure of goodness of fit.
We can test for heteroscedasticity: the BP test (LM or F version) and the White test.

Inference

When using a large sample argument to do inference, the standard errors and the tests are asymptotic.
You must always state the null hypothesis for the tests you are using and their distribution under the null. For example, the BP and White tests use different definitions of heteroscedasticity. This defines the null of the test and the corresponding auxiliary regression.
You must understand the limitations of each test:
- RESET is not useful for directly comparing two specifications.
- Tests of nested and nonnested models: what can we do if the dependent variable appears in different functional forms?
Large sample tests like the LM test can be used to test linear restrictions: understand how this is done.
The LM and Wald statistics are asymptotically chi-squared distributed. With $q$ restrictions, $W = q \times F$.

Data issues and solutions

Omitted variables: one solution is to use a proxy variable. What are the requirements for a proxy to work?
Measurement error: at best an inefficient estimator, at worst a biased estimator (attenuation bias).
Models with measurement error, or errors in variables, are a special case of models with endogenous regressors: that is, models where ZCM fails. Can you use a proxy to solve the problem?

Time series data

Very special: why?
We can still get unbiased estimates, but only under a very restrictive assumption: strict exogeneity.
To be more realistic we drop this assumption. We lose unbiasedness but keep consistency!
Issues specific to time series data: spurious regression due to common trends in the variables.

What I hope you take from this course: the bottom line

Applied work is not as easy as writing a Stata command and pressing ENTER!
A good foundation for sound and robust empirical work is a full understanding of:
1. The research question: it is always helpful to have a theoretical model (economic, financial, social, ...) to be able to tell a story. What information do you need?
2. Your data: the source (is it reliable?) and the survey questions. What are the limitations?
3. The assumptions that lie behind the estimation methods you are using. Are these assumptions reasonable for the data you have?
4. You get an estimate: it is not the truth. In fact there is no true model. There are only approximations of the truth (if it exists).
5. The story you tell from the data is only as good as the model you are using, which is only valid if the assumptions it imposes are plausible.
6. It is always good to check the help of the statistical software you use to see what the default options are.

Last but not least

Final Examination: Date 09 November, 08:45AM-11:00AM. Location: Derby Rm, Oaks Lawn Marquee, Galaxy Rm, Randwick Racecourse!
The final exam is cumulative: all material in lectures, tutorials and assignments is examinable.
The exam is TWO hours: FOUR problem-solving questions.
Formulae provided, statistical tables provided. Bring a calculator.
Some practice questions will be posted.
Office hours: Monday 07 November, 10:00AM-12:30PM.
Check BB for additional announcements.

THANKS and GOOD LUCK IN YOUR EXAMS...