ie_Slide13(1) - Introductory Econometrics ECON2206/ECON3209...


Introductory Econometrics ECON2206/ECON3209, Slides13
Lecturer: Minxian Yang
ie_Slides13 my, School of Economics, UNSW

13. Pooled Cross Sections and Panel Data: Simple Methods (Ch13)

• Lecture plan
  – Pooling cross sections across time
  – Policy analysis with pooled cross sections
  – Two period panel data
  – Fixed effects model and first-differenced estimator
  – Policy analysis with panel data
  – Differencing with multiple-period panel data

• Pooled cross sections
  – Cross sections at different points in time
    • An independently pooled cross section is obtained by sampling randomly from a population at different points in time (usually different years).
    • Observations at different points in time are not necessarily collected from the same set of individuals.
    • Pooled cross sections consist of independent observations (from a number of random samples).
    • Pooled cross sections differ from a single random sample in that observations at different points in time may not follow the same population distribution.
    • The time stamps of observations are important.
• Pooled cross sections
  – Example 13.1 (fertil1.raw)

    Dependent Variable: KIDS
    NObs = 1129    aveKIDS = 2.743    SSR = 2685.9    Sigma2 = 2.418
    R2 = 0.130     Adj.R2 = 0.116

    Variable    Estimated   OLS      H.robust   H.robust
    Name        Coeffs      Stderr   Stderr     p-Value
    EDUC        -0.128      0.018    0.021      0.000
    AGE          0.532      0.138    0.138      0.000
    AGE2        -0.006      0.002    0.002      0.000
    BLACK        1.076      0.174    0.200      0.000
    EAST         0.217      0.133    0.126      0.086
    NORTHCEN     0.363      0.121    0.116      0.002
    WEST         0.198      0.167    0.161      0.221
    FARM        -0.053      0.147    0.145      0.717
    OTHRURAL    -0.163      0.175    0.179      0.364
    TOWN         0.084      0.125    0.127      0.508
    SMCITY       0.212      0.160    0.153      0.166
    Y74          0.268      0.173    0.186      0.150
    Y76         -0.097      0.179    0.198      0.624
    Y78         -0.069      0.182    0.196      0.726
    Y80         -0.071      0.183    0.192      0.711
    Y82         -0.522      0.172    0.186      0.005
    Y84         -0.545      0.175    0.184      0.003
    CONST.      -7.743      3.052    3.046      0.011

    • US data: surveys from 72 to 84. Did fertility rates change over time?
    • Look at the coefficients on the year dummies (base year = 1972). For example, Y84 = 1 for observations from 1984 and = 0 for others.
    • Holding other factors fixed, a woman in 1984 had on average .545 fewer children than a woman in 1972.
    • Het.-robust standard errors do not change the conclusion.
    • The White LM test confirms heteroskedasticity. Nonetheless, WLS (FGLS) leads to similar conclusions.

• Policy analysis with pooled cross sections
  – Example 13.3 (house price and incinerator)
    • North Andover: new incinerator rumours started in 1978, construction began in 1981 and operation began in 1985.
    • Data consist of samples in 1978 and 1981. nearinc = 1 for houses near the incinerator, = 0 for others.
    • Is the price of houses near the incinerator expected to fall over time?
    • Model (base year = 1978) and interpretation:
      $\log(price) = \beta_0 + \delta_0\, y81 + (\beta_1 + \delta_1\, y81)\cdot nearinc + \text{other factors} + u$.
    • OLS estimation results (standard errors in parentheses):
      log(price) = 1.69 + .417 y81 + .086 nearinc − .128 y81∙nearinc + ...
                  (1.97)  (.028)      (.057)         (.051)
    • Over the period, the price of houses near the incinerator decreased by about 12.8% (p-val = .013) in comparison to houses in other areas.
    • This is the "difference-in-difference" estimator:

                              Before                After (y81)                                After − Before
      Control                 $\beta_0$             $\beta_0 + \delta_0$                       $\delta_0$
      Treatment (nearinc)     $\beta_0 + \beta_1$   $\beta_0 + \delta_0 + \beta_1 + \delta_1$  $\delta_0 + \delta_1$
      Treatment − Control     $\beta_1$             $\beta_1 + \delta_1$                       $\delta_1$

• Policy analysis with pooled cross sections
  – Generalisation of Example 13.3
    • Natural experiment: an event, typically a government policy, that changes the environment of individuals.
    • The control group consists of individuals unaffected by the event (eg. houses far away from the incinerator).
    • The treatment group consists of those affected by the event (eg. houses near the incinerator).
    • Two cross sections, before and after the event, are needed to compensate for the fact that the control and treatment groups are not really randomly chosen:
      $y = \beta_0 + \delta_0\, d2 + (\beta_1 + \delta_1\, d2)\cdot dT + \text{other factors} + u$,
      where dT = treatment dummy and d2 = "after" dummy.
    • Without other factors,
      $\hat{\delta}_1 = (\bar{y}_{2,T} - \bar{y}_{2,C}) - (\bar{y}_{1,T} - \bar{y}_{1,C})$. (difference-in-difference)

                              Before                After (d2)                                 After − Before
      Control                 $\beta_0$             $\beta_0 + \delta_0$                       $\delta_0$
      Treatment (dT)          $\beta_0 + \beta_1$   $\beta_0 + \delta_0 + \beta_1 + \delta_1$  $\delta_0 + \delta_1$
      Treatment − Control     $\beta_1$             $\beta_1 + \delta_1$                       $\delta_1$

• Panel (longitudinal) data
  – A set of panel data is collected by following the same individuals over a number of time periods.
    eg. A panel data set for wage, educ, exper, ...
      i. randomly select a sample of people from the population and collect data for 2009;
      ii. re-interview the same people to collect data for 2010, 2011, ....
    Time series on wage, educ, exper, ... are collected for each individual.
  – Panel data, while costly, allow researchers to address issues related to unobserved factors, which are difficult to handle with cross sectional data.
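The difference-in-difference formula from the policy-analysis slides, $\hat{\delta}_1 = (\bar{y}_{2,T} - \bar{y}_{2,C}) - (\bar{y}_{1,T} - \bar{y}_{1,C})$, can be sketched with four group means. This is only an illustration for the case without other factors: the function name `did_estimate` and the toy numbers are hypothetical and are not taken from the incinerator data.

```python
from statistics import mean

def did_estimate(rows):
    """Difference-in-difference from (y, after, treated) rows.

    `after` and `treated` are 0/1 dummies, playing the roles of d2 and dT.
    """
    # Collect outcomes in the four (after, treated) cells.
    cells = {(a, t): [] for a in (0, 1) for t in (0, 1)}
    for y, after, treated in rows:
        cells[(after, treated)].append(y)
    m = {key: mean(values) for key, values in cells.items()}
    # (treatment - control) after, minus (treatment - control) before.
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])

# Hypothetical toy data, for illustration only.
rows = [
    (10.0, 0, 0), (12.0, 0, 0),  # control, before: mean 11
    (14.0, 0, 1), (16.0, 0, 1),  # treatment, before: mean 15
    (13.0, 1, 0), (15.0, 1, 0),  # control, after: mean 14
    (15.0, 1, 1), (17.0, 1, 1),  # treatment, after: mean 16
]

delta1_hat = did_estimate(rows)
print(delta1_hat)  # (16 - 14) - (15 - 11) = -2.0
```

In this simple case the same number is the OLS coefficient on the interaction d2∙dT in a regression of y on d2, dT and d2∙dT; the regression form is what allows other factors to be added.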
• Two period panel data
  – Two period panel data {(y_it, x_it), i = 1,...,N; t = 1, 2}
    • Example (crime2.raw): crime rates (crmrte) and unemployment rates (unem) for 46 cities in 1982 (t = 1) and 1987 (t = 2). Did unem influence crmrte? For 1987 (standard errors in parentheses),
      crmrte = 128.38 − 4.16 unem,    n = 46, R2 = .033.
               (20.76)  (3.42)
    • The result is likely biased because many relevant factors (eg. population, police, ...) are not controlled for.
    • For panel data, unobserved factors may be grouped into those that vary over time and those that do not:
      $y_{it} = \beta_0 + \delta_0\, d2_t + \beta x_{it} + a_i + u_{it}$,  t = 1, 2,
      where $d2_t$ = 1 if t = 2 and 0 otherwise. The term $\delta_0\, d2_t$ varies over time but not across section; $a_i$ varies across section but not over time.

• Two period panel data
  – Unobserved (or fixed) effects model
    • In the unobserved-effects model
      $y_{it} = \beta_0 + \delta_0\, d2_t + \beta x_{it} + a_i + u_{it}$,  t = 1, 2,
      $a_i$ can be correlated with $x_{it}$, while $u_{it}$ is uncorrelated with $x_{it}$.
      – $a_i$ is the unobserved effect (fixed effect, invariant to t) that represents factors specific to individual i;
      – $u_{it}$ is the idiosyncratic error that represents unobserved factors varying both over time and across section.
    • Can we just pool the data for t = 1, 2 and run OLS of y on (d2, x)? No, because $a_i$ is usually correlated with $x_{it}$.
      eg. "crime2.raw", pooled OLS: n = 92, R2 = .012,
      crmrte_it = 93.42 + 7.94 d2_t + .427 unem_it .
                  (12.74) (7.98)      (1.188)
      The composite error ($a_i + u_{it}$) here is likely correlated with unem_it.

• Two period panel data
  – First-differenced estimation of fixed effects model
    • Write the model separately for each period:
      $y_{i1} = \beta_0 + \delta_0 \cdot 0 + \beta x_{i1} + a_i + u_{i1}$,  (t = 1),
      $y_{i2} = \beta_0 + \delta_0 \cdot 1 + \beta x_{i2} + a_i + u_{i2}$,  (t = 2).
    • Subtracting the first equation from the second gives the first-differenced equation
      $\Delta y_i = \delta_0 + \beta\, \Delta x_i + \Delta u_i$,  where $\Delta x_i = x_{i2} - x_{i1}$ (and similarly for $\Delta y_i$ and $\Delta u_i$),
      which is a cross-section model and is free of $a_i$.
    • When the idiosyncratic error $u_{it}$ is uncorrelated with the regressors in both periods, OLS of $\Delta y_i$ on $\Delta x_i$ delivers consistent estimators of ($\delta_0$, $\beta$).

• Two period panel data
  – First-differenced estimation example
    • Example (crime2.raw): first-differenced estimation (standard errors in parentheses)
      Δcrmrte_i = 15.40 + 2.22 Δunem_i ,    n = 46, R2 = .127.
                  (4.70)  (.88)
      – There is a positive and significant relationship between unem and crmrte. A one-percentage-point rise in the unemployment rate is associated with 2.22 more crimes per 1,000 people.
      – Holding unemployment fixed, crimes per 1,000 people increased by 15.4 in 1987 in comparison to 1982.
      – It may be reasonable to assume that Δu_i is uncorrelated with Δunem_i, given that the "fixed effects" are removed.
      – But if police effort increased more in cities where unemployment decreased, then Δu_i and Δunem_i are correlated. Other observable factors then need to be included.
    • With panel data, we can deal with unobserved factors, to some extent!

• Two period panel data
  – First-differenced estimation: a shortcoming
    • Consider the log wage model with a two-period panel
      $\log(wage_{it}) = \beta_0 + \delta_0\, d2_t + \beta_1\, educ_{it} + a_i + u_{it}$,  t = 1, 2,
      where $a_i$ represents unobserved factors, ability, say.
    • The first-differenced equation is
      $\Delta \log(wage_i) = \delta_0 + \beta_1\, \Delta educ_i + \Delta u_i$.
    • However, for most adult workers, Δeduc_i is zero, so the overall variation in Δeduc_i is small and the OLS estimator will have a large standard error (Ch2-3).
    • First differencing is a good idea in principle for estimating the "return to education". But, frequently, it does not work well because of the lack of variation in Δeduc.
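The way first differencing removes the fixed effect $a_i$ can be checked on a tiny constructed panel. The sketch below is an illustration under loudly stated assumptions: the numbers are hypothetical (not crime2.raw), the idiosyncratic error is set to zero so the algebra is exact, the helper `ols_simple` is introduced here only for the example, and the pooled regression leaves out the period dummy to keep the code short.

```python
from statistics import mean

def ols_simple(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical panel: 4 units, 2 periods, u_it = 0 (assumption for clarity).
beta, delta0 = 2.0, 1.0
a = [10.0, 20.0, 30.0, 40.0]   # fixed effects, deliberately correlated with x
x1 = [1.0, 2.0, 3.0, 4.0]      # regressor in period 1
x2 = [2.0, 4.0, 5.0, 8.0]      # regressor in period 2
y1 = [beta * x + ai for x, ai in zip(x1, a)]
y2 = [delta0 + beta * x + ai for x, ai in zip(x2, a)]

# Pooled regression of y on x: a_i sits in the error term and biases the slope.
pooled_intercept, pooled_slope = ols_simple(x1 + x2, y1 + y2)

# First differences: a_i cancels, leaving delta0 as intercept and beta as slope.
dx = [b - c for b, c in zip(x2, x1)]
dy = [b - c for b, c in zip(y2, y1)]
fd_intercept, fd_slope = ols_simple(dx, dy)

print("pooled slope:", pooled_slope)
print("first-difference intercept and slope:", fd_intercept, fd_slope)
```

With these numbers the first-differenced regression recovers the intercept 1 and the slope 2 used to build the data, while the pooled slope is far larger because $a_i$ rises with $x_{it}$. With real data and a non-zero $u_{it}$ the recovery is only approximate.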
• Policy analysis with two period panel data
  – Ex.13.7: effect of drunk driving laws on traffic fatalities
    • Data: dthrte (traffic death rate), open (open container laws), admn (administrative per se laws) for the 50 US states and the District of Columbia (51 units) for 1985 and 1990.
    • Model:
      $dthrte_{it} = \beta_0 + \delta_0\, d2_t + \beta_1\, open_{it} + \beta_2\, admn_{it} + a_i + u_{it}$,  t = 1, 2.
    • The first-differenced equation is
      $\Delta dthrte_i = \delta_0 + \beta_1\, \Delta open_i + \beta_2\, \Delta admn_i + \Delta u_i$.
    • Estimation result (standard errors in parentheses): n = 51, R2 = .119,
      Δdthrte_i = −.497 − .420 Δopen_i − .151 Δadmn_i .
                  (.052)  (.206)         (.117)
    • The coefficient on open is negative and significant at the 5% level (t ≈ −2.04), suggesting that open container laws are effective in reducing traffic deaths. The coefficient on admn is negative but not significant (t ≈ −1.29).

• Panel data with more than two periods
  – Panel data {(y_it, x_it), i = 1,...,N; t = 1,2,3}
    • The fixed effects model for three periods
      $y_{it} = \delta_1 + \delta_2\, d2_t + \delta_3\, d3_t + \beta x_{it} + a_i + u_{it}$,  t = 1,2,3,
      where $d2_t$ and $d3_t$ are year dummies.
      – The fixed effect $a_i$ is generally correlated with $x_{it}$.
      – The differenced equation
        $\Delta y_{it} = \delta_2\, \Delta d2_t + \delta_3\, \Delta d3_t + \beta\, \Delta x_{it} + \Delta u_{it}$,  t = 2,3,
        where the time index t cannot be dropped. Or, by re-parameterisation (using $\Delta d2_t + 2\Delta d3_t = 1$ and $\Delta d3_t = d3_t$ for t = 2, 3),
        $\Delta y_{it} = \delta_2 + \alpha_3\, d3_t + \beta\, \Delta x_{it} + \Delta u_{it}$,  t = 2,3,
        where $\alpha_3 = (\delta_3 - 2\delta_2)$.
      – The key assumption about the idiosyncratic error $u_{it}$ is
        $\mathrm{Cov}(\Delta x_{it}, \Delta u_{is}) = 0$ for all t, s and i,
        which is implied by strict exogeneity (TS3).

• Panel data with more than two periods
  – Panel data {(y_it, x_it), i = 1,...,N; t = 1,...,T}
    • The fixed effects model for T periods
      $y_{it} = \delta_1 + \delta_2\, d2_t + \dots + \delta_T\, dT_t + \beta x_{it} + a_i + u_{it}$,  t = 1,...,T,
      where $d2_t, \dots, dT_t$ are year dummies. "T" here is assumed to be much smaller than "N".
  – The differenced equation takes the form
    $\Delta y_{it} = \alpha_2 + \alpha_3\, d3_t + \dots + \alpha_T\, dT_t + \beta\, \Delta x_{it} + \Delta u_{it}$,  t = 2,...,T.
  – The key assumption about the idiosyncratic error $u_{it}$ is
    $\mathrm{Cov}(\Delta x_{it}, \Delta u_{is}) = 0$ for all t, s and i.
  – For a given i, "$u_{it}$ is independent across t" implies "$\Delta u_{it}$ is autocorrelated", whereas "$\Delta u_{it}$ is independent across t" implies "$u_{it}$ is a random walk".
  – To deal with serial correlation in $\Delta u_{it}$, the Prais-Winsten (GLS) method may be used.

• Panel data and program evaluation
  – Example 13.8, Effect of enterprise zone (EZ) program on unemployment claims: 22 cities over 1980-1988
    • The differenced equation, t = 81,...,88 (T = 9),
      $\Delta \log(uclms_{it}) = \alpha_1 + \alpha_2\, d82_t + \dots + \alpha_8\, d88_t + \beta_1\, \Delta ez_{it} + \Delta u_{it}$,
      where $ez_{it}$ is one if city i at year t was an EZ.
    • Estimation result

      Dependent Variable: dLog(uclms)
      NObs = 176    SSR = 7.796    Sigma2 = 0.047
      R2 = 0.623    Adj.R2 = 0.605

      Variable   Estimated
      Name       Coeffs      Stderr   t-Ratio   p-Value
      DEZ        -0.182      0.078    -2.326    0.021
      D82         0.779      0.065    11.950    0.000
      D83        -0.033      0.065    -0.508    0.612
      D84        -0.017      0.069    -0.250    0.803
      D85         0.323      0.067     4.845    0.000
      D86         0.292      0.065     4.485    0.000
      D87         0.054      0.065     0.828    0.409
      D88        -0.017      0.065    -0.262    0.794
      CONST.     -0.322      0.046    -6.982    0.000

      – The estimated $\beta_1$ is −.182 and is statistically significant.
      – The presence of an EZ is estimated to cause a 16.6% fall in unemployment claims ($e^{-.182} - 1 \approx -.166$), which is economically significant.

• Summary
  – Effects of a "policy" may be analysed by using pooled cross sections or panel data.
  – For pooled cross sections, "difference-in-difference" estimation is useful.
  – For panel data with small T, the first-differenced estimation is useful.
  – With panel data, we can address the issue of "unobserved factors" to some extent.

A Brief Review

• Data structures
  – Cross sectional data
    • "Population" and "sample" are clearly defined.
    • Valid inference procedures are available under MLR1-5.
    • We are able to handle heteroskedasticity.
  – Time series data
    • Features of time series data
    • Valid inference procedures are available under TS1-5.
    • Similarities and differences in comparison to cross-sectional data
  – Pooled cross sections and panel data
    • Definitions and applications (policy analysis)

• Linear regression models
  – Definition of linear regression
    • Basic structure and terminology
    • Interpretation of parameters and disturbance
    • Restricted and un-restricted
    • Re-parameterisation
    • LPM
  – Applications of linear regression
    • Analyse "causal" or "ceteris paribus" effects;
    • Analyse "association";
    • Make predictions;
    • Analyse policy implications.

• Statistical inference
  – OLS
    • OLS: choose parameter estimates by min SSR
    • The properties of the OLS estimators: unbiased, consistent, efficient, (asymptotic) normality
    • Assumptions: MLR1-5(6), TS1-5(6)
    • WLS and FGLS
  – Inference procedures
    • Confidence intervals
    • Tests: significance (single or joint), heteroskedasticity, misspecification, autocorrelation, etc.
    • Predictions: point and interval

About Final Exam

• Final exam
  – Five compulsory questions for two hours.
  – No multiple choice questions.
  – Cover the whole course.
  – Bring your ID, pens, non-programmable calculator.
  – Statistical tables and some formulae are provided.
  – Read instructions (front page) carefully.
  – In your answers:
    • show your understanding of the material;
    • write to-the-point answers with clear steps;
    • specialise general formula to the question at hand;
    • express and justify your conclusions.

• Final exam
  – More specifics:
    • Know what are MLR1-6 and TS1-6.
    • "What is ..." requires good understandings of notions/definitions/principles/procedures.
    • "Outline ..." or "List ..." requires brief descriptions of the steps involved in a test or estimation procedure.
    • Know the meaning of "regressing ... on ...". Be able to interpret estimation results.
    • If the level of significance for a test is not given, choose 5% or 1%.
    • The front page, statistical tables and formulae of the exam paper are given in the following slides.
    • Answer easy questions first. Make your writing readable!

This note was uploaded on 06/12/2011 for the course ECONOMICS 3291 taught by Professor Professorsnamespublishedtheyarethesoleowners during the Three '11 term at University of New South Wales.
