Introductory Econometrics
ECON2206/ECON3209 Slides13 Lecturer: Minxian Yang
ie_Slides13 my, School of Economics, UNSW
13. Pooled Cross Sections and Panel Data: Simple Methods (Ch13)
• Lecture plan
– Pooling cross sections across time
– Policy analysis with pooled cross sections
– Two period panel data
– Fixed effects model and first-differenced estimator
– Policy analysis with panel data
– Differencing with multiple-period panel data
• Pooled cross sections
– Cross sections at different points in time
• An independently pooled cross section is obtained by
sampling randomly from a population at different points
in time (usually different years).
• Observations at different points in time are not
necessarily collected from the same set of individuals.
• Pooled cross sections consist of independent
observations (from a number of random samples).
• Pooled cross sections differ from a single random
sample in that observations at different points in time
may not follow the same population distribution.
• The time stamps of observations are important.
– Example 13.1 (fertil1.raw)

Dependent Variable: KIDS
NObs = 1129      aveKIDS = 2.743
SSR = 2685.9     R2 = 0.130
Sigma2 = 2.418   Adj.R2 = 0.116

Variable    Estimated   OLS      H.robust   H.robust
Name        Coeffs      Stderr   Stderr     pValue
EDUC        −0.128      0.018    0.021      0.000
AGE          0.532      0.138    0.138      0.000
AGE2        −0.006      0.002    0.002      0.000
BLACK        1.076      0.174    0.200      0.000
EAST         0.217      0.133    0.126      0.086
NORTHCEN     0.363      0.121    0.116      0.002
WEST         0.198      0.167    0.161      0.221
FARM        −0.053      0.147    0.145      0.717
OTHRURAL    −0.163      0.175    0.179      0.364
TOWN         0.084      0.125    0.127      0.508
SMCITY       0.212      0.160    0.153      0.166
Y74          0.268      0.173    0.186      0.150
Y76         −0.097      0.179    0.198      0.624
Y78         −0.069      0.182    0.196      0.726
Y80         −0.071      0.183    0.192      0.711
Y82         −0.522      0.172    0.186      0.005
Y84         −0.545      0.175    0.184      0.003
CONST.      −7.743      3.052    3.046      0.011

• US data: surveys from 72 to 84.
Did fertility rates change over time?
• Look at the coefficients on year dummies (base year = 1972),
where, eg., Y84 = 1 for obs from 1984 and = 0 for others.
• Holding other factors fixed, a woman in 1984 had .545 fewer kids
than a woman in 1972 on average.
• Het.-robust standard errors do not change the conclusion.
• White LM test confirms heteroskedasticity. Nonetheless, the
result from WLS (FGLS) leads to similar conclusions.
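As a rough illustration of how year-dummy coefficients are read, the sketch below pools three made-up cross sections and regresses the outcome on an intercept and year dummies only. The years, sample sizes and outcome values are invented for the illustration and are not taken from fertil1.raw; the helper name year_dummies is also hypothetical. With no other regressors, each dummy coefficient should equal the gap between that year's sample mean and the base-year mean.

```python
import numpy as np

# Hypothetical pooled cross sections (values invented, not from fertil1.raw).
year = np.array([72, 72, 72, 78, 78, 84, 84, 84, 84])
kids = np.array([3.0, 4.0, 2.0, 3.0, 2.0, 2.0, 1.0, 3.0, 2.0])


def year_dummies(year, base):
    """Return the non-base years and one 0/1 dummy column per non-base year."""
    years = [int(y) for y in np.unique(year) if y != base]
    D = np.column_stack([(year == y).astype(float) for y in years])
    return years, D


years, D = year_dummies(year, base=72)
X = np.column_stack([np.ones(len(kids)), D])      # intercept + year dummies
coef, *_ = np.linalg.lstsq(X, kids, rcond=None)   # pooled OLS

base_mean = kids[year == 72].mean()
for y, c in zip(years, coef[1:]):
    gap = kids[year == y].mean() - base_mean
    print(f"y{y}: OLS coefficient = {c:.3f}, mean gap vs base year = {gap:.3f}")
```

Once other regressors such as educ or age are added, the dummy coefficients become mean gaps holding those factors fixed, which is the interpretation used for Y84 above.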
• Policy analysis with pooled cross sections
– Example 13.3 (house price and incinerator)
• North Andover: new incinerator rumours started in 1978,
construction began in 1981 and operation began in 1985.
• Data consist of samples in 1978 and 1981. Is the price of houses
near the incinerator expected to fall over time?
• nearinc = 1 for houses near the incinerator, = 0 for others.
• Model (base year = 1978) and interpretation:
log(price) = β0 + δ0 y81 + (β1 + δ1 y81)∙nearinc + other factors + u .
• OLS estimation results
log(price) = 1.69 + .417 y81 + .086 nearinc − .128 y81∙nearinc + ...
            (1.97)  (.028)      (.057)         (.051)
• Over the period, the price of houses near the incinerator decreased
12.8% (pval = .013) in comparison to houses in other areas.
• “Difference in difference” estimator.
                      Before      After (y81)            After − Before
Control               β0          β0 + δ0                δ0
Treatment (nearinc)   β0 + β1     β0 + δ0 + β1 + δ1      δ0 + δ1
Treatment − Control   β1          β1 + δ1                δ1
– Generalisation of Example 13.3
• Natural experiment: an event, typically a government policy,
that changes the environment of individuals.
• Control group consists of individuals unaffected by the
event. (eg. houses far away from the incinerator)
• Treatment group consists of those affected by the event.
(eg. houses near the incinerator)
• Two cross sections, before and after the event, are needed
to compensate for the fact that the control and treatment groups
are not really randomly chosen:
y = β0 + δ0 d2 + (β1 + δ1 d2)∙dT + other factors + u ,
where dT = treatment dummy and d2 = “after” dummy.
• Without other factors,
δ̂1 = (ȳ2,T − ȳ2,C) − (ȳ1,T − ȳ1,C). (difference-in-difference)
                      Before      After (d2)             After − Before
Control               β0          β0 + δ0                δ0
Treatment (dT)        β0 + β1     β0 + δ0 + β1 + δ1      δ0 + δ1
Treatment − Control   β1          β1 + δ1                δ1
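The equivalence between the difference of group means and the coefficient on the interaction d2∙dT can be checked numerically. The following is a minimal sketch on invented data (the ten observations and the helper cell_mean are hypothetical); it assumes no other factors are included, as in the formula above.

```python
import numpy as np

# Hypothetical data: d2 = "after" dummy, dT = treatment dummy, y = outcome.
d2 = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1], dtype=float)
dT = np.array([0, 0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=float)
y = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 12.0, 13.0, 12.0, 13.0])


def cell_mean(after, treated):
    """Sample mean of y in one (period, group) cell."""
    return y[(d2 == after) & (dT == treated)].mean()


# (treatment - control) after, minus (treatment - control) before.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# OLS of y on 1, d2, dT and the interaction d2*dT.
X = np.column_stack([np.ones_like(y), d2, dT, d2 * dT])
b0, delta0, b1, delta1 = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"difference of mean differences: {did_means:.3f}")
print(f"OLS interaction coefficient:    {delta1:.3f}")
```

The regression form is usually preferred in practice because other factors can be added as controls and a standard error for the interaction coefficient comes with the fit.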
• Panel (longitudinal) data
– A set of panel data is collected by following the same
individuals over a number of time periods.
eg. A panel data set for wage, educ, exper, ...
i. randomly select a sample of people from the
population and collect data for 2009;
ii. the same people are reinterviewed to collect data for
2010, 2011, ....
Time series on wage, educ, exper, ... are collected for
each individual.
– Panel data, while costly, allow researchers to address
issues related to unobserved factors, which are
difficult to handle with cross sectional data.
• Two period panel data
– Two period panel data {(yit, xit), i = 1,...,N; t = 1, 2 }
• Example (crime2.raw): crime rates (crmrte) and
unemployment rates (unem) for 46 cities in 1982 ( t = 1)
and 1987 (t = 2). Did unem influence crmrte?
For 1987, crmrte = 128.38 − 4.16 unem, n = 46, R2 = .033.
                  (20.76)  (3.42)
• The result is likely biased because many relevant
factors (eg. population, police, ...) are not controlled for.
• For panel data, unobserved factors may be grouped
into those that vary over time and those that do not:
yit = β0 + δ0 d2t + β xit + ai + uit, t = 1, 2,
where d2t = 1 if t = 2 and 0 otherwise. The term δ0 d2t varies over
time but not across section; ai varies across section but not over time.
– Unobserved (or fixed) effects model
• In the unobserved-effects model
yit = β0 + δ0 d2t + β xit + ai + uit , t = 1, 2,
ai can be correlated with xit, while uit is uncorrelated with xit.
– ai is the unobserved effect (fixed effect, invariant to t)
that represents factors specific to individual i;
– uit is the idiosyncratic error that represents unobserved
factors varying both over time and across section.
• Can we just pool data for t = 1, 2 and OLS y on (d2, x)?
No, because ai is usually correlated with xit .
eg. “crime2.raw”, pooled OLS: n = 92, R2 = .012,
crmrteit = 93.42 + 7.94 d2t + .427 unemit .
          (12.74) (7.98)     (1.188)
The composite error (ai + uit) here is likely correlated with unemit.
– First-differenced estimation of fixed effects model
• Write the model separately
yi1 = β0 + δ0∙0 + β xi1 + ai + ui1 , (t = 1),
yi2 = β0 + δ0∙1 + β xi2 + ai + ui2 , (t = 2).
• Subtracting the first equation from the second gives
Δyi = δ0 + β Δxi + Δui , Δxi = xi2 − xi1 ,
(first-differenced equation) which is a cross-section
model and is free of ai.
• When the idiosyncratic error uit is uncorrelated with
regressors in both periods, the OLS of Δyi on Δxi will
deliver consistent estimators of (δ0, β).
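To see why differencing helps, the following simulation sketch compares pooled OLS with the first-differenced estimator when the fixed effect is correlated with the regressor. Everything here is assumed for illustration: the parameter values (slope 2, time effect 1), the normal distributions and the sample size are invented and have nothing to do with crime2.raw.

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, delta0 = 5000, 2.0, 1.0          # illustrative values only

a = rng.normal(size=N)                     # unobserved fixed effect a_i
x1 = a + rng.normal(size=N)                # x_it is correlated with a_i
x2 = a + rng.normal(size=N)
y1 = 5.0 + beta * x1 + a + rng.normal(size=N)            # t = 1
y2 = 5.0 + delta0 + beta * x2 + a + rng.normal(size=N)   # t = 2


def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Pooled OLS of y on (1, d2, x): a_i stays in the composite error.
y = np.concatenate([y1, y2])
x = np.concatenate([x1, x2])
d2 = np.concatenate([np.zeros(N), np.ones(N)])
beta_pooled = ols(np.column_stack([np.ones(2 * N), d2, x]), y)[2]

# First differences: a_i cancels out of y_i2 - y_i1.
dy, dx = y2 - y1, x2 - x1
delta0_fd, beta_fd = ols(np.column_stack([np.ones(N), dx]), dy)

print(f"pooled OLS slope:        {beta_pooled:.3f}")
print(f"first-differenced slope: {beta_fd:.3f} (intercept {delta0_fd:.3f})")
```

In this design the pooled slope drifts away from the true value because a_i sits in the error term, while the differenced slope stays close to it.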
– First-differenced estimation example
• Example (crime2.raw): first differenced estimation
Δcrmrtei = 15.40 + 2.22 Δunemi , n = 46, R2 = .127.
          (4.70)  (.88)
– There is a positive and significant relationship between
unem and crmrte. A one percentage point rise in unemployment
increases crime by 2.22 crimes per 1,000 people.
– Holding unem fixed, the crimes per 1,000 people increased by
15.4 in 1987, in comparison to 1982.
– It may be reasonable to assume that Δui is uncorrelated
with Δunemi, given that the “fixed effects” are removed.
– But, if police effort increased more in cities where
unemployment decreased, then Δui and Δunemi are
correlated. Need to include other observable factors.
• With panel data, we can deal with, to some extent, unobserved factors!
– First-differenced estimation: a shortcoming
• Consider the log wage model with two-period panel
log(wageit) = β0 + δ0 d2t + β1 educit + ai + uit , t = 1, 2,
where ai represents unobserved factors, ability, say.
• The first differenced equation is
Δlog(wagei) = δ0 + β1 Δeduci + Δui .
• However, for most adult workers, Δeduci is zero. The
overall variation in Δeduci is small. The OLS estimator
will have a large standard error (Ch2-3).
• Using the first-differenced estimation is a good idea for
“return to education”. But, frequently, it does not work
well because of the lack of variation in Δeduc.
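The link between little variation in Δeduc and a large standard error follows from the simple-regression formula se = σ / sqrt(SST), where SST is the total sample variation of the regressor. The sketch below evaluates that formula for two made-up samples; the sample size, the error standard deviation and the shares of workers who change schooling are all assumptions chosen for illustration.

```python
import numpy as np


def slope_se(dx, sigma):
    """Standard error of the OLS slope under homoskedasticity: sigma / sqrt(SST_x)."""
    sst = np.sum((dx - dx.mean()) ** 2)
    return sigma / np.sqrt(sst)


n, sigma = 100, 0.5                 # hypothetical sample size and error s.d.

d_educ_few = np.zeros(n)
d_educ_few[:5] = 1.0                # only 5% of workers change schooling

d_educ_many = np.zeros(n)
d_educ_many[:50] = 1.0              # 50% of workers change schooling

se_few = slope_se(d_educ_few, sigma)
se_many = slope_se(d_educ_many, sigma)
print(f"se with 5% movers: {se_few:.3f}, se with 50% movers: {se_many:.3f}")
```

With the same error variance, the sample in which almost nobody changes schooling gives a standard error more than twice as large.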
• Policy analysis with two period panel data
– Ex.13.7: effect of drunk driving laws on traffic fatalities
• Data: dthrte (traffic death rate), open (open container laws),
admn (administrative per se laws) for the 50 US states and the
District of Columbia (n = 51) for 1985 and 1990.
• Model: t = 1, 2,
dthrteit = β0 + δ0 d2t + β1 openit + β2 admnit + ai + uit .
• The first-differenced equation is
Δdthrtei = δ0 + β1 Δopeni + β2 Δadmni + Δui .
• Estimation result: n = 51, R2 = .119
Δdthrtei = −.497 − .420 Δopeni − .151 Δadmni .
          (.052)  (.206)        (.117)
• The coefficient on open is significantly negative at the 5% level,
so open container laws appear effective in reducing traffic deaths.
The coefficient on admn is negative but not significant.
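The significance statement can be reproduced from the reported coefficients and standard errors. This is a small sketch of the arithmetic only; the two-sided 5% critical value for 51 − 3 = 48 degrees of freedom is taken as roughly 2.01, which is an assumption written into the code rather than a figure from the slides, and the dictionary keys are hypothetical names.

```python
# Coefficients and standard errors as reported in the estimation result.
estimates = {"d_open": (-0.420, 0.206), "d_admn": (-0.151, 0.117)}

# Assumption: two-sided 5% critical value of the t distribution with
# 51 - 3 = 48 degrees of freedom, taken as roughly 2.01.
CRIT_5PCT = 2.01

t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > CRIT_5PCT for name, t in t_ratios.items()}

for name, t in t_ratios.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at the 5% level)")
```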
• Panel data with more than two periods
– Panel data {(yit, xit), i = 1,...,N; t = 1,2,3}
• The fixed effects model for three periods
yit = δ1 + δ2 d2t + δ3 d3t + β xit + ai + uit , t = 1,2,3,
where d2t and d3t are year dummies.
– The fixed effect ai is generally correlated with xit.
– The differenced equation
Δyit = δ2 Δd2t + δ3 Δd3t + β Δxit + Δuit , t = 2,3,
where t cannot be dropped. Or, by reparameterisation
(using Δd2t + 2Δd3t = 1 for t = 2, 3),
Δyit = δ2 + α3 d3t + β Δxit + Δuit , t = 2,3,
where α3 = (δ3 − 2δ2).
– The key assumption about the idiosyncratic error uit is
Cov(Δxit, Δuis) = 0 for all t, s and i,
which is implied by strict exogeneity (TS3).
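The reparameterisation of the differenced year dummies can be verified with a few lines of arithmetic. In this sketch the numerical values of the two time effects are arbitrary, and the name alpha3 for the coefficient on d3t in the reparameterised equation is an assumed label.

```python
import numpy as np

t = np.array([1, 2, 3])
d2 = (t == 2).astype(float)          # year dummy for t = 2
d3 = (t == 3).astype(float)          # year dummy for t = 3

# Differenced dummies, available for t = 2, 3 only.
dd2 = np.diff(d2)                    # (1, -1)
dd3 = np.diff(d3)                    # (0, 1)

delta2, delta3 = 0.7, 1.9            # arbitrary illustrative time effects
time_effect = delta2 * dd2 + delta3 * dd3     # original differenced form

alpha3 = delta3 - 2 * delta2
reparam = delta2 + alpha3 * d3[1:]            # intercept plus dummy for t = 3

print("dd2 + 2*dd3 =", dd2 + 2 * dd3)
print("original:", time_effect, "reparameterised:", reparam)
```

Both forms give the same time effect in each differenced period, so the differenced equation can be estimated with an intercept and a single year dummy.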
– Panel data {(yit, xit), i = 1,...,N; t = 1,...,T }
• The fixed effects model for T periods
yit = δ1 + δ2 d2t +...+ δT dTt + β xit + ai + uit , t = 1,...,T,
where d2t,..., dTt are year dummies. “T ” here is assumed to be
much smaller than “N ”.
– The differenced equation takes the form
Δyit = α2 + α3 d3t +...+ αT dTt + β Δxit + Δuit , t = 2,...,T.
– The key assumption about the idiosyncratic error uit is
Cov(Δxit, Δuis) = 0 for all t, s and i.
– For a given i, “uit is independent across t ” implies “Δuit is
autocorrelated”, whereas “Δuit is independent across t ”
implies “uit is a random walk”.
– To deal with serial correlation in Δuit, the Prais-Winsten
(GLS) method may be used.
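The two statements about serial correlation can be illustrated by simulation. This is a sketch under assumed conditions (standard normal shocks, three periods, a large invented cross section); the helper corr_of_adjacent_differences is hypothetical. If uit is independent across t with constant variance, adjacent differences share one shock with opposite signs, so their correlation should be close to −0.5; if uit is a random walk, its differences are the independent shocks themselves.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 100_000, 3                    # illustrative panel dimensions


def corr_of_adjacent_differences(u):
    """Correlation between the differenced errors at t = 2 and t = 3."""
    du = np.diff(u, axis=1)
    return np.corrcoef(du[:, 0], du[:, 1])[0, 1]


u_iid = rng.normal(size=(N, T))                      # u_it independent across t
u_rw = np.cumsum(rng.normal(size=(N, T)), axis=1)    # u_it follows a random walk

corr_iid = corr_of_adjacent_differences(u_iid)
corr_rw = corr_of_adjacent_differences(u_rw)
print(f"independent u_it: corr of adjacent differences = {corr_iid:.3f}")
print(f"random-walk u_it: corr of adjacent differences = {corr_rw:.3f}")
```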
• Panel data and program evaluation
– Example 13.8, Effect of enterprise zone (EZ) program on
unemployment claims: 22 cities over 1980-1988
• The differenced equation, t = 81,...,88 (T = 9),
Δlog(uclmsit) = α1 + α2 d82t +...+ α8 d88t + β1 Δezit + Δuit ,
where ezit is one if City i at year t was an EZ.
• Estimation result
– The estimated β1 is −.182 and is statistically significant.
– The presence of an EZ causes a 16.6% fall (e^(−.182) − 1 = −.166)
in unemployment claims, which is economically significant.

Dependent Variable: dLog(uclms)
NObs = 176       SSR = 7.796
Sigma2 = 0.047   R2 = 0.623
Adj.R2 = 0.605

Variable   Estimated
Name       Coeffs      Stderr   tRatio    pValue
DEZ        −0.182      0.078    −2.326    0.021
D82         0.779      0.065    11.950    0.000
D83        −0.033      0.065    −0.508    0.612
D84        −0.017      0.069    −0.250    0.803
D85         0.323      0.067     4.845    0.000
D86         0.292      0.065     4.485    0.000
D87         0.054      0.065     0.828    0.409
D88        −0.017      0.065    −0.262    0.794
CONST.     −0.322      0.046    −6.982    0.000

• Summary
– Effects of a “policy” may be analysed by using pooled
cross sections or panel data.
– For pooled cross sections, “difference-in-difference”
estimation is useful.
– For panel data with small T, the first-differenced
estimation is useful.
– With panel data, we can address the issue of
“unobserved factors” to some extent.
A Brief Review
• Data structures
– Cross sectional data
• “Population” and “sample” are clearly defined.
• Valid inference procedures are available under MLR1-5.
• We are able to handle heteroskedasticity.
– Time series data
• Features of time series data
• Valid inference procedures are available under TS1-5.
• Similarities and differences in comparison to cross-sectional data
– Pooled cross sections and panel data
• Definitions and applications (policy analysis)
• Linear regression models
– Definition of linear regression
• Basic structure and terminology
• Interpretation of parameters and disturbance
• Restricted and unrestricted
• Reparameterisation
• LPM
– Applications of linear regression
• Analyse “causal” or “ceteris paribus” effects;
• Analyse “association”;
• Make predictions;
• Analyse policy implications.
• Statistical inference
– OLS
• OLS: choose parameter estimates by min SSR
• The properties of the OLS estimators: unbiased,
consistent, efficient, (asymptotic) normality
• Assumptions: MLR1-5(6), TS1-5(6)
• WLS and FGLS
– Inference procedures
• Confidence intervals
• Tests: significance (single or joint), heteroskedasticity,
misspecification, autocorrelation, etc.
• Predictions: point and interval
About Final Exam
• Final exam
– Five compulsory questions for two hours.
– No multiple choice questions.
– Cover the whole course.
– Bring your ID, pens, non-programmable calculator.
– Statistical tables and some formulae are provided.
– Read instructions (front page) carefully.
– In your answers:
• show your understanding of the material;
• write to-the-point answers with clear steps;
• specialise general formula to the question at hand;
• express and justify your conclusions.
– More specifics:
• Know what are MLR1-6 and TS1-6.
• “What is ...” requires a good understanding of
notions/definitions/principles/procedures.
• “Outline ...” or “List ...” requires brief descriptions of the steps
involved in a test or estimation procedure.
• Know the meaning of “regressing ... on ...”.
• Be able to interpret estimation results.
• If the level of significance for a test is not given, choose 5%
or 1%.
• The front page, statistical tables and formulae of the exam
paper are given in the following slides.
• Answer easy questions first. Make your writing readable!
This note was uploaded on 06/12/2011 for the course ECONOMICS 3291 during the Three '11 term at University of New South Wales.