PADP 8130: Linear Models
Introduction
Spring 2012
Angela Fertig, Ph.D.

Greetings
• Partner with someone
• Find something you have in common
• Introduce each other

3-hour survival plan
• Snacks
  – Sign-up sheet
  – During break, we’ll go outside to eat/drink
• Only 1-1.5 hours of lecture
  – This means you have to read before class.
• Last half of class will involve “practice”:
  – We are going to work in Stata
  – Work with data
  – Work some problems

Course Mechanics
• Prerequisites: 8120 (or some basic statistics & matrix algebra)
• Required Texts: Greene (any edition); Kennedy (any edition)
• Grading:
  – Almost weekly homework sets (10%)
    • Can work in groups, turn in separate work
  – 2 exams: in-class midterm, take-home final (30% each)
  – 1 group presentation (10%)
    • Explain a published empirical paper’s results (group grading)
  – 1 paper (20%)
    • Data/Methods/Results section of your own original research
• Office hours: Thursdays 10-noon, or by appt
• Website: http://hogwarts.spia.uga.edu/~afertig/lmodels.html

Course Overview
• Introduction to linear regression techniques to analyze the relationship between a hypothesized cause and its effect using different types of data
• Goal is two-fold:
  – You will be able to understand and criticize the research of others
  – You will be able to do your own research

Any questions about the course before we dive in?

What is Statistics?
Methods for:
  – Designing and conducting empirical research studies
  – Describing collected data
  – Making decisions/inferences about phenomena represented by data

What is Econometrics?
“Field of economics that applies mathematical statistics and the tools of statistical inference to the empirical measurement of relationships postulated by economic theory.”

Arguing causality is a goal
For causation, we need 3 things:
1. Association: i.e., a statistically significant relationship between the two variables we are interested in
2. Time ordering: i.e., the cause comes before the effect.
   Difficult for social science because we can’t do experiments and we often have “fixed” variables like race.
3. No alternative explanations, i.e. is it possible…?

For an association, we need an estimator
Key Terms
• Parameters: characteristics of the population about which we make inferences using sample data (the “truth”)
• Statistics: corresponding characteristics of the sample data upon which we base our inferences about parameters (estimates of the “truth”)
• Estimators: the formulas by which the data are transformed into a statistic or an estimate

For time ordering, we need panel data
Types of data
• Cross-sectional: a random sample where each observation is a different individual/firm with information at a point in time
• Time series: a separate observation for each time period (e.g. stock prices, GDP, unemployment rate)
• Panel or longitudinal: a random sample where each observation is followed over time

To address alternate explanations, we need multiple regression
• The relationship could be spurious.
  – e.g. Ice cream consumption and spousal abuse complaints are associated. Should we ban ice cream? No. There is no causal relationship, because both are caused by another variable: hot weather.
• The relationship could work through another variable (a chain relationship).
  – e.g. Being employed may be associated with more preventative health care. Why would that be? There is a mediating variable: health insurance. Employed people are much more likely to have health insurance and thus get preventative care.
• The relationship could be conditional on another variable.
  – e.g. As the price of cigarettes goes up, cigarette consumption goes down for young adults. There is almost no effect for older smokers (who are more likely to be very addicted). Thus the relationship between cigarette price and consumption is conditional on age.

Criteria of preferred estimators
• A main focus of this course is knowing how to choose an appropriate estimator
• We’ll discuss 5 criteria for judging estimators; each researcher has to evaluate the importance of each of these criteria for their particular project

1. Minimizing weighted sum of residuals
First, some terms:
  – Deterministic: a relationship that is exactly determined by some function
  – Stochastic: a relationship that is approximated by some function, but includes some error
  – Disturbance/error/residual term: a term that captures the size of the errors in a stochastic relationship
    • Not because our function is a bad one
    • Because measures may not be perfect and there is variability across people

Example
• Say we could run this experiment:
  – We have 8 low-income families, each with one daughter, age 10, who scored poorly on a standardized school test
  – We move these 8 families to different neighborhoods with different poverty rates, and after a year, have the girls take the test again
• We have 2 variables:
  – Test score 1 year after the move. This is the dependent variable. This is what we are interested in predicting.
  – Neighborhood poverty rate. This is the independent variable. This is what we think predicts the dependent variable.
• Note that we think we have a clear causal “story”. We change the neighborhood poverty rate and the girls’ school performance changes. Generally, causality can be more difficult to ascertain.

Here’s our data
Girl        Poverty rate   Test score
Ava              4%            85
Bella            6%            80
Clara            8%            83
Dolores         10%            75
Evie            12%            60
Fern            14%            70
Gabbie          16%            55
Hermione        18%            50
It appears that higher poverty rates result in lower test scores.

Scatter plot
I fit a line “by eye” for now by trying to minimize the differences between each point and the line.
[Figure: scatter plot of test score (vertical axis, 50 to 90) against poverty rate (horizontal axis, 0 to 20) with the fitted line; the residual for Evie is marked.]

2. Unbiasedness
First, some terms:
  – Population distribution: We don’t know this, but we want to know about it (e.g. the true mean/parameter).
  – Sample distribution: We know this, and calculate statistics such as the sample mean and the sample standard deviation from it.
  – Sampling distribution: This describes the variability in the value of the sample means amongst all of the possible samples of a certain size.
    • E.g. draw 2000 repeated samples from the population distribution and plot the distribution of the 2000 sample means
An estimator is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
  – That is, if we could take a large number of samples, we would get the correct estimate “on average” using this estimator.

3. Efficiency
An estimator is efficient if its sampling distribution has small variance.
  – The unbiased estimator with the smallest variance is called the best unbiased estimator.
  – Because it is difficult to determine mathematically which unbiased estimator has the smallest variance, and it is more tractable to find the linear unbiased estimator with the smallest variance, econometricians often focus on the best linear unbiased estimator (BLUE).

4. Mean Square Error (MSE)
MSE weighs bias against variance (MSE = variance + bias²), so that biased estimators with really low variance can be considered as well.
  – Only used when all unbiased estimators have high variance

5. Asymptotic properties
An estimator may be biased or have high variance for small sample sizes, but it may have “good” properties in extremely large samples (asymptotically).
  – A consistent estimator can be thought to have, in the limit, zero bias and zero variance (large-sample equivalent of the minimum MSE)
  – An asymptotically efficient estimator has a variance that goes to zero faster than the variance of any other consistent estimator.

Organizing principle of econometrics
The Classical Linear Regression Model makes 5 assumptions:
1. The functional form is Y = α + βX + ε
2. Zero mean of the disturbance/error
3. Disturbance terms have the same variance (homoskedasticity) & are not correlated with one another (non-autocorrelation)
4. Uncorrelatedness of regressor and disturbance (regressors fixed in repeated samples)
5. No exact linear relationships between regressors
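As a rough illustration of the functional form Y = α + βX + ε, the sketch below fits a line to the eight poverty-rate/test-score observations from the example. It uses ordinary least squares, which minimizes the sum of squared residuals; this is an assumption made here for illustration, since the slides fit the line "by eye", and the class itself works in Stata rather than Python. The names fit_ols, alpha_hat, and beta_hat are hypothetical.

```python
# Data from the example: poverty rate (percent) and test score one year after the move.
girls = ["Ava", "Bella", "Clara", "Dolores", "Evie", "Fern", "Gabbie", "Hermione"]
poverty_rate = [4, 6, 8, 10, 12, 14, 16, 18]
test_score = [85, 80, 83, 75, 60, 70, 55, 50]


def fit_ols(x, y):
    """Return (alpha, beta) minimizing the sum of squared residuals of y = alpha + beta * x.

    Illustrative helper (not from the slides); uses the textbook closed-form
    solution for a single regressor.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


alpha_hat, beta_hat = fit_ols(poverty_rate, test_score)

# Residual = observed score minus the score predicted by the fitted line.
residuals = {
    girl: score - (alpha_hat + beta_hat * rate)
    for girl, rate, score in zip(girls, poverty_rate, test_score)
}

print(f"intercept = {alpha_hat:.2f}, slope = {beta_hat:.2f}")
print(f"Evie's residual = {residuals['Evie']:.2f}")
```

The fitted slope is negative (roughly 2.5 test-score points lost per percentage point of poverty), in line with the pattern in the data table, and Evie's residual is negative because her score lies below the fitted line.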
This note was uploaded on 03/28/2012 for the course PADP 8130 taught by Professor Fertig during the Spring '12 term at LSU.