
… individual-level protest behavior for a group of about 2300 people living in 34 states of the USA. At this point in a data analysis, some questions tend to arise: How precise is our estimate of the relationship between inequality change and protest? How different would we expect this result to be if we took a new sample of the same size from the same states? We cannot answer these questions with certainty, but we can make an informed guess using an approximation to the sampling distribution of the coefficients.

How would a sampling distribution help us answer questions like these? A sampling distribution for our coefficient tells us how the coefficient would vary if we could estimate the same model on many different samples.

6. Kaplan shows us both how to generate a sampling distribution directly when we observe an entire population (illustrated in his Chapter 5 using the data on running times) and how to approximate the sampling distribution with a resampling distribution, which involves drawing new samples from our sample with replacement. Kaplan does a great job explaining this in the reading assigned for today; in class, we will get some practice with it using political science data.

Let us try this a few times to see how our coefficient estimate might vary. First, load Kaplan’s library:

    library(mosaic)

Then run the following three or four times:

    lm(protest ~ GiniChange79to09.01, data = resample(nes08gini.df))

What are the minimum and maximum values of the relationship between inequality and protest across your different bootstrap samples?

    … <- do(5) * lm(protest ~ GiniChange79to09.01, data = resample(nes08gini.df))
    print(…)

      Intercept GiniChange79to09.01 sigma r.squared
    1     0.140              0.1077 0.390  0.003147
    2     0.164              0.0411 0.386  0.000487
    3     0.133              0.1560 0.400  0.005870
    4     0.129              0.1557 0.396  0.005878
    5     0.147              0.1259 0.401  0.003909

    range(…$GiniChange79to09.01)
    [1] 0.0411 0.1560
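The contrast drawn in item 6, between a true sampling distribution (many fresh samples from a fully observed population, as in Kaplan's Chapter 5) and a resampling distribution (many resamples, with replacement, of the one sample we actually have), can be illustrated with a short simulation. This is only a sketch in base R under stated assumptions: the data are simulated rather than taken from nes08gini.df, and the population size, the true slope of 0.1, the noise level, and the names pop, slope_of, true_dist, and boot_dist are all hypothetical. Base R's sample.int() stands in for mosaic's resample().

```r
# Sketch with SIMULATED data, not nes08gini.df; every name here is hypothetical.
set.seed(230)

# A hypothetical "population" of 100,000 people with a true slope of 0.1
N <- 100000
pop <- data.frame(x = runif(N))
pop$y <- 0.15 + 0.1 * pop$x + rnorm(N, sd = 0.4)

# Fit y ~ x on a data frame and return only the slope
slope_of <- function(d) unname(coef(lm(y ~ x, data = d))["x"])

n <- 2300  # sample size similar to the survey described in the text

# True sampling distribution: 200 fresh samples drawn from the population
true_dist <- replicate(200, slope_of(pop[sample.int(N, n), ]))

# Resampling (bootstrap) distribution: 200 resamples of ONE sample, with replacement
one_sample <- pop[sample.int(N, n), ]
boot_dist <- replicate(200, slope_of(one_sample[sample.int(n, n, replace = TRUE), ]))

# As in item 6: smallest and largest slope across the first five resamples
range(boot_dist[1:5])

# The spread of each distribution; the two should be similar
c(true_sd = sd(true_dist), boot_sd = sd(boot_dist))
```

If the sketch behaves as intended, the two standard deviations printed at the end come out close to each other. That is why the bootstrap is useful: the resampling distribution of a single sample approximates the spread of a sampling distribution we usually cannot observe.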
7. Now, let us create an actual resampling distribution (often called a “bootstrap sampling distribution”). Write a comment on each line of code below explaining what is happening. (The last line will take a bit of time to run once you send it to the R console.)

    mylm.fn <- function() {
      # Fit the model on a resample (drawn with replacement) of the data
      thelm <- lm(protest ~ GiniChange79to09.01, data = resample(nes08gini.df))
      # Extract the estimated coefficients (intercept and slope)
      thecoef <- coef(thelm)
      # Return the coefficients as the function's result
      return(thecoef)
    }
    # Run mylm.fn() 100 times, collecting one row of coefficients per resample
    … <- do(100) * mylm.fn()

8. Now, let us use our approximation to the sampling distribution to assess the precision of our estimate of this relationship. What does the following quantity tell us about the precision of our estimate?

    sd(…$GiniChange79to09.01)
    [1] 0.0415

From sample to sample, our estimate of the relationship typically varies by about 0.04. So, even though we estimated a relationship of about 0.1, we could easily see estimates anywhere from roughly 0.05 to 0.14. Still, our estimate is precise enough to distinguish a positive relationship from a negative one (and either from a flat, zero relationship): an estimate of about 0.1 lies roughly 2.4 standard errors (0.1 / 0.0415) above zero.

9. What is another name for the “standard deviation of a sampling distribution”? The standard error.

Class 17, Political Science 230: Formalizing Uncertainty with Standard Errors and (Re)Sampling Distributions, October 22, 2013
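Items 7 through 9 can also be sketched end to end in base R, again under assumptions: simulated data replace nes08gini.df, and boot_coef_fn(), sim.df, and the other names are hypothetical stand-ins for mylm.fn() and the mosaic do() loop. The sketch computes the bootstrap standard error as the standard deviation of 100 resampled slopes (item 8), then sets it beside the formula-based standard error that summary() reports, since both are estimates of the slope's standard error (item 9).

```r
# Sketch with SIMULATED data standing in for nes08gini.df; all names are hypothetical.
set.seed(17)

n <- 2300
sim.df <- data.frame(gini = runif(n))
sim.df$protest <- 0.15 + 0.1 * sim.df$gini + rnorm(n, sd = 0.4)

# Base-R analogue of mylm.fn(): refit the model on one resample of the rows
boot_coef_fn <- function(d) {
  rows <- sample.int(nrow(d), nrow(d), replace = TRUE)  # draw rows with replacement
  coef(lm(protest ~ gini, data = d[rows, ]))             # named vector: intercept, slope
}

# Analogue of do(100) * mylm.fn(): 100 replicates, one row per resample
boot <- as.data.frame(t(replicate(100, boot_coef_fn(sim.df))))

# Bootstrap standard error: the standard deviation of the resampled slopes
boot_se <- sd(boot$gini)

# The usual formula-based standard error from summary(), for comparison
fit <- lm(protest ~ gini, data = sim.df)
lm_se <- coef(summary(fit))["gini", "Std. Error"]

# How many standard errors the full-sample estimate lies from zero
est <- coef(fit)[["gini"]]
c(estimate = est, boot_se = boot_se, lm_se = lm_se, se_units_from_zero = est / boot_se)
```

With well-behaved simulated data like this, the two standard errors should roughly agree. The last entry repeats the arithmetic in item 8, dividing the estimate by its standard error.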