[...] individual-level protest behavior
for a group of about 2300 people living in 34 states of the USA. Some questions tend to arise at this point in data analysis: How
precise is our estimate of the relationship between inequality change and protest? How much would we expect this result to be
different if we took a new sample of the same size from the same states?
We do not know the answers to these questions for certain, but we can make a guess at them using an approximation to the sampling
distribution of the coefficients.
How would a sampling distribution help us answer questions like the ones just posed?
A sampling distribution for our coefficient would tell us how the coefficient would vary if we could estimate the same model
on different samples.
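To make this idea concrete, here is a minimal sketch in base R, assuming we could see a whole population. The population, the variable names x and y, and the true slope of 0.1 are all invented for illustration; they are not the class data.

```r
# Hypothetical population: every name and number here is made up for illustration.
set.seed(230)
N <- 100000
pop <- data.frame(x = rnorm(N))
pop$y <- 0.1 * pop$x + rnorm(N, sd = 0.4)

# Fit the same model on one fresh sample of size n and return the slope.
one_slope <- function(n = 2300) {
  samp <- pop[sample(nrow(pop), n), ]
  coef(lm(y ~ x, data = samp))[["x"]]
}

# Repeat on 200 different samples: the collected slopes are a
# simulated sampling distribution for the coefficient.
slopes <- replicate(200, one_slope())
range(slopes)  # smallest and largest slope across samples
sd(slopes)     # typical sample-to-sample variation
```

In real work we almost never have the whole population, which is why we need an approximation built from the one sample we do have.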
6. Now Kaplan shows us both how to directly generate a sampling distribution when we observe an entire population (illustrated in his
Chap 5 using the data on running times) and also how to approximate the sampling distribution using a resampling distribution,
which involves drawing new samples from our sample with replacement. Kaplan does a great job explaining this in the reading
assigned for today. What we’ll do in class is get some practice with it using political science data. Let us try this a few times to see
how our coefficient estimate might vary.
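Before using the library, it may help to see what "drawing a new sample of our sample with replacement" looks like in base R. This is only a rough sketch of the idea, not the mosaic code itself, and toy.df is a hypothetical stand-in with invented values.

```r
# Hypothetical stand-in for the class data; all values are invented.
set.seed(17)
toy.df <- data.frame(id = 1:8, x = rnorm(8))
toy.df$y <- 0.1 * toy.df$x + rnorm(8, sd = 0.4)

# Draw row numbers with replacement, keeping the sample size the same.
rows <- sample(nrow(toy.df), size = nrow(toy.df), replace = TRUE)
boot.df <- toy.df[rows, ]

# Count how often each original row was drawn:
# some ids typically repeat and others are left out.
table(boot.df$id)

# Refit the same model on the resampled rows.
coef(lm(y ~ x, data = boot.df))
```

Each rerun of the sample() line gives a different resample, so the fitted coefficients change from run to run.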
First, load Kaplan’s library:
library(mosaic)
Then run the following three or four times.
lm(protest ~ GiniChange79to09.01, data = resample(nes08gini.df))
What are the minimum and maximum values of the relationship between inequality and protest across your different bootstrap samples?
mini.bs <- do(5) * lm(protest ~ GiniChange79to09.01, data = resample(nes08gini.df))
print(mini.bs)
  Intercept GiniChange79to09.01 sigma r.squared
1     0.140              0.1077 0.390  0.003147
2     0.164              0.0411 0.386  0.000487
3     0.133              0.1560 0.400  0.005870
4     0.129              0.1557 0.396  0.005878
5     0.147              0.1259 0.401  0.003909
range(mini.bs$GiniChange79to09.01)
[1] 0.0411 0.1560
7. Now, let us create an actual resampling distribution (often called a “bootstrap sampling distribution”). Write a comment on each
line of code below, explaining what is happening. (The last line will take a bit of time to run once you send it to the R console.)
mylm.fn <- function() {
  thelm <- lm(protest ~ GiniChange79to09.01, data = resample(nes08gini.df))
  thecoef <- coef(thelm)
  return(thecoef)
}
lm1.bs <- do(100) * mylm.fn()
8. Now let us use our approximation to the sampling distribution to assess the precision of our estimation of this relationship. What
does the following quantity tell us about the precision of our estimate?
sd(lm1.bs$GiniChange79to09.01)
[1] 0.0415
From sample to sample, the typical variation of our estimate of the relationship is about 0.04.
So, even though we estimated a relationship of 0.1, we could easily see relationships from 0.05 to 0.14. Our estimate is precise
enough to be able to distinguish positive from negative relationships (and either of them from a flat or 0 relationship).
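The reasoning in this answer can be sketched in base R as follows. This is an illustration under stated assumptions: sim.df is simulated data with an invented slope of 0.1, and replicate() with sample() stands in for the mosaic functions used above.

```r
# Hypothetical data standing in for nes08gini.df; all numbers are invented.
set.seed(2013)
n <- 500
sim.df <- data.frame(x = rnorm(n))
sim.df$y <- 0.1 * sim.df$x + rnorm(n, sd = 0.4)

# Estimate from the one sample we actually have.
est <- coef(lm(y ~ x, data = sim.df))[["x"]]

# Refit on rows resampled with replacement and keep the slope each time.
boot_slope <- function() {
  rows <- sample(nrow(sim.df), replace = TRUE)
  coef(lm(y ~ x, data = sim.df[rows, ]))[["x"]]
}
boot.slopes <- replicate(500, boot_slope())

# Typical sample-to-sample variation of the slope.
boot.sd <- sd(boot.slopes)

# Slopes we "could easily see": the estimate plus or minus one such deviation.
c(lower = est - boot.sd, estimate = est, upper = est + boot.sd)

# A stricter check: does the estimate plus or minus two deviations exclude 0?
(est - 2 * boot.sd > 0) || (est + 2 * boot.sd < 0)
```

If the last line is TRUE, the estimate is precise enough to tell a positive (or negative) relationship apart from a flat one, which is the same argument made in words above.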
9. What is another word for “standard deviation of a sampling distribution”?
Standard error.
Class 17, Political Science 230
Formalizing Uncertainty with Standard Errors and (Re)Sampling Distributions, October 22, 2013