Bootstrap Resampling Applied to Normal and Non-normal Regression Data (KNNL 11.5)

[Figure: histogram titled "Leaf Length"; x-axis: leaves]

The Mean (Average Leaf Length)

Population: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
Sample: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Measures of Variation for Two Different, but Related, Populations

Variation of $x$: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Variation of $\bar{x}$: $s_{\bar{x}}^2 = \frac{s^2}{n}$

t-Distribution Confidence Intervals

$\bar{x} \pm t_{(1-\alpha/2)}\,\widehat{S.E.}(\bar{x})$

    qt(.975,30)    #  2.042272
    qt(.025,30)    # -2.042272
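As a rough check on the two measures of variation, the sketch below computes them by hand and compares them with R's built-in functions. The vector `x` is made-up data for illustration only, not the leaf data used in these slides.

```r
# Hypothetical leaf lengths; illustrative values, not the lecture's data.
x <- c(62, 71, 55, 80, 67, 74, 59, 66)
n <- length(x)

x_bar   <- sum(x) / n                    # sample mean
s2      <- sum((x - x_bar)^2) / (n - 1)  # variance of the observations
s2_xbar <- s2 / n                        # estimated variance of the sample mean
se_xbar <- sqrt(s2_xbar)                 # standard error of the mean

# Cross-check the hand calculations against built-ins
stopifnot(isTRUE(all.equal(x_bar, mean(x))))
stopifnot(isTRUE(all.equal(s2, var(x))))
```

The variance of the mean is always smaller than the variance of the observations by a factor of n, which is why sample means cluster more tightly than individual leaves.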

To determine 95% confidence intervals from repeated sampling

    sample.means=rep(0,1000)
    for(i in seq(1000)) {
      sample.means[i]=mean(sample(leaves,25))
    }
    hist(sample.means)

[Figure: histogram titled "Distribution of Sample Means"; x-axis: sample.means]

97.5th quantile, 2.5th quantile

    sorted.means=sort(sample.means)
    sorted.means[25]     # 63.2
    sorted.means[975]    # 73.2

So, the 95% CI is (63.2, 73.2).
Under repeated sampling, a 95% CI will contain the population mean 95% of the time.
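The same limits can be read off with `quantile()` instead of sorting and indexing. The sketch below is one possible way to do it; because the `leaves` data are not reproduced here, it uses a simulated stand-in population (an assumption), so its numbers will not match the slide's values.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in population; NOT the lecture's `leaves` data.
leaves <- rnorm(500, mean = 68, sd = 15)

# 1000 sample means, each from a sample of 25 drawn without replacement
sample.means <- replicate(1000, mean(sample(leaves, 25)))

# Middle 95% of the simulated sample means
limits <- quantile(sample.means, probs = c(0.025, 0.975))
print(limits)
```

By default `quantile()` interpolates between order statistics, so its limits can differ slightly from `sorted.means[25]` and `sorted.means[975]`.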
Using single sample calculations

    my.sample=sample(leaves,25)
    mean(my.sample)                                # 66.8
    sqrt(var(my.sample)/length(my.sample))         # 3.109126
    qt(.975,24)                                    # 2.063899 (df = n - 1 = 24)
    66.8-2.06*3.1    # Lower bound: 60.414
    66.8+2.06*3.1    # Upper bound: 73.186

Compare the single-sample interval (60.4, 73.2) with (63.2, 73.2) from repeated sampling above.
Both are estimates of what we would expect under repeated sampling.
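The single-sample calculation can be wrapped in a small helper. This is a minimal sketch, assuming the usual n - 1 degrees of freedom for a one-sample interval; the function name `t_ci` and the data vector are hypothetical, and the result is checked against R's own `t.test()`.

```r
# Hypothetical helper: t-based confidence interval for a mean.
t_ci <- function(x, level = 0.95) {
  n      <- length(x)
  se     <- sqrt(var(x) / n)                     # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)  # two-sided critical value
  c(lower = mean(x) - t_crit * se,
    upper = mean(x) + t_crit * se)
}

# Illustrative data, not the lecture's sample.
x  <- c(62, 71, 55, 80, 67, 74, 59, 66, 70, 63)
ci <- t_ci(x)
print(ci)

# t.test() reports the same interval
print(t.test(x)$conf.int)
```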

Resampling

Confidence intervals, hypothesis testing, and many other types of inferences make use of the idea of the probability under
