Bootstrapping in R Commander
I have added three new functions to R Commander that let you compute bootstrap confidence intervals and run permutation tests. They are found under Statistics > Bootstrap. If you are running R Commander on your own computer, you will need to replace your copy of the RcmdrPlugin.ESM library with the contents of R:\Fall2007\ESM 206A\R\library\RcmdrPlugin.ESM.
Bootstrap CI
This requires that you have an active dataset that includes at least one numeric variable. It calculates bootstrap confidence intervals for the mean, standard deviation, or variance of a variable. The input dialog looks like this:
[Figure: the Bootstrap CI input dialog.]
You need to select a variable, choose one of the statistics, and adjust the confidence level and the number of bootstrap replicates as desired. Clicking OK gives you text output and a graph. Here's an example of the graph:
[Figure: "Histogram of t", the density of the bootstrap replicates t*, beside a normal QQ plot of t* against quantiles of the standard normal.]
The histogram shows the distribution of the bootstrapped statistic, with a vertical dashed line marking the value of the statistic in the original data. The QQ plot helps you assess whether the distribution of the bootstrapped statistic is normal (in this case it looks pretty close). If it is substantially non-normal, you should distrust CIs based on t statistics and use the bootstrap CIs instead. The text output looks like this:
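This output is printed by two functions from the boot package, which the dialog calls for you. boot() draws the replicates, and boot.ci() turns them into intervals. Below is a minimal console sketch of the same steps. The vector x is simulated stand-in data (an assumption, not Dataset$Chlorophyll.a), so its numbers will differ from the example output shown after it.

```r
# Console sketch of Statistics > Bootstrap > Bootstrap CI. Assumption: the dialog
# runs roughly these calls, judging from the calls echoed in its output.
library(boot)

set.seed(206)
# Hypothetical stand-in for a numeric column such as Dataset$Chlorophyll.a
x <- rlnorm(40, meanlog = 3.7, sdlog = 0.6)

# boot() expects a statistic function taking the data and a vector of indices;
# use sd() or var() instead of mean() for the other two statistics
mean_stat <- function(bdata, i) mean(bdata[i])

bstat <- boot(data = x, statistic = mean_stat, R = 1000)
print(bstat)  # original value, bias, and std. error of the replicates

ci <- boot.ci(boot.out = bstat, conf = 0.95, type = c("norm", "perc", "bca"))
print(ci)

# Histogram of the replicates (vertical line at the original value) and normal QQ plot
plot(bstat)
```

To check whether the intervals have settled, rerun the sketch with a larger R, such as 5000.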

    ORDINARY NONPARAMETRIC BOOTSTRAP

    Call:
    boot(data = Dataset$Chlorophyll.a, statistic = function(bdata, i) {
        mean(bdata[i])
    }, R = 1000)

    Bootstrap Statistics :
        original       bias    std. error
    t1*   50.304  -0.008712       9.54919

    BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
    Based on 1000 bootstrap replicates

    CALL :
    boot.ci(boot.out = bstat, conf = 0.95, type = c("norm", "perc", "bca"))

    Intervals :
    Level      Normal             Percentile            BCa
    95%   (31.60, 69.03 )   (32.59, 69.76 )   (33.66, 71.26 )
    Calculations and Intervals on Original Scale

The things to look for are the bias (under "Bootstrap Statistics") and the confidence intervals. As a rule of thumb, I would suggest being concerned about bias only if its magnitude is more than 20% of the standard error. Here 0.0087 is far below 0.2 × 9.55 ≈ 1.91, so the bias is negligible. The three confidence intervals are the ones we discussed in class. "Normal" is the normal-approximation interval: it subtracts the bias from the original estimate, then adds and subtracts a standard normal quantile times the standard error of the bootstrap replicates. For the 95% interval above, that is 50.304 - (-0.0087) ± 1.96 × 9.549, or (31.60, 69.03). If you rerun the analysis on the same dataset, the bias and the CIs will be a little different each time, because the replicates are drawn at random. Try increasing the number of bootstrap replicates to see how many you need for consistent answers.
Bootstrap Regression Coefs
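Bootstrapping regression coefficients means refitting the model to many resampled versions of the data and collecting the coefficients from each fit. The plugin's resampling scheme isn't stated here. The minimal base-R sketch below therefore assumes case resampling, drawing whole rows with replacement. The data are simulated, and boot_coefs() is a hypothetical helper name. For a GLM, glm() with a family argument would replace lm(). The sketch reports the bias, standard error, and 95% percentile limits for each coefficient. It also shows the correlations among the coefficient replicates and a scatterplot matrix of them.

```r
# Base-R sketch of bootstrapping OLS coefficients by case resampling.
# Assumptions: simulated data; boot_coefs() is a hypothetical helper, not the
# plugin's code; the plugin's own resampling scheme may differ.
set.seed(2007)
n <- 50
dat <- data.frame(Nitrogen = runif(n, 0, 5), Phosphorus = runif(n, 0.2, 0.6))
dat$Chlorophyll.a <- 10 + 4 * dat$Nitrogen + 60 * dat$Phosphorus + rnorm(n, sd = 8)

fit <- lm(Chlorophyll.a ~ Nitrogen + Phosphorus, data = dat)

boot_coefs <- function(formula, data, R = 1000) {
  # Each row of the result holds the coefficients refit to one resample of rows
  t(replicate(R, {
    rows <- sample(nrow(data), replace = TRUE)
    coef(lm(formula, data = data[rows, , drop = FALSE]))
  }))
}

B <- boot_coefs(Chlorophyll.a ~ Nitrogen + Phosphorus, dat, R = 1000)

est  <- coef(fit)
bias <- colMeans(B) - est
se   <- apply(B, 2, sd)
perc <- apply(B, 2, quantile, probs = c(0.025, 0.975))

# Rows: estimate, bias, std. error, then the 2.5% and 97.5% percentile limits
print(round(rbind(estimate = est, bias = bias, std.error = se, perc), 3))

# Rule of thumb: worry about bias only if it exceeds 20% of the standard error
print(abs(bias) > 0.2 * se)

print(round(cor(B), 2))  # correlations between the coefficient estimates
pairs(B)                 # scatterplot matrix of the bootstrap estimates
```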
The Bootstrap Regression Coefs function requires that you have fit an OLS or GLM regression and have that model selected as the active model. It calculates bootstrap confidence intervals for all the regression coefficients. In the dialog box, you only need to specify the confidence level and the number of bootstrap replicates. It takes quite a bit longer to run than the simple bootstrap, so be patient. For each coefficient, you will get text and graphical output just like the previous function. The text output is labeled by coefficient. I wasn't able to get labels to print on the graphs, so you will have to figure out which is which by comparing the locations of the vertical dashed lines in the histograms with the coefficient values in the regression output. In addition, another graph shows scatterplots of the individual bootstrap estimates, allowing you to see the correlations between the coefficient estimates. For example, consider the graph below, from the regression Chlorophyll.a ~ Nitrogen + Phosphorus. Bootstrap replicates with lower-than-average estimates of the Nitrogen coefficient had higher-than-average estimates of the intercept, and vice versa.
[Figure: scatterplot matrix of the bootstrap estimates of (Intercept), Nitrogen, and Phosphorus.]
Permutation Test
The Permutation Test function provides a permutation test for the F statistic of an OLS model. You need to have created the OLS model already. Then you simply specify the number of replicates (random permutations of the data), and the function outputs a P value for the null hypothesis that all the regression coefficients except the intercept are zero. Because I was running short on time, I did not write a separate function to do a permutation test comparing two or more means. But you can actually do this with the current function. Remember that the two-sample t-test is equivalent to an ANOVA with only two levels of the independent variable, and an ANOVA is just an OLS regression with only categorical independent variables. So suppose you want to test the null hypothesis that two groups of observations are samples from the same distribution. Set up your data so that all the observations are in one column and another column is a factor variable that specifies which group each observation comes from. Then fit a linear model with that factor variable as the only independent variable. With R's default treatment contrasts, the coefficient for the factor is the difference between the two sample means. If you now run the permutation test on this model, you will get the same result as a permutation test on the difference of the means. This works because, with only two groups, F is an increasing function of the absolute difference in means, so both statistics rank the permutations identically. If your factor variable has more than two levels, you will be testing the null hypothesis that all the groups are samples from identical populations.
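A permutation test for the overall F statistic shuffles the response values among the rows and refits the model to each shuffled dataset. It then reports the fraction of shuffles whose F statistic is at least as large as the observed one. The minimal base-R sketch below assumes that scheme. perm_F_test() and perm_diff_test() are hypothetical helpers, and the data are simulated. Counting the observed arrangement as one extra permutation is one common convention that the plugin may not follow. With the same random seed, the F-based test and a direct test on the difference of two means should give identical P values, matching the claim above.

```r
# Base-R sketch of a permutation test for the overall F statistic of an OLS model.
# Assumptions: perm_F_test() and perm_diff_test() are hypothetical helpers, not
# the plugin's code; the data are simulated.
perm_F_test <- function(formula, data, n_perm = 999) {
  resp <- all.vars(formula)[1]
  f_stat <- function(d) summary(lm(formula, data = d))$fstatistic[["value"]]
  f_obs <- f_stat(data)
  f_perm <- replicate(n_perm, {
    d <- data
    d[[resp]] <- sample(d[[resp]])  # shuffle the response, breaking any link to predictors
    f_stat(d)
  })
  # Count the observed arrangement as one of the permutations
  (sum(f_perm >= f_obs) + 1) / (n_perm + 1)
}

# Direct permutation test on the absolute difference of two group means
perm_diff_test <- function(y, g, n_perm = 999) {
  lv <- levels(g)
  d_stat <- function(yy) abs(mean(yy[g == lv[2]]) - mean(yy[g == lv[1]]))
  d_obs <- d_stat(y)
  d_perm <- replicate(n_perm, d_stat(sample(y)))
  (sum(d_perm >= d_obs) + 1) / (n_perm + 1)
}

set.seed(1)
two <- data.frame(
  value = c(rnorm(12, mean = 10), rnorm(15, mean = 11)),
  group = factor(rep(c("A", "B"), times = c(12, 15)))
)

# With treatment contrasts, the groupB coefficient is mean(B) - mean(A)
fit <- lm(value ~ group, data = two)
print(coef(fit))

# Same seed, so both tests see the same sequence of shuffles
set.seed(42)
p_F <- perm_F_test(value ~ group, two)
set.seed(42)
p_diff <- perm_diff_test(two$value, two$group)
print(c(F_test = p_F, diff_of_means = p_diff))
```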
 Spring '08
 KENDALL,BERKLEY