Lecture 9: The Bootstrap (Rizzo, Chapter 7), BTRY/STSCI 4520


Generating A Sample Distribution

We have seen how to simulate in order to conduct tests in R. We can do the same thing for other measures of uncertainty. E.g., confidence intervals:

- Simulate according to the model with parameters θ.
- Estimate parameters.
- Look at the distribution of parameter estimates over many simulations.

Note: simulation depends on the parameters available. We can also evaluate bias.

Example: Estimating a Variance

Why we divide by n − 1: consider σ̂² = (1/n) Σ_{i=1}^n (x_i − x̄)², here for n = 4:

nsim = 1000
sigest = rep(0,nsim)
for(i in 1:nsim){
  X = rnorm(4)
  sigest[i] = mean( (X-mean(X))^2 )
}

The expected value of the estimate is estimated by mean(sigest), and the bias (since the true variance is 1) is

> bias = mean(sigest) - 1
[1] -0.2484821

Working With Estimates

Simulation allows us to do a number of things:

1. Bias correction: since we know the bias, we can correct our estimate, σ̂*² = σ̂² − bias, for one real sample σ̂² (sighat = sigest[1]), giving

   > sighat.nobias = sighat - bias
   [1] 1.057343

2. Standard errors for the estimate, from the standard deviation of the simulated samples:

   > sighat.sd = sd(sigest)
   [1] 0.6474583

3. Confidence intervals based on normal theory:

   > sighat.nobias + c(-1,1)*qnorm(0.975)*sighat.sd
   [1] -0.2116517  2.3263383

Alternative Confidence Intervals

The distribution of sigest is strongly skewed, so symmetric confidence intervals are not appropriate.

Defining a lower limit: we want P(σ̂² − b_{α/2} > σ²) = α/2. This is the same as P(σ̂² − σ² > b_{α/2}) = α/2, so choose b_{α/2} to be the 1 − α/2 quantile of σ̂² − σ² from simulation (here sigma is the true variance, 1):

b.lower = quantile(sigest-sigma,0.975)
b.upper = quantile(sigest-sigma,0.025)

The upper limit is analogous. The confidence interval is then

c(sighat - b.lower, sighat - b.upper)

Note: no bias correction (why?)

Confidence Intervals Continued

This CI reverses and shifts the distribution of σ̂².
⇒ σ̂² has a long right tail (it can be much too high), so the lower side of the confidence interval needs to be longer to include the true σ².

Note: the simulation procedure works for any statistic t(X_1, ..., X_n) that estimates a parameter θ.

Making Fewer Assumptions

Some important limitations to the value of simulation:

- It is only valid under the parameters you use to simulate. The bias of σ̂² is −σ²/n, which changes with σ². But we don't know the parameters; we just have data. We can always plug in our estimate, but if that is biased, our estimate of the bias is also biased.
- It is only valid assuming the distribution you simulate from represents the data-generating mechanism. If our data isn't Gaussian, the simulation above is not correct.

Maybe we could make more use of the data.

The Bootstrap

Introduced by Efron (1979); more than 26,000 citations, from all of NSF's funding areas. Arguably the most important statistical development of the last 50 years.

Simple idea: I want to simulate from a distribution F, but I don't know it. However, I do have data ⇒ use the data as an estimate of F!

Empirical Estimates of a Distribution

Cumulative distribution function: F(x) = P(X ≤ x). Estimate from data:

F_n(x) = (1/n) Σ_{i=1}^n I(X_i ≤ x)

which jumps by 1/n at each observation. F_n(x) converges to F(x) everywhere as n → ∞.

Interpretation of F_n(x): for each i, Y drawn from F_n takes value X_i with probability 1/n.

Practically: to sample from F_n, choose one X_i at random. To sample more, re-sample with replacement: each time you choose an X_i, keep it in the data set for the next sample.

Sampling Schemes

Some general terminology (informal):

- sample from F: generate a random variable X according to distribution F (later in 3520).
- resample: from a data set X_1, ..., X_n, choose one at random.
- with replacement: when I choose an X_i, keep it in the data for the next sample.
- without replacement: when I choose an X_i, take it out of the data.
Different types of samples:

- bootstrap resample: resample n observations with replacement.
- subsample: k < n observations without replacement.

Note: bootstrap samples will have repeated values; a subsample of size n is a permutation.

The Bootstrap Recipe

Given X_1, ..., X_n and a statistic t(X_1, ..., X_n) that estimates a parameter θ:

Repeat B times:
- Draw a bootstrap sample X_1*, ..., X_n* by resampling X_1, ..., X_n with replacement.
- Record T_b = t(X_1*, ..., X_n*).

Use T_1, ..., T_B to represent the sampling distribution of T_0 = t(X_1, ..., X_n).

sample

sample will resample objects with or without replacement, and will return a vector of the required size:

# A permutation of the numbers 5:10
sample(5:10)
# A subsample of size 3
sample(5:10,size=3)
# A bootstrap sample
sample(5:10,replace=TRUE)
# A subsample of size 3 with replacement
sample(5:10,size=3,replace=TRUE)

If N is an integer, sample(N) is the same as sample(1:N).

Example

Law data: average LSAT and GPA for 15 law schools. Interest is in the correlation between LSAT and GPA.

cor(law)
obs.cor = cor(law)[1,2]
nboot = 1000
boot.cor = rep(0,nboot)
n = nrow(law)
for(i in 1:nboot){
  boot.cor[i] = cor(law[sample(n,replace=TRUE),])[1,2]
}

Confidence Intervals

There are a number of possible ways to construct confidence intervals. The simplest is the normal approximation.

Compute the estimates T_1, ..., T_B and obtain the standard deviation

se(T) = sqrt( (1/B) Σ_{b=1}^B (T_b − T̄)² )

Compute the confidence interval as

(T_0 − z_{α/2} se(T), T_0 + z_{α/2} se(T))

for z_{α/2} the normal critical value. This assumes that T_0 really is approximately normally distributed, but it does mean that you don't have to know its variance.
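The recipe above is easy to wrap in a small generic function. A minimal sketch: the data (an exponential sample) and the statistic (the median) are illustrative assumptions, not from the slides.

```r
# Bootstrap replicates of a statistic: resample with replacement B times
boot_replicates <- function(x, stat, B = 1000) {
  n <- length(x)
  replicate(B, stat(x[sample(n, replace = TRUE)]))
}

set.seed(1)
x  <- rexp(30)                    # illustrative data (assumption)
To <- median(x)                   # observed statistic T_0
Tb <- boot_replicates(x, median)  # bootstrap replicates T_1, ..., T_B
se.hat <- sd(Tb)                  # bootstrap standard error
# Normal-approximation 95% confidence interval
ci <- To + c(-1, 1) * qnorm(0.975) * se.hat
```

Swapping median for any other function of the sample gives the corresponding bootstrap distribution with no other changes.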
Bias Correction

Some statistics are biased; to assess this we consider the average bootstrap replicate

T̄ = (1/B) Σ_{b=1}^B T_b

Then we can measure the bias in T by

bias = T̄ − T_0

and we can even correct our estimate T_0 by subtracting off the bias:

T_0^c = T_0 − bias = 2 T_0 − T̄

and update the confidence intervals:

(T_0^c − z_{α/2} se(T), T_0^c + z_{α/2} se(T))

Example Continued

# Estimate the bias
> cor.bias = mean(boot.cor)-obs.cor
[1] -0.004983282
# Bias-Corrected Estimate
> obs.cor.c = obs.cor-cor.bias
[1] 0.7813578
# Bootstrap Standard Error
> cor.se = sd(boot.cor)
[1] 0.1340546
# Bootstrap Corrected Confidence Interval
> obs.cor.c + c(-1,1)*qnorm(0.975)*cor.se
[1] 0.5186155 1.0441000

Confidence Intervals II

We can also use the empirical distribution of the bootstrap statistics.

Percentile bootstrap intervals: (T_(α/2), T_(1−α/2)), where T_(α) is the αth quantile of T_1, ..., T_B.

Alternatively, use the same recipe as for the simulation confidence intervals: the lower bound is b_{α/2} such that P(T − b_{α/2} > θ) = α/2. Use the bootstrap sample for the distribution of T, with T_0 in place of θ. This yields b_{α/2} = T_(1−α/2) − T_0 and the confidence interval

(2 T_0 − T_(1−α/2), 2 T_0 − T_(α/2))

The same 'reverse the distribution' effect. Unlike the simulation-based CIs, bias correction is important here. (Why?)

Continuing Example

# Quantiles of Bootstrap Distribution
b0.025 = quantile(boot.cor,0.025)
b0.975 = quantile(boot.cor,0.975)
# Percentile Bootstrap Confidence interval
> c(b0.025,b0.975)
0.4658674 0.9648874
# Standard Bootstrap Confidence Interval
> 2*obs.cor.c - c(b0.975, b0.025)
0.5978281 1.0968481

An upper limit ≥ 1 can be thresholded (remember, this interval is just meant to capture the truth 95% of the time). A bootstrap test of ρ ≤ 0.5 rejects: the null-hypothesis parameter is not within the confidence interval.

Yet Further Intervals

Variants (increasingly elaborate) have been proposed to improve confidence intervals.
Bootstrap t-interval:

(T_0 − t*_(1−α/2) se(T_0), T_0 − t*_(α/2) se(T_0))

- se(T_0): bootstrap estimate of the standard error.
- t*_(α/2): quantiles of the bootstrap t statistic t_b = (T_b − T_0)/se(T_b).
- se(T_b): estimate of the standard error for each T_b; often a bootstrap within a bootstrap.

Bias-corrected and accelerated bootstrap (BCa): corrects for both bias and skewness in the bootstrap distribution.

Basic reasoning: using estimates of standard errors requires a smaller B, and has better statistical properties than quantiles of the bootstrap distribution.

Yet more variants: beyond this course.

When Bootstraps Break

The bootstrap doesn't work for all statistics. Consider

t(X_1, ..., X_n) = min_{i≠j} |X_i − X_j|

the minimum distance between points in the data set. In a bootstrap sample the minimum distance is 0 (tied observations), while real data almost never has ties for continuous variables.

In most cases of failure, t(X_1, ..., X_n) is not a smooth function of X_1, ..., X_n (it cannot be differentiated with respect to the X_i). Subsampling is often a good alternative (but the choice of k < n might be an issue, as it is for this example). Such failures are rare; most cases are pathological (although some recent statistical methods are a problem).

Conditionally-Specified Models

Frequently we only describe part of the data-generating mechanism. E.g., regression models:

y_i = β_0 + β_1 x_i + ε_i = x_i β + ε_i,  with ε_i ~ N(0, σ²).

What about the x_i? Either treated as fixed (often chosen by the experimenter), or, frequently, x_i ~ h(x) with h not specified. For large n (and in practice) very little of the randomness in β̂ is due to the x_i.

Example: Multiple Regression

In the lab, we looked at simple linear regression. For multiple regression,

y_i = β_0 + β_1 x_{i1} + β_2 x_{i2} + ... + ε_i

where the ε_i ~ N(0, σ²).
This can also be written as

y = X β + ε

The squared error is now represented by

SSE(β) = (y − Xβ)ᵀ (y − Xβ)

so that the derivative of the squared error is

dSSE(β)/dβ = 2 (XᵀX β − Xᵀy)

which is zero at

β = (XᵀX)⁻¹ Xᵀy

A Data Set

12 children's height, weight and arterial distance from the wrist to the heart, used as guidelines for inserting catheters. We model distance in terms of height and weight.

> mod = lm(dist~.,data=heart)
> summary(mod)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.50804    5.12461   4.782 0.000998 ***
height       0.05091    0.21060   0.242 0.814396
weight       0.25495    0.10326   2.469 0.035624 *

A Simulation

We'll use the estimated coefficients and residual standard error to simulate at the observed covariates.

# Start from observed coefficients
beta = mod$coef
# Design matrix
X = as.matrix(cbind(rep(1,12),heart[,1:2]))
# Predicted values (also from mod$fit)
pred = X%*%beta
# Residual standard error
sigma = summary(mod)$sigma

Vectorizing A Simulation

Create a large matrix where the response for each simulated data set is in one column. Recalling that y = Xβ + ε, repeat the same Xβ over each column, but create a matrix of simulated ε:

nsim = 1000
Ysim = matrix(pred,12,nsim,byrow=FALSE) +
       matrix(rnorm(12*nsim,sd=sigma),12,nsim)

Now estimate β; solve(A, b) solves the linear system A β = b:

beta.sim = solve( t(X)%*%X, t(X)%*%Ysim )

Because the estimate is linear in Ysim, we obtain all the estimates at once.

Continuing

The simulation results give a lovely demonstration of the effect of correlated covariates.

Bootstrap options:
- Re-sample (x_i, y_i) pairs and do a standard bootstrap.
- Try to re-sample the ε_i, which corresponds to our model.

Residual Bootstrap

Basically restricted to linear regression models y = Xβ + ε:

1. Estimate β̂.
2. Estimate the residuals ε̂_i = y_i − x_i β̂, i = 1, ..., n.
3. Bootstrap the residuals to produce ε̂*_i, i = 1, ..., n.
4. Add the bootstrapped residuals back onto the predictions: y*_i = x_i β̂ + ε̂*_i, i = 1, ..., n.
5. Estimate β̂_b from the bootstrapped (x_i, y*_i), for b = 1, ..., B.

Now all the bias, standard error and confidence interval statistics can be calculated with the same recipe.

Why the residual bootstrap? It is more stable, avoids ties in the x_i, and doesn't change a fixed design.

Continuing The Example

# Estimate errors
eps.hat = heart$dist - pred
# Now bootstrap residuals
eps.boot = eps.hat[sample(12,size=nsim*12,replace=TRUE)]
eps.boot = matrix(eps.boot,12,nsim)
# Create data
Y.boot = matrix(pred,12,nsim,byrow=FALSE) + eps.boot
# And re-estimate
beta.boot = solve( t(X)%*%X, t(X)%*%Y.boot )

Usual Statistics

Calculate the same statistics as before (bias = bootstrap mean minus estimate, as above):

# Bias
> biases = apply(beta.boot,1,mean) - beta
 (Intercept)       height       weight
-0.223436803  0.007668818 -0.002312965
# Standard Error
> se.boot = apply(beta.boot,1,sd)
(Intercept)      height      weight
 4.63035973  0.18945029  0.09076793

The biases are probably not real, but

> beta.c = beta - biases
(Intercept)      height      weight
24.73147694  0.04324185  0.25726034

Confidence Intervals

# Lower and Upper Bounds
> lb = apply(beta.boot,1,quantile,0.025)
> ub = apply(beta.boot,1,quantile,0.975)
                     lb         ub
(Intercept) 14.09073439 33.0530314
height      -0.30957338  0.4678601
weight       0.06940625  0.4273275
# Confidence Interval
> cbind(2*beta - ub, 2*beta - lb)
                   [,1]       [,2]
(Intercept) 15.96304893 34.9253459
height      -0.36603873  0.4113947
weight       0.08256726  0.4404885

Bootstrap Tests of Significance

We can also test by counting how often the bootstrap falls below 0:

> pvals = apply(beta.boot<0,1,mean)
> pvals
(Intercept)      height      weight
      0.000       0.387       0.004

But note that this does not provide a global test for the significance of the regression.
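The other bootstrap option mentioned earlier, re-sampling (x_i, y_i) pairs, resamples whole rows of the data. A minimal sketch; since the heart data isn't reproduced here, the data below are simulated purely for illustration:

```r
set.seed(42)
n <- 12; nboot <- 500
x <- rnorm(n)
y <- 24 + 0.25 * x + rnorm(n)       # assumed toy model, not the heart data
dat <- data.frame(x = x, y = y)

# Pairs (case) bootstrap: resample whole (x_i, y_i) rows with replacement
beta.pairs <- matrix(0, nboot, 2)
for (b in 1:nboot) {
  idx <- sample(n, replace = TRUE)
  beta.pairs[b, ] <- coef(lm(y ~ x, data = dat[idx, ]))
}
se.pairs <- apply(beta.pairs, 2, sd)  # bootstrap standard errors
```

Unlike the residual bootstrap, this changes the design matrix in every replicate, which is one reason the slides describe the residual version as more stable for a fixed design.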
More information: the correlation of β̂.

> cor(t(beta.boot))
            (Intercept)     height     weight
(Intercept)   1.0000000 -0.9074095  0.6355241
height       -0.9074095  1.0000000 -0.8834964
weight        0.6355241 -0.8834964  1.0000000

Parametric Bootstrap

The residual bootstrap is not always applicable. E.g., logistic regression: y_i ∈ {0, 1} with

log( P(y_i = 1) / P(y_i = 0) ) = x_i β

It doesn't make sense to look at residuals. Instead, estimate β̂ and create a new data set by generating each y*_i according to

P(y*_i = 1) = e^{x_i β̂} / (1 + e^{x_i β̂})

Isn't this just "estimate parameters and then simulate data from the model"? Yes! But naming is part of good salesmanship.

Example

The linear regression simulation above is exactly a parametric bootstrap for linear regression.

Summary

Simulation (the parametric bootstrap) is a tool for evaluating confidence intervals for estimated parameters. The bootstrap avoids having to know the distribution of the data. It is useful for:

- Bias correction
- Estimating standard errors
- Confidence intervals

Use the residual bootstrap for linear regression models. But:

- The justification is asymptotic: it requires enough data that the empirical distribution approximates the truth.
- It won't work for every problem or every statistic (but most standard statistics are OK).
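The logistic-regression parametric bootstrap described above can be sketched end to end; the sample size, covariate distribution and coefficients here are illustrative assumptions:

```r
set.seed(7)
n <- 100
x <- rnorm(n)
p <- 1 / (1 + exp(-(0.5 + 1.2 * x)))  # assumed 'true' model for illustration
y <- rbinom(n, 1, p)

fit <- glm(y ~ x, family = binomial)  # estimate beta-hat
eta <- predict(fit, type = "link")    # fitted linear predictor x_i beta-hat

# Parametric bootstrap: simulate y* from the fitted model, re-estimate
nboot <- 200
beta.boot <- matrix(0, nboot, 2)
for (b in 1:nboot) {
  ystar <- rbinom(n, 1, 1 / (1 + exp(-eta)))
  beta.boot[b, ] <- coef(glm(ystar ~ x, family = binomial))
}
se.boot <- apply(beta.boot, 2, sd)    # parametric-bootstrap standard errors
```

This is exactly "estimate parameters, then simulate data from the model": each replicate generates a fresh response vector from the fitted probabilities and refits the model.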