Lecture 16 Resampling Procedures


Resampling Procedures
Lecture 16

Today
  Basis of Resampling Approach
  Randomization Test
  Other resampling procedures

Statistical Inference
  Basic question: Are these two things related? Or alternatively, are these means different?
  In NHST we make a decision based on probability (p-values)
  What is the probability of getting this result if the data arose by chance alone?

Correlation Example (traditional approach)
  X: 5 3 7 9 1
  Y: 4 6 8 7 5
  r = .60
  Question: Is this correlation statistically significant?
  t-test of r vs. 0:
  t = (.60 - 0) / SE_r
  SE_r = sqrt[ (1 - r^2) / (N - 2) ] = .46188
  t = (.60 - 0) / .462 = 1.299
  t(3) = 1.30, p = .142378 (one-tailed)

Correlation Example (randomization approach)
  X: 5 3 7 9 1
  Y: 4 6 8 7 5
  r = .60
  Question: Is this correlation statistically significant?
  What is the probability of getting this by chance?
  What if the X's were paired with the Y's randomly?

Correlation Example (randomization approach)
  X (original): 5 3 7 9 1    r = .60
  Y (original): 4 6 8 7 5
  X* (random):  3 5 9 1 7    r = .30
  Y (original): 4 6 8 7 5

Correlation Example (randomization approach)
  X (original): 5 3 7 9 1    r = .60
  Y (original): 4 6 8 7 5
  X* (random):  5 7 3 9 1    r = .20
  Y (original): 4 6 8 7 5

Recording Random Results
  Original (observed)   r = .60
  Random 1              r = .30
  Random 2              r = .20
  Random ??             r = .??

How many random orders?
  How many ways can X be re-ordered?
  Consider 3 values (A, B, C):
  ABC  BAC  BCA  CBA  CAB  ACB
  So 6 possible orders for 3 values (subjects)
  How many for 5 subjects?

Permutations (factorials)
  X! = X * (X-1) * (X-2) * ... * 1
  3! = 3 * 2 * 1 = 6
  4! = 4 * 3 * 2 * 1 = 24
  5! = 5 * 4 * 3 * 2 * 1 = 120
  6! = 720

Correlation Example (cont)
  So we could compute the correlation for all 120 possible orders of X and record the numbers
  This set of 120 correlations IS the EMPIRICAL sampling distribution for this data
  We can compare our observed correlation to this empirical distribution to get a p-value
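The full enumeration described above is small enough to sketch directly. The following is a minimal illustration, not the lecture's own code: it recomputes r and the t statistic for the five (X, Y) pairs, then correlates every one of the 5! = 120 orders of X with the original Y. The helper name `pearson_r` and the variable names are assumptions made for this example.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [5, 3, 7, 9, 1]
y = [4, 6, 8, 7, 5]

r_obs = pearson_r(x, y)

# Traditional approach: t statistic for r vs. 0 with N - 2 degrees of freedom.
n = len(x)
se_r = math.sqrt((1 - r_obs ** 2) / (n - 2))
t_stat = (r_obs - 0) / se_r

# Permutation approach: pair every possible order of X with the original Y.
perm_rs = [pearson_r(order, y) for order in itertools.permutations(x)]

# One-tailed p-value: share of orders whose r is at least the observed r.
# The tiny tolerance guards against floating-point rounding in the comparison.
count_extreme = sum(r >= r_obs - 1e-12 for r in perm_rs)
p_exact = count_extreme / len(perm_rs)

print(f"r = {r_obs:.2f}, SE_r = {se_r:.5f}, t({n - 2}) = {t_stat:.3f}")
print(f"{count_extreme} of {len(perm_rs)} orders reach r >= {r_obs:.2f}; p = {p_exact:.3f}")
```

For these five pairs the enumeration should report 21 of the 120 orders at or above the observed r, an exact one-tailed p of .175. Turning the t statistic into a p-value needs the t distribution, which the Python standard library does not provide, so that step is left out of the sketch.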
But 120 is a lot of correlations to compute
  Could use a computer
  But what if we have 30 instead of 5 data points?
  30! = 2.65253 x 10^32 orders
  Take a (large) random sample of the possible orders instead (maybe 1000+)

10 Random Sample Results
  Sample   1    2    3    4    5    6    7    8    9    10
  r       .30  .50 -.60 -.30 -.50  .30 -.20  .60 -.60  .60
  Two of the ten random samples reached our observed value (or greater)

So...
  2 out of 10 randomly paired sets of Xs and Ys produced a correlation of .60 (our observed correlation) or greater
  Thus, the probability of getting a result at least as large as ours (r = .60) by chance is approximately 2/10, or .20, or 20%
  p = .20

Need a large set of randomizations
  Remember there were 120 possible orders
  We only calculated 10 of the 120
  Sampling error
  But with a larger sample our sampling error goes down
  With the help of a computer we can sample many, many more...

For 1000 trials
  First, notice the distribution looked fairly normal (but not perfectly)
  In this case, 181 out of 1000 random orders produced a correlation of r = .60 or greater, thus there is a .181 probability of getting our result by chance alone.
  p = .181

Discrepancy
  So the t for r vs. 0 formula gave us p = .142
  And the randomization test gave us p = .181
  Which is more accurate?

Theoretical vs. Empirical Distributions
  A theoretical distribution (e.g. t, F, Z) will be accurate to the degree to which the underlying distribution is actually normal
  Otherwise, the empirical distribution will be better...
  In this case the randomization test is probably slightly more accurate (the empirical distribution isn't quite normal).
  As sample size (of the actual data set) gets large, these will match even more closely.
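The random-sampling version of the test can be sketched in a few lines. This is a hedged illustration under stated assumptions rather than the procedure used to produce the slide's p = .181: the function name `randomization_p` and its `trials` and `seed` parameters are invented for this example, and the exact proportion will vary with the random seed.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def randomization_p(x, y, trials=1000, seed=0):
    """One-tailed randomization p-value from `trials` random orders of X.

    `trials` and `seed` are hypothetical parameter names for this sketch.
    """
    rng = random.Random(seed)  # private generator so results are repeatable
    r_obs = pearson_r(x, y)
    shuffled = list(x)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # re-pair the Xs with the Ys at random
        # Tolerance guards against floating-point rounding in the comparison.
        if pearson_r(shuffled, y) >= r_obs - 1e-12:
            hits += 1
    return hits / trials


x = [5, 3, 7, 9, 1]
y = [4, 6, 8, 7, 5]

# Sampling error shrinks as the number of random orders grows.
for trials in (10, 1000, 10000):
    print(trials, randomization_p(x, y, trials=trials))
```

With only 10 trials the estimate can land far from the value obtained with thousands of trials, which is the sampling-error point made above.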
Other Resampling Approaches
  Just demonstrated the Randomization Test here...
  Permutation Test: every possible order is used once and only once
  Bootstrap: resample the Xs with replacement
  Jackknife: without replacement, but using a sub-sample of the data rather than a full sample

Correlation Example (bootstrap approach)
  X (original):   5 3 7 9 1    r = .60
  Y (original):   4 6 8 7 5
  X* (bootstrap): 5 5 1 7 7    r = -.52
  Y (original):   4 6 8 7 5

When can we use resampling techniques?
  Resampling can be used in place of any statistical inference procedure
  t-tests, F-tests, chi-square, etc.

Advantages of Resampling
  Has far fewer assumptions (non-parametric test): no normality assumption, although the observations must still be exchangeable under the null hypothesis
  Other tests have more assumptions (e.g. normality)
  Can be used for any statistical inference
  Makes p-values easier to understand

Disadvantages of Resampling
  Not taught at many places
  Thus many people don't get it
  Requires the use of a computer
...
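The bootstrap and jackknife variants listed under "Other Resampling Approaches" can be sketched the same way. This is an illustration under assumptions, not the lecture's implementation. The bootstrap follows the slide's description (X values drawn with replacement while Y stays fixed). The jackknife is written as the common leave-one-out form, which is one reading of "a sub-sample of the data". The names `bootstrap_rs`, `jackknife_rs`, `draws`, and `seed` are invented for this example.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns NaN when either sequence has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


x = [5, 3, 7, 9, 1]
y = [4, 6, 8, 7, 5]

# The bootstrap sample shown on the slide: X* = 5 5 1 7 7 against the original Y.
r_boot_example = pearson_r([5, 5, 1, 7, 7], y)
print(f"slide's bootstrap example: r = {r_boot_example:.2f}")


def bootstrap_rs(x, y, draws=1000, seed=0):
    """Correlations from X resampled WITH replacement (hypothetical helper)."""
    rng = random.Random(seed)
    rs = []
    for _ in range(draws):
        x_star = rng.choices(x, k=len(x))  # values may repeat or be left out
        r = pearson_r(x_star, y)
        if not math.isnan(r):  # skip draws where every X* is the same value
            rs.append(r)
    return rs


def jackknife_rs(x, y):
    """Leave-one-out correlations: drop each (X, Y) pair in turn (assumed form)."""
    return [
        pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


boot = bootstrap_rs(x, y, draws=1000, seed=1)
print(f"{len(boot)} usable bootstrap correlations")
print("jackknife correlations:", [round(r, 2) for r in jackknife_rs(x, y)])
```

A bootstrap draw can pick the same X value five times, which leaves the correlation undefined, so the sketch drops those draws instead of dividing by zero.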