Resampling Procedures
Lecture 16

Today
  Basis of the resampling approach
  Randomization test
  Other resampling procedures

Statistical Inference
  Basic question: Are these two things related?
  Or, alternatively, are these means different?
  In NHST we make a decision based on probability (p-values).
  What is the probability of getting this result if the data came about by chance alone?

Correlation Example
(traditional approach)
X   5  3  7  9  1
Y   4  6  8  7  5
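For the five X, Y pairs above, the correlation and its t-test against zero can be checked with a short sketch. The function names (`pearson_r`, `t_test_r`, `one_tailed_p_df3`) are illustrative and not from the lecture. The p-value helper is an assumption-laden shortcut: it uses the closed-form t distribution that holds only for 3 degrees of freedom, so it is not a general routine.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_test_r(r, n):
    """Return (t, SE_r) for testing r against 0, with df = n - 2."""
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se, se

def one_tailed_p_df3(t):
    """Upper-tail p-value of Student's t, valid ONLY for df = 3 (assumption)."""
    u = t / math.sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 1 - cdf

x = [5, 3, 7, 9, 1]
y = [4, 6, 8, 7, 5]
r = pearson_r(x, y)
t, se = t_test_r(r, len(x))
p = one_tailed_p_df3(t)
print(f"r = {r:.2f}, SE_r = {se:.5f}, t = {t:.3f}, p = {p:.6f}")
```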
r = .60

Question: Is this correlation statistically significant?

t-test of r vs. 0:
  t = (.60 - 0) / SE_r
  SE_r = sqrt[(1 - r^2) / (N - 2)] = .46188
  t = (.60 - 0) / .462 = 1.299
  t(3) = 1.30, p = .142378 (one-tailed)

Correlation Example
(randomization approach)
X   5  3  7  9  1
Y   4  6  8  7  5
r = .60

Question: Is this correlation statistically significant?
  What is the probability of getting this by chance?
  What if the X's were paired with the Y's randomly?

Correlation Example
(randomization approach)
X (original)   5  3  7  9  1
Y (original)   4  6  8  7  5      r = .60

X* (random)    3  5  9  7  7
Y (original)   4  6  8  7  5      r = .30

Correlation Example
(randomization approach)
X (original)   5  3  7  9  1
Y (original)   4  6  8  7  5      r = .60

X* (random)    5  7  3  9  1
Y (original)   4  6  8  7  5      r = .20

Recording Random Results
Order                  r
Original (observed)    .60
Random 1               .30
Random 2               .20
Random ??              .??

How many random orders?
  How many ways can X be reordered?
  Consider 3 values (A, B, C):
  ABC  BAC  BCA  CBA  CAB  ACB
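The number of possible orders can be checked by enumeration. This sketch uses Python's standard `itertools.permutations` and `math.factorial`; the variable names are illustrative.

```python
from itertools import permutations
from math import factorial

# Every distinct order of the three values A, B, C.
orders = ["".join(p) for p in permutations("ABC")]
print(orders)
print(len(orders))

# The number of orders of n distinct values is n factorial.
for n in (3, 4, 5, 6):
    print(n, factorial(n))
```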
So there are 6 possible orders for 3 values (subjects).
How many for 5 subjects?

Permutations (factorials)
  X! = X * (X - 1) * (X - 2) * ... * 1
  3! = 3 * 2 * 1 = 6
  4! = 4 * 3 * 2 * 1 = 24
  5! = 5 * 4 * 3 * 2 * 1 = 120
  6! = 720

Correlation Example (cont.)
  So we could compute the correlation for all 120 possible orders of X and record the results.
  This set of 120 correlations IS the EMPIRICAL sampling distribution for these data.
  We can compare our observed correlation to this empirical distribution to get a p-value.
  But... 120 is a lot of correlations to compute.
  We could use a computer.
  But what if we have 30 data points instead of 5?
  30! = 2.65253 x 10^32 orders
  Take a (large) random sample of the possible orders instead (maybe 1000+).

10 Random Sample Results
Sample    1    2    3    4    5    6    7    8    9    10
r        .30  .50  .60  .30  .50  .30  .20  .60  .60  .60
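Drawing a random sample of the possible orders, as described above, can be sketched as follows. This is a minimal illustration and not the lecture's own program: the function names, the default of 1000 orders, and the fixed seed are assumptions, and the estimated p-value changes from run to run when the seed changes.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def randomization_p(x, y, n_orders=1000, seed=0):
    """Estimate a one-tailed p-value by randomly re-pairing X with Y."""
    rng = random.Random(seed)          # seed is an illustrative assumption
    r_obs = pearson_r(x, y)
    shuffled = list(x)
    hits = 0
    for _ in range(n_orders):
        rng.shuffle(shuffled)          # one random order of X, without replacement
        # Small tolerance so orders that tie the observed r are counted.
        if pearson_r(shuffled, y) >= r_obs - 1e-12:
            hits += 1
    return hits / n_orders

x = [5, 3, 7, 9, 1]
y = [4, 6, 8, 7, 5]
print(randomization_p(x, y, n_orders=1000))
```

Raising `n_orders` shrinks the sampling error of the estimate, at the cost of more computation.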
Two of the ten random samples reached our observed value (or greater).
So... 2 out of 10 randomly paired sets of Xs and Ys produced a correlation of .60 (our observed correlation) or greater.
Thus, the probability of getting the result we got (r = .60) is approximately 2/10, or .20, or 20%.
p = .20

Need a large set of randomizations
  Remember there were 120 possible orders.
  We only calculated 10 of the 120.
  Sampling error
  But with a larger sample our sampling error goes down.
  With the help of a computer we can sample many, many more...

For 1000 trials
  First, notice the distribution looked fairly normal (but not perfectly).
  In this case, 181 out of 1000 random orders produced a correlation of r = .60 or greater; thus there is a .181 probability of getting our result by chance alone.
  p = .181

Discrepancy
  So the t for r vs. 0 formula gave us p = .142
  And the randomization test gave us p = .181
  Which is more accurate?

Theoretical vs. Empirical Distributions
  A theoretical distribution (e.g., t, F, Z) will be accurate to the degree to which the underlying distribution is actually normal.
  Otherwise, the empirical distribution will be better...
  In this case the randomization test is probably slightly more accurate (the empirical distribution isn't quite normal).
  As the sample size (of the actual data set) gets large, these will match even more closely.

Other Resampling Approaches
  Just demonstrated the Randomization Test here...
  Permutation Test: Every possible order is used once and only once
  Bootstrap: Reorder the Xs with replacement
  Jackknife: Without replacement, but using a subsample of the data rather than a full sample

Correlation Example
(bootstrap approach)
X (original)     5  3  7  9  1
Y (original)     4  6  8  7  5      r = .60

X* (bootstrap)   5  5  1  7  7
Y (original)     4  6  8  7  5      r = .52

When can we use resampling techniques?
  Resampling can be used in place of any statistical inference procedure: t-tests, F-tests, chi-square, etc.

Advantages of Resampling
  Makes no distributional assumptions (nonparametric test)
    Other tests have assumptions (e.g., normality, independence, etc.)
  Can be used for any statistical inference
  Makes p-values easier to understand

Disadvantages of Resampling
  Not taught at many places
    Thus many people don't get it
  Requires the use of a computer
...
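Two of the other resampling approaches named in this lecture, the permutation test (every order used once and only once) and the bootstrap (Xs drawn with replacement), can be sketched for the same five pairs. The function names, seed, and number of bootstrap draws are illustrative assumptions. The bootstrap here follows the slide's description of reordering the Xs with replacement, and it skips the rare draws in which every X* is identical, because r is undefined when X* has no variance.

```python
import math
import random
from itertools import permutations

TOL = 1e-12  # tolerance so results that tie the observed r are counted

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p(x, y):
    """Exact one-tailed p-value: every order of X is used once and only once."""
    r_obs = pearson_r(x, y)
    rs = [pearson_r(order, y) for order in permutations(x)]
    return sum(r >= r_obs - TOL for r in rs) / len(rs)

def bootstrap_p(x, y, n_draws=1000, seed=0):
    """One-tailed p-value with X* drawn from X WITH replacement."""
    rng = random.Random(seed)              # seed is an illustrative assumption
    r_obs = pearson_r(x, y)
    hits = used = 0
    while used < n_draws:
        x_star = rng.choices(x, k=len(x))  # sampling with replacement
        if len(set(x_star)) == 1:          # zero variance: r undefined, skip draw
            continue
        used += 1
        if pearson_r(x_star, y) >= r_obs - TOL:
            hits += 1
    return hits / used

x = [5, 3, 7, 9, 1]
y = [4, 6, 8, 7, 5]
print(permutation_p(x, y))  # proportion of the 120 orders with r >= observed r
print(bootstrap_p(x, y))
```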
 Spring '10
 Ryne
 Psychology, Statistics, 20%, Randomization test, Correlation Example
