511_Chap6 - Linear Combinations and Linear Multiple...


Linear Combinations and Multiple Comparisons of Means
(more ANOVA stuff)
Chapter 6

Outline
– Case Studies
– Inferences About Linear Combinations of Group Means
– Simultaneous Inferences
– Some Multiple Comparison Procedures
– Some Related Issues (Data Snooping)

Case Study 1 - Discrimination
randomized experiment
5 handicap conditions
– 1) Amputee
– 2) Crutches
– 3) Hearing
– 4) None
– 5) Wheelchair
[Figure: SCORE (2 to 9) by HANDICAP (Amputee, Crutches, Hearing, None, Wheelchair)]

Research questions
1. A general question: do subjects systematically evaluate qualifications differently depending on the candidate's handicap?
2. Follow-up: if so, which handicaps produce different evaluations from which others?
   – Note that this question involves pairwise differences (like the diet experiment), but not specific pairwise differences.
Specific comparison of interest: wheelchair and crutches versus amputee and hearing impaired.

Case Study 2 – mate preferences of fish
complicated randomized experiment
groups are male pairs of various lengths
questions of interest:
– Does the percent of time spent with yellow-sword males differ among male pairs?
– Does the percent of time spent with yellow-sword males differ depending on the length of the males?
– Does the percent of time spent with yellow-sword males (averaged over male pairs) exceed 50%?
Case Study 2 – mate preferences of fish
[Figure: PROPORTION (0 to 1) versus LENGTH of male pair (28, 31, 33, 34, 35)]

Preview
In chapter 5 we
– discussed pairwise tests and confidence intervals for comparing any 2 of the I means, based on the pooled estimate of the standard deviation, $s_p$, and the pooled degrees of freedom
– mentioned the multiple comparison problem
– developed the extra-sum-of-squares F-test for answering more general questions about the means (e.g. are there any differences among the I means?)
In chapter 6 we
– generalize the concept of pairwise comparisons of means (a comparison of 2 means is a special case of investigating a linear combination of means)
– suggest some solutions for the multiple comparison problem

Linear Combinations
Definition:
\[ \gamma = C_1\mu_1 + C_2\mu_2 + \dots + C_I\mu_I \]
The $C_i$'s are chosen to answer specific research questions.
The linear combination is like a mathematical expression of the research question.

A simple and familiar linear combination
I want to compare the mean for crutches with the mean for wheelchair: a pairwise difference, just like in the diet example.
A set of C's that will do this: $C_1 = C_3 = C_4 = 0$, $C_2 = -1$, $C_5 = 1$
This means we are examining $\mu_5 - \mu_2$.

A linear combination for the specific research question in case study 1
We want to compare the average of the wheelchair and crutches means to the average of the hearing and amputee means.
Use $C_1 = -1/2$, $C_2 = 1/2$, $C_3 = -1/2$, $C_4 = 0$, $C_5 = 1/2$
(1=amp, 2=crut, 3=hear, 4=none, 5=wheel)
Note: if the sum of the C's equals zero (as it does here), the combination is called a 'contrast'.

Inference for Linear Combinations
Parameter:
\[ \gamma = C_1\mu_1 + C_2\mu_2 + \dots + C_I\mu_I \]
Estimate:
\[ g = C_1\bar{Y}_1 + C_2\bar{Y}_2 + \dots + C_I\bar{Y}_I \]
Standard Deviation:
\[ SD(g) = \sigma\sqrt{\frac{C_1^2}{n_1} + \frac{C_2^2}{n_2} + \dots + \frac{C_I^2}{n_I}} \]
Standard Error:
\[ SE(g) = s_p\sqrt{\frac{C_1^2}{n_1} + \frac{C_2^2}{n_2} + \dots + \frac{C_I^2}{n_I}} \]
t-ratio:
\[ t = \frac{g - \gamma}{SE(g)}, \qquad \text{d.f.} = n - I \]
confidence interval:
\[ g \pm t_{n-I}(1-\alpha/2)\, SE(g) \]

Example – specific comparison for case study 1
First perform the usual ANOVA in JMP.
From this analysis find:
– The means for the groups (4.43, 5.92, 4.05, 4.90, 5.34)
– The pooled standard deviation = 1.633
– The pooled degrees of freedom = degrees of freedom for error = n - I = 70 - 5 = 65

Hand Calculate g and SE(g)
Remember $C_1 = -1/2$, $C_2 = 1/2$, $C_3 = -1/2$, $C_4 = 0$, $C_5 = 1/2$
\[ g = (-\tfrac{1}{2})(4.43) + (\tfrac{1}{2})(5.92) + (-\tfrac{1}{2})(4.05) + (0)(4.90) + (\tfrac{1}{2})(5.34) = 1.393 \]
\[ SE(g) = 1.633\sqrt{\frac{(-1/2)^2}{14} + \frac{(1/2)^2}{14} + \frac{(-1/2)^2}{14} + \frac{(0)^2}{14} + \frac{(1/2)^2}{14}} = 0.436 \]

Inference: testing $\gamma = 0$
\[ t = \frac{g - 0}{SE(g)} = \frac{1.393 - 0}{0.436} = 3.19 \]
Is this statistically significant?
Compare to a t distribution with n - I = 65 degrees of freedom.
p-value = .0022
The difference between the averages of the specific means is real.

Inference: confidence interval for $\gamma$
\[ g \pm t_{n-I}(1-\alpha/2)\, SE(g) = 1.393 \pm 1.997(0.436) = (0.522,\ 2.264) \]

Other examples – diet study
In the diet case study, estimate the rate of increase in average lifetime per calorie reduction
– can be expressed as $(\mu_{N/R50} - \mu_{N/N85})/35$
– $C_{N/R50} = 1/35$, $C_{N/N85} = -1/35$, all other C's = 0

Other examples – fish study
linear trend in mate preference with regard to length of male pair (i.e.
the change in mate preference per mm increase in the length of the males)
If $X_i$ = length of the $i$th pair, and $\bar{X}$ is the average of the lengths of all the pairs, $C_i = X_i - \bar{X}$
fish example: $C_{28} = -4.5$, $C_{31} = -1.5$, $C_{33} = 0.5$, $C_{34} = 1.5$, $C_{34} = 1.5$, $C_{35} = 2.5$ (two pairs have length 34, so $\bar{X} = 32.5$)

Other examples – fish study
average mate preference
$C_1 = 1/6$, $C_2 = 1/6$, … , $C_6 = 1/6$
g = (1/6)(56.41) + … + (1/6)(63.34) = 62.38
SE(g) = (15.47) sqrt[(1/6)^2/16 + … + (1/6)^2/14] = 1.72
Test $\gamma = 50$: t = (62.38 - 50)/1.72 = 7.20, df = 78
p-value < .0001
On average, female fish think yellow swords on males are cute.

Simultaneous Inferences
consider the handicap study:
– The test of the specific planned comparison (wheelchair and crutches versus amputee and hearing impaired) is useful, but we have more questions.
– The overall F-test that there are no differences among the group means is only marginally useful. It does not tell us which handicaps produce different evaluations from which other handicaps.
– The investigation of which handicaps produce different evaluations from which other handicaps exemplifies unplanned comparisons. It involves comparing all possible pairs of means.

Simultaneous inferences
3 kinds of comparisons (ordered by how well standard statistical tools apply):
– planned comparisons
– unplanned comparisons
– data snooping (look at the sample means; compare only those that appear to have large differences)

Why make this distinction?
For unplanned and data snooping comparisons, we will make incorrect inferences (e.g. call differences real when they are not) if we don't account for the effect that simply doing so many comparisons has on our inferences.
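The cost of doing many comparisons can be put in numbers. The sketch below is a minimal illustration, assuming the intervals are independent (pairwise comparisons that share a pooled standard deviation are not exactly independent, so treat the figures as rough). The function names are hypothetical and chosen only for this example.

```python
def familywise_confidence(individual_level, k):
    """Chance that all k intervals capture their parameters.

    Assumes the k intervals are independent, which is an idealization.
    """
    return individual_level ** k


def expected_misses(individual_level, k):
    """Expected number of intervals that fail to capture their parameter."""
    return (1 - individual_level) * k


if __name__ == "__main__":
    for k in (1, 5, 10, 100):
        fw = familywise_confidence(0.95, k)
        print(f"k={k:3d}  familywise confidence={fw:.3f}  "
              f"P(at least one miss)={1 - fw:.3f}  "
              f"expected misses={expected_misses(0.95, k):.1f}")
```

With ten independent 95% intervals the chance that every one of them captures its parameter is already below 60%, which is why the individual and familywise levels need to be kept apart.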
The problem with multiple confidence intervals
If we compute 100 independent 95% confidence intervals for a population with a true mean of 0, about how many of these intervals WILL NOT contain 0 (even though they should)?
Approximately five.
When we compute many confidence intervals, some appear to show significant results even when there are none.
So, when we compute many confidence intervals, there are 2 different confidence levels to worry about.

Two Different Confidence Levels
INDIVIDUAL confidence level is the frequency with which a single interval captures the parameter.
– e.g. 95% confidence that the interval includes the true value
FAMILYWISE confidence level is the frequency with which all intervals simultaneously capture their parameters.
– e.g. 95% confidence that all intervals include the true values

Confidence Intervals that control the familywise confidence level
the usual confidence interval procedure:
\[ g \pm t_{n-I}(1-\alpha/2)\, SE(g) \]
it's like: estimate +/- multiplier x SE(estimate)
where the multiplier comes from the t-table
Confidence intervals that control the familywise confidence level are of exactly the same form, but they use different multipliers.

Multipliers for different methods
Procedure       Multiplier                                   Table
LSD             $t_{n-I}(1-\alpha/2)$                        A.2
Tukey-Kramer    $q_{I,\,n-I}(1-\alpha)/\sqrt{2}$             A.5
Scheffé         $\sqrt{(I-1)\,F_{I-1,\,n-I}(1-\alpha)}$      A.4
Bonferroni      $t_{n-I}(1-\alpha/(2k))$                     software
(k is the number of comparisons)

Comments on methods
LSD
– the usual method
– doesn't control familywise confidence level
Tukey-Kramer
– good for unplanned comparisons or data snooping
– only works for comparisons like $\mu_1 - \mu_2$
Scheffé
– good for unplanned comparisons or data snooping, all kinds of comparisons
– price: very conservative
Bonferroni
– not for data snooping (OK for unplanned comparisons)
– works for all kinds of comparisons
– table A.2 doesn't have enough values for this method, so JMP is needed
– excellent for a small number of comparisons; not so good for a large number

Non-Parametric Family-Wise Tests
There are familywise tests associated with the Kruskal-Wallis test.
This is called Dunn's Test.
It is not available in JMP.

[Figure: Comparison of Multipliers, 5 Treatments and 65 Degrees of Freedom. Multiplier (2.0 to 3.0) versus Number of Contrasts (0 to 25) for Scheffé, Bonferroni, Tukey, Newman-Keuls, and LSD]

Data Snooping
DNA example
2,436 mononucleotides along a DNA molecule. (page 166)
11 breaks occur.
6 of these breaks occur within four mononucleotides of the TGG trinucleotide.

Does TGG seem to cause the break?
That depends.
If TGG was hypothesized before the data were observed, then the chance of 6 of the 11 breaks occurring within four units after a TGG is only 0.000243. So it looks significant.
Note that the arbitrary number of four units also needs to be specified before observing the data.
However, TGG was hypothesized after looking at the data and observing the patterns before the break.

Computer Simulation
This pattern of 2,436 mononucleotides was put into a computer simulation.
11 breaks were placed in the sequence at random. This was repeated 1000 times.
The most frequent trinucleotide within four units upstream from the breaks was observed.
How often did the most frequent trinucleotide occur six or more times out of the 11? (Six or more would look highly significant if the multiple comparison problem were ignored.)
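The simulation described here can be sketched roughly as follows. This is only an illustration under loud assumptions: the actual 2,436-base sequence is not reproduced in these slides, so a random stand-in sequence is generated instead; "within four units upstream" is read as "the trinucleotide ends in one of the four positions just before the break"; and the function names are hypothetical.

```python
import random
from collections import Counter


def max_shared_count(sequence, n_breaks=11, window=4, rng=random):
    """Place breaks at random and return the largest number of breaks that
    have the same trinucleotide ending within `window` positions upstream."""
    # Break index b means the break falls just before sequence[b].
    breaks = rng.sample(range(window + 2, len(sequence)), n_breaks)
    counts = Counter()
    for b in breaks:
        # Distinct trinucleotides ending at positions b-window .. b-1.
        nearby = {sequence[e - 2:e + 1] for e in range(b - window, b)}
        counts.update(nearby)  # each trinucleotide counted once per break
    return max(counts.values())


def snooping_rate(sequence, n_sims=1000, threshold=6, seed=0):
    """Fraction of simulated break patterns in which the most frequent
    trinucleotide turns up near at least `threshold` of the breaks."""
    rng = random.Random(seed)
    exceed = sum(max_shared_count(sequence, rng=rng) >= threshold
                 for _ in range(n_sims))
    return exceed / n_sims


# Stand-in data: a uniformly random sequence, NOT the real DNA sequence.
_seq_rng = random.Random(1)
stand_in = "".join(_seq_rng.choice("ACGT") for _ in range(2436))
rate = snooping_rate(stand_in)

if __name__ == "__main__":
    print(f"fraction of simulations with a count of 6 or more: {rate:.3f}")
```

Because the stand-in sequence is random, the printed rate should not be expected to match the figure reported for the real sequence; the point is only the logic of letting the "most frequent trinucleotide" be chosen after the breaks are seen.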
Over 300 out of 1000, so the presence of six TGG's upstream from the breaks is not statistically significant.
Data snooping went on here, and data snooping is OK if we do not quote p-values.
Do you have any examples of data snooping from your field?

The End