slides 4-chapter 11part1

# slides 4-chapter 11part1 - 3 Chapter 11 — Part One The...


## Chapter 11, Part One: The Analysis of Variance (ANOVA)

## Outline

Sections 11.2 to 11.5 will cover:

- What is ANOVA?
- One-Factor ANOVA: Completely Randomized Design

Tables used: Table 6

## What is ANOVA?

- The responses obtained from an experiment will always exhibit some amount of variability.
- This variability will be a result of a few, or possibly many, factors. How certain factors affect responses is of particular interest to the experimenter.
- ANOVA partitions the total variation in the experimental responses into individual variability measurements attributable to the factors of interest.

## Completely Randomized Designs: A One-Factor Analysis of Variance

- We are interested in $k$ populations with unknown means $\mu_1, \mu_2, \ldots, \mu_k$. We wish to infer something about the relationship between these $k$ unknown population means. To accomplish this, we take $k$ independent SRSs, one from each population, and make inferences based on the sample data.
- This is known as a completely randomized design (CRD). It is an extension of the independent random sample inference procedures we learned in Chapter 10.

## Completely Randomized Designs: A Note On Random Sampling

There are two possible situations:

1. There is a physical population from which to select our responses. In this case, we simply take $k$ independent SRSs.
2. There is no physical population, and responses can only be obtained after the experimental treatments have been applied. In this situation, we use randomized assignment of subjects to the treatments.

NOTE: If the sample sizes $n_1, n_2, \ldots, n_k$ are all equal, then the data set is referred to as being balanced.

## Completely Randomized Designs: Some Terminology

- The factor is an independent variable whose values are controlled by the experimenter. In a CRD there is only one factor of interest: the population from which our measurements are obtained.
- The values taken on by the factor are the levels.
In a CRD, the levels correspond to the $k$ populations.
- A treatment in an experiment is simply a combination of factor levels. Since we only have one factor in a CRD, we can use the terms levels and treatments interchangeably.

## Completely Randomized Designs: Examples of Applications

- We may wish to study the effects of 3 different brands of gasoline on gas mileage.
- We could compare 6 different types of fertilizers to decide which one produces the greatest average yield.
- We could compare 4 different methods of studying to determine which method is most effective.
- We could compare 18 different mutual fund investment strategies.

## Completely Randomized Designs: Examples of Factors, Levels, and Responses

- Comparing mileage of 3 different brands of gasoline:
  - FACTOR: gasoline brand
  - LEVELS: Shell, Sunoco, Petro-Canada
  - RESPONSE: mileage
- Comparing effectiveness of 4 studying methods:
  - FACTOR: studying method
  - LEVELS: short frequent sessions, infrequent long sessions, one all-nighter, or don't study at all
  - RESPONSE: grade

## Completely Randomized Designs: Model Assumptions

We make the following assumptions for a completely randomized design:

- The observations within each of the $k$ populations are normally distributed.
- All $k$ populations share a common variance $\sigma^2$.

A completely randomized design is based upon the modeling assumption

$$x_{ij} = \mu_i + \varepsilon_{ij}.$$

That is, the $j$-th observation $x_{ij}$ from the $i$-th population consists of the unknown population mean $\mu_i$ plus a random error (noise) term $\varepsilon_{ij}$. These error terms are independently distributed as $\varepsilon_{ij} \sim N(0, \sigma^2)$, and so $x_{ij} \sim N(\mu_i, \sigma^2)$.

## Completely Randomized Designs: The Hypotheses

We wish to perform the following hypothesis test:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$

(all the population means are equal) versus

$$H_A: \mu_i \neq \mu_j \text{ for some } i \text{ and } j$$

(at least two of the population means are not equal).

## A Point Estimate of $\mu_i$

To estimate each population mean $\mu_i$ we simply use the sample mean from the corresponding sample. That is,

$$\hat{\mu}_i = \bar{x}_i = \frac{\sum_{j=1}^{n_i} x_{ij}}{n_i} = \frac{T_i}{n_i}.$$

NOTE: $T_i$ represents the sum of the observations for treatment (level) $i$.

## Exercise 1: Watching Paint Dry

- A study was conducted to determine if the drying time for a certain paint is affected by the type of applicator used.
- The data in the table below represent the drying time (in minutes) for 3 different applicators (brush, roller, and pad) when the paint was applied to standard wallboard.
- The factor is the applicator.
- The 3 levels or treatments are brush, roller, and pad.

## Exercise 1: Sample Data

Applicator (factor):

| brush ($i=1$) | roller ($i=2$) |
|---|---|
| 39.1 | 31.6 |
| 39.4 | 33.4 |
| 31.1 | 30.2 |
| 33.7 | 41.8 |
| 30.5 | |
| 34.6 | |
| $T_1 = 208.4$ | $T_2 = 137$ |
| $\bar{x}_1 = 34.73$ | $\bar{x}_2 = 34.25$ |

## Completely Randomized Designs: Sources of Variation

- Even if all the treatment means are equal, we can still expect variation in the value of the sample means.
- There are two potential sources of variation to consider: (i) variation due to actual differences between the various treatments, and (ii) variation (within treatments) due to error (noise).
- ANOVA answers the question, "Is the difference between the sample means due simply to error (noise) in the measurements or to an actual difference in the treatments themselves?"

## Completely Randomized Designs: Total Sum of Squares

The total sum of squares (Total SS) is a measure of the total variability in the data set:

$$\text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - CM,$$

where

$$CM = \frac{\left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij} \right)^2}{n}$$

and $n = n_1 + n_2 + \cdots + n_k$.

The total sum of squares splits into a treatment sum of squares and an error sum of squares:

$$\text{Total SS} = SST + SSE.$$

## Completely Randomized Designs: Sum of Squares for Treatments

The sum of squares for treatments (SST) is a measure of the variability between the various treatments.
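The quantities above can be sketched in a few lines of Python. This is only an illustration: it uses the brush and roller columns of the Exercise 1 table (the pad column is not shown, so the numbers are not the full exercise), and it assumes the computational form $SST = \sum_i T_i^2 / n_i - CM$, which is what the identity $SSE = \text{Total SS} - SST$ in these slides implies. The function name `sums_of_squares` is a hypothetical helper, not something from the course.

```python
from statistics import variance

# Brush and roller drying times (minutes) from the Exercise 1 table.
# The pad column is not available, so this is NOT the full exercise.
groups = {
    "brush": [39.1, 39.4, 31.1, 33.7, 30.5, 34.6],
    "roller": [31.6, 33.4, 30.2, 41.8],
}


def sums_of_squares(samples):
    """Return (CM, Total SS, SST, SSE) for a list of samples.

    Assumes the computational form SST = sum(T_i^2 / n_i) - CM.
    """
    n = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)
    cm = grand_total ** 2 / n  # correction for the mean
    total_ss = sum(x * x for s in samples for x in s) - cm
    sst = sum(sum(s) ** 2 / len(s) for s in samples) - cm
    sse = total_ss - sst
    return cm, total_ss, sst, sse


# Treatment totals T_i and point estimates x-bar_i = T_i / n_i.
totals = {name: sum(xs) for name, xs in groups.items()}
means = {name: totals[name] / len(xs) for name, xs in groups.items()}

cm, total_ss, sst, sse = sums_of_squares(list(groups.values()))

# Cross-check: SSE should equal (n_1 - 1) s_1^2 + ... + (n_k - 1) s_k^2.
sse_pooled = sum((len(xs) - 1) * variance(xs) for xs in groups.values())

print(totals, means)
print(total_ss, sst, sse, sse_pooled)
```

The cross-check against the sample variances is a cheap way to catch arithmetic slips when the sums of squares are computed by hand.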
## Completely Randomized Designs: Sum of Squares for Error

The sum of squares for error (SSE) is a measure of the variability within the various treatments:

$$SSE = \text{Total SS} - SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2.$$

## Completely Randomized Designs: Mean Squares

- SST is said to have $k - 1$ degrees of freedom. The mean square for treatments is defined to be

$$MST = \frac{SST}{k - 1}.$$

- SSE is said to have $n - k$ degrees of freedom, and the mean square for error is defined to be

$$MSE = \frac{SSE}{n - k}.$$

MSE is a point estimate of our common variance within treatments, $\sigma^2$.

## Completely Randomized Designs: The Meaning of Mean Squares

- It turns out that $E(MSE) = \sigma^2$. That is, MSE is an unbiased estimate of the common variance within treatments.
- If the null hypothesis is true and there are no differences between the population means, then MST is also an unbiased estimate of the variance.
- However, if the null hypothesis is false, then $E(MST) = \sigma^2 +$ "extra variability". It is this "extra variability" that represents any variability among the population means.

## Completely Randomized Designs: The F-Statistic

Consider the F-statistic,

$$F = \frac{MST}{MSE}.$$

- If the null hypothesis is true, this ratio should be close to 1.
- If the null hypothesis is false, this ratio should be greater than 1.
- How much larger than 1 must this ratio get before we can reject the null hypothesis? That is, before the result is statistically significant?

## Completely Randomized Designs: Interpreting the F-Statistic

- To answer this question, we make use of the F-distribution. (See Table 6 in the textbook.) In the table, $df_1 = k - 1$ is the numerator degrees of freedom and $df_2 = n - k$ is the denominator degrees of freedom.
- Our p-value for this test is $P(F_{k-1,\,n-k} > F)$.

## The ANOVA Table

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Treatments | $k - 1$ | SST | MST = SST/($k - 1$) | MST/MSE |
| Error | $n - k$ | SSE | MSE = SSE/($n - k$) | |
| Total | $n - 1$ | Total SS | | |

## Exercise 1: The ANOVA Table

| Source | df | SS | MS | F | p-Value |
|---|---|---|---|---|---|
| Treatments | 2 | 137.95 | | | .01 < p-value < .05 |
| Error | 12 | 173.23 | | | |
| Total | 14 | 311.18 | | | |

Using $\alpha = 0.05$ we would reject the null hypothesis and conclude that at least two of the applicators are different in terms of their mean drying times.

## MINITAB: The ANOVA Table

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Factor | 2 | 137.9 | 69.0 | 4.78 | 0.030 |
| Error | 12 | 173.2 | 14.4 | | |
| Total | 14 | 311.2 | | | |

## Ranking Population Means

- If an ANOVA procedure yields a statistically significant result, we can complete the analysis using a pairwise comparison of the treatment means. In other words, if we reject $H_0$ and accept $H_A$, we are concluding that at least two of the population/treatment means differ. We then wish to see which pairs of means are significantly different from one another.
- Since we assume a common error variance $\sigma^2$ for all $k$ populations, we could construct a $(1 - \alpha)100\%$ CI for each possible pairing using Case 2 (unknown but equal population variances) from Chapter 10.
- In our paint drying example, suppose we wanted to construct a 95% CI for each possible pairing of means; that is, a CI for each of $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, and $\mu_2 - \mu_3$.
- However, if we computed CIs in this way, then we would be 95% confident that each individual interval contains the mean difference of interest.
- How confident would you be that all three CIs simultaneously contained the mean difference they were attempting to capture?

$$0.95 \times 0.95 \times 0.95 = 0.8574$$

- In other words, we would only be 85.74% confident that all three CIs simultaneously contained the difference of interest.

## Tukey's Simultaneous Confidence Intervals

- However, we want to be 95% confident that all three differences are simultaneously contained within the three CIs. To accomplish this, we employ Tukey's simultaneous confidence intervals.
- A family of $(1 - \alpha)100\%$ simultaneous CIs is given by

$$\mu_i - \mu_j \in (\bar{x}_i - \bar{x}_j) \pm q_\alpha(k, df) \sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}.$$

- Values of $q$ are found in Table 11.
(Use $df = n - k$.)

## Interpreting Simultaneous Confidence Intervals

- If an interval contains the value 0, then we say there is no significant difference between those two particular means.
- If an interval does not contain the value 0, then we say there is a significant difference between those two particular means.

## Exercise 1: Simultaneous Confidence Intervals

We obtain the following family of 95% confidence intervals:

$$\mu_1 - \mu_2 \in (-6.06,\ 7.02)$$

$$\mu_1 - \mu_3 \in (0.48,\ 12.75)$$

$$\mu_2 - \mu_3 \in (-0.66,\ 12.92)$$

- We can be 95% confident that these 3 intervals simultaneously contain $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, and $\mu_2 - \mu_3$.
- Are any of the means significantly different?

## Exercise 1: MINITAB

Tukey 95% Simultaneous Confidence Intervals

All Pairwise Comparisons among Levels of Applicator

Individual confidence level = 97.94%

Applicator = 1 subtracted from:

| Applicator | Lower | Center | Upper |
|---|---|---|---|
| 2 | -7.021 | -0.483 | 6.055 |
| 3 | -12.746 | -6.613 | -0.480 |

Applicator = 2 subtracted from:

| Applicator | Lower | Center | Upper |
|---|---|---|---|
| 3 | -12.924 | -6.130 | 0.664 |

...
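The F-test and the Tukey intervals for Exercise 1 can be reproduced approximately from the summary numbers alone. The sketch below rests on several assumptions that are not stated in the slides: the pad sample size $n_3 = 5$ is inferred from $n = 15$, the pad mean is backed out of the MINITAB interval center for applicator 3 minus applicator 2, the critical value $q_{0.05}(3, 12) = 3.77$ is taken as the Table 11 entry, and the p-value uses a closed form that holds only when the numerator degrees of freedom equal 2.

```python
from itertools import combinations
from math import sqrt

# Summary values quoted in the Exercise 1 ANOVA table.
k, n = 3, 15
sst, sse = 137.95, 173.23

mst = sst / (k - 1)   # mean square for treatments
mse = sse / (n - k)   # mean square for error
f_stat = mst / mse

# Special case, valid ONLY for numerator df = 2:
# P(F_{2,m} > f) = (1 + 2 f / m) ** (-m / 2)
m = n - k
p_value = (1 + 2 * f_stat / m) ** (-m / 2)

# Confidence that three separate 95% CIs all cover, as computed in the slides.
naive_joint = 0.95 ** 3

# Sample sizes and means for applicators 1 (brush), 2 (roller), 3 (pad).
# ASSUMPTION: n_3 = 5 is inferred from n = 15, and the pad mean is backed
# out of the MINITAB center for "3 minus 2" (-6.130).
sizes = {1: 6, 2: 4, 3: 5}
means = {1: 208.4 / 6, 2: 137.0 / 4, 3: 137.0 / 4 - 6.130}

Q_CRIT = 3.77  # ASSUMED table value of q_0.05(k = 3, df = 12)


def tukey_interval(i, j):
    """Simultaneous CI for mu_i - mu_j using the formula from the slides."""
    half_width = Q_CRIT * sqrt(mse / 2 * (1 / sizes[i] + 1 / sizes[j]))
    center = means[i] - means[j]
    return center - half_width, center + half_width


intervals = {(i, j): tukey_interval(i, j) for i, j in combinations(sizes, 2)}

print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
for (i, j), (lo, hi) in intervals.items():
    verdict = "no significant difference" if lo < 0 < hi else "significant difference"
    print(f"mu_{i} - mu_{j}: ({lo:.2f}, {hi:.2f}) -> {verdict}")
```

Under these assumptions the results agree with the MINITAB output to within rounding, and only the brush versus pad interval excludes 0.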

## This note was uploaded on 05/26/2010 for the course MATH Math2107 taught by Professor Lanihaque during the Winter '10 term at Carleton.
