Chapter 11, Part One
The Analysis of Variance (ANOVA)

Outline
• Sections 11.2 to 11.5 will cover:
  - What is ANOVA?
  - One-Factor ANOVA: Completely Randomized Design
• Tables used:
  - Table 6

What is ANOVA?
• The responses obtained from an experiment will always exhibit some amount of variability.
• This variability will be a result of a few, or possibly many, factors. How certain factors affect responses is of particular interest to the experimenter.
• ANOVA partitions the total variation in the experimental responses into individual variability measurements attributable to the factors of interest.

Completely Randomized Designs:
A One Factor Analysis of Variance
• We are interested in k populations with unknown means $\mu_1, \mu_2, \ldots, \mu_k$. We wish to infer something about the relationship between these k unknown population means.
• In order to accomplish this, we take k independent SRSs, one from each population, and make inferences based on the sample data.
• This is known as a completely randomized design (CRD). It is an extension of the independent random sample inference procedures we learned in Chapter 10.
A Note On Random Sampling
• There are two possible situations:
  1. There is a physical population from which to select our responses. In this case, we simply take k independent SRSs.
  2. There is no physical population, and responses can only be obtained after the experimental treatments have been applied. In this situation, we use randomized assignment of subjects to the treatments.
• NOTE: If the sample sizes $n_1, n_2, \ldots, n_k$ are all equal, then the data set is referred to as being balanced.
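The randomized assignment described in situation 2 can be sketched in a few lines of Python. This is only an illustration: the subject names, the treatment labels, the seed, and the helper `randomized_assignment` are all hypothetical and do not come from the course notes. Dealing the shuffled subjects out in turn gives a balanced data set whenever the number of subjects is a multiple of k.

```python
import random

def randomized_assignment(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments with group sizes as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    # Deal the shuffled subjects out in turn, so group sizes differ by at most one.
    for position, subject in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(subject)
    return groups

subjects = [f"unit_{i}" for i in range(1, 13)]   # hypothetical experimental units
treatments = ["A", "B", "C"]                     # hypothetical treatment labels (k = 3)
groups = randomized_assignment(subjects, treatments, seed=2107)

for name, members in groups.items():
    print(name, sorted(members))
```

With 12 units and 3 treatments every group receives 4 units, so the resulting design is balanced.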
Some Terminology
• The factor is an independent variable whose values are controlled by the experimenter. In a CRD there is only one factor of interest: the population from which our measurements are obtained.
• The values taken on by the factor are the levels. In a CRD, the levels correspond to the k populations.
• A treatment in an experiment is simply a combination of factor levels. Since we only have one factor in a CRD, we can use the terms levels and treatments interchangeably.

Examples of Applications
• We may wish to study the effects of 3 different brands of gasoline on gas mileage.
• We could compare 6 different types of fertilizers to decide which one produces the greatest average yield.
• We could compare 4 different methods of studying to determine which method is most effective.
• We could compare 18 different mutual fund investment strategies.

Examples of Factors, Levels, and Responses
• Comparing mileage of 3 different brands of gasoline:
  FACTOR: gasoline brand
  LEVELS: Shell, Sunoco, PetroCanada
  RESPONSE: mileage
• Comparing effectiveness of 4 studying methods:
  FACTOR: studying method
  LEVELS: short frequent sessions, infrequent long sessions, one all-nighter, or don't study at all
  RESPONSE: grade

Model Assumptions
• We make the following assumptions for a completely randomized design:
  - The observations within each of the k populations are normally distributed.
  - All k populations share a common variance $\sigma^2$.

Model Assumptions
• A completely randomized design is based upon the modeling assumption $x_{ij} = \mu_i + \varepsilon_{ij}$.
• That is, the observation $x_{ij}$ from the i-th population consists of the unknown population mean $\mu_i$ plus a random error (noise) term $\varepsilon_{ij}$.
• These error terms are independently distributed as $\varepsilon_{ij} \sim N(0, \sigma^2)$, and so $x_{ij} \sim N(\mu_i, \sigma^2)$.

The Hypotheses
• We wish to perform the following hypothesis test:
  $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
  (all the population means are equal)
versus
  $H_A: \mu_i \neq \mu_j$ for some i and j
  (at least two of the population means are not equal)

A Point Estimate of $\mu_i$
• To estimate each population mean $\mu_i$ we simply use the sample mean from the corresponding sample. That is,
  $\hat{\mu}_i = \bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij} = \frac{T_i}{n_i}$
• NOTE: $T_i$ represents the sum of the observations for treatment (level) i.

Exercise 1: Watching Paint Dry
• A study was conducted to determine if the drying time for a certain paint is affected by the type of applicator used.
• The data in the table on the next slide represent the drying time (in minutes) for 3 different applicators (brush, roller, and pad) when the paint was applied to standard wallboard.
• The factor is the applicator.
• The 3 levels or treatments are brush, roller, and pad.

Exercise 1: Sample Data
applicator (factor)
            brush (i=1)   roller (i=2)
            39.1          31.6
            39.4          33.4
            31.1          30.2
            33.7          41.8
            30.5
            34.6
$T_i$       208.4         137
$\bar{x}_i$ 34.73         34.25
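As a quick check of the point estimates, the treatment totals $T_i$ and sample means $\bar{x}_i$ can be recomputed from the listed drying times. The sketch below uses only the brush and roller values shown in the sample data; the dictionary layout and the helper name `treatment_total_and_mean` are illustrative choices, not part of the course material.

```python
# Drying times (minutes) for the two applicators listed in the sample data.
drying_times = {
    "brush": [39.1, 39.4, 31.1, 33.7, 30.5, 34.6],
    "roller": [31.6, 33.4, 30.2, 41.8],
}

def treatment_total_and_mean(observations):
    """Return (T_i, x-bar_i): the treatment total and the sample mean."""
    total = sum(observations)
    return total, total / len(observations)

summary = {name: treatment_total_and_mean(obs) for name, obs in drying_times.items()}

for name, (total, mean) in summary.items():
    print(f"{name:>6}: n = {len(drying_times[name])}, T = {total:.1f}, mean = {mean:.2f}")
```

The totals and means agree with the values reported under the data (208.4 and 34.73 for brush, 137 and 34.25 for roller).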
Sources of Variation
• Even if all the treatment means are equal, we can still expect variation in the value of the sample means.
• There are two potential sources of variation to consider: (i) variation due to actual differences between the various treatments, and (ii) variation (within treatments) due to error (noise).
• ANOVA answers the question, “Is the difference between the sample means due simply to error (noise) in the measurements or to an actual difference in the treatments themselves?”
Total Sum of Squares
• The total sum of squares (Total SS) is a measure of the total variability in the data set.
  $\text{Total SS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - CM$

Total Sum of Squares
• where
  $CM = \frac{1}{n} \left( \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij} \right)^2 = \frac{1}{n} \left( \sum_{i=1}^{k} T_i \right)^2$
  and $n = n_1 + n_2 + \cdots + n_k$.
Total Sum of Squares
Total sum of squares (Total SS) = Treatment sum of squares (SST) + Error sum of squares (SSE)

Sum of Squares for Treatments
• The sum of squares for treatments (SST) is a measure of the variability between the various treatments.
  $SST = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - CM$

Sum of Squares for Error
• The sum of squares for error (SSE) is a measure of the variability within the various treatments.
  $SSE = \text{Total SS} - SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$
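The partition Total SS = SST + SSE, and the identity between SSE and the weighted sum of sample variances, can be checked numerically. The sketch below is a minimal illustration under one stated limitation: it uses only the brush and roller drying times listed in Exercise 1, so it is a two-treatment example and its numbers will not match the three-treatment ANOVA table for the exercise. The function names are illustrative.

```python
def sums_of_squares(groups):
    """Return (total_ss, sst, sse) given one list of observations per treatment."""
    observations = [x for g in groups for x in g]
    n = len(observations)
    cm = sum(observations) ** 2 / n                        # correction for the mean, CM
    total_ss = sum(x * x for x in observations) - cm
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm   # sum of T_i^2 / n_i, minus CM
    sse = total_ss - sst
    return total_ss, sst, sse

def sample_variance(sample):
    """Sample variance s^2 with divisor n_i - 1."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)

brush = [39.1, 39.4, 31.1, 33.7, 30.5, 34.6]
roller = [31.6, 33.4, 30.2, 41.8]
groups = [brush, roller]

total_ss, sst, sse = sums_of_squares(groups)
pooled = sum((len(g) - 1) * sample_variance(g) for g in groups)

print(f"Total SS = {total_ss:.3f}, SST = {sst:.3f}, SSE = {sse:.3f}")
print(f"sum of (n_i - 1) * s_i^2 = {pooled:.3f}")
```

The two printed values for the within-treatment variability should agree up to floating-point rounding, which is the last equality in the SSE formula.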
Mean Squares
• SST is said to have k − 1 degrees of freedom. The mean squares for treatments is defined to be
  $MST = \frac{SST}{k - 1}$.
• SSE is said to have n − k degrees of freedom and the mean squares for error is defined to be
  $MSE = \frac{SSE}{n - k}$.
  MSE is a point estimate of our common variance within treatments, $\sigma^2$.
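The mean squares for the paint-drying exercise can be worked out from summary statistics alone. The sketch below rests on assumptions that should be read carefully: $T_1$, $T_2$, $n_1$ and $n_2$ come from the sample data; $n_3 = 5$ is inferred from the total of n = 15 observations implied by the exercise's ANOVA table; the pad total $T_3 = 140.6$ is inferred from the MINITAB interval centers (a pad mean of about 28.12) rather than read from data; and Total SS = 311.18 is taken from the exercise's ANOVA table.

```python
# Summary statistics for the paint-drying exercise (see the assumptions in the text).
totals = [208.4, 137.0, 140.6]   # T_1, T_2 from the data; T_3 is an inferred assumption
sizes = [6, 4, 5]                # n_1, n_2 from the data; n_3 inferred from n = 15
total_ss = 311.18                # Total SS as reported in the exercise's ANOVA table

k = len(totals)
n = sum(sizes)

cm = sum(totals) ** 2 / n                                  # correction for the mean
sst = sum(t * t / m for t, m in zip(totals, sizes)) - cm   # treatment sum of squares
sse = total_ss - sst                                       # error sum of squares

mst = sst / (k - 1)   # k - 1 = 2 degrees of freedom
mse = sse / (n - k)   # n - k = 12 degrees of freedom

print(f"SST = {sst:.2f}, SSE = {sse:.2f}")
print(f"MST = {mst:.2f}, MSE = {mse:.2f}")
```

Under these assumptions the treatment sum of squares comes out at about 137.95 and the error sum of squares at about 173.23, in line with the figures reported for the exercise.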
The Meaning of Mean Squares
• It turns out that $E(MSE) = \sigma^2$. That is, MSE is an unbiased estimate of the common variance within treatments.
• If the null hypothesis is true and there are no differences between the population means, then MST is also an unbiased estimate of the variance.
• However, if the null hypothesis is false, then $E(MST) = \sigma^2$ + “extra variability”. It is this “extra variability” that represents any variability among the population means.

The F Statistic
• Consider the F statistic, $F = \frac{MST}{MSE}$.
• If the null hypothesis is true, this ratio should be close to 1.
• If the null hypothesis is false, this ratio should be greater than 1.
• How much larger than 1 must this ratio get before we can reject the null hypothesis? That is, before the result is statistically significant?
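One way to get a feel for this question is to simulate the model $x_{ij} = \mu_i + \varepsilon_{ij}$ many times and look at how the F ratio behaves. The sketch below is an illustration only: the group sizes, population means, $\sigma$, seeds, number of repetitions, and the cutoff 4.78 are all assumed values chosen for the demonstration. It also uses the fact that, when the numerator has 2 degrees of freedom (k = 3), the upper tail area of the F distribution has the closed form $P(F_{2,m} > f) = (1 + 2f/m)^{-m/2}$; for other degrees of freedom a table or a library routine such as `scipy.stats.f.sf` would be the usual route.

```python
import random

def f_statistic(groups):
    """One-factor ANOVA ratio F = MST / MSE for one list of observations per treatment."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-treatment form of SST, algebraically equal to sum(T_i^2 / n_i) - CM.
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (sst / (k - 1)) / (sse / (n - k))

def simulate_f(mus, sizes, sigma, reps, seed):
    """Simulate `reps` data sets from x_ij = mu_i + eps_ij and return their F values."""
    rng = random.Random(seed)
    return [
        f_statistic([[rng.gauss(mu, sigma) for _ in range(m)] for mu, m in zip(mus, sizes)])
        for _ in range(reps)
    ]

sizes = [6, 4, 5]   # assumed group sizes (k = 3, n = 15)
f_null = simulate_f([30.0, 30.0, 30.0], sizes, sigma=4.0, reps=20000, seed=1)  # H0 true
f_alt = simulate_f([35.0, 34.0, 28.0], sizes, sigma=4.0, reps=20000, seed=2)   # H0 false

cutoff = 4.78                                   # assumed observed F value
df_error = sum(sizes) - len(sizes)              # n - k = 12
tail_null = sum(f > cutoff for f in f_null) / len(f_null)
tail_alt = sum(f > cutoff for f in f_alt) / len(f_alt)
exact_tail = (1 + 2 * cutoff / df_error) ** (-df_error / 2)   # valid for numerator df = 2

print(f"average F when H0 is true:  {sum(f_null) / len(f_null):.2f}")
print(f"average F when H0 is false: {sum(f_alt) / len(f_alt):.2f}")
print(f"share of F > {cutoff} when H0 is true:  {tail_null:.3f} (closed form {exact_tail:.3f})")
print(f"share of F > {cutoff} when H0 is false: {tail_alt:.3f}")
```

When the null hypothesis holds, values of F as large as the cutoff are rare, while under the assumed alternative they are common; this is the sense in which a large ratio counts as evidence against the null hypothesis.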
Interpreting the F Statistic
• To answer this question, we make use of the F distribution. (See Table 6 in the textbook.)
  [Figure: layout of the F table, indexed by numerator degrees of freedom $df_1$ and denominator degrees of freedom $df_2$]
• Our p-value for this test is $P(F_{k-1,\,n-k} > F)$.

The ANOVA Table
Source       df      SS        MS    F
Treatments   k − 1   SST       MST   MST/MSE
Error        n − k   SSE       MSE
Total        n − 1   Total SS

Exercise 1: The ANOVA Table
Source       df   SS       MS      F      p-value
Treatments    2   137.95   68.97   4.78   0.01 < p-value < 0.05
Error        12   173.23   14.44
Total        14   311.18
• Using $\alpha = 0.05$, we would reject the null hypothesis and conclude that at least two of the applicators are different in terms of their mean drying times.

MINITAB: The ANOVA Table
Analysis of Variance
Source   DF   SS      MS     F      P
Factor    2   137.9   69.0   4.78   0.030
Error    12   173.2   14.4
Total    14   311.2

Ranking Population Means
• If an ANOVA procedure results in a statistically significant result, we can complete the analysis using a pairwise comparison of the treatment means.
• In other words, if we reject $H_0$ and accept $H_A$, we are concluding that at least two of the population/treatment means differ. We then wish to see which pairs of means are significantly different from one another.

Ranking Population Means
• Since we assume a common error variance $\sigma^2$ for all k populations, we could construct a $(1 - \alpha)100\%$ CI for each possible pairing using Case 2 (unknown but equal population variances) from Chapter 10.
• In our paint drying example, suppose we wanted to construct a 95% CI for each possible pairing of means; that is, a CI for each of $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, and $\mu_2 - \mu_3$.

Ranking Population Means
• However, if we computed CIs in this way, then we would be 95% confident each individual interval would contain the mean difference of interest.
• How confident would you be that all three CIs simultaneously contained the mean difference they were attempting to capture?
  0.95 × 0.95 × 0.95 = 0.8574
• In other words, we would only be 85.74% confident that all three CIs simultaneously contained the difference of interest.

Tukey's Simultaneous Confidence Intervals
• However, we want to be 95% confident that all three differences are simultaneously contained within the three CIs. To accomplish this, we need to employ “Tukey's simultaneous confidence intervals”.
• A family of $(1 - \alpha)100\%$ simultaneous CIs is given by
  $\mu_i - \mu_j \in (\bar{x}_i - \bar{x}_j) \pm q_\alpha(k, df) \sqrt{\frac{MSE}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}$
• Values of q are found in Table 11. (Use df = n − k.)

Interpreting Simultaneous Confidence Intervals
• If an interval contains the value 0, then we say there is no significant difference between those two particular means.
• If an interval does not contain the value 0, then we say there is a significant difference between those two particular means.

Exercise 1: Simultaneous
Confidence Intervals
• We obtain the following family of 95% confidence intervals:
  $\mu_1 - \mu_2 \in (-6.06, 7.02)$
  $\mu_1 - \mu_3 \in (0.48, 12.75)$
  $\mu_2 - \mu_3 \in (-0.66, 12.92)$
• We can be 95% confident that these 3 intervals simultaneously contain $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, and $\mu_2 - \mu_3$.
• Are any of the means significantly different?

Exercise 1: MINITAB
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Applicator
Individual confidence level = 97.94%

Applicator = 1 subtracted from:
Applicator     Lower    Center     Upper
2             -7.021    -0.483     6.055
3            -12.746    -6.613    -0.480

Applicator = 2 subtracted from:
Applicator     Lower    Center     Upper
3            -12.924    -6.130     0.664
...
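The Tukey intervals for the paint-drying exercise can be recomputed from the interval formula $(\bar{x}_i - \bar{x}_j) \pm q_\alpha(k, df) \sqrt{\frac{MSE}{2}(\frac{1}{n_i} + \frac{1}{n_j})}$. The sketch below is hedged in several places: the pad sample size $n_3 = 5$ and pad mean 28.12 are inferred from the reported degrees of freedom and the MINITAB interval centers rather than read from data; MSE is taken as 173.23 / 12 from the ANOVA table; and $q_{0.05}(3, 12) = 3.77$ is a value read from a standard studentized range table. The function name `tukey_intervals` is illustrative.

```python
from itertools import combinations
from math import sqrt

def tukey_intervals(means, sizes, mse, q):
    """Return {(i, j): (lower, upper)} for mu_i - mu_j, using 1-based treatment labels."""
    intervals = {}
    for i, j in combinations(range(len(means)), 2):
        diff = means[i] - means[j]
        half_width = q * sqrt(mse / 2 * (1 / sizes[i] + 1 / sizes[j]))
        intervals[(i + 1, j + 1)] = (diff - half_width, diff + half_width)
    return intervals

sizes = [6, 4, 5]                      # n_3 = 5 is an inferred assumption
means = [208.4 / 6, 137 / 4, 28.12]    # pad mean 28.12 is inferred from the MINITAB centers
mse = 173.23 / 12                      # SSE / (n - k) from the ANOVA table
q = 3.77                               # q_0.05(3, 12) from a studentized range table

intervals = tukey_intervals(means, sizes, mse, q)

for (i, j), (lower, upper) in intervals.items():
    verdict = "no significant difference" if lower < 0 < upper else "significant difference"
    print(f"mu_{i} - mu_{j}: ({lower:.2f}, {upper:.2f})  {verdict}")
```

Under these assumptions the computed endpoints land within rounding of the family of intervals reported for Exercise 1, and the zero-in-interval rule can be read off directly for each pair.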
This note was uploaded on 05/26/2010 for the course MATH Math2107 taught by Professor Lanihaque during the Winter '10 term at Carleton.