ANOVA (source: www.statsoft.com)

The Purpose of Analysis of Variance
In general, the purpose of analysis of variance (ANOVA) is to test for significant differences between means. If we are only comparing two means, then ANOVA will give the same results as the t-test for independent samples (if we are comparing two different groups of cases or observations), or the t-test for dependent samples (if we are comparing two variables in one set of cases or observations).

t-test for independent samples
Purpose, Assumptions. The t-test is the most commonly used method to evaluate the differences in means between two groups. For example, the t-test can be used to test for a difference in test scores between a group of patients who were given a drug and a control group who received a placebo. Theoretically, the t-test can be used even if the sample sizes are very small (e.g., as small as 10; some researchers claim that even smaller n's are possible), as long as the variables are normally distributed within each group and the variation of scores in the two groups is not reliably different. As mentioned before, the normality assumption can be evaluated by looking at the distribution of the data (via histograms) or by performing a normality test. The equality of variances assumption can be verified with the F test, or you can use the more robust Levene's test. If these conditions are not met, then you can evaluate the differences in means between two groups using one of the nonparametric alternatives to the t-test.

The p-level reported with a t-test represents the probability of error involved in accepting our
research hypothesis about the existence of a difference. Technically speaking, this is the probability of error associated with rejecting the hypothesis of no difference between the two categories of observations (corresponding to the groups) in the population when, in fact, the hypothesis is true. Some researchers suggest that if the difference is in the predicted direction, you can consider only one half (one "tail") of the probability distribution, and thus divide the standard p-level reported with a t-test (a "two-tailed" probability) by two. Others, however, suggest that you should always report the standard, two-tailed t-test probability.

Arrangement of Data. In order to perform the t-test for independent samples, one independent (grouping) variable (e.g., Gender: male/female) and at least one dependent
variable (e.g., a test score) are required. The means of the dependent variable will be compared between selected groups based on the specified values (e.g., male and female) of the independent variable. The following data set can be analyzed with a t-test comparing the average WCC score in males and females:

         GENDER   WCC
case 1   male     111
case 2   male     110
case 3   male     109
case 4   female   102
case 5   female   104

t-test graphs. In the t-test analysis, comparisons of means and measures of variation in the
two groups can be visualized in box-and-whisker plots (for an example, see the graph below). These graphs help you to quickly evaluate and "intuitively visualize" the strength of the relation between the grouping and the dependent variable.

More Complex Group Comparisons. It often happens in research practice that you need to
compare more than two groups (e.g., drug 1, drug 2, and placebo), or compare groups created by more than one independent variable while controlling for the separate influence of each of them (e.g., Gender, type of Drug, and size of Dose). In these cases, you need to analyze the data using Analysis of Variance, which can be considered a generalization of the t-test. In fact, for two-group comparisons, ANOVA will give results identical to a t-test (t**2(df) = F(1, df)). However, when the design is more complex, ANOVA offers numerous advantages that t-tests cannot provide (even if you run a series of t-tests comparing various cells of the design).

Why the name analysis of variance? It may seem odd to you that a procedure that compares
means is called analysis of variance. However, this name derives from the fact that in order to test for statistical significance between means, we are actually comparing (i.e., analyzing) variances.

The Partitioning of Sums of Squares
At the heart of ANOVA is the fact that variances can be divided up, that is, partitioned. Remember that the variance is computed as the sum of squared deviations from the overall mean, divided by n - 1 (sample size minus one). Thus, given a certain n, the variance is a function of the sums of (deviation) squares, or SS for short. Partitioning of variance works as follows. Consider the following data set:
                Group 1   Group 2
Observation 1      2         6
Observation 2      3         7
Observation 3      1         5

Total Sums of Squares: 28

The means for the two groups are quite different (2 and 6, respectively). The sums of squares
within each group are equal to 2. Adding them together, we get 4. If we now repeat these computations ignoring group membership, that is, if we compute the total SS based on the overall mean, we get the number 28. In other words, computing the variance (sums of squares) based on the within-group variability yields a much smaller estimate of variance than computing it based on the total variability (the overall mean). The reason for this in the above example is of course that there is a large difference between means, and it is this difference that accounts for the difference in the SS. In fact, if we were to perform an ANOVA on the above data, we would get the following result:
MAIN EFFECT    SS     df   MS     F      p
Effect         24.0    1   24.0   24.0   .008
Error           4.0    4    1.0

As you can see, in the above table the total SS (28) was partitioned into the SS due to within-group variability (2 + 2 = 4) and the variability due to differences between means (28 - (2 + 2) = 24).
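Using the data above, the partition and the F ratio can be verified directly; the sketch below also checks the t**2 = F identity mentioned earlier (the function and variable names are ours, not from the text):

```python
# Partitioning the total sum of squares for the two-group example in the
# text (Group 1: 2, 3, 1; Group 2: 6, 7, 5).
import math

def sum_of_squares(values):
    """Sum of squared deviations from the mean of `values`."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

group1 = [2, 3, 1]
group2 = [6, 7, 5]

ss_within = sum_of_squares(group1) + sum_of_squares(group2)  # 2 + 2 = 4
ss_total = sum_of_squares(group1 + group2)                   # 28
ss_effect = ss_total - ss_within                             # 28 - 4 = 24

# Mean squares are the SS divided by their degrees of freedom.
ms_effect = ss_effect / 1       # df_effect = k - 1 = 1
ms_error = ss_within / 4        # df_error = N - k = 6 - 2 = 4
f_ratio = ms_effect / ms_error  # 24.0, matching the table above

# The t**2 = F identity for two groups: an independent-samples t on the
# same data, using the pooled variance, squares to the F ratio.
pooled_var = ss_within / 4                     # same as ms_error
se_diff = math.sqrt(pooled_var * (1/3 + 1/3))  # n1 = n2 = 3
t = (sum(group2) / 3 - sum(group1) / 3) / se_diff
print(ss_effect, f_ratio, round(t ** 2, 6))    # 24.0 24.0 24.0
```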

SS Error and SS Effect. The within-group variability (SS) is usually referred to as Error variance. This term denotes the fact that we cannot readily explain or account for it in the current design. However, the SS Effect we can explain. Namely, it is due to the differences in means between the groups. Put another way, group membership explains this variability because we know that it is due to the differences in means.

Significance testing.
In ANOVA, as in many other statistical tests, significance testing is done by means of ratios of explained to unexplained variability. In ANOVA, we base this test on a comparison of the variance due to the between-groups variability (called Mean Square Effect, or MSeffect) with the within-group variability (called Mean Square Error, or MSerror; this term was first used by Edgeworth, 1885). Under the null hypothesis (that there are no mean differences between groups in the population), we would still expect some minor random fluctuation in the means for the two groups when taking small samples (as in our example). Therefore, under the null hypothesis, the variance estimated from the within-group variability should be about the same as the variance due to the between-groups variability. We can compare those two estimates of variance via the F test, which tests whether the ratio of the two variance estimates is significantly greater than 1. In our example above, that test is highly significant, and we would in fact conclude that the means for the two groups are significantly different from each other.

Summary of the basic logic of ANOVA. To summarize the discussion up to this point, the
purpose of analysis of variance is to test differences in means (for groups or variables) for statistical significance. This is accomplished by analyzing the variance, that is, by partitioning the total variance into the component that is due to true random error (i.e., within-group SS) and the components that are due to differences between means. These latter variance components are then tested for statistical significance and, if significant, we reject the null hypothesis of no differences between means and accept the alternative hypothesis that the means (in the population) are different from each other.

Dependent and independent variables. The variables that are measured (e.g., a test score) are called dependent variables. The variables that are manipulated or controlled (e.g., a teaching method or some other criterion used to divide observations into groups that are compared) are called factors or independent variables.

Multi-Factor ANOVA
In the simple example above, it may have occurred to you that we could simply have computed a t-test for independent samples to arrive at the same conclusion. And, indeed, we would get the identical result if we were to compare the two groups using this test. However, ANOVA is a much more flexible and powerful technique that can be applied to much more complex research issues.

Multiple factors. The world is complex and multivariate in nature, and instances when a single variable completely explains a phenomenon are rare. For example, when trying to explore how to grow a bigger tomato, we would need to consider factors that have to do with the plants' genetic makeup, soil conditions, lighting, temperature, etc. Thus, in a typical experiment, many factors are taken into account. One important reason for using ANOVA methods rather than multiple two-group studies analyzed via t-tests is that the former method is more efficient: with fewer observations we can gain more information. Let us expand on this statement.

Controlling for factors. Suppose that in the above two-group example we introduce another
grouping factor, for example, Gender. Imagine that in each group we have 3 males and 3 females. We could summarize this design in a 2 by 2 table:

          Experimental   Experimental
          Group 1        Group 2
Males     Mean = 2       Mean = 6
Females   Mean = 4       Mean = 8

Before performing any computations, it appears that we can partition the total variance into at
least 3 sources: (1) error (within-group) variability, (2) variability due to experimental group membership, and (3) variability due to gender. (Note that there is an additional source, interaction, that we will discuss shortly.) What would have happened had we not included gender as a factor in the study but rather computed a simple t-test? If you compute the SS ignoring the gender factor (use the within-group means, ignoring or collapsing across gender; the result is SS = 10 + 10 = 20), you will see that the resulting within-group SS is larger than it is when we include gender (use the within-group, within-gender means to compute those SS; they will be equal to 2 in each group, thus the combined SS-within is equal to 2 + 2 + 2 + 2 = 8).
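A quick way to check this arithmetic is to recompute both within-group SS values from raw scores. The raw scores are not given in the text, so the values below are hypothetical data chosen to match the stated cell means (males 2 and 6, females 4 and 8) and within-cell SS of 2:

```python
# Hypothetical raw scores consistent with the cell means and SS stated in
# the text (they are not given in the original).

def ss(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

cells = {
    ("male", "group1"):   [1, 2, 3],   # mean 2, SS 2
    ("male", "group2"):   [5, 6, 7],   # mean 6, SS 2
    ("female", "group1"): [3, 4, 5],   # mean 4, SS 2
    ("female", "group2"): [7, 8, 9],   # mean 8, SS 2
}

# Ignoring gender: pool males and females within each experimental group.
group1 = cells[("male", "group1")] + cells[("female", "group1")]
group2 = cells[("male", "group2")] + cells[("female", "group2")]
ss_ignoring_gender = ss(group1) + ss(group2)         # 10 + 10 = 20

# Including gender: compute SS within each of the four cells.
ss_with_gender = sum(ss(v) for v in cells.values())  # 2 + 2 + 2 + 2 = 8

print(ss_ignoring_gender, ss_with_gender)  # 20.0 8.0
```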
This difference is due to the fact that the means for males are systematically lower than those for females, and this difference in means adds variability if we ignore this factor. Controlling for error variance increases the sensitivity (power) of a test. This example demonstrates
another principle of ANOVA that makes it preferable to simple two-group t-test studies: in ANOVA we can test each factor while controlling for all others; this is actually the reason why ANOVA is more statistically powerful (i.e., we need fewer observations to find a significant effect) than the simple t-test.

Interaction Effects
There is another advantage of ANOVA over simple t-tests: ANOVA allows us to detect interaction effects between variables and, therefore, to test more complex hypotheses about reality. Let us consider another example to illustrate this point. (The term interaction was first used by Fisher, 1926.)

Main effects, two-way interaction. Imagine that we have a sample of highly achievement-oriented students and another of achievement "avoiders." We now create two random halves in each sample, and give one half of each sample a challenging test and the other an easy test. We measure how hard the students work on the test. The means of this (fictitious) study are as follows:

                   Achievement-   Achievement-
                   oriented       avoiders
Challenging Test      10              5
Easy Test              5             10

How can we summarize these results? Is it appropriate to conclude that (1) challenging tests
make students work harder, (2) achievement-oriented students work harder than achievement-avoiders? Neither of these statements captures the essence of this clearly systematic pattern of means. The appropriate way to summarize the result is to say that challenging tests make only achievement-oriented students work harder, while easy tests make only achievement-avoiders work harder. In other words, the type of achievement orientation and test difficulty interact in their effect on effort; specifically, this is an example of a two-way interaction between achievement orientation and test difficulty. Note that statements 1 and 2 above describe so-called main effects.

Higher order interactions. While the previous two-way interaction can be put into words
relatively easily, higher order interactions are increasingly difficult to verbalize. Imagine that we had included the factor Gender in the achievement study above and had obtained the following pattern of means:

Females            Achievement-   Achievement-
                   oriented       avoiders
Challenging Test      10              5
Easy Test              5             10

Males              Achievement-   Achievement-
                   oriented       avoiders
Challenging Test       1              6
Easy Test              6              1

How could we now summarize the results of our study? Graphs of means for all effects
greatly facilitate the interpretation of complex effects. The pattern shown in the table above (and in the graph below) represents a three-way interaction between factors. Thus we may summarize this pattern by saying that for females there is a two-way interaction between achievement-orientation type and test difficulty: achievement-oriented females work harder on challenging tests than on easy tests, while achievement-avoiding females work harder on easy tests than on difficult tests. For males, this interaction is reversed. As you can see, the description of the interaction has become much more involved.

A general way to express interactions. A general way to express all interactions is to say
that an effect is modified (qualified) by another effect. Let us try this with the two-way interaction above. The main effect for test difficulty is modified by achievement orientation. For the three-way interaction in the previous paragraph, we may say that the two-way interaction between test difficulty and achievement orientation is modified (qualified) by gender. If we have a four-way interaction, we may say that the three-way interaction is modified by the fourth variable, that is, that there are different types of interactions at the different levels of the fourth variable. As it turns out, in many areas of research five-way or higher-way interactions are not that uncommon.

Contrast Analysis
Briefly, contrast analysis allows us to test the statistical significance of predicted specific differences in particular parts of our complex design. It is a major and indispensable component of the analysis of every complex ANOVA design.

Post hoc Comparisons
Sometimes we find effects in our experiment that were not expected. Even though in most cases a creative experimenter will be able to explain almost any pattern of means, it would not be appropriate to analyze and evaluate that pattern as if one had predicted it all along. The problem here is one of capitalizing on chance when performing multiple tests post hoc, that is, without a priori hypotheses. To illustrate this point, let us consider the following "experiment." Imagine we were to write down a number between 1 and 10 on 100 pieces of paper. We then put all of those pieces into a hat and draw 20 samples (of pieces of paper) of 5 observations each, and compute the means (from the numbers written on the pieces of paper) for each group. How likely do you think it is that we will find two sample means that are significantly different from each other? It is very likely! Selecting the extreme means obtained from 20 samples is very different from taking only 2 samples from the hat in the first place, which is what the test via the contrast analysis implies. Without going into further detail, there are several so-called post hoc tests that are explicitly based on the first scenario (taking the extremes from 20 samples), that is, they are based on the assumption that we have chosen for our comparison the most extreme (different) means out of k total means in the design. Those tests apply "corrections" that are designed to offset the advantage of post hoc selection of the most extreme comparisons.

Assumptions.
It is assumed that the dependent variable is measured on at least an interval scale. Moreover, the dependent variable should be normally distributed within groups.

Effects of violations. Overall, the F test is remarkably robust to deviations from normality (see Lindman, 1974, for a summary). If the n per cell is fairly large, then deviations from normality do not matter much at all, because of the central limit theorem, according to which the sampling distribution of the mean approximates the normal distribution regardless of the distribution of the variable in the population. A detailed discussion of the robustness of the F statistic can be found in Box and Anderson (1955) or Lindman (1974).

Homogeneity of Variances
Assumptions. It is assumed that the variances in the different groups of the design are identical; this is called the homogeneity of variances assumption. Remember that at the beginning of this section we computed the error variance (SS error) by adding up the sums of squares within each group. If the variances in the two groups are different from each other, then adding the two together is not appropriate and will not yield an estimate of the common within-group variance (since no common variance exists).
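Before pooling sums of squares across groups, it is worth checking that the group variances are comparable. The sketch below is only a crude variance-ratio check (the second group's data and the cutoff of 4 are invented for illustration; Levene's test, mentioned earlier, is the more robust option):

```python
# Crude check of the homogeneity-of-variances assumption: compare the
# sample variances of two groups before pooling their sums of squares.
from statistics import variance  # sample variance, denominator n - 1

group1 = [2, 3, 1]      # data from the partitioning example above
group2 = [60, 70, 50]   # invented group with a much larger spread

v1, v2 = variance(group1), variance(group2)
ratio = max(v1, v2) / min(v1, v2)
print(v1, v2, ratio)

# With such unequal variances, pooling the within-group SS does not
# estimate a "common" within-group variance, because none exists.
if ratio > 4:  # illustrative rule-of-thumb cutoff, not a formal test
    print("variances look heterogeneous; a pooled estimate is suspect")
```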

Effects of violations. Lindman (1974, p. 33) shows that the F statistic is quite robust against violations of this assumption (heterogeneity of variances; see also Box, 1954a, 1954b; Hsu, 1938).

Special case: correlated means and variances. However, one instance when the F statistic is
very misleading is when the means are correlated with the variances across the cells of the design. A scatterplot of the variances or standard deviations against the means will detect such correlations. The reason why this is a "dangerous" violation is the following: imagine that you have 8 cells in the design, 7 with about equal means but one with a much higher mean. The F statistic may suggest to you a statistically significant effect. However, suppose that there is also a much larger variance in the cell with the highest mean, that is, the means and the variances are correlated across cells (the higher the mean, the larger the variance). In that case, the high mean in the one cell is actually quite unreliable, as indicated by the large variance. However, because the overall F statistic is based on a pooled within-cell variance estimate, the high mean is identified as significantly different from the others, when in fact it would not be significantly different at all if one based the test on the within-cell variance in that cell alone. This pattern, a high mean and a large variance in one cell, frequently occurs when there are outliers present in the data. One or two extreme cases in a cell with only 10 cases can greatly bias the mean and will dramatically increase the variance.
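A tiny sketch (with invented data) shows how strongly one extreme case in a cell of 10 can bias the mean and inflate the variance:

```python
# One outlier in a cell of 10 cases can greatly bias the mean and
# dramatically inflate the variance (the data here are invented).
from statistics import mean, variance

clean_cell = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]  # 10 well-behaved cases
outlier_cell = clean_cell[:-1] + [60]               # one case replaced by an outlier

print(mean(clean_cell), variance(clean_cell))      # mean 10, small variance
print(mean(outlier_cell), variance(outlier_cell))  # mean shifts up, variance explodes
```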