STAT E-50 - Introduction to Statistics

One-Way Analysis of Variance

To compare two or more groups:
• To test two proportions, use a z-test
• To test more than two proportions, use a χ² test
• To test two means, use a t-test
• To test more than two means, use Analysis of Variance (ANOVA)

In Analysis of Variance, we are testing for the equality of the means of several levels of a variable. The technique is to compare the variation between the levels and the variation within each level. (The levels of the variable are also referred to as groups, or treatments.)

If the variation due to the level (variation between levels) is significantly larger than the variation within each level, then we can conclude that the means of the levels are not all the same.

We will test the hypothesis

H0: μ1 = μ2 = … = μk vs. Ha: the means are not all equal

using the ratio:

(differences among means) / (variability within the groups)

When the numerator is large compared to the denominator, we will reject the null hypothesis.
• The numerator measures the variation between groups; this is called the Treatment Mean Square, or MS_T
• The denominator measures the variation within groups; this is called the Error Mean Square, or MS_E
• The test statistic is F = MS_T / MS_E; reject H0 when F is large.
• MS_T has k - 1 degrees of freedom, where k = the number of groups
• MS_E has k(n - 1) degrees of freedom, where n = the number of observations from each group
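As a rough illustration of the ratio described above, the following sketch computes the Treatment Mean Square, the Error Mean Square, and F for a balanced design (the same number of observations n in each of the k groups). The function name `one_way_f` and the `scores` data are made up for this example and are not part of the course materials.

```python
def one_way_f(groups):
    """Return (MS_T, MS_E, F) for a balanced one-way layout.

    Sketch only: assumes every group has the same number of observations n.
    """
    k = len(groups)        # number of groups
    n = len(groups[0])     # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / k  # valid because group sizes are equal

    # Variation between groups: Treatment Mean Square, k - 1 df
    ss_t = n * sum((m - grand_mean) ** 2 for m in group_means)
    ms_t = ss_t / (k - 1)

    # Variation within groups: Error Mean Square, k(n - 1) df
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_e = ss_e / (k * (n - 1))

    return ms_t, ms_e, ms_t / ms_e


# Hypothetical data: k = 3 groups, n = 3 observations each
scores = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
ms_t, ms_e, f_stat = one_way_f(scores)
print(f"MS_T = {ms_t:.2f}, MS_E = {ms_e:.2f}, F = {f_stat:.2f}")
# prints: MS_T = 12.00, MS_E = 1.00, F = 12.00
```

Here the group means (5, 7, 9) differ by much more than the spread inside any one group, so F is large, which is the situation in which H0 would be rejected.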
Assumptions and Conditions

First plot the data in side-by-side boxplots. Look for outliers, similar spreads, similar centers.

Independence Assumption
The groups must be independent of each other. The data within each group must be independent data drawn at random from a homogeneous population, or generated by a randomized comparative experiment.
Randomization Condition: Was the data collected randomly?

Equal Variance Assumption
We need the variances of the treatment groups to be equal, so we can find a pooled variance.
Similar Variance Condition:
• Compare the spreads of the groups in the side-by-side boxplots
• Are the spreads changing systematically with the centers?
• Plot the residuals vs. the predicted values to be sure you don't have larger residuals for larger values

Normal Population Assumption
Nearly Normal Condition:
• Compare the boxplots for skewness
• Pool the residuals and create a histogram or normal probability plot (NPP)

The ANOVA table:

Source    df    SS    MS                    F                  p
Factor                MS_T = SS_T / df_T    F = MS_T / MS_E
Error                 MS_E = SS_E / df_E
Total
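To make the layout of the ANOVA table concrete, here is a minimal sketch that fills in the df, SS, MS, and F columns from raw data. The function names `anova_table` and `format_table` and the sample data are hypothetical. The p column is left out because it requires the F distribution, which the Python standard library does not provide.

```python
def anova_table(groups):
    """Return one-way ANOVA rows as (source, df, SS, MS, F) tuples."""
    k = len(groups)
    all_obs = [x for g in groups for x in g]
    n_total = len(all_obs)
    grand_mean = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Sums of squares for the Factor, Error, and Total rows
    ss_t = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

    # n_total - k equals k(n - 1) when every group has n observations
    df_t, df_e = k - 1, n_total - k
    ms_t, ms_e = ss_t / df_t, ss_e / df_e

    return [
        ("Factor", df_t, ss_t, ms_t, ms_t / ms_e),
        ("Error", df_e, ss_e, ms_e, None),
        ("Total", n_total - 1, ss_total, None, None),
    ]


def format_table(rows):
    """Render the rows as aligned plain text; empty cells stay blank."""
    lines = [f"{'Source':<8}{'df':>4}{'SS':>10}{'MS':>10}{'F':>10}"]
    for source, df, ss, ms, f in rows:
        ms_txt = "" if ms is None else f"{ms:.3f}"
        f_txt = "" if f is None else f"{f:.3f}"
        lines.append(f"{source:<8}{df:>4}{ss:>10.3f}{ms_txt:>10}{f_txt:>10}")
    return "\n".join(lines)


# Hypothetical data: 3 groups of 3 observations
rows = anova_table([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(format_table(rows))
```

In the output, the Factor and Error rows add up to the Total row in both the df and SS columns, which is a quick check on the arithmetic.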