BASIC BUSINESS STATISTICS, Eleventh Edition
Berenson, Levine, Krehbiel
Slides Modified by Eugene Li

Topic 4 (Part 1): Analysis of Variance – Data Analysis Method (Ch. 11)
- An Introduction to Analysis of Variance (11.1)
- One-Way Analysis of Variance: Testing for the Equality of Three or More Population Means (11.1)
- Multiple Comparison Procedures (11.1)
- Levene Test of Homogeneity of Variance (11.1)

An Introduction to Analysis of Variance
- Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means, using interval/ratio scale data obtained from observational or experimental studies.
- We want to use the sample results to test the following hypotheses:

  H0: µ1 = µ2 = µ3 = ... = µc
  Ha: Not all population means are equal
- If H0 is rejected, we cannot conclude that all population means are different.
- Rejecting H0 means that at least two population means have different values.
Requirements for Analysis of Variance
- For each population, the response variable is normally distributed.
- The variance of the response variable, denoted σ², is the same for all of the populations.
  (Note: Make sure to state the above two in the context of the problem!)
- The observations must be independent.

Analysis of Variance: Testing for the Equality of c Population Means
- Among-Group Estimate of Population Variance (Among-Group Variation)
- Within-Group Estimate of Population Variance (Within-Group Variation)
- Comparing the Variance Estimates: The F Test
- The ANOVA Table
(Note: Some other textbooks use Between-Samples/Treatments and Within-Samples/Treatments instead of Among-Group and Within-Group, respectively.)

Among-Group Estimate of Population Variance
- The among-group estimate of σ² is called the mean square among, MSA:

  MSA = SSA / (c − 1),   where   SSA = Σ_{j=1..c} n_j (x̄_j − x̿)²

  Here x̄_j is the sample mean of group j and x̿ is the grand mean of all n observations.
- The numerator of MSA is called the sum of squares among groups (SSA).
- The denominator of MSA represents the degrees of freedom associated with SSA.
Within-Group Estimate of Population Variance
- The estimate of σ² based on the variation of the sample observations within each group is called the mean square within, MSW (also known as the mean square error, MSE):

  MSW = SSW / (n − c),   where   SSW = Σ_{j=1..c} (n_j − 1) s_j² = Σ_{j=1..c} Σ_{i=1..n_j} (x_ij − x̄_j)²

- The numerator of MSW is called the sum of squares within groups (SSW).
- The denominator of MSW represents the degrees of freedom associated with SSW.
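The two mean squares can be computed directly from their definitions; a minimal sketch (the function and variable names are my own, not from the text):

```python
def one_way_anova(groups):
    """Return (MSA, MSW, F) for a one-way ANOVA over a list of samples."""
    c = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # SSA: among-group sum of squares, weighted by each group's size
    ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # SSW: within-group sum of squares, pooled across all groups
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msa = ssa / (c - 1)
    msw = ssw / (n - c)
    return msa, msw, msa / msw

# Tiny hand-checkable example: group means 2 and 4, grand mean 3
msa, msw, f_stat = one_way_anova([[1, 2, 3], [3, 4, 5]])
```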
Comparing the Variance Estimates: The F Test

- If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSA/MSW is an F distribution with MSA d.f. equal to c − 1 and MSW d.f. equal to n − c.
- If the means of the c populations are not equal, the value of MSA/MSW will be inflated because MSA overestimates σ².
- Hence, we will reject H0 if the resulting value of MSA/MSW appears to be too large to have been selected at random from the appropriate F distribution.

Test for the Equality of c Population Means

- Hypotheses
  H0: µ1 = µ2 = µ3 = ... = µc
  Ha: Not all population means are equal
- Test Statistic
  F = MSA/MSW
- Rejection Rule
  Reject H0 if F > Fα, where the value of Fα is based on an F distribution with c − 1 numerator degrees of freedom and n − c denominator degrees of freedom.
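The critical value Fα can be read from a table or computed; a sketch using scipy (assumed available, not named in the text):

```python
from scipy.stats import f

# Upper-tail critical value F_alpha for c - 1 and n - c degrees of freedom
alpha, c, n = 0.05, 3, 15          # e.g. three groups of five observations
f_crit = f.ppf(1 - alpha, c - 1, n - c)
print(round(f_crit, 2))            # 3.89, the tabled F.05 with 2 and 12 d.f.
```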
Sampling Distribution of MSA/MSW

- The figure below shows the rejection region associated with a level of significance equal to α, where Fα denotes the critical value.

  [Figure: F distribution of MSA/MSW; do not reject H0 to the left of the critical value Fα, reject H0 in the upper tail to its right.]

The ANOVA Table
  Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
  Among                 SSA              c − 1                MSA           MSA/MSW
  Within                SSW              n − c                MSW
  Total                 SST              n − 1

- SST divided by its degrees of freedom (n − 1) is simply the overall sample variance that would be obtained if we treated the entire n observations as one data set:

  SST = Σ_{j=1..c} Σ_{i=1..n_j} (x_ij − x̿)² = SSA + SSW

- Note: It is NOT necessary to find the mean square total (MST)!
Example: Reed Manufacturing
J. R. Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Surrey, Port Coquitlam, and North Vancouver). A simple random sample of 5 managers from each of the three plants was taken, and the number of hours worked by each manager for the previous week is shown on the next slide.

Example: Reed Manufacturing
  Observation        Plant 1 (Surrey)   Plant 2 (Port Coq.)   Plant 3 (N. Van.)
  1                  48                 73                    51
  2                  54                 63                    63
  3                  57                 66                    61
  4                  54                 64                    54
  5                  62                 74                    56
  Sample Mean        55                 68                    57
  Sample Variance    26.0               26.5                  24.5

Example: Reed Manufacturing
- Analysis of Variance
  • Hypotheses
    H0: µ1 = µ2 = µ3
    Ha: Not all the means are equal
    where:
    µ1 = mean number of hours worked per week by the managers at Plant 1
    µ2 = mean number of hours worked per week by the managers at Plant 2
    µ3 = mean number of hours worked per week by the managers at Plant 3

Example: Reed Manufacturing
- Analysis of Variance
  • Mean Square Among
    Since the sample sizes are all equal, the grand mean is
    x̿ = (55 + 68 + 57)/3 = 60
    SSA = 5(55 − 60)² + 5(68 − 60)² + 5(57 − 60)² = 490
    MSA = 490/(3 − 1) = 245
  • Mean Square Within
    SSW = 4(26.0) + 4(26.5) + 4(24.5) = 308
    MSW = 308/(15 − 3) = 25.667

  Question: Can we get the SST and SSA first, then obtain SSW by subtracting SSA from SST?
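Since SST = SSA + SSW, the answer is yes. A quick numerical check with the plant data (a sketch; variable names are mine):

```python
# Verify that SST = SSA + SSW for the Reed data, so SSW can be
# obtained by subtraction once SST and SSA are known.
surrey = [48, 54, 57, 54, 62]
port_coq = [73, 63, 66, 64, 74]
north_van = [51, 63, 61, 54, 56]
groups = [surrey, port_coq, north_van]

all_obs = [x for g in groups for x in g]
grand = sum(all_obs) / len(all_obs)                                 # 60.0

sst = sum((x - grand) ** 2 for x in all_obs)                        # 798.0
ssa = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)  # 490.0
ssw = sst - ssa                                                     # 308.0, matching 4(26.0)+4(26.5)+4(24.5)
```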
Example: Reed Manufacturing
- Analysis of Variance
  • F Test
    If H0 is true, the ratio MSA/MSW should be near 1, since both MSA and MSW are estimating σ². If Ha is true, the ratio should be significantly larger than 1, since MSA tends to overestimate σ².
  • Rejection Rule
    Assuming α = .05, F.05 = 3.89 (2 d.f. numerator, 12 d.f. denominator).
    Reject H0 if F > 3.89
  • Test Statistic
    F = MSA/MSW = 245/25.667 = 9.55
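The same F statistic can be cross-checked with scipy's one-way ANOVA routine (a sketch; scipy is my assumption, not a tool named in the text):

```python
from scipy.stats import f_oneway

surrey = [48, 54, 57, 54, 62]
port_coq = [73, 63, 66, 64, 74]
north_van = [51, 63, 61, 54, 56]

result = f_oneway(surrey, port_coq, north_van)
# result.statistic should match the hand computation 245/25.667 ≈ 9.55,
# and result.pvalue < .05 agrees with rejecting H0 at alpha = .05
```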
Example: Reed Manufacturing
- Analysis of Variance
  • Conclusion
    Since F = 9.55 > F.05 = 3.89, we reject H0. The mean number of hours worked per week by department managers is not the same at each plant.
  • ANOVA Table

    Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
    Among                 490              2                    245           9.55
    Within                308              12                   25.667
    Total                 798              14

Don’t Forget to Check the Requirements
- We need to check normality using a histogram or a normal probability plot. If normality is violated, we can use the Kruskal-Wallis test instead. The Kruskal-Wallis test can be considered the nonparametric counterpart of ANOVA. We don't cover it in this course!
- One good thing about ANOVA is that with equal sample sizes, even a small departure from the normality requirement is tolerable: ANOVA is "robust". (This isn't stated clearly enough in the textbook.)
- Also, if we have equal sample sizes, the F distribution isn't seriously affected even if the population variances are unequal. But with unequal sample sizes we can't do anything for now, because the techniques to "correct" for this won't be discussed in this course. (Note: Take more statistics courses!)
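For reference, the Kruskal-Wallis test mentioned above is available in scipy (a sketch only, since this course does not cover it; scipy is assumed):

```python
from scipy.stats import kruskal

surrey = [48, 54, 57, 54, 62]
port_coq = [73, 63, 66, 64, 74]
north_van = [51, 63, 61, 54, 56]

# Rank-based test of whether the three samples come from the same distribution
stat, p = kruskal(surrey, port_coq, north_van)
# A small p-value again suggests the plants differ, consistent with the F test
```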
Multiple Comparison Procedures
Suppose that analysis of variance has provided statistical evidence to reject the null hypothesis of equal population means. Fisher's least significant difference (LSD) procedure can be used to determine where the differences occur, and it may need a Bonferroni adjustment to the Type I error. Alternatively, we can use the Tukey-Kramer multiple comparison procedure instead, and this is the one we are going to look at.

Multiple Comparisons
- Two means are considered different if the difference between the corresponding sample means is larger than a critical number. The larger sample mean is then believed to be associated with a larger population mean.
- Conditions common to all the methods here:
  • The ANOVA model is the current one (here, one-way analysis of variance).
  • The conditions required to perform the ANOVA are satisfied.
  • The experiment is fixed-effect. (I'll explain what fixed-effect means later.)

Multiple Comparisons: Tukey-Kramer Procedure
- The test procedure:
  • Find a critical range (CR) as follows:

    CR = Qα × sqrt( (MSW/2) × (1/n_j + 1/n_j') )

    where
    α   = significance level
    Qα  = the upper-tail critical value of the studentized range distribution with degrees of freedom c and n − c (Table E.10)
    n_j, n_j' = the two sample sizes
    MSW = mean square within

  Note: If the sample sizes are the same for all groups, then one CR will do the whole job.

Tukey-Kramer Multiple Comparisons
- Select a pair of means. Calculate the absolute difference between the two sample means, |x̄_j − x̄_j'|.
- If |x̄_j − x̄_j'| > CR, there is enough evidence to conclude that there is a difference between the two population means.
- Repeat this procedure for each pair of samples.

Tukey-Kramer Multiple Comparisons
- Still the Reed Manufacturing example. We had three populations (three plants): c = 3. Sample sizes were equal: n1 = n2 = n3 = 5 (this means one CR is good enough), n − c = 15 − 3 = 12, MSW = 25.667.
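The critical range and the pairwise comparisons for this example can also be scripted; a sketch (assumes scipy >= 1.7, which provides the studentized range distribution):

```python
from itertools import combinations
from math import sqrt
from scipy.stats import studentized_range

c, n, nj, msw, alpha = 3, 15, 5, 25.667, 0.05
q = studentized_range.ppf(1 - alpha, c, n - c)   # Q_alpha for 3 groups, 12 d.f. (~3.77)
cr = q * sqrt((msw / 2) * (1 / nj + 1 / nj))     # one CR, since sample sizes are equal

means = {"Plant 1": 55, "Plant 2": 68, "Plant 3": 57}
# A pair of population means is declared different when |mean_a - mean_b| > CR
differs = {(a, b): abs(means[a] - means[b]) > cr
           for a, b in combinations(means, 2)}
```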
  CR = Qα × sqrt( (MSW/2) × (1/n_j + 1/n_j') ) = 3.77 × sqrt( (25.667/2) × (1/5 + 1/5) ) = 8.5417

  Sample means:  Plant 1: 55   Plant 2: 68   Plant 3: 57
  Plant 1 vs. Plant 2: |55 − 68| = 13 > 8.5417, so the means differ
  Plant 1 vs. Plant 3: |55 − 57| = 2 < 8.5417, no significant difference
  Plant 2 vs. Plant 3: |68 − 57| = 11 > 8.5417, so the means differ

Levene Test of Homogeneity (Equality) of Variance
- ANOVA (the F test) is relatively robust to departures from equal variance; however, a serious difference will still affect the accuracy of the test.
- We should still check this requirement by using the Levene test for testing:

  H0: σ1² = σ2² = ... = σc²
  Ha: Not all the variances are equal

Levene Test
- The test procedure:
  • Find the median in each group.
  • Find the absolute difference of each value from its group median.
  • Use these absolute differences to perform a one-way ANOVA.
  • If F > Fα, conclude that at least one variance is different.

Example: Reed Manufacturing
- Median of Plant 1: 54; Median of Plant 2: 66; Median of Plant 3: 56
- Therefore, the absolute differences are:

  Plant 1          Plant 2          Plant 3
  |48 − 54| = 6    |73 − 66| = 7    |51 − 56| = 5
  |54 − 54| = 0    |63 − 66| = 3    |63 − 56| = 7
  |57 − 54| = 3    |66 − 66| = 0    |61 − 56| = 5
  |54 − 54| = 0    |64 − 66| = 2    |54 − 56| = 2
  |62 − 54| = 8    |74 − 66| = 8    |56 − 56| = 0

Example: Reed Manufacturing

- Analysis of variance on the absolute differences. ANOVA Table:

  Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
  Among                 0.9333           2                    0.4667        0.04375
  Within                128              12                   10.6667
  Total                 128.9333         14

- Conclusion: Since F = 0.04375 < F0.05 = 3.89, we fail to reject H0. That means we don't have enough evidence to conclude that at least one variance is different. Therefore, we can assume the variances are equal.
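This median-based procedure (an ANOVA on absolute deviations from the group medians, sometimes called the Brown-Forsythe variant) is what scipy.stats.levene computes with center='median'; a sketch (scipy is my assumption, not a tool named in the text):

```python
from scipy.stats import levene

surrey = [48, 54, 57, 54, 62]
port_coq = [73, 63, 66, 64, 74]
north_van = [51, 63, 61, 54, 56]

# center='median' matches the procedure on the slides
stat, p = levene(surrey, port_coq, north_van, center="median")
# A large p-value means: fail to reject equal variances
```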
Levene Test

- Note: If using SGC (statistical software), the result is a test statistic of 0.04375 with p-value = 0.957345.
- From the p-value, we reach the same conclusion: fail to reject H0.

The End of Topic 4 (Part 1)