Lessons in Business Statistics
Prepared By
P.K. Viswanathan

Chapter 9: Chi-Square Test and Analysis of Variance (ANOVA)

Introduction
In the previous chapter, we made inferences about the difference between two population means based on the corresponding sample means. If we are interested in testing the equality of the means of more than two populations, we have an elegant technique known as ANOVA, developed in the 1920s by Ronald Fisher, often called the father of modern statistics. A special feature of ANOVA is that it belongs to the domain called "Experimental Design", which deals with cause-effect relationships in an effective manner. Cause-effect relationships are also reflected in the association of attributes, and questions about the association of attributes are effectively answered by the chi-square test. This chapter covers the basic models of the chi-square test and ANOVA.

1) Chi-Square Analysis: Basics
Chi-square analysis is widely used in research studies for testing hypotheses involving nominal data. Nominal data are also known by two other names: categorical data and attribute data. The symbol χ² is used to designate the chi-square distribution, whose shape depends on the number of degrees of freedom (d.f.). A χ² distribution is skewed, particularly with smaller d.f. As the number of d.f. increases, the χ² distribution becomes more symmetrical, approaching normality. The general shape of the χ² distribution for smaller d.f. is shown below.
[Figure: 1) Chi-Square Analysis: Basics, general shape of the χ² distribution for smaller d.f.]
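To make the change in shape concrete, one can tabulate the standard moments of a χ² distribution with k d.f.: mean k, variance 2k and skewness √(8/k). The short Python sketch below does this; chi_square_moments is a hypothetical helper name, and skewness is used only as a convenient numerical summary of asymmetry (a normal distribution has skewness 0).

```python
import math

def chi_square_moments(k):
    """Return (mean, variance, skewness) of a chi-square distribution with k d.f."""
    if k <= 0:
        raise ValueError("degrees of freedom must be positive")
    return k, 2 * k, math.sqrt(8 / k)

# Skewness shrinks toward 0 as the d.f. grow: the distribution becomes more symmetrical.
for k in (1, 2, 4, 10, 30, 100):
    mean, variance, skewness = chi_square_moments(k)
    print(f"d.f. = {k:3d}  mean = {mean:3d}  variance = {variance:3d}  skewness = {skewness:.3f}")
```

The skewness falls from about 2.83 at 1 d.f. to about 0.28 at 100 d.f., which is the numerical counterpart of the picture.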
The χ² test is a nonparametric test. Nonparametric means that no assumption needs to be made about the form of the original probability distribution from which the samples are drawn. By contrast, all parametric tests assume that the samples are drawn from a specified or assumed population distribution. Thus, nonparametric methods are also called "distribution-free" methods. The χ² test is a classic nonparametric test for data measured on a nominal scale.
Conditions for Using the Chi-Square Test
The sample observations drawn from a population must be independent and random.
The data must be in frequency (count) form. If the original data are in percentages, they must be converted into frequencies.
No expected frequency in any cell/category should be less than 5. If the expected frequency for a category is less than 5, you have to regroup (combine) categories.

2) Chi-Square Test: Goodness of Fit
Nominal Data: Goodness-of-Fit Test
This test is used to examine whether a set of observed frequencies comes from a population that has a particular distribution (e.g. the normal distribution). It can also be used to check whether an observed pattern of frequencies fits well with an expected pattern of frequencies.
Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = Observed Frequency and E = Expected Frequency.

Example:
Assume that a marketer wishes to compare five different package designs. He is interested in knowing which is the most preferred one so that it can be introduced in the market. A random sample of 200 consumers gives the following picture:

Package Design            A    B    C    D    E    Total
Preference by Consumers   36   52   40   35   37   200

Do the consumer preferences for the designs show any significant differences?
Solution:
Null Hypothesis: All package designs are equally preferred.
Alternative Hypothesis: The package designs are not all equally preferred.
Under the null hypothesis, each design is expected to receive 200/5 = 40 preferences.

Package Design   Observed (O)   Expected (E)   (O - E)²   (O - E)²/E
A                36             40             16         0.400
B                52             40             144        3.600
C                40             40             0          0.000
D                35             40             25         0.625
E                37             40             9          0.225
Total            200            200                       4.850

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 4.850$$

The critical χ² for 4 d.f. (5 categories - 1) at the 5% level of significance is 9.49. Since the calculated value of χ² is less than the critical value at the 5% level, accept the null hypothesis of equal preference. The conclusion is that all packages are equally preferred and the differences in preference in the sample survey may have arisen by chance.
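The arithmetic in the table can be reproduced with a few lines of plain Python. This is a sketch: goodness_of_fit is a hypothetical helper, the expected frequencies assume equal preference (200/5 = 40 per design), d.f. = k - 1 because no distribution parameters are estimated from the data, and the critical value 9.49 is copied from the text rather than computed.

```python
def goodness_of_fit(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over the categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e < 5 for e in expected):
        # Condition stated earlier: regroup categories before testing.
        raise ValueError("an expected frequency is below 5; regroup categories first")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [36, 52, 40, 35, 37]   # preferences for designs A to E
n = sum(observed)                 # 200 consumers
k = len(observed)                 # 5 package designs
expected = [n / k] * k            # equal preference under H0: 40 each
chi2 = goodness_of_fit(observed, expected)
df = k - 1                        # no distribution parameters estimated from the data
critical = 9.49                   # 5% critical value for 4 d.f., copied from the text
print(f"chi-square = {chi2:.3f}, d.f. = {df}")
print("reject H0" if chi2 > critical else "do not reject H0: equal preference")
```

Running it prints chi-square = 4.850 with 4 d.f., followed by the do-not-reject line, matching the hand calculation.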
3) Chi-Square Test: Cross Tab
The goodness-of-fit test is suitable for situations involving one categorical variable (e.g. package design). If there are two categorical variables and our interest is to find out whether they are associated with each other, the test of independence is the appropriate technique to use. This test is very popular for analyzing cross-tabulations, in which an investigator is keen to find out whether the two categorical variables have any relationship with each other.
Example:
In a market survey conducted to examine whether the choice of a brand is related to the income strata of the consumers, a random sample of 600 consumers reveals the following:

Income Strata (Income per month)   Brand1   Brand2   Brand3   Total
Less than Rs 10000                 132      128      50       310
Rs 10000-15000                     62       60       28       150
Rs 15000-20000                     30       30       26       86
Above Rs 20000                     16       22       16       54
Total                              240      240      120      600

The manager who conducted this survey wants to know whether brand preference is associated with the income strata.
Solution:
The null hypothesis is that there is no association between brand preference and income level (the two are independent). The alternative hypothesis is that brand and income level are associated (dependent).
Let us take a level of significance of 5%.
In order to calculate the χ² value, you need to work out the expected frequency in each cell of the contingency table. Under independence, the expected frequency of a cell is (row total × column total) / grand total; for example, the first cell has E = 310 × 240 / 600 = 124. In our example, there are 4 rows and 3 columns, amounting to 12 cells, so there will be 12 expected frequencies.

Observed Frequencies
Income Strata       Brand1   Brand2   Brand3
Less than 10000     132      128      50
10000 to 15000      62       60       28
15000 to 20000      30       30       26
Above 20000         16       22       16

Expected Frequencies
Income Strata       Brand1   Brand2   Brand3
Less than 10000     124      124      62
10000 to 15000      60       60       30
15000 to 20000      34.4     34.4     17.2
Above 20000         21.6     21.6     10.8
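The expected-frequency rule and the χ² statistic for this table can be reproduced in plain Python. A sketch only: the function names are hypothetical, the 12.59 cut-off is copied from the text, and the p-value helper uses the closed-form chi-square upper-tail probability that holds only for an even number of d.f. (6 here), so it is not a general replacement for a statistics library.

```python
import math

def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square statistic, d.f.) for an r x c table of observed counts."""
    expected = expected_frequencies(table)
    if any(e < 5 for row in expected for e in row):
        raise ValueError("an expected frequency is below 5; regroup categories first")
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for chi-square with an even number of d.f. (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive, even d.f.")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Rows: income strata (< Rs 10000, Rs 10000-15000, Rs 15000-20000, > Rs 20000)
# Columns: Brand1, Brand2, Brand3
observed = [
    [132, 128, 50],
    [62, 60, 28],
    [30, 30, 26],
    [16, 22, 16],
]
chi2, df = chi_square_independence(observed)
p_value = chi2_upper_tail_even_df(chi2, df)
print(f"chi-square = {chi2:.2f}, d.f. = {df}, p-value = {p_value:.3f}")
print("reject H0 (association)" if chi2 > 12.59 else "do not reject H0")  # 12.59 from the text
```

It reports χ² ≈ 12.76 with 6 d.f. and a p-value just under 0.05, consistent with rejecting independence at the 5% level.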
Compute
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
over the 12 observed frequencies (O) and 12 expected frequencies (E), as in the case of the goodness of fit. In our case, the computed χ² = 12.76. For a contingency table with r rows and c columns, the d.f. are (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6, and the upper χ² value at the 5% level for 6 d.f. is 12.59. Since 12.76 exceeds 12.59, the null hypothesis is rejected. The conclusion is that brand preference and income level are associated.

4) ANOVA Basics
This technique is part of the domain called "Experimental Designs". It helps in establishing the cause-effect relationship among variables in a precise fashion.
The beauty of ANOVA is that it tests the equality of more than two population means by actually analyzing the variance. In simple terms, ANOVA decomposes the total variation in the response variable into components of variation, each explaining part of the changes in the response variable. To put it succinctly, the total sum of squares is equal to the sum of the sums of squares due to the various causes.

5) ANOVA: One-Way Classification
A supermarket is interested in knowing whether it should go for a quarter-page, half-page, or full-page advertisement for a product. In order to choose the size of advertisement that will bring in the most store traffic, the supermarket can use the ANOVA technique. Here, you are trying to establish a cause-effect relationship between store traffic and the various sizes of advertisement.
How Does One-Way Classification Work in Practice?
You first decompose the total sum of squares into sums of squares due to causes. Here you are assuming that
Total Sum of Squares = Treatment Sum of Squares + Error Sum of Squares.
The word treatment is generic and as such may denote different methods, machines, advertisement copy platforms, strategies, brands and the like. The variation in the response variable (dependent variable) is assumed to be caused only by the treatment, and anything unexplained by the treatment is attributed to the error term.
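In symbols, with k treatments, n_j observations x_ij in treatment j, n observations in all, treatment means x̄_j and the grand mean written with a double bar (notation introduced here for illustration, not taken from the slides), the decomposition stated above and the F ratio used in the example that follows can be written as:

$$\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{\bar{x}}\right)^2}_{\text{Total SS}} = \underbrace{\sum_{j=1}^{k} n_j\left(\bar{x}_j-\bar{\bar{x}}\right)^2}_{\text{Treatment SS}} + \underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_j\right)^2}_{\text{Error SS}}$$

$$F = \frac{MS_{\text{Treatment}}}{MS_{\text{Error}}} = \frac{SS_{\text{Treatment}}/(k-1)}{SS_{\text{Error}}/(n-k)}$$

For the stock-out example this gives 123.6 = 68.8 + 54.8 with k = 3 and n = 15, so F = (68.8/2)/(54.8/12) = 34.40/4.57 ≈ 7.53.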
Example:
A consumer marketing group wanted to examine whether supermarket chains operating in a city differed in their "out of stock" levels for advertised specials. The group identified the relevant response variable as the percentage of advertised items that were not in stock. The following table provides the data collected from three supermarket chains in the city.

Chain1   Chain2   Chain3
15       10       17
14       14       12
20       9        14
15       10       15
16       11       12
Example Continues
The marketing group would like to know whether there are
significant differences among the three chains with regard to
mean percentage out of stock on advertised specials. How
would you analyze this situation?
Solution:
Using Microsoft Excel or the formula method, the following ANOVA table is obtained.

Source of Variation          SS      df   MS
Treatment (Between Groups)   68.8    2    34.40
Error (Within Groups)        54.8    12   4.57
Total                        123.6   14

F computed = 7.53    F critical = 3.89
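The formula method can be sketched in plain Python to reproduce this table. The function one_way_anova is a hypothetical helper, the data lists assume the data table is read three values at a time (one value each for Chain1, Chain2 and Chain3 per row), which reproduces the chain means of 16, 10.8 and 14 quoted in the solution, and the critical value 3.89 is copied from the text rather than computed.

```python
def one_way_anova(groups):
    """Return one-way ANOVA sums of squares, d.f., mean squares, F and group means."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    df_treatment, df_error = k - 1, n - k
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "means": means,
        "SS": (ss_treatment, ss_error, ss_total),
        "df": (df_treatment, df_error, n - 1),
        "MS": (ms_treatment, ms_error),
        "F": ms_treatment / ms_error,
    }

# Percentage of advertised specials out of stock, one list per chain.
chain1 = [15, 14, 20, 15, 16]
chain2 = [10, 14, 9, 10, 11]
chain3 = [17, 12, 14, 15, 12]
result = one_way_anova([chain1, chain2, chain3])
ss_tr, ss_e, ss_t = result["SS"]
print(f"SS: treatment = {ss_tr:.1f}, error = {ss_e:.1f}, total = {ss_t:.1f}")
print(f"F = {result['F']:.2f} on {result['df'][0]} and {result['df'][1]} d.f.")
print("reject H0" if result["F"] > 3.89 else "do not reject H0")  # 3.89 = F critical from the text
```

The printed sums of squares and F (7.53) agree with the Excel table, and the treatment and error sums of squares add up to the total.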
Solution continues
Formulation of the null and alternative hypotheses:
H0: The population means of percentage stock out for all three chains are equal.
H1: The population means of percentage stock out for the three chains are not all equal.
Decision Rule: If the computed F is greater than the critical F, reject the null hypothesis H0 and accept the alternative H1.
At the 5% level, the ANOVA output of Excel gives the computed F = 7.53 and the critical F(2,12) = 3.89. So, reject the null hypothesis and accept the alternative. The inference is that the population means of percentage stock out are not the same for all three chains. So, what do you do? Now, look at the point estimates from the summary table. Chain 1 has a mean stock out of 16%, chain 2 has a mean stock out of 10.8% and chain 3 has a mean stock out of 14%. Chain 2 has the lowest stock-out percentage, followed by chain 3 and then chain 1.
Assumptions involved in using ANOVA
The samples drawn from different populations are independent and random. In our case, the samples are independently and randomly drawn from the three supermarket chains.
The response variables of all the populations are normally distributed. In our example, the response variable, namely the percentage stock out, is assumed to be normally distributed.
The variances of all the populations are equal. In our example, the variances of the three chains are assumed to be equal.

6) ANOVA: Two-Way Classification
Example:
A supermarket that has a chain of stores is concerned about its reputation for service quality as perceived by its customers. The table below shows the perceived service quality with regard to the politeness of the staff. The number in each cell of the table is the percentage of people who said that the staff is polite. Perform the two-way ANOVA and draw your inferences about the population means of politeness corresponding to the days as well as the stores.
Day / Store   A    B    C    D    E
Monday        79   81   74   77   66
Tuesday       78   86   89   97   86
Wednesday     81   87   84   94   82
Thursday      80   83   81   88   83
Friday        70   74   77   89   68
Source of Variation   SS        df   MS       F          P-value    F crit
Rows (Days)           617.36    4    154.34   8.737051   0.000614   3.006917
Columns (Stores)      461.76    4    115.44   6.534956   0.002575   3.006917
Error                 282.64    16   17.665
Total                 1361.76   24
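The two-way table above can be reproduced by hand in plain Python. This sketch assumes one observation per day-store cell (two-way ANOVA without replication, which matches the 16 error d.f. in the table); two_way_anova is a hypothetical helper, and the critical F value is copied from the table rather than computed.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication: one observation per (row, column) cell."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_mean = sum(sum(row) for row in table) / n
    row_means = [sum(row) / c for row in table]
    col_means = [sum(col) / r for col in zip(*table)]
    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols          # what rows and columns leave unexplained
    df_rows, df_cols, df_error = r - 1, c - 1, (r - 1) * (c - 1)
    ms_error = ss_error / df_error
    return {
        "SS": (ss_rows, ss_cols, ss_error, ss_total),
        "df": (df_rows, df_cols, df_error, n - 1),
        "F": ((ss_rows / df_rows) / ms_error, (ss_cols / df_cols) / ms_error),
        "row_means": row_means,
        "col_means": col_means,
    }

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
stores = ["A", "B", "C", "D", "E"]
politeness = [  # % of customers saying the staff is polite; rows = days, columns = stores
    [79, 81, 74, 77, 66],
    [78, 86, 89, 97, 86],
    [81, 87, 84, 94, 82],
    [80, 83, 81, 88, 83],
    [70, 74, 77, 89, 68],
]
result = two_way_anova(politeness)
f_rows, f_cols = result["F"]
best_day = days[result["row_means"].index(max(result["row_means"]))]
best_store = stores[result["col_means"].index(max(result["col_means"]))]
print(f"F(days) = {f_rows:.3f}, F(stores) = {f_cols:.3f}, F critical (from the table) = 3.007")
print(f"highest mean politeness: {best_day}, store {best_store}")
```

It gives F values of about 8.737 for days and 6.535 for stores and picks out Tuesday and Store D as the highest means.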
Interpretation of the results:
Rows are the days and columns are the stores. The computed F value in both cases is greater than the critical F, so reject the null hypothesis of equality of means in both cases. The conclusion is that the stores (columns) as well as the days (rows) reveal different patterns in politeness level. The highest politeness level is seen on Tuesday, and Store D shows the highest politeness level among the stores. ...
This note was uploaded on 02/24/2012 for the course BUSINESS 281 taught by Professor Gray during the Spring '12 term at Florida State College.