Bstat9 - Lessons in Business Statistics Prepared By P.K....

Lessons in Business Statistics
Prepared By P.K. Viswanathan

Chapter 9: Chi-Square Test and Analysis of Variance (ANOVA)

Introduction

In the previous chapter, we made inferences about the difference between two population means based on the corresponding sample means. When we are interested in testing the equality of means for more than two populations, we have an elegant technique known as ANOVA, developed by Ronald Fisher, the father of statistics, around 1920. The specialty of ANOVA is that it is part of the domain called "Experimental Design", which deals with cause-effect relationships in an effective manner. Cause-effect relationships are also reflected in the association of attributes, and the question of whether attributes are associated is answered by the chi-square test. This chapter covers the basic models of the chi-square test and ANOVA.

1) Chi-Square Analysis-Basics

Chi-square analysis is widely used in research studies for testing hypotheses involving nominal data. Nominal data are also known by two other names: categorical data and attribute data. The symbol χ² is used in statistics to designate the chi-square distribution, whose shape depends on the number of degrees of freedom (d.f.). A chi-square distribution is skewed, particularly for smaller d.f. As the d.f. increase, the χ² distribution becomes more symmetrical and approaches normality.

[Figure: general shape of the χ² distribution for smaller d.f.]

The χ² test is a nonparametric test. Nonparametric means that no assumption needs to be made about the form of the original probability distribution from which the samples are drawn. It is a classic nonparametric test involving data measured on a nominal scale. Please note that all parametric tests assume that the samples are drawn from a specified or assumed population distribution.
Thus, nonparametric methods are also called "distribution free" methods.

Conditions for Using Chi-Square Test

- The sample observations drawn from a population must be independent and random.
- The data must be in frequency (count) form. If the original data are in percentages, they must be converted into frequencies.
- No frequency (strictly, no expected frequency) in any cell/category should be less than 5. If the frequency is less than 5 for a category, regroup by combining categories.

2) Chi-Square Test-Goodness of Fit

Nominal Data: χ² Test of Goodness of Fit. This test is used to examine whether a set of observed frequencies comes from a universe that has a particular distribution (e.g. normal distribution). It can also be used to check whether an observed pattern of frequencies fits well with an expected pattern of frequencies.

Test Statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency and E = expected frequency.

Example: Assume that a marketer wishes to compare five different package designs. He is interested in knowing which is the most preferred one so that it can be introduced in the market. A random sample of 200 consumers gives the following picture:

Package Design             A     B     C     D     E     Total
Preference by Consumers    36    52    40    35    37    200

Do the consumer preferences for the designs show any significant differences?

Solution:
Null Hypothesis: All package designs are equally preferred.
Alternative Hypothesis: The package designs are not equally preferred.

Under the null hypothesis, the expected frequency for each design is 200/5 = 40.

Package Design    Observed (O)    Expected (E)    (O - E)²    (O - E)²/E
A                 36              40              16          0.400
B                 52              40              144         3.600
C                 40              40              0           0.000
D                 35              40              25          0.625
E                 37              40              9           0.225
Total             200             200                         4.850

The computed χ² is 4.850. The critical χ² for 4 d.f. (5 categories - 1) at the 5% level of significance is 9.49. Since the calculated value is less than the critical value at the 5% level, do not reject (accept) the null hypothesis of equal preference.
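The goodness-of-fit arithmetic for the package design example can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original slides: the variable names are illustrative, and the critical value 9.49 is copied from the text rather than computed from the χ² distribution.

```python
# Goodness-of-fit chi-square for the package design example.
# Observed counts are taken from the example; names are illustrative.
observed = {"A": 36, "B": 52, "C": 40, "D": 35, "E": 37}

total = sum(observed.values())        # 200 consumers in the sample
expected = total / len(observed)      # 40 per design if all are equally preferred

# Test statistic: sum of (O - E)^2 / E over the five designs.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
degrees_of_freedom = len(observed) - 1   # 5 categories - 1 = 4

# Assumption: critical value for 4 d.f. at the 5% level, as quoted in the text.
CRITICAL_VALUE = 9.49
reject_null = chi_square > CRITICAL_VALUE

print(f"chi-square = {chi_square:.3f}, d.f. = {degrees_of_freedom}")
print("reject H0" if reject_null else "do not reject H0")
```

If SciPy is available, `scipy.stats.chisquare` performs the same test and also returns a p-value, which avoids looking up the critical value in a table.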
The conclusion is that the sample gives no significant evidence against equal preference for the packages; the difference in preference seen in the sample survey may have arisen due to chance.

3) Chi-Square Test-Cross Tab

The goodness-of-fit test is suitable for situations involving one categorical variable (e.g. package design). If there are two categorical variables, and our interest is to find out whether these two variables are associated with each other, the test of independence is the appropriate technique to use. This test is very popular for analyzing cross-tabulations in which an investigator wants to find out whether two categorical variables are related to each other.

Example: In a market survey conducted to examine whether the choice of a brand is related to the income strata of the consumers, a random sample of 600 consumers reveals the following:

Income Strata (Income per month)    Brand1    Brand2    Brand3    Total
Less than Rs. 10000                 132       128       50        310
Rs. 10000-15000                     62        60        28        150
Rs. 15000-20000                     30        30        26        86
Above Rs. 20000                     16        22        16        54
Total                               240       240       120       600

The manager who conducted this survey wants to know whether brand preference is associated with income strata.

Solution: The null hypothesis is that there is no association between brand preference and income level (the two are independent). The alternative hypothesis is that brand and income level are associated (dependent). Let us take a level of significance of 5%. In order to calculate the χ² value, you need to work out the expected frequency in each cell of the contingency table. Under independence, the expected frequency of a cell is (row total × column total) / grand total; for example, the first cell is 310 × 240 / 600 = 124. In our example, there are 4 rows and 3 columns, amounting to 12 cells, so there will be 12 expected frequencies.
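The expected frequencies for the brand and income table, and the χ² statistic built from them, can be sketched as follows. This is an illustrative calculation under the usual independence rule (row total times column total divided by grand total); the variable names are assumptions and do not come from the original slides.

```python
# Expected frequencies and chi-square statistic for the brand x income table.
# Rows: income strata (lowest to highest); columns: Brand1, Brand2, Brand3.
observed = [
    [132, 128, 50],   # Less than Rs. 10000
    [62, 60, 28],     # Rs. 10000-15000
    [30, 30, 26],     # Rs. 15000-20000
    [16, 22, 16],     # Above Rs. 20000
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under independence: E = (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum (O - E)^2 / E over all 12 cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

for exp_row in expected:
    print([round(e, 1) for e in exp_row])
print(f"chi-square = {chi_square:.2f}, d.f. = {degrees_of_freedom}")
```

With SciPy installed, `scipy.stats.chi2_contingency` returns the same statistic together with the expected frequencies and a p-value.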
Observed Frequencies

Income Strata      Brand1    Brand2    Brand3
Less than 10000    132       128       50
10000 to 15000     62        60        28
15000 to 20000     30        30        26
Above 20000        16        22        16

Expected Frequencies

Income Strata      Brand1    Brand2    Brand3
Less than 10000    124       124       62
10000 to 15000     60        60        30
15000 to 20000     34.4      34.4      17.2
Above 20000        21.6      21.6      10.8

Compute

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

There are 12 observed frequencies (O) and 12 expected frequencies (E). As in the case of the goodness of fit, calculate this value. In our case, the computed χ² = 12.76. The degrees of freedom are (rows - 1) × (columns - 1) = 3 × 2 = 6, and the upper critical χ² value at the 5% level for 6 d.f. is 12.59. Since 12.76 exceeds 12.59, the null hypothesis is rejected. The conclusion is that brand preference and income level are associated.

4) ANOVA Basics

This technique is part of the domain called "Experimental Designs". It helps in establishing, in a precise fashion, the cause-effect relationship amongst variables.

The beauty of ANOVA is that it performs the test of equality of more than two population means by actually analyzing the variance. In simple terms, ANOVA decomposes the total variation into components of variation, and thereby explains the changes in the response variable caused by these components. To put it succinctly, the total sum of squares is equal to the sum of the sums of squares due to causes.

5) ANOVA-One Way Classification

A supermarket is interested in knowing whether it should go for a quarter-page, half-page, or full-page advertisement for a product. In order to choose the size of advertisement that will bring in the most store traffic, the supermarket can use the ANOVA technique. Here, you are trying to establish a cause-effect relationship between store traffic and the various sizes of advertisement.

How One-Way Classification Works in Practice

You first decompose the total sum of squares into sums of squares due to causes.
Here you are assuming that Total Sum of Squares = Treatment Sum of Squares + Error Sum of Squares. The word treatment is generic and may denote different methods, machines, advertisement copy platforms, strategies, brands and the like. The variation in the response variable (dependent variable) is taken to be caused only by the treatment, and anything unexplained by the treatment is attributed to the error term.

Example: A consumer marketing group wanted to examine whether supermarket chains operating in a city differed in their "out of stock" levels for advertised specials. The group identified the relevant response variable as the percentage of the advertised items not in stock. The following table provides the data collected from three supermarket chains in the city.

Chain1    Chain2    Chain3
15        10        17
14        14        12
20        9         14
15        10        15
16        11        12

The marketing group would like to know whether there are significant differences among the three chains with regard to the mean percentage out of stock on advertised specials. How would you analyze this situation?

Solution: Using Microsoft Excel or the formula method, the following ANOVA table is obtained.

Source of Variation           SS       df    MS       F computed    F critical
Treatment (Between Groups)    68.8     2     34.40    7.53          3.89
Error (Within Groups)         54.8     12    4.57
Total                         123.6    14

Formulation of the null and alternative hypotheses:
H0: The population means of percentage stock out for all three chains are equal.
H1: The population means of percentage stock out for the three chains are not all equal.

Decision Rule: If the computed F is greater than the critical F, reject the null hypothesis H0 and accept the alternative H1. At the 5% level, from the ANOVA output of Excel, we have the computed F = 7.53 and the critical F(2,12) = 3.89.
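The sums of squares, mean squares and F ratio for the supermarket chain data can be checked by hand with a short Python sketch. It follows the decomposition Total SS = Treatment SS + Error SS described above; the variable names are illustrative, and the critical F of 3.89 is copied from the text rather than computed.

```python
# One-way ANOVA by hand for the "out of stock" data (three chains, five
# observations each). Names are illustrative, not from the original slides.
chains = {
    "Chain1": [15, 14, 20, 15, 16],
    "Chain2": [10, 14, 9, 10, 11],
    "Chain3": [17, 12, 14, 15, 12],
}

all_values = [x for values in chains.values() for x in values]
grand_mean = sum(all_values) / len(all_values)
group_means = {name: sum(v) / len(v) for name, v in chains.items()}

# Treatment (between-group) SS: group size times squared deviation of each
# group mean from the grand mean.
ss_treatment = sum(
    len(v) * (group_means[name] - grand_mean) ** 2 for name, v in chains.items()
)
# Error (within-group) SS: squared deviations from each group's own mean.
ss_error = sum(
    (x - group_means[name]) ** 2 for name, v in chains.items() for x in v
)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

df_treatment = len(chains) - 1                # 3 - 1 = 2
df_error = len(all_values) - len(chains)      # 15 - 3 = 12
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error
f_computed = ms_treatment / ms_error

# Assumption: critical F(2, 12) at the 5% level, as quoted in the text.
F_CRITICAL = 3.89

print(f"SS treatment = {ss_treatment:.1f}, SS error = {ss_error:.1f}, "
      f"SS total = {ss_total:.1f}")
print(f"F = {f_computed:.2f}; reject H0: {f_computed > F_CRITICAL}")
```

With SciPy installed, `scipy.stats.f_oneway` applied to the three lists gives the same F ratio along with a p-value.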
So, reject the null hypothesis and accept the alternative. The inference is that the population means of percentage stock out are not the same for all three chains. So, what do you do? Now, look at the point estimates from the summary table. Chain 1 has a mean stock out of 16%, chain 2 has a mean stock out of 10.8% and chain 3 has a mean stock out of 14%. Chain 2 has the lowest stock out percentage, followed by chain 3 and then chain 1.

Assumptions involved in using ANOVA

- The samples drawn from the different populations are independent and random. In our case, the samples are independently and randomly drawn from the three supermarket chains.
- The response variables of all the populations are normally distributed. In our example, the response variable, namely the percentage stock out, is assumed to be normally distributed.
- The variances of all the populations are equal. In our example, the variances of the three chains are assumed to be equal.

6) ANOVA-Two Way Classification

Example: A supermarket that has a chain of stores is concerned about its service quality reputation as perceived by its customers. The table below shows the perceived service quality with regard to politeness of the staff. The number in each cell of the table is the percentage of people who said that the staff is polite. Perform the two-way ANOVA and draw your inferences about the population means of politeness corresponding to the days as well as the stores.

             Store
Day          A     B     C     D     E
Monday       79    81    74    77    66
Tuesday      78    86    89    97    86
Wednesday    81    87    84    94    82
Thursday     80    83    81    88    83
Friday       70    74    77    89    68

Source of Variation    SS         df    MS        F           P-value     F crit
Rows                   617.36     4     154.34    8.737051    0.000614    3.006917
Columns                461.76     4     115.44    6.534956    0.002575    3.006917
Error                  282.64     16    17.665
Total                  1361.76    24

Interpretation of the results: Rows are the days and columns are the stores.
The computed F value is greater than the critical F in both cases, so reject the null hypothesis of equality of means in both cases. The conclusion is that the mean politeness level differs across the days (rows) as well as across the stores (columns). The highest politeness level is observed on Tuesday (mean 87.2%), and Store D shows the highest politeness level among the stores (mean 89.0%). ...
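The two-way ANOVA table for the politeness data can be reproduced with the sketch below. It assumes the layout stated in the interpretation (rows are days, columns are stores) with one observation per cell, so the error sum of squares is what remains after removing the row and column effects. Variable names are illustrative and not from the original slides.

```python
# Two-way ANOVA without replication for the politeness data.
# Assumption: rows are days, columns are stores A-E, one value per cell.
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
stores = ["A", "B", "C", "D", "E"]
scores = [
    [79, 81, 74, 77, 66],   # Monday
    [78, 86, 89, 97, 86],   # Tuesday
    [81, 87, 84, 94, 82],   # Wednesday
    [80, 83, 81, 88, 83],   # Thursday
    [70, 74, 77, 89, 68],   # Friday
]

n_rows, n_cols = len(scores), len(scores[0])
grand_mean = sum(sum(row) for row in scores) / (n_rows * n_cols)
row_means = [sum(row) / n_cols for row in scores]
col_means = [sum(col) / n_rows for col in zip(*scores)]

# Sums of squares for days (rows), stores (columns), total and error.
ss_rows = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)
ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
ss_error = ss_total - ss_rows - ss_cols

df_rows, df_cols = n_rows - 1, n_cols - 1
df_error = df_rows * df_cols
ms_error = ss_error / df_error
f_rows = (ss_rows / df_rows) / ms_error
f_cols = (ss_cols / df_cols) / ms_error

# Day and store with the highest mean politeness percentage.
best_day = days[row_means.index(max(row_means))]
best_store = stores[col_means.index(max(col_means))]

print(f"SS rows = {ss_rows:.2f}, SS columns = {ss_cols:.2f}, "
      f"SS error = {ss_error:.2f}, SS total = {ss_total:.2f}")
print(f"F rows = {f_rows:.3f}, F columns = {f_cols:.3f}")
print(f"highest day: {best_day}, highest store: {best_store}")
```

Both F ratios can then be compared with the critical value 3.006917 reported in the ANOVA table for (4, 16) d.f.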

This note was uploaded on 02/24/2012 for the course BUSINESS 281 taught by Professor Gray during the Spring '12 term at Florida State College.
