The Chi Square Statistic

Types of Data: There are two types of random variables, and they yield two types of data: numerical and categorical. A chi square (χ²) statistic is used to investigate whether distributions of categorical variables differ from one another. Categorical variables yield data in categories, and numerical variables yield data in numerical form. Responses to such questions as "What is your major?" or "Do you own a car?" are categorical because they yield data such as "biology" or "no." In contrast, responses to such questions as "How tall are you?" or "What is your G.P.A.?" are numerical. Numerical data can be either discrete or continuous. The table below may help you see the differences between these two variables.

Data Type     Question Type                          Possible Responses
Categorical   What is your sex?                      male or female
Numerical     Discrete - How many cars do you own?   two or three
Numerical     Continuous - How tall are you?         72 inches

Notice that discrete data arise from a counting process, while continuous data arise from a measuring process.

The Chi Square statistic compares the tallies or counts of categorical responses between two (or more) independent groups. (Note: Chi square tests can only be used on actual counts, not on percentages, proportions, means, etc.)

2 x 2 Contingency Table

There are several types of chi square tests, depending on the way the data were collected and the hypothesis being tested. We'll begin with the simplest case: a 2 x 2 contingency table. Using the letters a, b, c, and d to denote the contents of the cells, the general notation for a 2 x 2 table is shown in Table 1.

Table 1. General notation for a 2 x 2 contingency table.
              Variable 1
Variable 2    Data type 1   Data type 2   Totals
Category 1    a             b             a + b
Category 2    c             d             c + d
Total         a + c         b + d         a + b + c + d = N

For a 2 x 2 contingency table the Chi Square statistic is calculated by the formula:

$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}$$

Note: the four components of the denominator are the four totals from the table columns and rows.

Suppose you conducted a drug trial on a group of animals and you hypothesized that the animals receiving the drug would survive better than those that did not receive the drug. You conduct the study and collect the following data:

H0: The survival of the animals is independent of drug treatment.
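The 2 x 2 calculation described above can be sketched in a few lines of Python. This is only a minimal illustration: it assumes the standard shortcut form of the statistic, in which N(ad - bc) squared is divided by the product of the four row and column totals, and the function name chi_square_2x2 and the counts passed to it are hypothetical, chosen for illustration rather than taken from the drug trial.

```python
def chi_square_2x2(a, b, c, d):
    """Chi square statistic for a 2 x 2 table of counts.

    Cells follow the layout of Table 1:
        row 1: a, b
        row 2: c, d
    Inputs must be raw counts, not percentages or proportions.
    """
    n = a + b + c + d
    # The denominator is the product of the two row totals and two column totals.
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be greater than zero")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts for illustration only (not the study's data).
statistic = chi_square_2x2(10, 20, 30, 40)
print(round(statistic, 3))
```

The function guards against a zero row or column total, since the formula is undefined in that case. A table whose rows have identical proportions, such as 10, 10, 10, 10, gives a statistic of zero.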
