. Now we need to look at the table of the $\chi^2$-distribution on page 594 with df = 5 and try to find 5.22 on that line. Note that it is not on that line. However, we also note that $P(\chi^2_{(5)} \ge 6.63) = 0.25$. Because 5.22 < 6.63, a simple graph tells us that p-value $= P(\chi^2_{(5)} \ge 5.22) > 0.25$.

STA 6126 Chap 8, Page 3 of 20
5. Decision: Do not reject Ho, since the p-value exceeds any reasonable α.
6. Conclusion: The observed data do not provide sufficient evidence that the die is loaded.
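The table lookup for the p-value can be checked numerically. The following is a minimal sketch in Python using only the standard library. The helper name `chi2_sf` is an illustrative assumption, not something from these notes, and it handles integer degrees of freedom only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2(df) >= x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)                 # Q(1, y) = exp(-y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))      # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

p_table = chi2_sf(6.63, 5)   # table entry, expected to be close to 0.25
p_value = chi2_sf(5.22, 5)   # p-value for the observed statistic 5.22
print(f"P(chi2(5) >= 6.63) = {p_table:.3f}")
print(f"P(chi2(5) >= 5.22) = {p_value:.3f}")
```

The second probability comes out at roughly 0.39, which agrees with the bound p-value > 0.25 read from the table.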
B) Test of Homogeneity
Observe that in Section 7.2 we had two populations, two random samples from these populations
and a categorical random variable with only two categories.
                 Belief in Afterlife
Gender      Yes      No or Undecided      Total
Female      435      147                  582
Male        375      134                  509
Total       810      281                  1091
We concluded that there is no significant difference between males and females in their belief in an afterlife. Hence we say that the two populations are homogeneous with respect to this belief. Such a test is known as the test of homogeneity.
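The two-population comparison can be sketched in a few lines of Python. This assumes that the Section 7.2 analysis was the usual pooled two-proportion z-test (that section is not reproduced here); the variable names are illustrative.

```python
import math

# Counts from the Gender by Belief in Afterlife table
yes_f, n_f = 435, 582   # females answering "Yes", females sampled
yes_m, n_m = 375, 509   # males answering "Yes", males sampled

p_f = yes_f / n_f                          # sample proportion, females
p_m = yes_m / n_m                          # sample proportion, males
p_pooled = (yes_f + yes_m) / (n_f + n_m)   # pooled sample proportion

# Standard error of the difference under Ho (equal proportions)
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))
z = (p_f - p_m) / se

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"pooled proportion = {p_pooled:.4f}")
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```

With these counts z is about 0.40 and the two-sided p-value is about 0.69, consistent with the statement that the difference is not significant.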
In this section we extend the above ideas to the case where the categorical variable has two or more categories (say c ≥ 2) and the number of populations is two or more (say r ≥ 2).
We summarize the sample data in an r by c (denoted r×c) contingency table, i.e., a table with r rows (one per sample) and c columns (one per category).
                        Categories
Samples     1        2        …        c        Total
1           O_11     O_12     …        O_1c     n_1.
2           O_21     O_22     …        O_2c     n_2.
.           .        .        …        .        .
.           .        .        …        .        .
.           .        .        …        .        .
r           O_r1     O_r2     …        O_rc     n_r.
Total       n_.1     n_.2     …        n_.c     n_..
We test the hypothesis that the populations are homogeneous with respect to the (categorical) variable of interest.

The basic idea of obtaining a “pooled sample proportion” in the two-population, two-category problem (data summarized in a 2×2 contingency table as above) carries over to the general case of an r-population, c-category problem (data summarized in an r×c contingency table).
If the assumption of homogeneity (Ho) is true, then $\pi_{ij} = \pi_j$ for every population i, so we need to estimate only one parameter ($\pi_j$) for the proportion in each category j, and it applies to all of the populations. The parameter $\pi_j$ is estimated by dividing the sample total for category j by the total sample size ($\hat{\pi}_j = n_{\cdot j} / n_{\cdot\cdot}$).
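Then, based on these estimates, we calculate the expected number of observations in each category of each sample (i.e., for each cell in the table):

$$E_{ij} = n_{i\cdot}\,\hat{\pi}_j = n_{i\cdot}\,\frac{n_{\cdot j}}{n_{\cdot\cdot}} = \frac{n_{i\cdot}\,n_{\cdot j}}{n_{\cdot\cdot}} = \frac{\text{Row total} \times \text{Column total}}{\text{Grand total}}$$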
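As a worked illustration, the expected-count formula can be applied to the Gender by Belief in Afterlife table from the start of this section (row totals 582 and 509, column totals 810 and 281, grand total 1091). The values below are rounded.

```latex
\begin{align*}
\hat{\pi}_1 &= \frac{810}{1091} \approx 0.742, &
\hat{\pi}_2 &= \frac{281}{1091} \approx 0.258, \\
E_{11} &= \frac{582 \times 810}{1091} \approx 432.10, &
E_{12} &= \frac{582 \times 281}{1091} \approx 149.90, \\
E_{21} &= \frac{509 \times 810}{1091} \approx 377.90, &
E_{22} &= \frac{509 \times 281}{1091} \approx 131.10.
\end{align*}
```

Each row of expected counts adds up to its row total and each column to its column total, which is a quick check on the arithmetic.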
Next, we compare the observed values ($O_{ij}$) with the expected values ($E_{ij}$) in each cell of the r×c contingency table using the following test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(df)}$$
If the hypothesis of homogeneity is true, we expect the calculated value of the test statistic ($\chi^2_{cal}$) to be small. Large values of $\chi^2_{cal}$ lead to the rejection of Ho. How large depends on the degrees of freedom and α: we reject Ho when p-value $= P(\chi^2_{(df)} \ge \chi^2_{cal}) \le$ α.
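The whole procedure can be sketched in Python for the Gender by Belief in Afterlife table. Two points are assumptions rather than statements from this excerpt: the degrees of freedom are taken as the standard (r − 1)(c − 1) for an r×c table, and the p-value line uses a shortcut that is valid only when df = 1. Variable names are illustrative.

```python
import math

# Observed counts: rows are samples (Female, Male),
# columns are categories (Yes, No or Undecided)
observed = [
    [435, 147],
    [375, 134],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over all cells of (O - E)^2 / E
chi2_cal = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Assumption: standard degrees of freedom for an r x c table
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Valid for df = 1 only: P(chi2(1) >= x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2_cal / 2))

print(f"chi2_cal = {chi2_cal:.3f}, df = {df}, p-value = {p_value:.3f}")
```

Here the statistic is about 0.16 with a p-value near 0.69, so Ho is not rejected at any reasonable α, in line with the earlier finding of no significant difference between males and females.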
In such problems the variable of interest is called the response (or dependent) variable, and the code for the populations is called the predictor (or independent) variable.