STAT503 — Fall 2008
Lecture Notes: Chapter 10
Chapter 10: Analysis of Categorical Data
April 6, 2009
Our observations fall into categories instead of being continuous variables. We count the number of observations falling into each category. As usual, we assume that the sample points are independent.
If there are only two categories,
• the number of observations in one category has a binomial distribution.
If there are more than two categories,
• we can focus on one category and group the others together (still binomial),
• or we can define probabilities for all categories ($p_1, p_2, \ldots$).
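As a minimal sketch of the "focus on one category" option (not part of the original notes), the Python snippet below groups several categories into "one category versus the rest" and evaluates a binomial probability. The category names, the counts, and the probability 0.25 are made-up illustration values.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical counts for a variable with four categories (illustration only).
counts = {"A": 3, "B": 5, "C": 2, "D": 10}

n = sum(counts.values())          # total sample size
x = counts["A"]                   # observations in the category of interest
others = n - x                    # all remaining categories grouped together

# Assumed (hypothetical) probability of category A.
p_A = 0.25
prob = binomial_pmf(x, n, p_A)
print(f"n = {n}, in A = {x}, not in A = {others}, P(X = {x}) = {prob:.4f}")
```

Once the other categories are grouped, only the pair (in A, not in A) matters, which is why the count in A can be treated as binomial.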
We will use a new distribution called a $\chi^2$ distribution (chi-squared).
• It is another cousin to the Normal(0, 1) distribution.
• Definition: If $Z_1, Z_2, \ldots, Z_k$ are independent Normal(0, 1) random variables, then $\sum_{i=1}^{k} Z_i^2$ has a $\chi^2_k$ distribution (a chi-squared distribution with $k$ degrees of freedom).
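The definition can be checked informally by simulation. The sketch below (an illustration, not part of the original notes) draws sums of $k$ squared Normal(0, 1) values using only the Python standard library. The seed, the choice $k = 3$, and the number of draws are arbitrary assumptions. It relies on the standard fact that a $\chi^2_k$ variable has mean $k$.

```python
import random

def chi_squared_draw(k, rng):
    """One draw of Z_1^2 + ... + Z_k^2 with independent Normal(0, 1) Z_i."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(503)   # fixed seed so the run is reproducible
k = 3                      # degrees of freedom (arbitrary illustration value)
n_draws = 20000

draws = [chi_squared_draw(k, rng) for _ in range(n_draws)]
sample_mean = sum(draws) / n_draws

# A chi-squared variable with k degrees of freedom has mean k,
# so the sample mean should be close to k.
print(f"k = {k}, sample mean = {sample_mean:.3f}")
```

A histogram of `draws` would show the right-skewed shape of the $\chi^2$ curve, with all values nonnegative.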
There are several different tests which use the $\chi^2$ distribution to determine critical values. They differ in their setup just like there are many different kinds of $t$ tests.
[Draw $\chi^2$ curve.]
10.1 The $\chi^2$ Goodness of Fit Test
In this section we consider testing whether the observed frequencies for a categorical variable are compatible with a null hypothesis that specifies the probabilities of the categories. Thus, we study whether the data seem to fit the hypothetical distribution.
For example, the question “Is this a fair coin?” may be answered by this method.
• Description of such a test for categorical data, based on a random sample of size $n$:
  • Need hypothesized values for the population proportions $p_i$ for each category. These are specified in or implied by the given problem.
Chapter10.tex; Last Modified: April 6, 2009 (W. Sharabati)
  • We calculate the expected number of observations in each category under $H_0$. We will use the formula $n p_i$ (number of observations times the population proportion).
  • The test is only approximate and works when the sample size is large. The expected number in each category should be at least 5.
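The expected-count step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the notes' own code: the function name `expected_counts` is hypothetical, and the data (100 tosses of a coin that is fair under the null hypothesis) are made up to match the fair-coin question mentioned earlier.

```python
def expected_counts(n, probs):
    """Expected count n * p_i for each category under H0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n * p for p in probs]

# Hypothetical example: 100 tosses of a coin, H0: the coin is fair.
n = 100
probs = [0.5, 0.5]

expected = expected_counts(n, probs)
large_enough = all(e >= 5 for e in expected)   # rule of thumb from the notes
print(expected, large_enough)
```

Checking `large_enough` before running the test is a simple guard for the "at least 5" condition.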
The Test Statistic
The test statistic is computed as follows:
  • $X_s^2 = \sum_{i=1}^{k} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \sum_{i=1}^{k} \frac{(O - E)^2}{E}$.
  • Under the null hypothesis, $X_s^2$ has approximately a $\chi^2_{k-1}$ distribution.
  • Table 9 gives critical values for the $\chi^2_k$ distribution.
  • Rejection rule: Large values of $X_s^2$ lead to the rejection of $H_0$.
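To make the statistic and the rejection rule concrete, here is a minimal Python sketch for the fair-coin question with two categories, so $k - 1 = 1$ degree of freedom. The data (60 heads, 40 tails) and the 0.05 level are made-up illustration values, and the function names are hypothetical. The p-value helper works only for 1 degree of freedom: in that case the $\chi^2$ variable is $Z^2$ for a single Normal(0, 1) variable $Z$, so the tail probability can be written with the standard library's `math.erfc`. For more degrees of freedom one would use a table of critical values or a statistics library.

```python
from math import erfc, sqrt

def chi_squared_statistic(observed, expected):
    """Sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(x):
    """P(chi-squared with 1 df > x); valid for 1 degree of freedom only.

    With 1 df the variable is Z^2, so P(Z^2 > x) = P(|Z| > sqrt(x)).
    """
    return erfc(sqrt(x / 2.0))

# Hypothetical data: 100 tosses, 60 heads and 40 tails; H0: fair coin.
observed = [60, 40]
expected = [50.0, 50.0]          # n * p_i with n = 100, p_i = 0.5

x2 = chi_squared_statistic(observed, expected)
p = p_value_df1(x2)              # k - 1 = 1 degree of freedom
reject = p < 0.05                # 0.05 level chosen for illustration
print(f"X^2 = {x2:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```

Here each category contributes $(10)^2 / 50 = 2$, so the statistic is 4, which is larger than 3.84, the 0.05 critical value of the $\chi^2$ distribution with 1 degree of freedom.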
Example
In the sweet pea, the allele for purple flower color (P) is dominant to the allele for red flowers (p), and the allele for long pollen grains (L) is dominant to the allele for round pollen grains (l).
The first group (of grandparents) will be homozygous for the dominant alleles (PPLL) and the second group (of grandparents) will be homozygous for the recessive alleles (ppll).