nt whether or not
the unit is on fire. Statistically speaking, we are interested in whether these two random
variables are independent of each other. More specifically, we want to see if
P(X = x, Y = y) = P(X = x)P(Y = y). This data has been categorized into
four bins: males on fire, males not on fire, females on fire, and females not on
fire. Expressed more generally we have:

          Y = 0    Y = 1    Total
X = 0       a        c      a + c
X = 1       b        d      b + d
Total     a + b    c + d    a + b + c + d = n

If we are interested in the expected number of males on fire given a sample
of size n then we would be interested in n P(X = 0) P(Y = 0) assuming that
we have independence. This becomes

\[ n \cdot \frac{a+b}{n} \cdot \frac{a+c}{n} = \frac{(a+b)(a+c)}{n} \]

This can be done for the remaining three boxes, which will yield a chart of the
expected values for each bin.

          Y = 0                    Y = 1                    Total
X = 0     (a+b)(a+c)/n = 75.33     (c+d)(c+a)/n = 95.68     193
X = 1     (b+a)(b+d)/n = 61.67     (d+b)(d+c)/n = 78.32     158
Total     137                      174                      351

We now have tables detailing the expected bins and the observed. The test
statistic for the general Pearson Test with k bins is

\[ D = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} \]

for i representing the bins. This is the squared difference between the
observed and the theoretical count in each bin, divided by the theoretical
number of items in that bin, and summed over all bins. D does not follow a
χ² distribution exactly; however, it is asymptotically χ². We need to determine the
degrees of freedom for this statistic.
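
As a rough illustration of the two steps above (expected counts under independence, then the statistic D), here is a minimal Python sketch. The function names and the observed counts are hypothetical, chosen only so the arithmetic is easy to follow; they are not the counts from the example in these notes.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_statistic(table):
    """Pearson's D: sum over all bins of (o_i - e_i)^2 / e_i."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts (rows: X = 0, X = 1; columns: Y = 0, Y = 1).
# These are illustrative only and are NOT the data from the notes.
observed = [[30, 20],
            [10, 40]]

expected = expected_counts(observed)
d_stat = pearson_statistic(observed)
print(expected)
print(d_stat)
```

Each expected cell is the product of its row and column totals divided by n, which is the same (a + b)(a + c)/n pattern derived above, so the sketch works for any table of counts with r rows and c columns.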
The majority of the time we will be interested in a χ² Test on a single
random variable model. This particular model was selected to help with multivariate categorical data, and here the degrees of freedom are (r - 1)(c - 1),
where r is the number of rows and c is the number of columns. In the example
there is one degree of freedom. However, if there were a third option where units
could be “sort of on fire” then the example would have (3 - 1)(2 - 1) = 2 degrees of
freedom.
Using a calculator, we get an approximate test statistic of t = 14.03 on 1
degree of freedom. Now following the math:

\[
\begin{aligned}
P(T > t) &= 1 - P(T \le t) \\
         &= 1 - P(T \le 14.03) \\
         &< 1 - 0.999 \\
         &= 0.001
\end{aligned}
\]

From Chapter Two we know that this is grounds for rejecting our hypothesis.
Therefore, we have very strong evidence against our hypothesis and we reject it
in favour of the alternative hypothesis: being on fire and sex are not independent
of each other.
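
The tail probability for t = 14.03 can also be checked numerically. The sketch below is one possible way to do it with only the Python standard library; it assumes exactly 1 degree of freedom, where a χ² variable is the square of a standard normal Z, so P(T > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)). The function name `chi2_sf_1df` is a hypothetical helper, not something from the notes.

```python
import math


def chi2_sf_1df(t):
    """P(T > t) for T ~ chi-squared with 1 degree of freedom.

    Relies on T = Z^2 with Z standard normal, which gives
    P(T > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    Valid for 1 degree of freedom only.
    """
    if t < 0:
        return 1.0
    return math.erfc(math.sqrt(t / 2.0))


p_value = chi2_sf_1df(14.03)
print(p_value)
```

The printed value falls below 0.001, in line with the bound obtained from the table lookup. For more than one degree of freedom this shortcut no longer applies and a general χ² routine would be needed.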
The intricacies of using the χ² Test may not be apparent from the previous
example. So consider the following set of data:

 47   199   120    59   204
128   408    47    48    79
 66    61    25    91   217

This data comes from some unknown distribution and we would like to determine an empirical distribution function. All that is known is that the data
is discrete.
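
For reference, an empirical distribution function for these 15 observations could be computed along the following lines. This is a minimal sketch using the usual definition, F̂(x) = (number of observations ≤ x) / n; the helper name `ecdf` is an assumption made for illustration.

```python
from bisect import bisect_right

# The 15 observations listed above.
data = [47, 128, 66, 199, 408, 61, 120, 47, 25, 59, 48, 91, 204, 79, 217]


def ecdf(sample):
    """Return F_hat, where F_hat(x) = (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def f_hat(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return f_hat


f_hat = ecdf(data)
for x in (25, 100, 408):
    print(x, f_hat(x))
```

Evaluating `f_hat` on a grid of x values gives the step function that can then be compared by eye with a candidate theoretical CDF.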
If you take a gander at the general shape of a geometric distribution,
then you will see that the empirical CDF looks like the theoretical CDF of
a geometric. To start with an estimated distribution we need to determine
parameters. From the review of MLEs:

\[ \hat{p} = \left( \frac{1}{15} \sum_{i=1}^{15} x_i \right)^{-1} \approx 0.00833 \]

Therefore, we hypothesize that the data follows a Geo(0.00...
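
The estimate of p can be reproduced with a short sketch. It assumes the geometric distribution is parametrized on {1, 2, ...} (number of trials up to and including the first success), for which the MLE is the reciprocal of the sample mean, matching the formula above. The function names are hypothetical.

```python
# The 15 observations from the data set above.
data = [47, 128, 66, 199, 408, 61, 120, 47, 25, 59, 48, 91, 204, 79, 217]


def geometric_mle(sample):
    """MLE of p for a geometric distribution on {1, 2, ...}: 1 / sample mean."""
    return len(sample) / sum(sample)


def geometric_cdf(x, p):
    """P(X <= x) = 1 - (1 - p)^floor(x) for the geometric on {1, 2, ...}."""
    if x < 1:
        return 0.0
    return 1.0 - (1.0 - p) ** int(x)


p_hat = geometric_mle(data)
print(p_hat)
print(geometric_cdf(100, p_hat))
```

Under a different parametrization (counting failures before the first success, starting at 0) both the MLE and the CDF would change, so the convention should be checked against the one used in the MLE review.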
Winter '12, RILEY