s340text-KStestOnly

# The Y values represent whether or not the unit is on fire



The $Y$ values represent whether or not the unit is on fire. Statistically speaking, we are interested in whether these two random variables are independent of each other. More specifically, we want to see whether

$$P(X = x, Y = y) = P(X = x)\,P(Y = y)$$

for all values $x$ and $y$.

This data has been categorized into four bins: males on fire, males not on fire, females on fire, and females not on fire. Expressed more generally, the observed counts are:

| | $Y=0$ | $Y=1$ | Total |
|---|---|---|---|
| $X=0$ | $a$ | $c$ | $a+c$ |
| $X=1$ | $b$ | $d$ | $b+d$ |
| Total | $a+b$ | $c+d$ | $a+b+c+d=n$ |

Suppose we want the expected number of males on fire in a sample of size $n$. Assuming independence, this is $n\,P(X=0)\,P(Y=0)$. Estimating each probability by the corresponding observed proportion, this becomes

$$n \cdot \frac{a+c}{n} \cdot \frac{a+b}{n} = \frac{(a+b)(a+c)}{n}.$$

The same calculation for the remaining three bins yields a table of expected counts:

| | $Y=0$ | $Y=1$ |
|---|---|---|
| $X=0$ | $\frac{(a+b)(a+c)}{n}$ | $\frac{(c+d)(c+a)}{n}$ |
| $X=1$ | $\frac{(b+a)(b+d)}{n}$ | $\frac{(d+b)(d+c)}{n}$ |

For the data in this example, the expected counts are:

| | $Y=0$ | $Y=1$ | Total |
|---|---|---|---|
| $X=0$ | 75.33 | 95.68 | 193 |
| $X=1$ | 61.67 | 78.32 | 158 |
| Total | 137 | 174 | 351 |

We now have tables of both the expected and the observed counts in each bin. The test statistic for the general Pearson test with $k$ bins is

$$D = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},$$

where $o_i$ and $e_i$ are the observed and expected counts in bin $i$. This is the sum, over all bins, of the squared difference between the observed and expected counts, each divided by the expected count for that bin. $D$ does not follow a $\chi^2$ distribution exactly; however, it is asymptotically $\chi^2$.

We also need the degrees of freedom for this statistic. Most of the time we will use the $\chi^2$ test on a model for a single random variable. This example was chosen to illustrate multivariate categorical data, for which the degrees of freedom are $(r-1)(c-1)$, where $r$ is the number of rows and $c$ is the number of columns. In this example there is $(2-1)(2-1) = 1$ degree of freedom. If there were a third option where units could be “sort of on fire”, $Y$ would have three columns and the example would have $(2-1)(3-1) = 2$ degrees of freedom.

Using a calculator, we get a test statistic of approximately $t = 14.03$ on 1 degree of freedom.
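The expected-count table, Pearson's $D$, and the degrees of freedom can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the course's own code. The helper names (`expected_counts`, `pearson_statistic`, `degrees_of_freedom`, `chi2_sf_1df`) are hypothetical, and the observed counts are made up purely for illustration. The table follows the layout of the general observed table, with rows $X=0, X=1$ and columns $Y=0, Y=1$, so it reads `[[a, c], [b, d]]`. For the 1-degree-of-freedom p-value, the sketch relies on the fact that a $\chi^2_1$ variable is the square of a standard normal. That gives $P(T > t) = \operatorname{erfc}(\sqrt{t/2})$.

```python
import math


def expected_counts(observed):
    """Expected counts under independence: (row total) * (column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_statistic(observed, expected):
    """D = sum over all bins of (o_i - e_i)^2 / e_i."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(r - 1)(c - 1) for an r x c contingency table."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_1df(t):
    """P(T > t) for T ~ chi^2 on 1 df: T = Z^2, so P(|Z| > sqrt(t)) = erfc(sqrt(t / 2))."""
    return math.erfc(math.sqrt(t / 2))


# Hypothetical counts (not the fire data), laid out as [[a, c], [b, d]]:
# rows X = 0, X = 1; columns Y = 0, Y = 1.
observed = [[30, 10], [20, 40]]
expected = expected_counts(observed)
d = pearson_statistic(observed, expected)

print(expected)                                              # [[20.0, 20.0], [30.0, 30.0]]
print(f"D = {d:.2f} on {degrees_of_freedom(observed)} df")   # D = 16.67 on 1 df
print(chi2_sf_1df(14.03) < 0.001)                            # True for the text's t = 14.03
```

With these hypothetical counts, each expected cell matches the $(\text{row total})(\text{column total})/n$ formulas. For instance, $(a+b)(a+c)/n = 50 \cdot 40 / 100 = 20$. In practice, `scipy.stats.chi2_contingency` performs the same test. Note that by default (`correction=True`) it applies Yates' continuity correction when there is 1 degree of freedom, so its statistic for a 2 × 2 table will differ from the uncorrected $D$ above.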
Now, following the math:

$$P(T > t) = 1 - P(T \le t) = 1 - P(T \le 14.03) < 1 - 0.999 = 0.001,$$

since the 0.999 quantile of the $\chi^2_1$ distribution is about 10.83, which is less than 14.03. From Chapter Two we know that a p-value this small is grounds for rejecting our hypothesis. Therefore, we have very strong evidence against the hypothesis of independence. We reject it in favour of the alternative hypothesis: being on fire and sex are not independent of each other.

The intricacies of using the $\chi^2$ test may not be apparent from the previous example, so consider the following data set:

47, 128, 66, 199, 408, 61, 120, 47, 25, 59, 48, 91, 204, 79, 217

This data comes from some unknown distribution, and we would like to determine its distribution empirically. All that is known is that the data is discrete. If you compare the empirical CDF of this data with the general shape of a geometric distribution's CDF, you will see that they look alike. To start with an estimated distribution, we need to estimate its parameter. From the review of MLEs:

$$\hat{p} = \left(\frac{1}{15}\sum_{i=1}^{15} x_i\right)^{-1} = \frac{15}{1799} \approx 0.00833$$

Therefore, we hypothesize that the data follows a Geo(0.00...
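The fitting step can be sketched in standard-library Python as follows; this is not taken from the course. It computes $\hat p$ as the reciprocal of the sample mean. It then prints the empirical CDF next to the fitted geometric CDF at each distinct data value, which is the visual comparison described above. It assumes the geometric distribution counts trials up to and including the first success (support $\{1, 2, \dots\}$, CDF $1-(1-p)^x$), since that is the parameterization whose MLE matches the formula for $\hat p$. The function names are hypothetical.

```python
# The 15 discrete observations listed in the text.
data = [47, 128, 66, 199, 408, 61, 120, 47, 25, 59, 48, 91, 204, 79, 217]


def geometric_mle(xs):
    """MLE of p for Geo(p) on {1, 2, ...}: the reciprocal of the sample mean."""
    return len(xs) / sum(xs)


def geometric_cdf(x, p):
    """P(X <= x) = 1 - (1 - p)^x for Geo(p) on {1, 2, ...} (assumed parameterization)."""
    return 1 - (1 - p) ** x


def empirical_cdf(xs, x):
    """Proportion of observations less than or equal to x."""
    return sum(1 for v in xs if v <= x) / len(xs)


p_hat = geometric_mle(data)
print(f"p_hat = {p_hat:.6f}")  # p_hat = 0.008338, i.e. 15/1799

# Side-by-side comparison at each distinct observed value.
for x in sorted(set(data)):
    ecdf = empirical_cdf(data, x)
    fitted = geometric_cdf(x, p_hat)
    print(f"x = {x:3d}   empirical = {ecdf:.3f}   Geo(p_hat) = {fitted:.3f}")
```

Similar columns support the geometric hypothesis only informally. Deciding whether the remaining differences are too large to be chance is exactly what the $\chi^2$ test is for.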

## This note was uploaded on 09/27/2013 for the course STATS 340 taught by Professor Riley during the Winter '12 term at Waterloo.
