$Q = \frac{1}{5} \sum_{i=1}^{20} (N_i - 5)^2$, and under the null hypothesis, $Q$ will be approximately distributed as $\chi^2$ with 19 degrees of freedom.

Suppose that we want to test whether a random sample of observations comes from a particular distribution. Then the following procedure can be adopted:

(i) Partition the entire real line, or any particular interval that has probability 1, into a finite number $k$ of disjoint subintervals. Generally, $k$ is chosen so that the expected number of observations in each subinterval is at least 5 if $H_0$ is true.

(ii) Determine the probability $p_i^{(0)}$ that the particular hypothesized distribution would assign to the $i$-th subinterval, and calculate the expected number $n p_i^{(0)}$ of observations in the $i$-th subinterval, $i = 1, \ldots, k$.

(iii) Count the number $N_i$ of observations in the sample that fall within the $i$-th subinterval.

(iv) Calculate the value of $Q$ as defined in (1). If the hypothesized distribution is correct, then $Q$ will approximately follow a $\chi^2$ distribution with $k - 1$ degrees of freedom.
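Steps (i) to (iv) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: equation (1) is taken to be Pearson's statistic $Q = \sum_{i=1}^{k} (N_i - n p_i^{(0)})^2 / (n p_i^{(0)})$, which agrees with the special case above; the hypothesized distribution is Uniform(0, 1); and the two samples are made-up, deterministic data. The helper names `bin_counts` and `pearson_q` are hypothetical, and the cutoff 30.14 is the tabulated upper 5% point of $\chi^2$ with 19 degrees of freedom.

```python
from bisect import bisect_right


def bin_counts(sample, edges):
    """Step (iii): count observations in [edges[i], edges[i+1])."""
    k = len(edges) - 1
    counts = [0] * k
    for x in sample:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError("observation outside the partitioned interval")
        # The right endpoint of the whole interval goes into the last bin.
        i = min(bisect_right(edges, x) - 1, k - 1)
        counts[i] += 1
    return counts


def pearson_q(counts, probs):
    """Step (iv): Q = sum_i (N_i - n p_i)^2 / (n p_i), assumed form of (1)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


# Step (i): k = 20 equal subintervals of [0, 1]; with n = 100 the
# expected count per subinterval is 5 under H0.
k, n = 20, 100
edges = [i / k for i in range(k + 1)]

# Step (ii): probabilities under the hypothesized Uniform(0, 1) cdf F(x) = x.
probs = [edges[i + 1] - edges[i] for i in range(k)]

# Illustrative data (assumption): an evenly spread sample and a skewed one.
even_sample = [(j + 0.5) / n for j in range(n)]
skewed_sample = [x ** 2 for x in even_sample]

CRITICAL_19_DF = 30.14  # tabulated upper 5% point of chi-square, 19 df

q_even = pearson_q(bin_counts(even_sample, edges), probs)
q_skewed = pearson_q(bin_counts(skewed_sample, edges), probs)

for name, q in [("even", q_even), ("skewed", q_skewed)]:
    decision = "reject H0" if q > CRITICAL_19_DF else "do not reject H0"
    print(f"{name}: Q = {q:.3f}, {decision}")
```

The evenly spread sample puts exactly 5 observations in every subinterval, so its $Q$ is essentially zero, while squaring the observations piles them up near 0 and inflates $Q$ well past the cutoff.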
In order to apply Wilks' theorem (Theorem 9.1.4 in the book), the parameter space must be an open set in $k$-dimensional space. This is not true for the multinomial distribution if we take $p$ to be the parameter, because $\sum_{i=1}^{k} p_i = 1$: the set of probability vectors lies on a $(k - 1)$-dimensional subset of $\mathbb{R}^k$. However, we can effectively treat the vector $\theta = (p_1, \ldots, p_{k-1})$ as the parameter, since $p_k = 1 - p_1 - \cdots - p_{k-1}$ is a function of $\theta$. As long as we believe that all the coordinates of $p$ are strictly between 0 and 1, the set of possible values of the $(k - 1)$-dimensional parameter $\theta$ is open. Therefore, by Wilks' theorem, $-2 \log \Lambda(X)$ is approximately $\chi^2$ with $k - 1$ degrees of freedom.

Exercise: Suppose that $Y_1, \ldots, Y_n$ is a random sample from a population with density function given by

$$f(y \mid p) = \begin{cases} p_j & \text{if } y = j, \text{ where } j = 1, 2, 3, \\ 0 & \text{otherwise,} \end{cases}$$

where $p = (p_1, p_2, p_3)$ is the vector of parameters such that $p_1 + p_2 + p_3 = 1$ and $p_j \ge 0$ for $j = 1, 2, 3$. Use the likelihood ratio test for testing $H_0 : p_1 = p_2 = p_3$ versus $H_1$ : $H_0$ is not true. Use the level $\alpha = 0.05$.
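One possible way to carry out the computation in the exercise is sketched below; it is not a definitive solution. Write $N_j$ for the number of observations equal to $j$. The unrestricted maximum likelihood estimate is $\hat{p}_j = N_j / n$, and under $H_0$ each $p_j = 1/3$, so $-2 \log \Lambda = 2 \sum_j N_j \log\big(N_j / (n/3)\big)$, with terms where $N_j = 0$ contributing zero. With $k = 3$ the reference distribution is $\chi^2$ with 2 degrees of freedom, whose upper tail probability is $e^{-x/2}$, so no statistical library is needed. The function names and the counts are hypothetical, chosen only for illustration.

```python
import math


def multinomial_lrt_stat(counts, null_probs):
    """-2 log Lambda = 2 * sum_j N_j * log(N_j / (n * p0_j)).

    Categories with N_j = 0 contribute nothing (0 * log 0 taken as 0).
    """
    n = sum(counts)
    return 2.0 * sum(
        c * math.log(c / (n * p0))
        for c, p0 in zip(counts, null_probs)
        if c > 0
    )


def chi2_sf_2df(x):
    """Upper tail probability of chi-square with 2 df: exp(-x / 2)."""
    return math.exp(-x / 2.0)


ALPHA = 0.05
null_probs = [1 / 3, 1 / 3, 1 / 3]  # H0: p1 = p2 = p3

# Hypothetical counts (N_1, N_2, N_3), not data from the notes.
for counts in ([10, 10, 10], [20, 5, 5]):
    stat = multinomial_lrt_stat(counts, null_probs)
    p_value = chi2_sf_2df(stat)
    decision = "reject H0" if p_value < ALPHA else "do not reject H0"
    print(f"counts={counts}: -2 log Lambda = {stat:.4f}, "
          f"p-value = {p_value:.4g}, {decision}")
```

Equivalently, $H_0$ is rejected at level 0.05 when $-2 \log \Lambda$ exceeds $-2 \log(0.05) \approx 5.99$, the upper 5% point of $\chi^2$ with 2 degrees of freedom.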
## 1.3 Goodness-of-fit for composite hypothesis

We can extend the goodness-of-fit test to deal with the case in which the null hypothesis is that the distribution of our data belongs to a particular parametric family. The alternative hypothesis is that the data have a distribution that is not a member of that parametric family. Thus, in the statistic $Q$, the probabilities $p_i^{(0)}$