Assignment Answers
Categorical Data Analysis, CHL 5407H

1. You are asked to design a cross-sectional study investigating the exercise habits of new graduate students at the University of Toronto. Students will be classified as being inactive if they exercise less than twice a week while students who exercise at least twice a week are classified as being active. How many students would have to be enrolled to ensure that a 90% confidence interval about the estimated proportion of students being active is no wider than $\pm 0.05$?

An approximate 90% confidence interval for $\pi$ is given by

\[ \hat{\pi} \pm 1.645 \sqrt{\hat{\pi}(1 - \hat{\pi})/n}. \]

As discussed in class, this confidence interval will be widest when $\hat{\pi} = 0.5$. Therefore solving the equation

\[ 0.05 = 1.645 \sqrt{0.5 \times 0.5 / n} \]

gives $n = (1.645/0.05)^2 \times 0.25 = 270.60$. Thus a sample of 271 students is needed to be able to estimate the proportion of students being active to within $\pm 0.05$, with 90% confidence.

2. Data on smoking history from four studies of lung cancer patients are provided in Table 1. Use these data to answer the following questions.

(a) Construct an hypothesis test using these data to test the null hypothesis that the true proportion of smokers does not vary across the four studies. Provide a brief explanation of your results.

We are testing the null hypothesis that the true proportion of smokers does not vary across the four studies versus the alternative hypothesis that the true proportion of smokers does vary across the four studies. We can test this null hypothesis using a Pearson chi-square statistic and comparing it to a chi-square distribution with $(2 - 1)(4 - 1) = 3$ degrees of freedom.

\[
\chi^2_P = \sum_{i=1}^{2} \sum_{j=1}^{4} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
= \frac{(3 - [25 \times 86/397])^2}{25 \times 86/397} + \cdots + \frac{(70 - [372 \times 82/397])^2}{372 \times 82/397}
= 12.6004 \quad (p = 0.0056).
\]
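As a rough numerical check of the two results above (the sample size in question 1 and the p-value in question 2(a)), the arithmetic can be sketched in Python using only the standard library. The function names `sample_size_for_proportion` and `chi2_sf_3df` are illustrative and not part of the course material. The p-value uses the closed-form upper tail of a chi-square distribution with 3 degrees of freedom, so it only applies to that case. The full table of counts is not reproduced here, so the statistic 12.6004 is taken as given rather than recomputed.

```python
import math


def sample_size_for_proportion(z, half_width, p=0.5):
    """Return (exact n, n rounded up) so that z * sqrt(p(1-p)/n) <= half_width."""
    n_exact = (z / half_width) ** 2 * p * (1 - p)
    return n_exact, math.ceil(n_exact)


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Closed form valid only for 3 degrees of freedom:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Question 1: 90% interval (z = 1.645), half-width 0.05, worst case p = 0.5.
n_exact, n_needed = sample_size_for_proportion(1.645, 0.05)
print(f"n = {n_exact:.2f}, so enrol {n_needed} students")

# Question 2(a): p-value for the reported statistic (taken from the text).
p_value = chi2_sf_3df(12.6004)
print(f"p = {p_value:.4f}")

# Expected count for the first cell shown in the statistic:
# row total * column total / grand total.
expected_first = 25 * 86 / 397
print(f"E = {expected_first:.4f} versus O = 3")
```

For a table of a different size, the degrees of freedom change and a general chi-square tail function would be needed in place of the 3 df closed form.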

We therefore reject the null hypothesis at the 5% level, concluding that the observed difference among the four studies is statistically significant. Note in particular that there was little difference in the proportion of smokers between study 1, study 2 or study 3 (see Appendix 1). Rather, the statistical significance is almost entirely a consequence of the relatively smaller proportion of smokers in study 4. One way of demonstrating this is using the
