**Before beginning this lab**, attend the lecture for Chapter 16 and read pp. 349-351 in your text.

Until now, we have primarily considered quantitative response variables. Let us now address categorical responses. We are particularly interested in *bi*nary response variables: categorical variables with two possible outcomes. This is a common situation in public health statistics, where outcomes are "dichotomous" (e.g., diseased/not diseased, survived/died).

Whereas quantitative data are described with sums and averages, binary data are described with counts and proportions. We are particularly interested in two specific types of proportions in public health: prevalence proportions and incidence proportions.

Prevalence is the proportion of individuals who have a condition at a particular time. In contrast, the incidence proportion ("risk") is the proportion of individuals who develop a particular condition over a period of time.

**1. Data and research question.** Let us return to the data in Table 1.4 of the textbook. Recall that the data were derived from an investigation in which a high proportion of patients receiving chemotherapy experienced cerebellar toxicity. Open the data file (Table1.4LnameFname.sav) and review its codebook in Variable View.

**(a) Binary variables in the data set.** List the names of the binary variables in this data set.

**(b) Research question and name of response variable.** We want to estimate the incidence proportion of cerebellar toxicity in patients receiving this type of chemotherapy. What is the name of the variable in the data set that contains this information?

**(c) The research question and sample type.** We will use the data for our 25 patients to estimate the incidence proportion in the population. Is this a one-sample, paired-sample, or independent-sample problem?

**2. Estimation.** Our goal is to estimate the incidence of cerebellar toxicity in the patient population. Let $p$ represent the true proportion in the population. The sample proportion is denoted by the symbol $\hat{p}$ ("p hat"); it is the point estimator of the parameter $p$.

**a. Point estimate.** The sample proportion is $\hat{p} = \dfrac{x}{n}$, where $x$ represents the observed number of successes in the sample and $n$ represents the sample size. Use "Analyze > Descriptive Statistics > Frequencies" to determine $x$, $n$, and $\hat{p}$ for TOX in the sample.

$x$ = __ __

$n$ = __ __

$\hat{p}$ = __ __
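As an optional cross-check on the Frequencies output, here is a minimal Python sketch of the point estimate. It assumes TOX is coded 1 = toxicity and 0 = no toxicity (confirm the actual codes in Variable View); with that coding, $x$ is the sum of the codes and $\hat{p}$ is their mean. The list `tox` and the function `point_estimate` are hypothetical, built only to match the 25 patients and 24% incidence mentioned elsewhere in this lab.

```python
def point_estimate(codes):
    """Return (x, n, p_hat) for a sequence of 0/1 codes (1 = success)."""
    # Assumed coding: 1 = experienced toxicity, 0 = did not.
    if any(c not in (0, 1) for c in codes):
        raise ValueError("expected 0/1 codes")
    x = sum(codes)   # observed number of successes
    n = len(codes)   # sample size
    return x, n, x / n


# Hypothetical TOX codes for 25 patients: six 1s and nineteen 0s.
tox = [1] * 6 + [0] * 19
x, n, p_hat = point_estimate(tox)
print(f"x = {x}, n = {n}, p_hat = {p_hat:.2f}")  # x = 6, n = 25, p_hat = 0.24
```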

**b. 95% CI for *p*.** The value of the parameter $p$ is not known and will never be known exactly. It can, however, be estimated with confidence. The best method for calculating the confidence interval for $p$ by hand is the "plus-four" method, which uses this formula:

$$\tilde{p} \pm z_{1-\alpha/2} \times \sqrt{\frac{\tilde{p}\,\tilde{q}}{\tilde{n}}}$$

where $\tilde{n} = n + 4$, $\tilde{x} = x + 2$, $\tilde{p} = \dfrac{\tilde{x}}{\tilde{n}}$, and $\tilde{q} = 1 - \tilde{p}$. Calculate the 95% CI for $p$.
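The plus-four arithmetic can be sketched in Python as follows (Python 3.8+ for `statistics.NormalDist`). The function name `plus_four_ci` is illustrative, and the counts $x = 6$, $n = 25$ are an assumption inferred from the 24% incidence among 25 patients quoted in this lab; substitute your own values from part (a).

```python
from statistics import NormalDist


def plus_four_ci(x, n, conf=0.95):
    """Plus-four confidence interval for a single proportion."""
    n_t = n + 4                   # n-tilde: add four observations
    x_t = x + 2                   # x-tilde: two of them are successes
    p_t = x_t / n_t               # p-tilde
    q_t = 1 - p_t                 # q-tilde
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1-alpha/2}
    se = (p_t * q_t / n_t) ** 0.5             # sqrt(p~ q~ / n~)
    return p_t - z * se, p_t + z * se


# Assumed counts: x = 6 of n = 25.
lcl, ucl = plus_four_ci(6, 25, 0.95)
print(f"95% CI: {lcl:.3f} to {ucl:.3f}")  # 95% CI: 0.113 to 0.439
```

Because the code uses the exact multiplier rather than the rounded 1.96 from Table B, the last decimal may differ slightly from a hand calculation. The bounds are not truncated at 0 or 1, matching the formula as written.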

*Optional*: Check your calculations with www.OpenEpi.com > Counts > Proportions. OpenEpi calculates several types of confidence intervals for *p*. The "Score(Wilson)" results will be similar (but not identical) to the plus-four results you calculated by hand.

**c. 99% CI for *p*.** Use the plus-four method to calculate a 99% CI for the incidence of toxicity.
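Only the multiplier changes between the 95% and 99% intervals; $\tilde{p}$ and the standard error stay the same. This minimal sketch (the helper `z_critical` is hypothetical) shows the two multipliers and how much wider the 99% interval will be.

```python
from statistics import NormalDist


def z_critical(conf):
    """Two-sided critical value z_{1-alpha/2} for a confidence level."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)


z95, z99 = z_critical(0.95), z_critical(0.99)
print(f"z for 95%: {z95:.3f}")  # z for 95%: 1.960
print(f"z for 99%: {z99:.3f}")  # z for 99%: 2.576
# Same center and standard error, so the 99% interval is wider by this factor.
print(f"width ratio: {z99 / z95:.3f}")  # width ratio: 1.314
```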

**d. Reporting.** Report your results in the form "*x* (zz.z%) of the *n* patients experienced toxicity (95% CI for *p*: LCL to UCL; 99% CI: LCL to UCL)."

**3. Hypothesis test.** Let us assume that, with the best of care, 10% (0.10) of patients will experience cerebellar toxicity during treatment with the drug in question. Call this expected incidence proportion $p_0$. We want to test whether this patient population experiences toxicity more frequently than expected.

**(a) Hypotheses:** The null hypothesis for one-sample tests of proportions is $H_0: p = p_0$, where $p_0$ represents the expected proportion under the null hypothesis. The alternative hypothesis is either $H_a: p > p_0$ (one-sided to the right), $H_a: p < p_0$ (one-sided to the left), or $H_a: p \ne p_0$ (two-sided). Write the null and two-sided alternative hypotheses for the current research question.

**(b) *Z* statistic:** Let us use an unmodified *z* statistic to test the hypothesis.\* The test statistic is

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0\, q_0}{n}}}$$

where $\hat{p}$ is the sample proportion, $p_0$ is the value of $p$ under the null hypothesis, $q_0 = 1 - p_0$, and $n$ is the sample size. Calculate the *z* statistic for this problem.
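A minimal sketch of the unmodified *z* statistic, assuming $x = 6$ and $n = 25$ (inferred from the 24% incidence among 25 patients). The standard error uses $p_0$, not $\hat{p}$, because it is computed under the null hypothesis, and no continuity correction is applied. The function name `z_one_proportion` is illustrative.

```python
from math import sqrt


def z_one_proportion(x, n, p0):
    """Unmodified z statistic for H0: p = p0 (no continuity correction)."""
    p_hat = x / n               # sample proportion
    q0 = 1 - p0
    se0 = sqrt(p0 * q0 / n)     # standard error under H0
    return (p_hat - p0) / se0


# Assumed counts: x = 6 of n = 25; p0 = 0.10 from part 3.
z = z_one_proportion(6, 25, 0.10)
print(f"z = {z:.2f}")  # z = 2.33
```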

**(c) *P*-value:** Convert the *z* statistic to a two-tailed *P*-value using Table B.
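Table B lists standard normal tail areas; as a cross-check, the same lookup can be done with the normal CDF in Python's standard library. This sketch assumes the test statistic is standard normal under $H_0$ and doubles the upper-tail area beyond $|z|$; `two_tailed_p` is a hypothetical helper name, and $z = 7/3$ assumes the counts $x = 6$, $n = 25$.

```python
from statistics import NormalDist


def two_tailed_p(z):
    """Two-tailed P-value for a standard normal test statistic."""
    # P = 2 * Pr(Z >= |z|): the area in both tails beyond |z|
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumed z from part (b) for x = 6, n = 25, p0 = 0.10.
print(f"P = {two_tailed_p(7 / 3):.4f}")
```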

**(d)** Is the difference between the observed rate (24%) and expected rate (10%) non-significant, marginally significant, significant, or highly significant?

**(e)** *Optional*: Calculate a *P*-value for testing $H_0: p = .10$ using the SPSS binomial function by selecting Analyze > Nonparametric Tests > Binomial. When the dialogue box appears, select TOX as your "Test variable" and enter 0.10 as the proportion under the null hypothesis.

Report the one-tailed *P*-value from the SPSS binomial test = __ __

Note: SPSS provides only a one-tailed result for this binomial test. These results will be similar but not identical to those of the *z* test. Hand calculation of the binomial *P*-value is beyond the scope of the class.

\* A case can be made not to use the unmodified *z* statistic in part (b), but we will ignore this for now.
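Although hand calculation is beyond the scope of the class, the exact one-tailed value is a short sum of binomial probabilities, $P(X \ge x) = \sum_{k=x}^{n} \binom{n}{k} p_0^k (1-p_0)^{n-k}$. The sketch below (Python 3.8+ for `math.comb`; `binom_upper_tail` is a hypothetical name) assumes $x = 6$, $n = 25$, and $p_0 = 0.10$. It should agree closely with the SPSS exact one-tailed result, assuming SPSS tests the upper tail here because the observed proportion exceeds 0.10.

```python
from math import comb


def binom_upper_tail(x, n, p0):
    """Exact one-tailed P-value Pr(X >= x) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k)
               for k in range(x, n + 1))


# Assumed counts: x = 6 of n = 25, tested against p0 = 0.10.
print(f"one-tailed P = {binom_upper_tail(6, 25, 0.10):.4f}")  # one-tailed P = 0.0334
```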

**4. Relationship between the CIs and the hypothesis test.** Use the 95% CI to address whether the observed incidence of 24% is significantly different from an expected incidence of 10% at α = .05. Then use the 99% CI to address the question at α = .01. Explain your reasoning in each instance.
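The reasoning in part 4 can be sketched in code: $H_0: p = p_0$ is rejected at level $\alpha$ when $p_0$ falls outside the $(1-\alpha)$ confidence interval. This sketch again assumes $x = 6$ and $n = 25$, and the helper name `plus_four_ci` is illustrative. Because the plus-four interval and the unmodified *z* test use slightly different standard errors, the two approaches can disagree in borderline cases, so treat the correspondence as approximate.

```python
from statistics import NormalDist


def plus_four_ci(x, n, conf):
    """Plus-four CI: p~ +/- z_{1-alpha/2} * sqrt(p~ q~ / n~)."""
    n_t, x_t = n + 4, x + 2
    p_t = x_t / n_t
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * (p_t * (1 - p_t) / n_t) ** 0.5
    return p_t - half, p_t + half


p0 = 0.10
for conf in (0.95, 0.99):
    lcl, ucl = plus_four_ci(6, 25, conf)      # assumed x = 6, n = 25
    excluded = not (lcl <= p0 <= ucl)         # p0 outside the CI?
    decision = "reject H0" if excluded else "do not reject H0"
    print(f"{conf:.0%} CI: {lcl:.3f} to {ucl:.3f} -> "
          f"{decision} at alpha = {1 - conf:.2f}")
```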