lec2
That is, if we look for the area under the curve between any two numbers on the original scale (height in this example), this quantity differs for different normal curves. There is not a single answer that applies to all normal curves. However, when looking for areas under the curve between two numbers expressed as z-scores (the number of SDs above or below the mean), the areas are the same for any normal curve!

Notice that for women, μX − 1σX = 65 − 1(2.5) = 62.5 and μX + 1σX = 65 + 1(2.5) = 67.5 correspond to z-scores of −1 and +1:

(62.5 − μX)/σX = (62.5 − 65)/2.5 = −2.5/2.5 = −1
(67.5 − μX)/σX = (67.5 − 65)/2.5 = 2.5/2.5 = 1

The probability of a woman's height falling between z-scores of ±1 is 68.26%.

And for men, μY − 1σY = 70 − 1(3) = 67 and μY + 1σY = 70 + 1(3) = 73 correspond to z-scores of −1 and +1:

(67 − μY)/σY = (67 − 70)/3 = −3/3 = −1
(73 − μY)/σY = (73 − 70)/3 = 3/3 = 1

The probability of a man's height falling between z-scores of ±1 is 68.26%.

The point of all of this is that by transforming to z-scores we can compute probabilities for any normal distribution (no matter its mean and variance) using a single reference distribution: the standard normal distribution N(0, 1).

Fact: If X ~ N(μ, σ²), then

Z = (X − μ)/σ ~ N(0, 1).

For a normally distributed r.v. X ~ N(μ, σ²), often we will be interested in finding probabilities like

P(c < X < d) or P(X ≥ c) or P(X < c), etc.,

where c, d are given constants. E.g., we might want to know the percentage of women with heights between 60 and 65 inches, or the percentage of women with heights greater than or equal to 68 inches, or less than 61 inches, etc. How do we use Z-scores to get such probabilities?

Facts about inequalities:

1. X ≤ c if and only if X ± d ≤ c ± d.
   - Here ≤ can be replaced by any other inequality or equality (≥, >, <, =) and the statement would still be true.
   - This means that P(X ≤ c) = P(X ± d ≤ c ± d). E.g., P(X ≤ 4) = P(X − 3 ≤ 4 − 3) and P(X > 7) = P(X + 2 > 7 + 2).

2. c ≤ X ≤ d if and only if c + b ≤ X + b ≤ d + b.
   - Again, ≤ can be replaced by any other inequality or equality (≥, >, <, =).
   - This means that P(c ≤ X ≤ d) = P(c + b ≤ X + b ≤ d + b). E.g., P(3 ≤ X ≤ 7) = P(3 − 2 ≤ X − 2 ≤ 7 − 2) = P(1 ≤ X − 2 ≤ 5) and P(1 > X > −3) = P(1 + 9 > X + 9 > −3 + 9) = P(10 > X + 9 > 6).

3. For c any constant and b a constant which is ≥ 0, X ≤ c if and only if bX ≤ bc. If b is a negative number, then multiplying by b reverses the inequality: X ≤ c if and only if bX ≥ bc.
   - Again, ≤ can be replaced by any other inequality or equality and the statement would still be true.
   - This means that P(X ≤ c) = P(bX ≤ bc) if b ≥ 0, and P(X ≤ c) = P(bX ≥ bc) if b < 0. E.g., P(X ≤ 5) = P(3X ≤ 3(5)) = P(3X ≤ 15) and P(X > −1) = P(−2X < (−2)(−1)) = P(−2X < 2).

4. Result 3 extends to double inequalities. That is, for b ≥ 0, c ≤ X ≤ d if and only if bc ≤ bX ≤ bd; and for b < 0, c ≤ X ≤ d if and only if bc ≥ bX ≥ bd.
   - Again, ≤ can be replaced by any other inequality or equality and the statement would still be true.
   - This means that P(c ≤ X ≤ d) = P(bc ≤ bX ≤ bd) if b ≥ 0, and P(c ≤ X ≤ d) = P(bc ≥ bX ≥ bd) if b < 0. E.g., P(3 ≤ X ≤ 9) = P((2)(3) ≤ 2X ≤ (2)(9)) = P(6 ≤ 2X ≤ 18) and P(3 ≤ X ≤ 9) = P((−1)(3) ≥ −X ≥ (−1)(9)) = P(−3 ≥ −X ≥ −9).

In summary, one can add or subtract any number, or multiply or divide by any non-negative number, on both (all) sides of an inequality without changing the inequality. Multiplying or dividing by a negative number switches the direction of the inequality.

These results allow us to use the standard normal distribution N(0, 1) to compute probabilities associated with a normal distribution N(μ, σ²) for any μ and σ².

Examples:

i. To detect whether patients have had a stroke, one measure which is sometimes used is the cerebral blood flow (CBF) in the brain. Stroke patients tend to have lower levels of CBF than healthy patients. Assume that in the general population, X = CBF follows a N(75, 17²) distribution. A patient is classified as "probable stroke" if his or her CBF is less than 40.
What proportion of healthy patients will be mistakenly classified as probable stroke victims?

Answer: X ~ N(μ, σ²) where μ = 75, σ = 17. We want to find P(X < 40):

P(X < 40) = P(X − μ < 40 − μ)
          = P((X − μ)/σ < (40 − μ)/σ)
          = P(Z < (40 − μ)/σ), where Z ~ N(0, 1)
          = P(Z < (40 − 75)/17)
          = P(Z < −2.06)

Now the probability P(Z < c) for any number c can be computed from a computer program. For instance, in Minitab we select Calc → Probability Distributions → Normal... and then select "Cumulative probability" (which gives the probability to the left of c), set the mean and standard deviation to 0 and 1, respectively, and input c in the field "Input constant". Hitting OK gives the answer:

P(X < 40) = P(Z < −2.06) = .0197

Note that −2.06 is just the Z-score associated with 40. Note also that Minitab allows you to set the mean and the standard deviation to anything you want. So we actually could have computed P(X < 40) here directly, without transforming to Z-scores, by setting the mean and standard deviation to 75 and 17, respectively, and setting "Input constant" to 40.

Other computer programs also have normal probability functions. E.g., in Excel, the function NORMDIST(c, μ, σ, TRUE) gives P(X < c) for X ~ N(μ, σ²).

While transforming to Z-scores is not necessary with one of these computer functions, it is necessary for using a standard normal probability table. Standard normal tables are given in a variety of formats. Some give P(Z < c) for selected values c ≥ 0, some give P(−c < Z < c) for selected values c ≥ 0, and others give P(0 < Z < c) for selected values c ≥ 0. (See handout.) Any of these formats can be used to compute any desired normal probability if we use some logic and the facts that

a. the normal distribution is symmetric, so (for example) P(Z < −c) = P(Z > c) for any positive constant c, and
b. the area under the normal distribution is 1, so that P(Z > c) = 1 − P(Z ≤ c) = 1 − P(Z < c) for any constant c.
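The notes do this calculation in Minitab and Excel; as a sketch of the same calculation in another tool (assuming Python with SciPy is available), the cumulative probability can be computed either directly on the original scale or via the Z-score:

```python
from scipy.stats import norm

# P(X < 40) for X ~ N(75, 17^2): directly, and via the z-score
p_direct = norm.cdf(40, loc=75, scale=17)
z = (40 - 75) / 17              # about -2.06
p_via_z = norm.cdf(z)           # norm defaults to the standard normal N(0, 1)

print(round(p_direct, 4))       # about 0.0198 (the table value .0197 uses z rounded to -2.06)
```

Both calls give exactly the same answer, which is the point of the Z-score transformation: one reference distribution serves every normal curve.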
When computing normal probabilities from a table it is very useful to draw a picture in order to figure out exactly how to use the table and these facts to get the desired probability.

Back to the example: We want P(Z < −2.06). Picture:

[Figure: standard normal p.d.f. with the area to the left of z = −2.06 shaded]

To use the first table, which gives P(Z < c) for c ≥ 0, we reason as follows:

P(X < 40) = P(Z < −2.06) = P(Z > 2.06) = 1 − P(Z ≤ 2.06) = 1 − P(Z < 2.06) = 1 − .98030 = .0197

To use the second table, which gives P(−c < Z < c) for c ≥ 0, we reason as follows:

P(X < 40) = P(Z < −2.06) = ½{1 − P(−2.06 < Z < 2.06)} ≈ ½{1 − P(−2.05 < Z < 2.05)} = ½(1 − .9596) = .0202

which is slightly off because 2.06 didn't appear in our table and we had to use 2.05 instead.

To use the third table, which gives P(0 < Z < c) for c ≥ 0, we reason as follows:

P(X < 40) = P(Z < −2.06) = P(Z > 2.06) = ½ − P(0 < Z < 2.06) = ½ − .4803 = .0197

ii. Suppose that a mild hypertensive is defined as a person whose diastolic blood pressure is between 90 and 100 mm Hg (inclusive). Suppose also that 35–44 year-old males have diastolic blood pressure which is normally distributed with mean 80 and variance 144. What is the probability that a randomly selected 35–44 year-old male is hypertensive? I.e., if X ~ N(80, 144), find P(90 ≤ X ≤ 100).

[Figure: standard normal p.d.f. with the area between z = .83 and z = 1.67 shaded]

Answer:

P(90 ≤ X ≤ 100) = P(90 − μ ≤ X − μ ≤ 100 − μ)
= P((90 − μ)/σ ≤ (X − μ)/σ ≤ (100 − μ)/σ)
= P((90 − 80)/12 ≤ Z ≤ (100 − 80)/12)
= P(.83 ≤ Z ≤ 1.67)
= P(Z ≤ 1.67) − P(Z < .83)
= P(Z < 1.67) − P(Z < .83)
= .95254 − .79673 = .15581

Our book actually gives a fourth form of normal table (Table A.3 in Appendix A) which gives P(Z > c) for selected values c ≥ 0.
To use that table for this problem we would notice that

P(90 ≤ X ≤ 100) = P(Z < 1.67) − P(Z < .83) = [1 − P(Z ≥ 1.67)] − [1 − P(Z ≥ .83)] = (1 − .047) − (1 − .203) = .156

iii. Glaucoma is an eye disease characterized by high intraocular pressure (IOP). Suppose that the distribution of X = IOP in the general population is N(μ, σ²) where μ = 16 mm Hg and σ = 3 mm Hg. If the normal (i.e., healthy) range of IOP is defined as between 12 and 20 mm Hg, what percentage of the general population would fall in this range?

Answer:

P(12 ≤ X ≤ 20) = P((12 − μ)/σ ≤ (X − μ)/σ ≤ (20 − μ)/σ)
= P((12 − 16)/3 ≤ Z ≤ (20 − 16)/3)
= P(−1.33 < Z < 1.33)
= 2P(0 < Z < 1.33)
= 2{½ − P(Z ≥ 1.33)}
= 2(.5 − .092) = 0.816

or 81.6%.

Normal Percentiles: Sometimes we'd like to work backward and figure out what value of X is associated with a particular normal probability, rather than what normal probability is associated with a particular value of X, for a normal r.v. X ~ N(μ, σ²). That is, we'd sometimes like to find the pth percentile for a random variable X ~ N(μ, σ²) for any given values μ and σ².

Fact: For X ~ N(μ, σ²) the 100pth percentile of the distribution of X (xp, say) is related to zp, the 100pth percentile of the standard normal distribution, via

xp = μ + zp σ.   (*)

Here zp can be looked up in a normal table like the first one in the handout by finding p in the body of the table and then finding zp from the margins of the table.

Examples:

iv. Recall that for 35–44 year-old men, X = diastolic blood pressure follows a N(80, 12²) distribution. What is the 95th percentile of diastolic blood pressure in this population?

We want x.95. To get it, first find z.95 and then use the relationship given by (*). Using the first normal table in the handout, we look up .95 in the body of the table.
.95 doesn't appear there, but .94950 and .95053 do, which gives

z.94950 = 1.64 and z.95053 = 1.65

Therefore, z.95 should be about halfway between 1.64 and 1.65, or z.95 = 1.645.

- An exact value for zp for any p can be obtained via a computer program. For example, in Minitab we follow the steps given before, but select "Inverse cumulative probability" rather than "Cumulative probability", and then set "Input constant" to p.
- Using Minitab we can find that the exact value for p = .95 is z.95 = 1.64485.

Now we use the relationship (*) to get the 95th percentile for X:

x.95 = μ + z.95 σ = 80 + 1.64485(12) = 99.7 mm Hg.

- Note that the table in the back of our book gives P(Z > c) rather than p = P(Z ≤ c), but since P(Z > c) = 1 − P(Z ≤ c) = 1 − p, we can obtain zp from the table in our book by looking up 1 − p in the body of the table.
- E.g., looking up 1 − .95 = .05 in that table, we again find that z.95 = 1.645.

v. Find the 10th percentile of diastolic blood pressure among 35–44 year-old males.

[Figure: standard normal p.d.f.s illustrating that the area below z.10 equals the area above z.90]

From the above picture it is clear that z.10 = −z.90 or, more generally,

zp = −z1−p.

Using the normal table in the back of our book, we look up .10 in the body of the table to give z.90 = 1.28, so z.10 = −1.28 and

x.10 = μ + z.10 σ = 80 + (−1.28)(12) = 64.6.

Normal Approximation to the Binomial: Recall that if X = the number of successes out of n independent, identical trials with constant success probability p, then X has a binomial distribution. We will write this as X ~ Bin(n, p).

Let's look at the binomial probability distribution for a particular value of p, p = .4, say, as n gets bigger. Below we plot the Bin(n, p = .4) probability distribution for n = 3, 6, 9, 12, 15, and 18.
[Figure: probability distributions of X ~ Bin(n, 0.4) for n = 3, 6, 9, 12, 15, and 18]

Notice that the binomial distribution looks more and more normal as n gets large!

- There is one important difference: the binomial is discrete, the normal is continuous. But this becomes less and less of a factor as n gets large, and, as we'll see, we can adjust for this difference anyway.

So the binomial looks more and more similar to a normal distribution as n gets large, but which normal distribution is the best approximation to the distribution of X ~ Bin(n, p)? The answer is: the normal distribution with the same mean and variance as X. That is, for n large, Bin(n, p) is well approximated by

N(np, np(1 − p)).

Example: Suppose again that 55% of UGA undergrads are women. Suppose I take a random sample of n = 20 undergrads. What's the probability that X = number of women in the sample turns out to be 12?

Based on X ~ Bin(n, p) where n = 20, p = .55, we can compute this probability exactly. Using the binomial probability function,

P(X = 12) = C(20, 12) (.55)^12 (1 − .55)^(20−12) = .1623.

Notice, though, that this is a relatively hard calculation. E.g., 20! = 2.4329 × 10^18.

Since X ~ Bin(n, p), its mean and variance are

E(X) = np = 20(.55) = 11 and var(X) = np(1 − p) = 4.95.

So the distribution of X should be well approximated by a N(11, 4.95). Here's the actual binomial probability distribution with a N(11, 4.95) p.d.f. superimposed:

[Figure: probability distribution of X ~ Bin(20, 0.55) with the N(11, 4.95) p.d.f. superimposed]

Let Y ~ N(μY, σY²), where μY = 11, σY² = 4.95.
Then the normal approximation to the binomial probability we want is

P(X = 12) ≈ P(11.5 < Y < 12.5)
= P((11.5 − μY)/σY < (Y − μY)/σY < (12.5 − μY)/σY)
= P((11.5 − 11)/√4.95 < Z < (12.5 − 11)/√4.95)
= P(.22 < Z < .67)
= P(Z < .67) − P(Z < .22)
= .74857 − .58706 = .16151

which agrees with the true answer when rounded to three decimal places.

Another example: to find the binomial probability of 15 or more women in the sample, we would use the approximation

P(X ≥ 15) ≈ P(Y ≥ 14.5), where Y ~ N(11, 4.95).

Remember, n must be large for this approximation to work well. In fact, it will only work well if n is large and p is not too close to 0 or 1.

Rule of thumb: the normal approximation to a Bin(n, p) distribution can be expected to work well if np ≥ 5 and n(1 − p) ≥ 5.

Sampling Distribution of the Mean*

Sampling Distributions: Sample statistics, such as the sample mean, sample standard deviation, sample median, etc., are random variables. Why? Because they are computed on a random sample. Therefore, if we were to repeat the process of taking a random sample, any sample statistic (the mean, say) would vary from sample to sample, in a way that is random, because the sampling was done at random.

Of course, in practice we generally draw only one random sample, but any statistic from that sample is still a random quantity. Hence, any sample statistic has

- a probability distribution,
- an expected value, or long-run average over all the possible random samples we could possibly take, and
- a population variance, or long-run variance over all of the possible random samples we could take.

The probability distribution of a sample statistic is called the sampling distribution of that statistic. That sampling distribution has a (population) mean, variance, standard deviation, etc. The estimated standard deviation of a statistic is called the standard error of the statistic.

* Read Ch. 8 of our text.
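The idea that any statistic has its own sampling distribution and standard error can be seen by simulation. The following sketch (hypothetical numbers, assuming NumPy is available) repeats the sampling many times and records two different statistics from each sample; each one varies from sample to sample, with its own spread:

```python
import numpy as np

# Draw 5,000 random samples of size n = 25 from the same population and
# compute two statistics on each: the sample mean and the sample median.
rng = np.random.default_rng(0)
samples = rng.normal(100, 15, size=(5000, 25))

means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

se_mean = means.std()      # empirical standard error of the sample mean
se_median = medians.std()  # empirical standard error of the sample median
```

For the mean, the empirical standard error lands near σ/√n = 15/√25 = 3; the median's standard error turns out to be somewhat larger, so the two statistics have genuinely different sampling distributions even though they are computed from the same samples.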
Right now we will focus on the sample mean, its sampling distribution, and its standard error, but it is important to realize that any statistic has a sampling distribution and standard error.

The sample mean: The sample mean of observations x1, x2, …, xn has an expected value and variance that depend upon the expected value and variance of x1, x2, …, xn.

- E.g., if x1, x2, …, xn are big, we expect the sample mean to be big. If x1, x2, …, xn vary a lot, we would expect their mean to be highly variable too.

Consider a random sample of observations x1, x2, …, xn, where each xi has mean μ and variance σ². Let x̄ = (1/n) Σᵢ₌₁ⁿ xi denote the sample mean of the xi's. We assume that the observations x1, …, xn are independent of each other. This is typically satisfied as a consequence of random sampling.

Then, without knowing the probability distribution of the xi's, we cannot make an exact statement about the entire probability distribution of x̄, but we can say that the sampling distribution of x̄

- has mean μ, and
- has variance σ²/n.

This is true for x1, …, xn drawn from any probability distribution with mean μ and variance σ².

These results make sense:

- The sample mean should be centered at around the same place as the xi's, and
- the sample mean should have variance that depends upon σ², the variance of the xi's, but which also should be smaller than the variance of the xi's.

Notice that var(x̄) = σ²/n depends on n. When the sample size is large, the sample mean has small variance.

If we know the full probability distribution of the xi's then we can say more. In particular: if x1, …, xn are each normally distributed with mean μ and variance σ² (i.e., if xi ~ N(μ, σ²) for each i), then

x̄ ~ N(μ, σ²/n).

Central Limit Theorem: So, we have seen that if the xi's have mean μ and variance σ², then it is always true that

E(x̄) = μ and var(x̄) = σ²/n,

and if the xi's are also normal, then x̄ is normal too.
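The exact normality of x̄ for normal data can be checked by simulation. This sketch (hypothetical numbers, assuming NumPy) standardizes many simulated sample means and checks that about 68.27% of them fall within one standard deviation of μ, as N(0, 1) predicts:

```python
import numpy as np

# For normal data the sample mean is exactly N(mu, sigma^2/n), so the
# standardized means should behave like standard normal draws.
rng = np.random.default_rng(42)
mu, sigma, n, reps = 50, 8, 16, 20_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)  # 20,000 sample means
z = (xbar - mu) / (sigma / np.sqrt(n))                     # standardize
within1 = np.mean(np.abs(z) < 1)                           # should be near 0.6827
```

The simulated means also center near μ = 50 with variance near σ²/n = 64/16 = 4, matching the two facts stated above.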
One of the most important theoretical results in statistics, the central limit theorem (CLT), allows us to go even farther: if the xi's have mean μ and variance σ², then regardless of the distribution of the xi's, their sample mean is approximately normally distributed if the sample size n is sufficiently large. I.e.,

x̄ ~ N(μ, σ²/n), approximately, for large enough n,

or, if we standardize x̄ (i.e., switch to Z-scores),

Z = (x̄ − μ)/(σ/√n) ~ N(0, 1), approximately, for large enough n.

This remarkable result is the most important reason why the normal distribution plays such a key role in statistics. Among other things, the CLT allows statistical inference procedures based on sample means (e.g., we typically use the sample mean to make inferences on an unknown population mean) to be based on the normal distribution, even if the original observations are not normally distributed.

Example | Body Weights

Although human heights are pretty close to normally distributed, weights are not. Especially in the US, the distribution of body weight is skewed right. That is, there are more very heavy people than there are very light people.

Suppose that among US males the average weight is 78 kg with a standard deviation of 13 kg, and the distribution is skewed right. Let x1, …, x30 be a random sample of the weights of n = 30 US males, and let x̄ = (1/30) Σᵢ₌₁³⁰ xi be the sample mean weight. Then even though the weights of individual subjects are not normally distributed, the CLT implies that x̄ is approximately distributed as

x̄ ~ N(μ, σ²/n) = N(78, 13²/30) = N(78, 5.63).

Suppose we were to take samples of size n = 30 repeatedly, and compute the sample mean each time. What proportion of those sample means would be within 2 kg of the population mean weight (between 76 and 80 kg)? We can translate this question to: what is P(76 ≤ x̄ ≤ 80)?

Without knowing the exact distribution of weight, we don't know the exact distribution of x̄, so we can't compute this probability exactly.
However, assuming that n = 30 is large enough, x̄ is approximately N(78, 5.63), so we can approximate this probability as follows:

P(76 ≤ x̄ ≤ 80) ≈ P(76 − μ ≤ x̄ − μ ≤ 80 − μ)
= P((76 − μ)/(σ/√n) ≤ (x̄ − μ)/(σ/√n) ≤ (80 − μ)/(σ/√n))
= P((76 − 78)/(13/√30) ≤ Z ≤ (80 − 78)/(13/√30))
= P(−.84 ≤ Z ≤ .84)
= 1 − 2P(Z > .84) = 1 − 2(.200) = .6,

or approximately 60% of the samples will have sample means between 76 and 80 kg (within 2 kg of the true mean).

Now, what body weight cuts off the upper 5% of the sampling distribution of the sample mean, for n = 30? I.e., what is the 95th percentile of the sampling distribution of x̄?

- This would be the weight such that, if the xi's each have mean 78 kg and σ = 13 kg, we would expect to observe a sample mean weight at least this large only 5% of the time.

Again, assuming that n = 30 is large enough for the CLT to hold, the sampling distribution is approximately normal with mean 78 and standard deviation 13/√30 = 2.373. So,

Z = (x̄ − μ)/(σ/√n) ~ N(0, 1)  ⟹  x̄ = μ + Z(σ/√n).

Therefore, the 95th percentile of the distribution of x̄ is related to the 95th percentile of the standard normal distribution via

x̄.95 ≈ μ + z.95(σ/√n) = 78 + z.95(2.373).

We can get z.95 by looking up 1 − .95 = .05 in Table A.3 in the back of our book, which yields z.95 = 1.645, so

x̄.95 = 78 + (1.645)(2.373) = 81.9.

So the 95th percentile of the distribution of x̄, the sample mean weight based on a sample of size n = 30, is 81.9 kg. This means that when taking a sample of size 30 of weights that have true mean 78 and true SD 13, 95% of the time we would expect a sample mean less than 81.9 kg.

Based on this result, what would you conclude if you took a sample of size 30 and found the mean to be 82.4 kg (say)?

- You'd either have gotten a very unusual sample, or
- the sample really didn't come from a distribution with mean 78 and standard deviation 13 in the first place.

Now, what weights enclose 95% of the sample means of size n = 30?
That is, what are the weights (kg) such that 95% of the time we would expect to get sample mean weights between those values?

[Figure: approximate probability density of the sample mean based on n = 30 (the N(78, 5.63) distribution), with the central 95% region between roughly 73 and 83 kg marked]

This translates into finding x̄.025 and x̄.975, the 2.5th and 97.5th percentiles of the distribution of the sample mean. By looking up .025 in Table A.3, we can find that

z.975 = 1.96 and z.025 = −1.96.

Therefore,

x̄.975 ≈ μ + z.975(σ/√n) = 78 + 1.96(2.373) = 82.65
x̄.025 ≈ μ + z.025(σ/√n) = 78 + (−1.96)(2.373) = 73.35

So if weights have population mean 78 and population SD 13, we expect the sample mean of 30 observations to fall between 73.35 and 82.65 kg about 95% of the time.

- Again, if we took a single sample of size 30 and calculated a sample mean outside of this range, we'd either have observed an unusual result, or we might be tempted to conclude that the weights didn't have mean 78 and SD 13 in the first place.

Confidence Intervals*

Another way to look at the previous calculation is that we have used the fact that P(−1.96 ≤ Z ≤ 1.96) = .95 and the CLT to infer that

P(−1.96 ≤ (x̄ − μ)/(σ/√n) ≤ 1.96) ≈ .95
⟹ P(−1.96 σ/√n ≤ x̄ − μ ≤ 1.96 σ/√n) ≈ .95
⟹ P(−x̄ − 1.96 σ/√n ≤ −μ ≤ −x̄ + 1.96 σ/√n) ≈ .95
⟹ P(x̄ − 1.96 σ/√n ≤ μ ≤ x̄ + 1.96 σ/√n) ≈ .95

Because of the above probability statement, we say that the interval computed as

(x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n)

forms an (approximate) 95% confidence interval for μ.

Note what is random here. μ is a population mean. It is a fixed (unknown) constant. x̄ is random, because it is computed on a random sample. Therefore, we are attaching a probability to where x̄ lies, not where μ lies. The interpretation here is that if we were to repeat the process by which the upper and lower limits were calculated (drawing the sample, computing the sample mean, etc.), 95% of the time we would get an interval that covers the true population mean μ.

* Read Ch. 9 of our text.
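This repeated-sampling interpretation can be checked directly by simulation. The sketch below (assuming NumPy; the weight numbers are the ones used above) constructs the interval x̄ ± 1.96 σ/√n for many independent samples and counts how often it covers the true mean:

```python
import numpy as np

# Coverage check: for each of 10,000 samples of size n = 30 from a population
# with mu = 78 and sigma = 13, form xbar +/- 1.96*sigma/sqrt(n) and record
# whether the interval covers mu.
rng = np.random.default_rng(7)
mu, sigma, n, reps = 78, 13, 30, 10_000

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
half = 1.96 * sigma / np.sqrt(n)
covered = np.mean((xbar - half <= mu) & (mu <= xbar + half))
```

The coverage proportion lands very close to .95: it is the interval endpoints that vary from sample to sample, while μ stays fixed.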
The confidence interval that we just introduced is an example of one of the methods of statistical inference.

Statistical Inference: The typical paradigm for statistical inference is that we are interested in some population characteristic or parameter:

- e.g., the average cholesterol level of 40–49 year-old American females,
- the proportion of the US voting-age population that approves of the job that the president is doing,
- the population variance in the cost of a certain medical procedure at US hospitals.

So we collect a random sample representative of the population of interest, and use variables measured on the sample to infer what is true of the corresponding population parameter.

There are two main aspects of statistical inference: estimation and hypothesis testing.

1. Estimation

a. Point estimation. In point estimation, we simply use a sample statistic to give a numerical estimate of the corresponding population value (parameter).
   - E.g., the sample mean to estimate the population mean, the sample proportion to estimate the population proportion, the sample SD to estimate the population SD.
   - Good estimates should be unbiased (on target) and have small variance (be precise).

b. Interval estimation. Almost all point estimates are likely to be wrong. They may be close to the quantity being estimated, but there is almost certainly some error (hopefully small). Confidence intervals quantify the uncertainty or error in our estimate by finding an interval within which the population parameter can be expected to lie with high probability.
   - Hopefully, that interval is narrow, meaning we're highly confident that there is little error in our estimate; i.e., it is a precise estimate.
   - Confidence intervals must be interpreted carefully.

2. Hypothesis testing. In hypothesis testing we make a decision about the population parameter based upon what we know about the corresponding sample estimate.
   - E.g., we decide whether the population mean is equal to a certain value,
   - we decide whether the population variance is equal to a certain value,
   - we decide whether two population proportions are equal to each other, etc.
   - There is always the possibility that our decision will be wrong, but in statistical hypothesis testing, we know the probability that we have made the wrong decision.

Hypothesis testing and confidence interval estimation are really flip sides of the same coin. That is, they are two different ways to look at the problem of statistical inference.

- They always give compatible results, but in some cases it may be more useful to frame an inference problem in terms of interval estimation, and in other cases it may be more useful to conduct hypothesis tests.

Point Estimation: A statistic T is an unbiased estimator of a parameter θ if

E(T) = θ.

Otherwise, T is said to be biased. A statistic can always be thought of as an estimator of its expected value, or long-run average. All things being equal, we would always prefer an unbiased estimator over a biased one.

The precision of an estimator refers to the amount of variance in its sampling distribution. The more variance in an estimator, the more spread out its values, and the less precise it is.

Bias and precision can be understood through the following picture:

[Figure: target diagrams illustrating the four combinations of high/low bias and high/low precision]

Accuracy of an estimator combines bias and precision. An accurate estimator is one that has low bias and high precision.

Estimation of a population mean: Suppose we have a random sample from a normal distribution. That is, let x1, …, xn be independent random variables, each with a N(μ, σ²) distribution. For now, suppose we know the value of σ², the variance of each xi. Based on a sample of size n we wish to make inference on μ.

A natural estimate of μ is the sample mean x̄ = (1/n) Σᵢ₌₁ⁿ xi. Why? Because its expected value is μ. Recall E(x̄) = μ. x̄ is a point estimate of μ. Because x1, …, xn were assumed normal, x̄ ~ N(μ, σ²/n).
Even if x1, …, xn were not normal, the CLT implies that x̄ ~ N(μ, σ²/n) (approximately normal) if n is large.

Precision of x̄: Remember, the precision of a statistic is related to that statistic's variance (the variance of its sampling distribution). The variance of x̄ is var(x̄) = σ²/n, so

- x̄ is more precise when the sample size is large, because that makes σ²/n small, and
- x̄ is more precise when σ², the variance of the original data, is small, because that also makes σ²/n small.

Back on p. 120, we went through some calculations to show that for x̄ computed from a random sample with mean μ and variance σ²,

P(x̄ − 1.96 σ/√n ≤ μ ≤ x̄ + 1.96 σ/√n) ≈ .95.

This probability becomes exact if the sample is drawn from a normal distribution. Therefore, we say that

(x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n) = x̄ ± 1.96 σ/√n

is a 95% confidence interval for μ. It is an exact 95% interval for samples drawn from a normal distribution, and an approximate 95% interval for samples drawn from non-normal distributions.

Interpretation: If we were to

- draw a random sample of size n,
- compute x̄,
- construct the interval x̄ ± 1.96 σ/√n, and
- repeat this process many, many times,

then 95% of these intervals will contain μ.

The precision of x̄ is reflected in the width of the interval, which is 2(1.96) σ/√n. E.g., again suppose we have a random sample x1, …, xn of size n of the weights (kg) of US males, and suppose that E(xi) = μ (unknown) and var(xi) = σ² (known) for each i (each subject). Then here are the widths of approximate 95% confidence intervals for μ for different values of n and σ:

                        Sample Size (n)
Population SD (σ)     15     30     60    120
        8            8.1    5.7    4.0    2.9
       13           13.2    9.3    6.6    4.7
       18           18.2   12.9    9.1    6.4
       23           23.3   16.5   11.6    8.2

Notice the confidence intervals get narrower (more precise) as

- the sample size increases, and
- the population SD decreases.

Notice that the 95% confidence interval is

x̄ ± z1−.05/2 σ/√n, where z1−.05/2 = z.975 = 1.96.

How do we get a 90% interval or a 99% interval?
General formula for a CI for μ: For a random sample x1, …, xn from a normal distribution with common mean μ and common known variance σ², a 100(1 − α)% confidence interval for μ is given by

x̄ ± z1−α/2 σ/√n.

This confidence interval is exact for normal distributions, and approximate for non-normal distributions by the CLT.

- Beware that our book uses the notation zp for the value of the standard normal that cuts off 100p% in the upper tail. We use zp to denote the value that cuts off 100p% in the lower tail.

Example | Birthweights of SIDS Babies

In 1976–77 there were 78 cases of crib death (SIDS) in King Co., WA. The average birthweight in this sample was x̄ = 2994 g. Based on nationwide surveys of millions of deliveries, the mean birthweight in the US is 3300 g, with a standard deviation of 800 g.

Suppose that this sample of n = 78 babies is a random sample from the total population of SIDS cases (it's not, but we'll assume so for illustration purposes). Find a 95% confidence interval for the population mean birthweight of SIDS cases in the US.

Since we have specified that we want a 95% interval,

100(1 − α)% = 95%.

Therefore, α = ____, 1 − α/2 = ____, and z1−α/2 = ____.

Thus, the 95% interval for μ is

x̄ ± z1−α/2 σ/√n = ________ = ________.

If we assume that birthweights are normally distributed, then this is an exact 95% CI for μ. Otherwise, it is an approximate 95% CI for μ.

Interpretation (short version): On the basis of these data, we are 95% confident that the population mean birthweight for SIDS infants in the US is covered by the interval ( ____ , ____ ).

It is conventional to form 95% intervals. However, that is just tradition without any theoretical basis. Sometimes we may want other confidence levels. Suppose we had wanted a 90% confidence interval for μ. Then

100(1 − α)% = 90%,

which implies that α = ____, 1 − α/2 = ____, and z1−α/2 = ____.

Thus the 90% interval is given by

x̄ ± z1−α/2 σ/√n = ________ = ________.

Note that this interval is narrower than a 95% interval.

- As the confidence level goes up, the width of the confidence interval increases as well.
- Intuition: for me to be very highly confident that my interval covers μ, I have to make my interval wide.

One-Sided Confidence Intervals: So far we have talked only about two-sided confidence intervals: confidence intervals with a lower and upper bound that straddle the population mean with some pre-specified probability (95%, say). In some situations we are interested only in finding an upper bound, which will fall above the population mean with some probability. Or perhaps a lower bound, which falls below the mean with some probability.

Example | Cholesterol Level

High cholesterol is considered a risk factor for heart disease. There is little concern about low cholesterol levels; basically, the lower, the better. So we might be interested in estimating the mean cholesterol level in the normal (healthy) population, and placing an upper bound on that mean, such that we can be 95% sure that the population mean falls below that upper bound. This would be useful for deciding whether a patient with a given cholesterol level has elevated cholesterol relative to the healthy population.

Suppose that the population standard deviation for cholesterol level among healthy people is known to be 25 mg/dL. Cholesterol levels are known to be somewhat skewed right. Suppose that a random sample of 28 normal adults was obtained and the sample mean cholesterol level was 168.3 mg/dL. Obtain a 95% upper bound on μ, the mean cholesterol level in the healthy population.

Answer: If we assume that the sample size is large enough for the CLT to hold, then

x̄ ~ N(μ, σ²/n), approximately,

or, switching to Z-scores, this statement is equivalent to

(x̄ − μ)/(σ/√n) ~ N(0, 1), approximately.

We know that P(Z ≥ −1.645) = .95 because −1.645 = z.05, the 5th percentile of the standard normal distribution. Therefore,

.95 = P(Z ≥ −1.645)
≈ P((x̄ − μ)/(σ/√n) ≥ −1.645)
= P(x̄ − μ ≥ −1.645 σ/√n)
= P(−μ ≥ −x̄ − 1.645 σ/√n)
= P(μ ≤ x̄ + 1.645 σ/√n)

That is,

P(μ ≤ x̄ + 1.645 σ/√n) ≈ .95,   (*)

so that x̄ + 1.645 σ/√n is a 95% upper confidence bound for μ.
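A sketch of this bound in Python (assuming SciPy), using the stated sample of n = 28 healthy adults:

```python
import math
from scipy.stats import norm

# 95% upper confidence bound on mu: xbar + z_{.95} * sigma / sqrt(n)
xbar, sigma, n = 168.3, 25, 28
z95 = norm.ppf(0.95)                       # about 1.645
upper = xbar + z95 * sigma / math.sqrt(n)  # about 176.1 mg/dL
```

Note that the one-sided bound uses z.95, not z.975: all 5% of the error probability is placed in one tail.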
Note that if cholesterol levels had been normal to begin with, then (*) would have been an exact equality, and our confidence bound would have been an exact 95% bound. Since cholesterol level is non-normal, we used the CLT to establish the approximate relationship given by (*), and our bound is an approximate 95% bound.

So, in the example, the upper bound is given by

  x̄ + 1.645 σ/√n = 168.3 + 1.645(25)/√28 = 176.1.

So, we can be 95% confident that the population mean cholesterol level for healthy adults falls below 176.1.

The general formula for a 100(1 − α)% upper confidence bound on μ based on a sample of size n from a population with standard deviation σ is

  x̄ + z_{1−α} s.e.(x̄) = x̄ + z_{1−α} σ/√n.

A 100(1 − α)% lower confidence bound on μ is given by

  x̄ − z_{1−α} s.e.(x̄) = x̄ − z_{1−α} σ/√n.

The case when σ is unknown:

To this point, we have assumed that we know the population sd σ. Occasionally this may be the case, but typically σ is unknown and must be estimated from the data, just as μ must.

What should we expect this to do to our confidence intervals? Well, if we have to estimate an additional parameter σ, one should expect that to introduce additional uncertainty and make our confidence intervals wider. As we'll see, this intuition is correct.

Student's t distribution:

In the case when σ was known, we based our confidence interval for μ on the fact that for a random sample from a N(μ, σ²) distribution, the sample mean follows a normal distribution:

  Z = (x̄ − μ)/(σ/√n) ∼ N(0, 1).   (†)

(This result is only approximately true for a sample from a non-normal distribution when n is sufficiently large.)

That is, in the known-σ case, we considered the distribution of (x̄ − μ)/(σ/√n) to derive a confidence interval for μ. In the unknown-σ case, therefore, a natural starting point is to consider the distribution of

  t = (x̄ − μ)/(s/√n),

where we've replaced σ by its sample estimate, the sample standard deviation s.

This makes sense, but once we replace σ by s, this quantity no longer follows a standard normal distribution.
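The one-sided bound can be sketched the same way. This is a small illustration using the cholesterol numbers from the notes (n = 28, x̄ = 168.3, σ = 25); the helper name `upper_bound` is ours.

```python
# Sketch of the known-sigma one-sided upper confidence bound for a mean.
from math import sqrt
from statistics import NormalDist

def upper_bound(xbar, sigma, n, conf=0.95):
    """100*conf% upper confidence bound for mu when sigma is known."""
    z = NormalDist().inv_cdf(conf)  # z_{1-alpha}, e.g. z_.95 = 1.645
    return xbar + z * sigma / sqrt(n)

print(round(upper_bound(168.3, 25, 28), 1))  # about 176.1
```

A lower bound would subtract the same margin instead of adding it.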
In fact, it can be shown that t follows a distribution that looks like the normal, but is more spread out. That distribution is called Student's t distribution. It is named after "Student," the pseudonym of the author who discovered it.

Student's t distribution is more spread out because having to estimate σ introduces additional uncertainty (error) and makes t a more variable quantity than Z.

How much more variable? That depends upon how precise s is as an estimate of σ, which is determined by the sample size n, or equivalently, by the divisor n − 1 in the formula for s, which is called the degrees of freedom of the t distribution.

That is, there is a distinct t distribution for every possible value of the degrees of freedom n − 1.

- I.e., the t distribution is a parametric distribution with parameter n − 1, called its degrees of freedom.

As n grows, s becomes a better estimate of σ, and the t distribution gets less spread out relative to the normal. Here are t distributions for degrees of freedom equal to 3, 6, 9, relative to a standard normal distribution.

[Figure: p.d.f.s of the N(0,1), t(3), t(6), and t(9) distributions; vertical lines mark the 97.5th percentiles.]

We denote the t distribution with d degrees of freedom by t(d). Here we have the t(3), t(6), and t(9) distributions as well as the N(0,1). Notice that the spread in the t distribution decreases with the degrees of freedom n − 1.

In fact, if n − 1 is large enough, the t(n − 1) and N(0,1) become almost indistinguishable. Here is the t(30) compared to the N(0,1):

[Figure: p.d.f.s of the N(0,1) and t(30) distributions; vertical lines mark the 97.5th percentiles.]

On these plots we've also plotted vertical lines for z_.975 = 1.96, the 97.5th percentile of the standard normal distribution, as well as the corresponding 97.5th percentiles for the t distributions: t_.975(3), t_.975(6), t_.975(9), and, in the second plot, t_.975(30).

Recall that z_.975 = 1.96 was the multiplier for obtaining a 95% CI for μ in the known-σ case. In that case the 95% CI for μ was given by

  x̄ ± z_.975 s.e.(x̄) = x̄ ± 1.96 σ/√n.

In the unknown-σ case, a 95% CI for μ based on a sample of size n is given by

  x̄ ± t_.975(n − 1) s.e.(x̄) = x̄ ± t_.975(n − 1) s/√n.

Thus, in the unknown-σ case, our multiplier changes from z_.975 to t_.975(n − 1).

- As we can see from the plots, t_.975(n − 1) is a bigger number, especially when n − 1 is small, so we get a wider interval.

Note that for degrees of freedom n − 1 = 30, the z and t multipliers are very close. This observation has led to the often-given rule of thumb that for n − 1 ≥ 30 we can use the z multiplier in place of the t to form a confidence interval for μ even when σ is unknown.

- This replacement introduces some error in the calculation, but not much, especially for n − 1 much larger than 30.

General Formula: In general, for a sample of size n drawn from a normally distributed population with mean μ and variance σ², a 100(1 − α)% CI for μ is given by

  x̄ ± t_{1−α/2}(n − 1) s.e.(x̄) = x̄ ± t_{1−α/2}(n − 1) s/√n.

This interval is approximately correct even if the sample is drawn from a non-normal population, as long as the sample size is large.

Example - Lead Content in Boston Drinking Water

Recall the following data on the lead content (mg/liter) in 12 samples of drinking water in the city of Boston, MA:

  .035 .060 .055 .035 .031 .039 .038 .049 .073 .047 .031 .016

Assuming that lead content is normally distributed, form a 90% CI for the mean lead content in Boston drinking water.
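The t-based interval for the lead data can be sketched as follows. The data are from the notes; since the standard library has no t distribution, the multiplier t_.95(11) = 1.796 is hard-coded here as a table value rather than computed.

```python
# Sketch of the 90% t-based CI for the Boston lead-content data.
from math import sqrt
from statistics import mean, stdev

lead = [.035, .060, .055, .035, .031, .039, .038, .049, .073, .047, .031, .016]
n = len(lead)
xbar = mean(lead)            # sample mean
se = stdev(lead) / sqrt(n)   # s/sqrt(n); stdev uses divisor n - 1
t_mult = 1.796               # t_.95(11), a table value (90% two-sided, 11 df)
lo, hi = xbar - t_mult * se, xbar + t_mult * se
print(round(xbar, 4), round(lo, 4), round(hi, 4))  # about .0424 .0345 .0503
```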
Answer: In this case, we do not know the population mean or population standard deviation, so they must be estimated from the sample data:

  x̄ = (1/n) Σ xi = (1/12)(.035 + ⋯ + .016) = .0424

  s = sqrt{ (1/(n−1)) [ Σ xi² − n x̄² ] } = sqrt{ (1/11)[ (.035² + ⋯ + .016²) − 12(.0424)² ] } = .0153

The standard error of x̄ is

  s.e.(x̄) = s/√n = .0153/√12 = .00441.

Here we want a 90% interval, so 100(1 − α) = 90, or α = .10, and 1 − α/2 = .95.

Since the population sd σ is unknown, we must use the t distribution to form our interval. The sample size is n = 12, so the appropriate degrees of freedom is n − 1 = 12 − 1 = 11. Going to the back of our book, Table A.4, we find

  t_{1−α/2}(n − 1) = t_.95(11) = 1.796.

Therefore, a 90% CI for μ is given by

  x̄ ± t_{1−α/2}(n − 1) s.e.(x̄) = .0424 ± (1.796)(.00441) = (.0345, .0503).

We are 90% confident that the mean lead content in Boston drinking water lies between .0345 and .0503 mg/liter.

Note that this is an exact 90% interval because the sample was drawn from a normal population. However, this would be an approximate 90% CI if lead content were not normally distributed, provided that the sample size was large enough for the CLT to hold.

How large does the sample size have to be for the CLT to hold? Tough question. It depends upon how close to normal the population is that we sampled from.

- If we drew a sample from a very non-normal population (highly skewed and/or highly discrete), then a larger sample size is required in order for sample means from that population to follow a normal sampling distribution.

- If the population we drew from to begin with is nearly normal, though, a much smaller sample size may suffice.

The sample size necessary for the CLT to hold can be quite small in some cases, as small as n = 5 sometimes, but to be safe, we generally need samples of size 25 or more to be fairly confident that the normal distribution will provide a good approximation to the sampling distribution of x̄.

Hypothesis Testing*

The other main aspect of statistical inference (besides point and interval estimation) is hypothesis testing.
In hypothesis testing we make a decision about the true state of the population based upon what we know concerning the sample. This decision is guided by probability.

A good metaphor for the approach used in statistical hypothesis testing is the American legal system. "Innocent until proven guilty" means:

- we assume innocence;
- we collect and examine evidence to contradict innocence;
- if the evidence is strongly against innocence (beyond a reasonable doubt), we reject innocence and conclude the alternative, guilt;
- if not, we haven't proven innocence; we have only failed to prove guilt, and the assumption of innocence is maintained.

The prosecutor's hypothesis is that the defendant is guilty, so he/she assumes the opposite and tries to disprove it.

In statistical hypothesis testing, the researcher plays the role of the prosecutor. His/her research hypothesis is "guilt," so he/she assumes the opposite, which is called the null hypothesis and is typically represented as H0 ("H naught"). For example,

  H0: defendant is innocent
or
  H0: no association between obesity and diabetes

* Read Ch. 10 of our text.

The hypothesis that the researcher is trying to prove is called the alternative hypothesis, denoted HA, or sometimes H1. For example,

  HA: defendant is not innocent (guilty)
or
  HA: there is an association between obesity and diabetes

The alternative hypothesis is always framed in such a way that it is the only other possibility under consideration if the null hypothesis is not true. That is, the alternative hypothesis is

  HA: not H0.

Typically, we are interested in the true state of nature in the population, operationalized in terms of the true value of some parameter or parameters. The simplest case: we want to test a hypothesis about a population mean μ.

Example - Birthweights of SIDS Cases

Based on nationwide surveys of millions of deliveries, the mean birthweight in the US is 3300 g, with a standard deviation of 800 g.
We want to investigate whether the population mean birthweight of SIDS cases is different from that of the general population.

Recall that in 1976-77, there were 78 SIDS cases in King County, WA. The sample mean birthweight among the King Co. cases was x̄ = 2994 g.

We will assume that these cases are a random sample from the population of SIDS cases nationwide (a strong, questionable assumption). We will also assume that SIDS birthweights are normally distributed, with population sd σ = 800, the same as in the general population.

Is the mean birthweight among SIDS cases different than in the general population?

Let μ be the population mean birthweight among SIDS cases in the US. The null hypothesis is what we want to disprove. In this case, then, our null hypothesis is

  H0: μ = 3300 g.

The value that we assume for μ under the null hypothesis is called the null value for μ and is denoted μ0. That is, our null hypothesis is of the form

  H0: μ = μ0, where μ0 = 3300 g.

How about the alternative hypothesis? Here, there are three possibilities for the truth:

  μ < μ0,  μ = μ0,  or  μ > μ0.

In a one-sided alternative hypothesis situation, the researcher/analyst makes an a priori assumption and dismisses either μ < μ0 or μ > μ0 as out of the realm of possibility.

In the SIDS example, the researcher may be willing to assume a priori that there is no possible way that the mean SIDS birthweight could be greater than in the general population. In that case HA: not H0 becomes

  HA: μ < μ0, where μ0 = 3300 g.

In a two-sided alternative hypothesis situation, the researcher/analyst makes no such a priori assumption, so that the alternative hypothesis becomes

  HA: μ ≠ μ0, where μ0 = 3300 g.

We will concentrate on one-sided alternatives first, and then discuss how things change when we instead use a two-sided alternative.
Type I and II Errors

In performing a hypothesis test there are two possible states of nature and two possible conclusions that can be made:

                         State of Nature
  Conclusion             H0 is true      H0 is false
  Fail to Reject H0      Correct         Type II Error
  Reject H0              Type I Error    Correct

We can make errors in two ways:

  I. We can incorrectly reject H0 (a Type I Error).
  II. We can incorrectly fail to reject H0 (a Type II Error).

Ideally, we rarely make errors of either type. Let

  α = P(we make a Type I error)
  β = P(we make a Type II error)

We would like to simultaneously minimize both α and β. However, the only one of these that we have complete control over is α. β depends upon how false the null hypothesis is.

- Why? Because if the true value of μ is far from μ0, it's a lot easier to reject H0: μ = μ0 than if μ is closer to μ0.

So, we construct our test in such a way that α is small.

Example - Birthweights of SIDS Cases (Continued)

Suppose that we are interested in the one-sided alternative, so we want to test

  H0: μ = μ0 versus HA: μ < μ0, where μ0 = 3300 g.

That is, we're willing to dismiss the possibility that SIDS cases might have birthweights greater than the general population.

Given that we don't know μ, how do we decide in favor of H0 or HA? Answer: we look at how much smaller x̄ is than μ0. If x̄ is much smaller than μ0 = 3300, then there's strong evidence against H0 and we reject H0 in favor of HA.

- Suppose x̄ had been 1100 g. That seems very far from μ0 = 3300, so we would have little trouble concluding in favor of HA.
- However, what if x̄ had been 3250 g? That's smaller than μ0 = 3300, but is it small enough to conclude that μ < 3300?

How much smaller than 3300 must x̄ be before we're willing to conclude that μ < 3300?

In the legal system the evidence against the null hypothesis of innocence must be "beyond a reasonable doubt." In hypothesis testing, "beyond a reasonable doubt" is quantified by α, the probability that we reject H0 when it is really true (the probability of convicting an innocent person).

We set this probability low, to some pre-specified level called the significance level of the test.

- The conventional choice for the significance level is α = .05, but this is just convention. Other values such as α = .1 or α = .01 are also sometimes used.

How do we set the significance level low? By requiring x̄ to be smaller than μ0 by enough that such an extreme value would be unlikely if the null hypothesis were true.
We set this probability low, to some pre-speci ed level called the signi cance level of the test. { The conventional choice for the signi cance level is = :05, but this is just convention. Other values such as = :1 or = :01 are also sometimes used. How do set the signi cance level low? By requiring x to be smaller than 0 by enough so that such an extreme value would be unlikely, if the null hypothesis were true. 141 So, we look at how unlikely it is, given that the null hypothesis is true, to have observed an x at least as far from 0 as the one we got. This probability, the probability of obtaining a result at least as unlikely as the one obtained, given that the null hypothesis is true, is called the p-value of the test. Then if the p-value is small enough (smaller than the pre-speci ed signi cance level ), then we reject H0 . In the SIDS example, suppose we decide to test H0 : = 3300 using signi cance level = :05. That is, we are going to require that x be fairly unusually small, something that occurs 5% of the time, assuming that the null hypothesis is true, before we decide that the null hypothesis isn't really true. Recall that in our sample of n = 78 cases, the sample mean was x = 2994 g. This is less than 0 = 3300, but how unlikely is it to get a sample mean that's as small as 2994 given that the population mean is = 3300 (given that H0 is true)? That is, what's the p-value here? 142 Since we assumed that SIDS birthweights are N ( then based on a sample size of n = 78, xN 2 ), where = 800, 8002 n = N 3300 78 2 assuming that H0 : = 3300 is true. Therefore, our p-value is p = P (x 2994) = P (x ; 2994 ; ) ; p = P x=; n 2994pn = ; = P Z 2994p 800= 78 p = P Z 2994 ; 3300 assuming H0 : = 3300 is true 800= 78 = P (Z ;3:38) = P (Z 3:38) = :00036 So, under the null hypothesis, we would expect to get a sample mean at least as small as the one we got with probability .00036 (or only .036% of the time). 
Therefore, either H0: μ = 3300 is true and we've observed a very unusual event, or H0: μ = 3300 is not true.

Since p = .00036 is less than our chosen significance level of α = .05, it's such an unusual event under H0 that we're willing to reject H0: μ = 3300 in favor of HA: μ < 3300.

Steps in a statistical hypothesis test:

1. State the research question in terms of the null and alternative hypotheses.
 - In the previous SIDS example, H0: μ = μ0 versus HA: μ < μ0, where μ0 = 3300 g.

2. Specify the significance level.
 - In the SIDS example, we used α = .05.

3. Choose an appropriate test statistic.
 - In the SIDS example, since we're testing a hypothesis on the population mean μ, we based our test on the sample mean x̄.
 - Specifically, however, we looked at how much smaller x̄ was than μ0 relative to the standard error of x̄, σ/√n. That is, we looked at the test statistic:
   z = (x̄ − μ0)/(σ/√n).

4. Collect the data and compute the necessary sample statistics and test statistic.
 - We collected the data and computed x̄ and then the test statistic z, which turned out to be z = −3.38.

5. Calculate the p-value, compare it to the significance level α, and state the conclusion. It is good practice to report not only the result of the test (reject, fail to reject) but also the numeric value of the test statistic and the numeric p-value.
 - We found that p = .00036, so we rejected H0: μ = 3300 in favor of μ < 3300. Our conclusion was that the population mean birthweight for SIDS cases is less than 3300 g, the mean birthweight in the general population (z = −3.38, p = .00036).

We have emphasized the p-value approach to making the decision whether to reject, or fail to reject, the null hypothesis: compute the p-value and reject if p < α. This is the preferred method of conducting the test, but you should be aware that there is another, equivalent approach for reaching our conclusion, known as the critical value approach.

To understand the critical value approach, think back to our SIDS example.
There, we observed x̄ = 2994, which was low relative to the null value of μ0 = 3300. This led to a test statistic of z = (x̄ − μ0)/(σ/√n) = −3.38.

Notice that if x̄ had been closer to μ0, then

- the test statistic would have been closer to 0,
- and the p-value would have been larger.

E.g., if x̄ had been 3250, say, then the test statistic would have been

  z = (x̄ − μ0)/(σ/√n) = (3250 − 3300)/(800/√78) = −.55,

which has p-value p = P(Z < −.55) = .291, which is > α = .05, and we would not have rejected H0.

Thus different values of x̄ lead to different test statistics.

The rejection region of a test is the set of values of the test statistic which lead to rejection of H0. Equivalently, it is the set of values that lead to p-values less than α.

For a given level α, the critical value of a test statistic is the boundary of the rejection region. I.e., it is the value of the test statistic which is just barely large enough in magnitude to lead to rejection of H0 at a given significance level α.

In the SIDS example, and in general for testing H0: μ = μ0 versus a one-sided alternative for a normal sample with known sd σ, our test statistic is

  z = (x̄ − μ0)/(σ/√n), distributed as N(0, 1) under H0.

[Figure: p.d.f. of z under H0, marking z_.05, the 5th percentile of N(0,1), and z = −3.38, the value of our test statistic.]

The solid vertical line is z_.05 = −1.645, the 5th percentile of the N(0,1) distribution. That is, the area under the curve to the left of that line is .05.

Therefore, for an α = .05-level test, if our test statistic had turned out to be < −1.645, then we would reject H0; if it had been > −1.645, then we would have failed to reject H0.

Thus, (−∞, −1.645) is the rejection region of our test, (−1.645, +∞) is the acceptance region, and −1.645 is the critical value because it is the boundary of the rejection region.
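The critical-value decision rule can be sketched in code as well, again using the SIDS numbers from the notes: compute the boundary z_.05 and check whether the observed statistic falls in the rejection region.

```python
# Sketch of the critical-value approach for the one-sided SIDS z test.
from math import sqrt
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(alpha)      # z_.05, about -1.645
z = (2994 - 3300) / (800 / sqrt(78))      # observed test statistic
reject = z < z_crit                       # rejection region is (-inf, z_.05)
print(round(z_crit, 3), reject)           # about -1.645 True
```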
Thus, instead of computing the p-value of our observed test statistic z = −3.38 and comparing it to α = .05, we could instead have compared z = −3.38 to the critical value z_.05 = −1.645. Since z = −3.38 < z_.05 = −1.645, we reject H0.

What if our alternative hypothesis had been HA: μ > μ0 rather than HA: μ < μ0? In that case, we would have been looking for large values of our test statistic z = (x̄ − μ0)/(σ/√n). In particular, we would have rejected H0 in favor of HA: μ > μ0 if

  z = (x̄ − μ0)/(σ/√n) > z_.95 = 1.645.

Notice that for either direction of the one-sided alternative, we rejected H0 if |z| > z_.95 = 1.645.

General method for an α-level test of H0: μ = μ0 versus a one-sided alternative based on a sample of size n from the N(μ, σ²) distribution when σ² is known:

Critical value approach: reject H0 if x̄ − μ0 is consistent with the alternative hypothesis and

  |z| = |x̄ − μ0| / (σ/√n) > z_{1−α}.

Otherwise, we fail to reject.

p-value approach: reject H0 if p < α. Let Z denote a N(0,1) random variable, and z the value of our test statistic. The p-value is computed as

  p = P(Z < z) if the alternative is HA: μ < μ0,
  p = P(Z > z) if the alternative is HA: μ > μ0.

The Case When σ is Unknown:

If σ is unknown, the logic of testing H0: μ = μ0 doesn't change at all. However, the test statistic z = (x̄ − μ0)/(σ/√n) is no longer available to us, because σ is unknown. Instead, we do the obvious thing and replace σ by its sample estimate, s. That substitution changes our test statistic from z to t, where

  t = (x̄ − μ0)/(s/√n)

and s is the sample standard deviation.

Of course, this also changes the distribution of our test statistic. While z ∼ N(0,1) under H0, t ∼ t(n − 1) under H0. This affects how we compute the p-value and critical value for our test, but not the basic logic of the testing procedure or the steps taken to implement the test.
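The t statistic just described can be sketched numerically. This uses the summary statistics of the myocardial infarction example that follows (n = 8, x̄ = 16, s = 10, μ0 = 25); since the standard library has no t distribution, this sketch assumes SciPy is available for the tail probability.

```python
# Sketch of the one-sample t statistic and its lower-tail p-value.
from math import sqrt
from scipy import stats

n, xbar, s, mu0 = 8, 16.0, 10.0, 25.0
t = (xbar - mu0) / (s / sqrt(n))   # sigma replaced by the sample sd s
p = stats.t.cdf(t, df=n - 1)       # lower-tail p-value for H_A: mu < mu0
print(round(t, 2), round(p, 3))    # about -2.55 and 0.019
```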
Myocardial Infarction (Heart Attack)

A topic of recent clinical interest is the possibility of using drugs to reduce the size of the infarct (area of tissue death due to loss of blood flow) in patients who have had a myocardial infarction within the last 24 hours. Suppose we know that in untreated patients, the mean infarct size is 25 (ck-g-EQ/m²). Furthermore, in 8 patients treated with a drug, the sample mean infarct size was 16 with a sample standard deviation of s = 10.

Do the treated patients have smaller than average infarct size?

Let μ = population mean infarct size for patients treated with the drug. Then the hypotheses that we'd like to test are

  H0: μ = μ0 versus HA: μ < μ0, where μ0 = 25.

Suppose we want an α = .05-level test.

The logic here remains the same as before. Since we're interested in a population mean μ, we examine the sample mean x̄. Specifically, we calculate the p-value: the probability of observing a sample mean at least as extreme (as small, in this case) as the one we got (16), under the null hypothesis that μ = 25:

  p = P(x̄ ≤ 16)
    = P((x̄ − μ0)/(s/√n) ≤ (16 − μ0)/(s/√n))   (the left-hand quantity is distributed t(n − 1), assuming H0: μ = μ0 is true)
    = P(t(n − 1) ≤ (16 − 25)/(10/√8))
    = P(t(7) ≤ −2.55)
    = .0191.

Here, P(t(7) ≤ −2.55) = .0191 was computed in Minitab.

Conclusion: since p = .0191 < α = .05, we reject H0: μ = 25 and conclude that the mean infarct size among treated patients is smaller than the average infarct size of untreated patients.

The basic steps of hypothesis testing haven't changed:

1. State the research question in terms of the null and alternative hypotheses.
 - H0: μ = μ0 versus HA: μ < μ0, where μ0 = 25.

2. Specify the significance level.
 - We used α = .05.

3. Choose an appropriate test statistic.
 - We based our test on the sample mean x̄ and formed a test statistic equal to
   t = (x̄ − μ0)/(s/√n).

4. Collect the data and compute the necessary sample statistics and test statistic.
 - We collected the data and computed x̄ = 16 and then the test statistic t, which turned out to be t = −2.55.

5. Calculate the p-value, compare it to the significance level α, and state the conclusion.
 - We found that p = .0191 < α = .05, so we rejected H0: μ = 25 in favor of μ < 25.

[Figure: p.d.f. of the test statistic t = (x̄ − μ0)/(s/√n) under H0: μ = μ0, marking t_.05(7), the 5th percentile of t(7), and t = −2.55, the value of our test statistic.]

In the critical value approach, instead of comparing p to .05, we would compare our test statistic t to the critical value t_.05(7), the 5th percentile of the t distribution on n − 1 = 7 degrees of freedom.

Equivalently, we can compare |t| = |−2.55| = 2.55 to t_.95(7), which is just −1 times t_.05(7). From Table A.4 in the back of our book, t_.95(7) = 1.895, so since |t| = 2.55 > 1.895, we reject H0 at level α = .05.

General method for an α-level test of H0: μ = μ0 versus a one-sided alternative based on a sample of size n from the N(μ, σ²) distribution when σ² is unknown:

Critical value approach: reject H0 if x̄ − μ0 is consistent with the alternative hypothesis and

  |t| = |x̄ − μ0| / (s/√n) > t_{1−α}(n − 1).

Otherwise, we fail to reject.

p-value approach: reject H0 if p < α. Let t(n − 1) denote a random variable with this distribution, and t the value of our test statistic. The p-value is computed as

  p = P(t(n − 1) < t) if the alternative is HA: μ < μ0,
  p = P(t(n − 1) > t) if the alternative is HA: μ > μ0.

Two-sided Alternatives:

Often, we are not willing to dismiss either μ > μ0 or μ < μ0, the two possible alternatives to μ = μ0. In such cases, the appropriate set of hypotheses to test is

  H0: μ = μ0 versus HA: μ ≠ μ0.

How does this affect our testing procedure? Again, the answer is that it doesn't really change the logic or the steps in the procedure; it just changes how we compute the p-value or critical value.

Example - Serum Cholesterol of Asians vs.
Americans

Suppose we want to compare the mean serum-cholesterol level among recent Asian immigrants to the US with the population mean in the US.

Suppose we assume that cholesterol levels in women aged 21-40 years in the US are normally distributed with population mean 190 mg/dL and population sd 40 mg/dL. Suppose that we take a random sample of n = 100 recent Asian immigrant women in this age range and measure cholesterol level on these subjects. The average cholesterol level in this sample was x̄ = 181.52 mg/dL, and we are willing to assume the population SD among these Asian immigrants is σ = 40, the same as it is among Americans.

Is the mean cholesterol level among recent Asian immigrant women the same as that of the corresponding general US population?

Steps for conducting a hypothesis test to address this question:

1. State the research question in terms of the null and alternative hypotheses.
 - Let μ = the mean cholesterol level among the Asian immigrant population. Then our hypotheses are H0: μ = μ0 versus HA: μ ≠ μ0, where μ0 = 190.
 - This is a two-sided alternative situation because if the Asians differ from the general US population, we can't be sure whether their cholesterol levels will be lower or higher.

2. Specify the significance level.
 - We'll stick with α = .05 for now.

3. Choose an appropriate test statistic.
 - Since we're interested in μ and whether it differs from μ0, it still makes sense to examine x̄ and how far it differs from μ0.
 - In addition, we know σ here.
 - Therefore, it still makes sense to base inference on the test statistic
   z = (x̄ − μ0)/(σ/√n).

4. Collect the data and compute the necessary sample statistics and test statistic.
 - We collected the data and computed x̄ = 181.52. The test statistic is computed as
   z = (x̄ − μ0)/(σ/√n) = (181.52 − 190)/(40/√100) = −2.12.

5. Calculate the p-value, compare it to the significance level α, and state the conclusion.
 - Here's where things differ from the one-sided alternative case.
 - The p-value is the probability of getting a result at least as extreme as the one that we obtained. That is, the probability of a result which provides evidence at least as strong against the null hypothesis (in favor of the alternative).

[Figure: p.d.f. of z under H0, marking z = −2.12, the value of our test statistic, and z = 2.12, a test statistic equally in favor of H_A.]

 - In the picture above, our test statistic is z = −2.12. Notice that any value of the test statistic ≤ −2.12, and any value ≥ 2.12, would provide at least as much evidence in favor of HA: μ ≠ μ0.
 - Therefore, the p-value here is computed as
   p = P(Z ≤ −2.12) + P(Z ≥ 2.12) = 2P(Z ≥ 2.12) = 2(.017) = .034.
 - Notice that this is exactly twice as large as the p-value we would have obtained for a one-sided alternative HA: μ < μ0.

Since p = .034 < α = .05, we reject H0 and conclude that recent Asian immigrant women between the ages of 21 and 40 years have a different (in this case lower) mean cholesterol level than the corresponding US population (z = −2.12, p = .034).

To understand how the critical value approach differs in the two-sided alternative case, a picture is again helpful:

[Figure: p.d.f. of z, marking z = −2.12, the value of our test statistic, with z_.025 = −1.96 as the lower boundary and z_.975 = 1.96 as the upper boundary of the acceptance region.]

The solid line at z_.025 = −1.96 is the value such that 2.5% of the area under the curve falls to the left of that line.

Since the p-value in a two-sided alternative situation is twice the probability in one tail, a value of the test statistic equal to z_.025 = −1.96 would have had a p-value of p = 2(.025) = .05. Similarly, if the value of the test statistic had been equal to z_.975, the p-value would also have been p = 2(.025) = .05.

Therefore, the rejection region of our test includes all values of the test statistic ≤ z_.025 = −1.96 and all values ≥ z_.975 = 1.96.
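The two-sided z test above can be sketched directly, using the cholesterol numbers from the notes (n = 100, x̄ = 181.52, σ = 40, μ0 = 190). The key difference from the one-sided case is the doubling of the tail probability.

```python
# Sketch of the two-sided z test for the cholesterol example.
from math import sqrt
from statistics import NormalDist

n, xbar, sigma, mu0 = 100, 181.52, 40.0, 190.0
z = (xbar - mu0) / (sigma / sqrt(n))
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value: 2*P(Z > |z|)
print(round(z, 2), round(p, 3))         # about -2.12 and 0.034
```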
Thus, there are two boundaries of the rejection region and hence two critical values: z_.025 = −1.96 and z_.975 = 1.96. So, based on the critical value approach, we would reject H0 if our test statistic z < z_.025 or if z > z_.975. Equivalently, we reject H0 at level α = .05 if

  |z| > z_.975 = z_{1−.025} = z_{1−.05/2}.

General method for an α-level test of H0: μ = μ0 versus a two-sided alternative HA: μ ≠ μ0 based on a sample of size n from the N(μ, σ²) distribution when σ² is known:

Critical value approach: reject H0 if

  |z| = |x̄ − μ0| / (σ/√n) > z_{1−α/2}.

Otherwise, we fail to reject.

p-value approach: reject H0 if p < α. Let Z denote a N(0,1) random variable, and z the value of our test statistic. The p-value is computed as

  p = 2P(Z > |z|).

General method for an α-level test of H0: μ = μ0 versus a two-sided alternative HA: μ ≠ μ0 based on a sample of size n from the N(μ, σ²) distribution when σ² is unknown:

Critical value approach: reject H0 if

  |t| = |x̄ − μ0| / (s/√n) > t_{1−α/2}(n − 1).

Otherwise, we fail to reject.

p-value approach: reject H0 if p < α. Let t(n − 1) denote a random variable distributed as t(n − 1), and t the value of our test statistic. The p-value is computed as

  p = 2P(t(n − 1) > |t|).

Example - Serum-Creatinine

The mean serum-creatinine level measured in 12 patients 24 hours after they received a newly proposed antibiotic was 1.2 mg/dL. The sample sd was 0.6 mg/dL. Suppose that it is known that the general population has a mean serum-creatinine of 1.0 mg/dL. Does the population mean serum-creatinine level among patients treated with the antibiotic differ from that of the general population?

We assume that serum-creatinine in the population of interest is normally distributed with mean μ and unknown variance. We also assume that the 12 patients are randomly sampled from the population of interest (all patients given this antibiotic).

1. H0: μ = μ0 versus HA: μ ≠ μ0, where μ0 = 1.0.

2. For variety's sake, let's test at α = .01 for a change.

3. We test based on t = (x̄ − μ0)/(s/√n).
 - We reject H0 at level α if |t| > t_{1−α/2}(n − 1).
 - Or, equivalently, we reject H0 at level α if p < α.

4. x̄ = 1.2, s = 0.6, so our test statistic is

  t = (x̄ − μ0)/(s/√n) = (1.2 − 1.0)/(0.6/√12) = 1.15.

5. The p-value is

  p = 2P(t(n − 1) > |t|) = 2P(t(11) > 1.15) = 2(.1373) = .2746.

Since p > α = .01, we fail to reject H0.

Equivalently, we could compare |t| to the critical value

  t_{1−α/2}(n − 1) = t_{1−.01/2}(11) = t_.995(11) = 3.106.

Since |t| = 1.15 < t_.995(11) = 3.106, we fail to reject H0.

Conclusion: There is insufficient evidence to conclude that the mean serum-creatinine level among patients treated with the antibiotic differs from the mean serum-creatinine in the general population.

Power and Sample Size

Recall from our discussion of error types when conducting a statistical hypothesis test that

  α = P(we make a Type I error) = P(reject H0 when H0 is true)
  β = P(we make a Type II error) = P(not reject H0 when H0 is false)

We construct our test in such a way as to ensure that α is equal to some pre-specified small value (e.g., α = .05). That is, we constructed our test to control α to be small. We'd like β to be small too, but we noted that β depends upon "how false" the null hypothesis is.

Example - Birthweights of SIDS Cases

Recall that we had a sample of n birthweights of SIDS babies with a sample mean of x̄ = 2994. We assumed that σ = 800, and we used the 1-sample z test to test

  H0: μ = μ0 versus HA: μ < μ0, where μ0 = 3300.

[Figure: two panels, each with sample size n = 15, showing the true p.d.f. of x̄ and the null value μ0 = 3300; the true mean is μ = 3250 in the left panel and μ = 2850 in the right panel.]

Here, we've assumed that the true value of μ, the mean birthweight of SIDS cases, is μ = 3250 on the left and μ = 2850 on the right.
We've also assumed a sample size of n = 15, so that the true distribution of x̄ is

x̄ ~ N(μ, σ²/n) = N(μ, 800²/15) = N(μ, 42666.67).

Clearly, β is smaller when μ is far from μ0 (in the plot on the right).

β is the probability of failing to reject H0 when it is false. That is, failing to detect the truth of HA. (In the current context, β is the probability of failing to detect a true difference between μ and μ0.)

Often it is more convenient to think in terms of the probability of detecting the truth of HA (detecting a difference between μ and μ0). This probability is called the power of the test, and it is simply

power = P(rejecting H0 | H0 is false) = 1 − P(not rejecting H0 | H0 is false) = 1 − β.

Thus, the further μ is from μ0, the smaller β is and the larger the power is.

- It is easier to reject H0 (power is high) when H0 is "very false" (plot on the right) than when H0 is only slightly false (plot on the left).

As we noted, though, we can't control how false H0 is, because we can't control the true population mean μ.

However, power also depends upon the spread in the distribution of x̄. Suppose that instead of the picture on the previous page, we had less spread in the distribution of x̄:

[Figure: the same two plots of the true p.d.f. of x̄ (true mean μ = 3250 on the left, μ = 2850 on the right; null value μ0 = 3300), but with sample size n = 78.]

Clearly, now it is easier to reject H0 in both cases. This is because the spread in the distribution of x̄ has decreased:

x̄ ~ N(μ, σ²/n) = N(μ, 800²/78) = N(μ, 8205.13).

That is, the less spread in the distribution of x̄, the greater the power. The spread in the distribution of x̄ is quantified by var(x̄) = σ²/n. So, this spread depends on

- σ². (Power increases as σ² decreases.)
- n, the sample size. (Power increases as n increases.)
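The dependence of power on n is easy to check numerically. A minimal sketch in Python (scipy assumed available; it applies the rejection rule of the one-sided z test, x̄ < μ0 − z_{1−α}·σ/√n at α = .05, to the SIDS setup with true mean μ = 2850):

```python
from math import sqrt
from scipy.stats import norm

mu0, mu_true, sigma, alpha = 3300, 2850, 800, 0.05

power = {}
for n in (15, 78):
    se = sigma / sqrt(n)                       # SD of xbar
    cutoff = mu0 - norm.ppf(1 - alpha) * se    # reject H0 if xbar < cutoff
    # power = P(xbar < cutoff) when the true mean is mu_true
    power[n] = norm.cdf(cutoff, loc=mu_true, scale=se)

print({n: round(pw, 3) for n, pw in power.items()})
# power is about .70 at n = 15 and essentially 1 at n = 78
```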
Note that we can't control σ², but we can control n, the sample size, when we design the study. So, power and sample size are intimately related. A given sample size implies a certain power, and a certain power implies a certain sample size.

Typically, at the design stage of a study, the specific hypothesis test that will be used to analyze the study is identified, and then the minimum sample size is determined so as to achieve a prespecified desired level of power.

- Typically, it is desirable to have power of 80% or higher. Otherwise, there's a pretty good chance (20%) that we won't be able to detect the difference (effect) we are interested in even if it's real, which makes the study not worth doing.

Of course, power depends upon a variety of other factors besides sample size. It depends on

i. the sample size.
   - The larger the sample size, the greater the power.
   - Can be controlled in the design of the study.
ii. the true difference we are trying to detect (how false H0 is, or the true difference μ − μ0).
   - Bigger differences are easier to detect (result in higher power).
   - Unknown, so must be assumed.
iii. the population SD σ.
   - The less variable the population is (smaller σ), the easier it is to detect effects (it is easier to detect a signal when there's not much noise (static)).
iv. α, the significance level.
   - Similar to sensitivity and specificity in diagnostic testing, there's a trade-off between α and β (and hence between α and power).
   - Decreasing α makes it harder to reject H0, which decreases power (increases β).

To understand the trade-off between α and β, recall that in the one-sample z test of H0: μ = μ0 versus HA: μ < μ0, we reject H0 if

z = (x̄ − μ0)/(σ/√n) < z_α = −z_{1−α},

or, equivalently, if

x̄ < μ0 − z_{1−α} σ/√n.

Consider the following picture:

[Figure: two plots of the true p.d.f. of x̄ (true mean μ = 2850, null value μ0 = 3300, σ = 800, n = 78), with the rejection cutoff x̄ = μ0 − z_{1−α}σ/√n marked by a dashed line; α = .05 on the left and α = .001 on the right.]
In both pictures, the true mean is μ = 2850, the true population SD is σ = 800, and the sample size is n = 78. In the picture on the left we are testing at α = .05, and on the right we are testing at α = .001.

- Note that decreasing α makes it harder to reject H0: μ = μ0 = 3300, so we need to observe a value of x̄ which is more inconsistent with the null hypothesis. That is, we need to observe a smaller x̄ to reject H0 if α is small.
- In the plot on the left, α = .05, so we reject if

  x̄ < μ0 − z_{1−.05} σ/√n = 3300 − 1.645(800/√78) = 3151.01 = dashed line,

  and on the right, α = .001, so we reject if

  x̄ < μ0 − z_{1−.001} σ/√n = 3300 − 3.090(800/√78) = 3020.08 = dashed line.

If the true population mean is μ = 2850, then the bell-shaped curve in the pictures is the true p.d.f. of x̄.

- The area under that curve to the right of the dashed line is β, the probability of getting a value of x̄ that would lead us to fail to reject H0 even though it is false.

So, as α decreases, β increases, and hence the power decreases too.

Example | Determining Power for A Proposed Study

A new drug is proposed for people with high intraocular pressure (IOP), to prevent the development of glaucoma. A pilot study was conducted with the drug among 10 patients, and their mean IOP decreased by 5 mm Hg with a SD of 10 mm Hg after 1 month of using the drug.

The investigators propose to study n = 50 patients in the main study. What would the power of such a study be to detect a reduction of 5 mm Hg after 1 month of use of the drug?

For now, we will assume that the true population SD is known to be σ = 10 as obtained in the pilot study.
We will also assume that the test to be used will be an α = .05-level z test of H0: μ = μ0 with a one-sided alternative HA: μ < μ0.

- Here μ0 is the population mean IOP among untreated subjects. Of course, this null value is known.

The power is given by

power = P(reject H0 given that H0 is false and μ − μ0 = −5)
      = P( (x̄ − μ0)/(σ/√n) < −z_{1−α} | μ0 = μ + 5 )
      = P( (x̄ − μ − 5)/(σ/√n) < −z_{1−α} )
      = P( (x̄ − μ)/(σ/√n) − 5√n/σ < −z_{1−α} )
      = P( (x̄ − μ)/(σ/√n) < −z_{1−α} + 5√n/σ )
      = P( Z < −z_{1−α} + 5√n/σ )
      = P( Z < −1.645 + 5√50/10 )
      = P( Z < 1.89 ) = 1 − P(Z ≥ 1.89) = 1 − .029 = .971.

What if we had used a two-sided alternative? In the IOP example, suppose instead that we wished to test

H0: μ = μ0 versus HA: μ ≠ μ0.

In this case, we would reject H0 if

|x̄ − μ0|/(σ/√n) > z_{1−α/2},

or, equivalently, if

(x̄ − μ0)/(σ/√n) < −z_{1−α/2}  or  (x̄ − μ0)/(σ/√n) > z_{1−α/2}.

Thus, the power is given by

power = P(reject H0 given that H0 is false and μ − μ0 = −5)
      = P( (x̄ − μ0)/(σ/√n) < −z_{1−α/2} | μ0 = μ + 5 ) + P( (x̄ − μ0)/(σ/√n) > z_{1−α/2} | μ0 = μ + 5 )
      = P( (x̄ − μ − 5)/(σ/√n) < −z_{1−α/2} ) + P( (x̄ − μ − 5)/(σ/√n) > z_{1−α/2} )
      = P( (x̄ − μ)/(σ/√n) − 5√n/σ < −z_{1−α/2} ) + P( (x̄ − μ)/(σ/√n) − 5√n/σ > z_{1−α/2} )
      = P( Z < −z_{1−α/2} + 5√n/σ ) + P( Z < −z_{1−α/2} − 5√n/σ )
      = P( Z < −1.96 + 5/(10/√50) ) + P( Z < −1.96 − 5/(10/√50) )
      = P(Z < 1.58) + P(Z < −5.50)
      = 1 − P(Z ≥ 1.58) + P(Z > 5.50) = 1 − .057 + 0.000 = .943.

Note that the test with a one-sided alternative is more powerful than the test with a two-sided alternative.

General Result for Power of the One-Sample z Test:

The power of an α-level one-sample z test of H0: μ = μ0 (normal population, known population variance σ²) is given by

power = P( Z < −z_{1−α} + |Δ|√n/σ )                                  for a one-sided alternative
power = P( Z < −z_{1−α/2} − Δ√n/σ ) + P( Z < −z_{1−α/2} + Δ√n/σ )    for a two-sided alternative

where Δ = μ − μ0, the difference between the true population mean and the null value μ0. Δ here is the effect we want to detect. In the example it was Δ = −5, a reduction of 5 mm Hg in IOP.
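The general power result can be wrapped in a small function. A sketch in Python (scipy assumed available; the function name and signature are our own), reproducing the one- and two-sided IOP calculations:

```python
from math import sqrt
from scipy.stats import norm

def z_test_power(delta, sigma, n, alpha=0.05, two_sided=False):
    """Power of the one-sample z test, where delta = mu - mu0."""
    shift = abs(delta) * sqrt(n) / sigma
    if two_sided:
        za = norm.ppf(1 - alpha / 2)
        # sum of the two rejection probabilities; the second term is tiny
        return norm.cdf(-za + shift) + norm.cdf(-za - shift)
    return norm.cdf(-norm.ppf(1 - alpha) + shift)

# IOP example: delta = -5 mm Hg, sigma = 10, n = 50
print(round(z_test_power(-5, 10, 50), 3))                  # one-sided: 0.971
print(round(z_test_power(-5, 10, 50, two_sided=True), 3))  # two-sided: about .94
```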
Sample Size:

Typically, at the design stage we fix power at a desired level and compute the sample size necessary to achieve that power, rather than the other way around.

One way to determine sample size for a given power is to use the methods we've just outlined to figure out the power for each of a range of values for n. Then select the smallest n that gives power at least equal to the power we want.

E.g., suppose we want to determine the minimum sample size necessary to ensure at least 90% power for the IOP example using a one-sided alternative and a z test. Then, repeating the calculations of p. 165 for several n values, we get:

n      10     15     20     25     30     35     40
Power  .4746  .6147  .7228  .8038  .8630  .9054  .9354

Narrowing our search, we find:

n      30     31     32     33     34     35
Power  .8630  .8727  .8817  .8902  .8981  .9054

So we need a sample size of n = 35 to achieve power of at least .90 (90%) given our set of assumptions.

Alternatively, we can reason as follows to solve the problem more directly (rather than by trial and error). For a z test with a one-sided alternative, we determined that

power = P( Z < −z_{1−α} + |Δ|√n/σ ).

If we want power equal to p, say, then this implies that the quantity −z_{1−α} + |Δ|√n/σ should be the 100p-th percentile of the Z distribution. That is,

z_p = −z_{1−α} + |Δ|√n/σ.

Solving for n, we have

√n = (z_p + z_{1−α}) σ/|Δ|  ⇒  n = σ²(z_p + z_{1−α})² / Δ².

E.g., in the IOP example, if we want power of p = .90 and if we set α = .05, σ = 10:

n = 10²(z_{.90} + z_{1−.05})² / 5² = 100(1.2816 + 1.645)² / 25 = 34.26, which we round up to 35.

General Result for Sample Size for a One-Sample z Test:

The sample size necessary to achieve power equal to p for an α-level one-sample z test of H0: μ = μ0 (normal population, known population variance σ²) is given by

n = σ²(z_p + z_{1−α})² / Δ²      for a one-sided alternative
n = σ²(z_p + z_{1−α/2})² / Δ²    for a two-sided alternative

where Δ = μ − μ0, the difference between the true population mean and the null value μ0.
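The closed-form sample-size result translates directly into code. A sketch (Python with scipy; note that the two-sided version, like the formula above, neglects the smaller tail term):

```python
from math import ceil
from scipy.stats import norm

def z_test_n(power, delta, sigma, alpha=0.05, two_sided=False):
    """Smallest n achieving the requested power for a one-sample z test."""
    za = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    zp = norm.ppf(power)
    return ceil(sigma**2 * (zp + za)**2 / delta**2)  # round up to an integer

# IOP example: power .90, alpha = .05, sigma = 10, delta = -5
print(z_test_n(0.90, -5, 10))                   # one-sided: 35
print(z_test_n(0.90, -5, 10, two_sided=True))   # two-sided: 43
```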
Comparison of Two Means*

In the last two chapters, we studied how to do inference on a single population mean μ based upon a single sample of data from that population. We now take up the problem of inference on two means μ1 and μ2 based upon two samples of data.

When considering inference based upon two samples, it is important to distinguish between two scenarios for which different methodologies are appropriate: Paired Samples vs. Independent Samples. In either case, we have data that we will represent as follows:

Sample 1: x11, x21, ..., x_{n1,1}
Sample 2: x12, x22, ..., x_{n2,2}

1. Paired Samples. For paired data, the sample size is the same in each sample. That is, n1 = n2 = n. In addition, the first observation in sample 1 corresponds to the first observation in sample 2, the second observation in sample 1 corresponds to the second observation in sample 2, etc.

- That is, the ith observations in samples 1 and 2 are paired, in some sense. By "paired" we mean that they are connected in such a way that it is not reasonable to consider them to be independent random variables.

* Read Ch. 11 of our text.

Pairing can occur in many different ways. E.g.,

- Variables xi1 and xi2 might be pretest and posttest measurements on the same patients (study involves n patients, indexed by i = 1, ..., n).
- Variables xi1 and xi2 might be measurements or observations taken on the same unit (e.g., an x-ray) by two different observers (e.g., radiologists), or taken with two different measuring devices.
- Variables xi1 and xi2 might be measurements of the same response variable on the same subjects at two different time points (blood pressure at time 1, time 2), or two different locations (intraocular pressure (IOP) in the right eye and left eye).
- Variables xi1 and xi2 might be measurements of the same variable from two different family members (e.g., husband and wife, in a study involving n married couples).
In all of these situations, we would expect that xi1 and xi2, the measurements taken on the ith subject (or pair), might be similar to one another, or statistically dependent, because of common characteristics of the subject or pair.

- It would be reasonable to assume that observations from subject to subject (pair to pair) are independent, but that two observations from the same subject (or pair) would be dependent.

2. Independent Samples. Alternatively, the two samples might not be paired, and therefore the data would be independent both within samples and between samples. In this situation, xi1 and xi2 are not paired in any sense (they don't come from a common source), and we can have samples of different sizes. That is, n1 is not necessarily equal to n2.

Independent samples are common as well. Examples include:

- n1 subjects randomly assigned to group 1 (e.g., they receive an active treatment) and n2 other subjects randomly assigned to group 2 (e.g., a placebo, or control, group), and then the same response measured on each subject.
- n subjects in the study, but n1 subjects (selected at random) are measured at time 1 and the remaining n2 = n − n1 subjects measured at time 2.
- Same as before, but n1 subjects could have IOP measured in their left eye, n2 could have IOP measured in their right eye.
- n1 husbands measured, n2 wives measured, from n = n1 + n2 married couples (no one in the sample married to each other).

Paired Samples:

The paired sample problem is the easier of the two because it can be handled by the methods we have already studied. For paired data, what is typically of interest is the difference

δ = μ1 − μ2,

where μ1 is the population mean corresponding to sample 1, and μ2 is the population mean corresponding to sample 2. Notice that δ, the difference in the population means, can also be thought of as the population mean of the differences.
In a paired situation, instead of thinking about having two samples, it's really more appropriate to say that we have a single sample of differences whose population mean is δ = μ1 − μ2.

Example | Systolic Blood Pressure and Oral Contraceptives

A study of the effects of taking oral contraceptives (OCs) on systolic blood pressure (SBP) was conducted in which a random sample of n = 10 women had their SBP measured before starting to use OCs (i.e., at baseline) and after having taken OCs for 6 months. The data are as follows:

Subject i:            1    2    3    4    5    6    7    8    9   10
xi1 = Baseline SBP: 115  112  107  119  115  138  126  105  104  115
xi2 = SBP using OCs:128  115  106  128  122  145  132  109  102  117
Difference di:      -13   -3    1   -9   -7   -7   -6   -4    2   -2

The data are paired here because samples 1 and 2 correspond to 2 measurements on the same women.

- If a woman has high SBP at baseline, she's more likely to have relatively high SBP at the second measurement occasion, too. Therefore, these measurements are dependent.

Let μ1 = population mean SBP when not taking OCs, μ2 = population mean SBP when taking OCs, and δ = μ1 − μ2.

There are two types of inferences that we might be interested in concerning δ:

Hypothesis test: we may want to test H0: δ = 0 versus HA: δ ≠ 0, or, perhaps, versus HA: δ < 0.

Confidence interval: we may instead prefer to estimate δ and form a 100(1 − α)% (e.g., 95%) CI for δ.

Both of these problems are ones which we already know how to handle, if we just notice that we can think of this as a one-sample problem.
Here we have a single sample of differences: d1, ..., dn, where

di = xi1 − xi2,  i = 1, ..., n.

We assume that the di's are independent, each with distribution

di ~ N(δ = μ1 − μ2, σ_d²).

We estimate the population mean δ and population SD σ_d with the corresponding sample quantities:

d̄ = (1/n) Σ di = (1/10)(−13 + (−3) + ... + (−2)) = −4.80

s_d = √[ (1/(n−1)) Σ (di − d̄)² ] = √[ (1/(n−1)) { (Σ di²) − n d̄² } ]
    = √[ (1/9){ (−13)² + ... + (−2)² − 10(−4.80)² } ] = 4.566

Therefore, inference for δ can be done with the one-sample methods we've already learned.

E.g., assuming that σ_d is unknown, and for a two-tailed alternative HA: δ ≠ 0, we have the following t test of H0: δ = 0:

Test statistic:

t = (d̄ − 0)/(s_d/√n) = (−4.80 − 0)/(4.566/√10) = −3.32

Two-sided p-value:

p = 2P(t(n − 1) > |t|) = 2P(t(9) > 3.32) = 2{1 − P(t(9) < 3.32)} = 2{1 − .9956} = .0089

So, at level α = .05, we reject H0 and conclude that there is a significant difference between the mean SBP with and without OC use. The mean SBP when using OCs is higher.

A 95% two-sided CI for δ would be given by

d̄ ± t_{1−α/2}(n − 1) s_d/√n = −4.80 ± t_{.975}(9)(4.566/√10),  where t_{.975}(9) = 2.2622,
                             = (−8.066, −1.534).

We are 95% confident that the true mean difference between the SBP at baseline and the SBP when using OCs lies between −8.066 and −1.534. A negative difference here means that the SBP at baseline is lower.

If σ_d, the population SD of the difference between the measurements in the two samples, had been known, we would have used a z test and a z-based confidence interval rather than the t-based inferences illustrated here.

Independent Samples:

In the independent samples case, we can't reduce the problem to one which we already know how to solve. Instead, we're going to need some new methodology. We consider testing first.

As in the one-sample problem, we will assume that we have samples from normally distributed populations. If not, then our results will not hold exactly, but will be approximately valid if the sample size is reasonably large, by the CLT.
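Before developing the independent-samples machinery, the paired SBP analysis above can be verified in a few lines (a Python sketch; scipy assumed available):

```python
from math import sqrt
from scipy import stats

baseline = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
on_oc    = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]
d = [b - o for b, o in zip(baseline, on_oc)]   # differences d_i = x_i1 - x_i2

n = len(d)
dbar = sum(d) / n
sd = sqrt(sum((di - dbar) ** 2 for di in d) / (n - 1))

t = dbar / (sd / sqrt(n))                            # paired t statistic
p = 2 * stats.t.sf(abs(t), df=n - 1)                 # two-sided p-value
half = stats.t.ppf(0.975, df=n - 1) * sd / sqrt(n)   # 95% CI half-width

print(round(dbar, 2), round(sd, 3), round(t, 2), round(p, 4))
print((round(dbar - half, 3), round(dbar + half, 3)))
# dbar = -4.8, sd ≈ 4.566, t ≈ -3.32, p ≈ .0089, CI ≈ (-8.066, -1.534)
```

The same t and p come out of scipy's built-in `stats.ttest_rel(baseline, on_oc)`.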
In particular, we assume that for sample 1,

x11, x21, ..., x_{n1,1} are independent, with xi1 ~ N(μ1, σ1²),

and for sample 2,

x12, x22, ..., x_{n2,2} are independent, with xi2 ~ N(μ2, σ2²),

and we assume that samples 1 and 2 are independent of each other. That is, we have two normal samples with population means μ1 and μ2 and population SDs σ1 and σ2.

The steps we take in conducting a hypothesis test in this setting are the same as always:

1. State the research question in terms of the null and alternative hypotheses.
   - The null hypothesis that we are interested in will be H0: μ1 = μ2, or equivalently, H0: μ1 − μ2 = 0, versus HA: μ1 − μ2 ≠ 0 (two-sided), or, perhaps, versus HA: μ1 − μ2 < 0 (> 0) (one-sided).
2. Specify a significance level.
   - E.g., α = .05.
3. Select an appropriate test statistic.
   - Since we are interested in whether μ1 − μ2 = 0, it is natural to examine how far x̄1 − x̄2 is from 0. Here x̄1 is the sample mean from sample 1, x̄2 is the sample mean from sample 2.
   - Similar to the one-sample problem, we judge how far x̄1 − x̄2 is from its null value, 0, relative to its standard error. That is, our test statistic is going to be of the general form

     (x̄1 − x̄2 − 0)/s.e.(x̄1 − x̄2) = (x̄1 − x̄2 − 0)/√(var̂(x̄1 − x̄2)).

   - (Recall that the standard error of a statistic is its estimated standard deviation, i.e., the square root of its estimated variance.)
   - The exact form of this test statistic depends upon what we assume about the population SDs, σ1 and σ2.
   - Specifically, the standard error in the denominator of our test statistic depends upon whether σ1 and σ2 are assumed (i) known or unknown, and assumed (ii) equal or unequal.
4. Collect the data and compute the test statistic.
5. Calculate the p-value and make a conclusion.
   - The computation of the p-value depends upon which test statistic is appropriate given our assumptions regarding σ1 and σ2 (step 3). Different test statistics have different distributions under H0, which affects the p-value or critical value.
In general, under the assumptions of independent samples such that

x11, x21, ..., x_{n1,1} are independent, with xi1 ~ N(μ1, σ1²),
x12, x22, ..., x_{n2,2} are independent, with xi2 ~ N(μ2, σ2²),

then

x̄1 − x̄2 ~ N( μ1 − μ2, σ1²/n1 + σ2²/n2 ),   (*)

i.e., var(x̄1 − x̄2) = σ1²/n1 + σ2²/n2.

If we standardize (convert to z-scores), then (*) becomes

( x̄1 − x̄2 − (μ1 − μ2) ) / √( σ1²/n1 + σ2²/n2 ) ~ N(0, 1).   (**)

Under H0: μ1 − μ2 = 0, (**) becomes

z = (x̄1 − x̄2) / √( σ1²/n1 + σ2²/n2 ) ~ N(0, 1),   (†)

where the left-hand side is our test statistic.

Cases:

Case 1: σ1², σ2² both known (may or may not be equal). In this case, the standard error in the denominator of our test statistic above is

s.e.(x̄1 − x̄2) = √( σ1²/n1 + σ2²/n2 ),

which can be computed directly. Therefore, our test statistic and its distribution are given by (†).

Example | SBP and OC Use, Two-Sample Experiment

Suppose that instead of the paired design described before, in which each woman was measured twice, once when not using OCs and once when using OCs, the following design was used: a random sample of n1 = 8 35- to 39-year-old nonpregnant, premenopausal OC users and a random sample of n2 = 21 35- to 39-year-old nonpregnant, premenopausal non-OC users were obtained. The OC users were found to have a mean SBP of x̄1 = 132.86 mm Hg, and the non-OC users were found to have a mean SBP of x̄2 = 127.44 mm Hg.

σ1, the population SD of SBP among OC users, and σ2, the population SD among non-OC users, are assumed to be the same, equal to the common value σ1 = σ2 = σ = 16.0 mm Hg. Then our test statistic is

z = (x̄1 − x̄2)/√( σ1²/n1 + σ2²/n2 ) = (132.86 − 127.44)/√( 16.0²/8 + 16.0²/21 ) = 0.815.

Since our test statistic z is distributed as N(0, 1), the p-value for a two-sided test is

p = 2P(Z > .815) = 2(.207) = .414,

and we would fail to reject H0: μ1 = μ2 based on an α = .05 level test. There is insufficient evidence to conclude that the mean SBP is different for the OC users than for the non-OC users.

General Rule under Case 1:

One-sided alternative: reject H0 if |z| > z_{1−α}, where

z = (x̄1 − x̄2)/√( σ1²/n1 + σ2²/n2 ).
Equivalently, reject H0 if p < α, where p = P(Z > |z|).

Two-sided alternative: reject H0 if |z| > z_{1−α/2}. Equivalently, reject H0 if p < α, where p = 2P(Z > |z|).

Case 2: σ1², σ2² unknown, but assumed equal (σ1² = σ2² = σ², say).

If σ1² = σ2² = σ², then the test statistic in (†) becomes

(x̄1 − x̄2)/√( σ²/n1 + σ²/n2 ) = (x̄1 − x̄2)/√( σ²(1/n1 + 1/n2) ),

which would still be N(0, 1) if we knew σ². However, we don't know σ².

Obvious thing to do: replace σ² by a sample estimate. Two possible estimators come to mind:

s1² = sample variance from the 1st sample
s2² = sample variance from the 2nd sample

Under the assumption that σ1² = σ2² = σ², both are estimators of the same quantity, σ², each based on only a portion of the total number of relevant observations available.

Better idea: combine these two estimators by taking their (weighted) average:

σ̂² = s_P² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)

⇒ s.e.(x̄1 − x̄2) = √( σ̂²(1/n1 + 1/n2) ) = √( s_P²(1/n1 + 1/n2) )

⇒ test stat. = t = (x̄1 − x̄2)/√( s_P²(1/n1 + 1/n2) ) ~ t(n1 + n2 − 2).

Note that replacing σ² by the estimate s_P² (which is known as the pooled estimate of σ²) changes the distribution of our test statistic from N(0, 1) to t(n1 + n2 − 2).

Example | SBP and OC Use, Two-Sample Experiment

In the same set-up as before, now assume that σ1, the population SD of SBP among OC users, and σ2, the population SD of SBP among OC non-users, are assumed to be equal, but their common value σ = σ1 = σ2 is unknown. Suppose also that the sample SD among OC users was s1 = 15.34 mm Hg, and the sample SD among OC non-users was s2 = 18.23 mm Hg.

The pooled estimate of σ², the common variance in the two populations, is

s_P² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2) = [ (8 − 1)15.34² + (21 − 1)18.23² ] / (8 + 21 − 2) = 307.18.

Therefore, our test statistic is

t = (x̄1 − x̄2)/√( s_P²(1/n1 + 1/n2) ) = (132.86 − 127.44)/√( 307.18(1/8 + 1/21) ) = 0.74,

which we compare to the t(n1 + n2 − 2) = t(8 + 21 − 2) = t(27) distribution, the distribution of this test statistic under the null hypothesis.
For a two-sided alternative hypothesis, the p-value would be

p = 2P( t(n1 + n2 − 2) > |t| ) = 2P( t(27) > .74 ) = 2{1 − P(t(27) < .74)} = 2(1 − .7684) = .4632,

and the critical value for a .05-level test is t_{1−α/2}(n1 + n2 − 2) = t_{.975}(27) = 2.052.

Since p = .4632 > α = .05 (or, equivalently, since |t| = .74 < t_{.975}(27) = 2.052), we fail to reject H0. There is insufficient evidence to conclude that the mean SBP for OC users is different from that of OC non-users.

General Rule under Case 2:

One-sided alternative: reject H0 if |t| > t_{1−α}(n1 + n2 − 2), where

t = (x̄1 − x̄2)/√( s_P²(1/n1 + 1/n2) ),   s_P² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2).

Equivalently, reject H0 if p < α, where p = P( t(n1 + n2 − 2) > |t| ).

Two-sided alternative: reject H0 if |t| > t_{1−α/2}(n1 + n2 − 2). Equivalently, reject H0 if p < α, where p = 2P( t(n1 + n2 − 2) > |t| ).

Case 3: σ1², σ2² both unknown but assumed different.

In this case, the test statistic in (†),

(x̄1 − x̄2)/√( σ1²/n1 + σ2²/n2 ),

is not available because we don't know σ1² and σ2².

Obvious solution: replace σ1² by s1², the sample variance from the first sample, and replace σ2² by s2², the sample variance from the second sample. The resulting test statistic is

t = (x̄1 − x̄2)/√( s1²/n1 + s2²/n2 ).

Problem: even though this test statistic makes good sense, its distribution under H0 is difficult to derive mathematically. However, it can be shown that this test statistic has a null distribution which is well approximated by a t distribution with degrees of freedom that can be approximated from the data. That is,

t = (x̄1 − x̄2)/√( s1²/n1 + s2²/n2 ) is approximately distributed as t(ν),

where

ν = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ].

Note that this quantity should be rounded down to the nearest integer to give an approximate degrees of freedom for the distribution of t under H0. The approximation to the distribution of t under H0 given above is based on what is known as Satterthwaite's approximation.

Example | SBP and OC Use, Two-Sample Experiment

In the same set-up as before, now assume that σ1, the population SD of SBP among OC users, and σ2, the population SD of SBP among OC non-users, are unknown, and we are not willing to assume that they are equal. Suppose again that the sample SD among OC users was s1 = 15.34 mm Hg, and the sample SD among OC non-users was s2 = 18.23 mm Hg.

In this situation, our test statistic becomes

t = (x̄1 − x̄2)/√( s1²/n1 + s2²/n2 ) = (132.86 − 127.44)/√( 15.34²/8 + 18.23²/21 ) = .8058.
Example | SBP and OC Use, Two-Sample Experiment In the same set-up as before, now assume that 1 , the population SD of SBP among OC users, and 2 , the population SD of SBP among OC non-users, are unknown and we are not willing to assume that they are equal. Suppose again that the sample SD among OC users was s1 = 15:34 mm Hg, and the sample SD among OC non-users was s2 = 18:23 mm Hg. In this situation, our test statistic becomes x 132 t = q 12; x2 2 = q :862; 127:44 = :8058 s1 s2 15:34 :23 + 1821 2 8 n1 + n2 183 Using Sattherthwaite's approximation, this test statistic is approximately distributed as t( ) under H0 where = s2 1 n1 2 = ; 15:342 8 s2 + s2 2 1 n1 n2 2 s2 2 =(n ; 1) =(n1 ; 1) + n22 2 2 15:342 + 18:232 2 ; 18:23 2 =(21 ; 1) = 15:04 =(8 ; 1) + 21 8 21 2 which we round down to = 15. Therefore, our p-value is p = 2P (t( ) > jtj) = 2P (t(15) > :8058) = 2f1 ; P (t(15) < :8058)g = 2(1 ; :7835) = :433 and our .05-level critical value is t1; =2( ) = t:975 (15) = 2:131 Therefore, since p = :433 > = :05 (or, equivalently, because jtj = :8058 < t:975(15) = 2:131) we fail to reject H0 . There is insu cient evidence here to conclude that the mean SBP of OC users di ers from that of OC non-users. General Rule under Case 3: One-sided alternative: reject H0 if jtj > t1; ( ) where x t = q 12; x2 2 s1 s2 n1 + n2 s2 + s2 2 1 2 n1 n2 = (s2 =n1 )2 (s2 =n2 )2 1 2 n1 ;1 + n2 ;1 Equivalently, reject H0 if p < where p = P (t( ) > jtj). Two-sided alternative: reject H0 if jtj > t1; =2 ( ). Equivalently, reject H0 if p < where p = 2P (t( ) > jtj). 184 Con dence Intervals for 1 ; 2 As we've learned, the acceptance region of an level test forms a 110(1 ; )% con dence interval. Therefore, the tests we have just derived for the two independent samples problem can all be inverted to form con dence intervals. 
General Rule for Confidence Limits under Case 1:

One-sided limits: a 100(1 − α)% upper confidence bound on μ1 − μ2 under case 1 is given by

(x̄1 − x̄2) + z_{1−α} √( σ1²/n1 + σ2²/n2 ).

A 100(1 − α)% lower confidence bound on μ1 − μ2 is given by

(x̄1 − x̄2) − z_{1−α} √( σ1²/n1 + σ2²/n2 ).

Two-sided limits: a 100(1 − α)% confidence interval on μ1 − μ2 under case 1 is given by

(x̄1 − x̄2) ± z_{1−α/2} √( σ1²/n1 + σ2²/n2 ).

General Rule for Confidence Limits under Case 2:

One-sided limits: a 100(1 − α)% upper confidence bound on μ1 − μ2 under case 2 is given by

(x̄1 − x̄2) + t_{1−α}(n1 + n2 − 2) √( s_P²(1/n1 + 1/n2) ).

A 100(1 − α)% lower confidence bound on μ1 − μ2 is given by

(x̄1 − x̄2) − t_{1−α}(n1 + n2 − 2) √( s_P²(1/n1 + 1/n2) ).

Two-sided limits: a 100(1 − α)% confidence interval on μ1 − μ2 under case 2 is given by

(x̄1 − x̄2) ± t_{1−α/2}(n1 + n2 − 2) √( s_P²(1/n1 + 1/n2) ).

General Rule for Confidence Limits under Case 3:

One-sided limits: a 100(1 − α)% upper confidence bound on μ1 − μ2 under case 3 is given by

(x̄1 − x̄2) + t_{1−α}(ν) √( s1²/n1 + s2²/n2 ).

A 100(1 − α)% lower confidence bound on μ1 − μ2 is given by

(x̄1 − x̄2) − t_{1−α}(ν) √( s1²/n1 + s2²/n2 ).

Two-sided limits: a 100(1 − α)% confidence interval on μ1 − μ2 under case 3 is given by

(x̄1 − x̄2) ± t_{1−α/2}(ν) √( s1²/n1 + s2²/n2 ).

Notice that all of these confidence intervals are of the same general form: x̄1 − x̄2 plus or minus t_crit or z_crit standard errors of x̄1 − x̄2.

Example | Blood Glucose Level and Stenosis

A study was performed concerning risk factors for carotid artery stenosis (narrowing) among 464 men born in 1914 and residing in the city of Malmo, Sweden. The following data were reported for blood-glucose level (mmol/L):

Stenosis Status   n     Sample Mean   Sample SD
No Stenosis       356   5.3           1.4
Stenosis          108   5.1           0.8

Using an appropriate procedure, test whether there is a significant difference between the mean blood-glucose levels of men with and without stenosis. Use α = .01.
In addition, form a 99% confidence interval for the difference in the population mean blood-glucose levels of those with and without stenosis.

Let μ1 = population mean blood-glucose of men without stenosis (sample 1, n1 = 356), and let μ2 be the corresponding mean for men with stenosis (sample 2, n2 = 108). We are interested in testing

H0: μ1 − μ2 = 0 versus HA: μ1 − μ2 ≠ 0,

and forming a 99% CI for μ1 − μ2.

We do not know the SDs for the two populations here, so we know that we are going to use a t test rather than a z test. However, are we in case 2 (equal population SDs) or in case 3 (unequal population SDs)?

To answer this question, we can choose between cases 2 and 3 based upon looking at whether the sample SDs are close to each other and by using our medical knowledge/judgement as to whether it's reasonable to assume equal variability in blood glucose level in these two groups.

Alternatively, we can do a formal hypothesis test of

H0: σ1² = σ2² versus HA: σ1² ≠ σ2².   (‡)

There exists a statistical test of this hypothesis for data from two independent, normally distributed samples. It is called the F test for equal variances, and it is performed as follows:

The test statistic for H0 is given by

F = s1²/s2² if s1² ≥ s2², or F = s2²/s1² if s1² < s2².

Under H0, this statistic follows the F distribution. The F distribution has two parameters: the numerator degrees of freedom, which is equal to one less than the sample size associated with the variance in the numerator, and the denominator degrees of freedom, which is one less than the sample size associated with the variance in the denominator of F. We will denote this distribution as F(num df, denom df) and its 100p-th percentile by F_p(num df, denom df).

We reject H0 at level α if F > Fcrit, where Fcrit is given by

Fcrit = F_{1−α}(n1 − 1, n2 − 1) if s1² ≥ s2², or Fcrit = F_{1−α}(n2 − 1, n1 − 1) if s1² < s2².

Critical values of the F distribution are given in Table A.5 in the back of our book.
Equivalently, we reject H0 at level α if p < α, where

p = 2P( F(n1 − 1, n2 − 1) > F ) if s1² ≥ s2², or p = 2P( F(n2 − 1, n1 − 1) > F ) if s1² < s2².

Probabilities associated with the F distribution can be computed with computer programs such as Minitab.

Back to the example: We will conduct the t test of H0: μ1 − μ2 = 0 versus a two-sided alternative under both cases 2 and 3, but then we will do the F test for equal variances to see which case is more appropriate for these data.

Under case 2, our test statistic is

t = (x̄1 − x̄2)/√( s_P²(1/n1 + 1/n2) ),

where

s_P² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2) = [ (356 − 1)1.4² + (108 − 1)0.8² ] / (356 + 108 − 2) = 1.654.

So,

t = (5.3 − 5.1)/√( 1.654(1/356 + 1/108) ) = .2/.1413 = 1.416,

and our p-value and critical value are

p = 2P( t(n1 + n2 − 2) > |t| ) = 2P( t(462) > 1.416 ) = 2{1 − P(t(462) < 1.416)} = 2(1 − .9212) = .158

and

t_{1−α/2}(n1 + n2 − 2) = t_{.995}(462) = 2.587.

So, we fail to reject H0 at level α = .01 because p = .158 > α = .01 (equivalently, because |t| = 1.416 < tcrit = 2.587).

Under case 2, a 99% CI for μ1 − μ2 would be

(x̄1 − x̄2) ± t_{1−α/2}(n1 + n2 − 2) √( s_P²(1/n1 + 1/n2) ) = .2 ± 2.587(.1413) = (−.165, .565).

Under case 3, our test statistic is

t = (x̄1 − x̄2)/√( s1²/n1 + s2²/n2 ) = (5.3 − 5.1)/√( 1.4²/356 + 0.8²/108 ) = .2/.1069 = 1.871.

The approximate degrees of freedom for Satterthwaite's approximation are

ν = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
  = ( 1.4²/356 + 0.8²/108 )² / [ (1.4²/356)²/(356 − 1) + (0.8²/108)²/(108 − 1) ] = 315.97,

which we round down to ν = 315. Therefore, our p-value and critical value are

p = 2P( t(ν) > 1.871 ) = 2{1 − P(t(315) < 1.871)} = 2{1 − .9689} = .062

and

t_{1−α/2}(ν) = t_{.995}(315) = 2.592.

So, again, we fail to reject H0 at α = .01 (although our p-value is now considerably smaller than in case 2).

Under case 3, a 99% CI for μ1 − μ2 would be

(x̄1 − x̄2) ± t_{1−α/2}(ν) √( s1²/n1 + s2²/n2 ) = .2 ± 2.592(.1069) = (−.077, .477).

Now we choose between cases 2 and 3 by conducting an F test of H0: σ1² = σ2².
Note that s1 = 1.4 > .8 = s2, so we compute

    F = s1²/s2² = 1.4²/.8² = 3.06

The p-value and critical value here are

    p = 2P( F(n1 - 1, n2 - 1) > F ) = 2P( F(355, 107) > 3.06 )
      = 2{1 - P( F(355, 107) < 3.06 )} = 2{1 - 1.000} = 0.000

and

    F_crit = F_{1-α}(n1 - 1, n2 - 1) = F_{.99}(355, 107) = 1.46

So, because p = 0.000 < α = .01 (or, equivalently, because F = 3.06 > F_crit), we reject H0 and conclude that the population variances are different here, so that the case 3 analysis was more appropriate.

Inference for Proportions*

So far we have confined our discussion of inference to means of continuous random variables. However, dichotomous (also known as binary, or 0-1, or Bernoulli) variables are also very common in the health sciences.

Examples of dichotomous random variables:

- Disease status (0=disease free, 1=diseased)
- Mortality (0=dead, 1=alive)
- Pregnancy (0=not pregnant, 1=pregnant)
- Adherence to a protocol (0=no, 1=yes)
- Gender (0=male, 1=female)

Note that these are all essentially qualitative variables, but we assign the numbers 0 and 1 to make them numeric to allow analysis.

Note also that the sample mean of a 0-1 variable is the proportion of the sample members who fall in the "1" category. A population mean of a 0-1 variable is the corresponding population proportion in the "1" category, which also has the interpretation as the probability of being in the "1" category. As always, we can express proportions and probabilities as percentages by multiplying by 100%.

Given that a proportion is a mean, and given that the CLT says that means of even non-normally distributed random variables are approximately normal for large sample sizes, it should be no surprise that the normal-theory inference we have just been studying can be extended to proportions and justified as approximately valid for large sample sizes.

* Read Ch. 14 of our text.
Normal Approximation to the Binomial

Recall that the binomial distribution gives the probability function for a random variable X defined as the number of successes that occur out of n trials, where the trials are independent, identically distributed with constant success probability p. We write this as X ~ Bin(n, p).

- Recall that E(X) = np and var(X) = np(1 - p).

Recall also from pp. 111-113 of these notes that the CLT implies that the normal distribution can approximate the binomial distribution well when the sample size is large.

Which normal distribution? The one with the same mean and variance as the binomial distribution that we are trying to approximate. That is, if np ≥ 5 and n(1 - p) ≥ 5, then for X ~ Bin(n, p),

    X ≈ N( np, np(1 - p) )

What does this have to do with inference for a proportion? Notice that if X = the number of successes out of n trials, then the proportion of successes out of n trials is just p̂ = X/n.

Since X ~ Bin(n, p) ≈ N( np, np(1 - p) ), then

    p̂ = (1/n)X ~ (1/n)Bin(n, p) ≈ (1/n)N( np, np(1 - p) ) = N( p, p(1 - p)/n )

So, we have that

    p̂ ≈ N( p, p(1 - p)/n )    (*)

which says that a sample proportion p̂ is approximately normally distributed with mean p, the corresponding population proportion, and variance p(1 - p)/n.

One-Sample Confidence Intervals for p

Based on the distributional result (*), we can standardize (convert to z-scores) to get the following result:

    z = (p̂ - p) / √( p(1 - p)/n ) ≈ N(0, 1)    (**)

Therefore, for example,

- z = (p̂ - p)/√(p(1 - p)/n) should fall between -1.96 and 1.96 approximately 95% of the time.
- z = (p̂ - p)/√(p(1 - p)/n) should fall between -1.645 and 1.645 approximately 90% of the time.
- In general, z = (p̂ - p)/√(p(1 - p)/n) should fall between -z_{1-α/2} and z_{1-α/2} approximately 100(1 - α)% of the time.
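As a quick numerical check of this approximation (the particular tail probability compared is my own illustrative choice, using n = 1000 and p = .04 as in the upcoming example), we can compare the exact binomial CDF with the CDF of the normal distribution that has the same mean and variance:

```python
import math

def binom_cdf(k, n, p):
    """Exact Bin(n, p) CDF, P(X <= k), by summing the pmf."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

def norm_cdf(x, mean, var):
    """N(mean, var) CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))

n, p = 1000, 0.04          # np = 40 >= 5 and n(1 - p) = 960 >= 5
exact = binom_cdf(50, n, p)
approx = norm_cdf(50, n * p, n * p * (1 - p))  # same mean and variance
print(exact, approx)       # the two CDFs agree to about two decimals
```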
That is, we can make the probability statement:

    P( -1.96 ≤ (p̂ - p)/√(p(1 - p)/n) ≤ 1.96 ) ≈ .95

If we rearrange the left-hand side so that p falls in the middle of the inequality, we get

    P( p̂ - 1.96√(p(1 - p)/n) ≤ p ≤ p̂ + 1.96√(p(1 - p)/n) ) ≈ .95

Therefore, ( p̂ - 1.96√(p(1 - p)/n), p̂ + 1.96√(p(1 - p)/n) ) is an approximate 95% CI for p.

Note that the endpoints of this interval depend upon p, the true value of the population proportion, which is of course unknown. Therefore, we replace p by p̂, leading to

    ( p̂ - 1.96√(p̂(1 - p̂)/n), p̂ + 1.96√(p̂(1 - p̂)/n) )

as an approximate 95% CI for p.

More generally, for np̂ ≥ 5 and n(1 - p̂) ≥ 5, an approximate 100(1 - α)% CI for p is given by

    p̂ ± z_{1-α/2} √( p̂(1 - p̂)/n )

Notice that this interval is of the usual form: estimator plus or minus some number of standard errors.

- Here, the standard error of p̂ is √(p̂(1 - p̂)/n), and the multiplier is the upper α/2 critical value of a z (standard normal) distribution.

Example - Prevalence of Breast Cancer

Suppose we are interested in estimating the prevalence (population proportion with a condition or characteristic) of breast cancer among 50-54-year-old women whose mothers have had breast cancer. Suppose that in a random sample of 1,000 such women, 40 are found to have had breast cancer at some point in their lives. Obtain a point estimate and 99% confidence interval for the prevalence of breast cancer in this population.

The best point estimate of p is the sample proportion, p̂ = x/n, where x = the number with breast cancer and n is the sample size. So, our estimate of p is

    p̂ = 40/1000 = .040

or 4%.

To check whether the sample size is large enough in this problem to justify our normal-theory confidence interval, we notice that

    np̂ = 1000(.040) = 40 ≥ 5 and n(1 - p̂) = 1000(1 - .040) = 960 ≥ 5

so we should be OK.

For a 99% CI, 100(1 - α) = 99, so α = .01.
Therefore, z_{1-α/2} = z_{.995} = 2.576 (back of the book).

The standard error of p̂ is

    s.e.(p̂) = √( p̂(1 - p̂)/n ) = √( .040(1 - .040)/1000 ) = .00620

so that our approximate 99% CI for p is

    p̂ ± z_{1-α/2} √( p̂(1 - p̂)/n ) = .040 ± 2.576(.00620) = (.024, .056)

Thus, we are 99% confident that the true prevalence of breast cancer among 50-54-year-old women whose mothers had breast cancer lies between 2.4% and 5.6%.

Occasionally, we want a one-sided interval (lower or upper bound). Here is the general result:

For np̂ ≥ 5 and n(1 - p̂) ≥ 5, an approximate 100(1 - α)% lower bound on p is given by

    p̂ - z_{1-α} √( p̂(1 - p̂)/n )

An approximate 100(1 - α)% upper bound on p is given by

    p̂ + z_{1-α} √( p̂(1 - p̂)/n )

One-Sample Hypothesis Tests for p

Suppose that in the breast cancer example, it is known that the population prevalence of breast cancer among women with no family history of breast cancer is 2%. Then to determine whether a family history of breast cancer is a risk factor for this disease, we may be interested in testing the hypothesis

    H0: p = p0 versus HA: p > p0

where p0 = .02. How can we test such a hypothesis? Recall from (**) that

    (p̂ - p) / √( p(1 - p)/n ) ≈ N(0, 1)

where p is the true population proportion. Under the null hypothesis, p = p0, so this result becomes

    z = (p̂ - p0) / √( p0(1 - p0)/n ) ≈ N(0, 1)

Since z compares the sample proportion p̂ to the null value p0 (relative to the standard error of p̂), and since the distribution of z is known, z is the natural test statistic for testing H0: p = p0.

General method for an approximate α-level test of H0: p = p0 versus a one- or two-sided alternative:

Critical value approach: reject H0 if p̂ - p0 is consistent with the alternative hypothesis and if

    |z| = |p̂ - p0| / √( p0(1 - p0)/n ) > z_{1-α}    for a one-sided alternative
                                         > z_{1-α/2}  for a two-sided alternative

Otherwise, we fail to reject.

p-value approach: reject H0 if p < α.
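The 99% interval above can be reproduced with a few lines of Python. The helper name is mine; the 2.576 multiplier is the z_{.995} value taken from the table:

```python
import math

def prop_ci(x, n, z):
    """Approximate CI for a proportion: phat +/- z * sqrt(phat(1-phat)/n)."""
    phat = x / n
    se = math.sqrt(phat * (1 - phat) / n)
    return phat - z * se, phat + z * se

# Breast-cancer example: 40 cases out of 1,000, 99% confidence
lo, hi = prop_ci(40, 1000, 2.576)
print(round(lo, 3), round(hi, 3))  # (.024, .056), as in the notes
```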
The p-value is computed as

    p = P(Z < z)     if the alternative is HA: p < p0,
        P(Z > z)     if the alternative is HA: p > p0,
        2P(Z > |z|)  if the alternative is HA: p ≠ p0.

Here, Z denotes a N(0, 1) random variable, and z is the value of our test statistic.

This normal-theory test can be justified by the CLT, and should work well provided that np0 ≥ 5 and n(1 - p0) ≥ 5.

Example - Breast Cancer Prevalence

Suppose we wish to conduct an α = .01-level test of H0: p = p0 versus HA: p > p0, where p0 = .02. Our test statistic is

    z = (p̂ - p0) / √( p0(1 - p0)/n ) = (.040 - .02) / √( .02(1 - .02)/1000 ) = 4.52

Since p̂ = .040 > p0 = .02, the sample results provide evidence in favor of HA: p > .02. Our critical value here is z_{1-.01} = z_{.99} = 2.327, so since |z| = 4.52 > z_{.99} = 2.327, we reject H0 in favor of HA: p > p0. The conclusion is that there is a significantly higher prevalence (at level .01) for women whose mothers had breast cancer.

The p-value for our test would be

    p = P(Z > z) = P(Z > 4.52) = .0000031 (from Minitab)

Power and Sample Size for Testing a Proportion

We have already studied power and sample-size calculation methods for one-sample z tests. Therefore, when using normal-approximation methods (z tests) for inference on p, a population proportion, the power and sample-size methods we've already learned apply with little modification.

Example - Breast Cancer Prevalence

Suppose we wish to investigate whether women whose sisters have a history of breast cancer are at higher risk for breast cancer themselves. Suppose we assume that the prevalence of breast cancer is 2% among 50-54-year-old US women with no family history, whereas it is 5% among those women whose sisters have had breast cancer. We propose to interview 500 women aged 50-54 with a sister history of the disease. Assuming that we conduct a one-sided test at α = .05, what would be the power of such a study?
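The test statistic and its one-sided p-value can be checked in Python. The function name is mine, and the p-value comes from the standard normal CDF via math.erf rather than from Minitab:

```python
import math

def prop_z_test(phat, p0, n):
    """One-sample z statistic for H0: p = p0, using the null standard error."""
    return (phat - p0) / math.sqrt(p0 * (1 - p0) / n)

z = prop_z_test(0.040, 0.02, 1000)
pval = 0.5 * (1 - math.erf(z / math.sqrt(2)))  # upper tail, P(Z > z)
print(round(z, 2), pval)  # z of about 4.52, p on the order of 3e-06
```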
Here, we are going to test H0: p = p0 versus HA: p > p0, where p0 = .02.

This hypothesis would be rejected if our test statistic exceeds the appropriate critical value. That is, if

    z = (p̂ - p0) / √( p0(1 - p0)/n ) > z_{1-α} = z_{.95} = 1.645

We have assumed that the null hypothesis is really false and that the true prevalence is p = p1, where p1 = .05. Therefore, the power is the probability that the test statistic z exceeds the critical value z_{1-α} = 1.645 given that p = p1 = .05. That is,

    power = P( (p̂ - p0)/√( p0(1 - p0)/n ) > z_{1-α} | p = p1 )
          = P( p̂ > p0 + z_{1-α}√( p0(1 - p0)/n ) | p = p1 )
          = P( (p̂ - p1)/√( p1(1 - p1)/n ) > [ p0 + z_{1-α}√( p0(1 - p0)/n ) - p1 ] / √( p1(1 - p1)/n ) | p = p1 )
          = P( Z > z_{1-α}√( p0(1 - p0) / (p1(1 - p1)) ) + (p0 - p1)/√( p1(1 - p1)/n ) )

So, in this example, the power is

    power = P( Z > 1.645√( .02(1 - .02) / (.05(1 - .05)) ) + (.02 - .05)/√( .05(1 - .05)/500 ) )
          = P(Z > -2.02) = 1 - P(Z > 2.02) = 1 - .022 = .978

General result for the power of a one-sample z test for p:

    power = P(Z > z̃) = P(Z < -z̃)

where

    z̃ = z_{1-α}√( p0(1 - p0) / (p1(1 - p1)) ) - |p0 - p1|/√( p1(1 - p1)/n )     if the alternative is one-sided
         z_{1-α/2}√( p0(1 - p0) / (p1(1 - p1)) ) - |p0 - p1|/√( p1(1 - p1)/n )  if the alternative is two-sided

This result holds provided that the sample size is large enough to justify using the normal approximation (z test). That is, provided that np0 ≥ 5 and n(1 - p0) ≥ 5.
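The general power formula above translates directly into code. A sketch (the function name is mine) reproducing the one-sided sister-history example:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_prop_test(p0, p1, n, z_crit):
    """Approximate power of the one-sample z test of H0: p = p0 when the
    true proportion is p1; pass z_{1-alpha} for a one-sided test or
    z_{1-alpha/2} for a two-sided test as z_crit."""
    ztilde = (z_crit * math.sqrt(p0 * (1 - p0) / (p1 * (1 - p1)))
              - abs(p0 - p1) / math.sqrt(p1 * (1 - p1) / n))
    return norm_cdf(-ztilde)   # power = P(Z < -ztilde)

# p0 = .02, true prevalence p1 = .05, n = 500, one-sided at alpha = .05
power = power_prop_test(0.02, 0.05, 500, 1.645)
print(round(power, 3))  # about .978, matching the worked example
```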
This note was uploaded on 11/13/2011 for the course STAT 6200 taught by Professor Staff during the Summer '08 term at UGA.
