That is, if we look for the area under the curve between any two numbers
on the original scale (height in this example), this quantity differs for
different normal curves.
There is not a single answer that applies to all normal curves.
However, when looking for areas under the curve between two numbers
expressed as z scores (the number of SDs above or below the mean), then
the areas are the same for any normal curve!
Notice that for women, μX − 1σX = 65 − 1(2.5) = 62.5 and μX + 1σX =
65 + 1(2.5) = 67.5 correspond to z scores of −1 and +1:

   (62.5 − μX)/σX = (62.5 − 65)/2.5 = −2.5/2.5 = −1
   (67.5 − μX)/σX = (67.5 − 65)/2.5 = 2.5/2.5 = 1

The probability of a woman's height falling between z scores of ±1 is
68.26%.
And for men, μY − 1σY = 70 − 1(3) = 67 and μY + 1σY = 70 + 1(3) =
73 correspond to z scores of −1 and +1:

   (67 − μY)/σY = (67 − 70)/3 = −3/3 = −1
   (73 − μY)/σY = (73 − 70)/3 = 3/3 = 1

The probability of a man's height falling between z scores of ±1 is
68.26%.
The point of all of this is that by transforming to z-scores we can compute
probabilities for any normal distribution (no matter its mean and variance)
using a single reference distribution: the standard normal distribution
N(0, 1).

Fact: If X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1).
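This fact is easy to check numerically. A minimal sketch in Python using the heights example above; the normal-CDF helper is built from the standard library's error function rather than a statistics package:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, Phi(z) = P(Z < z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Women's heights X ~ N(65, 2.5^2) and men's heights Y ~ N(70, 3^2):
for mu, sigma in [(65.0, 2.5), (70.0, 3.0)]:
    # Area between mu - sigma and mu + sigma on the original scale,
    # computed by standardizing the endpoints to z scores of -1 and +1:
    p = phi((mu + sigma - mu) / sigma) - phi((mu - sigma - mu) / sigma)
    print(round(p, 4))  # 0.6827 for both distributions
```

Both curves give the same area, 68.26%, because both intervals standardize to the same z scores.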
For a normally distributed r.v. X ~ N(μ, σ²), often we will be interested
in finding probabilities like

   P(c < X < d) or P(X ≥ c) or P(X < c), etc.,

where c, d are given constants.
E.g., we might want to know the percentage of women with heights
between 60 and 65 inches, or the percentage of women with heights
greater than or equal to 68 inches, or less than 61 inches, etc.
How do we use Z-scores to get such probabilities?
Facts about inequalities:
1. X ≤ c if and only if X + d ≤ c + d, for any constant d.
{ Here ≤ can be replaced by any other inequality or equality
(≥, >, <, =) and the statement would still be true.
{ This means that P(X ≤ c) = P(X + d ≤ c + d); e.g., P(X ≤ 4) = P(X − 3 ≤ 4 − 3)
and P(X > 7) = P(X + 2 > 7 + 2).
2. c ≤ X ≤ d if and only if [c + b] ≤ [X + b] ≤ [d + b].
{ Again, ≤ can be replaced by any other inequality or equality
(≥, >, <, =).
{ This means that P(c ≤ X ≤ d) = P([c + b] ≤ [X + b] ≤ [d + b]). E.g.,
P(3 ≤ X ≤ 7) = P([3 − 2] ≤ [X − 2] ≤ [7 − 2]) = P(1 ≤ X − 2 ≤ 5)
and P(1 > X > −3) = P([1 + 9] > [X + 9] > [−3 + 9]) = P(10 > X + 9 > 6).

3. For c any constant and b a constant which is ≥ 0,
X ≤ c if and only if bX ≤ bc.
If b is a negative number, then multiplying by b reverses the inequality:
X ≤ c if and only if bX ≥ bc.
{ Again, ≤ can be replaced by any other inequality or equality
and the statement would still be true.
{ This means that
P(X ≤ c) = P(bX ≤ bc) if b ≥ 0, and P(X ≤ c) = P(bX ≥ bc) if b < 0.
E.g., P(X ≤ 5) = P(3X ≤ 3(5)) = P(3X ≤ 15)
and P(X > −1) = P(−2X < (−2)(−1)) = P(−2X < 2).
4. Result 3 extends to double inequalities. That is, for b ≥ 0,
c ≤ X ≤ d if and only if bc ≤ bX ≤ bd,
and for b < 0,
c ≤ X ≤ d if and only if bc ≥ bX ≥ bd.
{ Again, ≤ can be replaced by any other inequality or equality
and the statement would still be true.
{ This means that
P(c ≤ X ≤ d) = P(bc ≤ bX ≤ bd) if b ≥ 0, and P(c ≤ X ≤ d) = P(bc ≥ bX ≥ bd) if b < 0.
E.g., P(3 ≤ X ≤ 9) = P((2)(3) ≤ 2X ≤ (2)(9)) = P(6 ≤ 2X ≤ 18)
and P(3 ≤ X ≤ 9) = P((−1)(3) ≥ −X ≥ (−1)(9)) = P(−3 ≥ −X ≥ −9).

In summary, one can add or subtract any number, or multiply or
divide by any positive number, on both (all) sides of an inequality
without changing the inequality. Multiplying or dividing by a
negative number switches the direction of the inequality.
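These facts can be confirmed mechanically. A small illustrative sketch that checks facts 1 through 4 on a grid of values, using the same example numbers as above (the transformed inequality holds for exactly the same x's as the original):

```python
# Check facts 1-4 on a grid of x values: each transformed inequality
# is true for exactly the same x's as the original one.
xs = [k / 10.0 for k in range(-100, 101)]  # -10.0, -9.9, ..., 10.0

assert all((x <= 4) == (x - 3 <= 4 - 3) for x in xs)        # fact 1: add/subtract a constant
assert all((3 <= x <= 7) == (1 <= x - 2 <= 5) for x in xs)  # fact 2: shift a double inequality
assert all((x <= 5) == (3 * x <= 15) for x in xs)           # fact 3: positive multiplier
assert all((x > -1) == (-2 * x < 2) for x in xs)            # fact 3: negative multiplier flips
assert all((3 <= x <= 9) == (-9 <= -x <= -3) for x in xs)   # fact 4: double inequality, b < 0
print("all inequality facts verified")
```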
These results allow us to use the standard normal distribution N(0, 1) to
compute probabilities associated with a normal distribution N(μ, σ²) for
any μ and σ².

Examples:

i. To detect whether patients have had a stroke, one measure which is
sometimes used is the cerebral blood flow (CBF) in the brain. Stroke
patients tend to have lower levels of CBF than healthy patients.
Assume that in the general population, X = CBF follows a N(75, 17²)
distribution. A patient is classified as "probable stroke" if his or her
CBF is less than 40. What proportion of healthy patients will be
mistakenly classified as probable stroke victims?
Answer: X ~ N(μ, σ²) where μ = 75, σ = 17. We want to find
P(X < 40):

   P(X < 40) = P(X − μ < 40 − μ)
             = P((X − μ)/σ < (40 − μ)/σ)
             = P(Z < (40 − μ)/σ)   where Z ~ N(0, 1)
             = P(Z < (40 − 75)/17) = P(Z < −2.06)

Now the probability P(Z < c) for any number c can be computed
from a computer program. For instance, in Minitab we select

   Calc → Probability Distributions → Normal...

and then select "Cumulative probability" (which gives the probability
to the left of c), set the mean and standard deviation to 0 and 1,
respectively, and input c in the field "Input constant". Hitting OK
gives the answer:

   P(X < 40) = P(Z < −2.06) = .0197
Note that −2.06 is just the Z score associated with 40.
Note that Minitab allows you to set the mean and the standard deviation to anything you want. So, we actually could have computed
P(X < 40) here directly, without transforming to Z scores, by setting the mean and standard deviation to 75 and 17, respectively, and
setting "Input constant" to 40.
Other computer programs also have normal probability functions.
E.g., in Excel, the function NORMDIST(c, μ, σ, TRUE) gives P(X <
c) for X ~ N(μ, σ²).
While transforming to Z scores is not necessary with one of these
computer functions, it is necessary for using a standard normal probability table.
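For readers without Minitab or Excel, the same calculation can be sketched in Python using only the standard library; the phi helper below plays the role of the normal table:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 75.0, 17.0      # healthy CBF: X ~ N(75, 17^2)
z = (40.0 - mu) / sigma     # z score of the cutoff 40: -35/17, about -2.06
p = phi(z)                  # P(X < 40) = P(Z < -2.06)
print(round(z, 2), round(p, 4))
# -2.06 and 0.0198 (the text's .0197 comes from rounding z to -2.06 first)
```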
Standard normal tables are given in a variety of formats. Some give P(Z <
c) for selected values c ≥ 0, some give P(−c < Z < c) for selected values
c ≥ 0, and others give P(0 < Z < c) for selected values c ≥ 0. (See
handout.)
Any of these formats can be used to compute any desired normal probability if we use some logic and the facts that

a. the normal distribution is symmetric, so (for example) P(Z < −c) = P(Z > c) for any positive constant c, and
b. the area under the normal distribution is 1, so that P(Z > c) = 1 − P(Z ≤ c) = 1 − P(Z < c) for any constant c.

When computing normal probabilities from a table it is very useful
to draw a picture in order to figure out exactly how to use the table
and these facts to get the desired probability.

Back to the example:
We want P(Z < −2.06).

[Two plots of the standard normal p.d.f. against Z value.]

To use the first table, which gives P(Z < c) for c ≥ 0, we reason as
follows:
   P(X < 40) = P(Z < −2.06) = P(Z > 2.06)
             = 1 − P(Z ≤ 2.06) = 1 − P(Z < 2.06)
             = 1 − .98030 = .0197
To use the second table, which gives P(−c < Z < c) for c ≥ 0, we
reason as follows:

   P(X < 40) = P(Z < −2.06) = (1/2){1 − P(−2.06 < Z < 2.06)}
             ≈ (1/2){1 − P(−2.05 < Z < 2.05)} = (1/2)(1 − .9596) = .0202

which is slightly off because 2.06 didn't appear in our table and we
had to use 2.05 instead.
To use the third table, which gives P(0 < Z < c) for c ≥ 0, we reason
as follows:

   P(X < 40) = P(Z < −2.06) = P(Z > 2.06)
             = 1/2 − P(0 < Z < 2.06) = 1/2 − .4803 = .0197
ii. Suppose that a mild hypertensive is defined as a person whose diastolic
blood pressure is between 90 and 100 mm Hg (inclusive). Suppose
also that 35–44 year old males have diastolic blood pressure which
is normally distributed with mean 80 and variance 144.

What is the probability that a randomly selected 35–44 year
old male is hypertensive? I.e., if X ~ N(80, 144), find P(90 ≤ X ≤ 100).

[Plot: the standard normal p.d.f. against Z value.]

Answer:
   P(90 ≤ X ≤ 100) = P(90 − μ ≤ X − μ ≤ 100 − μ)
                   = P((90 − μ)/σ ≤ (X − μ)/σ ≤ (100 − μ)/σ)
                   = P((90 − 80)/12 ≤ Z ≤ (100 − 80)/12)
                   = P(.83 ≤ Z ≤ 1.67) = P(Z ≤ 1.67) − P(Z < .83)
                   = P(Z < 1.67) − P(Z < .83) = .95254 − .79673 = .15581
Our book actually gives a fourth form of normal table (Table A.3 in
Appendix A) which gives P(Z > c) for selected values c ≥ 0. To use
that table for this problem we would notice that

   P(90 ≤ X ≤ 100) = P(Z < 1.67) − P(Z < .83)
                   = [1 − P(Z ≥ 1.67)] − [1 − P(Z ≥ .83)]
                   = (1 − .047) − (1 − .203) = .156
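The same answer can be checked with a standard-library normal CDF; a sketch (with unrounded z scores the result is .1545, slightly different from the table-based .1558):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 80.0, 12.0        # diastolic BP: X ~ N(80, 144), so sigma = 12
z_lo = (90.0 - mu) / sigma    # 0.833...
z_hi = (100.0 - mu) / sigma   # 1.666...
p = phi(z_hi) - phi(z_lo)     # P(90 <= X <= 100)
print(round(p, 4))            # about 0.1545
```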
iii. Glaucoma is an eye disease characterized by high intraocular pressure (IOP). Suppose that the distribution of X = IOP in the general
population is N(μ, σ²) where μ = 16 mm Hg and σ = 3 mm Hg.
If the normal (i.e., healthy) range of IOP is defined as between 12
and 20 mm Hg, what percentage of the general population would fall
in this range?
Answer:

   P(12 ≤ X ≤ 20) = P((12 − μ)/σ ≤ (X − μ)/σ ≤ (20 − μ)/σ)
                  = P((12 − 16)/3 ≤ Z ≤ (20 − 16)/3)
                  = P(−1.33 < Z < 1.33)
                  = 2P(0 < Z < 1.33) = 2{1/2 − P(Z ≥ 1.33)} = 2(.5 − .092) = 0.816,

or 81.6%.
Normal Percentiles:
Sometimes, we'd like to work backward and figure out what value of X is
associated with a particular normal probability, rather than what normal
probability is associated with a particular value of X, for a normal r.v.
X ~ N(μ, σ²).
That is, we'd sometimes like to find the p-th percentile for a random
variable X ~ N(μ, σ²) for any given values μ and σ².
Fact: For X ~ N(μ, σ²) the 100p-th percentile of the distribution of X
(xp, say) is related to zp, the 100p-th percentile of the standard normal
distribution, via

   xp = μ + zp σ.   (*)

Here, zp can be looked up in a normal table like the first one in the
handout by finding p in the body of the table, and then finding zp
from the margins of the table.
Examples:

iv. Recall that for 35–44 year old men, X = diastolic blood pressure follows a N(80, 12²) distribution. What is the 95th percentile of diastolic
blood pressure in this population?
We want x.95. To get it, first find z.95 and then use the relationship
given by (*).
Using the first normal table in the handout, we look up .95 in the
body of the table. .95 doesn't appear there, but .94950 and .95053
do, which give z values of 1.64 and 1.65, respectively.
Therefore, z.95 should be about halfway between 1.64 and 1.65, or
z.95 = 1.645.
{ An exact value for zp for any p can be obtained via a computer
program. For example, in Minitab we follow the steps given
before, but select "Inverse cumulative probability" rather than
"Cumulative probability", and then set "Input constant" to p.
{ Using Minitab we can find that the exact value for p = .95 is
z.95 = 1.64485.

Now we use the relationship (*) to get the 95th percentile for X:

   x.95 = μ + z.95 σ = 80 + 1.64485(12) = 99.7 mm Hg.
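The table lookup can be replaced by an inverse-CDF call; a sketch using `statistics.NormalDist` from Python's standard library:

```python
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.95)     # 95th percentile of N(0, 1)
mu, sigma = 80.0, 12.0               # diastolic BP: X ~ N(80, 12^2)
x95 = mu + z95 * sigma               # relationship (*): x_p = mu + z_p * sigma
print(round(z95, 5), round(x95, 1))  # 1.64485 and 99.7 mm Hg

# Equivalently, ask for the 95th percentile of N(80, 12) directly:
print(round(NormalDist(mu, sigma).inv_cdf(0.95), 1))  # 99.7
```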
{ Note that the table in the back of our book gives P(Z > c)
rather than p = P(Z ≤ c), but since P(Z > c) = 1 − P(Z ≤
c) = 1 − p, we can obtain zp from the table in our book by
looking up 1 − p in the body of the table.
{ E.g., looking up 1 − .95 = .05 in that table, we again find that
z.95 = 1.645.

v. Find the 10th percentile of diastolic blood pressure among 35–44
year old males.

[Two plots of the standard normal p.d.f. against Z value.]

From the above picture, it is clear that z.10 = −z.90 or, more generally,

   zp = −z_{1−p}.

Using the normal table in the back of our book, we look up .10 in
the body of the table to give z.90 = 1.28, so z.10 = −1.28 and

   x.10 = μ + z.10 σ = 80 + (−1.28)(12) = 64.6.

Normal Approximation to the Binomial:
Recall that if X = the number of successes out of n independent, identical trials with constant success probability p, then X has a binomial
distribution. We will write this as

   X ~ Bin(n, p)

Let's look at the binomial probability distribution for a particular value
of p, p = .4, say, as n gets bigger. Below we plot the Bin(n, p = .4)
probability distribution for n = 3, n = 6, n = 9, n = 12, n = 15, and
n = 18.

[Six plots: the probability distribution of X ~ Bin(n, 0.4) for n = 3, 6, 9, 12, 15, 18.]

Notice that the binomial distribution looks more and more normal
as n gets large!
{ There is one important difference: the binomial is discrete, the normal
is continuous. But this becomes less and less of a factor as n
gets large, and, as we'll see, we can adjust for this difference
anyway.

So, the binomial looks more and more similar to a normal distribution as
n gets large, but which normal distribution is the best approximation to
the distribution of X ~ Bin(n, p)?
The answer is: the normal distribution with the same mean and variance
as X. That is, for n large,

   Bin(n, p) is well approximated by N(np, np(1 − p)).

Example: Suppose again that 55% of UGA undergrads are women. Suppose I take a random sample of n = 20 undergrads. What's the probability
that X = number of women in the sample turns out to be 12?
Based on X ~ Bin(n, p) where n = 20, p = .55, we can compute this
probability exactly. Using the binomial probability function,

   P(X = 12) = (20 choose 12) (.55)^12 (1 − .55)^(20−12) = .1623.

Notice, though, that this is a relatively hard calculation. E.g., 20! =
2.4329 × 10^18.
Since X ~ Bin(n, p), its mean and variance are

   E(X) = np = 20(.55) = 11 and var(X) = np(1 − p) = 4.95.

So, the distribution of X should be well approximated by a N(11, 4.95).
Here's the actual binomial probability distribution with a N(11, 4.95)
p.d.f. superimposed:

[Plot: the probability distribution of X ~ Bin(20, 0.55) with the N(11, 4.95) p.d.f. superimposed.]
Let Y ~ N(μY, σY²), where μY = 11, σY² = 4.95. Then the normal approximation to the binomial probability we want is

   P(X = 12) ≈ P(11.5 < Y < 12.5)
            = P((11.5 − μY)/σY < (Y − μY)/σY < (12.5 − μY)/σY)
            = P((11.5 − 11)/√4.95 < Z < (12.5 − 11)/√4.95)
            = P(.22 < Z < .67)
            = P(Z < .67) − P(Z < .22)
            = .74857 − .58706 = .16151,

which agrees with the true answer when rounded to three decimal places.
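Both the exact binomial calculation and the continuity-corrected normal approximation are easy to reproduce; a sketch:

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 20, 0.55
exact = comb(n, 12) * p**12 * (1 - p)**(n - 12)  # binomial pmf at x = 12

mu = n * p               # 11
sd = sqrt(n * p * (1 - p))  # sqrt(4.95)
# continuity-corrected normal approximation: P(11.5 < Y < 12.5)
approx = phi((12.5 - mu) / sd) - phi((11.5 - mu) / sd)
print(round(exact, 4), round(approx, 4))  # 0.1623 and about 0.161
```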
Another example: to find the binomial probability of 15 or more women
in the sample, we would use the approximation

   P(X ≥ 15) ≈ P(Y ≥ 14.5), where Y ~ N(11, 4.95).

Remember, n must be large for this approximation to work well. In
fact, it will only work well if n is large and p is not too close to 0 or
1.
Rule of thumb: the normal approximation to a Bin(n, p) distribution can
be expected to work well if np ≥ 5 and n(1 − p) ≥ 5.

Sampling Distribution of the Mean*
Sampling Distributions:

Sample statistics, such as the sample mean, sample standard deviation,
sample median, etc., are random variables.
Why?
Because they are computed on a random sample.
Therefore, if we were to repeat the process of taking a random sample,
any sample statistic (the mean, say) would vary from sample to sample,
in a way that is random, because the sampling was done at random.
Of course in practice, we generally only draw one random sample, but any
statistic from that sample is still a random quantity.
Hence, any sample statistic has
{ a probability distribution,
{ an expected value, or long run average over all the possible
random samples we could possibly take, and
{ a population variance, or long run variance over all of the possible random samples we could take.
The probability distribution of a sample statistic is called the sampling
distribution of that statistic.
That sampling distribution has a (population) mean, variance, standard
deviation, etc. The estimated standard deviation of a statistic is called
the standard error of the statistic.
Right now we will focus on the sample mean, its sampling distribution, and standard error, but it is important to realize that any
statistic has a sampling distribution and standard error.

* Read Ch.8 of our text.
The sample mean:
The sample mean of observations x1, x2, …, xn has an expected
value and variance that depend upon the expected value and variance of x1, x2, …, xn.
{ E.g., if x1, x2, …, xn are big, we expect the sample mean to be
big. If x1, x2, …, xn vary a lot, we would expect their mean
to be highly variable too.

Consider a random sample of observations x1, x2, …, xn, where each xi
has mean μ and variance σ². Let x̄ = (1/n) Σ_{i=1}^n xi denote the sample mean
of the xi's.
We assume that the observations x1, …, xn are independent of each
other. This is typically satisfied as a consequence of random sampling.
Then without knowing the probability distribution of the xi's, we cannot
make an exact statement about the entire probability distribution of x̄,
but we can say that the sampling distribution of x̄
{ has mean μ, and
{ has variance σ²/n.
This is true for x1, …, xn drawn from any probability distribution with
mean μ and variance σ².

These results make sense:
{ The sample mean should be centered at around the same place as
the xi's, and
{ The sample mean should have variance that depends upon σ², the
variance of the xi's, but which also should be smaller than the variance of the xi's.

Notice that var(x̄) = σ²/n depends on n. When the sample size is
large, the sample mean has small variance.

If we know the full probability distribution of the xi's then we can say
more. In particular:

If x1, …, xn are each normally distributed with mean μ and variance σ²
(i.e., if xi ~ N(μ, σ²) for each i), then

   x̄ ~ N(μ, σ²/n).

Central Limit Theorem:
So, we have seen that if the xi's have mean μ and variance σ², then it is
always true that

   E(x̄) = μ and var(x̄) = σ²/n,

and if the xi's are also normal, then x̄ is normal too.
One of the most important theoretical results in statistics, the central
limit theorem, allows us to go even farther:

If the xi's have mean μ and variance σ², then regardless of the distribution
of the xi's, their sample mean is approximately normally distributed if the
sample size n is sufficiently large. I.e.,

   x̄ is approximately N(μ, σ²/n) for large enough n,

or, if we standardize x̄ (i.e., switch to Z scores):

   Z = (x̄ − μ)/(σ/√n) is approximately N(0, 1) for large enough n.

This remarkable result is the most important reason why the normal
distribution plays such a key role in statistics.
Among other things, the CLT allows statistical inference procedures
based on sample means (e.g., we typically use the sample mean to
make inferences on an unknown population mean μ) to be based on
the normal distribution (even if the original observations are not
normally distributed).
Example – Body Weights

Although human heights are pretty close to normally distributed, weights
are not. Especially in the US, the distribution of body weight is skewed
right. That is, there are more very heavy people than there are very light
people.
Suppose that among US males, the average weight is 78 kg with a standard
deviation of 13 kg, and the distribution is skewed right.
Let x1, …, x30 be a random sample of the weights of n = 30 US males and
let x̄ = (1/30) Σ_{i=1}^{30} xi be the sample mean weight.
Then even though the weights of individual subjects are not normally
distributed, the CLT implies that x̄ is approximately distributed as

   x̄ ~ N(μ, σ²/n) = N(78, 13²/30) = N(78, 5.63).
Suppose we were to take samples of size n = 30 repeatedly, and
compute the sample mean each time. What proportion of those sample
means would fall within 2 kg of the population mean weight (between 76
and 80 kg)?
We can translate this question to: what is P(76 ≤ x̄ ≤ 80)?
Without knowing the exact distribution of weight, we don't know the exact
distribution of x̄, so we can't compute this probability exactly.
However, assuming that n = 30 is large enough, x̄ is approximately
N(78, 5.63), so we can approximate this probability as follows:

   P(76 ≤ x̄ ≤ 80) = P(76 − μ ≤ x̄ − μ ≤ 80 − μ)
                   = P((76 − μ)/(σ/√n) ≤ (x̄ − μ)/(σ/√n) ≤ (80 − μ)/(σ/√n))
                   ≈ P((76 − 78)/(13/√30) ≤ Z ≤ (80 − 78)/(13/√30))
                   = P(−.84 ≤ Z ≤ .84) = 1 − 2P(Z > .84) = 1 − 2(.200) = .6,

or approximately 60% of the samples will have sample means between 76
and 80 kg (within 2 kg of the true mean).
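This 60% figure can be checked by simulation. A sketch, assuming (since the text does not specify the true shape of the weight distribution) a right-skewed gamma distribution with mean 78 kg and sd 13 kg for individual weights:

```python
import random
from statistics import mean

random.seed(1)

# A gamma distribution with shape 36 and scale 169/78 is right-skewed
# and has mean 78 and variance 169 (sd 13) -- an illustrative choice only.
shape, scale = 36.0, 169.0 / 78.0

n, reps = 30, 10000
hits = 0
for _ in range(reps):
    xbar = mean(random.gammavariate(shape, scale) for _ in range(n))
    if 76.0 <= xbar <= 80.0:
        hits += 1
print(round(hits / reps, 2))  # close to the CLT answer of about 0.60
```

Even though the individual weights are skewed, the sample means behave like the normal approximation predicts.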
Now what body weight cuts off the upper 5% of the sampling distribution of the sample mean, for n = 30?
I.e., what is the 95th percentile of the sampling distribution of x̄?
{ This would be the weight such that, if the xi's each have mean
78 kg and σ = 13 kg, we would expect to observe a sample
mean weight at least this large only 5% of the time.

Again, assuming that n = 30 is large enough for the CLT to hold, the
sampling distribution is approximately normal with mean 78 and standard
deviation 13/√30 = 2.373.
So,

   Z = (x̄ − μ)/(σ/√n) ~ N(0, 1)  ⇒  x̄ = μ + Z(σ/√n)  (approximately).

Therefore, the 95th percentile of the distribution of x̄ is related to the 95th
percentile of the standard normal distribution via

   x̄.95 ≈ μ + z.95(σ/√n) = 78 + z.95(2.373).

We can get z.95 by looking up 1 − .95 = .05 in Table A.3 in the back of
our book, which yields z.95 = 1.645, so x̄.95 = 78 + (1.645)(2.373) = 81.9.
So, the 95th percentile of the distribution of x̄, the sample mean
weight based on a sample of size n = 30, is 81.9 kg.
This means that when taking a sample of size 30 of weights that
have true mean 78 and true sd 13, 95% of the time we would expect
a sample mean less than 81.9 kg.
Based on this result, what would you conclude if you took a sample
of size 30 and found the mean to be 82.4 kg (say)?
{ You'd either have gotten a very unusual sample, or
{ the sample really didn't come from a distribution with mean
78 and standard deviation 13 in the first place.

Now what weights enclose 95% of the sample means of size n = 30?
That is, what are the weights (kg) such that 95% of the time we
would expect to get sample mean weights between those values?

[Plot: approximate probability density of the sample mean based on n = 30, i.e., the N(78, 5.63) distribution, with weight (kg) on the horizontal axis.]

This translates into finding x̄.025 and x̄.975, the 2.5th and 97.5th percentiles of the sampling distribution of x̄.
By looking up .025 in Table A.3, we can find that z.975 = 1.96 and z.025 = −1.96.
Therefore,

   x̄.975 ≈ μ + z.975(σ/√n) = 78 + 1.96(2.373) = 82.65
and
   x̄.025 ≈ μ + z.025(σ/√n) = 78 + (−1.96)(2.373) = 73.35.

So if weights have population mean 78 and population sd 13, we
expect the sample mean of 30 observations to fall between 73.35 and
82.65 kg about 95% of the time.
{ Again, if we took a single sample of size 30 and calculated a
sample mean outside of this range, we'd either have observed
an unusual result or we might be tempted to conclude that the
weights didn't have mean 78 and sd 13 in the first place.
Confidence Intervals*

Another way to look at the previous calculation is that we have used the
fact that

   P(−1.96 ≤ Z ≤ 1.96) = .95

and the CLT to infer that

   P(−1.96 ≤ (x̄ − μ)/(σ/√n) ≤ 1.96) ≈ .95
   ⇒ P(−1.96 σ/√n ≤ x̄ − μ ≤ 1.96 σ/√n) ≈ .95
   ⇒ P(−x̄ − 1.96 σ/√n ≤ −μ ≤ −x̄ + 1.96 σ/√n) ≈ .95
   ⇒ P(x̄ − 1.96 σ/√n ≤ μ ≤ x̄ + 1.96 σ/√n) ≈ .95.

Because of the above probability statement, we say that the interval
computed as

   (x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n)

forms an (approximate) 95% confidence interval for μ.
Note what is random here. μ is a population mean. It is a fixed (unknown) constant. x̄ is random, because it is computed on a random
sample.
Therefore, we are attaching a probability to where x̄ lies, not where
μ lies.
The interpretation here is that if we were to repeat the process by
which the upper and lower limits were calculated (drawing the sample, computing the sample mean, etc.), 95% of the time we would
get an interval that covers the true population mean μ.
* Read Ch.9 of our text.
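This "repeated sampling" interpretation can be made concrete by simulation; a sketch reusing the weights setup (μ = 78, σ = 13, n = 30) with σ treated as known:

```python
import random
from math import sqrt
from statistics import mean

random.seed(2)

mu, sigma, n = 78.0, 13.0, 30
half_width = 1.96 * sigma / sqrt(n)

reps = 10000
covered = 0
for _ in range(reps):
    xbar = mean(random.gauss(mu, sigma) for _ in range(n))
    # the interval (xbar - 1.96*sigma/sqrt(n), xbar + 1.96*sigma/sqrt(n))
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1
print(covered / reps)  # close to 0.95
```

Roughly 95% of the simulated intervals cover the fixed, unknown μ; any single interval either covers it or it doesn't.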
The confidence interval that we just introduced is an example of one
of the methods of statistical inference.

Statistical Inference:

The typical paradigm for statistical inference is that we are interested in
some population characteristic or parameter:
{ e.g., the average cholesterol level of 40–49 year old American
females,
{ the proportion of the US voting age population that approves
of the job that the president is doing,
{ the population variance in the cost of a certain medical procedure at US hospitals.
So, we collect a random sample representative of the population of interest,
and use variables measured on the sample to infer what is true of the
corresponding population parameter.
There are two main aspects of statistical inference: estimation and hypothesis testing.
1. Estimation
a. Point estimation. In point estimation, we simply use a sample statistic to give a numerical estimate of the corresponding
population value (parameter).
{ E.g., sample mean to estimate population mean, sample proportion to estimate population proportion, sample sd to estimate the population sd.
{ Good estimates should be unbiased (on target) and have small
variance (be precise).
b. Interval estimation. Almost all point estimates are likely to be
wrong. They may be close to the quantity being estimated, but
there is almost certainly some error (hopefully small). Confidence intervals quantify the uncertainty or error in our estimate
by finding an interval within which the population parameter
can be expected to lie with high probability.
{ Hopefully, that interval is narrow, meaning we're highly confident that there is little error in our estimate; i.e., it is a precise
estimate.
{ Confidence intervals must be interpreted carefully.
2. Hypothesis testing. In hypothesis testing we make a decision about
the population parameter based upon what we know about the corresponding sample estimate.
{ E.g., we decide whether the population mean is equal to a
certain value
{ we decide whether the population variance is equal to a certain
value
{ we decide whether two population proportions are equal to
each other, etc.
{ There is always the possibility that our decision will be wrong,
but in statistical hypothesis testing, we know the probability
that we have made the wrong decision.
Hypothesis testing and confidence interval estimation are really flip sides of the same coin. That is, they are two different ways to look
at the problem of statistical inference.
{ They always give compatible results, but in some cases it may
be more useful to frame an inference problem in terms of interval estimation and in other cases it may be more useful to
conduct hypothesis tests.
Point Estimation:
A statistic T is an unbiased estimator of a parameter θ if

   E(T) = θ.

Otherwise, T is said to be biased.
A statistic can always be thought of as an estimator of its expected
value, or long run average.
All things being equal, we would always prefer an unbiased estimator
over a biased one.
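A classic illustration of bias (a standard example, not one worked in the text): the sample variance computed with divisor n is a biased estimator of σ², while the usual divisor n − 1 makes it unbiased. A simulation sketch:

```python
import random
from statistics import mean

random.seed(3)

mu, sigma, n = 0.0, 1.0, 5   # small n makes the bias easy to see
reps = 20000

v_n, v_n1 = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = mean(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    v_n.append(ss / n)         # divisor n: long-run average is (n-1)/n * sigma^2 = 0.8
    v_n1.append(ss / (n - 1))  # divisor n - 1: long-run average is sigma^2 = 1.0

print(round(mean(v_n), 2), round(mean(v_n1), 2))  # near 0.8 and 1.0
```

The long-run average of the divisor-n version falls short of the true σ² = 1, which is exactly what "biased" means.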
The precision of an estimator refers to the amount of variance in its
sampling distribution.
The more variance in an estimator, the more spread out its values,
the less precise it is.

Bias and precision can be understood through the following picture:

[Picture omitted.]

Accuracy of an estimator combines bias and precision. An accurate
estimator is one that has low bias and high precision.
Estimation of a population mean:
Suppose we have a random sample from a normal distribution. That
is, let x1, …, xn be independent random variables, each with a N(μ, σ²)
distribution.
For now, suppose we know the value of σ², the variance of each xi.
Based on a sample of size n we wish to make inference on μ.

A natural estimate of μ is the sample mean x̄ = (1/n) Σ_{i=1}^n xi.
Why? Because its expected value is μ. Recall E(x̄) = μ.
x̄ is a point estimate of μ.
Because x1, …, xn were assumed normal, x̄ ~ N(μ, σ²/n).
Even if x1, …, xn were not normal, though, the CLT implies that
x̄ is approximately N(μ, σ²/n) if n is large.

Precision of x̄:
Remember, the precision of a statistic is related to that statistic's variance
(the variance of its sampling distribution). The variance of x̄ is var(x̄) =
σ²/n, so
{ x̄ is more precise when the sample size is large, because that makes
σ²/n small, and
{ x̄ is more precise when σ², the variance of the original data, is small,
because that also makes σ²/n small.

Back in the Confidence Intervals section, we went through some calculations to show that for x̄
computed from a random sample with mean μ and variance σ²,

   P(x̄ − 1.96 σ/√n ≤ μ ≤ x̄ + 1.96 σ/√n) ≈ .95.

This probability becomes exact if the sample is drawn from a normal
distribution.
Therefore, we say that

   (x̄ − 1.96 σ/√n, x̄ + 1.96 σ/√n) = x̄ ± 1.96 σ/√n

is a 95% confidence interval for μ.
It is an exact 95% interval for samples drawn from a normal distribution, and an approximate 95% interval for samples drawn from non-normal distributions.
Interpretation: If we were to
{ draw a random sample of size n,
{ compute x̄,
{ construct the interval x̄ ± 1.96 σ/√n, and
{ repeat this process many, many times,
then 95% of these intervals will contain μ.

The precision of x̄ is reflected in the width of the interval, which is 2(1.96) σ/√n.
E.g., again suppose we have a random sample x1, …, xn of size n of
the weights (kg) of US males, and suppose that E(xi) = μ (unknown)
and var(xi) = σ² (known) for each i (each subject).
Then here are the widths of approximate 95% confidence intervals
for μ for different values of n and σ:

                        Sample Size (n)
   Population SD (σ)    15     30     60    120
            8          8.1    5.7    4.0    2.9
           13         13.2    9.3    6.6    4.7
           18         18.2   12.9    9.1    6.4
           23         23.3   16.5   11.6    8.2
{ Sample size increases.
{ Population SD decreases.
Notice that the 95% con dence is
x z1;:05=2 pn  {z } =z:975 =1:96 How do we get a 90% interval or a 99% interval?
General formula for a CI for μ: For a random sample x1, …, xn from
a normal distribution with common mean μ and common known variance
σ², a 100(1 − α)% confidence interval for μ is given by

   x̄ ± z_{1−α/2} σ/√n.

This confidence interval is exact for normal distributions, and approximate for non-normal distributions by the CLT.
Beware that our book uses the notation zp for the value of the standard normal that cuts off 100p% in the upper tail. We use zp to
denote the value that cuts off 100p% in the lower tail.
Example – Birthweights of SIDS Babies
In 1976–77 there were 78 cases of crib death (SIDS) in King Co., WA.
The average birthweight in this sample was x̄ = 2994 g.
Based on nationwide surveys of millions of deliveries, the mean birthweight in the US is 3300 g, with a standard deviation of 800 g.
Suppose that this sample of n = 78 babies is a random sample from
the total population of SIDS cases (it's not, but we'll assume so for
illustration purposes).
Find a 95% confidence interval for the population mean birthweight of
SIDS cases in the US.
Since we have specified that we want a 95% interval,

   100(1 − α)% = 95%.

Therefore, α = ______, 1 − α/2 = ______, and z_{1−α/2} = ______.

Thus, the 95% interval for μ is

   x̄ ± z_{1−α/2} σ/√n = ______ = ______.

If we assume that birthweights are normally distributed, then this is
an exact 95% CI for μ. Otherwise, it is an approximate 95% CI for
μ.
con dent that the population mean birthweight for SIDS infants in
the US is covered by the interval
( ) It is conventional to form 95% intervals. However, that is just tradition without any theoretical basis. Sometimes we may want other
con dence levels.
Suppose we had wanted a 90% con dence interval for .
Then 100(1 ; )% = 90% which implies that
=
so that and 1 ; =2 = z1; =2 = Thus the 90% interval is given by x z1; =2 pn =
=
Note that this interval is narrower than a 95% interval.
{ As the con dence level goes up, the width of the con dence
interval increases as well.
{ Intuition: for me to be very highly con dent that my interval
covers , I have to make my interval wide. 128 OneSided Con dence Intervals:
So far, we have just talked about two sided con dence intervals: con dence
intervals with a lower and upper bound that straddle the population mean
with some prespeci ed probability (95%, say).
In some situations, we are interested only in nding an upper bound, which
will fall above the population mean with some probability. Or perhaps, a
lower bound, which falls below the mean with some probability. Example  Cholesterol Level High cholesterol is considered a risk factor for heart disease. There is little
concern about low cholesterol levels  basically, the lower, the better.
So, we might be interested in estimating the mean cholesterol level in the
normal (healthy) population, and placing an upper bound on that mean,
such that we can be 95% sure that the population mean falls below that
upper bound.
This would be useful for deciding whether a patient with a given
cholesterol level has elevated cholesterol relative to the healthy population.
Suppose that the population standard deviation for cholesterol level
among healthy people is known to be 25 mg/dL. Cholesterol levels
are known to be somewhat skewed right.
Suppose that a random sample of 28 normal adults was obtained
and the sample mean cholesterol level was 168.3 mg/dL.
Obtain a 95% upper bound on , the mean cholesterol level in the healthy
population. 129 Answer: If we assume that the sample size is large enough for the CLT
to hold, then x:N 2 n:
or, if we switch to Z scores, this statement is equivalent to
x ; : N (0 1):
=pn
We know that P (Z ;1:645) = :95
because ;1:645 = z:05 , the 5th percentile of the standard normal distribution.
Therefore,
:95 = P (Z ;1:645) p
P x=; n ;1:645
;
p
=P x;
;1:645 = n
;
p
= P ; ;x ; 1:645 = n
; x + 1:645 =pn
=P That is, p P ( x + 1:645 = n) :95
()
p
so that x + 1:645 = n is a 95% upper con dence bound for .
Note that if cholesterol levels had been normal to begin with, then
(*) would have been an exact equality, and our confidence bound
would have been an exact 95% bound.

Since cholesterol level was nonnormal, we used the CLT to establish the approximate relationship given by (*), and our bound is an
approximate 95% bound.

So, in the example, the upper bound is given by

    x̄ + 1.645 σ/√n = 168.3 + 1.645(25)/√24 = 176.7

So, we can be 95% confident that the population mean cholesterol
level for healthy adults falls below 176.7 mg/dL.
The general formula for a 100(1 − α)% upper confidence bound on μ, based
on a sample of size n from a population with standard deviation σ, is

    x̄ + z_{1−α} s.e.(x̄) = x̄ + z_{1−α} σ/√n.

A 100(1 − α)% lower confidence bound on μ is given by

    x̄ − z_{1−α} s.e.(x̄) = x̄ − z_{1−α} σ/√n.
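These one-sided bounds are easy to compute directly. A sketch using the cholesterol numbers from the example above (x̄ = 168.3, σ = 25, n = 24, z_.95 = 1.645 assumed):

```python
from math import sqrt

# Numbers assumed from the cholesterol example above
xbar, sigma, n = 168.3, 25, 24
z_95 = 1.645  # z_{1-alpha} for alpha = .05

upper = xbar + z_95 * sigma / sqrt(n)  # 95% upper confidence bound for mu
lower = xbar - z_95 * sigma / sqrt(n)  # 95% lower bound, for comparison
print(round(upper, 1), round(lower, 1))
```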
The case when σ is unknown:

To this point, we have assumed that we know the population sd σ.
Occasionally, this may be the case, but typically, σ is unknown and must
be estimated from the data just as μ must.

What should we expect this to do to our confidence intervals?

Well, if we have to estimate an additional parameter, σ, one should
expect that that would introduce additional uncertainty, and make
our confidence intervals wider. As we'll see, this intuition is correct.

Student's t distribution:
In the case when σ was known, we based our confidence interval for μ on
the fact that for a random sample from a N(μ, σ²) distribution, the sample
mean follows a normal distribution:

    Z = (x̄ − μ)/(σ/√n) ~ N(0, 1)    (†)

(This result is only approximately true for a sample from a nonnormal distribution when n is sufficiently large.)

That is, in the known-σ case, we considered the distribution of (x̄ − μ)/(σ/√n) to
derive a confidence interval for μ.

In the unknown-σ case, therefore, a natural starting point is to consider
the distribution of

    t = (x̄ − μ)/(s/√n)

where we've replaced σ by its sample estimate, the sample standard deviation s.

This makes sense, but once we replace σ by s, this quantity no longer
follows a standard normal distribution.

In fact, it can be shown that t follows a distribution that looks like the
normal, but is more spread out. That distribution is called Student's t
distribution.

This distribution is named after Student, the pseudonym of
W. S. Gosset, the author who discovered it.

Student's t distribution is more spread out because having to estimate σ introduces additional uncertainty (error) and makes t a more
variable quantity than Z.
How much more variable?

That depends upon how precise s is as an estimate of σ, which is
determined by the sample size n, or equivalently, by the divisor n − 1
in the formula for s, which is called the degrees of freedom of the
t distribution.

That is, there is a distinct t distribution for every possible value of
the degrees of freedom n − 1.

– I.e., the t distribution is a parametric distribution with parameter n − 1, called its degrees of freedom.

As n grows, s becomes a better estimate of σ, and the t distribution
gets less spread out relative to the normal. Here are t distributions
for degrees of freedom equal to 3, 6, and 9, relative to a standard normal
distribution.
[Figure: p.d.f.s of the N(0,1), t(3), t(6), and t(9) distributions; vertical lines mark the 97.5th percentiles.]

We denote the t distribution with d degrees of freedom by t(d). Here
we have the t(3), t(6), and t(9) distributions as well as the
N(0, 1).

Notice that the spread in the t distribution decreases with the degrees
of freedom n − 1.

In fact, if n − 1 is large enough, the t(n − 1) and N(0, 1) become
almost indistinguishable. Here is the t(30) compared to the N(0, 1):

[Figure: p.d.f.s of the N(0,1) and t(30) distributions; vertical lines mark the 97.5th percentiles.]

On these plots we've also plotted vertical lines for z_.975 = 1.96, the
97.5th percentile of the standard normal distribution, as well as the
corresponding 97.5th percentiles for the t distributions: t_.975(3), t_.975(6),
t_.975(9), and, in the second plot, t_.975(30).
Recall that z_.975 = 1.96 was the multiplier for obtaining a 95% CI for μ in the
known-σ case. In that case the 95% CI for μ was given by

    x̄ ± z_.975 s.e.(x̄) = x̄ ± 1.96 σ/√n

In the unknown-σ case, a 95% CI for μ based on a sample of size n is given
by

    x̄ ± t_.975(n − 1) s.e.(x̄) = x̄ ± t_.975(n − 1) s/√n

Thus, in the unknown-σ case, our multiplier changes from z_.975 to
t_.975(n − 1).

– As we can see from the plots, t_.975(n − 1) is a bigger number,
especially when n − 1 is small, so we get a wider interval.
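The widening is easy to quantify. A sketch comparing z_.975 = 1.96 with t_.975(df) values taken from a standard t table (the table values are hardcoded here, not computed):

```python
# 97.5th percentiles of the t distribution, from a standard t table
t_975 = {3: 3.182, 6: 2.447, 9: 2.262, 30: 2.042}
z_975 = 1.96

for df in sorted(t_975):
    ratio = t_975[df] / z_975  # how much wider the t-based CI is
    print(df, t_975[df], round(ratio, 2))
```

For df = 3 the interval is over 60% wider than the z-based interval; by df = 30 the difference is only about 4%.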
Note that for degrees of freedom n − 1 = 30, the z and t multipliers
are very close. This observation has led to the often-given rule of
thumb that for n − 1 ≥ 30 we can use the z multiplier in place of the
t to form a confidence interval for μ even when σ is unknown.

– This replacement introduces some error in the calculation, but
not much, especially for n − 1 much larger than 30.

General Formula: In general, for a sample of size n drawn from a normally distributed population with mean μ and variance σ², a 100(1 − α)%
CI for μ is given by

    x̄ ± t_{1−α/2}(n − 1) s.e.(x̄) = x̄ ± t_{1−α/2}(n − 1) s/√n

This interval is approximately correct even if the sample is drawn
from a nonnormal population, as long as the sample size is large.

Example – Lead Content in Boston Drinking Water

Recall the following data on the lead content (mg/liter) in 12 samples
of drinking water in the city of Boston, MA.

    .035 .060 .055 .035 .031 .039 .038 .049 .073 .047 .031 .016

Assuming that lead content is normally distributed, form a 90% CI for
the mean lead content in Boston drinking water.

Answer: In this case, we do not know the population mean or population
standard deviation, so they must be estimated from the sample data:

    x̄ = (1/n) Σ x_i = (1/12)(.035 + ··· + .016) = .0424

    s = √[ (1/(n−1)) { (Σ x_i²) − n x̄² } ] = √[ (1/11){ (.035² + ··· + .016²) − 12(.0424²) } ] = .0153

The standard error of x̄ is

    s.e.(x̄) = s/√n = .0153/√12 = .00441
Here we want a 90% interval, so 100(1 − α) = 90, or

    α = .10 and 1 − α/2 = .95

Since the population sd σ is unknown, we must use the t distribution to
form our interval. The sample size is n = 12, so the appropriate degrees
of freedom is n − 1 = 12 − 1 = 11. Going to the back of our book, Table
A.4, we find

    t_{1−α/2}(n − 1) = t_.95(11) = 1.796

Therefore, a 90% CI for μ is given by

    x̄ ± t_{1−α/2}(n − 1) s.e.(x̄) = .0424 ± (1.796)(.00441) = (.0345, .0503)

We are 90% confident that the mean lead content in Boston drinking
water lies between .0345 and .0503 mg/liter.
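The whole computation can be reproduced in a few lines. A sketch using the 12 lead measurements, with t_.95(11) = 1.796 hardcoded from Table A.4:

```python
from math import sqrt

# Lead content (mg/liter) in 12 Boston drinking water samples
x = [.035, .060, .055, .035, .031, .039, .038, .049, .073, .047, .031, .016]
n = len(x)
xbar = sum(x) / n
s = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample sd
se = s / sqrt(n)                                        # standard error of xbar
t_mult = 1.796                                          # t_.95(11), from Table A.4
lo, hi = xbar - t_mult * se, xbar + t_mult * se
print(round(xbar, 4), round(s, 4), (round(lo, 4), round(hi, 4)))
```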
Note that this is an exact 90% interval because the sample was drawn
from a normal population. However, this would be an approximate
90% CI if lead content was not normally distributed, provided that
the sample size was large enough for the CLT to hold.

How large does the sample size have to be for the CLT to hold?

Tough question. It depends upon how close to normal the population
was that we sampled from.

– If we drew a sample from a very nonnormal population (highly
skewed and/or highly discrete), then it requires a larger sample
size in order for sample means from that population to follow
a normal sampling distribution.

– If the population we drew from to begin with is nearly normal,
though, a much smaller sample size may suffice.

The sample size necessary for the CLT to hold can be quite small in some
cases – as small as n = 5 sometimes – but to be safe, we generally
need samples of size 25 or more to be fairly confident that the normal
distribution will provide a good approximation to the sampling distribution of x̄.
Hypothesis Testing*

The other main aspect of statistical inference (besides point and interval
estimation) is hypothesis testing.

In hypothesis testing we make a decision about the true state of the population based upon what we know concerning the sample.

This decision is guided by probability.

A good metaphor for the approach used in statistical hypothesis testing is
the American legal system.

"Innocent until proven guilty" means

– we assume innocence;
– we collect and examine evidence to contradict innocence;
– if the evidence is strongly against innocence (beyond a reasonable doubt),
we reject innocence and conclude the alternative, guilt;
– if not, we haven't proven innocence, only failed to prove guilt, and the
assumption of innocence is maintained.

The prosecutor's hypothesis is that the defendant is guilty, so he/she
assumes the opposite and tries to disprove it.

In statistical hypothesis testing, the researcher plays the role of the prosecutor. His/her research hypothesis is "guilt," so he/she assumes the opposite, which is called the null hypothesis, and is typically represented
as H0 ("H naught").

For example,

    H0: defendant is innocent

or

    H0: no association between obesity and diabetes

* Read Ch. 10 of our text.
The hypothesis that the researcher is trying to prove is called the alternative hypothesis, denoted HA, or sometimes, H1.

For example,

    HA: defendant is not innocent (guilty)

or

    HA: there is an association between obesity and diabetes

The alternative hypothesis is always framed in such a way that it is
the only other possibility under consideration if the null hypothesis
is not true. That is, the alternative hypothesis is HA: not H0.

Typically, we are interested in the true state of nature in the population,
operationalized in terms of the true value of some parameter, or parameters.

The simplest case: we want to test a hypothesis about a population mean.

Example – Birthweights of SIDS Cases

Based on nationwide surveys of millions of deliveries, the mean birthweight in the US is 3300 g, with a standard deviation of 800 g.
We want to investigate whether the population mean birthweight of
SIDS cases is different from that of the general population.

Recall that in 1976–77, there were 78 SIDS cases in King County,
WA. The sample mean birthweight among the King Co. cases was
x̄ = 2994 g.

We will assume that these cases are a random sample from the population of SIDS cases nationwide (a strong, questionable assumption).

We will also assume that SIDS birthweights are normally distributed,
with population sd σ = 800, the same as in the general population.

Is the mean birthweight among SIDS cases different than in the general population?

Let μ be the population mean birthweight among SIDS cases in the
US. The null hypothesis is what we want to disprove. In this case,
then, our null hypothesis is H0: μ = 3300 g.

The value that we assume for μ under the null hypothesis is called the
null value for μ and is denoted as μ0. That is, our null hypothesis
is of the form

    H0: μ = μ0, where μ0 = 3300 g

How about the alternative hypothesis?
Here, there are three possibilities for the truth:

    μ < μ0,  μ = μ0,  or  μ > μ0

In a one-sided alternative hypothesis situation, the researcher/analyst
makes an a priori assumption and dismisses either μ < μ0 or μ > μ0 as
out of the realm of possibility.

In the SIDS example, the researcher may be willing to assume a
priori that there is no possible way that the mean SIDS birthweight
could be greater than in the general population. In that case HA: not H0
becomes

    HA: μ < μ0, where μ0 = 3300 g

In a two-sided alternative hypothesis situation, the researcher/analyst
makes no such a priori assumption, so that the alternative hypothesis
becomes

    HA: μ ≠ μ0, where μ0 = 3300 g

We will concentrate on one-sided alternatives first, and then discuss
how things change when we instead use a two-sided alternative.

Type I and II Errors
In performing a hypothesis test there are two possible states of nature and
two possible conclusions that can be made:

                           State of Nature
    Conclusion             H0 is true      H0 is false
    -----------------      ------------    -------------
    Fail to Reject H0      Correct         Type II Error
    Reject H0              Type I Error    Correct

We can make errors in two ways:

    I.  We can incorrectly reject H0 – a Type I Error
    II. We can incorrectly fail to reject H0 – a Type II Error

Ideally, we rarely make errors of either type.

Let

    α = P(we make a Type I error)
    β = P(we make a Type II error)

We would like to simultaneously minimize both α and β.

However, the only one of these that we have complete control over
is α. β depends upon how false the null hypothesis is.

– Why? Because if the true value of μ is far from μ0, it's a lot
easier to reject H0: μ = μ0 than if μ is closer to μ0.

So, we construct our test in such a way that α is small.

Example – Birthweights of SIDS Cases (Continued)
Suppose that we are interested in the one-sided alternative, so we want to
test

    H0: μ = μ0 versus HA: μ < μ0, where μ0 = 3300 g

That is, we're willing to dismiss the possibility that SIDS cases might
have birthweights greater than the general population.

Given that we don't know μ, how do we decide in favor of H0 or HA?

Answer: we look at how much smaller x̄ is than μ0.

If x̄ is much smaller than μ0 = 3300, then there's strong evidence
against H0 and we reject H0 in favor of HA.

– Suppose x̄ had been 1100 g. That seems very far from μ0 =
3300, so we would have little trouble concluding in favor of HA.

– However, what if x̄ had been 3250 g? That's smaller than
μ0 = 3300, but is it small enough to conclude that
μ < 3300?

How much smaller than 3300 must x̄ be before we're willing to conclude that μ < 3300?

In the legal system the evidence against the null hypothesis of innocence
must be "beyond a reasonable doubt."

In hypothesis testing, "beyond a reasonable doubt" is α, the probability
that we reject H0 when it is really true (the probability of convicting an
innocent person).
We set this probability low, to some prespecified level α, called the
significance level of the test.

– The conventional choice for the significance level is α = .05,
but this is just convention. Other values such as α = .1 or
α = .01 are also sometimes used.

How do we set the significance level low? By requiring x̄ to be smaller
than μ0 by enough so that such an extreme value would be unlikely
if the null hypothesis were true.
So, we look at how unlikely it is, given that the null hypothesis is true, to
have observed an x̄ at least as far from μ0 as the one we got.

This probability – the probability of obtaining a result at least as
unlikely as the one obtained, given that the null hypothesis is true –
is called the p-value of the test.

Then if the p-value is small enough (smaller than the prespecified significance level α), we reject H0.

In the SIDS example, suppose we decide to test H0: μ = 3300 using
significance level α = .05.

That is, we are going to require that x̄ be fairly unusually small,
something that occurs only 5% of the time assuming that the null hypothesis is true, before we decide that the null hypothesis isn't really
true.
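The meaning of α = .05 can be checked by simulation: if H0 is true, a rule that rejects when z < −1.645 should reject about 5% of the time. A small Monte Carlo sketch (the simulation is illustrative, not from the notes):

```python
import random
from math import sqrt

random.seed(1)
mu0, sigma, n = 3300, 800, 78
se = sigma / sqrt(n)

# Under H0, the sampling distribution of xbar is N(mu0, sigma^2/n),
# so we can draw xbar directly from that distribution.
reps = 100_000
rejections = sum(
    1 for _ in range(reps)
    if (random.gauss(mu0, se) - mu0) / se < -1.645
)
rate = rejections / reps
print(round(rate, 3))  # should be close to alpha = .05
```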
Recall that in our sample of n = 78 cases, the sample mean was x̄ = 2994
g. This is less than μ0 = 3300, but how unlikely is it to get a sample mean
that's as small as 2994, given that the population mean is μ = 3300 (given
that H0 is true)?

That is, what's the p-value here?

Since we assumed that SIDS birthweights are N(μ, σ²), where σ = 800,
then based on a sample size of n = 78,

    x̄ ~ N(μ, σ²/n) = N(3300, 800²/78)

assuming that H0: μ = 3300 is true.

Therefore, our p-value is

    p = P(x̄ ≤ 2994)
      = P( (x̄ − μ)/(σ/√n) ≤ (2994 − μ)/(σ/√n) )
      = P( Z ≤ (2994 − 3300)/(800/√78) )    assuming H0: μ = 3300 is true
      = P(Z ≤ −3.38) = P(Z ≥ 3.38) = .00036
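The same p-value can be computed with Python's standard library; a sketch using the example's numbers:

```python
from math import sqrt
from statistics import NormalDist

xbar, mu0, sigma, n = 2994, 3300, 800, 78
z = (xbar - mu0) / (sigma / sqrt(n))  # standardize the sample mean
p = NormalDist().cdf(z)               # p = P(Z <= z), one-sided lower tail
print(round(z, 2), round(p, 5))
```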
So, under the null hypothesis, we would expect to get a sample mean
at least as small as the one we got with probability .00036 (or only
.036% of the time).

Therefore, either H0: μ = 3300 is true and we've observed a very
unusual event, or H0: μ = 3300 is not true.

Since p = .00036 is less than our chosen significance level of α =
.05, it's such an unusual event under H0 that we're willing to reject
H0: μ = 3300 in favor of HA: μ < 3300.

Steps in a statistical hypothesis test:
1. State the research question in terms of the null and alternative hypotheses.

– In the previous SIDS example, H0: μ = μ0 versus HA: μ < μ0,
where μ0 = 3300 g.

2. Specify the significance level.

– In the SIDS example, we used α = .05.

3. Choose an appropriate test statistic.

– In the SIDS example, since we're testing a hypothesis on the
population mean μ, we based our test on the sample mean x̄.

– Specifically, however, we looked at how much smaller x̄ was
than μ0 relative to the standard error of x̄, σ/√n. That is, we
looked at the test statistic:

    z = (x̄ − μ0)/(σ/√n).

4. Collect the data and compute the necessary sample statistics and
test statistic.

– We collected the data and computed x̄ and then the test statistic z, which turned out to be z = −3.38.

5. Calculate the p-value, compare it to the significance level α, and
state the conclusion. It is good practice to report not only the result
of the test (reject, fail to reject) but also the numeric value of the
test statistic and the numeric p-value.

– We found that p = .00036, so we rejected H0: μ = 3300 in
favor of μ < 3300. Our conclusion was that the population
mean birthweight for SIDS cases is less than 3300 g, the mean
birthweight in the general population (z = −3.38, p = .00036).

We have emphasized the p-value approach to making the decision whether
to reject, or fail to reject, the null hypothesis.

Compute the p-value and reject if p < α.

This is the preferred method of conducting the test, but you should be
aware that there is another, equivalent approach for making our conclusion,
known as the critical value approach.

To understand the critical value approach, think back to our SIDS example. There, we observed x̄ = 2994, which was low relative to the null value
of μ0 = 3300.

This led to a test statistic of z = (x̄ − μ0)/(σ/√n) = −3.38. Notice that if x̄ had been closer to μ0, then

– the test statistic would have been closer to 0,
– and the p-value would have been larger.

E.g., if x̄ had been 3250, say, then the test statistic would have been

    z = (x̄ − μ0)/(σ/√n) = (3250 − 3300)/(800/√78) = −.55

which has p-value p = P(Z < −.55) = .291, which is > α = .05, and
we would not have rejected H0.

Thus different values of x̄ lead to different test statistics.

The rejection region of a test is the set of values of the test statistic
which lead to rejection of H0.

Equivalently, it is the set of values that lead to p-values less than α.

For a given level α, the critical value of a test statistic is the boundary
of the rejection region.

I.e., it is the value of the test statistic which is just barely large
enough in magnitude to lead to rejection of H0 at a given significance
level α.

In the SIDS example, and in general for testing H0: μ = μ0 versus a one-sided alternative for a normal sample with known sd σ, our test statistic
is

    z = (x̄ − μ0)/(σ/√n), distributed as N(0, 1) under H0

Below is a picture of the distribution of this statistic under H0:

[Figure: p.d.f. of the N(0,1) distribution, with vertical lines at z_.05 = −1.645, the 5th percentile, and at z = −3.38, the value of our test statistic.]

The solid vertical line is z_.05 = −1.645, the 5th percentile of the N(0, 1)
distribution. That is, the area under the curve to the left of that line is
.05.

Therefore, for an α = .05-level test, if our test statistic had turned
out to be < −1.645, then we would reject H0; if it had been > −1.645,
then we would have failed to reject H0.

Thus, (−∞, −1.645) is the rejection region of our test, (−1.645, +∞)
is the acceptance region, and −1.645 is the critical value because it is
the boundary of the rejection region.
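The critical value itself can be recovered from the standard normal quantile function; a small sketch:

```python
from statistics import NormalDist

z_crit = NormalDist().inv_cdf(0.05)  # z_.05, the 5th percentile of N(0,1)
z = -3.38                            # observed test statistic from the example
print(round(z_crit, 3), z < z_crit)  # does z fall in the rejection region?
```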
Thus, instead of computing the p-value of our observed test statistic z =
−3.38 and comparing it to α = .05, we could instead have compared
z = −3.38 to the critical value z_.05 = −1.645. Since z = −3.38 < z_.05 =
−1.645, we reject H0.

What if our alternative hypothesis had been HA: μ > μ0 rather
than HA: μ < μ0?
In that case, we would have been looking for large values of our test statistic
z = (x̄ − μ0)/(σ/√n).

In particular, we would have rejected H0 in favor of HA: μ > μ0 if

    z = (x̄ − μ0)/(σ/√n) > z_.95 = 1.645

Notice that for either direction of the one-sided alternative we rejected H0 if |z| > z_.95 = 1.645.

General method for an α-level test of H0: μ = μ0 versus a one-sided
alternative, based on a sample of size n from the N(μ, σ²) distribution
when σ² is known:

Critical value approach: reject H0 if x̄ − μ0 is consistent with the alternative
hypothesis and

    |z| = |x̄ − μ0| / (σ/√n) > z_{1−α}.

Otherwise, we fail to reject.
p-value approach: reject H0 if p < α. Let Z denote a N(0, 1) random
variable, and z the value of our test statistic. The p-value is computed as

    p = P(Z < z)  if the alternative is HA: μ < μ0,
    p = P(Z > z)  if the alternative is HA: μ > μ0.
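Both one-sided cases fit in a small helper; a sketch (the function name and interface here are my own, not from the notes):

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(xbar, mu0, sigma, n, alternative):
    """One-sample z test of H0: mu = mu0.
    alternative: 'less' for HA: mu < mu0, 'greater' for HA: mu > mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    if alternative == "less":
        p = NormalDist().cdf(z)        # P(Z < z)
    else:
        p = 1 - NormalDist().cdf(z)    # P(Z > z)
    return z, p

# SIDS example, HA: mu < 3300
z, p = one_sided_z_test(2994, 3300, 800, 78, "less")
print(round(z, 2), round(p, 5))
```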
The Case When σ is Unknown:
If σ is unknown, the logic of testing H0: μ = μ0 doesn't change at all.

However, the test statistic z = (x̄ − μ0)/(σ/√n) is no longer available to us, because
σ is unknown. Instead, we do the obvious thing and replace σ by its sample
estimate, s.

That substitution changes our test statistic from z to t, where

    t = (x̄ − μ0)/(s/√n)

and s is the sample standard deviation.

Of course, this also changes the distribution of our test statistic. While
z ~ N(0, 1) under H0, t ~ t(n − 1) under H0.

This affects how we compute the p-value and critical value for our
test, but not the basic logic of the testing procedure or the steps
taken to implement the test.

Example – Myocardial Infarction (Heart Attack)

A topic of recent clinical interest is the possibility of using drugs to
reduce the size of the infarct (area of tissue death due to loss of blood
flow) in patients who have had a myocardial infarction within the last 24 hours.

Suppose we know that in untreated patients, the mean infarct size is
25 (ck-g-EQ/m²). Furthermore, in 8 patients treated with a drug,
the sample mean infarct size was 16, with a sample standard deviation
of s = 10.

Do the treated patients have smaller than average infarct size?

Let μ = population mean infarct size for patients treated with the drug.
Then the hypotheses that we'd like to test are

    H0: μ = μ0 versus HA: μ < μ0, where μ0 = 25.

Suppose we want an α = .05-level test.

The logic here remains the same as before. Since we're interested in a population mean μ, we examine the sample mean x̄.
Specifically, we calculate the p-value: the probability of observing a sample
mean at least as extreme (as small, in this case) as the one we got (16),
under the null hypothesis that μ = 25:

    p = P(x̄ ≤ 16)
      = P( (x̄ − μ)/(s/√n) ≤ (16 − μ)/(s/√n) )
      = P( t(n − 1) ≤ (16 − μ0)/(s/√n) )    assuming H0: μ = μ0 is true
      = P( t(n − 1) ≤ (16 − 25)/(10/√8) )
      = P( t(n − 1) ≤ −2.55 ) = .0191

Here, (16 − μ0)/(s/√n) is t, our test statistic, and P(t(n − 1) ≤ −2.55) = .0191 was computed in Minitab.
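Without a table or Minitab, the t-distribution tail area can be approximated by numerically integrating its density. A self-contained sketch (the quadrature here is illustrative; in practice a statistics library would be used):

```python
from math import sqrt, gamma, pi

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(tval, df, lo=-60.0, steps=20_000):
    # P(t(df) <= tval) via Simpson's rule; adequate for this illustration
    h = (tval - lo) / steps
    total = t_pdf(lo, df) + t_pdf(tval, df)
    for i in range(1, steps):
        total += t_pdf(lo + i * h, df) * (4 if i % 2 else 2)
    return total * h / 3

# Infarct example: xbar = 16, mu0 = 25, s = 10, n = 8
tstat = (16 - 25) / (10 / sqrt(8))
p = t_cdf(tstat, df=7)
print(round(tstat, 2), round(p, 4))
```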
Conclusion: since p = .0191 < α = .05, we reject H0: μ = 25 and
conclude that the mean infarct size among treated patients is smaller
than the average infarct size of untreated patients.

The basic steps of hypothesis testing haven't changed:
1. State the research question in terms of the null and alternative hypotheses.

– H0: μ = μ0 versus HA: μ < μ0, where μ0 = 25.

2. Specify the significance level.

– We used α = .05.

3. Choose an appropriate test statistic.

– We based our test on the sample mean x̄ and formed a test
statistic equal to

    t = (x̄ − μ0)/(s/√n).

4. Collect the data and compute the necessary sample statistics and
test statistic.

– We collected the data and computed x̄ = 16 and then the test
statistic t, which turned out to be t = −2.55.

5. Calculate the p-value, compare it to the significance level α, and
state the conclusion.

– We found that p = .0191 < α = .05, so we rejected H0: μ = 25
in favor of μ < 25.
Below is a picture of the distribution of our test statistic t = (x̄ − μ0)/(s/√n)
under H0: μ = μ0:

[Figure: p.d.f. of the t(7) distribution, with vertical lines at t_.05(7), the 5th percentile, and at t = −2.55, the value of our test statistic.]

In the critical value approach, instead of comparing p to .05, we would
compare our test statistic t to the critical value t_.05(7), the 5th percentile
of the t distribution on n − 1 = 7 degrees of freedom.

Equivalently, we can compare |t| = |−2.55| = 2.55 to t_.95(7), which
is just −1 times t_.05(7).

From Table A.4 in the back of our book, t_.95(7) = 1.895, so since
|t| = 2.55 > 1.895, we reject H0 at level α = .05.

General method for an α-level test of H0: μ = μ0 versus a one-sided
alternative, based on a sample of size n from the N(μ, σ²) distribution
when σ² is unknown:

Critical value approach: reject H0 if x̄ − μ0 is consistent with the alternative
hypothesis and

    |t| = |x̄ − μ0| / (s/√n) > t_{1−α}(n − 1).

Otherwise, we fail to reject.

p-value approach: reject H0 if p < α. Let t(n − 1) denote a random variable
with this distribution, and t the value of our test statistic. The p-value is
computed as

    p = P(t(n − 1) < t)  if the alternative is HA: μ < μ0,
    p = P(t(n − 1) > t)  if the alternative is HA: μ > μ0.
Two-sided Alternatives:

Often, we are not willing to dismiss either μ > μ0 or μ < μ0, the two
possible alternatives to μ = μ0. In such cases, the appropriate set of
hypotheses to test is

    H0: μ = μ0 versus HA: μ ≠ μ0

How does this affect our testing procedure?

Again, the answer is that it doesn't really change the logic or the steps
in the procedure; it just changes how we compute the p-value or critical
value.

Example – Serum Cholesterol of Asians vs. Americans

Suppose we want to compare the mean serum cholesterol level among
recent Asian immigrants to the US with the population mean in the
US.

Suppose we assume that cholesterol levels in women aged 21–40 years
in the US are normally distributed with population mean 190 mg/dL
and population sd 40 mg/dL.

Suppose that we take a random sample of n = 100 recent Asian
immigrant women in this age range, and measure cholesterol level
on these subjects. The average cholesterol level in this sample was
x̄ = 181.52 mg/dL, and we are willing to assume the population SD
among these Asian immigrants is σ = 40, the same as it is among
Americans.

Is the mean cholesterol level among recent Asian immigrant women
the same as that of the corresponding general US population?

Steps for conducting a hypothesis test to address this question:

1. State the research question in terms of the null and alternative hypotheses.

– Let μ = the mean cholesterol level among the Asian population. Then our hypotheses are

    H0: μ = μ0 versus HA: μ ≠ μ0, where μ0 = 190.

– This is a two-sided alternative situation because if the Asians
differ from the general US population, we can't be sure whether
their cholesterol levels will be lower or higher.

2. Specify the significance level.

– We'll stick with α = .05 for now.

3. Choose an appropriate test statistic.
– Since we're interested in μ and whether it differs from μ0, it
still makes sense to examine x̄ and how far it differs from μ0.

– In addition, we know σ here.

– Therefore, it still makes sense to base inference on the test
statistic

    z = (x̄ − μ0)/(σ/√n)

4. Collect the data and compute the necessary sample statistics and
test statistic.

– We collected the data and computed x̄ = 181.52. The test
statistic is computed as

    z = (x̄ − μ0)/(σ/√n) = (181.52 − 190)/(40/√100) = −2.12.

5. Calculate the p-value, compare it to the significance level α, and
state the conclusion.

– Here's where things differ from the one-sided alternative case.

– The p-value is the probability of getting a result at least as
extreme as the one that we obtained. That is, the probability
of a result which provides evidence at least as strong against
the null hypothesis (in favor of the alternative). Picture:

[Figure: p.d.f. of the N(0,1) distribution, with vertical lines at z = −2.12, the value of our test statistic, and at z = 2.12, a test statistic equally in favor of HA.]

– In the picture above, our test statistic is z = −2.12. Notice
that any value of the test statistic ≤ −2.12 and any value of the
test statistic ≥ 2.12 would provide at least as much evidence
in favor of HA: μ ≠ μ0.

– Therefore, the p-value here is computed as

    p = P(Z ≤ −2.12) + P(Z ≥ 2.12) = 2P(Z ≥ 2.12) = 2(.017) = .034
would have obtained for a onesided laternative HA : < 0 .
Since p = :034 < = :05, we reject H0 and conclude that recent
Asian immigrant women between the ages of 21 and 40 years have
di erent (in this case lower) mean cholesterol level that the corresponding US population (z = ;2:12, p = :034). 155 0.4 To understand how the critical value approach di ers in the twosided
alternative case, a picture is again helpful: 0.2
0.0 0.1 p.d.f. of z 0.3 z=2.12, value of our test stat
z_.025=1.96, lower boundary of acceptance region
z_.975=1.96, upper boundary of acceptance region 4 2 0 2 4 value of z, the test statistic The solid line at z:025 = ;1:96 is the value such that 2.5% of the
area under the curve falls to the left of that line.
Since the pvalue in a twosided alternative situation is twice the
probability in onetail, a value of the test statistic equal to z:025 =
;1:96 would have had a pvalue of p = 2(:025) = :05.
Similarly, if the value of the test statistic had been equal to z:975 the
pvalue would also have been p = 2(:025) = :05.
Therefore, the rejection region of our test would include all values of
the test statistic z:025 = ;1:96 and all values z:975 = 1:96.
Thus, there are two boundaries of the rejection region and hence two
critical values: z:025 = ;1:96 and z:975 = 1:96.
So, based on the critical value approach, we would reject H0 if our test
statistic z < z:025 or if z > z:975 .
Equivalently, we reject H0 at level = :05 if
jz j > z:975 = z1;:025 = z1;:05=2: 156 General method for an level test of H0 : = 0 versus a twosided
alternative HA : 6= 0 based on a sample of size n from the N ( 2 )
distribution when 2 is known:
Critical value approach: reject H0 if
jz j = x; 0 >z
1; =2 :
=pn Otherwise, we fail to reject.
pvalue approach: reject H0 if p < . Let Z denote a N (0 1) random
variable, and z the value of our test statistic. The pvalue is computed as p = 2P (Z > jzj)
General method for an level test of H0 : = 0 versus a twosided
alternative HA : 6= 0 based on a sample of size n from the N ( 2 )
distribution when 2 is unknown:
Critical value approach: reject H0 if
jtj = x; 0 >t
1; =2 (n ; 1):
s=pn Otherwise, we fail to reject.
pvalue approach: reject H0 if p < . Let t(n ; 1) denote a random variable
distributed at t(n ; 1), and t the value of our test statistic. The pvalue is
computed as
p = 2P (t(n ; 1) > jtj) 157 Example  SerumCreatinine The mean serumcreatinine level measured in 12 patients 24 hours
after they received a newly proposed antibiotic was 1.2 mg/dL. The
sample sd was 0.6 mg/dL.
Suppose that it is known that the general population has a mean
serumcreatinine of 1.0 mg/dL.
Does the population mean serumcreatinine level among patients
treated with the antibiotic di er from that of the general population?
We assume that serumcreatinine in the population of interest is
normally distributed with mean and unknown variance.
We also assume that the 12 patients are randomly sampled from the
population of interest (all patients given this antibiotic).
1.
H0 : = 0 versus HA : 6= 0 where 0 = 1:0
2. For variety's sake, let's test at = :01 for a change.
0
3. We test based on t = x; n .
s=p { We reject H0 at level if jtj > t1; =2 (n ; 1).
{ Or, equivalently, we reject H0 at level if p < .
4. x = 1:2, s = 0:6, so our test statistic is
p
t = x ; n0 = 1:2 ; 1:0 = 1:15
s=p
0:6= 12 5. The pvalue is p = 2P (t(n ; 1) > jtj) = 2P (t(11) > 1:15) = 2(:1373) = :2746
Since p > = :01 we fail to reject H0 .
158 5 Equivalently, we could compare jtj to the critical value t1; =2(n ; 1) = t1;:01=2(11) = t:995(11) = 3:106
Since jtj = 1:15 < t:995 (11) = 3:106, we fail to reject H0 .
Conclusion: There is insu cient evidence to conclude that the mean
serumcreatinine level among patients treated with the antibiotic
di ers from the mean serumcreatinine in the general population.
Power and Sample Size
Recall from our discussion of error types when conducting a statistical
hypothesis test that
= P (we make a Type I error) = P (reject H0 when H0 is true)
= P (we make a Type II error) = P (not reject H0 when H0 is false)
We construct our test in such a way to ensure that is equal to
some prespeci ed small value (e.g., = :05).
We constructed our test to control to be small. We'd like to be
small too, but we noted that depends upon \how false" the null
hypothesis is. 159 Example  Birthweights of SIDS Cases Recall that we had a sample of n birthweights of SIDS babies with
a sample mean of x = 2994. We assumed that = 800 and we used
the 1sample z test to test H0 : = versus HA : < 0 where 0 0 = 3300. Picture:
[Figure: two plots of the true p.d.f. of x̄ for sample size n = 15; the left panel has true mean μ = 3250 and the right panel has μ = 2850; the null value μ0 = 3300 is marked in both.]

Here, we've assumed that the true value of μ, the mean birthweight of SIDS cases, is μ = 3250 on the left and μ = 2850 on the right.

We've also assumed a sample size of n = 15, so that the true distribution of x̄ is

    x̄ ~ N(μ, σ²/n) = N(μ, 800²/15) = N(μ, 42666.67)

Clearly, β is smaller when μ is far from μ0 (in the plot on the right).

β is the probability of failing to reject H0 when it is false. That is, failing to detect the truth of HA.

(In the current context, β is the probability of failing to detect a true difference between μ and μ0.)
Often it is more convenient to think in terms of the probability of detecting the truth of HA (detecting a difference between μ and μ0).

This probability is called the power of the test, and it is simply

    power = P(reject H0 | H0 is false)
          = 1 − P(do not reject H0 | H0 is false) = 1 − β

Thus, the further μ is from μ0, the smaller β is and the larger the power is.

– It is easier to reject H0 (power is high) when H0 is "very false" (plot on the right) than when H0 is only slightly false (plot on the left).

As we noted, though, we can't control how false H0 is, because we can't control the true population mean μ.

However, power also depends upon the spread in the distribution of x̄. Suppose that instead of the picture on the previous page, we had less spread in the distribution of x̄:
[Figure: the same two plots of the true p.d.f. of x̄, now with n = 78; the left panel has true mean μ = 3250 and the right panel has μ = 2850; the null value μ0 = 3300 is marked in both.]

Clearly, now it is easier to reject H0 in both cases. This is because the spread in the distribution of x̄ has decreased:

    x̄ ~ N(μ, σ²/n) = N(μ, 800²/78) = N(μ, 8205.13)

That is, the less spread in the distribution of x̄, the greater the power.
The spread in the distribution of x̄ is quantified by var(x̄) = σ²/n. So, this spread depends on

– σ². (Power increases as σ² decreases.)
– n, the sample size. (Power increases as n increases.)

Note that we can't control σ², but we can control n, the sample size, when we design the study.
So, power and sample size are intimately related. A given sample size
implies a certain power, and a certain power implies a certain sample size.
Typically, at the design stage of a study, the specific hypothesis test that will be used to analyze the study is identified, and then the minimum sample size is determined so as to achieve a prespecified desired level of power.
– Typically, it is desirable to have power of 80% or higher. Otherwise, there's a pretty good chance (20%) that we won't be able to detect the difference (effect) we are interested in even if it's real, which makes the study not worth doing.

Of course, power depends upon a variety of factors. It depends on

i. the sample size n.
   – The larger the sample size, the greater the power.
   – Can be controlled in the design of the study.

ii. the true difference we are trying to detect (how false H0 is, or the true difference μ − μ0).
   – Bigger differences are easier to detect (result in higher power).
   – Unknown, so must be assumed.

iii. the population SD σ.
   – The less variable the population is (smaller σ), the easier it is to detect effects (it is easier to detect a signal when there's not much noise, or static).

iv. α, the significance level.
   – Similar to sensitivity and specificity in diagnostic testing, there's a trade-off between α and β (and hence between α and power).
   – Decreasing α makes it harder to reject H0, which decreases power (increases β).

To understand the trade-off between α and β, recall that in the one-sample z test of H0: μ = μ0 versus HA: μ < μ0, we reject H0 if

    z = (x̄ − μ0)/(σ/√n) < zα = −z_{1−α},  or, equivalently, if  x̄ < μ0 − z_{1−α}·σ/√n

Consider the following picture:
[Figure: two plots of the true p.d.f. of x̄ with n = 78, σ = 800, true mean μ = 2850, and null value μ0 = 3300; a dashed line marks the rejection cutoff x̄ = μ0 − z_{1−α}·σ/√n, with α = .05 on the left and α = .001 on the right.]

In both pictures, the true mean is μ = 2850, the true population SD is σ = 800, and the sample size is n = 78.

In the picture on the left we are testing at α = .05, and on the right we are testing at α = .001.

– Note that decreasing α makes it harder to reject H0: μ = μ0 = 3300, so we need to observe a value of x̄ which is more inconsistent with the null hypothesis. That is, we need to observe a smaller x̄ to reject H0 if α is small.

– In the plot on the left, α = .05, so we reject if

      x̄ < μ0 − z_{1−.05}·σ/√n = 3300 − 1.645·(800/√78) = 3151.01 (the dashed line)

  and on the right, α = .001, so we reject if

      x̄ < μ0 − z_{1−.001}·σ/√n = 3300 − 3.090·(800/√78) = 3020.08 (the dashed line)

If the true population mean is μ = 2850, then the bell-shaped curve in the pictures is the true p.d.f. of x̄.

– The area under that curve to the right of the dashed line is β, the probability of getting a value of x̄ that would lead us to fail to reject H0 even though it is false.

So, as α decreases, β increases, and hence the power decreases too.

Example: Determining Power for a Proposed Study

A new drug is proposed for people with high intraocular pressure
(IOP), to prevent the development of glaucoma. A pilot study was
conducted with the drug among 10 patients and their mean IOP
decreased by 5 mm Hg with a SD of 10 mm Hg after 1 month of
using the drug. The investigators propose to study n = 50 patients
in the main study. What would the power of such a study be to
detect a reduction of 5 mm Hg after 1 month of use of the drug?
For now, we will assume that the true population SD is known to be σ = 10, as obtained in the pilot study.

We will also assume that the test to be used will be an α = .05-level z test of H0: μ = μ0 with a one-sided alternative HA: μ < μ0.

– Here μ0 is the population mean IOP among untreated subjects. Of course, this null value is known.

The power is given by

    power = P(reject H0 given that H0 is false and μ − μ0 = −5)
          = P( (x̄ − μ0)/(σ/√n) < −z_{1−α} | μ0 = μ + 5 )
          = P( (x̄ − μ − 5)/(σ/√n) < −z_{1−α} )
          = P( (x̄ − μ)/(σ/√n) − 5√n/σ < −z_{1−α} )
          = P( Z < −z_{1−α} + 5√n/σ )
          = P( Z < −1.645 + 5·√50/10 )
          = P(Z < 1.89) = 1 − P(Z ≥ 1.89) = 1 − .029 = .971
What if we had used a two-sided alternative?

In the IOP example, suppose instead that we wished to test

    H0: μ = μ0 versus HA: μ ≠ μ0

In this case, we would reject H0 if

    |x̄ − μ0| / (σ/√n) > z_{1−α/2}

or, equivalently, if

    (x̄ − μ0)/(σ/√n) < −z_{1−α/2}  or if  (x̄ − μ0)/(σ/√n) > z_{1−α/2}

Thus, the power is given by

    power = P(reject H0 given that H0 is false and μ − μ0 = −5)
          = P( (x̄ − μ0)/(σ/√n) < −z_{1−α/2} | μ0 = μ + 5 ) + P( (x̄ − μ0)/(σ/√n) > z_{1−α/2} | μ0 = μ + 5 )
          = P( (x̄ − μ − 5)/(σ/√n) < −z_{1−α/2} ) + P( (x̄ − μ − 5)/(σ/√n) > z_{1−α/2} )
          = P( (x̄ − μ)/(σ/√n) − 5√n/σ < −z_{1−α/2} ) + P( (x̄ − μ)/(σ/√n) − 5√n/σ > z_{1−α/2} )
          = P( Z < −z_{1−α/2} + 5√n/σ ) + P( Z > z_{1−α/2} + 5√n/σ )
          = P( Z < −z_{1−α/2} + 5√n/σ ) + P( Z < −z_{1−α/2} − 5√n/σ )
          = P( Z < −1.96 + 5√50/10 ) + P( Z < −1.96 − 5√50/10 )
          = P(Z < 1.58) + P(Z < −5.50)
          = 1 − P(Z ≥ 1.58) + P(Z < −5.50) = 1 − .057 + 0.000 = .943

Note that the test with a one-sided alternative is more powerful than the test with a two-sided alternative.
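Both power calculations can be packaged in a short function. This is a sketch under the assumptions above (known σ, z test); the function name is ours:

```python
import math
from scipy.stats import norm

def power_one_sample_z(delta, sigma, n, alpha=0.05, two_sided=False):
    """Power of the one-sample z test for effect delta = mu - mu0."""
    shift = abs(delta) * math.sqrt(n) / sigma
    if two_sided:
        z = norm.ppf(1 - alpha / 2)
        # Sum of the two rejection probabilities under the true mean
        return norm.cdf(-z + shift) + norm.cdf(-z - shift)
    return norm.cdf(-norm.ppf(1 - alpha) + shift)

# IOP example: delta = -5 mm Hg, sigma = 10, n = 50
print(round(power_one_sample_z(-5, 10, 50), 3))                  # ≈ 0.971
print(round(power_one_sample_z(-5, 10, 50, two_sided=True), 3))  # ≈ 0.94
```

The exact two-sided value is about .942; the .943 in the notes comes from rounding z to 1.58 before using the normal table.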
General Result for Power of One-Sample z Test:

The power of an α-level one-sample z test of H0: μ = μ0 (normal population, known population variance σ²) is given by

    power = P( Z < −z_{1−α} + |Δ|√n/σ )                                   for a one-sided alternative

    power = P( Z < −z_{1−α/2} − Δ√n/σ ) + P( Z < −z_{1−α/2} + Δ√n/σ )     for a two-sided alternative

where Δ = μ − μ0, the difference between the true population mean and the null value μ0.

Δ here is the effect we want to detect. In the example it was Δ = −5, a reduction of 5 mm Hg in IOP.

Sample Size:
Typically, at the design stage we fix power at a desired level and compute the sample size necessary to achieve that power, rather than the other way around.

One way to determine sample size for a given power is to use the methods we've just outlined to figure out the power for each of a range of values of n, then select the smallest n that gives a power greater than or equal to the power we want.

E.g., suppose we want to determine the minimum sample size necessary to ensure at least 90% power for the IOP example using a one-sided alternative and a z test.

Then, repeating the power calculation above for several values of n, we get:

    n     Power
    10    .4746
    15    .6147
    20    .7228
    25    .8038
    30    .8630
    35    .9054
    40    .9354

Narrowing our search, we find:

    n     Power
    30    .8630
    31    .8727
    32    .8817
    33    .8902
    34    .8981
    35    .9054

So we need a sample size of n = 35 to achieve power of at least .90 (90%) given our set of assumptions.
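The table search above can be sketched as a simple loop (a sketch under the same assumptions; the function name is ours):

```python
import math
from scipy.stats import norm

def power(n, delta=5, sigma=10, alpha=0.05):
    # One-sided one-sample z test power for effect |delta|
    return norm.cdf(-norm.ppf(1 - alpha) + delta * math.sqrt(n) / sigma)

# Smallest n achieving at least 90% power
n = 1
while power(n) < 0.90:
    n += 1
print(n, round(power(n), 4))  # 35, ≈ 0.9054
```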
Alternatively, we can reason as follows to solve the problem more directly (rather than by trial and error):

For a z test with a one-sided alternative, we determined that

    power = P( Z < −z_{1−α} + |Δ|√n/σ )

where the quantity inside is (*) = −z_{1−α} + |Δ|√n/σ. If we want power equal to p, say, then this implies that (*) should be the 100pth percentile of the Z distribution. That is,

    z_p = −z_{1−α} + |Δ|√n/σ

Solving for n, we have

    z_p + z_{1−α} = |Δ|√n/σ  ⟹  √n = σ(z_p + z_{1−α})/|Δ|  ⟹  n = σ²(z_p + z_{1−α})²/Δ²

E.g., in the IOP example, if we want power of p = .90 and we set α = .05, σ = 10:

    n = 10²(z_.90 + z_{1−.05})²/5² = 100(1.2816 + 1.645)²/25 = 34.26,

so we take n = 35 (rounding up).
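The closed-form calculation is a one-liner in code (a sketch; the function name is ours):

```python
import math
from scipy.stats import norm

def n_one_sided(delta, sigma, power=0.90, alpha=0.05):
    """Closed-form sample size for the one-sided one-sample z test."""
    z_p = norm.ppf(power)      # 100*power-th percentile of Z
    z_a = norm.ppf(1 - alpha)  # z_{1-alpha}
    return sigma**2 * (z_p + z_a)**2 / delta**2

n_exact = n_one_sided(delta=5, sigma=10, power=0.90, alpha=0.05)
print(round(n_exact, 2), math.ceil(n_exact))  # 34.26, 35
```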
General Result for Sample Size for a One-Sample z Test:

The sample size necessary to achieve power equal to p for an α-level one-sample z test of H0: μ = μ0 (normal population, known population variance σ²) is given by

    n = σ²(z_p + z_{1−α})² / Δ²       for a one-sided alternative
    n = σ²(z_p + z_{1−α/2})² / Δ²     for a two-sided alternative

where Δ = μ − μ0, the difference between the true population mean and the null value μ0.

Comparison of Two Means*

In the last two chapters, we studied how to do inference on a single population mean based upon a single sample of data from that population.
We now take up the problem of inference on two means μ1 and μ2 based upon two samples of data.

When considering inference based upon two samples, it is important to distinguish between two scenarios for which different methodologies are appropriate: Paired Samples vs. Independent Samples.
In either case, we have data that we will represent as follows:

    Sample 1    Sample 2
    x11         x12
    x21         x22
    ...         ...
    x(n1)1      x(n2)2

1. Paired Samples.
For paired data, the sample size is the same in each sample. That is, n1 = n2 = n.

In addition, the first observation in sample 1 corresponds to the first observation in sample 2, the second observation in sample 1 corresponds to the second observation in sample 2, etc.

– That is, the ith observations in samples 1 and 2 are paired, in some sense. By "paired" we mean that they are connected in such a way that it is not reasonable to consider them to be independent random variables.

* Read Ch. 11 of our text.

Pairing can occur in many different ways. E.g.,
– Variables xi1 and xi2 might be pretest and posttest measurements on the same patients (the study involves n patients, indexed by i = 1, ..., n).

– Variables xi1 and xi2 might be measurements or observations taken on the same unit (e.g., an x-ray) by two different observers (e.g., radiologists), or taken with two different measuring devices.

– Variables xi1 and xi2 might be measurements of the same response variable on the same subjects at two different time points (blood pressure at time 1, time 2), or at two different locations (intraocular pressure (IOP) in the right eye and left eye).

– Variables xi1 and xi2 might be measurements of the same variable from two different family members (e.g., husband and wife, in a study involving n married couples).

In all of these situations, we would expect that xi1 and xi2, the measurements taken on the ith subject (or pair), might be similar to one another, or statistically dependent, because of common characteristics of the subject or pair.

– It would be reasonable to assume that observations from subject to subject (pair to pair) are independent, but that two observations from the same subject (or pair) would be dependent.
2. Independent Samples.

Alternatively, the two samples might not be paired, and therefore the data would be independent both within samples and between samples.

In this situation, xi1 and xi2 are not paired in any sense (they don't come from a common source), and we can have samples of different sizes. That is, n1 is not necessarily equal to n2.

Independent samples are common as well. Examples include:

– n1 subjects randomly assigned to group 1 (e.g., they receive an active treatment) and n2 other subjects randomly assigned to group 2 (e.g., a placebo, or control, group), and then the same response measured on each subject.

– n subjects in the study, but n1 subjects (selected at random) are measured at time 1 and the remaining n2 = n − n1 subjects measured at time 2.

– Same as before, but n1 subjects could have IOP measured in their left eye, and n2 could have IOP measured in their right eye.

– n1 husbands measured and n2 wives measured, from n = n1 + n2 married couples (no one in the sample married to each other).

Paired Samples:

The paired sample problem is the easier of the two because it can be handled by the methods we have already studied.
For paired data, what is typically of interest is the difference

    Δ = μ1 − μ2

where μ1 is the population mean corresponding to sample 1, and μ2 is the population mean corresponding to sample 2.

Notice that Δ, the difference in the population means, can also be thought of as the population mean of the differences.

In a paired situation, instead of thinking about having two samples, it's really more appropriate to say that we have a single sample of differences whose population mean is Δ = μ1 − μ2.

Example: Systolic Blood Pressure and Oral Contraceptives
A study of the effects of taking oral contraceptives (OCs) on systolic blood pressure (SBP) was conducted in which a random sample of n = 10 women had their SBP measured before starting to use OCs (i.e., at baseline) and after having taken OCs for 6 months.

The data are as follows:
    Subject     Sample 1             Sample 2              Difference
    Number i    xi1 = Baseline SBP   xi2 = SBP using OCs   di = xi1 − xi2
    1           115                  128                   −13
    2           112                  115                   −3
    3           107                  106                   1
    4           119                  128                   −9
    5           115                  122                   −7
    6           138                  145                   −7
    7           126                  132                   −6
    8           105                  109                   −4
    9           104                  102                   2
    10          115                  117                   −2

The data are paired here because samples 1 and 2 correspond to two
measurements on the same women.
– If a woman has high SBP at baseline, she's more likely to have relatively high SBP at the second measurement occasion, too. Therefore, these measurements are dependent.

Let μ1 = population mean SBP when not taking OCs,
    μ2 = population mean SBP when taking OCs,
    Δ = μ1 − μ2.

There are two types of inferences that we might be interested in concerning Δ:

Hypothesis test: we may want to test H0: Δ = 0 versus HA: Δ ≠ 0, or, perhaps, versus HA: Δ < 0.

Confidence interval: we may instead prefer to estimate Δ and form a 100(1 − α)% (e.g., 95%) CI for Δ.

Both of these problems are ones which we already know how to handle, if we just notice that we can think of this as a one-sample problem.

Here we have a single sample of differences d1, ..., dn, where

    di = xi1 − xi2,  i = 1, ..., n

We assume that the di's are independent, each with distribution

    di ~ N( Δ = μ1 − μ2, σd² )
We estimate the population mean Δ and population SD σd with the corresponding sample quantities:

    d̄ = (1/n) Σ di = (1/10)( −13 + (−3) + ... + (−2) ) = −4.80

    sd = √[ (1/(n−1)) Σ (di − d̄)² ] = √[ (1/(n−1)) { Σ di² − n·d̄² } ]
       = √[ (1/9) { (−13)² + ... + (−2)² − 10(−4.80)² } ] = 4.566

Therefore, inference for Δ can be done with the one-sample methods we've
already learned.
E.g., assuming that σd is unknown, and for a two-tailed alternative HA: Δ ≠ 0, we have the following t-test of H0: Δ = 0:

Test statistic:

    t = (d̄ − 0)/(sd/√n) = (−4.80 − 0)/(4.566/√10) = −3.32

Two-sided p-value:

    p = 2P( t(n − 1) > |t| ) = 2P( t(9) > 3.32 ) = 2{1 − P(t(9) < 3.32)} = 2{1 − .9956} = .0089

So, at level α = .05, we reject H0 and conclude that there is a significant difference between the mean SBP with and without OC use. The mean SBP when using OCs is higher.
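The paired t test is just a one-sample t test on the differences, so it can be checked directly from the raw data (a sketch; list names are ours):

```python
from scipy import stats

# SBP for n = 10 women: baseline vs. after 6 months on OCs
baseline = [115, 112, 107, 119, 115, 138, 126, 105, 104, 115]
on_oc    = [128, 115, 106, 128, 122, 145, 132, 109, 102, 117]

# Paired t test = one-sample t test on the differences d_i = x_i1 - x_i2
t_stat, p_value = stats.ttest_rel(baseline, on_oc)
print(round(t_stat, 2), round(p_value, 4))  # t ≈ -3.32, p ≈ 0.0089
```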
A 95% two-sided CI for Δ would be given by

    d̄ ± t_{1−α/2}(n − 1)·sd/√n = −4.80 ± t_.975(9)·(4.566/√10) = −4.80 ± 2.2622(1.444) = (−8.066, −1.534)

We are 95% confident that the true mean difference between the SBP at baseline and the SBP when using OCs lies between −8.066 and −1.534. A negative difference here means that the SBP at baseline is lower.

If σd, the population SD of the difference between the measurements in the two samples, had been known, we would have used a z-test and a z-based confidence interval rather than the t-based inferences illustrated here.

Independent Samples:

In the independent samples case, we can't reduce the problem to one which
we already know how to solve. Instead, we're going to need some new methodology.

We consider testing first.

As in the one-sample problem, we will assume that we have samples from normally distributed populations. If not, then our results will not hold exactly, but will be approximately valid if the sample sizes are reasonably large, by the CLT.

In particular, we assume that for sample 1

    x11, x21, ..., x(n1)1 are independent, with xi1 ~ N(μ1, σ1²)

and for sample 2

    x12, x22, ..., x(n2)2 are independent, with xi2 ~ N(μ2, σ2²)

and we assume that samples 1 and 2 are independent of each other.

That is, we have two normal samples with population means μ1 and μ2 and population SDs σ1 and σ2.

The steps we take in conducting a hypothesis test in this setting are the same as always:

1. State the research question in terms of the null and alternative hypotheses.

   – The null hypothesis that we are interested in will be H0: μ1 = μ2, or equivalently, H0: μ1 − μ2 = 0, versus HA: μ1 − μ2 ≠ 0 (two-sided), or, perhaps, versus HA: μ1 − μ2 < 0 (or > 0) (one-sided).

2. Specify a significance level.

   – E.g., α = .05.
{ Since we are interested in whether 1 ; 2 = 0, it is natural
to examine how far x1 ; x2 is from 0. Here x1 is the sample
mean from sample 1, x2 is the sample mean from sample 2.
{ Similar to the onesample problem, we judge how far x1 ; x2
is from its null value, 0, relative to its standard error. That is,
our test statistic is going to be of the general form: x1 ; x2 ; 0 = px1 ; x2 ; 0
s:e:(x1 ; x2)
var(x1 ; x2 )
^
{ (Recall that the standard error of a statistic is its estimated
standard deviation i.e., the square root of its estimated variance.)
{ The exact form of this test statistic depends upon what we
assume about the population SDs, 1 and 2 .
{ Speci cally, the standard error in the denominator of our test
statistic depends upon whether 1 and 2 are assumed (i)
known or unknown, and assumed (ii) equal or unequal.
4. Collect the data and compute the test statistic.
5. Calculate the pvalue and make conclusion.
{ The computation of the pvalue depends upon which test statistic is appropriate given our assumptions regarding 1 and 2
(step 3). Di erent test statistics have di erent distributions
under H0 , which a ects the pvalue or critical value. 177 In general, under the assumptions of independent samples such that x11 x21 : : : xn1 1 are independent, with xi1 N (
    x12, x22, ..., x(n2)2 are independent, with xi2 ~ N(μ2, σ2²)

then

    x̄1 − x̄2 ~ N( μ1 − μ2, σ1²/n1 + σ2²/n2 )     (*)

i.e., var(x̄1 − x̄2) = σ1²/n1 + σ2²/n2.

If we standardize (convert to z-scores), then (*) becomes

    ( x̄1 − x̄2 − (μ1 − μ2) ) / √( σ1²/n1 + σ2²/n2 ) ~ N(0, 1)     (**)

Under H0: μ1 − μ2 = 0, (**) becomes

    ( x̄1 − x̄2 ) / √( σ1²/n1 + σ2²/n2 ) ~ N(0, 1)     (†)

where the left-hand side is our test statistic.

Cases:

Case 1: σ1², σ2² both known (may or may not be equal).

In this case, the standard error in the denominator of our test statistic above is

    s.e.(x̄1 − x̄2) = √( σ1²/n1 + σ2²/n2 )

which can be computed directly. Therefore, our test statistic and its distribution are given by (†).
each woman was measured twice, once when not using OCs and once
when using OCs, the following design was used:
A random sample of n1 = 8 35 to 39yearold nonpregnant, premenopausal OC users and a random sample of n2 = 21 35 to 39yearold nonpregnant, premenopausal nonOC users were obtained.
The OC users were found to have a mean SBP of x1 = 132:86 mm Hg,
and the nonOC user's were found to have a mean SBP of x2 = 127:44
mm Hg.
1 , the population SD of SBP among OC users and 2 , the population mean SD among nonOC users are assumed to be the same,
equal to the common value 1 = 2 = = 16:0 mm Hg.
Then our test statistic is
x
q 16:0 16:02
z = q 12; x2 2 = 132:862; 127:44 = 0:815
1+ 2
8 + 21
n1 n2
Since our test statistic z is distributed as N (0 1), the pvalue for a twosided test is
p = 2P (Z > :815) = 2(:207) = :414
and we would fail to reject H0 : 1 = 2 based on an = :05 level test.
There is insu cient evidence to conclude that the mean SBP is different for the OC users than for the nonOC users.
General Rule under Case 1:
;
Onesided alternative: reject H0 if jz j > z1; where z = qx12 x2 2 .
1+ 2
n2 n1 Equivalently, reject H0 if p < where p = P (Z > jzj).
Twosided alternative: reject H0 if jz j > z1; =2 . Equivalently, reject H0
if p < where p = 2P (Z > jz j). 179 Case 2:
If 2
1 = 2
1
2
2 2
2 2
unknown, but assumed equal ( 1 = 2
2 = 2 , say). = 2 , then the test statistic in (y) becomes
x
q 12; x2 2 = r x1 ; x2
2 1+1
n1 + n2
n
n
1 2 which would still be N (0 1) if we know 2 .
However, we don't know 2 . Obvious thing to do: replace
estimate.
Two possible estimators come to mind:
s2 = sample variance from 1st sample
1
s2 = sample variance from 2nd sample
2 2 by a sample 2
2
Under the assumption that 1 = 2 = 2 , both are estimators of the same
quantity, 2 , each based on only a portion of the total number of relevant
observations available.
Better idea: combine these two estimators by taking their (weighted) average:
(n1 ; 1)s2 + (n2 ; 1)s2
1
2
2
2
^ = sP =
n1 + n2 ; 2 s s ) 11
11
s:e:(x1 ; x2 ) = ^ 2 n + n = s2 n + n
P1
1
2
2
)
test stat. = t = r x1 ; x2 s2 n11 + n12
P
t(n1 + n2 ; 2) by an estimate s2 (which is known as the
P
pooled estimate of ) changes the distribution of our test statistic
from N (0 1) to t(n1 + n2 ; 2).
Note that replacing 2
Example: SBP and OC Use, Two-Sample Experiment

In the same setup as before, now assume that σ1, the population SD of SBP among OC users, and σ2, the population SD of SBP among OC non-users, are assumed to be equal, but their common value σ = σ1 = σ2 is unknown.

Suppose also that the sample SD among OC users was s1 = 15.34 mm Hg, and the sample SD among OC non-users was s2 = 18.23 mm Hg.

The pooled estimate of σ², the common variance in the two populations, is

    sP² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2) = [ (8 − 1)15.34² + (21 − 1)18.23² ] / (8 + 21 − 2) = 307.18

Therefore, our test statistic is

    t = (x̄1 − x̄2)/√( sP²(1/n1 + 1/n2) ) = (132.86 − 127.44)/√( 307.18(1/8 + 1/21) ) = 0.74

which we compare to the t(n1 + n2 − 2) = t(8 + 21 − 2) = t(27) distribution, the distribution of this test statistic under the null hypothesis.

For a two-sided alternative hypothesis, the p-value would be

    p = 2P( t(n1 + n2 − 2) > |t| ) = 2P( t(27) > .74 ) = 2{1 − P(t(27) < .74)} = 2(1 − .7684) = .4632

and the critical value for a .05-level test is t_{1−α/2}(n1 + n2 − 2) = t_.975(27) = 2.052.

Since p = .4632 > α = .05 (or, equivalently, since |t| = .74 < t_.975(27) = 2.052), we fail to reject H0. There is insufficient evidence to conclude that the mean SBP for OC users is different from that of OC non-users.

General Rule under Case 2:
One-sided alternative: reject H0 if |t| > t_{1−α}(n1 + n2 − 2), where

    t = (x̄1 − x̄2)/√( sP²(1/n1 + 1/n2) ),   sP² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)

Equivalently, reject H0 if p < α, where p = P( t(n1 + n2 − 2) > |t| ).

Two-sided alternative: reject H0 if |t| > t_{1−α/2}(n1 + n2 − 2). Equivalently, reject H0 if p < α, where p = 2P( t(n1 + n2 − 2) > |t| ).

Case 3: σ1², σ2² both unknown, but assumed different.
In this case, the test statistic in (†),

    (x̄1 − x̄2)/√( σ1²/n1 + σ2²/n2 ),

is not available because we don't know σ1² and σ2².

Obvious solution: replace σ1² by s1², the sample variance from the first sample, and replace σ2² by s2², the sample variance from the second sample.

The resulting test statistic is

    t = (x̄1 − x̄2)/√( s1²/n1 + s2²/n2 )

Problem: even though this test statistic makes good sense, its distribution under H0 is difficult to derive mathematically.

However, it can be shown that this test statistic has a null distribution which is well approximated by a t distribution with degrees of freedom that can be approximated from the data. That is,

    t = (x̄1 − x̄2)/√( s1²/n1 + s2²/n2 )  ~̇  t(ν)

where

    ν = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

Note that this quantity should be rounded down to the nearest integer to give an approximate degrees of freedom for the distribution of t under H0.

The approximation to the distribution of t under H0 given above is based on what is known as Satterthwaite's approximation.

Example: SBP and OC Use, Two-Sample Experiment

In the same setup as before, now assume that σ1, the population SD of SBP among OC users, and σ2, the population SD of SBP among OC non-users, are unknown and we are not willing to assume that they are equal.

Suppose again that the sample SD among OC users was s1 = 15.34 mm Hg, and the sample SD among OC non-users was s2 = 18.23 mm Hg.
In this situation, our test statistic becomes

    t = (x̄1 − x̄2)/√( s1²/n1 + s2²/n2 ) = (132.86 − 127.44)/√( 15.34²/8 + 18.23²/21 ) = .8058

Using Satterthwaite's approximation, this test statistic is approximately distributed as t(ν) under H0, where

    ν = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
      = ( 15.34²/8 + 18.23²/21 )² / [ (15.34²/8)²/(8 − 1) + (18.23²/21)²/(21 − 1) ] = 15.04

which we round down to ν = 15.

Therefore, our p-value is

    p = 2P( t(ν) > |t| ) = 2P( t(15) > .8058 ) = 2{1 − P(t(15) < .8058)} = 2(1 − .7835) = .433

and our .05-level critical value is

    t_{1−α/2}(ν) = t_.975(15) = 2.131

Therefore, since p = .433 > α = .05 (or, equivalently, because |t| = .8058 < t_.975(15) = 2.131), we fail to reject H0.

There is insufficient evidence here to conclude that the mean SBP of OC users differs from that of OC non-users.
Onesided alternative: reject H0 if jtj > t1; ( ) where x
t = q 12; x2 2
s1 s2
n1 + n2 s2 + s2 2
1
2
n1 n2
= (s2 =n1 )2 (s2 =n2 )2
1
2
n1 ;1 + n2 ;1 Equivalently, reject H0 if p < where p = P (t( ) > jtj).
Twosided alternative: reject H0 if jtj > t1; =2 ( ). Equivalently, reject
H0 if p < where p = 2P (t( ) > jtj).
184 Con dence Intervals for 1 ; 2
As we've learned, the acceptance region of an level test forms a 110(1 ;
)% con dence interval.
Therefore, the tests we have just derived for the two independent samples
problem can all be inverted to form con dence intervals.
General Rule for Con dence Limits under Case 1:
Onesided limits: a 100(1 ; )% upper con dence bound on 1 ; 2 under
case 1 is given by
s
2
1 (x1 ; x2 ) + z1; A 100(1 ; 2
2 2
1 2
2 n1 + n2
)% lower con dence bound on 1 ; 2 is given by
(x1 ; x2 ) ; z1; s n1 + n2 Twosided limits: a 100(1 ; )% con dence interval on
1 is given by
s
2 2 (x1 ; x2 ) z1; =2 n1 + n2
1
2 185 1; 2 under case General Rule for Con dence Limits under Case 2:
Onesided limits: a 100(1 ; )% upper con dence bound on 1 ;
case 2 is given by
s
11
(x1 ; x2 ) + t1; (n1 + n2 ; 2) s2 n + n
P1
2
A 100(1 ; )% lower con dence bound on 1 ; 2 is given by 2 under s 11
(x1 ; x2 ) ; t1; (n1 + n2 ; 2) s2 n + n
P1
2
Twosided limits: a 100(1 ; )% con dence interval on 1 ;
2 is given by
s
11
(x1 ; x2 ) t1; =2 (n1 + n2 ; 2) s2 n + n
P1
2
General Rule for Con dence Limits under Case 3:
Onesided limits: a 100(1 ; )% upper con dence bound on
case 3 is given by s A 100(1 ; 2 under case 1; 2 under s2 + s2
(x1 ; x2 ) + t1; ( ) n1 n2
1
2
)% lower con dence bound on 1 ; 2 is given by
s2 2
ss
(x1 ; x2 ) ; t1; ( ) n1 + n2
1
2 Twosided limits: a 100(1 ; )% con dence interval on
3 is given by
s2 2
ss
(x1 ; x2 ) t1; =2( ) n1 + n2
1
2 1; 2 under case Notice that all of these con dence intervals are of the same general
form: x1 ; x2 plus or minus tcrit or zcrit standard errors of x1 ; x2 .
186 Example  Blood Glucose Level and Stenosis A study was performed concerning risk factors for carotid artery
stenosis (narrowing) among 464 men born in 1914 and residing in
the city of Malmo, Sweden. The following data were reported for
bloodglucose level (mmol/L):
Stenosis Status
No Stenosis
Stenosis n
356
108 Sample Mean
5.3
5.1 Sample SD
1.4
0.8 Using an appropriate procedure, test whether there is a signi cant difference between the mean bloodglucose levels of men with and without
stenosis. Use = :01. In addition, form a 99% con dence interval for the
di erence in the population mean bloodglucose levels of those with and
without stenosis.
Let 1 =population mean bloodglucose of men with stenosis, and 2 be
the corresponding mean for those without stenosis. We are interested in
testing
H0 : 1 ; 2 = 0 versus HA : 1 ; 2 6= 0
and forming a 99% CI for 1 ; 2 .
We do not know the SDs for the two populations here, so we know that
we are going to use a t test here rather than a z test. However, are we in
case 2 (equal population SDs) or in case 3 (unequal population SDs)?
To answer this question, we can choose between cases 2 and 3 based
upon looking at whether the sample SDs are close to each other
and by using our medical knowledge/judgement as to whether its
reasonable to assume equal variability in blood glucose level in these
two groups. 187 Alternatively, we can do a formal hypothesis test of H0 : 2
1 = 2
2 versus HA : 2
1 6= 2
2 (z) There exists a statistical test of this hypothesis for data from two independent normally distributed samples. It is called the F test for equal
variances, and it is performed as follows:
The test statistic for H0 is given by s2=s2 if 2 2
12
F = s2=s2 ifss1< ss2
2
2
21
1
2
Under H0 , this statistic follows the F distribution. The F distribution
has two parameters, called the numerator degrees of freedom which
is equal to one less than the sample size associated with the variance in
the numerator, and the denominator degrees of freedom which is one
less than the sample size associated with the variance in the denominator
of F .
We will denote this distribution as F (num df denom df) and the
100pth percentile by Fp (num df denom df).
We reject H0 at level , if F > Fcrit where Fcrit is given by
2
2
Fcrit = F1; (n1 ; 1 n2 ; 1) if s1 < s2
F1; (n2 ; 1 n1 ; 1) if s2 s2
1
2 Critical values of the F distribution are given in table A.5 in the
back of our book.
Equivalently, we reject H0 at level if p < where
2P (F (n1 ; 1 n2 ; 1) > F ) if s2 s2
p = 2P (F (n ; 1 n ; 1) > F ) if s1 < s2
2
2
2
1
1
2
Probabilitites associated with the F distribution can be computed
with computer programs such as Minitab.
188 Back to the example:
We will conduct the t test of H0 : 1 ; 2 = 0 versus a twosided alternative
under both cases 2 and 3, but then we will do the F test for equal variances
to see which case is more appropriate for these data.
Under case 2, our test statistic is
t = r x1 ; x2 s2 n11 + n12
P where So, 2
1) 2
1
s2 = (n1 ; n s+ + (n22; 1)s2
P
n2 ;
1
2
2
= (356 ; 1)1:4+ + (1082; 1)0:8 = 1:654
356 108 ; :2
= :1413 = 1:416
t = q 5:3;; 5:1
1
1
1:654 356 + 108
and our pvalue and critical value are
p = 2P (t(n1 + n2 ; 2) > jtj) = 2P (t(462) > 1:416) = 2f1 ; P (t(462) < 1:416)g
= 2(1 ; :9212) = :158
and t1; =2(n1 + n2 ; 2) = t:995 (462) = 2:587 So, we fail to reject H0 at level = :01 because p = :1413 > = :01
(equivalently, because jtj = 1:416 < tcrit = 2:587).
Under case 2, a 99% CI for μ1 − μ2 would be

    (x̄1 − x̄2) ± t_{1-α/2}(n1 + n2 − 2) sqrt( sP² (1/n1 + 1/n2) ) = .2 ± 2.587(.1413) = (−.165, .565)
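The case 2 calculation can be reproduced from the summary statistics (a sketch, not part of the original notes; the function name is ours and scipy is assumed):

```python
# Pooled-variance (case 2) two-sample t test from summary statistics.
from math import sqrt
from scipy import stats

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    # Pooled variance estimate sP^2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of xbar1 - xbar2
    t = (xbar1 - xbar2) / se
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)       # two-sided p-value
    return t, df, p

t, df, p = pooled_t(5.3, 1.4, 356, 5.1, 0.8, 108)
print(df)  # 462
```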
Under case 3, our test statistic is

    t = (x̄1 − x̄2) / sqrt( s1²/n1 + s2²/n2 ) = .2 / sqrt( 1.4²/356 + 0.8²/108 ) = .2/.1069 = 1.871

The approximate degrees of freedom for Satterthwaite's approximation are

    ν = ( s1²/n1 + s2²/n2 )² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
      = ( 1.4²/356 + 0.8²/108 )² / [ (1.4²/356)²/(356 − 1) + (0.8²/108)²/(108 − 1) ] = 315.97

which we round down to ν = 315. Therefore, our p-value and critical value are

    p = 2P(t(ν) > 1.871) = 2{1 − P(t(315) < 1.871)} = 2{1 − .9689} = .062

and t_{1-α/2}(ν) = t_.995(315) = 2.592.

So, again, we fail to reject H0 at α = .01 (although our p-value is
now considerably smaller than in case 2).

Under case 3, a 99% CI for μ1 − μ2 would be

    (x̄1 − x̄2) ± t_{1-α/2}(ν) sqrt( s1²/n1 + s2²/n2 ) = .2 ± 2.592(.1069) = (−.077, .477)
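The case 3 calculation, including Satterthwaite's approximate degrees of freedom rounded down as in the notes, can be sketched as follows (not part of the original notes; scipy assumed):

```python
# Unequal-variance (case 3) t test with Satterthwaite's approximate df.
from math import sqrt, floor
from scipy import stats

def welch_t(xbar1, s1, n1, xbar2, s2, n2):
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2) / sqrt(v1 + v2)
    # Satterthwaite's approximation, rounded down to an integer
    nu = floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))
    p = 2 * stats.t.sf(abs(t), nu)   # two-sided p-value
    return t, nu, p

t, nu, p = welch_t(5.3, 1.4, 356, 5.1, 0.8, 108)
print(nu)  # 315
```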
Now we choose between cases 2 and 3 by conducting an F test of
H0: σ1² = σ2² versus HA: σ1² ≠ σ2².

Note that s1² = 1.4² > 0.8² = s2², so we compute

    F = s1²/s2² = 1.4²/0.8² = 3.06
The p-value and critical value here are

    p = 2P(F(n1 − 1, n2 − 1) > F) = 2P(F(355, 107) > 3.06)
      = 2{1 − P(F(355, 107) < 3.06)} = 2{1 − 1.000} = 0.000

and Fcrit = F_{1-α}(n1 − 1, n2 − 1) = F_.99(355, 107) = 1.46.

So, because p = 0.000 < α = .01 (or, equivalently, because F = 3.06 > Fcrit),
we reject H0 and conclude that the population variances are different here,
so that the case 3 analysis was more appropriate.

Inference for Proportions*

So far we have confined our discussion of inference to means of continuous
random variables. However, dichotomous (also known as binary, or
0-1, or Bernoulli) variables are also very common in the health sciences.
Examples of dichotomous random variables:
- Disease status (0=disease free, 1=diseased)
- Mortality (0=alive, 1=dead)
- Pregnancy (0=not pregnant, 1=pregnant)
- Adherence to a protocol (0=no, 1=yes)
- Gender (0=male, 1=female)
Note that these are all essentially qualitative variables, but we assign
the numbers 0 and 1 to make them numeric to allow analysis.

Note also that the sample mean of a 0-1 variable is the proportion
of the sample members who fall in the "1" category.

A population mean of a 0-1 variable is the corresponding population
proportion in the "1" category, which also has the interpretation as
the probability of being in the "1" category.

As always, we can express proportions and probabilities as percentages by multiplying by 100%.

Given that a proportion is a mean, and given that the CLT says that means
of even non-normally distributed random variables are approximately normal
for large sample sizes, it should be no surprise that the normal-theory
inference that we have just been studying can be extended to proportions
and justified as approximately valid for large sample sizes.

* Read Ch. 14 of our text.
Normal Approximation to the Binomial

Recall that the binomial distribution gives the probability function for a
random variable X defined as the number of successes that occur out of n
trials, where the trials are independent, identically distributed with
constant success probability p.

We write this as X ~ Bin(n, p).

- Recall that E(X) = np and var(X) = np(1 − p).

Recall also from pp. 111-113 of these notes that the CLT implies that the
normal distribution can approximate the binomial distribution well when
the sample size is large.

Which normal distribution? The one with the same mean and variance as the binomial distribution that we are trying to approximate.

That is, if np ≥ 5 and n(1 − p) ≥ 5, then for X ~ Bin(n, p),

    X ≈ N(np, np(1 − p))
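A quick numerical check of this approximation (not part of the original notes; the choice n = 100, p = 0.3 and the continuity correction of adding .5 are our assumptions, with scipy assumed available):

```python
# Compare an exact binomial probability with its normal approximation
# when np >= 5 and n(1-p) >= 5.
from scipy import stats

n, p = 100, 0.3                    # np = 30 and n(1-p) = 70: both >= 5
mu = n * p
sigma = (n * p * (1 - p)) ** 0.5

exact = stats.binom.cdf(35, n, p)            # exact P(X <= 35)
approx = stats.norm.cdf(35.5, mu, sigma)     # normal approx., continuity-corrected
print(round(exact, 3), round(approx, 3))
```

The two printed probabilities agree to about two decimal places, which is the sense in which the approximation "works well" under the np ≥ 5 rule of thumb.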
What does this have to do with inference for a proportion?

Notice that if X = the number of successes out of n trials, then the proportion of successes out of n trials is just

    p̂ = X/n

Since X ~ Bin(n, p) ≈ N(np, np(1 − p)), then

    p̂ = (1/n)X ≈ (1/n)N(np, np(1 − p)) = N(p, p(1 − p)/n)

So, we have that

    p̂ ≈ N(p, p(1 − p)/n)    (*)

which says that a sample proportion p̂ is approximately normally
distributed with mean p, the corresponding population proportion,
and variance p(1 − p)/n.
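Result (*) is easy to verify by simulation (a sketch, not part of the original notes; the values n = 200, p = 0.3 are our assumptions, with numpy assumed available):

```python
# Simulate many sample proportions p-hat = X/n with X ~ Bin(n, p):
# their mean should be near p and their variance near p(1-p)/n.
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 0.3
phats = rng.binomial(n, p, size=200_000) / n   # 200,000 simulated p-hats

print(round(phats.mean(), 3))   # should be close to p = 0.3
print(round(phats.var(), 5))    # should be close to p(1-p)/n = 0.00105
```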
One-Sample Confidence Intervals for p

Based on the distributional result (*), we can standardize (convert to z-scores) to get the following result:

    z = (p̂ − p) / sqrt( p(1 − p)/n ) ≈ N(0, 1)    (**)

Therefore, for example, z = (p̂ − p)/sqrt(p(1 − p)/n) should fall between −1.96 and
1.96 approximately 95% of the time.

z = (p̂ − p)/sqrt(p(1 − p)/n) should fall between −1.645 and 1.645 approximately
90% of the time.

In general, z = (p̂ − p)/sqrt(p(1 − p)/n) should fall between −z_{1-α/2} and z_{1-α/2}
approximately 100(1 − α)% of the time.
That is, we can make the probability statement:

    P( −1.96 ≤ (p̂ − p)/sqrt(p(1 − p)/n) ≤ 1.96 ) ≈ .95

If we rearrange the left-hand side so that p falls in the middle of the
inequality, we get

    P( p̂ − 1.96 sqrt(p(1 − p)/n) ≤ p ≤ p̂ + 1.96 sqrt(p(1 − p)/n) ) ≈ .95

Therefore, ( p̂ − 1.96 sqrt(p(1 − p)/n), p̂ + 1.96 sqrt(p(1 − p)/n) ) is an approximate 95% CI for p.

Note that the endpoints of this interval depend upon p, the true
value of the population proportion, which is of course unknown.
Therefore, we replace p by p̂, leading to

    ( p̂ − 1.96 sqrt(p̂(1 − p̂)/n), p̂ + 1.96 sqrt(p̂(1 − p̂)/n) )

as an approximate 95% CI for p.
More generally, for np̂ ≥ 5 and n(1 − p̂) ≥ 5, an approximate 100(1 − α)%
CI for p is given by

    p̂ ± z_{1-α/2} sqrt( p̂(1 − p̂)/n )

Notice that this interval is of the usual form: estimator plus or minus
some number of standard errors.

- Here, the standard error of p̂ is sqrt(p̂(1 − p̂)/n) and the multiplier is the upper α/2 critical value of a z (standard normal) distribution.

Example - Prevalence of Breast Cancer

Suppose we are interested in estimating the prevalence (population
proportion with a condition or characteristic) of breast cancer among
50-54-year-old women whose mothers have had breast cancer.
Suppose that in a random sample of 1,000 such women, 40 are found
to have had breast cancer at some point in their lives.
Obtain a point estimate and 99% confidence interval for the prevalence of breast cancer in this population.

The best point estimate of p is the sample proportion, p̂ = x/n, where
x = the number with breast cancer and n is the sample size. So, our
estimate of p is

    p̂ = 40/1000 = .040

or 4%.

To check whether the sample size is large enough in this problem to justify
our normal-theory confidence interval, we notice that

    np̂ = 1000(.040) = 40 ≥ 5 and n(1 − p̂) = 1000(1 − .040) = 960 ≥ 5

so we should be OK.

For a 99% CI, 100(1 − α) = 99, so α = .01. Therefore, z_{1-α/2} = z_.995 = 2.576 (back of the book).
The standard error of p̂ is

    s.e.(p̂) = sqrt( p̂(1 − p̂)/n ) = sqrt( .040(1 − .040)/1000 ) = .00620

so that our approximate 99% CI for p is

    p̂ ± z_{1-α/2} sqrt( p̂(1 − p̂)/n ) = .040 ± 2.576(.00620) = (.024, .056)

Thus, we are 99% confident that the true prevalence of breast cancer
among 50-54-year-old women whose mothers had breast cancer lies
between 2.4% and 5.6%.
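This interval can be computed in a few lines (a sketch, not part of the original notes; `prop_ci` is our name, and scipy is assumed for the normal percentile):

```python
# Approximate (normal-theory) confidence interval for a proportion.
from math import sqrt
from scipy import stats

def prop_ci(x, n, conf=0.95):
    phat = x / n
    z = stats.norm.ppf(1 - (1 - conf) / 2)   # z_{1-alpha/2}
    se = sqrt(phat * (1 - phat) / n)         # estimated standard error of p-hat
    return phat - z * se, phat + z * se

lo, hi = prop_ci(40, 1000, conf=0.99)
print(round(lo, 3), round(hi, 3))  # 0.024 0.056
```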
Occasionally, we want a one-sided interval (lower or upper bound). Here
is the general result:

For np̂ ≥ 5 and n(1 − p̂) ≥ 5, an approximate 100(1 − α)% lower bound
on p is given by

    p̂ − z_{1-α} sqrt( p̂(1 − p̂)/n )

An approximate 100(1 − α)% upper bound on p is given by

    p̂ + z_{1-α} sqrt( p̂(1 − p̂)/n )

One-Sample Hypothesis Tests for p
Suppose that in the breast cancer example, it is known that the
population prevalence of breast cancer among women with no family
history of breast cancer is 2%.
Then to determine whether a family history of breast cancer is a risk
factor for this disease, we may be interested in testing the hypothesis

    H0: p = p0 versus HA: p > p0, where p0 = .02.

How can we test such a hypothesis?
Recall from (**) that

    (p̂ − p) / sqrt( p(1 − p)/n ) ≈ N(0, 1)

where p is the true population proportion.

Under the null hypothesis, p = p0, so this result becomes

    z = (p̂ − p0) / sqrt( p0(1 − p0)/n ) ≈ N(0, 1)

Since z compares the sample proportion p̂ to the null value p0 (relative to the standard error of p̂), and since the distribution of z is
known, z is the natural test statistic for testing H0: p = p0.

General method for an approximate level α test of H0: p = p0 versus a
one- or two-sided alternative:
Critical value approach: reject H0 if p̂ − p0 is consistent with the alternative
hypothesis and if

    |z| = |p̂ − p0| / sqrt( p0(1 − p0)/n ) > z_{1-α}    for a one-sided alternative
                                          > z_{1-α/2}  for a two-sided alternative.

Otherwise, we fail to reject.

p-value approach: reject H0 if p < α. The p-value is computed as

    p = P(Z < z)      if the alternative is HA: p < p0,
        P(Z > z)      if the alternative is HA: p > p0,
        2P(Z > |z|)   if the alternative is HA: p ≠ p0.

Here, Z denotes a N(0, 1) random variable, and z is the value of our test
statistic.

This normal-theory test can be justified by the CLT, and should
work well provided that np0 ≥ 5 and n(1 − p0) ≥ 5.

Example - Breast Cancer Prevalence

Suppose we wish to conduct an α = .01-level test of
H0: p = p0 versus HA: p > p0, where p0 = .02.
Our test statistic is

    z = (p̂ − p0) / sqrt( p0(1 − p0)/n ) = (.040 − .02) / sqrt( .02(1 − .02)/1000 ) = 4.52

Since p̂ = .040 > p0 = .02, the sample results provide evidence in
favor of HA: p > .02.

Our critical value here is z_{1-.01} = z_.99 = 2.327, so since z = 4.52 >
z_.99 = 2.327, we reject H0 in favor of HA: p > p0.

The conclusion is that there is a significantly higher prevalence (at
level .01) for women whose mothers had breast cancer.

The p-value for our test would be

    p = P(Z > z) = P(Z > 4.52) = .0000031 (from Minitab)
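The test in this example can be sketched as follows (not part of the original notes; `prop_z_test` is our name, and scipy is used here in place of Minitab for the tail probability):

```python
# One-sample z test for a proportion, as in the breast cancer example.
from math import sqrt
from scipy import stats

def prop_z_test(x, n, p0):
    phat = x / n
    # Note the null standard error sqrt(p0(1-p0)/n), as in the notes
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
    p_upper = stats.norm.sf(z)   # p-value for HA: p > p0
    return z, p_upper

z, pval = prop_z_test(40, 1000, 0.02)
print(round(z, 2))  # 4.52
```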
Power and Sample Size for Testing a Proportion
We have already studied power and sample size calculation methods for
one-sample z tests.

Therefore, when using normal-approximation methods (z tests) for inference on p, a population proportion, the power and sample size methods
we've already learned apply with little modification.

Example - Breast Cancer Prevalence
Suppose we wish to investigate whether women whose sisters have a
history of breast cancer are at higher risk of breast cancer themselves.

Suppose we assume that the prevalence of breast cancer is 2% among
50-54-year-old US women with no family history, whereas it is 5%
among those women whose sisters have had breast cancer.

We propose to interview 500 50-54-year-old women with a sister
history of the disease.

Assuming that we conduct a one-sided test at α = .05, what would
be the power of such a study?

Here, we are going to test H0: p = p0 versus HA: p > p0, where p0 = .02.
This hypothesis would be rejected if our test statistic exceeds the appropriate critical value. That is, if

    z = (p̂ − p0) / sqrt( p0(1 − p0)/n ) > z_{1-α} = z_.95 = 1.645
We have assumed that the null hypothesis is really false and that the
true prevalence is p = p1, where p1 = .05. Therefore, the power is the
probability that the test statistic z exceeds the critical value z_{1-α} = 1.645
given that p = p1 = .05. That is,

    power = P( (p̂ − p0)/sqrt(p0(1 − p0)/n) > z_{1-α} | p = p1 )
          = P( p̂ > p0 + z_{1-α} sqrt(p0(1 − p0)/n) | p = p1 )
          = P( (p̂ − p1)/sqrt(p1(1 − p1)/n) > [ p0 + z_{1-α} sqrt(p0(1 − p0)/n) − p1 ] / sqrt(p1(1 − p1)/n) | p = p1 )
          = P( Z > z_{1-α} sqrt( p0(1 − p0) / (p1(1 − p1)) ) + (p0 − p1)/sqrt(p1(1 − p1)/n) )

So, in this example, the power is

    power = P( Z > 1.645 sqrt( .02(1 − .02) / (.05(1 − .05)) ) + (.02 − .05)/sqrt( .05(1 − .05)/500 ) )
          = P(Z > −2.02) = 1 − P(Z > 2.02) = 1 − .022 = .978
General result for the power of a one-sample z test for p:

    power = P(Z > z̃) = P(Z < −z̃)

where

    z̃ = z_{1-α}   sqrt( p0(1 − p0) / (p1(1 − p1)) ) − |p0 − p1| / sqrt( p1(1 − p1)/n )   if the alternative is one-sided
         z_{1-α/2} sqrt( p0(1 − p0) / (p1(1 − p1)) ) − |p0 − p1| / sqrt( p1(1 − p1)/n )   if the alternative is two-sided

This result holds provided that the sample size is large enough to
justify using the normal approximation (z test). That is, provided
that np0 ≥ 5 and n(1 − p0) ≥ 5.
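The general power formula can be sketched in Python for the one-sided case (not part of the original notes; the function name is ours, scipy is assumed, and the two-sided case would simply use z_{1-α/2} in place of z_{1-α}):

```python
# Power of the one-sample z test for a proportion, one-sided alternative.
from math import sqrt
from scipy import stats

def power_prop_z(p0, p1, n, alpha=0.05):
    z_alpha = stats.norm.ppf(1 - alpha)   # z_{1-alpha}
    z_tilde = (z_alpha * sqrt(p0 * (1 - p0) / (p1 * (1 - p1)))
               - abs(p0 - p1) / sqrt(p1 * (1 - p1) / n))
    return stats.norm.sf(z_tilde)         # power = P(Z > z-tilde)

# Sister-history example: p0 = .02, p1 = .05, n = 500, alpha = .05
power = power_prop_z(0.02, 0.05, 500, alpha=0.05)
print(round(power, 3))  # 0.978
```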
This note was uploaded on 11/13/2011 for the course STAT 6200 taught by Professor Staff during the Summer '08 term at UGA.