CHAPTER 7
SAMPLING DISTRIBUTION

7.2 Sampling Plans and Experimental Designs

The way a sample is selected is called the sampling plan or experimental design. Simple
random sampling is a commonly used sampling plan in which every sample of size n has the
same chance of being selected. The four most commonly used sampling plans are given as follows.

Definition 1. If a sample of n elements is selected from a population of N elements
using a plan in which each of the possible samples has an equal chance of selection, then
the sampling is said to be random and the resulting sample is a simple random sample.

Example 1. Suppose we want to select a sample of size n = 2 from a population
containing N = 4 objects (say, A, B, C, and D). There are six distinct samples that could
be selected, as listed in the following table.

Sample   Observations in Sample
1        A, B
2        A, C
3        A, D
4        B, C
5        B, D
6        C, D

If each of these six samples has an equal chance of being selected, given by 1/6, then the
resulting sample is called a simple random sample, or just a random sample. Definition 1
states this requirement in general.

The selection of a simple random sample can be done by using random numbers, that is,
digits generated so that the values 0 to 9 occur randomly and with equal frequency. Another
method is to let a computer generate random numbers for sampling.

Definition 2. When the population consists of two or more subpopulations, called strata,
a sampling plan that ensures that a simple random sample is selected from each subpopulation
is called a stratified random sample.

Example 2. Suppose a public opinion poll designed to estimate the proportion of voters
who favor spending more tax revenue on an improved ambulance service is to be conducted
in a certain county. The county contains two cities and a rural area. The population
elements of interest for the poll are all men and women of voting age who reside in the county.
A stratified random sample of adults residing in the county can be obtained by selecting a
simple random sample of adults from each city and another simple random sample of adults
from the rural area. In this case, the two cities and the rural area represent three strata
from which simple random samples are selected.

The principal reasons for using stratified random sampling rather than simple random
sampling are as follows:

1. Stratification may produce a smaller sampling error than would be produced by a
   simple random sample of the same size. This result is particularly true if measurements
   within strata are homogeneous.
2. The cost per observation in the survey may be reduced by stratification of the
   population elements into convenient groupings.

Definition 3. When the available sampling units are groups of elements, called clusters,
a cluster sample is a simple random sample of clusters from the available clusters in the
population.

Example 3. To estimate the average income per household in a large city, how should
the sample be chosen? If simple random sampling is used, a frame listing all households
(elements) in the city is needed, and this frame may be very costly or impossible
to obtain. The problem cannot be avoided by using stratified random sampling, because a
frame is still required for each stratum in the population. Rather than draw a simple
random sample of elements, one could divide the city into regions such as blocks (or clusters
of elements) and select a simple random sample of blocks from the population. This task
is easily accomplished by using a frame that lists all city blocks. Then the income of every
household within each sampled block could be measured.

Definition 4. A 1-in-k systematic random sample involves the random selection of one
of the first k elements in an ordered population, and then the systematic selection of every
kth element thereafter.

A systematic sample is generally spread more uniformly over the entire population and
thus may provide more information about the population than an equivalent amount of data
contained in a simple random sample.

Example 4. Suppose we wish to select a 1-in-5 systematic sample of travel vouchers
from a stack of N = 1000 (that is, sample n = 200 vouchers) to determine the proportion
of vouchers ﬁled correctly. A voucher is drawn at random from the ﬁrst ﬁve vouchers (for
instance, number 2), and every ﬁfth voucher thereafter is included in the sample. Suppose
that most of the ﬁrst 500 vouchers have been correctly ﬁled, but because of a change in clerk,
the second 500 have all been incorrectly ﬁled. Simple random sampling could accidentally
select a large number (perhaps all) of the 200 vouchers from either the ﬁrst or the second
500 vouchers and hence yield a very poor estimate of the true proportion of correct ﬁling.
In contrast, systematic sampling would select an equal number of vouchers from each of
the two groups and would give a very accurate estimate of the fraction of vouchers correctly
filed.

7.3 Statistics and Sampling Distribution

When we select a random sample from a population, the numerical descriptive measures,
such as the mean, standard deviation, and so on, calculated from the sample are referred to as
statistics. These statistics vary or change for each different random sample we select; that
is, they are random variables. The probability distributions for statistics are called sampling
distributions because, in repeated sampling, they provide this information:

* What values of the statistic can occur.
* How often each value occurs.

Definition 5. The sampling distribution of a statistic is the probability distribution for
the possible values of the statistic that results when random samples of size n are repeatedly
drawn from the population.

There are three ways of finding the sampling distribution of a statistic:

1. Derive the distribution mathematically using the laws of probability.
2. Use simulation to approximate the distribution. That is, draw a large number of
   samples of size n, calculate the value of the statistic for each sample, and tabulate the
   results in a relative frequency histogram. When the number of samples is large, the
   histogram will be very close to the theoretical sampling distribution.
3. Use statistical theorems to derive the exact or approximate sampling distribution.

Example 5. Suppose a population consists of N = 5 numbers: 3, 6, 9, 12, 15. If a
random sample of size n = 3 is selected without replacement, find the sampling distribution
for (a) the sample mean $\bar{x}$, (b) the sample median m.

Solution. All possible random samples of size n = 3 and their corresponding means and
medians are given below.

Sample   Observations in Sample   Sample Mean   Sample Median
1        3, 6, 9                  6             6
2        3, 6, 12                 7             6
3        3, 6, 15                 8             6
4        3, 9, 12                 8             9
5        3, 9, 15                 9             9
6        3, 12, 15                10            12
7        6, 9, 12                 9             9
8        6, 9, 15                 10            9
9        6, 12, 15                11            12
10       9, 12, 15                12            12

(a) Each of the 10 samples has probability 1/10, so the sampling distribution for the
sample mean $\bar{x}$ is obtained by counting how often each mean occurs in the table. That is,

$\bar{x}$      6     7     8     9     10    11    12
$p(\bar{x})$   0.1   0.1   0.2   0.2   0.2   0.1   0.1

(b) The sampling distribution for the sample median m is given by

$$P\{m = 6\} = \frac{3}{10} = 0.3, \quad P\{m = 9\} = \frac{4}{10} = 0.4, \quad P\{m = 12\} = \frac{3}{10} = 0.3.$$

That is,

m        6     9     12
$p(m)$   0.3   0.4   0.3
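The enumeration in Example 5 can also be checked by computer. The following is a minimal sketch in Python; the language choice and the helper names `sample_mean`, `exact_distribution`, and `simulated_distribution` are assumptions made for illustration and are not part of the original notes. It lists every sample of size 3 drawn without replacement from 3, 6, 9, 12, 15, tabulates the exact distributions of the mean and median, and compares the exact distribution of the mean with relative frequencies from repeated random sampling.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from random import Random
from statistics import median

population = [3, 6, 9, 12, 15]
n = 3


def sample_mean(sample):
    # Exact rational arithmetic avoids floating-point keys in the tables.
    return Fraction(sum(sample), len(sample))


def exact_distribution(statistic, population, n):
    """Exact sampling distribution over all equally likely samples of size n
    drawn without replacement (hypothetical helper, not from the notes)."""
    samples = list(combinations(population, n))
    counts = Counter(statistic(s) for s in samples)
    return {v: Fraction(c, len(samples)) for v, c in sorted(counts.items())}


def simulated_distribution(statistic, population, n, draws=10_000, seed=0):
    """Relative frequencies of the statistic over repeated random samples."""
    rng = Random(seed)  # fixed seed only so the sketch is reproducible
    counts = Counter(statistic(rng.sample(population, n)) for _ in range(draws))
    return {v: c / draws for v, c in sorted(counts.items())}


mean_dist = exact_distribution(sample_mean, population, n)
median_dist = exact_distribution(median, population, n)
mean_sim = simulated_distribution(sample_mean, population, n)

for value, prob in mean_dist.items():
    print(f"P(mean = {value}) = {prob}  (simulated: {mean_sim.get(value, 0):.3f})")
for value, prob in median_dist.items():
    print(f"P(median = {value}) = {prob}")
```

The exact tables should agree with parts (a) and (b) of the example. The simulated frequencies only approximate them, and the number of draws (10,000 here) is an arbitrary choice.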
Note. It is usually very difficult to derive sampling distributions by the method described
in the preceding example. When this method is no longer feasible, we may have to use one of
these methods:

* Use a simulation to approximate the sampling distribution empirically.
* Rely on statistical theorems and theoretical results.

7.4 The Central Limit Theorem

The Central Limit Theorem states that, under rather general conditions, sums and means
of random samples of measurements drawn from a population tend to have an approximately
normal distribution.

Consider an experiment of tossing a balanced die n times. Let $\bar{x}$ denote the mean of
the numbers on the n upper faces. If we use computer software to generate and depict the
histograms of the sampling distribution of $\bar{x}$ for n = 2, n = 3, n = 4, and so on, we will
find that the shape of these histograms looks more and more like a normal curve as n
becomes larger and larger.

Theorem 1 (Central Limit Theorem). If random samples of n observations are drawn
from a nonnormal population with finite mean $\mu$ and standard deviation $\sigma$, then, when n is
large, the sampling distribution of the sample mean $\bar{x}$ is approximately normally distributed,
with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The approximation becomes more accurate as
n becomes large.

Example 6. Achievement test scores of all high school seniors in a certain state have
mean $\mu = 60$ and variance $\sigma^2 = 64$. A random sample of n = 100 students from a large high
school had a mean score of 58. Is there evidence to suggest that this high school is inferior?

Solution. Let $\bar{x}$ denote the mean of a random sample of n = 100 scores from a population
with $\mu = 60$ and $\sigma^2 = 64$. We wish to calculate the probability that the sample mean $\bar{x}$ is
at most 58, namely, $P\{\bar{x} \le 58\}$. By the Central Limit Theorem, it follows that

$$P\{\bar{x} \le 58\} \approx P\{z \le -2.5\} = 0.0062,$$

where the standardized value of the mean score 58 is calculated as

$$z = \frac{58 - 60}{8/\sqrt{100}} = -2.5.$$

Since this probability is exceedingly small, it is unlikely that a random sample of 100
students from a school performing at the state level would produce a mean score of 58 or
lower. This evidence suggests that the average score for this high school is inferior.

7.5 The Sampling Distribution of the Sample Mean

Theorem 1 (The Sampling Distribution of the Sample Mean $\bar{x}$).

* If a random sample of n measurements is selected from a population with mean $\mu$ and
  standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{x}$ will have mean $\mu$ and
  standard deviation $\sigma/\sqrt{n}$.
* If the population has a normal distribution, the sampling distribution of the sample
  mean $\bar{x}$ will be exactly normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
* If the population distribution is nonnormal, the sampling distribution of the sample
  mean $\bar{x}$ will be approximately normally distributed, with mean $\mu$ and standard deviation
  $\sigma/\sqrt{n}$, for large samples (by the Central Limit Theorem).

Definition 6. The standard deviation of a statistic used as an estimator of a population
parameter is also called the standard error of the estimator (abbreviated SE) because it
refers to the precision of the estimator.

Therefore, the standard deviation of $\bar{x}$, given by $\sigma/\sqrt{n}$, is referred to as the standard
error of the mean, abbreviated as $SE(\bar{x})$ or just SE.

Example 7. The duration of Alzheimer's disease from the onset of symptoms until death
ranges from 3 to 20 years; the average is 8 years with a standard deviation of 4 years. The
administrator of a large medical center randomly selects the medical records of 30 deceased
Alzheimer's patients from the medical center's database and records the average duration.
Find the approximate probability that the average (a) is less than 7 years, (b) exceeds 7
years, (c) lies within 1 year of the population mean $\mu = 8$.

Solution. The standard error is

$$SE = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{30}} = 0.73.$$

(a) To find the probability that the average is less than 7 years, we need to calculate the
standardized value of 7:

$$z = \frac{7 - 8}{0.73} = -1.37.$$

Then the desired probability is

$$P\{\bar{x} < 7\} \approx P\{z < -1.37\} = 0.0853.$$

(b) The probability that the average exceeds 7 years is

$$P\{\bar{x} > 7\} \approx P\{z > -1.37\} = 1 - 0.0853 = 0.9147.$$

(c) To find the probability that the average lies within 1 year of the population mean
$\mu = 8$, we need to calculate the standardized values of 7 and 9:

$$\frac{7 - 8}{0.73} = -1.37 \quad \text{and} \quad \frac{9 - 8}{0.73} = 1.37.$$

Then the required probability is

$$P\{7 < \bar{x} < 9\} \approx P\{-1.37 < z < 1.37\} = P\{z < 1.37\} - P\{z < -1.37\} = 0.9147 - 0.0853 = 0.8294.$$

Example 8. To avoid difficulties with the Federal Trade Commission or state and local
consumer protection agencies, a beverage bottler must make reasonably certain that 12-ounce
bottles actually contain 12 ounces of beverage. To determine whether a bottling machine is
working satisfactorily, one bottler randomly samples 30 bottles per hour and measures the
amount of beverage in each bottle. The mean $\bar{x}$ of the 30 fill measurements is used to decide
whether to readjust the amount of beverage delivered per bottle by the filling machine. If
records show that the amount of fill per bottle is normally distributed, with a standard
deviation of 0.3 ounces, and if the bottling machine is set to produce a mean fill per bottle
of 12 ounces, what is the approximate probability that the sample mean $\bar{x}$ of the 30 test
bottles is less than 11.9 ounces?

Solution. The standard error is

$$SE = \frac{\sigma}{\sqrt{n}} = \frac{0.3}{\sqrt{30}} = 0.055.$$

To find the probability that the sample mean of the 30 test bottles is less than 11.9 ounces,
we need to calculate the standardized value of 11.9:

$$z = \frac{11.9 - 12}{0.055} = -1.82.$$

The required probability is then

$$P\{\bar{x} < 11.9\} = P\{z < -1.82\} = 0.0344.$$

Since this probability is very small, the company should not have difficulties with the Federal
Trade Commission or state and local consumer protection agencies.

Example 9. An electronic firm manufactures light bulbs that have a length of life with
mean 800 hours and a standard deviation of 80 hours. Find the probability that a random
sample of 64 bulbs will have an average life of greater than 775 hours.

Solution. The standard error is

$$SE = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{64}} = 10.$$

To find the probability that the sample mean of the 64 bulbs is greater than 775 hours, we
need to calculate the standardized value of 775:

$$z = \frac{775 - 800}{10} = -2.5.$$

The required probability is then

$$P\{\bar{x} > 775\} \approx P\{z > -2.5\} = 1 - P\{z < -2.5\} = 1 - 0.0062 = 0.9938.$$

HOMEWORK: pp. 273-274: 7.19, 7.24, 7.29, 7.30, 7.31, 7.33

7.6 The Sampling Distribution of the Sample Proportion

Let x be a binomial random variable with n trials and probability p of success. Here
the parameter p can also be referred to as the population proportion of successes. Since x
represents the number of successes in n trials, the sample proportion of successes

$$\hat{p} = \frac{x}{n}$$

will be used to estimate the population proportion p.

The binomial random variable x has mean $\mu = np$ and standard deviation $\sigma = \sqrt{npq}$. Since
$\hat{p}$ is simply the value of x expressed as a proportion ($\hat{p} = x/n$), the sampling distribution
of $\hat{p}$ is identical to the probability distribution of x, except that it has a new scale along the
horizontal axis. Because of this change of scale, the mean and standard deviation of $\hat{p}$ are
also rescaled, so that the mean of the sampling distribution is p and its standard error is

$$SE(\hat{p}) = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p.$$

Just as we can approximate the probability distribution of the binomial random variable
x with a normal distribution when the sample size n is large, we can do the same with the
sampling distribution of $\hat{p}$.

Theorem 2 (Properties of the Sampling Distribution of the Sample Proportion $\hat{p}$). If a
random sample of n observations is drawn from a binomial population with parameter p,
then the sampling distribution of the sample proportion

$$\hat{p} = \frac{x}{n}$$

will have mean p and standard deviation

$$SE(\hat{p}) = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p.$$

When the sample size n is large, the sampling distribution of $\hat{p}$ can be approximated by a
normal distribution. The approximation will be adequate if np > 5 and nq > 5.

Example 10. In a survey, 500 mothers and fathers were asked about the importance
of sports for boys and girls. Of the parents interviewed, 60% agree that the genders are
equal and should have equal opportunities to participate in sports. Describe the sampling
distribution of the sample proportion $\hat{p}$ of parents who agree that the genders are equal and
should have equal opportunities.

Solution. Let p denote the population proportion of all parents in the United States
who agree that the genders are equal and should have equal opportunities. The sampling
distribution of $\hat{p}$ can be approximated by a normal distribution, with mean equal to p and
standard error

$$SE(\hat{p}) = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p.$$

It should be noted that the sampling distribution of $\hat{p}$ is centered over its mean p. Even
though we do not know the exact value of p (the sample proportion $\hat{p} = 0.60$ may be larger or
smaller than p), an approximate value for the standard deviation of the sampling distribution
can be found using the sample proportion $\hat{p} = 0.60$ to approximate the unknown value of p.
Thus,

$$SE(\hat{p}) = \sqrt{\frac{pq}{n}} \approx \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.60)(0.40)}{500}} = 0.022.$$

Now the probability that $\hat{p}$ will fall within $2SE(\hat{p}) = 0.044$ of p is given by

$$
\begin{aligned}
P\{|\hat{p} - p| < 0.044\} &= P\left\{\frac{|\hat{p} - p|}{SE(\hat{p})} < \frac{0.044}{0.022}\right\} \approx P\{|z| < 2\} \\
&= P\{-2 < z < 2\} = P\{z < 2\} - P\{z < -2\} \\
&= 0.9772 - 0.0228 = 0.9544.
\end{aligned}
$$

Therefore, approximately 95% of the time $\hat{p}$ will fall within $2SE(\hat{p}) = 0.044$ of the
(unknown) value of p.

Example 11. Refer to Example 10. Suppose the proportion p of parents in the population
is actually equal to 0.55. What is the probability of observing a sample proportion
larger than or equal to the observed value $\hat{p} = 0.60$?

Solution. With n = 500, p = 0.55, and observed $\hat{p} = 0.60$, we calculate

$$SE(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.55)(0.45)}{500}} = 0.0222.$$

The required probability is

$$P\{\hat{p} \ge 0.60\} \approx P\{z \ge 2.25\} = 1 - P\{z \le 2.25\} = 1 - 0.9878 = 0.0122,$$

where the standardized value of 0.60 is

$$z = \frac{0.60 - 0.55}{0.0222} = 2.25.$$

That is, if we were to select a random sample of n = 500 observations from a population
with proportion p equal to 0.55, the probability that the sample proportion $\hat{p}$ would be larger
than or equal to 0.60 is only 0.0122.

HOMEWORK: pp. 279-281: 7.37, 7.41, 7.43, 7.45, 7.47 ...
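
The calculations in Examples 10 and 11 can be reproduced without a normal table. The sketch below is written in Python as an illustration only; the helper name `proportion_se` and the variable names are assumptions, not part of the notes. It uses `statistics.NormalDist` from the standard library for the standard normal probabilities.

```python
from math import sqrt
from statistics import NormalDist


def proportion_se(p, n):
    """Standard error of the sample proportion, sqrt(p*q/n) with q = 1 - p
    (hypothetical helper name)."""
    return sqrt(p * (1 - p) / n)


z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1
n = 500
p_hat = 0.60

# Example 10: p is unknown, so p_hat stands in for p in the SE formula.
se_est = proportion_se(p_hat, n)
within_2se = z.cdf(2) - z.cdf(-2)

# Example 11: assume the true proportion is p = 0.55.
p = 0.55
normal_ok = n * p > 5 and n * (1 - p) > 5  # adequacy rule from Theorem 2
se = proportion_se(p, n)
z_obs = (p_hat - p) / se
upper_tail = 1 - z.cdf(z_obs)

print(f"Example 10: SE is about {se_est:.4f}, P(|z| < 2) = {within_2se:.4f}")
print(f"Example 11: SE = {se:.4f}, z = {z_obs:.2f}, P(p_hat >= 0.60) = {upper_tail:.4f}")
```

Because the code carries full precision rather than rounding the standard error and z to table values, the final digits may differ slightly from the hand calculations above.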
 Spring '08
 Cheng
 Statistics
