EM 521
Applied Statistics
Sampling and Sampling Distributions
AS&W – Chapter 7

Sampling and Sampling Distributions
• Simple Random Sampling
• Point Estimation
• Introduction to Sampling Distributions
• Sampling Distribution of $\bar{x}$

Statistical Inference
The purpose of statistical inference is to obtain information about a population from information contained in a sample.
A population is the set of all the elements of interest.
A sample is a subset of the population.

Statistical Inference
The sample results provide only estimates of the values of the population characteristics.
With proper sampling methods, the sample results can provide “good” estimates of the population characteristics.
A parameter is a numerical characteristic of a population.

Simple Random Sampling:
Finite Population

Finite populations are often defined by lists such as:
• Organization membership roster
• Credit card account numbers
• Inventory product numbers

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.

Simple Random Sampling:
Finite Population

Replacing each sampled element before selecting subsequent elements is called sampling with replacement.
Sampling without replacement is the procedure used most often.
In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.

Simple Random Sampling:
Infinite Population

Infinite populations are often defined by an ongoing process whereby the elements of the population consist of items generated as though the process would operate indefinitely.
A simple random sample from an infinite population is a sample selected such that the following conditions are satisfied.
• Each element selected comes from the same population.
• Each element is selected independently.

Simple Random Sampling:
Infinite Population

In the case of infinite populations, it is impossible to obtain a list of all elements in the population.
The random number selection procedure cannot be used for infinite populations.

Point Estimation
In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
$s$ is the point estimator of the population standard deviation $\sigma$.
$\bar{p}$ is the point estimator of the population proportion $p$.
Sampling Error

When the expected value of a point estimator is equal to the population parameter, the point estimator is said to be unbiased.
The absolute value of the difference between an unbiased point estimate and the corresponding population parameter is called the sampling error.
Sampling error is the result of using a subset of the population (the sample), and not the entire population.
Statistical methods can be used to make probability statements about the size of the sampling error.
Sampling Error

The sampling errors are:
$|\bar{x} - \mu|$ for the sample mean
$|s - \sigma|$ for the sample standard deviation
$|\bar{p} - p|$ for the sample proportion

Example: St. Andrew’s
St. Andrew’s College receives 900 applications annually from prospective students. The application form contains a variety of information including the individual’s scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.

Example: St. Andrew’s
The director of admissions would like to know the following information:
• the average SAT score for the 900 applicants, and
• the proportion of applicants that want to live on campus.

Example: St. Andrew’s
We will now look at three alternatives for obtaining the desired information.
• Conducting a census of the entire 900 applicants
• Selecting a sample of 30 applicants, using a random number table
• Selecting a sample of 30 applicants, using Excel

Conducting a Census

If the relevant data for the entire 900 applicants were in the college’s database, the population parameters of interest could be calculated using the formulas we covered before.
We will assume for the moment that conducting a census is practical in this example.

Conducting a Census

Population Mean SAT Score
$\mu = \frac{\sum x_i}{900} = 990$

Population Standard Deviation for SAT Score
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$

Population Proportion Wanting On-Campus Housing
$p = \frac{648}{900} = .72$

Simple Random Sampling

Now suppose that the necessary data on the current year’s applicants were not yet entered in the college’s database.
Furthermore, the Director of Admissions must obtain estimates of the population parameters of interest for a meeting taking place in a few hours.
She decides a sample of 30 applicants will be used.
The applicants were numbered, from 1 to 900, as their applications arrived.
Simple Random Sampling:
Using a Random Number Table

Taking a Sample of 30 Applicants
• Because the finite population has 900 elements, we will need 3-digit random numbers to randomly select applicants numbered from 1 to 900.
• We will use the last three digits of the 5-digit random numbers in the third column of the textbook’s random number table, and continue into the fourth column as needed.

Simple Random Sampling:
Using a Random Number Table

Taking a Sample of 30 Applicants
• The numbers we draw will be the numbers of the applicants we will sample unless
   • the random number is greater than 900 or
   • the random number has already been used.
• We will continue to draw random numbers until we have selected 30 applicants for our sample.
• (We will go through all of column 3 and part of column 4 of the random number table, encountering in the process five numbers greater than 900 and one duplicate, 835.)
Simple Random Sampling:
Using a Random Number Table

Use of Random Numbers for Sampling

3-Digit Random Number    Applicant Included in Sample
744                      No. 744
436                      No. 436
865                      No. 865
790                      No. 790
835                      No. 835
902                      Number exceeds 900
190                      No. 190
836                      No. 836
. . . and so on

Simple Random Sampling:
Using a Random Number Table

Sample Data

No.   Random Number   Applicant        SAT Score   Live On-Campus
1     744             Conrad Harris    1025        Yes
2     436             Enrique Romero   950         Yes
3     865             Fabian Avante    1090        No
4     790             Lucila Cruz      1120        Yes
5     835             Chan Chiang      930         No
.     .               .                .           .
30    498             Emily Morse      1010        No
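The selection rule from the random number table slides (skip any number greater than 900, skip any number already used, and keep drawing until the sample is full) can be sketched in Python. This is only an illustration: the function name `select_applicants` is hypothetical, and the input stream holds just the 3-digit values listed in the slides followed by a repeated 835 to exercise the duplicate rule. The position of that duplicate is assumed here, not taken from the textbook's table.

```python
def select_applicants(random_numbers, sample_size, population_size=900):
    """Pick applicant numbers from a stream of random numbers.

    A number is skipped when it falls outside 1..population_size
    or when it has already been selected (sampling without replacement).
    """
    selected = []
    seen = set()
    for number in random_numbers:
        if number < 1 or number > population_size:
            continue  # e.g. 902 exceeds 900, so it is discarded
        if number in seen:
            continue  # a repeated number is discarded
        seen.add(number)
        selected.append(number)
        if len(selected) == sample_size:
            break
    return selected


# Values listed in the slides, plus an assumed repeat of 835 at the end.
stream = [744, 436, 865, 790, 835, 902, 190, 836, 835]
first_seven = select_applicants(stream, sample_size=7)
print(first_seven)
```

With a longer stream and `sample_size=30`, the same function would return the 30 applicant numbers for the full sample.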
Simple Random Sampling:
Using a Computer

Taking a Sample of 30 Applicants
• Computers can be used to generate random numbers for selecting random samples.
• For example, Excel’s function
   = RANDBETWEEN(1,900)
   can be used to generate random numbers between 1 and 900.
• Then we choose the 30 applicants corresponding to the 30 smallest random numbers as our sample.
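A rough Python counterpart of the spreadsheet approach is to give every applicant a random number and keep the 30 applicants with the smallest values. The helper name `sample_by_random_keys` and the seed are assumptions made for illustration, and Python's `random.random()` stands in for the spreadsheet's random number function.

```python
import random


def sample_by_random_keys(population_size, sample_size, seed=None):
    """Assign a random key to each applicant number and keep the smallest keys."""
    rng = random.Random(seed)
    # One random key per applicant, numbered 1..population_size.
    keyed = [(rng.random(), applicant) for applicant in range(1, population_size + 1)]
    keyed.sort()  # smallest random keys first
    return [applicant for _, applicant in keyed[:sample_size]]


sample = sample_by_random_keys(900, 30, seed=521)  # seed chosen arbitrarily
print(sample)
```

The standard library call `random.sample(range(1, 901), 30)` also draws a simple random sample of 30 applicant numbers without replacement in a single step.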
Point Estimation

$\bar{x}$ as Point Estimator of $\mu$
$\bar{x} = \frac{\sum x_i}{30} = \frac{29{,}910}{30} = 997$

$s$ as Point Estimator of $\sigma$
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{163{,}996}{29}} = 75.2$

$\bar{p}$ as Point Estimator of $p$
$\bar{p} = \frac{20}{30} = .68$

Note: Different random numbers would have identified a different sample which would have resulted in different point estimates.
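As a sketch of how these point estimates are computed, the following Python uses only the five applicants whose rows are visible in the sample data table (the full sample of 30 is not reproduced in these slides), so its results differ from the values on this slide. The last lines redo the slide's arithmetic for the sample mean and standard deviation from the reported sums. Variable names are illustrative.

```python
import math
import statistics

# The five sample rows shown in the slides: (SAT score, wants on-campus housing).
visible_rows = [
    (1025, True),   # Conrad Harris
    (950, True),    # Enrique Romero
    (1090, False),  # Fabian Avante
    (1120, True),   # Lucila Cruz
    (930, False),   # Chan Chiang
]

scores = [score for score, _ in visible_rows]
x_bar = statistics.mean(scores)   # point estimate of the population mean
s = statistics.stdev(scores)      # divides by n - 1, like the slide's formula
p_bar = sum(wants for _, wants in visible_rows) / len(visible_rows)
print(x_bar, round(s, 1), p_bar)

# Arithmetic from the slide, using the reported sums for all 30 applicants.
slide_x_bar = 29910 / 30
slide_s = math.sqrt(163996 / 29)
print(slide_x_bar, round(slide_s, 1))
```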
Summary of Point Estimates
Obtained from a Simple Random Sample

Population Parameter                                  Parameter Value   Point Estimator                                        Point Estimate
$\mu$ = Population mean SAT score                     990               $\bar{x}$ = Sample mean SAT score                      997
$\sigma$ = Population std. deviation for SAT score    80                $s$ = Sample std. deviation for SAT score              75.2
$p$ = Population proportion wanting campus housing    .72               $\bar{p}$ = Sample proportion wanting campus housing   .68
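Combining this summary with the earlier definition of sampling error (the absolute difference between a point estimate and the parameter it estimates) gives the sampling errors for this particular sample. A minimal Python sketch, with dictionary keys chosen here only for illustration:

```python
# Parameter values and point estimates taken from the summary table.
parameters = {
    "mean SAT score": 990,
    "std. deviation of SAT score": 80,
    "proportion wanting campus housing": 0.72,
}
estimates = {
    "mean SAT score": 997,
    "std. deviation of SAT score": 75.2,
    "proportion wanting campus housing": 0.68,
}

# Sampling error = |point estimate - population parameter|.
sampling_errors = {
    name: abs(estimates[name] - parameters[name]) for name in parameters
}

for name, error in sampling_errors.items():
    print(f"{name}: {error:.2f}")
```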
Sampling Distribution of $\bar{x}$

A random variable is defined as a numerical description of the outcome of an experiment.
Consider the process of selecting a simple random sample as an experiment.
Then the sample mean is a random variable, since it is a numerical description of the outcome of the sample selection experiment.
Just like other random variables, the sample mean has an expected value, a standard deviation, and a probability distribution.

Sampling Distribution of $\bar{x}$

The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of the sample mean $\bar{x}$.

Expected Value of $\bar{x}$
$E(\bar{x}) = \mu$
where: $\mu$ = the population mean

Sampling Distribution of $\bar{x}$

Standard Deviation of $\bar{x}$
Finite Population: $\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \left( \frac{\sigma}{\sqrt{n}} \right)$
Infinite Population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

• A finite population is treated as being infinite if n/N < .05.
• $\sqrt{(N - n)/(N - 1)}$ is the finite correction factor.
• $\sigma_{\bar{x}}$ is referred to as the standard error of the mean.
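The two standard error formulas can be sketched in Python as follows. The function name `standard_error` and its optional `population_size` argument are illustrative choices; the St. Andrew's values (σ = 80, n = 30, N = 900) come from the example.

```python
import math


def standard_error(sigma, n, population_size=None):
    """Standard error of the mean.

    With population_size=None the infinite-population formula sigma / sqrt(n)
    is used; otherwise the finite correction factor is applied.
    """
    base = sigma / math.sqrt(n)
    if population_size is None:
        return base
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return correction * base


se_infinite = standard_error(80, 30)
se_finite = standard_error(80, 30, population_size=900)
print(round(se_infinite, 1), round(se_finite, 1))

# n/N is about .033, below .05, so the population is treated as infinite.
print(30 / 900 < 0.05)
```

For this example the correction factor changes the standard error only slightly, which is why the slides work with the simpler σ/√n form.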
Form of the Sampling Distribution of $\bar{x}$

If we use a large (n ≥ 30) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of $\bar{x}$ can be approximated by a normal distribution.
When the simple random sample is small (n < 30), the sampling distribution of $\bar{x}$ can be considered normal only if we assume the population has a normal distribution.

Sampling Distribution of $\bar{x}$ for SAT Scores
[Figure: sampling distribution of $\bar{x}$, centered at $E(\bar{x}) = 990$, with $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{30}} = 14.6$]

Sampling Distribution of $\bar{x}$ for SAT Scores

What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within +/-10 of the actual population mean $\mu$?
In other words, what is the probability that $\bar{x}$ will be between 980 and 1000?

Sampling Distribution of $\bar{x}$ for SAT Scores

Step 1: Calculate the z-value at the upper endpoint of the interval.
z = (1000 - 990)/14.6 = .68
Step 2: Find the area under the curve to the left of the upper endpoint.
P(z < .68) = .7517

Sampling Distribution of $\bar{x}$ for SAT Scores
Cumulative Probabilities for the Standard Normal Distribution

z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
.     .       .       .       .       .       .       .       .       .       .
.5    .6915   .6950   .6985   .7019   .7054   .7088   .7123   .7157   .7190   .7224
.6    .7257   .7291   .7324   .7357   .7389   .7422   .7454   .7486   .7517   .7549
.7    .7580   .7611   .7642   .7673   .7704   .7734   .7764   .7794   .7823   .7852
.8    .7881   .7910   .7939   .7967   .7995   .8023   .8051   .8078   .8106   .8133
.9    .8159   .8186   .8212   .8238   .8264   .8289   .8315   .8340   .8365   .8389
.     .       .       .       .       .       .       .       .       .       .
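Sampling Distribution of $\bar{x}$ for SAT Scores

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; the area to the left of $\bar{x} = 1000$ is .7517; axis marks at 990 and 1000]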
Sampling Distribution of $\bar{x}$ for SAT Scores

Step 3: Calculate the z-value at the lower endpoint of the interval.
z = (980 - 990)/14.6 = -.68
Step 4: Find the area under the curve to the left of the lower endpoint.
P(z < -.68) = P(z > .68)
= 1 - P(z < .68)
= 1 - .7517
= .2483
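Sampling Distribution of $\bar{x}$ for SAT Scores

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; the area to the left of $\bar{x} = 980$ is .2483; axis marks at 980 and 990]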
Sampling Distribution of $\bar{x}$ for SAT Scores

Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.
P(-.68 < z < .68) = P(z < .68) - P(z < -.68)
= .7517 - .2483
= .5034
The probability that the sample mean SAT score will be between 980 and 1000 is:
P(980 < $\bar{x}$ < 1000) = .5034

Sampling Distribution of $\bar{x}$ for SAT Scores

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; the area between 980 and 1000 is .5034; axis marks at 980, 990, and 1000]
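Steps 1 through 5 can be reproduced with Python's standard library, where `statistics.NormalDist().cdf` plays the role of the cumulative normal table. To mirror the slides, the standard error and the z-values are rounded the same way (14.6 and .68); the variable names are illustrative only.

```python
from statistics import NormalDist

mu = 990          # population mean SAT score
std_error = 14.6  # standard error of the mean for n = 30, as rounded on the slides
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Steps 1-2: z-value and cumulative area at the upper endpoint.
z_upper = round((1000 - mu) / std_error, 2)
area_upper = standard_normal.cdf(z_upper)

# Steps 3-4: z-value and cumulative area at the lower endpoint.
z_lower = round((980 - mu) / std_error, 2)
area_lower = standard_normal.cdf(z_lower)

# Step 5: area between the two endpoints.
probability = area_upper - area_lower
print(z_lower, z_upper, round(probability, 3))
```

Without rounding z to two decimals, the same calculation gives a slightly larger probability (about .51), so the table-based answer is a close approximation.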
Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$

Suppose we select a simple random sample of 100 applicants instead of the 30 originally considered.
$E(\bar{x}) = \mu$ regardless of the sample size. In our example, $E(\bar{x})$ remains at 990.
Whenever the sample size is increased, the standard error of the mean $\sigma_{\bar{x}}$ is decreased. With the increase in the sample size to n = 100, the standard error of the mean is decreased to:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{100}} = 8.0$
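A small simulation can illustrate how the standard error shrinks as n grows. This sketch assumes, purely for illustration, that SAT scores are drawn independently from a normal population with mean 990 and standard deviation 80 (the slides give only the mean and standard deviation, not the shape of the distribution). The function name, replication count, and seed are arbitrary.

```python
import random
import statistics


def simulate_sample_means(n, replications, mu=990, sigma=80, seed=0):
    """Draw `replications` samples of size n and return their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(replications)
    ]


for n in (30, 100):
    means = simulate_sample_means(n, replications=2000)
    # The mean of the sample means should be near 990, and their standard
    # deviation near sigma / sqrt(n) (14.6 for n = 30, 8.0 for n = 100).
    print(n, round(statistics.fmean(means), 1), round(statistics.stdev(means), 1))
```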
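Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$

[Figure: two sampling distributions of $\bar{x}$ centered at $E(\bar{x}) = 990$; with n = 100, $\sigma_{\bar{x}} = 8$; with n = 30, $\sigma_{\bar{x}} = 14.6$]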
Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$

Recall that when n = 30, P(980 < $\bar{x}$ < 1000) = .5034.
We follow the same steps to solve for P(980 < $\bar{x}$ < 1000) when n = 100 as we showed earlier when n = 30.
Now, with n = 100, P(980 < $\bar{x}$ < 1000) = .7888.
Because the sampling distribution with n = 100 has a smaller standard error, the values of $\bar{x}$ have less variability and tend to be closer to the population mean than the values of $\bar{x}$ with n = 30.

Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 8$; the area between 980 and 1000 is .7888; axis marks at 980, 990, and 1000]
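The comparison between n = 30 and n = 100 can be reproduced with a short function. This is a sketch under the infinite-population formula σ/√n used in the slides; the function name `prob_within` is hypothetical, and no rounding of z is applied, so the n = 30 result comes out slightly above the table-based .5034.

```python
import math
from statistics import NormalDist


def prob_within(tolerance, sigma, n):
    """P(|x_bar - mu| <= tolerance) when x_bar is normal with std. error sigma/sqrt(n)."""
    z = tolerance / (sigma / math.sqrt(n))
    return 2 * NormalDist().cdf(z) - 1


for n in (30, 100):
    print(n, round(prob_within(10, 80, n), 4))
```

Larger values of n can be passed to the same function to see how quickly the probability of landing within 10 points of the population mean approaches 1.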