Chapter 5 Properties of a Random Sample
5.1 Population and sample
(CB 5.1)
Suppose we are interested in the distribution or certain features of a collection
of data. We call this collection a population.
Suppose for some reason these data are not well documented or easily
accessible, so that the distribution or features that we are interested in cannot
be readily computed. A simple example of this is household income: if we
wish to know the true average U.S. household income for this month, then
we have a very big task on hand, because we would have to gather the information
from hundreds of millions of households.
A solution to this is to draw a sample from the population, in other words to
select a subset of the population, and use the sample information to make
inferences about the truth. How best to do this and how to handle sampling
variability are among the most important issues in statistics.
The population features that might be of interest include: the shape of
the distribution (is it symmetric or skewed, does it have a single peak
or multiple peaks, etc.), whether a standard distribution (normal, gamma,
Weibull, Poisson, etc.) could serve as a reasonable approximation, and the
mean, variance, percentiles, etc.
Any number which can be computed from the population is called a parameter.
Common parameters of interest are the mean, variance, percentiles, and mode
(most probable value).
A statistic is any number calculated from the sample data. Suppose the
sample data are $X_1, \ldots, X_n$. Examples of statistics are the sample mean
$= \bar{X} = n^{-1} \sum_{i=1}^{n} X_i$, sample variance
$= (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$, sample
percentiles, and sample range ($=$ sample maximum $-$ sample minimum).
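The statistics above can be computed directly from their definitions. The following sketch uses only the Python standard library and a made-up sample of eight observations (the numbers are illustrative, not from the text):

```python
import statistics

# A made-up sample of n = 8 observations (say, household incomes in $1000s).
sample = [52.0, 61.5, 48.0, 75.2, 59.9, 44.1, 68.3, 57.0]
n = len(sample)

# Sample mean: (1/n) * sum of the X_i.
xbar = sum(sample) / n

# Sample variance: (1/(n-1)) * sum of squared deviations from the mean.
s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)

# Sample range: sample maximum minus sample minimum.
srange = max(sample) - min(sample)

print(xbar, s2, srange)
```

The standard library's `statistics.mean` and `statistics.variance` implement the same two formulas (note that `statistics.variance` uses the $n-1$ denominator, matching the sample variance above).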
Consider the experiment of drawing at random a sample from a population.
Before the sample is drawn, we can think of the sample values to be observed
as random variables. In that sense we can also think of any statistic computed
from these values as a random variable and speak about its distribution, called
its sampling distribution. After the sample is drawn, we see the value of
the statistic and there is no longer a distribution to speak of.
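The sampling distribution can be approximated by simulation: repeatedly draw a sample of size $n$ from a fixed population distribution and record the statistic each time. A minimal sketch, with an assumed Exp(1) population (mean 1) and illustrative choices of sample size and repetition count:

```python
import random

random.seed(0)
n = 30       # sample size (chosen for illustration)
reps = 2000  # number of repeated samples

# Each repetition draws a fresh sample and records its sample mean.
means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]  # Exp(1) population
    means.append(sum(sample) / n)

# The 2000 recorded means approximate the sampling distribution of Xbar:
# centered near the population mean (1), with spread shrinking like 1/sqrt(n).
print(sum(means) / reps)
```

A histogram of `means` would show the shape of the sampling distribution; increasing `n` concentrates it around the population mean.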
Typically we will assume that the population size is much bigger than the
sample size and that the sample observations are drawn from the population
independently of one another under very similar sampling conditions.
As such, the random variables in the sample will be approximately independent
and have very similar distributions.
We say that a collection of rv's $X_1, \ldots, X_n$ forms a random sample if they are
iid. We will for the most part assume that this is the case. In practice, this is of
course often violated. The iid theory is nevertheless relevant, since the iid
model can be used as a fundamental building block for complicated models
of dependence.
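The earlier remark that a large population makes the iid model a good approximation can be checked numerically: when the sample size is a tiny fraction of the population size, drawing without replacement (as real surveys do) behaves almost like iid draws with replacement. A sketch under made-up population parameters:

```python
import random
import statistics

random.seed(1)
# A large synthetic population (values are made up for illustration).
population = [random.gauss(100.0, 15.0) for _ in range(100_000)]
n = 50  # sample size, much smaller than the population size

# Draw without replacement (a simple random sample) ...
srs = random.sample(population, n)
# ... and with replacement (exactly iid draws from the population).
iid = [random.choice(population) for _ in range(n)]

# With n/N tiny, the two sampling schemes are practically indistinguishable:
# both sample means sit near the population mean.
print(statistics.mean(srs), statistics.mean(iid), statistics.mean(population))
```

The approximation degrades as $n$ approaches the population size, since later draws without replacement then depend noticeably on earlier ones.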