Chapter 5  Properties of a Random Sample

5.1 Population and sample (CB 5.1)

Suppose we are interested in the distribution, or certain features, of a collection of data. We call this collection a population. Suppose that for some reason these data are not well documented or easily accessible, so that the distribution or features we are interested in cannot be readily computed. A simple example is household income: if we wish to know the true average U.S. household income for this month, we have a very big task on hand, because we would have to gather information from hundreds of millions of families. A solution is to draw a sample from the population, in other words to select a subset of the population, and use the sample information to make inferences about the truth. How best to do this, and how to handle sampling variability, are among the most important issues in statistics.

Population features that might be of interest include: the shape of the distribution (is it symmetric or skewed, does it have a single peak or multiple peaks, etc.); whether a standard distribution (normal, gamma, Weibull, Poisson, etc.) could serve as a reasonable approximation; and the values of the mean, variance, percentiles, etc. Any number that can be computed from the population is called a parameter. Common parameters of interest are the mean, variance, percentiles, and mode (most probable value).

A statistic is any number calculated from the sample data. Suppose the sample data are X₁, ..., Xₙ. Examples of statistics are the sample mean X̄ = n⁻¹ Σᵢ₌₁ⁿ Xᵢ, the sample variance S² = (n − 1)⁻¹ Σᵢ₌₁ⁿ (Xᵢ − X̄)², sample percentiles, and the sample range (= sample maximum − sample minimum).

Consider the experiment of drawing at random a sample from a population. Before the sample is drawn, we can think of the sample values to be observed as random variables.
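As a quick illustration (not part of the original notes), the statistics defined above can be computed directly with Python's standard library; the data values here are made up:

```python
import statistics

# made-up sample data for illustration
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(data)

# sample mean: X-bar = (1/n) * sum of the X_i
xbar = sum(data) / n

# sample variance: S^2 = (1/(n-1)) * sum of (X_i - X-bar)^2
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)

# sample range: sample maximum minus sample minimum
srange = max(data) - min(data)

# the statistics module computes the same quantities
assert xbar == statistics.mean(data)
assert abs(s2 - statistics.variance(data)) < 1e-12
```

Note that `statistics.variance` uses the (n − 1) divisor, matching the sample-variance definition above, while `statistics.pvariance` divides by n.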
In that sense we can also think of any statistic computed from these values as a random variable and speak of its distribution, called the sampling distribution. After the sample is drawn, we see the value of the statistic, and there is no longer a distribution to speak of.

Typically we will assume that the population size is much bigger than the sample size and that the sample observations are drawn from the population independently of one another under very similar sampling conditions. As such, the random variables in the sample will be approximately independent and have very similar distributions. We say that a collection of rv's X₁, ..., Xₙ forms a random sample if they are iid. We will for the most part assume that this is the case. In practice, this is of course often violated. The iid theory is nevertheless relevant since the iid...
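To make the idea of a sampling distribution concrete, the following sketch (my addition, not from the notes) repeatedly draws iid samples from a normal population and records the sample mean each time; the population parameters and sample size are made-up values chosen for illustration:

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is reproducible

pop_mean, pop_sd = 50.0, 10.0   # hypothetical population parameters
n = 25                          # sample size
num_samples = 2000              # number of repeated samples

# each repetition: draw an iid sample, compute the statistic (here, X-bar)
sample_means = []
for _ in range(num_samples):
    sample = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

# the collection of sample means approximates the sampling distribution:
# its center is near the population mean, and its spread is near
# pop_sd / sqrt(n) = 10 / 5 = 2
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```

Each individual X̄ is a single number once its sample is drawn; it is the collection of values across repetitions that exhibits the sampling distribution.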