Sources of Variation

Kerby Shedden
Department of Statistics, University of Michigan
April 8, 2011
Populations

The population is the set of all units (e.g. people, rats, cars) that are of interest in a research study.

Examples:

1. If we are retrospectively studying voter participation in a past US presidential election, the population is everyone who was eligible to vote in the election.
2. If we are studying factors that were associated with a voter’s decision about which candidate to vote for in a past US election, the population is everyone who actually voted in the election.
3. If we are studying patients’ responses to a new type of leukemia therapy, the population is everyone with leukemia (or everyone with leukemia who is in some sense “eligible” for the treatment).
4. If we are studying energy metabolism of Sprague Dawley rats, the population is the set of all Sprague Dawley rats.

Samples and sampling variation

A sample is a selected subset of a population. A sample will differ from the population, and different samples will generally differ from each other.

The goal of working with a sample is to identify characteristics of the sample that are likely to generalize to the population.

Since a sample only allows us to estimate properties of the population, we must assess our confidence that any generalizations we make are correct.

Variation in our estimates due to the process of sampling is called “sampling variation.”
Sampling variation

For example, suppose the population consists of the three values x = 1, 2, 5, with a population mean of EX = (1 + 2 + 5)/3 = 8/3. We sample two of the values (“without replacement”), and use the sample mean to estimate the population mean. The following table gives the possible results.
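The enumeration in this example is small enough to reproduce by hand or with a short script. The Python sketch below is an illustration only, not part of the original slides. It lists every size-2 sample drawn without replacement from the population {1, 2, 5}, together with its sample mean and its deviation from the population mean 8/3. The variable names are made up for this sketch, and the final check assumes that each of the three samples is equally likely to be drawn.

```python
from fractions import Fraction
from itertools import combinations

# Population from the example: the three values 1, 2, 5.
population = [1, 2, 5]
pop_mean = Fraction(sum(population), len(population))  # exact value 8/3

# Every sample of size 2 drawn without replacement (order ignored).
samples = list(combinations(population, 2))

# Exact sample mean for each possible sample.
sample_means = [Fraction(sum(s), len(s)) for s in samples]

for s, m in zip(samples, sample_means):
    # Deviation of this sample's mean from the population mean.
    print(f"sample={s}  mean={float(m):.3f}  error={float(m - pop_mean):+.3f}")

# Assumption: the three samples are equally likely, so we average
# the sample means with equal weights.
mean_of_means = sum(sample_means) / len(sample_means)
print("average of sample means:", mean_of_means, " population mean:", pop_mean)
```

No single sample mean equals 8/3. The spread of the three sample means around the population mean is the sampling variation in this small example.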

