Stat 411 Lecture Notes
Statistics and sampling distributions*
Ryan Martin
Spring 2012

1 Introduction

Statistics is closely related to probability theory, but the two fields have entirely different goals. Recall, from Stat 401, that a typical probability problem starts with some assumptions about the distribution of a random variable (e.g., that it's binomial), and the objective is to derive some properties (probabilities, expected values, etc.) of said random variable based on the stated assumptions. The statistics problem goes almost completely the other way around. Indeed, in statistics, a sample from a given population is observed, and the goal is to learn something about that population based on the sample. In other words, the goal in statistics is to reason from sample to population, rather than from population to sample as in the case of probability. So while the two things, probability and statistics, are closely related, there is clearly a sharp difference. One could even make the case that a statistics problem is actually more challenging than a probability problem because it requires more than just mathematics to solve. (The point here is that, in a statistics problem, there's simply too much information missing about the population to be able to derive the answer via the deductive reasoning of mathematics.) The goal of Stat 411 is to develop the mathematical theory of statistics, mostly building on multivariate calculus and probability theory at the level of Stat 401.

To understand the goal a bit better, let's start with some notation. Let $X_1, \ldots, X_n$ be a random sample (independent and identically distributed, iid) from a distribution with cumulative distribution function (CDF) $F(x)$. The CDF admits a probability mass function (PMF) $p(x)$ in the discrete case and a probability density function (PDF) $f(x)$ in the continuous case.
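The two directions of reasoning described above can be illustrated with the binomial case. The sketch below is only an illustration under stated assumptions: the function names, the values `n_trials = 10` and `p_true = 0.3`, the seed, and the simple "average success fraction" estimate are all hypothetical choices made here, not something taken from the notes. The first half starts from a fully specified distribution and derives its properties (probability); the second half pretends the success probability is unknown and tries to learn it from an iid sample (statistics).

```python
import math
import random


def binomial_pmf(k: int, n: int, p: float) -> float:
    """PMF of a Binomial(n, p) random variable evaluated at k."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean(n: int, p: float) -> float:
    """Expected value computed directly from the PMF (equals n * p)."""
    return sum(k * binomial_pmf(k, n, p) for k in range(n + 1))


def draw_sample(size: int, n: int, p: float, rng: random.Random) -> list[int]:
    """Draw `size` iid Binomial(n, p) observations by summing Bernoulli trials."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]


def estimate_p(sample: list[int], n: int) -> float:
    """Assumed estimate for illustration: overall fraction of successes."""
    return sum(sample) / (n * len(sample))


# Probability: the distribution is assumed known; derive its properties.
n_trials, p_true = 10, 0.3  # hypothetical values
prob_three = binomial_pmf(3, n_trials, p_true)
mean_value = binomial_mean(n_trials, p_true)
print(f"P(X = 3) = {prob_three:.4f}, E[X] = {mean_value:.4f}")

# Statistics: only the sample is observed; reason back toward the population.
rng = random.Random(411)  # fixed seed so the run is reproducible
sample = draw_sample(200, n_trials, p_true, rng)
p_hat = estimate_p(sample, n_trials)
print(f"estimate of p from {len(sample)} observations: {p_hat:.4f}")
```

The derived quantities in the first half are exact consequences of the assumptions, whereas the estimate in the second half varies from sample to sample and only approximates the value used to generate the data.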
One can imagine that $p(x)$ or $f(x)$ characterizes the population from which $X_1, \ldots, X_n$ is sampled. Typically, there is something about this population that is unknown; otherwise, there's not much point in sampling from it. For example, if the population in question is that of registered voters in Cook County, then one might be interested in the unknown proportion that would vote Democrat in the upcoming election. The goal would be to estimate this proportion from a sample. But the point here is that the population/distribution of interest is not completely known. Mathematically, we handle this by introducing a quantity $\theta$, taking values in some $\Theta \subseteq \mathbb{R}^d$, $d \geq 1$, and weakening the initial assumption by saying that the distribution in question has PMF or PDF of the form $p_\theta(x)$ or f...

* Version: January 11, 2012. Please do not distribute these notes without the author's consent (firstname.lastname@example.org). These notes are meant to supplement in-class lectures. The author makes no guarantees that these notes are free of typos or other, more serious errors.