Stat 411 – Lecture Notes
Statistics and sampling distributions*†‡
Ryan Martin
Spring 2012
1 Introduction
Statistics is closely related to probability theory, but the two fields have entirely different
goals.
Recall, from Stat 401, that a typical probability problem starts with some assumptions
about the distribution of a random variable (e.g., that it’s binomial), and the
objective is to derive some properties (probabilities, expected values, etc.) of said random
variable based on the stated assumptions. The statistics problem goes almost completely
the other way around. Indeed, in statistics, a sample from a given population is observed,
and the goal is to learn something about that population based on the sample. In other
words, the goal in statistics is to reason from sample to population, rather than from
population to sample as in the case of probability. So while the two things—probability
and statistics—are closely related, there is clearly a sharp difference.
One could even make the case that a statistics problem is actually more challenging than a
probability problem because it requires more than just mathematics to solve. (The point here is
that, in a statistics problem, there’s simply too much information missing about the population
to be able to derive the answer via the deductive reasoning of mathematics.) The
goal of Stat 411 is to develop the mathematical theory of statistics, mostly building on
multivariate calculus and probability theory at the level of Stat 401.
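
To make the contrast between the two directions concrete, here is a minimal Python sketch. It is only an illustration: the values n = 10 and p = 0.3, the sample size of 1000, and the names `binomial_pmf`, `true_p`, and `p_hat` are assumptions chosen for this example and do not come from the notes. The first part solves a probability problem (the distribution is fully specified, and properties are derived from it); the second part mimics a statistics problem (only data are observed, and the sample proportion is used as one natural guess at the unknown p).

```python
import math
import random


def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability direction: the distribution is assumed known,
# and we derive properties of the random variable from it.
n, p = 10, 0.3  # illustrative values (assumption, not from the notes)
prob_at_most_3 = sum(binomial_pmf(k, n, p) for k in range(4))  # P(X <= 3)
expected_value = n * p  # E[X] = n * p for a binomial random variable

# Statistics direction: only a sample is observed; p is unknown to the analyst.
rng = random.Random(411)  # fixed seed so the simulation is reproducible
true_p = 0.3  # used only to simulate the data; "hidden" from the statistician
sample = [1 if rng.random() < true_p else 0 for _ in range(1000)]
p_hat = sum(sample) / len(sample)  # sample proportion as an estimate of p

print(f"P(X <= 3) = {prob_at_most_3:.4f}, E[X] = {expected_value:.1f}")
print(f"estimate of p from the sample: {p_hat:.3f}")
```

The first two quantities follow deductively from the assumed model, while `p_hat` changes from sample to sample and only approximates `true_p`.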
To understand the goal a bit better, let’s start with some notation. Let $X_1, \ldots, X_n$
be a random sample (independent and identically distributed, iid) from a distribution
with cumulative distribution function (CDF) $F(x)$. The CDF admits a probability mass
function (PMF) $p(x)$ in the discrete case and a probability density function (PDF) $f(x)$ in
the continuous case. One can imagine that $p(x)$ or $f(x)$ characterizes the population from
which $X_1, \ldots, X_n$ is sampled. Typically, there is something about this population
that is unknown; otherwise, there’s not much point in sampling from it. For example,
if the population in question consists of registered voters in Cook County, then one might be
interested in the unknown proportion that would vote Democrat in the upcoming election.
The goal would be to “estimate” this proportion from a sample. But the point here is
* Version: January 11, 2012
† Please do not distribute these notes without the author’s consent ([email protected]).
‡ These notes are meant to supplement in-class lectures. The author makes no guarantees that these
notes are free of typos or other, more serious errors.
that the population/distribution of interest is not completely known. Mathematically,
we handle this by introducing a quantity $\theta$, taking values in some
$\Theta \subseteq \mathbb{R}^d$, $d \geq 1$, and weakening the initial assumption by saying
that the distribution in question has PMF or PDF of the form $p_\theta(x)$ or $f_\theta(x)$
for some $\theta \in \Theta$. That is, the statistician believes that