CS 702
Discrete Mathematics and Probability Theory
Spring 2009
Alistair Sinclair, David Tse
Lecture 19
I.I.D. Random Variables
Estimating the bias of a coin
Question: We want to estimate the proportion $p$ of Democrats in the US population, by taking a small random sample. How large does our sample have to be to guarantee that our estimate will be within (say) 10% (in relative terms) of the true value with probability at least 0.95?
This is perhaps the most basic statistical estimation problem, and shows up everywhere. We will develop
a simple solution that uses only Chebyshev’s inequality. More refined methods can be used to get sharper
results.
Let’s denote the size of our sample by $n$ (to be determined), and the number of Democrats in it by the random variable $S_n$. (The subscript $n$ just reminds us that the r.v. depends on the size of the sample.) Then our estimate will be the value
$$A_n = \frac{1}{n} S_n.$$
Now, as has often been the case, we will find it helpful to write $S_n = X_1 + X_2 + \cdots + X_n$, where
$$X_i = \begin{cases} 1 & \text{if person } i \text{ in the sample is a Democrat;} \\ 0 & \text{otherwise.} \end{cases}$$
Note that each $X_i$ can be viewed as a coin toss, with Heads probability $p$ (though of course we do not know the value of $p$!). And the coin tosses are independent.[1]
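As a minimal sketch of this setup, the following Python snippet draws a sample with replacement from a made-up population and forms the indicators $X_i$, their sum $S_n$, and the estimate $A_n$. The function name `sample_estimate`, the encoding of the population as a list of booleans, the population size, and the seed are all assumptions chosen for illustration; none of them come from the notes.

```python
import random

def sample_estimate(population, n, rng):
    """Draw n people with replacement; return (indicators, S_n, A_n)."""
    # X_i = 1 if the i-th sampled person is a Democrat, else 0.
    indicators = [1 if rng.choice(population) else 0 for _ in range(n)]
    s_n = sum(indicators)   # S_n = X_1 + X_2 + ... + X_n
    a_n = s_n / n           # A_n = S_n / n
    return indicators, s_n, a_n

# Hypothetical population: 1000 people, 400 of them Democrats (p = 0.4).
population = [True] * 400 + [False] * 600
rng = random.Random(0)      # fixed seed so the run is repeatable
xs, s_n, a_n = sample_estimate(population, 50, rng)
print(s_n, a_n)
```

Because `rng.choice` picks from the whole population every time, the same person can be picked twice, which is what makes the tosses independent.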
What is the expectation of our estimate?
$$E(A_n) = E\left(\frac{1}{n} S_n\right) = \frac{1}{n} E(X_1 + X_2 + \cdots + X_n) = \frac{1}{n} \times (np) = p.$$
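The calculation above can be checked numerically. The sketch below assumes the standard fact (not stated in this section) that a sum of $n$ independent coin tosses with Heads probability $p$ has the binomial distribution, and it computes $E(A_n)$ exactly with rational arithmetic. The function name `exact_mean_of_estimate` and the test value $p = 2/5$ are illustrative choices.

```python
from fractions import Fraction
from math import comb

def exact_mean_of_estimate(n, p):
    """Return E(A_n) exactly, where S_n ~ Binomial(n, p) and A_n = S_n / n."""
    p = Fraction(p)
    total = Fraction(0)
    for k in range(n + 1):
        prob_k = comb(n, k) * p**k * (1 - p)**(n - k)   # Pr[S_n = k]
        total += Fraction(k, n) * prob_k                # value k/n times its probability
    return total

# The expectation should equal p for every sample size n.
for n in (1, 5, 20):
    print(n, exact_mean_of_estimate(n, Fraction(2, 5)))
```

Using `Fraction` avoids floating-point rounding, so the comparison with $p$ is exact rather than approximate.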
So for any value of $n$, our estimate will always have the correct expectation $p$. [Such a r.v. is often called an unbiased estimator of $p$.] Now presumably, as we increase our sample size $n$, our estimate should get more and more accurate. This will show up in the fact that the variance decreases with $n$: i.e., as $n$ increases, the probability that we are far from the mean $p$ will get smaller.
To see this, we need to compute $\mathrm{Var}(A_n)$. But $A_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, which is just a constant times a sum of independent random variables. So we can compute $\mathrm{Var}(A_n)$ using the technology we established in the last lecture note:
$$\mathrm{Var}(A_n) = \mathrm{Var}\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^2 \mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^2 \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{\sigma^2}{n},$$
where we have written $\sigma^2$ for the variance of each of the $X_i$. So we see that the variance of $A_n$ decreases in inverse proportion to $n$. This fact ensures that, as we take larger and larger sample sizes $n$, the probability that we deviate much from the expectation $p$ gets smaller and smaller.
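To make the $\sigma^2/n$ behaviour concrete, here is a small sketch that computes $\mathrm{Var}(A_n)$ exactly from the distribution of $S_n$ and compares it with $\sigma^2/n$. It relies on two standard facts that are assumptions relative to this section: $S_n$ is binomial with parameters $n$ and $p$, and a single coin toss with Heads probability $p$ has variance $\sigma^2 = p(1-p)$. The function name and the value $p = 2/5$ are illustrative.

```python
from fractions import Fraction
from math import comb

def exact_variance_of_estimate(n, p):
    """Return Var(A_n) exactly, where S_n ~ Binomial(n, p) and A_n = S_n / n."""
    p = Fraction(p)
    # pmf[k] = Pr[S_n = k]
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(Fraction(k, n) * pmf[k] for k in range(n + 1))
    second_moment = sum(Fraction(k, n) ** 2 * pmf[k] for k in range(n + 1))
    return second_moment - mean**2      # Var(A) = E(A^2) - E(A)^2

p = Fraction(2, 5)
sigma_sq = p * (1 - p)                  # variance of one toss (assumed Bernoulli fact)
for n in (1, 10, 100):
    var = exact_variance_of_estimate(n, p)
    print(n, var, var == sigma_sq / n)  # last column checks Var(A_n) = sigma^2 / n
```

Multiplying the sample size by 10 divides the variance by 10, which is the sense in which larger samples give tighter estimates.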
Let’s now use Chebyshev’s inequality to figure out how large $n$ has to be to ensure a specified accuracy in our estimate of the proportion of Democrats $p$. A natural way to measure this is for us to specify two
[1] We are assuming here that the sampling is done “with replacement”; i.e., we select each person in the sample from the entire population, including those we have already picked. So there is a small chance that we will pick the same person twice.