Statistical Inference for FE
Professor S. Kou, Department of IEOR, Columbia University
Lecture 1. Basic Estimation Principles
1 Introduction to Statistics
What is Statistics? Statistics is the study of collecting, analyzing, and
interpreting quantitative data in such a way that the reliability of the
conclusions can be evaluated in a scientific way. Three major problems in
statistics: (1) How to design experiments and collect data that address the
questions of interest effectively and efficiently (Experimental Design and
Sampling Design). (2) How to describe the major features and detect patterns
in the data (Exploratory Data Analysis). (3) How to account for sampling
variability and bias and draw reliable conclusions from the data (Inference).
To begin the discussion, we shall talk about the difference between sample
and population. Samples are observations from a given population. For
example, suppose we have observations $X_1, \ldots, X_n$ from the same
population; then the sample mean is
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$,
while the population mean is $E[X] = \mu$, which is typically an unknown
parameter.
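The distinction between the sample mean and the population mean can be illustrated with a short simulation. This is only a sketch: the normal population and the values of `MU` and `SIGMA` are assumptions chosen for illustration, and `MU` is known here only so that the estimate can be compared with it.

```python
import random

def sample_mean(xs):
    """Return the sample mean (1/n) * sum of X_i."""
    return sum(xs) / len(xs)

# Hypothetical population: Normal with mean MU and standard deviation SIGMA.
# In practice MU is unknown; it is fixed here only to check the estimate.
MU, SIGMA = 5.0, 2.0

rng = random.Random(0)  # seeded so the run is reproducible
sample = [rng.gauss(MU, SIGMA) for _ in range(1000)]

x_bar = sample_mean(sample)
print(f"population mean mu = {MU}, sample mean = {x_bar:.3f}")
```

The sample mean is computed from the data alone, while `MU` belongs to the population and would normally have to be estimated.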
To estimate unknown population parameters, we use various statistics. A
statistic is a function computed from the data in a sample. In particular, a
(point) estimator $\hat{\theta}$ is a statistic computed from a sample that
gives a single value for the unknown population parameter $\theta$. Note
that a statistic is a random variable while an unknown parameter is a
constant.
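To see why an estimator is a random variable while the parameter is a constant, one can draw many samples from the same population and recompute the estimator each time. The sketch below assumes a hypothetical normal population (`MU`, `SIGMA`) and sample size `N`; these values are illustrative and not part of the notes.

```python
import random
import statistics

MU, SIGMA, N = 5.0, 2.0, 50  # hypothetical population and sample size

def draw_estimate(rng, n=N):
    """Draw one sample of size n and return its sample mean, an estimate of MU."""
    return statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])

rng = random.Random(1)  # seeded so the run is reproducible
estimates = [draw_estimate(rng) for _ in range(2000)]

# MU never changes, but the estimate differs from sample to sample.
print("first five estimates:", [round(e, 3) for e in estimates[:5]])
print("mean of estimates:   ", round(statistics.fmean(estimates), 3))
print("std of estimates:    ", round(statistics.stdev(estimates), 3))
print("sigma / sqrt(N):     ", round(SIGMA / N ** 0.5, 3))
```

The spread of the estimates should be close to `SIGMA / sqrt(N)`, the standard error of the sample mean for independent observations.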
To do hypothesis testing, we use test statistics.
A test statistic is a
statistic computed from a sample that is used to conduct hypothesis testing.
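As one concrete instance of a test statistic, the sketch below computes the one-sample t statistic $t = (\bar{X} - \mu_0) / (s / \sqrt{n})$ for the hypothesis that the population mean equals $\mu_0$. The choice of the t statistic, the function name `t_statistic`, and the data values are assumptions made for illustration; the notes at this point do not single out a particular test.

```python
import math
import statistics

def t_statistic(xs, mu0):
    """One-sample t statistic for H0: the population mean equals mu0."""
    n = len(xs)
    x_bar = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return (x_bar - mu0) / (s / math.sqrt(n))

data = [4.8, 5.6, 5.1, 4.9, 5.4, 5.2]  # made-up observations
print(round(t_statistic(data, mu0=5.0), 3))
```

Like any statistic, the value depends only on the sample and the hypothesized `mu0`, not on the unknown parameter itself.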
1.1 Random Sampling and Selection Bias
A central principle in statistics is that we prefer to have random samples
from the population, rather than self-selected samples. However, this may
not be feasible in social science and observational studies. Self-selection
bias has been encountered in various subjects, e.g. astronomy, medicine,
finance, economics, sociology, public opinion polls, etc.
To give a simple illustration: during the 1936 presidential election, the
magazine “Literary Digest” conducted a public opinion poll to forecast the
election. Out of 10 million people sampled, 2.3 million responded, among
whom 57% indicated that they favored the Republican candidate Landon and 43%
the Democratic candidate Roosevelt. Of course, the result was completely
wrong: Roosevelt won the election. The reason is that although 2.3 million
people responded, they were drawn from telephone directories. In 1936 mostly
wealthier people could afford telephones, and wealthier people tended to
vote Republican.
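The effect described above can be reproduced in a small simulation: a large sample drawn from a biased frame can be far less accurate than a small simple random sample. All numbers below (population size, share of phone owners, and candidate preferences within each group) are made up for illustration and are not historical data.

```python
import random

rng = random.Random(1936)  # seeded so the run is reproducible

# Made-up population (NOT historical data): each voter is a pair
# (has_phone, favors_landon). Phone owners lean toward Landon.
P_PHONE = 0.25
P_LANDON_GIVEN_PHONE = 0.60
P_LANDON_GIVEN_NO_PHONE = 0.30

def make_voter(rng):
    """Return one hypothetical voter as (has_phone, favors_landon)."""
    has_phone = rng.random() < P_PHONE
    p = P_LANDON_GIVEN_PHONE if has_phone else P_LANDON_GIVEN_NO_PHONE
    return has_phone, rng.random() < p

population = [make_voter(rng) for _ in range(200_000)]

def landon_share(voters):
    """Fraction of the given voters who favor Landon."""
    return sum(favors for _, favors in voters) / len(voters)

true_share = landon_share(population)

# Large but biased sample: drawn only from voters who own a phone.
phone_owners = [v for v in population if v[0]]
biased_sample = rng.sample(phone_owners, 20_000)

# Small simple random sample from the whole population.
random_sample = rng.sample(population, 1_000)

print(f"true Landon share:         {true_share:.3f}")
print(f"biased sample (n=20000):   {landon_share(biased_sample):.3f}")
print(f"random sample (n=1000):    {landon_share(random_sample):.3f}")
```

Increasing the size of the biased sample only makes the estimate more precise around the wrong value; the bias comes from the sampling frame, not from the sample size.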
Biased samples have also appeared more recently. For example, in 1983 a
national television news program invited its viewers to participate in a
“phone-in” on the issue of whether the U.N. should continue to be based in
the U.S. The phone-in result was: yes 33% and no 67%, with a sample size of
180,000. However, a more scientific survey several days later, based on a
random sample of only about 1,000 people, revealed that about 78% of people
in the U.S.
 Spring '10
 kou
