Statistical Inference for FE
Professor S. Kou, Department of IEOR, Columbia University
Lecture 1. Basic Estimation Principles
1 Introduction to Statistics
What is Statistics? Statistics is the study of collecting, analyzing, and
interpreting quantitative data in such a way that the reliability of the
conclusions can be evaluated in a scientific way. Three major problems in
statistics: (1) How to design experiments and collect data that better and
more efficiently address the questions of interest (Experimental Design and
Sampling Design). (2) How to describe the major features and detect patterns
in the data (Exploratory Data Analysis). (3) How to account for sampling
variability and bias and draw reliable conclusions from the data (Inference).
To begin the discussion, we shall talk about the difference between sample
and population. Samples are observations from a given population. For
example, suppose we have observations $X_1, \ldots, X_n$ from the same
population; then the sample mean is $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$,
while the population mean is $\mu = E[X_1]$, which is typically an unknown
parameter.
To estimate unknown population parameters, we use various statistics. A
statistic is a function computed from the data in a sample. In particular, a
(point) estimator $\hat{\theta}$ is a statistic computed from a sample that
gives a single value for the unknown population parameter $\theta$. A
statistic is a random variable, while an unknown parameter is a constant.
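The distinction between a random statistic and a constant parameter can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with mean `MU = 5.0` and standard deviation `SIGMA = 2.0` (values chosen arbitrarily, not from the lecture), and the helper names `sample_mean` and `draw_sample` are hypothetical. It draws many samples of size 50 and computes the sample mean of each.

```python
import random
import statistics

def sample_mean(xs):
    """Sample mean: (1/n) * sum of the observations."""
    return sum(xs) / len(xs)

def draw_sample(n, mu, sigma, rng):
    """Draw n i.i.d. observations from an assumed Normal(mu, sigma) population."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Population parameters: fixed constants (assumed here; unknown in practice).
MU, SIGMA = 5.0, 2.0
rng = random.Random(0)  # seeded so the run is reproducible

# Each new sample yields a different value of the estimator (the sample mean).
estimates = [sample_mean(draw_sample(50, MU, SIGMA, rng)) for _ in range(1000)]

print("first three estimates:", [round(e, 3) for e in estimates[:3]])
print("average of the estimates:", round(statistics.mean(estimates), 3))
print("std. dev. of the estimates:", round(statistics.stdev(estimates), 3))
```

The estimates differ from sample to sample, while `MU` never changes. Their spread should be roughly `SIGMA / sqrt(50)`, which is about 0.28 under these assumptions.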
To do hypothesis testing, we use test statistics. A test statistic is a
statistic computed from a sample that is used to conduct hypothesis testing.
1.1 Random Sampling and Selection Bias
A central principle in statistics is that we prefer to have random samples
from the population, rather than self-selected samples from the population.
However, this may not be feasible in the social sciences or in observational
studies. Self-selection bias has been encountered in various subjects, e.g.
astronomy, finance, economics, sociology, public opinion polls, etc.
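A small simulation can make the effect of a non-random sample concrete. The following is a minimal sketch on a purely hypothetical population: the 25% listing rate and the 65% / 35% preference rates are invented for illustration and are not data from any real poll. It compares a small simple random sample with a much larger sample drawn only from a "directory" subgroup.

```python
import random

rng = random.Random(1)  # seeded so the run is reproducible

# Hypothetical population of 100,000 voters; every number here is made up.
# Each voter is a pair (listed_in_directory, favors_candidate_a).
population = []
for _ in range(100_000):
    listed = rng.random() < 0.25           # 25% appear in the directory
    p_favor_a = 0.65 if listed else 0.35   # preference depends on being listed
    population.append((listed, rng.random() < p_favor_a))

def share_for_a(voters):
    """Fraction of the given voters who favor candidate A."""
    return sum(favors_a for _, favors_a in voters) / len(voters)

true_share = share_for_a(population)

# Simple random sample: every voter is equally likely to be chosen.
random_estimate = share_for_a(rng.sample(population, 1_000))

# Directory sample: ten times larger, but drawn only from listed voters.
listed_voters = [v for v in population if v[0]]
directory_estimate = share_for_a(rng.sample(listed_voters, 10_000))

print("true share for A:        ", round(true_share, 3))
print("random sample (n=1,000): ", round(random_estimate, 3))
print("directory sample (n=10k):", round(directory_estimate, 3))
```

Under these assumptions the small random sample lands near the true share, while the larger directory sample stays far from it. A larger sample size reduces sampling variability but does not remove the bias from how the sample was selected.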
As a simple illustration, during the 1936 presidential election, the magazine
“Literary Digest” conducted a public opinion poll to forecast the election.
Out of 10 million people sampled, 2.3 million people responded, among which
57% indicated that they favored the Republican candidate Landon and 43%
indicated the Democratic candidate Roosevelt. Of course, the result was
completely wrong. The reason is that, although 2.3 million people responded,
the people polled were drawn from telephone directories. In 1936 only rich
people could afford telephones, and rich people tend to vote for