THE THEORY OF POINT ESTIMATION
A point estimator uses the information available in a sample to obtain a single
number that estimates a population parameter. There can be a variety of estimators
of the same parameter that derive from different principles of estimation, and it
is necessary to choose amongst them. Therefore, criteria are required that will
indicate which are the acceptable estimators and which of these is the best in given
circumstances.
Often, the choice of an estimate is governed by practical considerations such as
the ease of computation or the ready availability of a computer program. In what
follows, we shall ignore such considerations in order to concentrate on the criteria
that, ideally, we would wish to adopt. The circumstances that affect the choice are
more complicated than one might imagine at first.
Consider an estimator $\hat{\theta}_n$ based on a sample of size $n$ that purports to estimate a population parameter $\theta$, and let $\tilde{\theta}_n$ be any other estimator based on the same sample. If
$$P(\theta - c_1 \leq \hat{\theta}_n \leq \theta + c_2) \geq P(\theta - c_1 \leq \tilde{\theta}_n \leq \theta + c_2)$$
for all values of $c_1, c_2 > 0$, then $\hat{\theta}_n$ would be unequivocally the best estimator.
However, it is possible to show that an estimator has such a property only in very
restricted circumstances.
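The force of this criterion can be gauged by simulation. The following is a minimal sketch in Python, which assumes normally distributed data and which adopts, purely for illustration, the sample median as the rival estimator $\tilde{\theta}_n$ of the mean; the values of $c_1$ and $c_2$ are likewise arbitrary:

```python
import numpy as np

# Monte Carlo comparison of the concentration probabilities
# P(theta - c1 <= estimator <= theta + c2) for two estimators of the
# mean of N(mu, sigma^2) data: the sample mean and, as an illustrative
# rival, the sample median. All parameter values are arbitrary.
rng = np.random.default_rng(0)
mu, sigma, n, trials = 5.0, 2.0, 25, 100_000
c1, c2 = 0.5, 0.5

samples = rng.normal(mu, sigma, size=(trials, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

def concentration(est):
    """Estimated probability that est lies within [mu - c1, mu + c2]."""
    return np.mean((mu - c1 <= est) & (est <= mu + c2))

print("P for the sample mean:  ", concentration(means))
print("P for the sample median:", concentration(medians))
```

For normal data, the sample mean attains the higher concentration probability; but such a uniform ordering of estimators is rarely available.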
Instead, we must evaluate an estimator against a list of partial criteria, of
which we shall itemise the principal ones. The first is the criterion of unbiasedness.
(a) $\hat{\theta}$ is an unbiased estimator of the population parameter $\theta$ if $E(\hat{\theta}) = \theta$.
On its own, this is an insufficient criterion. Thus, for example, we observe that both
a single element $x_i$ from a random sample of size $n$ and the average $\bar{x} = \sum x_i/n$
of all the sample elements constitute unbiased estimators of the population mean
$\mu = E(x_i)$. However, the sample average is always the preferred estimator.
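A small simulation, sketched below under the assumption of normally distributed data with arbitrarily chosen parameters, confirms that both estimators are centred on $\mu$:

```python
import numpy as np

# Check by simulation that both a single sample element x_1 and the
# sample average x-bar are unbiased estimators of mu.
# The normal distribution and the parameter values are illustrative.
rng = np.random.default_rng(1)
mu, sigma, n, trials = 10.0, 3.0, 50, 200_000

samples = rng.normal(mu, sigma, size=(trials, n))
single = samples[:, 0]          # a single element from each sample
average = samples.mean(axis=1)  # the average of each sample

print("mean of x_1 over the trials:  ", single.mean())   # close to mu = 10
print("mean of x-bar over the trials:", average.mean())  # close to mu = 10
```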
This suggests that we must also consider the dispersions of the estimators,
which, in the cases of the examples above, are $V(x_i) = \sigma^2$, which is the population
variance, and $V(\bar{x}) = \sigma^2/n$, which is some fraction of that variance.
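The second result depends upon the mutual independence of the sample elements, whereby the variance of their sum is the sum of their variances:
$$V(\bar{x}) = V\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$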
We observe that $V(\bar{x}) \to 0$ as $n \to \infty$. The collapse of its variance, together
with the fact that it is an unbiased estimate of $\mu$, ensures that $\bar{x}$ converges on $\mu$ as
the sample size increases, which is to say that it is a consistent estimator.
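The collapse of the variance can be witnessed in a brief simulation, again with an illustrative normal distribution:

```python
import numpy as np

# Illustrate consistency: the sampling variance of x-bar falls as 1/n.
# The normal distribution and the parameter values are illustrative.
rng = np.random.default_rng(2)
mu, sigma, trials = 0.0, 1.0, 2_000

for n in (10, 100, 1_000, 10_000):
    xbar = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    print(f"n = {n:6d}:  V(x-bar) = {xbar.var():.6f},  sigma^2/n = {sigma**2 / n:.6f}")
```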
Some quite reasonable estimators do not have finite expected values or finite
variances, in which case the criteria that make reference to these moments become
irrelevant.
For example, the obvious estimate of $1/\mu$, based on a random sample
$x_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$, is $1/\bar{x}$. However, for any finite value of $\mu$, there is a finite
probability that $\bar{x}$ will fall in the arbitrarily small interval $[-\epsilon, \epsilon]$ that contains
zero. Division by zero generates infinity; and, for this reason, the integral that
would, otherwise, define the expectation of $1/\bar{x}$ does not converge. For a while,
such pathologies will be ignored, and we shall persist in itemising the criteria.
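Before the matter is set aside, the nonexistence of the expectation can be witnessed in simulation: the Monte Carlo average of $1/\bar{x}$ refuses to settle down as the number of trials increases. The following sketch chooses $\mu$ close to zero, so that $\bar{x}$ falls near zero reasonably often; the parameter values are, once more, illustrative:

```python
import numpy as np

# With mu near zero, x-bar occasionally falls close to zero, so 1/x-bar
# takes enormous values and its running average never stabilises,
# reflecting the fact that E(1/x-bar) does not exist.
rng = np.random.default_rng(3)
mu, sigma, n = 0.1, 1.0, 5

for trials in (10**3, 10**4, 10**5, 10**6):
    xbar = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    print(f"{trials:8d} trials: average of 1/x-bar = {np.mean(1.0 / xbar):14.3f}")
```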