Best Linear Estimators
ARE 210
BLE, BLUE and BLMSE
1. How do we estimate the unknown parameters of a probability distribution?
2. What kind of inferences can we make based on those parameter estimates?
3. Under what conditions is our rule for estimating these unknown parameters optimal in some reasonable sense?
4. When can we do better, and when not?
In these notes, I develop three kinds of estimators that are linear in the observations (data), all of which can be thought of in terms of optimization theory.
The BLE (the Best Linear Estimator) chooses a linear combination of the observations from a random sample to minimize the variance without constraint. The result is weight zero on each and every data point. While this estimator does in fact attain the global unrestricted minimum of the variance for an estimator (its variance is always zero!), it may be biased. Indeed, it will be biased with probability one for all sample sizes and probability distributions.
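As a minimal numerical sketch of this point (the normal distribution, the parameter values, and the sample size here are illustrative choices, not part of the notes), setting every weight to zero drives the variance of the estimator to its global minimum of zero while leaving a bias of $-\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)                     # fixed seed, illustrative only
mu, sigma, n = 5.0, 2.0, 25                        # hypothetical population parameters
samples = rng.normal(mu, sigma, size=(10_000, n))  # 10,000 random samples of size n

w = np.zeros(n)               # the BLE puts weight zero on every observation
estimates = samples @ w       # so every estimate is identically zero

print(estimates.var())        # 0.0: the unrestricted minimum of the variance
print(estimates.mean() - mu)  # -5.0: the bias equals -mu whenever mu != 0
```

The zero-variance estimator ignores the data entirely, which is exactly why a constraint (unbiasedness) or a penalty (the squared bias) is needed to get a useful answer.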
The BLUE (Best Linear Unbiased Estimator) principle minimizes the variance of the chosen linear combination of the data subject to the constraint that the estimator must be unbiased.
The BLMSE (Best Linear Mean Squared Error Estimator) principle weights the square of the bias equally with the variance in the objective function and attains the unrestricted global minimum of the sum (bias² + variance).
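The BLMSE objective is the standard decomposition of mean squared error, obtained by adding and subtracting $E(\hat\mu)$ inside the square:

```latex
\mathrm{MSE}(\hat\mu)
  = E\big[(\hat\mu - \mu)^2\big]
  = E\big\{[\hat\mu - E(\hat\mu)]^2\big\} + [E(\hat\mu) - \mu]^2
  = \operatorname{Var}(\hat\mu) + \mathrm{bias}(\hat\mu)^2 .
```

The cross term vanishes because $E[\hat\mu - E(\hat\mu)] = 0$, so minimizing MSE is exactly minimizing variance plus squared bias with equal weights.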
We begin by supposing that we have a random sample of i.i.d. random variables, $y_1, y_2, \ldots, y_n$. Let the population mean for the underlying probability distribution for the $y$'s be $\mu$ and let the population variance be $\sigma^2$. Both of these are unknown.
For now, we will not make any further assumptions about the distribution. We do not assume that we know the functional form of the pdf (such as normal). We will, however, restrict our attention to linear combinations of the data, say $\hat\mu = \sum_{i=1}^n w_i y_i$, where the "weights" $w_i$ are choice variables, to make calculating expectations simpler and to pose the estimation problem better.
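As a quick concrete sketch (the exponential distribution and sample size are illustrative choices, not from the notes), the familiar sample mean is just the special case of this linear estimator with equal weights $w_i = 1/n$:

```python
import numpy as np

rng = np.random.default_rng(0)              # seed chosen arbitrarily
n = 50
y = rng.exponential(scale=3.0, size=n)      # any distribution with finite variance will do

w = np.full(n, 1.0 / n)   # equal weights, which sum to one
mu_hat = w @ y            # the linear estimator  sum_i w_i * y_i

# Equal weights reproduce the familiar sample mean.
print(np.isclose(mu_hat, y.mean()))  # True
```

Different choices of the weight vector $w$ give the BLE, BLUE, and BLMSE developed below.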
Writing the mean of $\hat\mu$ as

$$E(\hat\mu) = E\Big(\sum_{i=1}^n w_i y_i\Big) = \sum_{i=1}^n w_i E(y_i) = \mu \sum_{i=1}^n w_i , \qquad (1)$$
the variance of $\hat\mu$ is equal to

$$\sigma^2_{\hat\mu} = E\big\{[\hat\mu - E(\hat\mu)]^2\big\} = E\Big\{\Big(\sum_{i=1}^n w_i y_i - \mu \sum_{i=1}^n w_i\Big)^2\Big\} = E\Big\{\Big[\sum_{i=1}^n w_i (y_i - \mu)\Big]^2\Big\} . \qquad (2)$$
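Expanding the square in equation (2) and using the i.i.d. assumption (a step the notes take implicitly), the cross terms vanish and the variance reduces to a simple sum of squared weights:

```latex
\sigma^2_{\hat\mu}
  = \sum_{i=1}^n \sum_{j=1}^n w_i w_j\, E\big[(y_i - \mu)(y_j - \mu)\big]
  = \sigma^2 \sum_{i=1}^n w_i^2 ,
```

since $E[(y_i - \mu)(y_j - \mu)] = \sigma^2$ when $i = j$ and $0$ when $i \ne j$ under independence.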
We seek to choose the weights $w_i$, for $i = 1, \ldots, n$, to minimize this function. Using the composite function theorem, the necessary first-order conditions are

$$\frac{\partial \sigma^2_{\hat\mu}}{\partial w_i} = 2 E\Big[(y_i - \mu) \sum_{j=1}^n w_j (y_j - \mu)\Big] = 0 \quad \forall\, i = 1, \ldots, n . \qquad (3)$$
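Under i.i.d. sampling, the expectation in (3) collapses because only the $j = i$ term survives, which confirms the zero-weight solution described at the outset:

```latex
\frac{\partial \sigma^2_{\hat\mu}}{\partial w_i}
  = 2 \sum_{j=1}^n w_j\, E\big[(y_i - \mu)(y_j - \mu)\big]
  = 2 \sigma^2 w_i = 0
  \quad \Longrightarrow \quad w_i = 0, \quad i = 1, \ldots, n .
```

The unconstrained minimum therefore puts zero weight on every observation, which is why the unbiasedness constraint of the BLUE is needed to rule out this degenerate answer.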
Fall '07, LAFRANCE