March 12, 2003
2.6 Bayes estimation
The definition of Bayes estimator is a special case of the general definition of Bayes decision rule given in Sec. 1.3. Given a family {P_θ, θ ∈ Θ} of laws, where (Θ, T) is a measurable space, a loss function L(θ, y), the risk for an estimator U at θ defined by r(θ, U) := E_θ L(θ, U), and a prior π defined on (Θ, T), an estimator T is Bayes for π iff the Bayes risk

r(π, U) := ∫ r(θ, U) dπ(θ)

has its minimum over all statistics U when U = T.
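As a small illustration of this definition, here is a hypothetical toy model (not from the text): Θ = {0.3, 0.7} with a uniform prior π, a single Bernoulli(θ) observation x, and squared-error loss L(θ, y) = (θ − y)^2. The Bayes risk of the posterior-mean estimator can then be compared directly with that of the naive estimator U(x) = x:

```python
# Toy model (hypothetical): theta in {0.3, 0.7}, uniform prior,
# one Bernoulli(theta) observation x, squared-error loss.
thetas = [0.3, 0.7]
prior = {0.3: 0.5, 0.7: 0.5}

def p(theta, x):          # P_theta(X = x) for a single Bernoulli trial
    return theta if x == 1 else 1 - theta

def risk(theta, U):       # r(theta, U) = E_theta (U(X) - theta)^2
    return sum(p(theta, x) * (U(x) - theta) ** 2 for x in (0, 1))

def bayes_risk(U):        # r(pi, U) = integral of r(theta, U) d pi(theta)
    return sum(prior[t] * risk(t, U) for t in thetas)

def posterior_mean(x):    # T(x) = integral of theta d pi_x(theta)
    w = {t: p(t, x) * prior[t] for t in thetas}
    z = sum(w.values())
    return sum(t * w[t] / z for t in thetas)

print(round(bayes_risk(posterior_mean), 4))   # 0.0336
print(round(bayes_risk(lambda x: x), 4))      # 0.21
```

Here T(0) = 0.42 and T(1) = 0.58, and its Bayes risk 0.0336 is well below the 0.21 achieved by U(x) = x, as the theorem below predicts.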
Recall that by Theorem 1.3.8, if a decision problem for a measurable family and a given prior has a decision rule with finite risk, and some decision rule a(·) minimizes the posterior risk for almost all x, then it is Bayes. Recall also that if a family {P_θ, θ ∈ Θ} is dominated by a σ-finite measure v, we can choose v equivalent to the family by Lemma 2.1.6. For squared-error loss, Bayes estimates are just expectations for the posterior:
2.6.1 Theorem. Let {P_θ, θ ∈ Θ} be a measurable family equivalent to a σ-finite measure v. Let π be a prior on Θ and g a measurable function from Θ into some R^d. Then for squared-error loss, there exists a Bayes estimator for g(θ) if and only if there exists an estimator U for g(θ) with finite risk,

r(π, U) = ∫∫ ‖U(x) − g(θ)‖^2 dP_θ(x) dπ(θ) < ∞.

Then a Bayes estimator is given by T(x) := ∫ g(θ) dπ_x(θ), where the integral with respect to the posterior π_x exists and is finite for v-almost all x. T is the unique Bayes estimator up to equality v-almost everywhere. Thus T is an admissible estimator of g.
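To make the theorem concrete, here is a sketch of the standard Beta-Bernoulli conjugate pair (the specific numbers a = 2, b = 3, n = 10, s = 7 are illustrative assumptions, not from the text). With π = Beta(a, b) and x recording s successes in n Bernoulli(θ) trials, the posterior π_x is Beta(a + s, b + n − s), so the Bayes estimator T(x) = ∫ θ dπ_x(θ) has the closed form (a + s)/(a + b + n); a Riemann sum confirms this numerically:

```python
# Hypothetical conjugate example: Beta(a, b) prior on theta, s successes
# in n Bernoulli(theta) trials; posterior is Beta(a + s, b + n - s).
a, b, n, s = 2.0, 3.0, 10, 7

closed_form = (a + s) / (a + b + n)   # posterior mean T(x), in closed form

# Numerical check: midpoint Riemann sum for the posterior mean, using the
# un-normalized posterior density theta^(a-1+s) * (1-theta)^(b-1+n-s).
def density(theta):
    return theta ** (a - 1 + s) * (1 - theta) ** (b - 1 + n - s)

grid = [(k + 0.5) / 100_000 for k in range(100_000)]
z = sum(density(t) for t in grid)
numeric = sum(t * density(t) for t in grid) / z

print(closed_form)                        # 0.6
print(abs(numeric - closed_form) < 1e-6)  # True
```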
Proof. Since ‖·‖^2 is the sum of squares of coordinates, we can assume d = 1. By Propositions 1.3.5 and 1.3.13, the posterior distributions π_x have the properties of regular conditional probabilities of θ given x as defined in RAP, Section 10.2. "Only if" holds since, by definition, a Bayes estimator has finite risk. To prove "if," let U have finite risk, r(π, U) < ∞. Let dQ(θ, x) := dP_θ(x) dπ(θ) be the usual joint distribution of θ and x. Then the function (θ, x) → U(x) − g(θ) is in L^2(Q), even though possibly neither x → U(x) nor θ → g(θ) is. Thus U(x) − g(θ) ∈ L^1(Q), and we have the conditional expectation (by RAP, Theorem 10.2.5)

E(U(x) − g(θ) | x) = ∫ (U(x) − g(θ)) dπ_x(θ) = U(x) − ∫ g(θ) dπ_x(θ)

for v-almost all x, since U(x) doesn't depend on θ. Thus T(x) is well-defined for v-almost all x. Now x → U(x) − T(x) is the orthogonal projection in L^2(Q) of U(x) − g(θ) into the space H of square-integrable functions of x for Q (RAP, Theorem 10.2.9), which is unique up to a.s. equality (RAP, Theorem 5.3.8). Thus

∫ (U(x) − g(θ) − f(x))^2 dQ(θ, x)

is minimized over all square-integrable functions f of x when and only when f(x) = U(x) − T(x) for v-almost all x. For any other estimator V(x) of g(θ) with finite risk, U − V ∈ H. Thus

∫ (V(x) − g(θ))^2 dQ(θ, x)

is minimized among all estimators V(x) of g(θ) when V = T; in other words, T is a Bayes estimator of g(θ), unique up to v-almost-everywhere equality.
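The projection step of the proof can be checked by simulation. In the following sketch (hypothetical numbers again: a Beta(2, 3) prior and n = 10 Bernoulli(θ) trials, with g(θ) = θ), the residual T(x) − g(θ) is tested by Monte Carlo for orthogonality in L^2(Q) against a few bounded functions h of x alone:

```python
import random

# Monte Carlo sketch (toy numbers, not from the text): pi = Beta(2, 3),
# x = number of successes in n = 10 Bernoulli(theta) trials, so
# T(x) = (a + x) / (a + b + n) is the posterior mean of g(theta) = theta.
# The residual T(x) - theta should be orthogonal in L^2(Q) to any
# square-integrable function h of x alone.
random.seed(0)
a, b, n, N = 2.0, 3.0, 10, 100_000

def sample():
    # one draw (theta, x) from the joint law dQ(theta, x) = dP_theta(x) dpi(theta)
    theta = random.betavariate(a, b)
    x = sum(random.random() < theta for _ in range(n))
    return theta, x

draws = [sample() for _ in range(N)]
T = lambda x: (a + x) / (a + b + n)

# inner products E[(T(x) - theta) h(x)] for h(x) = 1, x/n, (x/n)^2
inners = [sum((T(x) - theta) * h(x) for theta, x in draws) / N
          for h in (lambda x: 1.0, lambda x: x / n, lambda x: (x / n) ** 2)]
print(all(abs(v) < 0.01 for v in inners))   # True, up to Monte Carlo error
```

Orthogonality of T(x) − g(θ) to every square-integrable function of x is exactly what identifies U − T as the projection of U − g onto H.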