March 12, 2003
2.6 Bayes estimation.
The definition of a Bayes estimator is a special case of the general definition of a Bayes decision rule given in Sec. 1.3. Given a family $\{P_\theta,\ \theta \in \Theta\}$ of laws, where $(\Theta, \mathcal{T})$ is a measurable space, a loss function $L(\theta, y)$, the risk of an estimator $U$ at $\theta$ defined by $r(\theta, U) := E_\theta L(\theta, U)$, and a prior $\pi$ defined on $(\Theta, \mathcal{T})$, an estimator $T$ is Bayes for $\pi$ iff the Bayes risk
$$r(\pi, U) := \int r(\theta, U)\, d\pi(\theta)$$
attains its minimum over all statistics $U$ at $U = T$.
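As a concrete numerical illustration (the Bernoulli model, uniform prior, and sample size below are hypothetical choices, not from the text): the Bayes risks $r(\pi, U)$ of two estimators can be compared by Monte Carlo, drawing $\theta$ from the prior and then data from $P_\theta$. Under squared-error loss and the uniform prior on $[0,1]$, the posterior-mean estimator should beat the MLE in Bayes risk.

```python
import random

random.seed(0)

n = 10            # hypothetical sample size (number of Bernoulli trials)
trials = 200_000  # Monte Carlo replications

# Squared-error loss. U is the MLE x/n; T is the posterior mean under
# the uniform prior Beta(1,1), i.e. T(x) = (x+1)/(n+2).
loss_U = 0.0
loss_T = 0.0
for _ in range(trials):
    theta = random.random()  # draw theta from the prior pi = Uniform(0,1)
    x = sum(random.random() < theta for _ in range(n))  # x ~ Binomial(n, theta)
    loss_U += (x / n - theta) ** 2
    loss_T += ((x + 1) / (n + 2) - theta) ** 2

bayes_risk_U = loss_U / trials  # estimates r(pi, U); exact value is 1/(6n)
bayes_risk_T = loss_T / trials  # estimates r(pi, T); exact value is 1/(6(n+2))
print(bayes_risk_U, bayes_risk_T)
```

In this conjugate model the exact Bayes risks are $1/(6n) \approx 0.0167$ and $1/(6(n+2)) \approx 0.0139$, so the posterior mean $T$ achieves the strictly smaller Bayes risk, as the definition requires of a Bayes estimator.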
Recall that by Theorem 1.3.8, if a decision problem for a measurable family and a given prior has a decision rule with finite risk, and some decision rule $a(\cdot)$ minimizes the posterior risk for almost all $x$, then it is Bayes. Recall also that if a family $\{P_\theta,\ \theta \in \Theta\}$ is dominated by a $\sigma$-finite measure $v$, we can choose $v$ equivalent to the family by Lemma 2.1.6.

For squared-error loss, Bayes estimates are just expectations with respect to the posterior:
2.6.1 Theorem. Let $\{P_\theta,\ \theta \in \Theta\}$ be a measurable family equivalent to a $\sigma$-finite measure $v$. Let $\pi$ be a prior on $\Theta$ and $g$ a measurable function from $\Theta$ into some $\mathbb{R}^d$. Then for squared-error loss, there exists a Bayes estimator for $g(\theta)$ if and only if there exists an estimator $U$ of $g(\theta)$ with finite risk,
$$r(\pi, U) = \int\!\!\int \|U(x) - g(\theta)\|^2\, dP_\theta(x)\, d\pi(\theta) < \infty.$$
Then a Bayes estimator is given by
$$T(x) := \int g(\theta)\, d\pi_x(\theta),$$
where the integral with respect to the posterior $\pi_x$ exists and is finite for $v$-almost all $x$. $T$ is the unique Bayes estimator, up to equality $v$-almost everywhere. Thus $T$ is an admissible estimator of $g$.
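To make the formula $T(x) = \int g(\theta)\, d\pi_x(\theta)$ concrete, here is a hedged numerical sketch (the Bernoulli model, uniform prior, and data values are illustrative assumptions, not from the text). With a uniform prior the posterior given $x$ successes in $n$ Bernoulli trials is Beta$(x+1,\, n-x+1)$, and numerically integrating $g(\theta) = \theta$ against it recovers the closed-form posterior mean $(x+1)/(n+2)$.

```python
# Posterior mean T(x) = integral of g(theta) d pi_x(theta) for g(theta) = theta,
# computed by direct numerical integration of the unnormalized posterior density
# theta^x (1-theta)^(n-x)  (Bernoulli likelihood times the uniform prior density 1).
n, x = 10, 7  # hypothetical data: 7 successes in 10 trials

def unnormalized_posterior(theta: float) -> float:
    return theta ** x * (1 - theta) ** (n - x)

# Midpoint rule on a fine grid over Theta = [0, 1].
m = 100_000
grid = [(i + 0.5) / m for i in range(m)]
norm = sum(unnormalized_posterior(t) for t in grid) / m
T = sum(t * unnormalized_posterior(t) for t in grid) / m / norm

print(T)                  # numerical posterior mean
print((x + 1) / (n + 2))  # closed form for the Beta(x+1, n-x+1) posterior
```

The two printed values agree to many decimal places, matching the theorem's prescription that the Bayes estimate at $x$ is the posterior expectation of $g(\theta)$.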
Proof. Since $\|\cdot\|^2$ is the sum of squares of the coordinates, we can assume $d = 1$. By Propositions 1.3.5 and 1.3.13, the posterior distributions $\pi_x$ have the properties of regular conditional probabilities of $\theta$ given $x$ as defined in RAP, Section 10.2.

"Only if" holds since, by definition, a Bayes estimator has finite risk. To prove "if," let $U$ have finite risk, $r(\pi, U) < \infty$. Let $dQ(\theta, x) := dP_\theta(x)\, d\pi(\theta)$ be the usual joint distribution of $\theta$ and $x$. Then the function $(\theta, x) \mapsto U(x) - g(\theta)$ is in $L^2(Q)$, even though possibly neither $x \mapsto U(x)$ nor $\theta \mapsto g(\theta)$ is. Thus $|U(x) - g(\theta)| \in L^1(Q)$, and we have the conditional expectation (by RAP, Theorem 10.2.5)
$$E(U(x) - g(\theta) \mid x) = \int U(x) - g(\theta)\, d\pi_x(\theta) = U(x) - \int g(\theta)\, d\pi_x(\theta)$$
for $v$-almost all $x$, since $U(x)$ doesn't depend on $\theta$. Thus $T(x)$ is well-defined for $v$-almost all $x$. Now $x \mapsto U(x) - T(x)$ is the orthogonal projection in $L^2(Q)$ of $U(x) - g(\theta)$ into the space $H$ of square-integrable functions of $x$ for $Q$ (RAP, Theorem 10.2.9), which is unique up to a.s. equality (RAP, Theorem 5.3.8). Thus $\int (U(x) - g(\theta) - f(x))^2\, dQ(\theta, x)$ is minimized