these observations must decide which is the correct model; that is, which is the
correct value of $\theta$. Statistical estimation means: select the correct model.
For example, suppose $\Omega = \mathbb{R}^\infty$, $\mathcal{B} = \mathcal{B}(\mathbb{R}^\infty)$. Let $\omega = (x_1, x_2, \dots)$ and define $X_n(\omega) = x_n$. For each $\theta \in \mathbb{R}$, let $P_\theta$ be product measure on $\mathbb{R}^\infty$ which makes $\{X_n, n \ge 1\}$ iid with common $N(\theta, 1)$ distribution. Based on observing $X_1, \dots, X_n$, one estimates $\theta$ with an appropriate function of the observations $\hat{\theta}_n = \hat{\theta}_n(X_1, \dots, X_n)$.
$\hat{\theta}_n(X_1, \dots, X_n)$ is called a statistic and is also an estimator. When one actually does the experiment and observes, then $\hat{\theta}(x_1, \dots, x_n)$ is called the estimate. So the estimator is a random element while the estimate is a number or maybe a vector if $\theta$ is multidimensional.
In this example, the usual choice of estimator is $\hat{\theta}_n = \sum_{i=1}^{n} X_i / n$.
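The estimator $\hat{\theta}_n$ is weakly consistent if for all $\theta \in \Theta$ and every $\epsilon > 0$,
$$P_\theta[|\hat{\theta}_n - \theta| > \epsilon] \to 0, \quad n \to \infty;$$
that is, $\hat{\theta}_n \xrightarrow{P_\theta} \theta$.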
This indicates that no matter what the true parameter is or, to put it another way, no matter what the true (but unknown) state of nature is, $\hat{\theta}_n$ does a good job estimating the true parameter.
$\hat{\theta}_n$ is strongly consistent if for all $\theta \in \Theta$, $\hat{\theta}_n \to \theta$, $P_\theta$-a.s.
This is obviously stronger than weak consistency.
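As a rough numerical illustration of weak consistency in the Gaussian example, the sketch below looks at the sample mean under $P_\theta$. It relies on the standard fact that for iid $N(\theta, 1)$ observations the sample mean minus $\theta$ is $N(0, 1/n)$, so the tail probability equals $2(1 - \Phi(\epsilon\sqrt{n}))$ and does not depend on $\theta$. The function names, the trial count, and the seed are hypothetical choices made for this illustration only; the simulation illustrates weak consistency and says nothing by itself about almost sure convergence.

```python
import math
import random


def exact_tail_prob(n, eps):
    """P_theta[|mean_n - theta| > eps] for iid N(theta, 1) data.

    mean_n - theta is N(0, 1/n), so the two-sided tail is
    2 * (1 - Phi(eps * sqrt(n))) = erfc(eps * sqrt(n / 2)).
    """
    return math.erfc(eps * math.sqrt(n / 2.0))


def simulated_tail_prob(theta, n, eps, trials=2000, seed=0):
    """Monte Carlo estimate of the same probability (illustrative only)."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    exceed = 0
    for _ in range(trials):
        # One realization of the estimator: the mean of n N(theta, 1) draws.
        mean_n = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        if abs(mean_n - theta) > eps:
            exceed += 1
    return exceed / trials


if __name__ == "__main__":
    eps = 0.2
    for n in (10, 100, 1000):
        print(n, exact_tail_prob(n, eps), simulated_tail_prob(3.0, n, eps))
```

For a fixed $\epsilon$, both columns shrink toward 0 as $n$ grows, which is what the definition of weak consistency requires for each $\theta$.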
6.3
Connections Between a.s. and i.p. Convergence
Here we discuss the basic relations between convergence in probability and almost
sure convergence. These relations have certain ramifications such as extension of the dominated convergence principle to convergence in probability.
Theorem 6.3.1 (Relations between i.p. and a.s. convergence) Suppose that $\{X_n, X, n \ge 1\}$ are real-valued random variables.

(a) Cauchy criterion: $\{X_n\}$ converges in probability iff $\{X_n\}$ is Cauchy in probability. Cauchy in probability means
$$X_n - X_m \xrightarrow{P} 0, \quad \text{as } n, m \to \infty,$$
or more precisely, given any $\epsilon > 0$, $\delta > 0$, there exists $n_0 = n_0(\epsilon, \delta)$ such that for all $r, s \ge n_0$ we have
$$P[|X_r - X_s| > \epsilon] < \delta. \tag{6.1}$$