28.7 Estimation Theory
2. Furthermore, Equation 28.239 states that the $\eta_i$ are normally distributed around the zero point with an unknown, nonzero variance $\sigma^2$. Assuming that measurement errors are normally distributed is quite common and appropriate in most cases. The white noise⁶³ in signal transmission, for example, is often modeled with Gaussian distributed⁶⁴ amplitudes. This second assumption, of course, includes the first one: being normally distributed with $N(\mu = 0, \sigma^2)$ implies a zero expected value of the error.
3. With Equation 28.240, we assume that the errors $\eta_i$ of the single measurements are stochastically independent. If there were a dependency between them, it would be part of the underlying physical law $\phi$; it could then be incorporated into our measurement device and subtracted out again.
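The assumptions from Equations 28.239 and 28.240 can be illustrated with a short simulation. This is a minimal sketch, not part of the original method: the law `phi`, the noise level `sigma = 0.5`, the measurement points, and the random seed are hypothetical choices made only for illustration. Each error $\eta_i$ is drawn separately (hence independently) from $N(0, \sigma^2)$, so the sample mean of the errors should lie close to zero and their sample standard deviation close to $\sigma$.

```python
import random
import statistics


def phi(x: float) -> float:
    """Hypothetical 'true' physical law, chosen only for illustration."""
    return 2.0 * x + 1.0


def measure(xs: list[float], sigma: float, rng: random.Random) -> tuple[list[float], list[float]]:
    """Simulate y_i = phi(x_i) + eta_i with independent eta_i ~ N(0, sigma^2)."""
    etas = [rng.gauss(0.0, sigma) for _ in xs]  # one independent draw per measurement
    ys = [phi(x) + eta for x, eta in zip(xs, etas)]
    return ys, etas


rng = random.Random(42)              # fixed seed so the run is reproducible
xs = [i / 10 for i in range(1000)]   # hypothetical measurement points
ys, etas = measure(xs, sigma=0.5, rng=rng)

print(f"sample mean of eta: {statistics.fmean(etas):+.4f}")  # should be near 0
print(f"sample std of eta:  {statistics.stdev(etas):.4f}")   # should be near sigma = 0.5
```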
Objective: Estimation
Assume that we can choose from a (possibly infinitely large) set of functions (estimators) $f \in F$:

$$ f \in F \Rightarrow f : \mathbb{R} \mapsto \mathbb{R} \tag{28.241} $$
From this set, we want to pick the function $f^\star \in F$ that resembles $\phi$ best (i.e., better than all other $f \in F$ with $f \not\equiv f^\star$). $\phi$ is not necessarily an element of $F$, so we cannot always presume to find an $f^\star \equiv \phi$.
Each estimator $f$ deviates from the $y_i$-values by the estimation error $\varepsilon(f)$ (see Definition 28.53 on page 499). The estimation error depends on $f$ and may vary for different estimators.

$$ y_i = f(x_i) + \varepsilon_i(f) \quad \forall i : 0 < i \le n \tag{28.242} $$
We consider all $f \in F$ to be valid estimators for $\phi$ and simply look for the one that “fits best”. We can now combine Equation 28.242 with Equation 28.237:

$$ f(x_i) + \varepsilon_i(f) = y_i = \phi(x_i) + \eta_i \quad \forall i : 0 < i \le n \tag{28.243} $$
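Equation 28.243 can be rearranged to $\varepsilon_i(f) = \phi(x_i) + \eta_i - f(x_i)$, which shows that the estimation errors coincide with the measurement errors exactly where $f(x_i) = \phi(x_i)$. The following sketch checks this numerically on simulated data. The functions `phi` and `f`, the noise level, and the seed are hypothetical; `phi` is known here only because we generate the data ourselves.

```python
import random


def phi(x: float) -> float:
    """Hypothetical true law (unknown in practice; known here only because we simulate)."""
    return 2.0 * x + 1.0


def f(x: float) -> float:
    """A hypothetical candidate estimator that differs from phi."""
    return 1.8 * x + 1.2


rng = random.Random(7)
xs = [0.5 * i for i in range(10)]
etas = [rng.gauss(0.0, 0.3) for _ in xs]          # eta_i ~ N(0, 0.3^2), independent
ys = [phi(x) + eta for x, eta in zip(xs, etas)]  # Equation 28.237

eps_f = [y - f(x) for x, y in zip(xs, ys)]       # eps_i(f), from Equation 28.242
eps_phi = [y - phi(x) for x, y in zip(xs, ys)]   # eps_i(phi)

# Equation 28.243 rearranged: eps_i(f) = phi(x_i) + eta_i - f(x_i)
assert all(abs(e - (phi(x) + eta - f(x))) < 1e-12 for e, x, eta in zip(eps_f, xs, etas))
# For f = phi, the estimation errors coincide with the measurement errors eta_i
assert all(abs(e - eta) < 1e-12 for e, eta in zip(eps_phi, etas))
```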
We do not know $\phi$ and thus cannot determine the $\eta_i$. According to the likelihood method, we pick the function $f \in F$ that would most probably have produced the outcomes $y_i$. In other words, we have to maximize the likelihood of the occurrence of the $\varepsilon_i(f)$. The likelihood here is defined under the assumption that the true measurement errors $\eta_i$ are normally distributed (see Equation 28.239). So what we can do is to determine the $\varepsilon_i$ in a way that makes their occurrence most probable according to the distribution $N(0, \sigma^2)$ of the random variable that created the $\eta_i$. In the best case, $\varepsilon_i(f^\star) = \eta_i$ for all $i$, and thus $f^\star(x_i) = \phi(x_i)$, i.e., $f^\star$ is equivalent to $\phi$, at least for the sample information $A$ available to us.
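A minimal sketch of this selection rule for a finite candidate set $F$ is shown below. It assumes a fixed $\sigma$, whereas the text treats $\sigma$ as unknown; for any fixed $\sigma > 0$, the log-likelihood equals a constant minus $\sum_i \varepsilon_i(f)^2 / (2\sigma^2)$, so the ranking of the candidates does not depend on the value chosen. The candidate functions, the sample, and the helper names `normal_log_pdf`, `log_likelihood`, and `most_likely` are hypothetical and serve only as an illustration.

```python
import math
from typing import Callable, Sequence

Estimator = Callable[[float], float]


def normal_log_pdf(value: float, sigma: float) -> float:
    """Log-density of N(0, sigma^2) evaluated at `value`."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - value**2 / (2.0 * sigma**2)


def log_likelihood(f: Estimator, xs: Sequence[float], ys: Sequence[float], sigma: float) -> float:
    """Log-likelihood of the residuals eps_i(f) = y_i - f(x_i) under N(0, sigma^2).

    The sum of log-densities equals the log of their product, which is the
    joint density only because the errors are assumed to be independent.
    """
    return sum(normal_log_pdf(y - f(x), sigma) for x, y in zip(xs, ys))


def most_likely(candidates: dict[str, Estimator], xs: Sequence[float],
                ys: Sequence[float], sigma: float = 1.0) -> str:
    """Return the name of the candidate whose residuals are most probable."""
    return max(candidates, key=lambda name: log_likelihood(candidates[name], xs, ys, sigma))


# Hypothetical sample A and candidate set F, made up for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
F: dict[str, Estimator] = {
    "2x + 1": lambda x: 2.0 * x + 1.0,
    "1.5x + 1.5": lambda x: 1.5 * x + 1.5,
    "x + 2": lambda x: x + 2.0,
}

print(most_likely(F, xs, ys))             # chosen with sigma = 1.0
print(most_likely(F, xs, ys, sigma=0.3))  # same choice for another fixed sigma
```

With this made-up sample, the candidate `2x + 1` has the smallest residuals and is selected for both values of `sigma`.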
Maximizing the Likelihood
Therefore, we can regard the $\varepsilon_i(f)$ as outcomes of independent random experiments, and thus as uncorrelated random variables, and combine them into a multivariate normal distribution. For ease of notation, we define $\varepsilon(f)$ to be the vector containing all the single $\varepsilon_i(f)$-values.

$$ \varepsilon(f) = \begin{pmatrix} \varepsilon_1(f) \\ \varepsilon_2(f) \\ \vdots \\ \varepsilon_n(f) \end{pmatrix} \tag{28.244} $$
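Since the components of $\varepsilon(f)$ are assumed to be independent and each is evaluated under $N(0, \sigma^2)$, the joint density of the vector is the product of the individual normal densities, which is a multivariate normal density with covariance matrix $\sigma^2 I$. As a worked step (the symbol $p$ is introduced here only for illustration):

$$ p\bigl(\varepsilon(f)\bigr) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\varepsilon_i(f)^2}{2\sigma^2}\right) = \left(2\pi\sigma^2\right)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\varepsilon_i(f)^2\right) $$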
63 http://en.wikipedia.org/wiki/White_noise [accessed 2007-07-03]
64 http://en.wikipedia.org/wiki/Gaussian_noise [accessed 2007-07-03]