ECEN 489: Information Theory, Inference, and
Learning Algorithms
Chapter 22: Maximum Likelihood and Clustering
Dr. Chao TIAN
Texas A&M University
1 / 20
Maximum Likelihood

Choose the parameter $\theta$ that maximizes $P(\{x\} \mid \theta)$.

The Gaussian example: log likelihood
\[
\ln P(\{x_n\}_{n=1}^N \mid \mu, \sigma^2)
= -N \ln\!\left(\sqrt{2\pi}\,\sigma\right) - \frac{\sum_n (x_n - \mu)^2}{2\sigma^2}.
\]

Sample mean: $\bar{x} \triangleq \sum_{n=1}^N x_n / N$;
sample squared deviation: $S \triangleq \sum_{n=1}^N (x_n - \bar{x})^2$.
\[
\ln P(\{x_n\}_{n=1}^N \mid \mu, \sigma^2)
= -N \ln\!\left(\sqrt{2\pi}\,\sigma\right) - \frac{N(\mu - \bar{x})^2 + S}{2\sigma^2}.
\]

2 / 20
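Setting the derivatives of the log likelihood to zero gives the familiar ML estimates $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = S/N$. A minimal numeric sketch of this (variable names are my own, not from the slides), which also checks that the two forms of the log likelihood agree:

```python
import math
import random

random.seed(0)
N = 1000
mu_true, sigma_true = 2.0, 3.0
xs = [random.gauss(mu_true, sigma_true) for _ in range(N)]

def log_lik(xs, mu, sigma):
    # Direct form: -N ln(sqrt(2 pi) sigma) - sum_n (x_n - mu)^2 / (2 sigma^2)
    N = len(xs)
    return (-N * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

xbar = sum(xs) / N                      # sample mean
S = sum((x - xbar) ** 2 for x in xs)    # sample squared deviation

def log_lik_suff(N, xbar, S, mu, sigma):
    # Equivalent form written in terms of (xbar, S)
    return (-N * math.log(math.sqrt(2 * math.pi) * sigma)
            - (N * (mu - xbar) ** 2 + S) / (2 * sigma ** 2))

# The two expressions for the log likelihood agree on the same data
assert abs(log_lik(xs, 1.5, 2.5) - log_lik_suff(N, xbar, S, 1.5, 2.5)) < 1e-6

# ML estimates
mu_hat = xbar
sigma2_hat = S / N

# Perturbing the estimates can only lower the likelihood
best = log_lik(xs, mu_hat, math.sqrt(sigma2_hat))
for dmu, dsig in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert log_lik(xs, mu_hat + dmu, math.sqrt(sigma2_hat) + dsig) < best
```

The perturbation check is only a spot test at a few nearby points, not a proof; the proof is the derivative calculation above.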
Sufficient Statistics

Let $T(\{x_n\}_{n=1}^N)$ be some function of $\{x_n\}_{n=1}^N$:
\[
T(\{x_n\}_{n=1}^N) \leftrightarrow \{x_n\}_{n=1}^N \leftrightarrow \theta
\quad \text{implies} \quad
I(\{x_n\}_{n=1}^N; \theta) \ge I(T(\{x_n\}_{n=1}^N); \theta).
\]

$T$ is a sufficient statistic if $I(\{x_n\}_{n=1}^N; \theta) = I(T(\{x_n\}_{n=1}^N); \theta)$;
equivalently, $\{x_n\}_{n=1}^N \leftrightarrow T(\{x_n\}_{n=1}^N) \leftrightarrow \theta$ is also a Markov chain;
equivalently, the likelihood $P(\{x_n\}_{n=1}^N \mid \theta, T(\{x_n\}_{n=1}^N))$ is not a function of $\theta$.

In the Gaussian setting: $(\bar{x}, S)$ is a sufficient statistic for $(\mu, \sigma^2)$, since
\[
\ln P(\{x_n\}_{n=1}^N \mid \mu, \sigma^2)
= -N \ln\!\left(\sqrt{2\pi}\,\sigma\right) - \frac{N(\mu - \bar{x})^2 + S}{2\sigma^2}.
\]

3 / 20
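Because the Gaussian log likelihood depends on the data only through $(\bar{x}, S)$, two different samples with the same sample mean and sample squared deviation assign identical likelihood to every $(\mu, \sigma)$. A quick numeric check of this (helper names are my own, not from the slides):

```python
import math

def log_lik(xs, mu, sigma):
    # ln P({x_n} | mu, sigma^2), computed directly from the raw sample
    N = len(xs)
    return (-N * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def suff_stats(xs):
    # (xbar, S): sample mean and sample squared deviation
    N = len(xs)
    xbar = sum(xs) / N
    S = sum((x - xbar) ** 2 for x in xs)
    return xbar, S

# Two genuinely different samples engineered to share xbar = 1 and S = 2
xs1 = [0.0, 1.0, 2.0]
d1 = 0.5
d2 = (-0.5 + math.sqrt(3.25)) / 2          # chosen so d1^2 + d2^2 + d1*d2 = 1
xs2 = [1.0 + d1, 1.0 + d2, 1.0 - d1 - d2]  # deviations sum to 0, squares sum to 2

assert suff_stats(xs1) == (1.0, 2.0)
xbar2, S2 = suff_stats(xs2)
assert abs(xbar2 - 1.0) < 1e-12 and abs(S2 - 2.0) < 1e-12

# Identical likelihood for every (mu, sigma), although xs1 != xs2
for mu, sigma in [(0.0, 1.0), (1.0, 2.0), (-3.0, 0.5)]:
    assert abs(log_lik(xs1, mu, sigma) - log_lik(xs2, mu, sigma)) < 1e-9
```

This checks the "depends only on $T$" direction at a few parameter values; the formal statement is the conditional-likelihood condition on the slide.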