EM starts with some initial selection for the model parameters, which we denote $\theta^{\text{old}}$.
Initial Parameters.
To obtain $\theta^{\text{old}}$, we proceed by assuming that the high-state emission density, $\Pr(X_t = x_t \mid Z_t = 1, \phi)$, only emits positive counts. This, in effect, makes $Z_t$ an observed random variable. Let $b_{t,0} = I(X_t = 0)$, $b_{t,1} = I(X_t > 0)$. We use initial transition probabilities defined by the maximum likelihood estimators (MLEs) of the observed Markov chain:
$$\tilde{p}_{01} = \frac{n_{01}}{n_{01} + n_{00}}, \qquad \tilde{p}_{10} = \frac{n_{10}}{n_{10} + n_{11}},$$
where $n_{ij}$ is the number of times that the consecutive pair $(b_{t-1,i}, b_{t,j})$ was observed in $x$. An initial estimate for $\pi$ is the steady-state probability given by $\tilde{\pi} = \tilde{p}_{01}/(\tilde{p}_{01} + \tilde{p}_{10})$.
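As a concrete illustration, the pair counts $n_{ij}$ and the resulting MLEs can be computed directly from the observed series. This is a minimal sketch in Python; the function and variable names are our own, not from the chapter:

```python
import numpy as np

def initial_transition_estimates(x):
    """MLE transition probabilities of the observed binary chain b_t = I(x_t > 0)."""
    b = (np.asarray(x) > 0).astype(int)  # b_{t,1} indicator; b_{t,0} is its complement
    # n[i, j] counts the consecutive pairs (b_{t-1} = i, b_t = j)
    n = np.zeros((2, 2), dtype=int)
    for prev, curr in zip(b[:-1], b[1:]):
        n[prev, curr] += 1
    p01 = n[0, 1] / (n[0, 1] + n[0, 0])  # \tilde{p}_{01}
    p10 = n[1, 0] / (n[1, 0] + n[1, 1])  # \tilde{p}_{10}
    pi = p01 / (p01 + p10)               # steady-state probability \tilde{\pi}
    return p01, p10, pi
```

For example, the count series `[0, 0, 3, 0, 2, 2, 0, 1]` yields the indicator chain `0,0,1,0,1,1,0,1`, giving $n_{00}=1$, $n_{01}=3$, $n_{10}=2$, $n_{11}=1$.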
To obtain initial estimates for the high-state emission parameters $\phi$, we collect the samples of $X$ such that $X_t > 0$, and call that collection $Y$. We then calculate $\tilde{\mu}$ and $\tilde{\sigma}^2$, the sample mean and variance of $Y$. Finally, we reparameterize from $(\tilde{\mu}, \tilde{\sigma}^2)$ to $(\tilde{\mu}, \tilde{s})$, where $\tilde{s}$ is the initial size parameter, via $\tilde{s} = \tilde{\mu}^2/(\tilde{\sigma}^2 - \tilde{\mu})$. This approach ignores the fact that the high-state distribution can emit zeros, but for our application these initial values were sufficient for the EM algorithm to converge in a reasonable number of iterations.
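The reparameterization above is a method-of-moments step: it is consistent with a count distribution whose variance is $\mu + \mu^2/s$ (the mean/size parameterization of the negative binomial), so inverting that relation for $s$ gives $\tilde{s} = \tilde{\mu}^2/(\tilde{\sigma}^2 - \tilde{\mu})$. A minimal sketch, assuming the sample variance uses the unbiased (`ddof=1`) estimator; names are ours:

```python
import numpy as np

def initial_emission_estimates(x):
    """Method-of-moments (mu, size) estimates from the positive counts Y."""
    y = np.asarray([v for v in x if v > 0], dtype=float)
    mu = y.mean()
    var = y.var(ddof=1)      # sample variance; ddof=1 is an assumption here
    # Requires overdispersion (var > mu), otherwise the size estimate is invalid
    s = mu**2 / (var - mu)
    return mu, s
```

For `[0, 1, 0, 2, 3, 10]` the positive counts are $Y = \{1, 2, 3, 10\}$, so $\tilde{\mu} = 4$ and $\tilde{\sigma}^2 = 50/3$, giving $\tilde{s} = 16/(50/3 - 4) = 24/19$.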
The E step.
In the E step, we take these initial parameter values and find the posterior distribution of the latent variables $\Pr(Z = z \mid X = x, \theta^{\text{old}})$. This posterior distribution is then used to evaluate the expectation of the logarithm of the complete-data likelihood function, as a function of the parameters $\theta$, to give the function $Q(\theta, \theta^{\text{old}})$ defined by
$$Q(\theta, \theta^{\text{old}}) = \sum_{z} \Pr(Z = z \mid X = x, \theta^{\text{old}}) \log \Pr(X = x, Z = z \mid \theta). \qquad (3.5)$$
It has been shown (Baum and Sell, 1968; Baker, 1975) that maximization of $Q(\theta, \theta^{\text{old}})$ results in increased likelihood. To evaluate $Q(\theta, \theta^{\text{old}})$, we introduce some notation. Let $\gamma(z_t)$ be the marginal posterior of $z_t$ and $\xi(z_{t-1}, z_t)$ be the joint posterior of two successive latent variables, so
$$\gamma(z_t) = \Pr(Z_t = z_t \mid X = x, \theta^{\text{old}}),$$
$$\xi(z_{t-1}, z_t) = \Pr(Z_{t-1} = z_{t-1}, Z_t = z_t \mid X = x, \theta^{\text{old}}).$$
Now for $k = 0, 1$, the two states of the Markov chain, we denote $z_{tk} = I(z_t = k)$, which is 1 if $z_t$ is in state $k$ and 0 otherwise. Let $\gamma(z_{tk})$
Copyright © 2014 Imperial College Press. From Heard, N. and Adams, N. M. (eds), Data Analysis for Network Cybersecurity.