3.2. The Scan Statistic
In this section, we describe the methodology behind scanning for local
anomalies in a graph over time. Windowing in this space is then discussed,
followed by the definition of the scan statistic.
3.2.1. Windows in the cross product space
We are interested in examining sets of windows in the
$\mathit{Time} \times \mathit{Graph}$ product space. We define these sets of
windows as follows. We have a graph $G = (V, E)$ with node set $V$ and edge
set $E$. For each edge $e \in E$, at discrete time points
$t \in \{1, \ldots, T\}$, we have a data process $X_e(t)$. We denote the set
of time windows on edges $e$ over discretized time intervals
$(s, s+1, \ldots, k)$ as
$\Omega = \{[e, (s, s+1, \ldots, k)] : e \in E,\ 0 \le s < k \le T\}$.
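As a minimal sketch of this definition, the following Python snippet enumerates Ω for a toy graph. The function name `enumerate_windows`, the representation of an edge as a `(source, target)` tuple, and the toy edge list are illustrative assumptions, not part of the original text.

```python
def enumerate_windows(edges, T):
    """Enumerate Omega: every (edge, (s, s+1, ..., k)) with 0 <= s < k <= T.

    `edges` is assumed to be a list of hashable edge identifiers,
    here (source, target) tuples.
    """
    omega = []
    for e in edges:
        for s in range(0, T):              # 0 <= s
            for k in range(s + 1, T + 1):  # s < k <= T
                omega.append((e, tuple(range(s, k + 1))))
    return omega


# Toy example (hypothetical data): two edges, T = 3.
edges = [("a", "b"), ("b", "c")]
omega = enumerate_windows(edges, T=3)
```

Each edge contributes one window per pair (s, k), so the size of Ω grows quadratically in T and linearly in the number of edges.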
The set of all subsets of windows,
$\Gamma = \{\{w_1, w_2, \ldots\} : w_j \in \Omega\}$, is usually very large,
and we are normally interested in only a subset, $\Gamma_s \subset \Gamma$,
that contains locality constraints in time and in graph space. We therefore
restrict our attention to sets of windows $\gamma \in \Gamma_s$.
For convenience, we denote $X(\gamma)$ as the data in the window given by
$\gamma$.
Next, we assume that for any time point $t$ and edge $e$, we can describe
$X_e(t)$ with a stochastic process (specific examples are given in
Section 3.4) with parameter function given by $\theta_e(t)$. We denote the
values of the parameter functions evaluated in the corresponding set of
windows $\gamma$ by $\theta(\gamma)$. Finally, we denote the likelihood of
the stochastic process on $\gamma$ as $L(\theta(\gamma) \mid X(\gamma))$.
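To make the notation concrete, the sketch below collects the data X(γ) for a set of windows and evaluates a log-likelihood on it. It assumes, purely for illustration, that counts on each edge are independent Poisson variables with a constant rate per edge; this is not necessarily one of the models of Section 3.4. The names `data_in_window` and `poisson_loglik`, and the nested-dictionary layout of `X`, are hypothetical.

```python
import math


def data_in_window(X, gamma):
    """X(gamma): observations X_e(t) for each (edge, times) pair in gamma.

    `X` is assumed to be a dict mapping edge -> {t: count}.
    """
    return [(e, [X[e][t] for t in times]) for e, times in gamma]


def poisson_loglik(theta, X, gamma):
    """log L(theta(gamma) | X(gamma)) under an ASSUMED independent Poisson
    model with one constant rate theta[e] per edge (illustrative only)."""
    total = 0.0
    for e, counts in data_in_window(X, gamma):
        lam = theta[e]
        for x in counts:
            # Poisson log-pmf: x*log(lam) - lam - log(x!)
            total += x * math.log(lam) - lam - math.lgamma(x + 1)
    return total


# Toy example (hypothetical data): one edge observed at t = 1, 2.
X = {("a", "b"): {1: 2, 2: 0}}
gamma = [(("a", "b"), (1, 2))]
theta = {("a", "b"): 1.0}
ll = poisson_loglik(theta, X, gamma)
```

Under a different stochastic process only the per-observation term inside the loop would change; the bookkeeping over γ stays the same.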
At this point, it is worth returning to our discussion in Section 3.1.3 of
the 3path used to detect traversal. In this example, $X_e(t)$ are the
directed time series of counts of connections between the pair of hosts that
define each edge $e$. Then, $\Omega$ is the set of all (edge, time interval)
pairs. We would like to combine edges to form shapes, so we take all subsets
of $\Omega$ and call that $\Gamma$. For this example, we now restrict our set
of shapes to sets consisting of three (edge, time interval) pairs such that
the edges form a directed 3path, and the time interval is selected to be the
same on each edge. In the simulations and real network example, the time
intervals are 30 minutes long, and overlap by ten minutes with the next time
window, and are identical on each edge in the shape. These are then the
windows $\gamma$ that are used in the 3path scan shape.
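The construction of the 3path windows can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: a 3path is taken to be three consecutive directed edges over four distinct nodes, time is measured in minutes, each interval is represented as a half-open pair `(start, end)`, and all function names and the toy edge list are hypothetical.

```python
def three_paths(edges):
    """All directed paths a -> b -> c -> d over four distinct nodes
    (assumed reading of '3path')."""
    out_edges = {}
    for u, v in edges:
        out_edges.setdefault(u, []).append(v)
    paths = []
    for a, b in edges:
        if a == b:
            continue
        for c in out_edges.get(b, []):
            if c in (a, b):
                continue
            for d in out_edges.get(c, []):
                if d in (a, b, c):
                    continue
                paths.append(((a, b), (b, c), (c, d)))
    return paths


def sliding_intervals(total_minutes, length=30, overlap=10):
    """Intervals of `length` minutes, each overlapping the next by
    `overlap` minutes, so consecutive starts differ by length - overlap."""
    step = length - overlap
    return [(start, start + length)
            for start in range(0, total_minutes - length + 1, step)]


def three_path_windows(edges, total_minutes):
    """Each gamma is three (edge, interval) pairs: the edges form a 3path
    and the same interval is used on every edge."""
    return [[(e, interval) for e in path]
            for path in three_paths(edges)
            for interval in sliding_intervals(total_minutes)]


# Toy example (hypothetical data): a directed 4-cycle over 70 minutes.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
windows = three_path_windows(edges, total_minutes=70)
```

Restricting to this family keeps the number of candidate windows proportional to the number of 3paths times the number of intervals, instead of the number of all subsets of Ω.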
Copyright © 2014. Imperial College Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under
U.S. or applicable copyright law.
AN: 779681 ; Heard, Nicholas, Adams, Niall M..; Data Analysis for Network Cybersecurity