Inference for Graphs and Networks
Fitting the parameter $p$ is straightforward; the maximum likelihood estimator (MLE) corresponds to the sample proportion of observed links:
\[
\hat{p} := \frac{1}{\binom{n}{2}} \sum_{i<j} A_{ij} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij}.
\]
Example 1.1, for instance, yields $\hat{p} = 14/45$.
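As a minimal sketch of the estimator above, the following Python computes the sample proportion of links in both of its equivalent forms. The function names `er_mle` and `er_mle_full` and the 4-node adjacency matrix are hypothetical illustrations (the matrix is not the data of Example 1.1), and the code assumes $A$ is symmetric with a zero diagonal.

```python
from fractions import Fraction

def er_mle(A):
    """Sum of A[i][j] over i < j, divided by the number of node pairs, n choose 2."""
    n = len(A)
    links = sum(A[i][j] for i in range(n) for j in range(i + 1, n))
    return Fraction(links, n * (n - 1) // 2)

def er_mle_full(A):
    """Equivalent form: sum over all (i, j), divided by n(n - 1).

    Assumes A is symmetric with a zero diagonal, so each link is counted twice.
    """
    n = len(A)
    total = sum(A[i][j] for i in range(n) for j in range(n))
    return Fraction(total, n * (n - 1))

# Hypothetical 4-node graph with links 1-2, 1-3 and 3-4 (illustration only).
A = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

p_hat = er_mle(A)            # 3 observed links out of 6 possible pairs
p_hat_full = er_mle_full(A)  # same value via the double sum
```

Exact rational arithmetic via `Fraction` is a design choice for the illustration; it makes the agreement of the two forms easy to check without floating-point tolerance.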
Given a relational data set of interest, we can test the agreement of data in $A$ with this model by employing an appropriately selected test statistic.
If we wish to test this uniformly generic model with respect to the notion of
network structure, we may explicitly define an alternate model and appeal
to the classical Neyman–Pearson testing framework.
In this vein, the Erdős–Rényi model can be generalized in a natural way to capture the notion of local rather than global exchangeability: we simply allow Bernoulli parameters to depend on $k$-ary categorical covariates $c(i)$ associated with each node $i \in \{1, 2, \ldots, n\}$, where the $k \le n$ categories represent groupings of nodes. Formally we define
\[
c \in \mathbb{Z}_k^n; \quad c(i) : \{1, 2, \ldots, n\} \to \mathbb{Z}_k,
\]
and a set of $\binom{k+1}{2}$ distinct Bernoulli parameters governing link probabilities within and between these categories, arranged into a $k \times k$ symmetric matrix and indexed as $p_{c(i)c(j)}$ for $i, j \in \{1, 2, \ldots, n\}$.
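The bookkeeping in this construction can be sketched in a few lines of Python. The helper names `block_parameters` and `link_probability`, the numeric probabilities, and the label vector `c` are all hypothetical choices made for illustration; categories are assumed to be labelled $0, 1, \ldots, k-1$ and nodes are indexed from 0 in the code.

```python
from itertools import combinations_with_replacement

def block_parameters(k, values):
    """Arrange k(k+1)/2 link probabilities into a symmetric k x k matrix.

    `values` lists the parameters for category pairs (a, b) with a <= b,
    in the order (0,0), (0,1), ..., (0,k-1), (1,1), ..., (k-1,k-1).
    """
    index_pairs = list(combinations_with_replacement(range(k), 2))
    if len(values) != len(index_pairs):
        raise ValueError(f"expected {len(index_pairs)} parameters, got {len(values)}")
    P = [[0.0] * k for _ in range(k)]
    for (a, b), p in zip(index_pairs, values):
        P[a][b] = P[b][a] = p  # symmetry: p_ab = p_ba
    return P

def link_probability(P, c, i, j):
    """Bernoulli parameter p_{c(i)c(j)} for the pair of nodes (i, j)."""
    return P[c[i]][c[j]]

k = 3
values = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7]  # illustrative values only
P = block_parameters(k, values)
c = [0, 0, 1, 2, 2]                      # category of each of n = 5 nodes

n_distinct = len(values)                 # (k + 1 choose 2) = 6 for k = 3
p_within = link_probability(P, c, 0, 1)  # both nodes in category 0
p_between = link_probability(P, c, 2, 3) # categories 1 and 2
```

Storing the full symmetric matrix rather than only its upper triangle costs a little memory but lets the lookup ignore the order of $i$ and $j$.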
In the case of binary categorical covariates, we immediately obtain a formulation of Holland and Leinhardt (1981), the simplest example of a so-called stochastic block model. In this network model, pairwise links between nodes correspond again to Bernoulli trials, but with a parameter chosen from the set $\{p_{00}, p_{01}, p_{11}\}$ according to binary categorical covariates associated with the nodes in question.
Definition 1.2 (Simple Stochastic Block Model). Let $c \in \{0, 1\}^n$ be a binary $n$-vector for some integer $n > 1$, and fix parameters $p_{00}, p_{01}, p_{11} \in [0, 1]$. Set $p_{10} = p_{01}$; the model then corresponds to matrices $A \in \{0, 1\}^{n \times n}$ defined element-wise as
\[
\forall i, j \in \{1, 2, \ldots, n\} : i < j, \quad A_{ij} \sim \mathrm{Bernoulli}(p_{c(i)c(j)}); \quad A_{ji} = A_{ij}, \quad A_{ii} = 0.
\]
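One way to read Definition 1.2 is as a recipe for simulation. The sketch below draws a single adjacency matrix from the model using only the Python standard library; the function name `sample_simple_sbm`, the seed, and the parameter values are hypothetical, and nodes are indexed from 0 in the code.

```python
import random

def sample_simple_sbm(c, p00, p01, p11, rng=None):
    """Draw one adjacency matrix A from the simple stochastic block model.

    c is a list of 0/1 node labels; rng is an optional random.Random instance.
    """
    if rng is None:
        rng = random.Random()
    n = len(c)
    # Link probability for each pair of labels, with p10 = p01.
    p = {(0, 0): p00, (0, 1): p01, (1, 0): p01, (1, 1): p11}
    A = [[0] * n for _ in range(n)]  # diagonal stays 0: A_ii = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Independent Bernoulli trial for each pair i < j.
            A[i][j] = 1 if rng.random() < p[(c[i], c[j])] else 0
            A[j][i] = A[i][j]        # symmetry: A_ji = A_ij
    return A

c = [0, 0, 0, 1, 1]
A = sample_simple_sbm(c, p00=0.8, p01=0.1, p11=0.6, rng=random.Random(0))

# Degenerate parameters make the draw deterministic, which is handy for checking:
# every within-group pair is linked and no between-group pair is.
A_det = sample_simple_sbm(c, p00=1.0, p01=0.0, p11=1.0)
```

Passing an explicit `random.Random` instance keeps draws reproducible without touching the module-level random state.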
If the vector of covariates $c$ is given, then finding the maximum-likelihood parameter estimates $\{\hat{p}_{00}, \hat{p}_{01}, \hat{p}_{11}\}$ is trivial after a re-ordering of
Copyright © 2014. Imperial College Press. All rights reserved. May not be reproduced in any form without permission from the publisher, except fair uses permitted under
U.S. or applicable copyright law.
AN: 779681 ; Heard, Nicholas, Adams, Niall M..; Data Analysis for Network Cyber-security