or modelling this type of data is called conditional random field (CRF,
cf. Lafferty et al. (2001)). Again we consider the observed vector of words t and
the corresponding vector of labels L. The labels have a graph structure. For a
label Lc let N(c) be the indices of neighboring labels. Then (t, L) is a conditional
random field if, conditioned on the vector t of all terms, the random variables Lc
obey the Markov property
$$p(L_c \mid t, L_d;\ d \neq c) = p(L_c \mid t, L_d;\ d \in N(c)) \qquad (19)$$

i.e. the whole vector t of observed terms and the labels of neighbors may influence the distribution of the label Lc. Note that we do not model the distribution
p(t) of the observed words, which may exhibit arbitrary dependencies.
We consider the simple case that the words t = (t1, t2, ..., tn) and the corresponding labels L1, L2, ..., Ln have a chain structure and that Lc depends only
on the preceding and succeeding labels Lc−1 and Lc+1. Then the conditional
distribution p(L|t) has the form
[Equation truncated in this preview; only the summation limits n − 1 and m_j are recoverable.]
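The chain-structured case can be illustrated with a minimal Python sketch. It assumes the standard log-linear form of a linear-chain CRF from Lafferty et al. (2001), in which p(L|t) is proportional to the exponential of a sum of per-position scores and adjacent-label scores; this is an assumption, since the formula is cut off in the text above. The label set, the functions `node_score` and `edge_score`, and all weights are hypothetical and chosen only for illustration. The distribution is normalized by brute-force enumeration, which is feasible only for very short sequences, and the sketch then checks the Markov property (19) numerically for one label.

```python
import itertools
import math

LABELS = ("O", "NAME")  # hypothetical label set


def node_score(label, words, j):
    """Hypothetical per-position score: capitalized words favor NAME."""
    if label == "NAME":
        return 1.5 if words[j][0].isupper() else -1.0
    return 0.0


def edge_score(left, right, words, j):
    """Hypothetical score for the adjacent label pair (L_j, L_{j+1})."""
    return 0.8 if left == right else 0.0


def log_score(labels, words):
    """Unnormalized log-probability of a whole label sequence given the words."""
    n = len(words)
    total = sum(node_score(labels[j], words, j) for j in range(n))
    total += sum(edge_score(labels[j], labels[j + 1], words, j)
                 for j in range(n - 1))
    return total


def conditional_distribution(words):
    """p(L | t) for every label sequence, normalized by enumeration."""
    seqs = list(itertools.product(LABELS, repeat=len(words)))
    weights = [math.exp(log_score(s, words)) for s in seqs]
    z = sum(weights)  # normalizing constant; depends on the words t
    return {s: w / z for s, w in zip(seqs, weights)}


def label_conditional(dist, c, value, fixed):
    """p(L_c = value | t, L_d = fixed[d] for every d in fixed)."""
    def matches(seq):
        return all(seq[d] == v for d, v in fixed.items())
    num = sum(p for s, p in dist.items() if matches(s) and s[c] == value)
    den = sum(p for s, p in dist.items() if matches(s))
    return num / den


words = ["Yesterday", "Anna", "Smith", "left"]
dist = conditional_distribution(words)

# Markov property for c = 1 (0-based): conditioning on all other labels
# gives the same value as conditioning on the neighbors 0 and 2 only.
full = label_conditional(dist, 1, "NAME", {0: "O", 2: "NAME", 3: "O"})
nbrs = label_conditional(dist, 1, "NAME", {0: "O", 2: "NAME"})
print(abs(full - nbrs) < 1e-12)
```

Enumeration makes the normalization explicit but grows exponentially with the sequence length n; practical implementations compute the same quantities with dynamic programming over the chain.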