Band 20 – 2005 47 Hotho, Nürnberger, and Paaß

$$
= \frac{1}{\mathrm{const}} \exp\left( \sum_{j=1} \sum_{r=1} \lambda_{jr} f_{jr}(L_j, t) + \sum_{j=1} \sum_{r=1} \mu_{jr} g_{jr}(L_j, L_{j-1}, t) \right)
$$

where $f_{jr}(L_j, t)$ and $g_{jr}(L_j, L_{j-1}, t)$ are different feature functions related to
$L_j$ and the pair $L_j, L_{j-1}$, respectively. CRF models encompass hidden Markov
models, but they are much more expressive because they allow arbitrary dependencies in the observation sequence and more complex neighborhood structures
of labels. As for most machine learning algorithms, a training sample of words
and their correct labels is required. In addition to the identity of words, arbitrary
properties of the words, such as part-of-speech tags, capitalization, prefixes, and suffixes, may be used, sometimes leading to more than a million features. The
unknown parameter values $\lambda_{jr}$ and $\mu_{jr}$ are usually estimated using conjugate
gradient optimization routines (McCallum 2003).
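
To make the structure of the exponent concrete, the following is a minimal sketch of a linear-chain CRF score, not the implementation used by the cited work. Several things in it are assumptions made purely for illustration: the label set, the three feature functions, and the hand-picked weights are hypothetical; the weights are shared across positions rather than indexed by j; the transition features are skipped at the first position, which has no predecessor label; and the normalizing constant is computed by brute-force enumeration, which is only feasible for very short sequences.

```python
from itertools import product
from math import exp

# Hypothetical label set, chosen only for this illustration.
LABELS = ("O", "NAME")


def f_capitalized(label, words, j):
    """State feature f(L_j, t): capitalized word labeled NAME."""
    return 1.0 if label == "NAME" and words[j][0].isupper() else 0.0


def f_lowercase_other(label, words, j):
    """State feature f(L_j, t): all-lowercase word labeled O."""
    return 1.0 if label == "O" and words[j].islower() else 0.0


def g_name_continues(label, prev_label, words, j):
    """Transition feature g(L_j, L_{j-1}, t): NAME follows NAME."""
    return 1.0 if label == "NAME" and prev_label == "NAME" else 0.0


# Hand-picked weights (assumption): in practice the lambda and mu
# values are estimated from a labeled training sample.
STATE_FEATURES = [(1.5, f_capitalized), (1.0, f_lowercase_other)]
TRANSITION_FEATURES = [(0.8, g_name_continues)]


def score(labels, words):
    """Weighted sum of state and transition features (the exponent)."""
    total = 0.0
    for j in range(len(words)):
        for lam, f in STATE_FEATURES:
            total += lam * f(labels[j], words, j)
        if j > 0:  # assumption: no transition term at the first position
            for mu, g in TRANSITION_FEATURES:
                total += mu * g(labels[j], labels[j - 1], words, j)
    return total


def probability(labels, words):
    """exp(score) divided by the sum over all possible label sequences."""
    const = sum(
        exp(score(candidate, words))
        for candidate in product(LABELS, repeat=len(words))
    )
    return exp(score(labels, words)) / const


words = ["meet", "Andreas", "Hotho", "today"]
candidates = list(product(LABELS, repeat=len(words)))
best = max(candidates, key=lambda c: probability(c, words))
print(best)
```

With these illustrative weights, the most probable labeling marks the two capitalized tokens as NAME and the remaining tokens as O. Practical implementations replace the enumeration with dynamic programming over the label chain, since the number of label sequences grows exponentially with the sequence length.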
McCallum (2003) applies CRFs with feature selection to named entity recognition and re...