they do not take into
account the predicted labels of the surrounding words. This can be done using
probabilistic models of sequences of labels and features. Frequently used is the
hidden Markov model (HMM), which is based on the conditional distribution
of the current label $L^{(j)}$ given the previous label $L^{(j-1)}$ and the distribution of the
current word $t^{(j)}$ given the current and the previous labels $L^{(j)}$ and $L^{(j-1)}$:

$$L^{(j)} \sim p\bigl(L^{(j)} \mid L^{(j-1)}\bigr), \qquad t^{(j)} \sim p\bigl(t^{(j)} \mid L^{(j)}, L^{(j-1)}\bigr) \tag{18}$$

A training set of words and their correct labels is required. For the observed
words the algorithm considers all possible sequences of labels and
computes their probabilities. An efficient dynamic-programming method that exploits the sequential structure to find the most probable label sequence is the Viterbi algorithm (Rabiner 1989). Hidden Markov models were successfully used for named entity extraction, e.g. in the IdentiFinder
system (Bikel et al. 1999).
3.3.3 Conditional Random Fields

Hidden Markov models require the conditional independence of features of
different words given the labels. This is quite restrictive, as we would like to
include features which correspond to several words simultaneously. A recent