
(continued from a previous slide, cut off at the start of this preview)

    c(y', y) = \sum_{j: y_{j-1} = y', y_j = y} p(y_{j-1}, y_j | x_1 ... x_m)

- We can do this with the forward-backward algorithm!

Forward, Backward, Again…

- Decompose the marginal at each position: p(x_1 ... x_n, y_i) = p(x_1 ... x_i, y_i) \cdot p(x_{i+1} ... x_n | y_i)
- Sum over all paths, on both sides of each y_i:

    \alpha(i, y_i) = p(x_1 ... x_i, y_i)
                   = \sum_{y_1 ... y_{i-1}} p(x_1 ... x_i, y_1 ... y_i)
                   = \sum_{y_{i-1}} e(x_i | y_i) \, q(y_i | y_{i-1}) \, \alpha(i-1, y_{i-1})

    \beta(i, y_i) = p(x_{i+1} ... x_n | y_i)
                  = \sum_{y_{i+1} ... y_n} p(x_{i+1} ... x_n, y_{i+1} ... y_n | y_i)
                  = \sum_{y_{i+1}} e(x_{i+1} | y_{i+1}) \, q(y_{i+1} | y_i) \, \beta(i+1, y_{i+1})

  (A Python sketch of these recursions appears after these notes.)

EM for HMMs

- For t = 1..T:
  1) [E-step] Calculate posteriors (soft completions) under the previous model p^{t-1} for each training example, and accumulate expected counts:

       c^t(y)     = \sum_{j: y_j = y} p^{t-1}(y_j | x_1 ... x_m)
       c^t(y, x)  = \sum_{j: x_j = x, y_j = y} p^{t-1}(y_j | x_1 ... x_m)
       c^t(y', y) = \sum_{j: y_{j-1} = y', y_j = y} p^{t-1}(y_{j-1}, y_j | x_1 ... x_m)

  2) [M-step] Compute maximum likelihood estimates, given the counts:

       q^t(y_i | y_{i-1}) = c^t(y_{i-1}, y_i) / c^t(y_{i-1})
       e^t(x | y) = c^t(y, x) / c^t(y)

- There is a different HMM for each iteration t:

       p^t(x_1 ... x_n, y_1 ... y_n) = q^t(STOP | y_n) \prod_{i=1}^{n} q^t(y_i | y_{i-1}) \, e^t(x_i | y_i)

  (A sketch of one EM iteration also appears after these notes.)

Unsupervised Learning Results

- EM for HMM
  - POS accuracy: 74.7%
- Bayesian HMM learning [Goldwater, Griffiths 07]
  - Significant effort in specifying prior distributions
  - Integrate out the parameters e(x|y) and t(y'|y)
  - POS accuracy: 86.8%
- Unsupervised, feature-rich models [Smith, Eisner 05]
  - Challenge: represent p(x, y) as a log-linear model, which requires normalizing over all possible sentences x
  - Smith presents a very clever approximation based on local neighborhoods of x
  - POS accuracy: 90.1%
- Newer, feature-rich methods do better, but are not near the supervised state of the art

Semi-supervised Learning

- AKA bootstrapping, self-training, etc.
- Task: learn from two types of data
  - Tagged sentences
  - Raw / unlabeled sentences
- Output: a complete POS tagger
- What should we do?
  - Use the labeled data to initialize EM?
  - Sum the counts (real and expected) together?
  - Something fancier?

Merialdo: Setup

- Some initial results [Merialdo 94]
- Setup:
  - You know the set of possible tags for each word
  - You have k fully labeled training examples
  - Estimate e(x|y) and t(y'|y) on this data
  - Use the supervised model to initialize the EM algorithm, and run it on all of the data
- Question: will this work?

Merialdo: Results

[Results figure not included in this preview]

Co-Training / Self-Training

- A simple approach that often (but not always) works…
- Repeat:
  - Learn N independent classifiers on the supervised data
  - Use each classifier to tag new, unlabeled data
  - Select a subset of the unlabeled data (where the models agree and are most confident) and add it to the labeled data (with au… [the preview cuts off here]

  (A schematic sketch of this loop appears after these notes.)
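The forward-backward recursions above can be written down directly. Below is a minimal Python sketch, not from the original slides: it assumes transition probabilities are stored in a dict q keyed by (previous tag, tag), emission probabilities in a dict e keyed by (word, tag), with '*' as the start symbol and 'STOP' as an optional stop symbol.

```python
def forward_backward(sentence, tags, q, e):
    """Compute alpha(i, y) = p(x_1..x_i, y_i=y) and beta(i, y) = p(x_{i+1}..x_n | y_i=y).

    q[(y_prev, y)] and e[(word, y)] are illustrative parameter tables,
    not part of the original slides; '*' is the start symbol.
    """
    n = len(sentence)
    alpha = [{} for _ in range(n)]
    beta = [{} for _ in range(n)]

    # Forward pass: alpha(i, y) = e(x_i | y) * sum_{y'} q(y | y') * alpha(i-1, y')
    for y in tags:
        alpha[0][y] = e.get((sentence[0], y), 0.0) * q.get(('*', y), 0.0)
    for i in range(1, n):
        for y in tags:
            total = sum(alpha[i - 1][yp] * q.get((yp, y), 0.0) for yp in tags)
            alpha[i][y] = e.get((sentence[i], y), 0.0) * total

    # Backward pass: beta(i, y) = sum_{y''} e(x_{i+1} | y'') * q(y'' | y) * beta(i+1, y'')
    # The base case folds in q(STOP | y) when a stop symbol is modelled.
    for y in tags:
        beta[n - 1][y] = q.get((y, 'STOP'), 1.0)
    for i in range(n - 2, -1, -1):
        for y in tags:
            beta[i][y] = sum(e.get((sentence[i + 1], yn), 0.0)
                             * q.get((y, yn), 0.0) * beta[i + 1][yn]
                             for yn in tags)
    return alpha, beta
```

The tag marginal needed by the E-step then follows as p(y_i = y | x_1 ... x_n) = \alpha(i, y) \beta(i, y) / p(x_1 ... x_n).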
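Building on that sketch, one EM iteration (E-step expected counts, M-step relative-frequency re-estimates) could look roughly as follows. The dictionary representation, the '*'/'STOP' handling, and the function names are assumptions for illustration, not taken from the slides.

```python
from collections import defaultdict

def em_step(sentences, tags, q, e):
    """One EM iteration for the HMM tagger, using forward_backward from the sketch above."""
    c_y = defaultdict(float)    # c(y): expected tag counts
    c_yx = defaultdict(float)   # c(y, x): expected emission counts
    c_yy = defaultdict(float)   # c(y', y): expected transition counts

    # E-step: accumulate soft counts from posterior marginals.
    for x in sentences:
        n = len(x)
        alpha, beta = forward_backward(x, tags, q, e)
        Z = sum(alpha[n - 1][y] * beta[n - 1][y] for y in tags)  # p(x)
        for i in range(n):
            for y in tags:
                gamma = alpha[i][y] * beta[i][y] / Z             # p(y_i = y | x)
                c_y[y] += gamma
                c_yx[(y, x[i])] += gamma
                if i == 0:
                    c_yy[('*', y)] += gamma                      # start transition
                    c_y['*'] += gamma
                if i == n - 1:
                    c_yy[(y, 'STOP')] += gamma                   # stop transition
            if i > 0:
                for yp in tags:
                    for y in tags:
                        # p(y_{i-1} = y', y_i = y | x)
                        xi = (alpha[i - 1][yp] * q.get((yp, y), 0.0)
                              * e.get((x[i], y), 0.0) * beta[i][y]) / Z
                        c_yy[(yp, y)] += xi

    # M-step: maximum likelihood (relative frequency) re-estimates.
    new_q = {(yp, y): c / c_y[yp] for (yp, y), c in c_yy.items() if c_y[yp] > 0}
    new_e = {(w, y): c / c_y[y] for (y, w), c in c_yx.items() if c_y[y] > 0}
    return new_q, new_e
```

Running t = 1..T is then just repeated calls, q, e = em_step(raw_sentences, tags, q, e); in the Merialdo-style semi-supervised setup, q and e would first be estimated from the k labeled examples and used to initialize this loop.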
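Finally, the co-training / self-training loop from the last slide can be sketched schematically. Here train and tag_with_confidence are hypothetical helpers supplied by the caller (they are not defined in the slides): train(data) returns a tagger, and tag_with_confidence(tagger, sentence) returns a (tag sequence, confidence) pair.

```python
def self_train(labeled, unlabeled, train, tag_with_confidence,
               n_classifiers=2, threshold=0.9, rounds=5):
    """Schematic self-training / co-training loop; the threshold and round count are illustrative."""
    labeled = list(labeled)      # list of (sentence, tag_sequence) pairs
    pool = list(unlabeled)       # raw sentences
    for _ in range(rounds):
        # Learn N independent classifiers on the current labeled data.
        taggers = [train(labeled) for _ in range(n_classifiers)]

        newly_labeled, still_unlabeled = [], []
        for sent in pool:
            outputs = [tag_with_confidence(t, sent) for t in taggers]
            tag_seqs = [ts for ts, _ in outputs]
            min_conf = min(conf for _, conf in outputs)
            # Keep sentences where all models agree and are most confident.
            if all(ts == tag_seqs[0] for ts in tag_seqs) and min_conf >= threshold:
                newly_labeled.append((sent, tag_seqs[0]))
            else:
                still_unlabeled.append(sent)

        if not newly_labeled:    # nothing confident enough was added; stop early
            break
        labeled.extend(newly_labeled)
        pool = still_unlabeled
    return train(labeled)
```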