To get P(F | E), we sum over all alignments:

$$P(F \mid E) = \sum_{A} P(F, A \mid E) = \sum_{A} \frac{P(J \mid E)}{(I+1)^{J}} \prod_{j=1}^{J} t(f_j, e_{a_j})$$

Dan Jurafsky

Decoding for IBM Model 1

•  Goal is to find the most probable alignment given a parameterized model.

$$\begin{aligned}
\hat{A} &= \operatorname*{argmax}_{A} P(F, A \mid E) \\
        &= \operatorname*{argmax}_{A} \frac{P(J \mid E)}{(I+1)^{J}} \prod_{j=1}^{J} t(f_j, e_{a_j}) \\
        &= \operatorname*{argmax}_{A} \prod_{j=1}^{J} t(f_j, e_{a_j})
\end{aligned}$$

The factor P(J | E) / (I + 1)^J is the same for every alignment, so it drops out of the argmax. Since the translation choice for each position j is independent, the product is maximized by maximizing each term:

$$a_j = \operatorname*{argmax}_{0 \le i \le I} t(f_j, e_i), \qquad 1 \le j \le J$$

Machine Translation
Alignment and IBM Model 1

Machine Translation
Learning Word Alignments in IBM Model 1

Word Alignment

•  Given a pair of sentences (one English, one French)
•  Learn which English words align to which French words
•  Method: IBM Model 1
•  An iterative unsupervised algorithm
•  The EM (Expectation-Maximization) algorithm

EM for training alignment probabilities

Kevin Knight’s example

… la maison … la maison bleue … la fleur …
… the house … the blue house … the flower …

Initial stage:
•  All word alignments equally likely
•  All P(french-word | english-word) equally likely

“la” and “the” are observed to co-occur frequently, so P(la | the) is increased.
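The initial stage and the first update described above can be sketched in Python. This is a minimal illustration rather than the lecture's implementation: the names CORPUS, uniform_table, and em_iteration are hypothetical, and the NULL word (i = 0) is left out to keep the toy example small.

```python
from collections import defaultdict

# Toy parallel corpus from the example: (French words, English words).
CORPUS = [
    ("la maison".split(), "the house".split()),
    ("la maison bleue".split(), "the blue house".split()),
    ("la fleur".split(), "the flower".split()),
]


def uniform_table(corpus):
    """Initial stage: every t(f | e) gets the same value, 1 / |French vocab|."""
    french = {f for fs, _ in corpus for f in fs}
    english = {e for _, es in corpus for e in es}
    return {(f, e): 1.0 / len(french) for f in french for e in english}


def em_iteration(corpus, t):
    """One EM pass (assumption: no NULL word).

    E-step: spread each French word over the English words in its sentence,
    in proportion to the current t(f | e).
    M-step: renormalize the expected counts for each English word.
    """
    count = defaultdict(float)  # expected number of (f, e) links
    total = defaultdict(float)  # expected number of links leaving e
    for fs, es in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / norm
                count[(f, e)] += p
                total[e] += p
    return {(f, e): count[(f, e)] / total[e] for (f, e) in t}


t0 = uniform_table(CORPUS)
t1 = em_iteration(CORPUS, t0)
print("t(la | the) before:", t0[("la", "the")])
print("t(la | the) after one iteration:", t1[("la", "the")])
```

Under these assumptions, one pass moves t(la | the) from 1/4 to 4/9, because “la” appears alongside “the” in all three sentence pairs.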
“house” co-occurs with both “la” and “maison”,
•  but P(maison | house) can be raised without limit, to 1.0
•  while P(la | house) is limited because of “the”
•  (pigeonhole principle)

After another iteration, the probabilities settle down.
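Once the probabilities have settled, the decoding rule for IBM Model 1, a_j = argmax over 0 ≤ i ≤ I of t(f_j, e_i), can be applied one French position at a time. The sketch below assumes a hypothetical hand-filled table T whose values are illustrative only (they are not results from the lecture), with NULL standing in for e_0. The brute-force function is included only to check that maximizing each term separately gives the same alignment as scoring all (I + 1)^J alignments.

```python
from itertools import product
from math import prod

NULL = "NULL"  # e_0, the empty English word (index 0)

# Hypothetical translation table t(f | e); the numbers are made up.
T = {
    ("la", NULL): 0.2, ("la", "the"): 0.7, ("la", "house"): 0.05,
    ("maison", NULL): 0.1, ("maison", "the"): 0.1, ("maison", "house"): 0.8,
}


def t(f, e):
    """Look up t(f | e), treating unseen pairs as probability 0."""
    return T.get((f, e), 0.0)


def decode(french, english):
    """For each position j, pick a_j = argmax_i t(f_j, e_i) with i in 0..I."""
    e = [NULL] + english
    return [max(range(len(e)), key=lambda i: t(f, e[i])) for f in french]


def decode_brute_force(french, english):
    """Score every one of the (I + 1)^J alignments; for checking only.

    The constant P(J | E) / (I + 1)^J is omitted because it does not
    change which alignment wins.
    """
    e = [NULL] + english

    def score(a):
        return prod(t(f, e[i]) for f, i in zip(french, a))

    return list(max(product(range(len(e)), repeat=len(french)), key=score))


print(decode(["la", "maison"], ["the", "house"]))  # prints [1, 2]
```

Here index 1 is “the” and index 2 is “house”, so “la” aligns to “the” and “maison” to “house”. The per-position version does (I + 1) · J lookups, while the brute-force check grows as (I + 1)^J.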
