… [Knight 1999]
•  Two standard ways to make the search more efficient:
   •  Pruning the search space
   •  Recombining similar hypotheses

Machine Translation
Phrase-Based Translation

Machine Translation
Evaluating MT: BLEU

Dan Jurafsky

Evaluating MT: Using human evaluators
•  Fluency: How intelligible, clear, readable, or natural in the target language is the translation?
•  Fidelity: Does the translation have the same meaning as the source?
   •  Adequacy: Does the translation convey the same information as the source?
      •  Bilingual judges are given the source and target language and assign a score.
      •  Monolingual judges are given a reference translation and the MT result.
   •  Informativeness: Does the translation convey enough information from the source to perform a task?
      •  What % of questions about the source sentence can monolingual judges answer correctly, given only the translation?

Automatic Evaluation of MT
George A. Miller and J. G. Beebe-Center. 1958. Some Psychological Methods for Evaluating the Quality of Translations. Mechanical Translation 3:73–80.
•  Human evaluation is expensive and very slow.
•  We need an evaluation metric that takes seconds, not months.
•  Intuition: MT is good if it looks like a human translation.
   1.  Collect one or more human reference translations of the source.
   2.  Score the MT output based on its similarity to the reference translations.
•  Example metrics:
   •  BLEU
   •  NIST
   •  TER
   •  METEOR

BLEU (Bilingual Evaluation Understudy)
Kishore Papineni, Salim Roukos, Todd Ward and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. Proceedings of ACL 2002.
•  "n-gram precision"
   •  Ratio of correct n-grams to the total number of output n-grams
   •  Correct: Number of n-grams (unigram, bigram, etc.) the MT output shares with the reference translations.
   •  Total: Number of…
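The "n-gram precision" idea behind BLEU can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace-tokenized input; the helper names `ngrams` and `ngram_precision` are hypothetical, not part of any BLEU toolkit, and the sketch computes only one n-gram order, not the full BLEU score. Following Papineni et al. (2002), each candidate n-gram's count is clipped to the maximum number of times that n-gram occurs in any single reference, so repeating a matched word cannot inflate the score.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a list of tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, references, n):
    """Clipped n-gram precision of a tokenized candidate against tokenized references.

    Hypothetical helper for illustration. Returns (correct, total, precision),
    where "correct" is the clipped count of candidate n-grams found in the
    references and "total" is the number of n-grams in the candidate.
    """
    cand_counts = Counter(ngrams(candidate, n))
    total = sum(cand_counts.values())
    if total == 0:
        # Candidate is shorter than n tokens: no n-grams to score.
        return 0, 0, 0.0

    # For each n-gram, the largest count it reaches in any one reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    # Clip each candidate count by that maximum before summing.
    correct = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return correct, total, correct / total


if __name__ == "__main__":
    cand = "the cat sat on the mat".split()
    refs = ["the cat is on the mat".split()]
    print(ngram_precision(cand, refs, 1))  # unigram precision
    print(ngram_precision(cand, refs, 2))  # bigram precision
```

For the candidate "the cat sat on the mat" against the reference "the cat is on the mat", 5 of the 6 candidate unigrams match (only "sat" does not), and 3 of the 5 bigrams match. Without clipping, the degenerate candidate "the the the the the the the" would score 7/7 on unigrams against that reference; with clipping it scores 2/7, because "the" occurs at most twice in the reference.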
