spelling

…ce 2
•  Also allow insertion of space or hyphen
   •  thisidea → this idea
   •  inlaw → in-law

Dan Jurafsky

Language Model
•  Use any of the language modeling algorithms we've learned
•  Unigram, bigram, trigram
•  Web-scale spelling correction
   •  Stupid backoff

Unigram Prior probability
Counts from 404,253,213 words in the Corpus of Contemporary American English (COCA)

word      Frequency of word   P(word)
actress   9,321               .0000230573
cress     220                 .0000005442
caress    686                 .0000016969
access    37,038              .0000916207
across    120,844             .0002989314
acres     12,874              .0000318463

Channel model probability
•  Error model probability, Edit probability
•  Kernighan, Church, Gale 1990
•  Misspelled word x = x1, x2, x3… xm
•  Correct word w = w1, w2, w3,…, wn
•  P(x|w) = probability of the edit
   •  (deletion/insertion/substitution/transposition)

Computing error probability: confusion matrix...
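
The unigram prior values listed above appear to be each word's count divided by the corpus size. The following is a minimal Python sketch of that calculation, assuming a plain maximum-likelihood estimate with no smoothing. The names CORPUS_SIZE, COUNTS, and unigram_prior are illustrative and do not come from the slides.

```python
# Corpus size and word counts are copied from the COCA figures above.
# All identifiers here are illustrative, not taken from the slides.
CORPUS_SIZE = 404_253_213

COUNTS = {
    "actress": 9_321,
    "cress": 220,
    "caress": 686,
    "access": 37_038,
    "across": 120_844,
    "acres": 12_874,
}


def unigram_prior(word: str) -> float:
    """Assumed maximum-likelihood estimate: count(word) / corpus size."""
    return COUNTS[word] / CORPUS_SIZE


if __name__ == "__main__":
    # Print each candidate with its count and estimated prior.
    for word, count in COUNTS.items():
        print(f"{word:<8} {count:>8,} {unigram_prior(word):.10f}")
```

Under this assumption the results agree with the P(word) column to within the precision shown. Ranking candidates by this prior alone would favor "across", which is why the prior is combined with the channel model probability P(x|w).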

This document was uploaded on 02/14/2014.
