LanguageModels.Spring2011

1 Language models

Probabilities also enter into the very important activity of ranking the results that search engines retrieve. This ranking is done using a kind of computation that has a number of different theoretical justifications, and one of the most interesting of them is called language modeling. So let's look at how language modeling works.

Language modeling proposes a ridiculous model of how web pages get to be written. In this model the author of the page proceeds in two steps: first, he or she picks out a topic; second, he or she makes use of a vocabulary, or model of language, for that topic. A model of language is somewhat different from a simple vocabulary. A model is a table or list that contains all the words a person would use when writing about that topic, each paired with a probability: the probability (related, of course, to the actual observed frequency) that this particular word will show up when a person is writing about that particular topic.

So far so good. The next step of language modeling is completely ridiculous. It says that the author sits down to write, and the way he or she writes is to first randomly pick a word from the vocabulary, according to the probabilities of that vocabulary, and write it down. Then the author randomly picks the next word and writes it down, and so forth, continuing to select words at random from the vocabulary until he or she has written as much as intended.

Now, we know that this is not the way real pages are written. If they were written this way, nearly all of them would look like nonsense, because the words would be in random order. Nonetheless, thousands of experiments, and the commercial success of the great search engine companies, have shown that thinking about things in this way actually does lead to better searching. Let's see how that works.
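The random writing process described above can be sketched in a few lines of Python. This is only an illustration: the four words and their probabilities in `TOPIC_MODEL`, and the function name `write_page`, are made up for the example and do not come from the notes.

```python
import random

# Hypothetical language model for one topic: each word is paired with
# the probability that it shows up. The probabilities sum to 1.
TOPIC_MODEL = {"the": 0.4, "dog": 0.3, "barks": 0.2, "loudly": 0.1}


def write_page(model, length, seed=None):
    """Write `length` words by drawing each one independently from `model`."""
    rng = random.Random(seed)  # a seed makes the random draws repeatable
    words = list(model)
    weights = [model[w] for w in words]
    # Each word is picked according to its probability, with no regard
    # for the words that came before it.
    return rng.choices(words, weights=weights, k=length)


page = write_page(TOPIC_MODEL, length=8, seed=0)
print(" ".join(page))
```

Because each word is drawn without looking at its neighbors, the output is usually word salad, which is exactly the objection raised above.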
To do this, we are going to take two different language models describing the same very simple language, a language with only four different words. We are then going to calculate the probability of writing a particular sentence (because we don't have time to look at an entire page's worth of words) according to each of the two models. Comparing those two probabilities helps us estimate the odds that the sentence was written from one model rather than from the other.
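A minimal sketch of that comparison follows. The two models, their probabilities, and the example sentence are assumptions chosen for illustration, not the values used in the notes. Since each word is picked independently, the probability of a sentence under a model is the product of the probabilities of its words.

```python
import math

# Two hypothetical models of the same four-word language.
# The probabilities are invented for this example; each model sums to 1.
MODEL_A = {"the": 0.4, "dog": 0.3, "barks": 0.2, "loudly": 0.1}
MODEL_B = {"the": 0.4, "dog": 0.1, "barks": 0.1, "loudly": 0.4}


def sentence_probability(model, sentence):
    """Probability of writing `sentence` when every word is drawn independently."""
    return math.prod(model[word] for word in sentence.split())


sentence = "the dog barks"
p_a = sentence_probability(MODEL_A, sentence)
p_b = sentence_probability(MODEL_B, sentence)

# A ratio above 1 favors model A; a ratio below 1 favors model B.
ratio = p_a / p_b
print(p_a, p_b, ratio)
```

With these made-up numbers the sentence comes out about six times as probable under model A as under model B, so model A is the better guess for where the sentence came from.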

This note was uploaded on 02/20/2012 for the course 790 373 taught by Professor Boros during the Fall '09 term at Rutgers.