# LanguageModels.Spring2011

## 1 Language models

Probabilities also enter into the very important activity of ranking the results retrieved by search engines. This ranking is done using a kind of computation that has a number of different theoretical justifications, but one of the most interesting of them is something called language modeling. So let's look at how language modeling works.

Language modeling proposes a ridiculous model of how web pages get to be written. In this model the author of the page proceeds in two steps: first, he or she picks out a topic; second, he or she makes use of a vocabulary, or model of language, for that topic. A model of language is somewhat different from a simple vocabulary. A model is a table or list that contains all the words a person would use when writing about that topic, each paired with a probability: the probability (related, of course, to the actual observed frequency) that this particular word will show up when a person is writing about that topic.

So far so good. The next step of language modeling is completely ridiculous. It says that the author sits down to write, and the way he writes is to first randomly pick a word from the vocabulary, according to the probabilities of that vocabulary, and write it down. Then he randomly picks the next word and writes it down, and so forth, continuing to randomly select words from the vocabulary until he has written as much as he intended to.

Now, we know that this is not the way real pages are written. If they were written this way, nearly all of them would look like nonsense, because the words would be in random order. Nonetheless, thousands of experiments, and the commercial success of the great search engine companies, have shown that thinking about things in this way actually does lead to better searching. Let's see how that works.
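To make the "ridiculous" writing process concrete, here is a minimal Python sketch of it. The four words and their probabilities are invented for illustration and are not taken from these notes; `TOPIC_MODEL` and `write_page` are hypothetical names.

```python
import random

# Hypothetical topic model: each word the author might use, with its probability.
# The words and numbers are made up for illustration only.
TOPIC_MODEL = {"the": 0.4, "dog": 0.3, "barks": 0.2, "loudly": 0.1}


def write_page(model, length, rng):
    """Pick `length` words independently, each according to the model's probabilities."""
    words = list(model)
    weights = [model[word] for word in words]
    return rng.choices(words, weights=weights, k=length)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
page = write_page(TOPIC_MODEL, 10, rng)
print(" ".join(page))
```

Because every word is drawn without regard to the words before it, the output will usually read like nonsense, which is exactly the objection raised above.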
The way we are going to do this is to take two different language models describing the same very simple language, a language with only four different words. We are then going to calculate the probability of writing a particular sentence (because we don't have time to look at an entire page's worth of words) according to each of the two models. Comparing those two probabilities helps us estimate the odds that the sentence was written from one model rather than from the other.
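The comparison just described can be sketched in Python as follows. Since each word is picked independently, the probability of a sentence under a model is the product of its word probabilities. The two four-word models and the sentence below are hypothetical stand-ins, not the ones used in the notes, and reading the ratio as odds assumes that neither model is favored beforehand.

```python
import math

# Two hypothetical models over the same four-word language (numbers invented).
MODEL_A = {"the": 0.4, "dog": 0.3, "barks": 0.2, "loudly": 0.1}
MODEL_B = {"the": 0.4, "dog": 0.1, "barks": 0.1, "loudly": 0.4}


def sentence_probability(sentence, model):
    """Multiply the model's probabilities of the words, since each is drawn independently."""
    return math.prod(model[word] for word in sentence.split())


sentence = "the dog barks"
p_a = sentence_probability(sentence, MODEL_A)  # 0.4 * 0.3 * 0.2
p_b = sentence_probability(sentence, MODEL_B)  # 0.4 * 0.1 * 0.1
ratio = p_a / p_b  # how many times more probable the sentence is under A than under B

print(f"P(sentence | A) = {p_a:.4f}")
print(f"P(sentence | B) = {p_b:.4f}")
print(f"ratio A : B     = {ratio:.1f}")
```

With these made-up numbers the sentence is about six times more probable under model A, so if both models were equally plausible to begin with, the odds favor A by roughly six to one.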
