Document Language Models, Query Models, and Risk Minimization for Information Retrieval

John Lafferty
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Chengxiang Zhai
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

ABSTRACT

We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The framework suggests an operational retrieval model that extends recent developments in the language modeling approach to information retrieval. A language model for each document is estimated, as well as a language model for each query, and the retrieval problem is cast in terms of risk minimization. The query language model can be exploited to model user preferences, the context of a query, synonymy, and word senses. While recent work has incorporated word translation models for this purpose, we introduce a new method using Markov chains defined on a set of documents to estimate the query models. The Markov chain method has connections to algorithms from link analysis and social networks. The new approach is evaluated on TREC collections and compared to the basic language modeling approach and vector space models together with query expansion using Rocchio. Significant improvements are obtained over standard query expansion methods for strong baseline TF-IDF systems, with the greatest improvements attained for short queries on Web data.

1. INTRODUCTION

The language modeling approach to information retrieval has recently been proposed as a new alternative to traditional vector space models and other probabilistic models. In the use of language modeling by Ponte and Croft [17], a unigram language model is estimated for each document, and the likelihood of the query according to this model is used to score the document for ranking. Miller et al. [15] smooth the document language model with a background model using hidden Markov model techniques, and demonstrate good performance on TREC benchmarks. Berger and Lafferty [1] use methods from statistical machine translation to incorporate synonymy into the document language model, achieving effects similar to query expansion in more standard approaches to IR. The relative simplicity and effectiveness of the language modeling approach, together with the fact that it leverages statistical methods that have been developed in speech recognition and other areas, makes it an

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGIR’01, September 9-12, 2001, New Orleans, Louisiana, USA
Copyright 2001 ACM 1-58113-331-6/01/0009 ... $5.00.
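
The query-likelihood ranking outlined in the introduction can be illustrated with a minimal sketch. It estimates a maximum-likelihood unigram model per document, mixes it with a background model built from the whole collection, and ranks documents by the log-likelihood of the query. The linear interpolation with a fixed weight `lam` is an assumption chosen for simplicity; it stands in for, and is not, the hidden Markov model smoothing of [15] or the exact estimator of [17]. The function names and the toy collection are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def query_log_likelihood(query, doc_model, background_model, lam=0.5):
    """Log-likelihood of the query under a smoothed document model.

    Assumption: smoothing is a fixed linear mix of the document model
    and the background model, with weight `lam` on the document.
    """
    score = 0.0
    for word in query:
        p = (lam * doc_model.get(word, 0.0)
             + (1.0 - lam) * background_model.get(word, 0.0))
        if p == 0.0:
            # Word unseen in both models: the query has zero likelihood.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return document names sorted by descending query log-likelihood."""
    # Background model: unigram model over the whole collection.
    background = unigram_model(
        [word for tokens in docs.values() for word in tokens]
    )
    scores = {
        name: query_log_likelihood(query, unigram_model(tokens), background, lam)
        for name, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": ["language", "model", "retrieval"],
    "d2": ["markov", "chain", "link", "analysis"],
}
ranking = rank(["language", "retrieval"], docs)
```

In this toy collection the query words occur only in `d1`, so `d1` ranks first; `d2` still receives a finite score because the background model assigns the query words nonzero probability.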