Query Chains: Learning to Rank from Implicit Feedback

Filip Radlinski
Department of Computer Science, Cornell University, Ithaca, NY, USA
email@example.com

Thorsten Joachims
Department of Computer Science, Cornell University, Ithaca, NY, USA
firstname.lastname@example.org

ABSTRACT

This paper presents a novel approach for using clickthrough data to learn ranked retrieval functions for web search results. We observe that users searching the web often perform a sequence, or chain, of queries with a similar information need. Using query chains, we generate new types of preference judgments from search engine logs, thus taking advantage of user intelligence in reformulating queries. To validate our method, we perform a controlled user study comparing generated preference judgments to explicit relevance judgments. We also implemented a real-world search engine to test our approach, using a modified ranking SVM to learn an improved ranking function from preference data. Our results demonstrate significant improvements in the ranking given by the search engine. The learned rankings outperform both a static ranking function and one trained without considering query chains.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms
Algorithms, Experimentation, Measurement

Keywords
Search Engines, Implicit Feedback, Machine Learning, Support Vector Machines, Clickthrough Data

1. INTRODUCTION

Designing effective ranking functions for free-text retrieval has proved notoriously difficult. Retrieval functions designed for one collection and application often do not work well on other collections without additional time-consuming modifications. This has led to interest in using machine learning methods for automatically learning ranked retrieval functions.
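To illustrate the learning setup the abstract refers to, the sketch below shows the standard pairwise reduction behind ranking-SVM-style methods: each preference judgment "document a is preferred over document b for query q" becomes a constraint that the ranking score of a exceed that of b, i.e. w . (phi(q,a) - phi(q,b)) > 0. This is a minimal illustration, not the paper's modified ranking SVM; a simple perceptron-style update stands in for the SVM optimizer, and all function names and feature vectors are hypothetical.

```python
# Illustrative sketch of learning a linear ranking function from pairwise
# preference data. NOT the paper's modified ranking SVM: a perceptron-style
# update replaces the SVM's margin-based optimization. Feature vectors are
# plain lists of floats; all names here are hypothetical.

def train_pairwise(preference_pairs, n_features, epochs=20, lr=0.1):
    """preference_pairs: list of (better_vec, worse_vec), where better_vec
    is the joint query-document feature vector of the preferred document.
    Returns a weight vector w such that score(w, better) > score(w, worse)
    for as many pairs as the simple updates manage to satisfy."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in preference_pairs:
            diff = [b - c for b, c in zip(better, worse)]
            # Constraint w . diff > 0 violated: nudge w toward satisfying it.
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def score(w, vec):
    """Ranking score of a document: linear in its feature vector."""
    return sum(wi * vi for wi, vi in zip(w, vec))
```

At retrieval time, documents are simply sorted by `score`; the learning problem is thus reduced to finding a single weight vector consistent with the observed preferences.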
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD'05, August 21-24, 2005, Chicago, Illinois, USA.
Copyright 2005 ACM 1-59593-135-X/05/0008 ...$5.00.

For this learning task, training data can be collected in two ways. One approach relies on actively soliciting training data by recording user queries and then asking users to explicitly provide relevance judgments on retrieved documents (such as [7, 13, 22]). Few users are willing to do this, making significant amounts of such data difficult to obtain. An alternative approach is to extract implicit relevance feedback from search engine log files (such as in [6, 15]). This allows virtually unlimited data to be collected at very low cost, although interpretation is more complex. ...
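One common way to turn such log files into training data, which gives a concrete sense of the implicit-feedback approach described above, is the "clicked > skipped above" heuristic: a clicked result is taken as preferred over every higher-ranked result the user skipped. The sketch below is an assumed, simplified version of this idea, not the query-chain strategies the paper goes on to develop; the function name and data layout are hypothetical.

```python
# Illustrative sketch (not the paper's query-chain method): deriving pairwise
# preference judgments from a single result page in a click log, using the
# "clicked > skipped above" heuristic. All names here are hypothetical.

def preferences_from_clicks(ranking, clicked):
    """ranking: list of document ids in the order they were presented.
    clicked: set of document ids the user clicked on.
    Returns (preferred, over) pairs: each clicked document is preferred
    over every non-clicked document ranked above it."""
    prefs = []
    for i, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:i]:
                if skipped not in clicked:
                    prefs.append((doc, skipped))
    return prefs

# Example: the user saw results a, b, c, d and clicked only c, so c is
# inferred to be preferred over the skipped results a and b.
example = preferences_from_clicks(["a", "b", "c", "d"], {"c"})
# -> [("c", "a"), ("c", "b")]
```

Note that this interpretation is relative, not absolute: a click does not mean the document is relevant, only that it looked more promising than what the user passed over, which is one reason interpreting log data is more complex than using explicit judgments.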