lecture18-learning-ranking-handouts-6-per

CS276: Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan
Lecture 19: Learning to Rank

Final exam
- Tue Jun 7th, 12:15-3:15pm; Gates B01, B03
- Alternate final: Friday Jun 3rd, 12:15-3:15pm; Gates 260
- Open book/notes
- Devices ok provided they're not connected to other devices or the internet
- Calculator recommended

Machine learning for IR ranking? (Sec. 15.4)
- We've looked at methods for ranking documents in IR: cosine similarity, inverse document frequency, pivoted document length normalization, PageRank, ...
- How do we combine these signals into a good ranker?
- We've looked at methods for classifying documents using supervised machine learning classifiers: Naïve Bayes, Rocchio, kNN, SVMs
- Surely we can also use machine learning to figure out how to combine scoring/ranking signals?
- A.k.a. "machine-learned relevance" or "learning to rank"

Machine learning for IR ranking
- This "good idea" has been actively researched, and actively deployed by the major web search engines, in the last 5 years
- Why didn't it happen earlier?
  - Modern supervised ML has been around for about 15 years...
  - Naïve Bayes has been around for about 45 years...

Why weren't early attempts very successful/influential?
- Limited training data
  - Especially for real-world use (as opposed to writing academic papers), it was very hard to gather test collection queries and relevance judgments that are representative of real user needs and judgments on documents returned
  - This has changed, both in academia and industry
- Poor machine learning techniques
- Insufficient customization to the IR problem
- Not enough features for ML to show value
- The Web provided impetus with constantly evolving spam

Why wasn't ML much needed?
- Traditional ranking functions in IR used a very small number of features, e.g.:
  - Term frequency
  - Inverse document frequency
  - Document length
- It was easy to tune weighting coefficients by hand
- And people did (see the sketch below)
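To make the hand-tuning point concrete, here is a minimal Python sketch of such a traditional ranking function: per-term tf x idf, damped by a document-length factor whose coefficient someone sets by hand. The toy corpus, the function names, and the 0.75 default are illustrative assumptions, not anything specified in the lecture.

    import math
    from collections import Counter

    # Toy corpus, purely illustrative: doc_id -> text.
    DOCS = {
        1: "learning to rank combines many relevance signals",
        2: "classic ranking uses term frequency and document length",
        3: "rank rank rank",
    }
    TOKENS = {d: text.split() for d, text in DOCS.items()}
    N = len(DOCS)
    AVG_LEN = sum(len(toks) for toks in TOKENS.values()) / N

    def idf(term):
        # Smoothed inverse document frequency.
        df = sum(1 for toks in TOKENS.values() if term in toks)
        return math.log((N + 1) / (df + 1))

    def score(query, doc_id, length_weight=0.75):
        # Sum of tf * idf over query terms, damped by document length.
        # `length_weight` is exactly the kind of coefficient that used
        # to be tuned by hand rather than learned from data.
        toks = TOKENS[doc_id]
        tf = Counter(toks)
        length_norm = 1.0 - length_weight + length_weight * len(toks) / AVG_LEN
        return sum(tf[t] * idf(t) / length_norm for t in query.split())

    print(sorted(DOCS, key=lambda d: score("rank learning", d), reverse=True))

The point is that with only two or three such knobs, eyeballing a few result lists is enough to tune them by hand; with hundreds of features that stops being feasible, which is the gap the next slides motivate ML to fill.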
Why is ML needed now?
- Modern systems, especially on the Web, use a great number of features:
  - Arbitrary useful features, not a single unified model
  - Log frequency of query word in anchor text?
  - Query word in color on page?
  - # of images on page?
  - # of (out) links on page?
  - PageRank of page?
  - URL length?
  - URL contains "~"?
  - Page edit recency?
  - Page length?
- Major web search engines publicly state that they use "hundreds" of such features, and they keep changing (a sketch of such a feature record follows below)
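As an illustration of what a bundle of such arbitrary features looks like for one (query, document) pair, here is a hypothetical sketch using a handful of the signals listed above. The record layout, field names, and the toy document are assumptions for illustration only.

    from dataclasses import dataclass
    import math

    @dataclass
    class QueryDocFeatures:
        # A few of the "hundreds" of signals a web engine might compute
        # for one (query, document) pair. Names are illustrative only.
        log_anchor_tf: float   # log frequency of query word in anchor text
        query_in_title: int    # 0/1
        out_links: int         # number of outgoing links on the page
        pagerank: float
        url_length: int
        url_has_tilde: int     # 0/1

    def extract(query: str, doc: dict) -> QueryDocFeatures:
        # Toy extractor over a dict-shaped document record.
        anchor_tf = doc["anchor_text"].split().count(query)
        return QueryDocFeatures(
            log_anchor_tf=math.log(1 + anchor_tf),
            query_in_title=int(query in doc["title"].split()),
            out_links=doc["out_links"],
            pagerank=doc["pagerank"],
            url_length=len(doc["url"]),
            url_has_tilde=int("~" in doc["url"]),
        )

    doc = {"title": "learning to rank", "anchor_text": "rank rank pages",
           "out_links": 12, "pagerank": 0.004, "url": "http://example.edu/~u/ltr"}
    print(extract("rank", doc))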
Simple example
- Consider the presence of query terms in the Title (T) and the Body (B) of a document
- Boolean indicator (0/1) of whether the query term occurs in the Title (s_T) or Body (s_B); see the sketch below
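The preview of the handout stops mid-example, so the continuation below is an assumption in the spirit of Sec. 15.4: combine the two indicators linearly, score(d, q) = a*s_T(d, q) + b*s_B(d, q), and fit the weights to relevance-judged training examples. The training triples, the b = 1 - a simplification, the 0.5 decision threshold, and the grid search are all illustrative choices, not the slide's actual continuation.

    # Hypothetical relevance-judged training data: (s_T, s_B, relevant)
    # triples for query-document pairs. Entirely made up.
    train = [
        (1, 1, True), (1, 0, True), (1, 0, True), (0, 1, False),
        (0, 1, False), (0, 0, False), (1, 1, True), (0, 0, False),
    ]

    def errors(a, threshold=0.5):
        # Mistakes made by score = a*s_T + b*s_B with b fixed to 1 - a,
        # predicting "relevant" when the score clears the threshold.
        wrong = 0
        for s_t, s_b, rel in train:
            predicted = a * s_t + (1 - a) * s_b >= threshold
            wrong += predicted != rel
        return wrong

    # With a single free weight, exhaustive grid search stands in for a
    # real learning algorithm.
    best_a = min((a / 100 for a in range(101)), key=errors)
    print(best_a, errors(best_a))

On this made-up data the search settles on a > b, i.e. a title match counts for more than a body match; a real system would learn such weights, over far more features, from large sets of logged judgments.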