lecture18-learning-ranking-handouts-6-per

5/30/11

Introduction to Information Retrieval
CS276: Information Retrieval and Web Search
Pandu Nayak and Prabhakar Raghavan
Lecture 19: Learning to Rank

Final exam
  Tue Jun 7th from 12:15-3:15pm; Gates B01, B03
  Alternate final Friday Jun 3rd 12:15-3:15pm; Gates 260
  Open book/notes
  Devices ok provided they’re not connected to other devices, or the internet
  Calculator recommended

Machine learning for IR ranking?
  We’ve looked at methods for ranking documents in IR
    Cosine similarity, inverse document frequency, pivoted document length normalization, Pagerank, …
  How do we combine these signals into a good ranker?
  We’ve looked at methods for classifying documents using supervised machine learning classifiers
    Naïve Bayes, Rocchio, kNN, SVMs
  Surely we can also use machine learning to figure out how to combine scoring/ranking signals?
    A.k.a. “machine-learned relevance” or “learning to rank”
  Sec. 15.4

Machine learning for IR ranking
  This “good idea” has been actively researched – and actively deployed by the major web search engines – in the last 5 years
  Why didn’t it happen earlier?
    Modern supervised ML has been around for about 15 years…
    Naïve Bayes has been around for about 45 years…

Why weren’t early attempts very successful/influential?
  Limited training data
    Especially for real world use (as opposed to writing academic papers), it was very hard to gather test collection queries and relevance judgments that are representative of real user needs and judgments on documents returned
    This has changed, both in academia and industry
  Poor machine learning techniques
  Insufficient customization to IR problem
  Not enough features for ML to show value
  The Web provided impetus with constantly evolving spam

Why wasn’t ML much needed?
  Traditional ranking functions in IR used a very small number of features, e.g.,
    Term frequency
    Inverse document frequency
    Document length
  It was easy to tune weighting coefficients by hand
    And people did
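
To make the hand-tuning point concrete, here is a minimal sketch of a ranking function built from the three traditional features named above. The function name `hand_tuned_score`, the weight values, and the particular log scalings are hypothetical choices for illustration and are not taken from the lecture.

```python
import math

# Hypothetical hand-picked coefficients. With only three features,
# adjusting these by trial and error is feasible.
WEIGHTS = {"tf": 1.0, "idf": 0.5, "length": -0.1}


def hand_tuned_score(tf, df, n_docs, doc_len, avg_doc_len):
    """Score one (query term, document) pair as a hand-weighted sum.

    tf          -- occurrences of the term in the document
    df          -- number of documents containing the term
    n_docs      -- total number of documents in the collection
    doc_len     -- length of this document
    avg_doc_len -- average document length in the collection
    """
    # Log-scaled term frequency (assumed scaling, one common choice).
    log_tf = 1.0 + math.log10(tf) if tf > 0 else 0.0
    # Inverse document frequency.
    idf = math.log10(n_docs / df) if df > 0 else 0.0
    # Document length relative to the collection average.
    rel_len = doc_len / avg_doc_len
    return (WEIGHTS["tf"] * log_tf
            + WEIGHTS["idf"] * idf
            + WEIGHTS["length"] * rel_len)
```

With three coefficients, changing one weight and inspecting the resulting rankings is manageable; that stops being true once the feature count grows.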

Why is ML needed now
  Modern systems – especially on the Web – use a great number of features:
    Arbitrary useful features – not a single unified model
    Log frequency of query word in anchor text?
    Query word in color on page?
    # of images on page?
    # of (out) links on page?
    PageRank of page?
    URL length?
    URL contains “~”?
    Page edit recency?
    Page length?
  Major web search engines publicly state that they use “hundreds” of such features – and they keep changing

Simple example
  Consider the presence of query terms in the Title (T) and the Body (B) of a document
  Boolean indicator (0/1) of whether the query term occurs in the Title (s_T) or Body (s
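
The Title/Body indicators in the simple example can be sketched as follows. The preview cuts off before saying how the two indicators are used, so the weighted sum in `combined_score` and its weight `g` are assumptions for illustration only; the function names and the whitespace tokenization are also hypothetical.

```python
def zone_indicators(query_term, title, body):
    """Return (s_T, s_B): 0/1 flags for the term occurring in title and body."""
    term = query_term.lower()
    # Naive whitespace tokenization, assumed here for simplicity.
    s_t = int(term in title.lower().split())
    s_b = int(term in body.lower().split())
    return s_t, s_b


def combined_score(s_t, s_b, g=0.5):
    """Assumed combination: a weighted sum with a single weight g in [0, 1].

    This is one plausible way to merge the two indicators; the source
    text is truncated before it states the actual method.
    """
    return g * s_t + (1 - g) * s_b


s_t, s_b = zone_indicators("rank", "Learning to Rank", "machine learning for IR")
score = combined_score(s_t, s_b, g=0.7)
```

Under this assumption, a single parameter `g` controls how much a title match counts relative to a body match, which makes it a natural quantity to learn from relevance judgments instead of setting by hand.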