124.11.lec9 - CS 124/LINGUIST 180

Lecture 9: Information Retrieval II (Ranked Retrieval)
Dan Jurafsky, 1/10/09
Thanks to Chris Manning for these slides from his CS 276 Information Retrieval and Web Search class!

Slide from Chris Manning's 276 class

Outline: Ranked Retrieval
- The vector space model: tf-idf
  - tf: term frequency
  - idf: inverse document frequency
- Advanced issues:
  - Zone indices
  - Query term proximity
  - Computing cosine ranking efficiently
- Combining many different features for ranking: machine learning

Ranked retrieval
Thus far, our queries have all been Boolean: documents either match or don't.
- Good for expert users with a precise understanding of their needs and of the collection.
- Also good for applications, which can easily consume 1000s of results.
- Not good for the majority of users. Most users are incapable of writing Boolean queries (or they are capable, but think it's too much work).

Problem with Boolean search (Ch. 6)
Boolean queries often return either too few (= 0) or too many (1000s) results.
- Query 1: "standard user dlink 650" → 200,000 hits
- Query 2: "standard user dlink 650 no card found" → 0 hits
It takes a lot of skill to come up with a query that produces a manageable number of hits: AND gives too few; OR gives too many.
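A tiny example makes the AND/OR trade-off visible. This is a minimal, hypothetical sketch. The four-document collection, the whitespace tokenizer, and the names `build_index`, `boolean_and`, and `boolean_or` were invented for the illustration and do not come from the slides. The code builds an inverted index that maps each term to a set of document IDs. It then answers a query in two ways: by intersecting the postings (AND) or by taking their union (OR).

```python
# Hypothetical toy collection (illustrative only, not data from the slides).
DOCS = {
    1: "standard user dlink 650 driver download",
    2: "dlink 650 no card found error",
    3: "standard user account settings",
    4: "wireless card found but no signal",
}

def build_index(docs):
    """Map each term to the set of document IDs containing it (an inverted index)."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_and(index, terms):
    """Documents containing every query term: intersection of the postings sets."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(index, terms):
    """Documents containing at least one query term: union of the postings sets."""
    return set().union(*(index.get(t, set()) for t in terms))

index = build_index(DOCS)
query = "standard user dlink 650 no card found".split()
print(sorted(boolean_and(index, query)))  # []
print(sorted(boolean_or(index, query)))   # [1, 2, 3, 4]
```

On this toy collection, the long query matches nothing under AND and every document under OR. This is the same feast-or-famine pattern as the two dlink queries, at a much smaller scale.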
Ranked retrieval models
Rather than returning a set of documents that satisfy a query expression, a ranked retrieval model returns an ordering over the (top) documents in the collection with respect to a query.
Free text queries: rather than using a query language of operators and expressions, the user's query is just one or more words in a human language.
In principle, these are two separate choices, but in practice, ranked retrieval models have normally been associated with free text queries and vice versa.

Feast or famine: not a problem in ranked retrieval (Ch. 6)
When a system produces a ranked result set, large result sets are not an issue.
- Indeed, the size of the result set is not an issue.
- We just show the top k (≈ 10) results.
- We don't overwhelm the user.
Premise: the ranking algorithm works.
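Showing only the top k results does not require sorting the whole result set. The sketch below is minimal and rests on two assumptions. First, scores have already been computed and stored in a dict that maps document IDs to scores. Second, the name `top_k` and the synthetic scores are illustrative only. It uses Python's `heapq.nlargest`, which keeps only k candidates while scanning.

```python
import heapq

def top_k(scores, k=10):
    """Return the k highest-scoring (doc_id, score) pairs, best first.

    heapq.nlargest runs in O(n log k), so a result set of any size can be
    trimmed to a screenful without fully sorting it.
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Synthetic scores for a large result set (200,000 matching documents).
scores = {doc_id: (doc_id * 7919) % 1000 / 1000 for doc_id in range(200_000)}
for doc_id, score in top_k(scores, k=3):
    print(doc_id, score)
```

The size of the result set only affects the cost of the scan, not what the user sees. This matches the slide's point that a ranked system can show roughly 10 results no matter how many documents match.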
Scoring as the basis of ranked retrieval (Ch. 6)
We wish to return, in order, the documents most likely to be useful to the searcher.
How can we rank-order the documents in the collection with respect to a query?
- Assign a score, say in [0, 1], to each document.
- This score measures how well the document and the query "match".
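The slides leave the scoring function open at this point. One simple function that always lands in [0, 1] is the Jaccard coefficient of the query's and the document's term sets, |A ∩ B| / |A ∪ B|. The sketch below is only an illustration and assumes lowercase whitespace tokenization. The function name `jaccard` and the example strings are hypothetical. Note that this measure ignores how often a term occurs.

```python
def jaccard(query: str, document: str) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of the two term sets; always in [0, 1]."""
    a = set(query.lower().split())
    b = set(document.lower().split())
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both texts are empty
    return len(a & b) / len(a | b)

# One shared term ("march") out of six distinct terms in total, so the score is 1/6.
print(jaccard("ides of march", "caesar died in march"))
```

An identical query and document score 1, and disjoint ones score 0. Every other pair falls in between, which gives exactly the kind of [0, 1] score the slide asks for.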

Query-document matching
We need a way of assigning a score to a query/document pair.
Let's start with a one-term query:
- If the query term does not occur in the document, the score should be 0.
- The more frequent the query term in the document, the higher the score should be.
We will look at a number of alternatives for this.
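A minimal sketch of the simplest alternative scores a one-term query by the raw count of the term in the document. It assumes lowercase whitespace tokenization, and the helper name `tf_score` and the three example documents are hypothetical. This meets both requirements above: the score is 0 when the term is absent and grows with frequency. A raw count is not confined to [0, 1], though.

```python
from collections import Counter

def tf_score(term: str, document: str) -> int:
    """Score a one-term query by raw term frequency: 0 if absent, higher if more frequent."""
    counts = Counter(document.lower().split())
    return counts[term.lower()]  # a Counter returns 0 for missing keys

docs = {
    "d1": "brutus killed caesar",
    "d2": "caesar caesar caesar was ambitious",
    "d3": "the noble brutus",
}
ranking = sorted(docs, key=lambda d: tf_score("caesar", docs[d]), reverse=True)
print(ranking)  # ['d2', 'd1', 'd3']
```

Under this score, a document that mentions the term three times outranks one that mentions it once. Whether three occurrences should really count three times as much is one of the questions the alternatives mentioned on the slide address.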

This document was uploaded on 06/01/2011.