lecture7-vectorspace-handout-6-per




Introduction to Information Retrieval

Sec. 7.1.1
...computation
  - Can we avoid all this computation?
  - Yes, but we may sometimes get it wrong
      - A doc not in the top K may creep into the list of K output docs
  - Is this such a bad thing?

Sec. 7.1.1
  - User has a task and a query formulation
  - Cosine matches docs to query
  - Thus cosine is anyway a proxy for user happiness
  - If we get a list of K docs close to the top K by cosine measure, should be ok

Sec. 7.1.2
Generic approach
  - Find a set A of contenders, with K < |A| << N
      - A does not necessarily contain the top K, but has many docs from among the top K
  - Return the top K docs in A
  - Think of A as pruning non-contenders
  - The same approach is also used for other (non-cosine) scoring functions
  - Will look at several schemes following this approach

Sec. 7.1.2
Index elimination
  - The basic cosine computation algorithm only considers docs containing at least one query term
  - Take this further:
      - Only consider high-idf query terms
      - Only consider docs containing many query terms

Sec. 7.1.2
High-idf query terms only
  - For a query such as "catcher in the rye"
  - Only accumulate scores from "catcher" and "rye"
  - Intuition: "in" and "the" contribute little to the scores and so don't alter rank-ordering much
  - Benefit:
      - Postings of low-idf terms hav...

Docs containing many query terms
  - Any doc with at least one query term is a candidate for the top K output list
  - For multi-term queries, only compute scores for docs containing several of the query terms
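
The two index-elimination heuristics above (high-idf query terms only, and docs containing many query terms) can be sketched together in a few lines of Python. This is a minimal illustration, not the lecture's own code. The function name `top_k_index_elimination`, the `postings` layout (term -> {doc_id: weight}), and the `min_idf` and `min_terms` thresholds are all hypothetical. It also assumes the per-document weights are already length-normalized, and that "many query terms" is counted over the terms that survive the idf filter.

```python
import heapq
import math
from collections import defaultdict


def top_k_index_elimination(query_terms, postings, n_docs, k,
                            min_idf=0.0, min_terms=1):
    """Approximate top-K retrieval using index elimination.

    postings: dict mapping term -> {doc_id: weight of term in doc}
              (hypothetical layout; weights assumed length-normalized).
    min_idf:   drop query terms whose idf is below this threshold.
    min_terms: a doc enters the contender set A only if it contains
               at least this many of the surviving query terms.
    """
    # Heuristic 1: keep only high-idf query terms.
    kept = []
    for term in set(query_terms):
        df = len(postings.get(term, {}))
        if df == 0:
            continue  # term does not occur in the collection
        idf = math.log10(n_docs / df)
        if idf >= min_idf:
            kept.append((term, idf))

    # Accumulate scores and count matched terms per doc.
    scores = defaultdict(float)
    matched = defaultdict(int)
    for term, idf in kept:
        for doc_id, weight in postings[term].items():
            scores[doc_id] += idf * weight
            matched[doc_id] += 1

    # Heuristic 2: contender set A = docs containing many query terms.
    contenders = [(score, doc_id) for doc_id, score in scores.items()
                  if matched[doc_id] >= min_terms]

    # Return the top K docs in A, best score first.
    best = heapq.nlargest(k, contenders, key=lambda pair: pair[0])
    return [doc_id for _, doc_id in best]


# Toy collection of 4 docs for the query "catcher in the rye".
toy_postings = {
    "catcher": {"d1": 0.5, "d2": 0.2},
    "in": {"d1": 0.1, "d2": 0.1, "d3": 0.1, "d4": 0.1},
    "the": {"d1": 0.1, "d2": 0.1, "d3": 0.1, "d4": 0.1},
    "rye": {"d1": 0.4, "d3": 0.3},
}
query = ["catcher", "in", "the", "rye"]
print(top_k_index_elimination(query, toy_postings, n_docs=4, k=2, min_idf=0.1))
```

With `min_idf=0.1` the postings lists for "in" and "the" are never traversed, so the doc that contains only those terms never becomes a contender. Raising `min_terms` to 2 shrinks A further, to docs containing both "catcher" and "rye". Either threshold can exclude a true top-K doc, which is the accuracy-for-speed trade-off the slides describe.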
