lecture7-vectorspace-handout-6-per

Efficient ranking: computing a single cosine efficiently


... cosine scores

Introduction to Information Retrieval (Sec. 7.1)

Efficient cosine ranking
- Find the K docs in the collection "nearest" to the query ⇒ K largest query-doc cosines.
- Efficient ranking:
  - Computing a single cosine efficiently.
  - Choosing the K largest cosine values efficiently.
  - Can we do this without computing all N cosines?

Efficient cosine ranking
- What we're doing in effect: solving the K-nearest neighbor problem for a query vector
- In general, we do not know how to do this efficiently for high-dimensional spaces
- But it is solvable for short queries, and standard indexes support this well

Special case – unweighted queries
- No weighting on query terms
  - Assume each query term occurs only once
- Then for ranking, don't need to normalize query vector
  - Slight simplification of algorithm from Lecture 6

Computing the K largest cosines: selection vs. sorting
- Typically we want to retrieve the top K docs (in the cosine ranking for the query)
  - not to totally order all docs in the collection
- Can we pick off docs with K highest cosines?
- Let J = number of docs with nonzero cosines
  - We seek the K best of these J

Use heap for selecting top K
- Binary tree in which each node's value > the values of children
- Takes 2J operations to construct, then each of K "winners" read off in 2 log J steps.
- For J=1M, K=100, this is about 10% of the cost of sorting.
[Figure: example heap with root 1; second level .9, .3; third level .3, .8, .1, .1]

Cosine similarity is only a proxy

Bottlenecks (Sec. 7.1.1)
- Primary computational bottleneck in scoring: cosine c...
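The unweighted-query special case and the heap-based top-K selection described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm from the lecture itself: the toy collection, the function names (`build_index`, `score_unweighted`, `top_k`), and the use of raw term frequency as the document weight are all hypothetical choices made for the example.

```python
import heapq
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Toy inverted index: term -> list of (doc_id, length-normalized tf weight).

    Assumption: raw term frequency is used as the document term weight.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        counts = Counter(text.split())
        length = math.sqrt(sum(c * c for c in counts.values()))
        for term, c in counts.items():
            index[term].append((doc_id, c / length))
    return index


def score_unweighted(query_terms, index):
    """Accumulate scores assuming every query term has weight 1 and occurs once.

    The query vector is not normalized: dividing every score by the same
    query length would not change the ranking.
    """
    scores = defaultdict(float)
    for term in set(query_terms):
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    return scores


def top_k(scores, k):
    """Select the k highest-scoring docs with a heap instead of a full sort."""
    # heapq is a min-heap, so negate scores to pop the largest first.
    heap = [(-score, doc_id) for doc_id, score in scores.items()]
    heapq.heapify(heap)  # linear-time construction over the J nonzero scores
    winners = []
    for _ in range(min(k, len(heap))):
        neg_score, doc_id = heapq.heappop(heap)  # O(log J) per winner
        winners.append((doc_id, -neg_score))
    return winners


docs = {1: "a b", 2: "a a", 3: "c"}  # hypothetical toy collection
index = build_index(docs)
print(top_k(score_unweighted(["a"], index), 2))
```

Only documents that share a term with the query ever receive an accumulator, so the heap is built over the J docs with nonzero scores rather than all N. Using the slide's operation counts with J = 1M and K = 100, the heap costs about 2J + 2K log2 J ≈ 2 × 10^6 operations, against roughly J log2 J ≈ 2 × 10^7 for a full sort, which is consistent with the 10% figure.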