lecture7-vectorspace-handout-6-per

# Efficient ranking: computing a single cosine efficiently


cosine scores

Introduction to Information Retrieval, Sec. 7.1

## Efficient cosine ranking

- Find the K docs in the collection "nearest" to the query ⇒ K largest query-doc cosines.
- Efficient ranking:
  - Computing a single cosine efficiently.
  - Choosing the K largest cosine values efficiently.
    - Can we do this without computing all N cosines?
- What we're doing in effect: solving the K-nearest neighbor problem for a query vector.
- In general, we do not know how to do this efficiently for high-dimensional spaces.
- But it is solvable for short queries, and standard indexes support this well.

## Special case – unweighted queries

- No weighting on query terms
  - Assume each query term occurs only once
- Then for ranking, we don't need to normalize the query vector
  - Slight simplification of the algorithm from Lecture 6

## Computing the K largest cosines: selection vs. sorting

- Typically we want to retrieve the top K docs (in the cosine ranking for the query)
  - not to totally order all docs in the collection
- Can we pick off docs with the K highest cosines?
- Let J = number of docs with nonzero cosines
  - We seek the K best of these J

## Use heap for selecting top K

- Binary tree in which each node's value > the values of its children
- Takes 2J operations to construct, then each of the K "winners" is read off in 2 log J steps.
- For J = 1M, K = 100, this is about 10% of the cost of sorting.

[Figure: example heap with node values 1, .9, .3, .3, .8, .1, .1]

## Bottlenecks (Sec. 7.1.1)

- Primary computational bottleneck in scoring: cosine c...

## Cosine similarity is only a proxy
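
The unweighted-query scoring and the heap-based top-K selection described in the slides above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names, the `postings` layout (term mapped to a list of `(doc_id, weight)` pairs), and the `doc_lengths` table are all assumptions made for the example, and the cost model for sorting (J log2 J operations) is likewise an assumption used only to check the "about 10%" figure.

```python
import heapq
import math


def unweighted_query_scores(query_terms, postings, doc_lengths):
    """Score docs for a query whose terms all carry weight 1.

    postings:    term -> list of (doc_id, weight of term in doc)  (hypothetical layout)
    doc_lengths: doc_id -> Euclidean length of the doc vector     (hypothetical layout)
    """
    scores = {}
    for term in set(query_terms):  # each query term is counted once
        for doc_id, weight in postings.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight
    # The query length is identical for every doc, so leaving it out
    # changes the scores but not the ranking.
    return {doc_id: s / doc_lengths[doc_id] for doc_id, s in scores.items()}


def top_k_with_heap(scores, k):
    """Return up to k (doc_id, score) pairs, highest score first."""
    # heapq is a min-heap, so negated scores are stored to mimic a max-heap.
    # Only the J docs with nonzero scores enter the heap.
    heap = [(-s, doc_id) for doc_id, s in scores.items() if s > 0]
    heapq.heapify(heap)  # linear-time construction
    winners = []
    for _ in range(min(k, len(heap))):
        neg_score, doc_id = heapq.heappop(heap)  # O(log J) per winner
        winners.append((doc_id, -neg_score))
    return winners


def heap_to_sort_cost_ratio(j, k):
    """Operation counts from the slide versus an assumed J log2 J sort."""
    heap_cost = 2 * j + k * 2 * math.log2(j)
    sort_cost = j * math.log2(j)  # assumption: sorting modeled as J log2 J
    return heap_cost / sort_cost


# Toy index (made-up weights): doc 1 = {a: 1}, doc 2 = {a: 2, b: 1}, doc 3 = {b: 3, c: 4}
postings = {"a": [(1, 1.0), (2, 2.0)], "b": [(2, 1.0), (3, 3.0)]}
doc_lengths = {1: 1.0, 2: math.sqrt(5.0), 3: 5.0}

scores = unweighted_query_scores(["a", "b"], postings, doc_lengths)
print(top_k_with_heap(scores, 2))
print(heap_to_sort_cost_ratio(1_000_000, 100))
```

With J = 1,000,000 and K = 100, the ratio comes out close to 0.10 under these assumptions, which is consistent with the slide's estimate. Note that the returned scores are cosines scaled by the (constant) query length, so they are suitable for ranking but are not themselves cosine values.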