Lecture 7: Vector Space (handout, 6 slides per page)

Introduction to Information Retrieval, Sec. 7.1.6: Visualization


(…) A means for segmenting the index into two tiers.

Sec. 7.1.5: Impact-ordered postings
- We only want to compute scores for docs for which wf(t,d) is high enough
- We sort each postings list by wf(t,d)
- Now: not all postings are in a common order!
- How do we compute scores in order to pick off the top K?
- Two ideas follow

Sec. 7.1.5: 1. Early termination
- When traversing t's postings, stop early after either
  - a fixed number r of docs, or
  - wf(t,d) drops below some threshold
- Take the union of the resulting sets of docs
  - One set from the postings of each query term
- Compute only the scores for docs in this union

Sec. 7.1.5: 2. idf-ordered terms
- When considering the postings of query terms, look at them in order of decreasing idf
  - High-idf terms are likely to contribute most to the score
- As we update the score contribution from each query term, stop if doc scores are relatively unchanged
- Can apply to cosine or some other net scores

Sec. 7.1.6: Cluster pruning: preprocessing
- Pick √N docs at random: call these leaders
- For every other doc, pre-compute its nearest leader
  - Docs attached to a leader: its followers
  - Likely: each leader has ~√N followers

Sec. 7.1.6: Cluster pruning: query processing
- Process a query as follows:
  - Given query Q, find its nearest leader L
  - Seek the K nearest docs from among L's followers

[Figure (Sec. 7.1.6, Visualization): the query point, its nearest leader, and that leader's followers]
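The early-termination idea over impact-ordered postings can be sketched as below. The postings lists, the cutoff r, and the wf threshold are illustrative assumptions, not part of the lecture; the only requirement from the slides is that each list is pre-sorted by decreasing wf(t,d).

```python
def top_candidates(postings_by_term, query_terms, r=3, wf_threshold=1.0):
    """Collect candidate docs by walking each query term's impact-ordered
    postings, stopping after r docs or once wf drops below wf_threshold;
    return the union of the per-term doc sets (scores are computed only
    for this union afterwards)."""
    candidates = set()
    for term in query_terms:
        for count, (doc_id, wf) in enumerate(postings_by_term.get(term, [])):
            if count >= r or wf < wf_threshold:
                break  # early termination for this term's postings
            candidates.add(doc_id)
    return candidates

# Toy postings lists of (doc_id, wf), each sorted by wf descending (assumed data).
postings = {
    "caesar": [(7, 9.2), (2, 4.1), (5, 3.3), (9, 0.4)],
    "brutus": [(2, 6.0), (7, 2.5), (1, 0.2)],
}
union = top_candidates(postings, ["caesar", "brutus"], r=3, wf_threshold=1.0)
```

Note that because the lists are no longer in a common docID order, the union is built per term and merged as a set rather than by a standard merge walk.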
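The idf-ordered idea can be sketched as follows. The slides only say to stop "if doc scores are relatively unchanged"; the concrete criterion here (relative change in total score mass below eps) is one possible interpretation, and the idf values and postings are toy data of my own.

```python
def score_idf_ordered(query_terms, postings_by_term, idf, eps=0.05):
    """Accumulate doc scores term by term in decreasing idf order;
    stop once a term changes the total score mass by less than eps
    (relative), since remaining low-idf terms contribute even less."""
    scores = {}
    for term in sorted(query_terms, key=lambda t: idf[t], reverse=True):
        before = sum(scores.values())
        for doc_id, wf in postings_by_term.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0.0) + wf * idf[term]
        after = sum(scores.values())
        if before > 0 and (after - before) / before < eps:
            break  # doc scores relatively unchanged: stop early
    return scores

# Toy data (assumed): "caesar" is rare (high idf), "the" is common (low idf).
idf = {"the": 0.1, "caesar": 3.0}
postings = {"caesar": [(1, 2.0)], "the": [(1, 1.0), (2, 1.0)]}
scores = score_idf_ordered(["the", "caesar"], postings, idf)
```

Because "caesar" is processed first and dominates the totals, the low-idf term "the" barely moves any score, which is exactly why processing in decreasing idf order lets us stop early.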
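The two cluster-pruning phases above can be sketched as below, assuming cosine similarity over dense document vectors. The helper names, the random seed, and the toy vectors are mine; the structure (√N random leaders, followers attached to their nearest leader, queries routed through the nearest leader) follows the slides.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def preprocess(docs, seed=0):
    """Pick ~sqrt(N) docs at random as leaders; attach every other doc
    to its nearest leader (its followers)."""
    rng = random.Random(seed)
    ids = list(docs)
    leaders = rng.sample(ids, max(1, math.isqrt(len(ids))))
    followers = {l: [] for l in leaders}
    for d in ids:
        if d in leaders:
            continue
        nearest = max(leaders, key=lambda l: cosine(docs[d], docs[l]))
        followers[nearest].append(d)
    return leaders, followers

def query(q, docs, leaders, followers, k=2):
    """Find the nearest leader L, then the k nearest docs among L and
    its followers (only this small pool is scored, hence 'pruning')."""
    L = max(leaders, key=lambda l: cosine(q, docs[l]))
    pool = [L] + followers[L]
    return sorted(pool, key=lambda d: cosine(q, docs[d]), reverse=True)[:k]

# Toy corpus of 9 three-dimensional doc vectors (assumed data).
docs = {i: [float((i * j) % 5 + 1) for j in range(3)] for i in range(9)}
leaders, followers = preprocess(docs, seed=0)
result = query([1.0, 2.0, 3.0], docs, leaders, followers, k=2)
```

The expected follower count of ~√N per leader is what makes both phases cheap: the query touches about √N leaders plus about √N followers instead of all N documents.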
