A means for segmenting the index into two tiers

Introduction to Information Retrieval

Sec. 7.1.5 Impact-ordered postings
- We only want to compute scores for docs for which wf_{t,d} is high enough
- We sort each postings list by wf_{t,d}
- Now: not all postings are in a common order!
- How do we compute scores in order to pick off the top K?
  - Two ideas follow

Sec. 7.1.5 1. Early termination
- When traversing t's postings, stop early after either
  - a fixed number r of docs, or
  - wf_{t,d} drops below some threshold
- Take the union of the resulting sets of docs
  - One set from the postings of each query term
- Compute only the scores for docs in this union

Sec. 7.1.5 2. idf-ordered terms
- When considering the postings of query terms
  - Look at them in order of decreasing idf
  - High-idf terms are likely to contribute most to the score
- As we update the score contribution from each query term
  - Stop if doc scores are relatively unchanged
- Can apply to cosine or some other net scores

Sec. 7.1.6 Cluster pruning: preprocessing
- Pick √N docs at random: call these leaders
- For every other doc, pre-compute its nearest leader
  - Docs attached to a leader: its followers
  - Likely: each leader has ~√N followers

Sec. 7.1.6 Cluster pruning: query processing
- Process a query as follows:
  - Given query Q, find its nearest leader L.
  - Seek the K nearest docs from among L's followers.

Sec. 7.1.6 Visualization
[Figure omitted; labels: Query, Leader]
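A minimal Python sketch of impact-ordered postings with early termination follows. It assumes a toy in-memory index that maps each term to a list of (doc_id, wf_{t,d}) pairs. It also assumes the net score of a doc is simply the sum of wf_{t,d} over the query terms, because the slides leave the scoring function open. The names impact_order and top_k_early_termination, and the parameters r and threshold, are hypothetical and chosen for illustration.

```python
def impact_order(index):
    """Return a copy of the index with each postings list sorted by decreasing wf_{t,d}."""
    return {term: sorted(postings, key=lambda p: p[1], reverse=True)
            for term, postings in index.items()}


def top_k_early_termination(index, query, k, r=None, threshold=None):
    """Approximate top-K: score only docs that survive early termination on some query term."""
    ordered = impact_order(index)
    candidates = set()
    for term in query:
        for rank, (doc, wf) in enumerate(ordered.get(term, [])):
            # Stop after r postings, or once wf_{t,d} falls below the threshold.
            if r is not None and rank >= r:
                break
            if threshold is not None and wf < threshold:
                break
            candidates.add(doc)  # union over the query terms' truncated lists

    # Look up full weights so each candidate gets its complete score
    # (assumed net score: sum of wf_{t,d} over the query terms).
    weights = {term: dict(postings) for term, postings in index.items()}
    scores = {doc: sum(weights.get(term, {}).get(doc, 0) for term in query)
              for doc in candidates}
    # Highest score first; ties broken by smaller doc id.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    toy = {"a": [(1, 9), (2, 2), (3, 6), (4, 1)], "b": [(2, 5), (3, 4), (5, 3)]}
    print(top_k_early_termination(toy, ["a", "b"], k=2, r=2))  # [(3, 10), (1, 9)]
```

Only docs near the head of at least one term's impact-ordered list are scored. A doc with moderate weight in every query term can therefore be missed. That loss of accuracy is what early termination trades for speed.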

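The cluster-pruning preprocessing and query steps can be sketched in Python as follows. This sketch makes three assumptions. First, documents are dense term-weight vectors compared by cosine similarity. Second, the collection is non-empty. Third, the leader itself is ranked alongside its followers, since a leader is not assigned as anyone's follower. The helpers cosine, preprocess_leaders, and cluster_pruning_query are hypothetical names. This is an illustration, not a reference implementation.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def preprocess_leaders(docs, seed=0):
    """Pick floor(sqrt(N)) random leaders; attach every other doc to its nearest leader."""
    n = len(docs)
    rng = random.Random(seed)  # seeded so the leader sample is reproducible
    leaders = rng.sample(range(n), max(1, math.isqrt(n)))
    followers = {leader: [] for leader in leaders}
    for d in range(n):
        if d in followers:  # leaders are not followers of anyone
            continue
        nearest = max(leaders, key=lambda leader: cosine(docs[d], docs[leader]))
        followers[nearest].append(d)
    return followers


def cluster_pruning_query(docs, followers, q, k):
    """Find the leader L nearest to q, then the k nearest docs among L and its followers."""
    leader = max(followers, key=lambda cand: cosine(q, docs[cand]))
    candidates = [leader] + followers[leader]  # assumption: L itself is a candidate
    candidates.sort(key=lambda d: cosine(q, docs[d]), reverse=True)
    return candidates[:k]
```

Suppose there are about √N leaders and each has about √N followers. A query then needs roughly √N similarity computations to find L, plus about √N more to rank L's followers, instead of N in total. The result is approximate: a relevant doc attached to a different leader is never considered.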