IR-part1

64 introduction to information retrieval evidence


Issues for biword indexes
• False positives, as noted before
• Index blowup due to bigger dictionary
• Infeasible for more than biwords, big even for them
• Biword indexes are not the standard solution (for all biwords) but can be part of a compound strategy

Introduction to Information Retrieval, Sec. 2.4.2

Solution 2: Positional indexes
• In the postings, store, for each term, the position(s) in which tokens of it appear:
  <term, number of docs containing term;
   doc1: position1, position2 … ;
   doc2: position1, position2 … ;
   etc.>

Positional index example
<be: 993427;
 1: 7, 18, 33, 72, 86, 231;
 2: 3, 149;
 4: 17, 191, 291, 430, 434;
 5: 363, 367, …>
Which of docs 1, 2, 4, 5 could contain "to be or not to be"?
• For phrase queries, we use a merge algorithm recursively at the document level
• ...
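A minimal sketch of a positional index and a phrase query over it, assuming naive lowercase whitespace tokenization and 1-based token positions. The function names `build_positional_index` and `phrase_query` and the three toy documents are hypothetical, made up for illustration. The slides describe a linear merge over sorted postings lists; for brevity this sketch swaps in Python set intersections, which gives the same answers but not the same cost profile. It first keeps only documents that contain every query term (the document-level step), then checks that term i occurs at position p + i for some start position p of the first term.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}.

    Assumption: tokens come from a lowercase whitespace split, and
    positions are 1-based token offsets within each document.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split(), start=1):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def phrase_query(index, phrase):
    """Return the sorted ids of documents that contain the exact phrase."""
    terms = phrase.lower().split()
    # A term missing from the dictionary means no document can match.
    if not terms or any(t not in index for t in terms):
        return []

    # Document level: only docs that contain every term are candidates.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])

    result = []
    for doc_id in sorted(candidates):
        # Position level: start from every occurrence of the first term,
        # then keep only starts p where term i also occurs at p + i.
        starts = set(index[terms[0]][doc_id])
        for offset, t in enumerate(terms[1:], start=1):
            positions = set(index[t][doc_id])
            starts = {p for p in starts if p + offset in positions}
            if not starts:
                break
        if starts:
            result.append(doc_id)
    return result


# Hypothetical toy collection, not the documents from the slide.
docs = {1: "to be or not to be", 2: "not to be outdone", 3: "be or not"}
index = build_positional_index(docs)
print(index["be"])                                # {1: [2, 6], 2: [3], 3: [1]}
print(phrase_query(index, "to be or not to be"))  # [1]
print(phrase_query(index, "not to be"))           # [1, 2]
```

Each postings entry has the slide's shape (a term, then per-document position lists), and the document frequency is simply `len(index[term])`. A production index would keep postings sorted and merge them in a single linear pass rather than building sets.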

This document was uploaded on 02/14/2014.
