IR-part1

Introduction to Information Retrieval (Sec. 1.2)


... half-a-trillion 0’s and 1’s.
- But it has no more than one billion 1’s. Why?
- The matrix is extremely sparse.
- What’s a better representation?
- We only record the 1 positions.

Term-document incidence matrices

The Inverted Index
The key data structure underlying modern IR

Inverted index (Sec. 1.2)
- For each term t, we must store a list of all documents that contain t.
- Identify each doc by a docID, a document serial number
- Can we use fixed-size arrays for this?

Brutus      1  2  4  11  31  45  173  174
Caesar      1  2  4   5   6  16   57  132
Calpurnia   2  31  54  101

What happens if the word Caesar is added to document 14?

Inverted index (Sec. 1.2)
- We need variable-size postings lists
- On disk, a continuous run of postings is normal and best
- In memory, can use linked lis...
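
The slide's question, what happens when Caesar is added to document 14, can be illustrated with a small Python sketch. It stores only the positions of the 1's (the docIDs) per term, and uses growable lists for the postings so a new docID can be slotted in. The names `index` and `add_posting` are hypothetical and chosen for illustration, and a Python list is a dynamic array, not the on-disk run or linked structure the slides mention; this is one possible in-memory arrangement, not the slides' own implementation.

```python
from bisect import bisect_left

# Postings lists from the slide example: term -> sorted list of docIDs.
# Only the documents that contain the term are recorded (the 1 positions).
index = {
    "Brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "Caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "Calpurnia": [2, 31, 54, 101],
}


def add_posting(index, term, doc_id):
    """Insert doc_id into the term's postings list (hypothetical helper).

    The list stays sorted by docID and free of duplicates. A fixed-size
    array could not absorb the new entry once it was full.
    """
    postings = index.setdefault(term, [])
    pos = bisect_left(postings, doc_id)  # first slot where doc_id fits in order
    if pos == len(postings) or postings[pos] != doc_id:
        postings.insert(pos, doc_id)


# The word Caesar is added to document 14.
add_posting(index, "Caesar", 14)
print(index["Caesar"])
```

After the call, docID 14 sits between 6 and 16 in the Caesar list. Inserting into the middle of an array shifts the later entries, which is one reason the structure used for postings in memory is a design choice in its own right.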