lecture1-intro-handout-6-per


…llion 1’s.
• The matrix is extremely sparse.
• What’s a better representation?
  • We only record the 1 positions.

• For each term t, we must store a list of all documents that contain t.
  • Identify each doc by a docID, a document serial number.
• Can we use fixed-size arrays for this?

Brutus    → 1 2 4 11 31 45 173 174
Caesar    → 1 2 4 5 6 16 57 132
Calpurnia → 2 31 54 101

What happens if the word Caesar is added to document 14?
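Here is a minimal Python sketch of both points. It assumes postings are held in memory as Python lists (variable-length arrays) keyed by term. `postings_from_matrix` and `add_occurrence` are hypothetical helper names, and the 2 x 4 incidence matrix is made-up toy data. The first helper keeps only the 1 positions of each incidence row, which gives a postings list. The second shows why fixed-size arrays are awkward: adding Caesar to document 14 means inserting into the middle of a sorted list, which shifts every later docID.

```python
from bisect import bisect_left

def postings_from_matrix(terms, matrix):
    """Keep only the 1 positions of each term's incidence row.

    matrix[i][j] is 1 if terms[i] occurs in document j + 1 (docIDs start at 1).
    """
    return {
        term: [j + 1 for j, bit in enumerate(row) if bit]
        for term, row in zip(terms, matrix)
    }

def add_occurrence(index, term, doc_id):
    """Record that `term` occurs in `doc_id`, keeping its postings sorted and unique."""
    plist = index.setdefault(term, [])
    # Binary search finds the slot in O(log n), but list.insert still shifts
    # every later docID one position to the right (O(n)).
    i = bisect_left(plist, doc_id)
    if i == len(plist) or plist[i] != doc_id:
        plist.insert(i, doc_id)

# Postings lists from the slide: term -> docIDs in increasing order.
postings = {
    "Brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "Caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "Calpurnia": [2, 31, 54, 101],
}

# Toy incidence matrix (hypothetical): 2 terms x 4 documents.
print(postings_from_matrix(["a", "b"], [[1, 0, 1, 0], [0, 0, 0, 1]]))
# {'a': [1, 3], 'b': [4]}

add_occurrence(postings, "Caesar", 14)
print(postings["Caesar"])  # [1, 2, 4, 5, 6, 14, 16, 57, 132]
```

A growable list absorbs the new docID. A fixed-size array sized to the old list would have no room for it.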
Sec. 1.2 Introduction to Information Retrieval
Inverted index
• We need variable-size postings lists.
  • On disk, a continuous run of postings is normal and best.
  • In memory, can use linked lists or variable-length arrays.
    • Some tradeoffs in size/ease of insertion.

Dictionary    Postings (each docID entry is a posting)
Brutus    → 1 2 4 11 31 45 173 174
Caesar    → 1 2 4 5 6 16 57 132
Calpurnia → 2 31 54 101
Postings are sorted by docID (more later on why).

Sec. 1.2
Inverted index construction
Friends, Romans, countrymen.
  ↓ Tokenizer
Token stream:     Friends  Romans  Countrymen
  ↓ Linguistic modules (more on these later)
Modified tokens:  friend  roman  countryman
  ↓ Indexer
friend …
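The on-disk versus in-memory point can be illustrated with a small Python sketch, under the following assumptions. Postings are built in memory as Python lists (variable-length arrays). They are then laid end to end in one contiguous `array("I")` of unsigned integers, and the dictionary stores a hypothetical (offset, length) pair for each term. `pack_postings` and `read_postings` are illustrative names, not taken from the lecture.

```python
from array import array

def pack_postings(index):
    """Lay all postings lists end to end in one contiguous array.

    Returns (dictionary, run), where dictionary maps term -> (offset, length)
    into run. Terms are stored in sorted order.
    """
    run = array("I")  # unsigned ints, stored contiguously like a run on disk
    dictionary = {}
    for term in sorted(index):
        plist = index[term]
        dictionary[term] = (len(run), len(plist))
        run.extend(plist)
    return dictionary, run

def read_postings(dictionary, run, term):
    """Slice one term's postings back out of the contiguous run."""
    offset, length = dictionary.get(term, (0, 0))
    return run[offset:offset + length].tolist()

index = {
    "Brutus": [1, 2, 4, 11, 31, 45, 173, 174],
    "Caesar": [1, 2, 4, 5, 6, 16, 57, 132],
    "Calpurnia": [2, 31, 54, 101],
}
dictionary, run = pack_postings(index)
print(dictionary["Calpurnia"])                      # (16, 4)
print(read_postings(dictionary, run, "Calpurnia"))  # [2, 31, 54, 101]
data = run.tobytes()  # the bytes one would write to a postings file
```

The contiguous run is compact and can be read sequentially. Inserting a docID into it, however, means rewriting everything stored after that term; this is the size/ease-of-insertion tradeoff mentioned on the slide. A linked list reverses the tradeoff: insertion is cheap, but each posting carries a pointer.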
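The three construction stages (tokenizer, linguistic modules, indexer) can be sketched as plain Python functions. This is a toy under stated assumptions. The tokenizer simply keeps runs of letters. The linguistic modules are replaced by a hypothetical rule: lowercase the token, strip a trailing "s", and consult a one-entry table that maps "countrymen" to "countryman". The second document is invented for the example. The indexer sorts (term, docID) pairs so that each postings list comes out sorted by docID.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the linguistic modules: a tiny irregular-plural
# table plus crude trailing-"s" stripping. Real systems use stemmers/lemmatizers.
IRREGULAR = {"countrymen": "countryman"}

def tokenize(text):
    """Tokenizer: split raw text into a token stream of alphabetic words."""
    return re.findall(r"[A-Za-z]+", text)

def normalize(token):
    """Linguistic modules (toy version): lowercase, then map plurals to singulars."""
    t = token.lower()
    if t in IRREGULAR:
        return IRREGULAR[t]
    if t.endswith("s") and len(t) > 3:
        return t[:-1]
    return t

def build_index(docs):
    """Indexer: map each modified token to its postings list, sorted by docID."""
    # Sorting (term, docID) pairs groups by term and orders docIDs within a term;
    # the set removes repeated occurrences of a term in the same document.
    pairs = sorted({(normalize(tok), doc_id)
                    for doc_id, text in docs.items()
                    for tok in tokenize(text)})
    index = defaultdict(list)
    for term, doc_id in pairs:
        index[term].append(doc_id)
    return dict(index)

docs = {
    1: "Friends, Romans, countrymen.",
    2: "Friends of Caesar",  # hypothetical second document
}
print([normalize(t) for t in tokenize(docs[1])])  # ['friend', 'roman', 'countryman']
index = build_index(docs)
print(index["friend"])  # [1, 2]
```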

This document was uploaded on 02/26/2014.
