lecture1-intro-handout-6-per



... that is relevant to the user’s information need and helps the user complete a task
[Figure labels: TASK; Misconception?; Info Need (Info about removing mice without killing them); Mistranslation?; Verbal form; Misformulation?; Query (mouse trap); SEARCH ENGINE; Query Refinement; Results; Corpus]

Introduction to Information Retrieval

How good are the retrieved docs? (Sec. 1.1)
- Precision: Fraction of retrieved docs that are relevant to the user’s information need
- Recall: Fraction of relevant docs in the collection that are retrieved
- More precise definitions and measurements to follow in later lectures
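
The two definitions above can be sketched as set operations. This is only an illustration: the function name, the document IDs, and the choice to return 0.0 for an empty set are assumptions made for this example, not part of the lecture.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant docs that were actually retrieved
    # Returning 0.0 for an empty set is a convention chosen for this sketch.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 docs retrieved, 5 docs relevant, 3 in common.
retrieved_docs = {1, 2, 3, 4}
relevant_docs = {2, 3, 4, 7, 9}
p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Here 3 of the 4 retrieved docs are relevant (precision 0.75), and 3 of the 5 relevant docs were retrieved (recall 0.60).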

Bigger collections (Sec. 1.1)
- Consider N = 1 million documents, each with about 1000 words.
- Avg 6 bytes/word including spaces/punctuation
  - 6 GB of data in the documents.
- Say there are M = 500K distinct terms among these.
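
The 6 GB figure follows directly from the numbers on the slide. A quick check, assuming decimal gigabytes (1 GB = 10^9 bytes); the variable names are chosen for this example:

```python
N_DOCS = 1_000_000      # N: documents in the collection
WORDS_PER_DOC = 1_000   # approximate words per document
BYTES_PER_WORD = 6      # average, including spaces/punctuation

total_bytes = N_DOCS * WORDS_PER_DOC * BYTES_PER_WORD
total_gb = total_bytes / 10**9  # assumption: 1 GB = 10^9 bytes
print(f"{total_bytes:,} bytes = {total_gb:.0f} GB")  # 6,000,000,000 bytes = 6 GB
```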

Can’t build the matrix (Sec. 1.1)
- 500K x 1M matrix has half-a-trillion 0’s and 1’s.
  Why?
- But it has no more than one bi...

Inverted index (Sec. 1.2)
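
As a rough illustration of why the full term-document matrix is impractical, the sketch below counts its cells using the slide's figures and derives an upper bound on how many of them can be 1. It then builds a toy inverted index. The slide text is cut off before it describes the lecture's own construction, so the `build_inverted_index` function, the whitespace tokenization, and the three-document corpus are all assumptions made for this example.

```python
from collections import defaultdict

# Figures from the slides.
M_TERMS = 500_000      # distinct terms
N_DOCS = 1_000_000     # documents
WORDS_PER_DOC = 1_000  # approximate words per document

matrix_cells = M_TERMS * N_DOCS  # every term-document cell, 0 or 1
# A document with about 1000 words contains at most about 1000 distinct terms,
# so it can set at most that many cells to 1.
max_ones = N_DOCS * WORDS_PER_DOC
print(f"cells: {matrix_cells:,}")
print(f"upper bound on 1's: {max_ones:,}")


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenization
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical toy corpus, for illustration only.
toy_docs = {1: "mouse trap", 2: "humane mouse removal", 3: "trap design"}
index = build_inverted_index(toy_docs)
print(index["mouse"])  # [1, 2]
print(index["trap"])   # [1, 3]
```

The index stores one entry per (term, document) pair that actually occurs, rather than one cell for every possible pair.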

This document was uploaded on 02/26/2014.
