lecture15-webchar-handout-6-per

- Near duplicates
- Frames
- Redirects
- Engine time-outs
- Is an 8-word query good enough?
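
These items read as problems with checking, via an 8-word query, whether an engine has indexed a particular page. Below is a minimal sketch of building such a conjunctive query. It assumes the eight words are the page's rarest ones according to a background document-frequency table. The name `strong_query`, the `doc_freq` table, the simple ASCII-letter tokenizer, and the `AND` syntax are all illustrative assumptions, not any engine's real query language.

```python
import re
from collections.abc import Mapping


def strong_query(text: str, doc_freq: Mapping[str, int], k: int = 8) -> str:
    """Build a conjunctive query from the k rarest distinct words of `text`.

    `doc_freq` is a hypothetical background table: word -> number of
    documents containing it. Words missing from it count as frequency 0,
    so they are treated as the rarest.
    """
    # Assumed tokenizer: lowercase ASCII letter runs only.
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Rarest first; ties broken alphabetically so the query is deterministic.
    rarest = sorted(words, key=lambda w: (doc_freq.get(w, 0), w))[:k]
    return " AND ".join(rarest)


doc_freq = {"the": 1000, "over": 500, "dog": 50, "quick": 40,
            "brown": 30, "lazy": 25, "jumps": 20, "fox": 10}
print(strong_query("The quick brown fox jumps over the lazy dog", doc_freq, k=3))
# fox AND jumps AND lazy
```

Near duplicates, frames, and redirects all affect the last step of such a check: the engine may return the page under a different URL, so a plain URL comparison can miss a genuine hit.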

Introduction to Information Retrieval, Sec. 19.5: Advantages & disadvantages

- Statistically sound under the induced weight.
- Biases induced by random query:
  - Query bias: favors content-rich pages in the language(s) of the lexicon.
  - Ranking bias. Solution: use conjunctive queries & fetch all.
  - Checking bias: duplicates and impoverished pages are omitted.
  - Document or query restriction bias: the engine might not deal properly with an 8-word conjunctive query.
  - Malicious bias: sabotage by the engine.
  - Operational problems: time-outs, failures, engine inconsistencies, index modification.

Sec. 19.5: Random searches

- Choose random searches extracted from a local log [Lawrence & Giles 97] or build "random searches" [Notess].
- Use only queries with small result sets.
- Count normalized URLs in result sets.
- Use ratio statistics.
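
The "Count normalized URLs" step means that trivially different spellings of one address are counted once. Below is a minimal sketch using Python's `urllib.parse`. It applies an assumed and deliberately small set of rules: lowercase the scheme and host, drop the scheme's default port, drop the `#fragment`, and treat an empty path as `/`. The names `normalize_url` and `count_distinct` are hypothetical. Real studies may normalize more aggressively (for example, by stripping `www.`), which this sketch does not attempt.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports; a URL that states one of these explicitly is
# treated the same as a URL that omits it.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # .hostname is already lowercased and drops any "user:password@" prefix.
    host = parts.hostname or ""
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is dropped: it does not change which page the server returns.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def count_distinct(result_urls: list[str]) -> int:
    """Number of distinct URLs in one result set after normalization."""
    return len({normalize_url(u) for u in result_urls})


print(count_distinct([
    "http://example.com",
    "HTTP://EXAMPLE.com/",
    "http://example.com:80/#frag",
    "http://example.com/other",
]))  # 2
```

This only merges different spellings of the same URL. It does not detect redirects or near-duplicate pages served from different addresses.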

Sec. 19.5: Advantages & disadvantages

Sec. 19.5: Random searches

- 575 & 1050 queries from the NEC RI employee logs
- 6 engines in 1998, 11 in 1999
- Implementation:
  - Restricted to queries with < 600 results in total
  - Counted URLs from each engine after verifying query match
  - Computed size ratio &...
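
The implementation bullets above (restrict to queries with < 600 results in total, count verified URLs per engine, compute a size ratio) can be sketched as a per-query ratio between two engines. The following are all assumptions:

- Each query's URLs are already normalized and verified to match the query.
- "In total" is read as distinct URLs across all engines.
- Queries where engine B returns nothing are skipped because the ratio is undefined.
- The per-query ratios are combined with a plain mean.

The text above does not state how the ratios were aggregated, so the mean is only one possible ratio statistic. The name `size_ratios` is hypothetical.

```python
from statistics import mean

# Cutoff from the slide: keep only queries with < 600 results in total.
MAX_TOTAL_RESULTS = 600


def size_ratios(results: dict[str, dict[str, set[str]]], a: str, b: str) -> dict[str, float]:
    """Return {query: |URLs from a| / |URLs from b|} for the queries that qualify.

    `results` maps each query to {engine name: set of URLs}; the URLs are
    assumed to be normalized and already verified to match the query.
    """
    ratios: dict[str, float] = {}
    for query, per_engine in results.items():
        # Assumption: "in total" means distinct URLs across all engines.
        total = len(set().union(*per_engine.values()))
        if total >= MAX_TOTAL_RESULTS:
            continue  # only small result sets are used
        n_a = len(per_engine.get(a, set()))
        n_b = len(per_engine.get(b, set()))
        if n_b == 0:
            continue  # ratio undefined for this query
        ratios[query] = n_a / n_b
    return ratios


results = {
    "q1": {"A": {"u1", "u2", "u3", "u4"}, "B": {"u1", "u2"}},
    "q2": {"A": {"u5", "u6"}, "B": {"u5", "u6"}},
}
ratios = size_ratios(results, "A", "B")
print(ratios)                 # {'q1': 2.0, 'q2': 1.0}
print(mean(ratios.values()))  # 1.5
```

A plain mean of A/B ratios is not symmetric. In this example the B/A ratios are 0.5 and 1.0, whose mean is 0.75 rather than 1/1.5. This means the choice of which engine goes in the denominator affects the estimate.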

This document was uploaded on 02/26/2014.
