lecture2-dictionary-handout-6-per


[...] query contains automobile, look under car as well
   What about spelling mistakes?
   Potentially more powerful, but less efficient
   One approach is soundex, which forms equivalence classes of words based on phonetic heuristics
   More in lectures 3 and 9
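
The soundex idea above can be sketched as follows. The handout does not give the letter-to-digit table, so this sketch assumes the commonly described four-character scheme (keep the first letter, map the remaining letters to digits, collapse repeated digits, drop zeros, pad to length four). It is a simplified variant: it does not compare the first letter's own code with the second letter's. The function name `soundex` and the `CODES` table are illustrative.

```python
# Assumed letter classes for a basic four-character Soundex code.
# Vowels and H, W, Y map to "0" and are dropped later.
CODES = {}
for letters, digit in [
    ("AEIOUHWY", "0"),
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
]:
    for ch in letters:
        CODES[ch] = digit


def soundex(term):
    """Return a 4-character phonetic code (simplified sketch)."""
    # Keep only A-Z letters, upper-cased.
    letters = [c for c in term.upper() if c in CODES]
    if not letters:
        return ""
    first, rest = letters[0], letters[1:]
    # Collapse runs of identical consecutive digits.
    collapsed = []
    for d in (CODES[c] for c in rest):
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    # Drop zeros, then pad with trailing zeros to four characters.
    code = "".join(d for d in collapsed if d != "0")
    return (first + code + "000")[:4]


# Words that sound alike fall into the same equivalence class.
print(soundex("Hermann"), soundex("Herman"))
print(soundex("Robert"), soundex("Rupert"))
```

Two terms are treated as equivalent when their codes match, which is how a misspelled query term could still reach the intended postings.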
Introduction to Information Retrieval, Sec. 2.2.4
Lemmatization
   Reduce inflectional/variant forms to base form
   E.g.,
      am, are, is → be
      car, cars, car's, cars' → car
   the boy's cars are different colors → the boy car be different color
   Lemmatization implies doing “proper” reduction to dictionary headword form

Stemming
   Reduce terms to their “roots” before indexing
   “Stemming” suggests crude affix chopping
      language dependent
      e.g., automate(s), automatic, automation all reduced to automat.
   for example compressed and compression are both accepted as equivalent to compress.
   → for exampl compress and compress ar both accept as equival to compress
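
The contrast between the two slides can be made concrete with a toy sketch. It assumes a tiny hand-built headword table for lemmatization and a fixed suffix list for crude stemming; both `LEMMAS` and `SUFFIXES` are hypothetical and cover only the examples on the slides, so this is an illustration of the idea rather than a usable normalizer.

```python
# Hypothetical headword table: lemmatization looks words up in a dictionary.
LEMMAS = {
    "am": "be", "are": "be", "is": "be",
    "cars": "car", "car's": "car", "cars'": "car",
    "boy's": "boy", "colors": "color",
}

# Hypothetical suffix list: crude stemming just chops affixes.
SUFFIXES = ["ion", "ic", "es", "e", "s"]


def lemmatize(sentence):
    """Replace each token by its dictionary headword, if one is known."""
    return " ".join(LEMMAS.get(tok, tok) for tok in sentence.lower().split())


def crude_stem(word):
    """Chop the longest listed suffix, keeping at least three characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


print(lemmatize("the boy's cars are different colors"))
print([crude_stem(w) for w in ["automate", "automates", "automatic", "automation"]])
```

The lemmatizer needs linguistic knowledge (the table), while the stemmer needs only string operations, which is why stemming is cheaper but cruder.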
Porter’s algorithm
   Commonest algorithm for stemming English
      Results suggest it’s at least as good as other stemming options
   Conventions + 5 phases of reductions
      phases applied sequentially
      each phase consists of a set of commands
      sample convention: Of the rules in a compound command, select the one that applies to the longest suffix.

Typical rules in Porter
   sses → ss
   ies → i
   ational → ate
   tional → tion
   Rules sensitive to the measure of words
   (m>1) EMENT → (empty, i.e. the suffix is deleted)
      replacement → replac
      cement → cement
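
The longest-suffix convention and the measure condition above can be sketched as follows. This is not the full Porter stemmer: it implements only one compound command at a time, using the sample rules from the slide. The grouping of rules into `RULES_PLURAL`, `RULES_TIONAL`, and `RULES_EMENT` is an assumption made for illustration, as is the (m>0) condition on the ational/tional rules, which comes from Porter's published rule list rather than from this handout. The measure m counts vowel-to-consonant transitions in the stem, i.e. the m in the form [C](VC)^m[V].

```python
def is_consonant(word, i):
    """A letter is a consonant unless it is a, e, i, o, u, or a 'y' after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count VC sequences: m in the form [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


# Each rule: (suffix, replacement, threshold); it fires only if measure(stem) > threshold.
# A threshold of -1 means the rule is unconditional.
RULES_PLURAL = [("sses", "ss", -1), ("ies", "i", -1)]
RULES_TIONAL = [("ational", "ate", 0), ("tional", "tion", 0)]
RULES_EMENT = [("ement", "", 1)]


def apply_command(word, rules):
    """Apply one compound command: choose the rule matching the longest suffix."""
    matches = [r for r in rules if word.endswith(r[0])]
    if not matches:
        return word
    suffix, replacement, threshold = max(matches, key=lambda r: len(r[0]))
    stem = word[: len(word) - len(suffix)]
    # If the condition on the stem fails, the word is left unchanged.
    if measure(stem) > threshold:
        return stem + replacement
    return word


print(apply_command("caresses", RULES_PLURAL))
print(apply_command("relational", RULES_TIONAL))
print(apply_command("replacement", RULES_EMENT))
print(apply_command("cement", RULES_EMENT))
```

In "relational" both ational and tional match, and the longer suffix wins. In "cement" the stem "c" has measure 0, so the (m>1) condition blocks the rule and the word is kept whole.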