lecture15-webchar-handout-6-per

For images: flesh tone detectors, source text analysis, etc.


  • … religious, lobbies
  • Promotion funded by advertising budget
• Operators
  • Contractors (Search Engine Optimizers) for lobbies, companies
  • Web masters
  • Hosting services
• Forums
  • E.g., Web master world (www.webmasterworld.com)
    • Search engine specific tricks
    • Discussions about academic papers

• SEOs responded with dense repetitions of chosen terms
  • e.g., maui resort maui resort maui resort
  • Often, the repetitions would be in the same color as the background of the web page
    • Repeated terms got indexed by crawlers
    • But not visible to humans on browsers

Pure word density cannot be trusted as an IR signal
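To see why raw word density is a weak signal, here is a minimal sketch that scores two pages for the query `maui resort`. The page texts and the helper names `raw_tf_score` and `log_tf_score` are hypothetical, and the whitespace tokenizer is deliberately naive. Raw term counts reward the stuffed page directly; sublinear (log-scaled) tf, where each present term contributes 1 + log(tf), shrinks the advantage but does not remove it.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()

def raw_tf_score(query: str, doc: str) -> int:
    """Sum of raw counts of the query terms in the document."""
    counts = Counter(tokenize(doc))
    return sum(counts[t] for t in tokenize(query))

def log_tf_score(query: str, doc: str) -> float:
    """Sublinear tf: each query term present contributes 1 + log(tf)."""
    counts = Counter(tokenize(doc))
    return sum(1 + math.log(counts[t]) for t in tokenize(query) if counts[t] > 0)

# Hypothetical pages: an honest resort page and a keyword-stuffed one.
honest = "our maui resort offers beach villas snorkeling tours and a spa"
stuffed = "cheap flights " + "maui resort " * 50

query = "maui resort"
for name, doc in [("honest", honest), ("stuffed", stuffed)]:
    print(name, raw_tf_score(query, doc), round(log_tf_score(query, doc), 2))
# Prints:
# honest 2 2.0
# stuffed 100 9.82
```

Under both scores the stuffed page still ranks first, which is the slide's point: any signal computed from the page's own word counts can be inflated by the page's author.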
Introduction to Information Retrieval, Sec. 19.2.2
Variants of keyword stuffing
• Misleading meta-tags, excessive repetition
• Hidden text with colors, style sheet tricks, etc.
Example: Meta-Tags = “… London hotels, hotel, holiday inn, hilton, discount, booking, reservation, sex, mp3, britney spears, , …”

Cloaking
• Serve fake content to search engine spider
• DNS cloaking: Switch IP address. Impersonate
Diagram: Is this a Search Engine spider? Y → SPAM; N → Real Doc
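Cloaking only works if the server can tell spiders from people, so one way to check for it is to fetch the same URL twice, once identifying as a crawler and once as a browser, and compare the responses. The sketch below simulates this offline under several assumptions: `cloaked_server` and `honest_server` are hypothetical stand-ins for real sites, both User-Agent strings are made up, and the 0.5 Jaccard threshold is arbitrary rather than a value any search engine is known to use. A check that only varies the User-Agent would not catch DNS cloaking, which keys on the requester's IP address instead.

```python
from typing import Callable

# Illustrative User-Agent strings; both are hypothetical, not real products.
CRAWLER_UA = "ExampleBot/1.0 (hypothetical search engine spider)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

def cloaked_server(user_agent: str) -> str:
    """Hypothetical cloaking server: a naive 'bot' check picks the page."""
    if "bot" in user_agent.lower():
        # Fake, keyword-stuffed content shown only to the spider.
        return "london hotels hotel holiday inn hilton discount booking reservation"
    # The "real doc" shown to human visitors, unrelated to the keywords above.
    return "replica watches on sale buy now limited offer"

def honest_server(user_agent: str) -> str:
    """Hypothetical honest server: the same page for every visitor."""
    return "family run bed and breakfast near hyde park in london"

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two documents' word sets."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def looks_cloaked(fetch: Callable[[str], str], threshold: float = 0.5) -> bool:
    """Flag a page whose crawler view and browser view differ too much."""
    as_crawler = fetch(CRAWLER_UA)
    as_browser = fetch(BROWSER_UA)
    return jaccard(as_crawler, as_browser) < threshold

print(looks_cloaked(cloaked_server))  # True
print(looks_cloaked(honest_server))   # False
```

Comparing word sets is crude: legitimately dynamic pages (news front pages, rotating ads) also differ between two fetches, so a practical check would need a more tolerant comparison than this one.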
More spam techniques
• Doorway pages
  • Pages optimized for a single keyword that redirect to the real target page
• Link spamming
  • Mutual admiration societies, hidden links, awards ...

The war against spam
• Quality signals - Prefer authoritative pages based on:
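A "mutual admiration society" in the link-spamming bullet is a group of pages that link to one another to inflate their apparent authority. As a rough illustration only, the sketch below computes, for each page in a small hypothetical link graph, the fraction of its out-links that are reciprocated. The graph, the `reciprocity` helper, and the 0.8 flagging threshold are all assumptions; reciprocity by itself is a weak heuristic that also flags legitimate link exchanges, and it is not presented as any search engine's actual method.

```python
def reciprocity(graph: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of each page's out-links that point to a page linking back."""
    scores = {}
    for page, outlinks in graph.items():
        if not outlinks:
            scores[page] = 0.0
            continue
        mutual = sum(1 for target in outlinks if page in graph.get(target, set()))
        scores[page] = mutual / len(outlinks)
    return scores

# Hypothetical link graph: spam1..spam3 all link to each other,
# while the other pages link outward without reciprocation.
graph = {
    "spam1": {"spam2", "spam3"},
    "spam2": {"spam1", "spam3"},
    "spam3": {"spam1", "spam2"},
    "news": {"univ", "gov"},
    "blog": {"news", "univ"},
    "univ": {"gov"},
    "gov": set(),
}

scores = reciprocity(graph)
suspects = sorted(p for p, s in scores.items() if s >= 0.8)
print(suspects)  # ['spam1', 'spam2', 'spam3']
```

Counting reciprocated links is cheap and local, which is why it makes a convenient first filter; a fuller treatment would look at dense clusters across the whole graph rather than pairwise link-backs.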