lecture15-webchar-handout-6-per

more on these later
   Domain flooding: numerous domains that point or re-direct to a target page
Robots
   Fake query stream - rank checking programs
   “Curve-fit” ranking programs of search engines
   Millions of submissions via Add-Url

Votes from authors (linkage signals)
Votes from users (usage signals)
Policing of URL submissions
   Anti-robot test
Limits on meta-keywords
Robust link analysis
   Ignore statistically implausible linkage (or text)
   Use link analysis to detect spammers (guilt by association)
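The "guilt by association" idea from robust link analysis can be illustrated with a small sketch. It is not the method of any particular search engine: the function name `spam_scores`, the `damping` factor, and the number of rounds are assumptions made for illustration. Known spam pages keep a score of 1.0, and every other page takes a damped average of the scores of the pages it links to, so pages that mostly link to spam look suspicious themselves.

```python
from collections import defaultdict


def spam_scores(links, known_spam, rounds=3, damping=0.5):
    """Propagate suspicion backwards along links (hypothetical sketch).

    links      -- iterable of (source_page, target_page) pairs
    known_spam -- set of pages already labelled as spam
    Returns a dict mapping each page to a score in [0, 1].
    """
    out = defaultdict(set)
    pages = set(known_spam)
    for src, dst in links:
        out[src].add(dst)
        pages.update((src, dst))

    # Seed: known spam gets 1.0, everything else starts clean.
    score = {p: (1.0 if p in known_spam else 0.0) for p in pages}

    for _ in range(rounds):
        new = {}
        for p in pages:
            if p in known_spam:
                new[p] = 1.0
            elif out[p]:
                # Damped average suspicion of the pages that p links to.
                new[p] = damping * sum(score[q] for q in out[p]) / len(out[p])
            else:
                new[p] = 0.0
        score = new
    return score


links = [
    ("a", "spam1"), ("a", "spam2"),   # "a" links only to known spam
    ("b", "good"), ("b", "spam1"),    # "b" links to one spam page, one clean page
    ("good", "news"),
]
scores = spam_scores(links, known_spam={"spam1", "spam2"})
for page in sorted(scores):
    print(page, scores[page])
```

In this toy graph, page `a` ends up more suspicious than page `b`, and `good` stays clean. A real system would combine such a score with other signals before blocking anything.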
Spam recognition by machine learning
   Training set based on known spam
Family friendly filters
   Linguistic analysis, general classification techniques, etc.
   For images: flesh tone detectors, source text analysis, etc.
Editorial intervention
   Blacklists
   Top queries audited
   Complaints addressed
   Suspect pattern detection
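As a rough illustration of spam recognition by machine learning with a training set based on known spam, the sketch below trains a tiny multinomial Naive Bayes text classifier. The slides do not say which classifier is used, so Naive Bayes is swapped in here as one common choice. The toy training texts, the labels `spam` and `ham`, and the function names `train` and `classify` are all assumptions for illustration.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (text, label) pairs, label being 'spam' or 'ham'.

    Assumes both labels occur at least once in the training set.
    """
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the higher log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label in ("spam", "ham"):
        # Class prior estimated from the labelled training documents.
        logp = math.log(doc_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the shared vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


training_set = [
    ("cheap pills cheap loans", "spam"),
    ("win money win prizes now", "spam"),
    ("lecture notes on information retrieval", "ham"),
    ("web search engine index size", "ham"),
]
model = train(training_set)
print(classify(model, "cheap money now"))
print(classify(model, "search engine index"))
```

The same structure applies whatever features are used (page text, link features, and so on). Only the feature extraction in `train` and `classify` would change.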
Introduction to Information Retrieval

More on spam
   Web search engines have policies on SEO practices they tolerate/block
      http://help.yahoo.com/help/us/ysearch/index.html
      http://www.google.com/intl/en/webmasters/
   Adversarial IR: the unending (technical) battle between SEOs and web search engines
   Research: http://airweb.cse.lehigh.edu/

SIZE OF THE WEB

Sec. 19.5  What is the size of the web?

Sec. 19.5  What can we attempt to measure?
   Issues
      The...

This document was uploaded on 02/26/2014.
