
CS345 Data Mining Web Spam Detection

Economic considerations
- Search has become the default gateway to the web
- Very high premium to appear on the first page of search results
  - e.g., e-commerce sites
  - advertising-driven sites
What is web spam?
- Spamming = any deliberate action solely in order to boost a web page's position in search engine results, incommensurate with the page's real value
- Spam = web pages that are the result of spamming
- This is a very broad definition
  - SEO industry might disagree!
  - SEO = search engine optimization
- Approximately 10-15% of web pages are spam

Web Spam Taxonomy
- We follow the treatment by Gyongyi and Garcia-Molina [2004]
- Boosting techniques
  - Techniques for achieving high relevance/importance for a web page
- Hiding techniques
  - Techniques to hide the use of boosting
    - From humans and web crawlers
Boosting techniques
- Term spamming
  - Manipulating the text of web pages in order to appear relevant to queries
- Link spamming
  - Creating link structures that boost PageRank or hubs-and-authorities scores
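A minimal sketch of why link spamming works, assuming the basic PageRank power iteration in which a random surfer follows an out-link with probability beta and teleports to a uniformly random page otherwise (beta = 0.85 is a common choice, not a value from these slides). The graph, the page names, the helper functions, and the farm shape (a target page `t` that links to m supporting pages, each of which links only back to `t`) are all hypothetical illustrations.

```python
def pagerank(links, beta=0.85, iters=100):
    """Power iteration for PageRank; assumes every page has at least one out-link."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iters):
        # Every page first receives its share of the random-teleport probability.
        new_rank = {page: (1.0 - beta) / n for page in links}
        # Then each page splits beta * (its current rank) evenly among its out-links.
        for page, outs in links.items():
            share = beta * rank[page] / len(outs)
            for target in outs:
                new_rank[target] += share
        rank = new_rank
    return rank


def build_web(farm_size):
    """Hypothetical web: honest pages a, b, c; spam target t; optional link farm."""
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a", "t"], "t": ["a"]}
    if farm_size > 0:
        farm = [f"f{i}" for i in range(farm_size)]
        links["t"] = farm            # t links to every supporting page...
        for page in farm:
            links[page] = ["t"]      # ...and each supporting page links only back to t
    return links


without_farm = pagerank(build_web(0))
with_farm = pagerank(build_web(50))
print(f"t without farm: {without_farm['t']:.3f}")
print(f"t with farm:    {with_farm['t']:.3f}")
```

In this sketch the supporting pages are reached only by teleportation, yet each passes all of its followed-link rank to `t`, and `t` passes its own rank straight back into the farm, so `t` ends up with the largest PageRank in the graph. Hubs-and-authorities scores are not modeled here.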

- Repetition
  - of one or a few specific terms, e.g., free, cheap, viagra
  - Goal is to subvert TF.IDF ranking schemes
- Dumping
  - of a large number of unrelated terms
  - e.g., copy entire dictionaries
- Weaving
  - Copy legitimate pages and insert spam terms at …
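A small sketch of how repetition subverts TF.IDF, assuming the variant in which a term's frequency in a document is divided by the count of that document's most frequent term, and IDF = log2(N / n_i), where N is the number of documents and n_i the number containing the term (other TF.IDF variants exist). The helper `tf_idf` and the example pages are hypothetical.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """TF.IDF of `term` in `doc` (a token list), assuming max-normalized TF
    and IDF = log2(N / n_term)."""
    counts = Counter(doc)
    tf = counts[term] / max(counts.values())
    n_term = sum(1 for d in corpus if term in d)  # documents containing the term
    idf = math.log2(len(corpus) / n_term)
    return tf * idf


# Hypothetical pages; only `honest` and `spam` contain "cheap".
honest = "compare flights to paris and book flights online at a cheap price".split()
spam = honest + ["cheap"] * 20  # repetition: same text plus 20 extra copies of "cheap"
other1 = "paris museum opening hours".split()
other2 = "train timetable for lyon".split()
corpus = [honest, spam, other1, other2]

print(tf_idf("cheap", honest, corpus))  # 0.5
print(tf_idf("cheap", spam, corpus))    # 1.0
```

Under this max-normalized TF, repetition saturates at TF = 1: the spammer only needs to make the target term the page's most frequent word. With raw-count TF, each extra copy would keep raising the score.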