webMiningOverview

webMiningOverview - CS 345 Data Mining Lecture 1...

CS 345 Data Mining
Lecture 1: Introduction to Web Mining

What is Web Mining?
- Discovering useful information from the World-Wide Web and its usage patterns
- Applications
  - Web search, e.g., Google, Yahoo, …
  - Vertical search, e.g., FatLens, Become, …
  - Recommendations, e.g., Amazon.com
  - Advertising, e.g., Google, Yahoo
  - Web site design, e.g., landing page optimization

How does it differ from “classical” data mining?
- The web is not a relation
  - Textual information and linkage structure
- Usage data is huge and growing rapidly
  - Google’s usage logs are bigger than their web crawl
  - Data generated per day is comparable to the largest conventional data warehouses
- Ability to react in real time to usage patterns
  - No human in the loop

The World-Wide Web
- Huge
- Distributed content creation and linking (no coordination)
- Structured databases, unstructured text, semistructured data
- Content includes truth, lies, obsolete information, contradictions, …
- Our modern-day Library of Alexandria

Size of the Web
- Number of pages
  - Technically infinite
    - Because of dynamically generated content
    - Lots of duplication (30-40%)
  - Best estimate of “unique” static HTML pages comes from search engine claims
    - Google = 8 billion, Yahoo = 20 billion
    - Lots of marketing hype
- Number of unique web sites
  - Netcraft survey says 72 million sites (http://news.netcraft.com/archives/web_server_survey.html)
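
The 30-40% duplication figure above refers to pages whose content repeats elsewhere on the web. As a rough illustration only, the Python sketch below estimates the fraction of exact duplicates in a toy crawl by hashing page text after normalizing case and whitespace. The function name `duplicate_fraction` and the sample pages are hypothetical; real crawlers also need near-duplicate detection (for example, shingling), which this sketch does not attempt.

```python
import hashlib


def duplicate_fraction(pages):
    """Return the fraction of pages whose normalized text repeats an earlier page.

    Hypothetical helper: only exact duplicates after lowercasing and collapsing
    whitespace are counted; near-duplicates are not detected.
    """
    seen = set()
    duplicates = 0
    for text in pages:
        # Normalize so trivial formatting differences do not hide a duplicate.
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates / len(pages) if pages else 0.0


# Invented toy "crawl": 5 pages, 2 of which repeat earlier content.
crawl = [
    "Welcome to CS 345",
    "welcome   to CS 345",  # same text, different case and spacing
    "Data mining lecture notes",
    "Data mining lecture notes",
    "Syllabus",
]
print(duplicate_fraction(crawl))  # prints 0.4
```

Hashing keeps memory proportional to the number of distinct pages rather than their total size, which is why fingerprinting of this kind is a common first step before more expensive near-duplicate checks.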

Netcraft survey
- http://news.netcraft.com/archives/web_server_survey.html

This note was uploaded on 01/31/2011 for the course CS 345 taught by Professor Dunbar,a during the Fall '07 term at UC Davis.
