CS 345A Data Mining
Lecture 1: Introduction to Web Mining

What is Web Mining?
- Discovering useful information from the World-Wide Web and its usage patterns

Web Mining v. Data Mining
- Structure (or lack of it)
  - Textual information and linkage structure
- Scale
  - Data generated per day is comparable to largest conventional data warehouses
- Speed
  - Often need to react to evolving usage patterns in real-time (e.g., merchandising)

Web Mining topics
- Web graph analysis
- Power Laws and The Long Tail
- Structured data extraction
- Web advertising
- Systems Issues

Size of the Web
- Number of pages
  - Technically, infinite
  - Much duplication (30-40%)
  - Best estimate of unique static HTML pages comes from search engine claims
    - Google = 8 billion(?), Yahoo = 20 billion
- Number of web sites
  - Netcraft survey says 72 million sites (http://news.netcraft.com/archives/web_server_survey.html)

Netcraft survey
- http://news.netcraft.com/archives/web_server_survey.html

The web as a graph...
- Fall '09
- Data Mining