web_mining_overview

web_mining_overview - CS 345A Data Mining Lecture 1...

CS 345A Data Mining, Lecture 1: Introduction to Web Mining

What is Web Mining?
  Discovering useful information from the World-Wide Web and its usage patterns

Web Mining v. Data Mining
  - Structure (or lack of it)
      - Textual information and linkage structure
  - Scale
      - Data generated per day is comparable to largest conventional data warehouses
  - Speed
      - Often need to react to evolving usage patterns in real-time (e.g., merchandising)

Web Mining topics
  - Web graph analysis
  - Power Laws and The Long Tail
  - Structured data extraction
  - Web advertising
  - Systems Issues

Size of the Web
  - Number of pages
      - Technically, infinite
      - Much duplication (30-40%)
      - Best estimate of “unique” static HTML pages comes from search engine claims
          - Google = 8 billion(?), Yahoo = 20 billion

Netcraft survey
  http://news.netcraft.com/archives/web_server_survey.html

The web as a graph
  - Pages = nodes, hyperlinks = edges
      - Ignore content
      - Directed graph
  - High linkage
      - 10-20 links/page on average
      - Power-law degree distribution

Structure of Web graph...
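As a rough illustration of the model under "The web as a graph" (pages as nodes, hyperlinks as directed edges, content ignored), the following Python sketch computes out-degrees, in-degrees, and a degree histogram. The page names and link data are hypothetical toy values, not taken from the lecture, and the helper names are chosen only for this example.

```python
from collections import Counter

# Hypothetical toy link data (an assumption for illustration):
# each page maps to the list of pages it links to.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}


def out_degrees(graph):
    """Number of outgoing hyperlinks per page."""
    return {page: len(targets) for page, targets in graph.items()}


def in_degrees(graph):
    """Number of incoming hyperlinks per page (0 if nothing links to it)."""
    counts = Counter({page: 0 for page in graph})
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)


def degree_histogram(degrees):
    """Map each degree value to the number of pages having that degree."""
    return dict(Counter(degrees.values()))


# Average links per page; the slides quote 10-20 for the real web.
avg_out_degree = sum(out_degrees(links).values()) / len(links)
in_degree_hist = degree_histogram(in_degrees(links))
```

On a real crawl, plotting such a histogram on log-log axes is one common way to check by eye for the power-law degree distribution the slides mention; a four-page toy graph is far too small to show it.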