webMiningOverview

webMiningOverview - CS345 DataMining Lecture1...

CS 345 Data Mining
Lecture 1: Introduction to Web Mining

What is Web Mining?
- Discovering useful information from the World-Wide Web and its usage patterns
- Applications
  - Web search, e.g., Google, Yahoo, …
  - Vertical search, e.g., FatLens, Become, …
  - Recommendations, e.g., Amazon.com
  - Advertising, e.g., Google, Yahoo
  - Web site design, e.g., landing page optimization

How does it differ from “classical” Data Mining?
- The web is not a relation
  - Textual information and linkage structure
- Usage data is huge and growing rapidly
  - Google’s usage logs are bigger than its web crawl
  - Data generated per day is comparable to the largest conventional data warehouses
- Ability to react in real time to usage patterns
  - No human in the loop

The World-Wide Web
- Huge
- Distributed content creation and linking (no coordination)
- Structured databases, unstructured text, semistructured data
- Content includes truth, lies, obsolete information, contradictions, …
- Our modern-day Library of Alexandria
[Image: The Web]

Size of the Web
- Number of pages
  - Technically infinite, because of dynamically generated content
  - Lots of duplication (30-40%)
- Best estimate of “unique” static HTML pages comes from search engine claims
  - Google = 8 billion, Yahoo = 20 billion
  - Lots of marketing hype
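To make the duplication percentage concrete, here is a minimal sketch of one way such a fraction could be computed over a crawl sample. It assumes that only exact duplicates count, detected by hashing whitespace-normalized, lowercased page text; the function names and the toy URLs are hypothetical, and this is not how the 30-40% figure above was obtained. Catching near-duplicates (for example, pages that differ only in a timestamp) would need a different technique such as shingling.

```python
import hashlib
from collections import Counter


def normalize(text: str) -> str:
    """Collapse whitespace runs and lowercase, so trivially different copies hash alike."""
    return " ".join(text.split()).lower()


def duplicate_fraction(pages: dict[str, str]) -> float:
    """Fraction of pages whose normalized content already appears on another page.

    `pages` maps a URL to its text (hypothetical input format). Only exact,
    post-normalization duplicates are counted; near-duplicates are missed.
    """
    if not pages:
        return 0.0
    counts = Counter(
        hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        for text in pages.values()
    )
    # A group of k identical pages holds 1 "unique" page and k - 1 redundant copies.
    redundant = sum(k - 1 for k in counts.values())
    return redundant / len(pages)


if __name__ == "__main__":
    # Hypothetical toy crawl: two copies of the same page plus one distinct page.
    sample = {
        "http://example.com/a": "Hello   World",
        "http://mirror.example.org/a": "hello world",
        "http://example.com/b": "Something else",
    }
    print(f"{duplicate_fraction(sample):.2f}")  # prints 0.33
```

Counting k - 1 copies per group treats the first copy as the "unique" page, which mirrors the slide's distinction between the total number of pages and the number of unique ones.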

This document was uploaded on 03/04/2012.
