Web Mining Information and Pattern Discovery on the World Wide Web

2 A Taxonomy of Web Mining

In this section we present a taxonomy of Web mining, i.e. Web content mining and Web usage mining. We also describe and categorize some of the recent work and the related tools or techniques in each area. This taxonomy is depicted in Figure 1.

2.1 Web Content Mining

The lack of structure that permeates the information sources on the World Wide Web makes automated discovery of Web-based information difficult. Traditional search engines such as Lycos, Alta Vista, WebCrawler, ALIWEB [29], MetaCrawler, and others provide some comfort to users, but do not generally provide structural information nor categorize, filter, or interpret documents. A recent study provides a comprehensive and statistically thorough comparative evaluation of the most popular search engines [32]. In recent years these factors have prompted researchers to develop more intelligent tools for information retrieval, such as intelligent Web agents, and to extend data mining techniques to provide a higher level of organization for semi-structured data available on the Web. We summarize some of these efforts below.

2.1.1 Agent-Based Approach

Generally, agent-based Web mining systems can be placed into the following three categories:

Intelligent Search Agents: Several intelligent Web agents have been developed that search for relevant information using domain characteristics and user profiles to organize and interpret the discovered information. Agents such as Harvest [6], FAQFinder [19], Information Manifold [27], OCCAM [30], and ParaSite [51] rely either on pre-specified domain information about particular types of documents, or on hard-coded models of the information sources to retrieve and interpret documents. Agents such as ShopBot [14] and ILA (Internet Learning Agent) [42] interact with and learn the structure of unfamiliar information sources.
ShopBot retrieves product information from a variety of vendor sites using only general information about the product domain. ILA learns models of various information sources and translates these into its own concept hierarchy.

Information Filtering/Categorization: A number of Web agents use various information retrieval techniques [17] and characteristics of open hypertext Web documents to automatically retrieve, filter, and categorize them [5, 9, 34, 55, 53]. HyPursuit [53] uses semantic information embedded in link structures and document content to create cluster hierarchies of hypertext documents and to structure an information space. BO (Bookmark Organizer) [34] combines hierarchical clustering techniques and user interaction to organize a collection of Web documents based on conceptual information.

Personalized Web Agents: This category of Web agents learns user preferences and discovers Web information sources based on these preferences and on those of other individuals with similar interests (using collaborative filtering). A few recent examples of such agents include WebWatcher [3], PAINT [39], Syskill & Webert [41], GroupLens [47], Firefly [49], and others...
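
To make the idea of collaborative filtering concrete, the following is a minimal sketch of a user-based variant. It is not the method of any of the agents cited above. The ratings table, the page names, and the functions `cosine_similarity` and `predict` are all hypothetical, and cosine similarity over co-rated pages is only one of several possible similarity measures.

```python
from math import sqrt

# Hypothetical data: user -> {page: rating from 1 (poor) to 5 (good)}.
ratings = {
    "alice": {"/ml": 5, "/db": 3, "/ir": 4},
    "bob": {"/ml": 4, "/db": 2, "/ir": 5, "/agents": 5},
    "carol": {"/ml": 1, "/db": 5, "/agents": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the pages both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(a[p] ** 2 for p in shared))
    norm_b = sqrt(sum(b[p] ** 2 for p in shared))
    return dot / (norm_a * norm_b)


def predict(user, page, ratings):
    """Similarity-weighted average of other users' ratings for `page`.

    Returns None when no other user has rated the page.
    """
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or page not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted += sim * their_ratings[page]
        total += sim
    return weighted / total if total else None


# Predict how alice would rate a page she has not seen yet.
estimate = predict("alice", "/agents", ratings)
```

In this toy table alice's ratings resemble bob's more than carol's, so the estimate for `/agents` lands closer to bob's rating than to carol's. A practical system would also have to handle sparse data and differences in how individual users use the rating scale, which this sketch ignores.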

This document was uploaded on 02/15/2014.
