Web Mining: Information and Pattern Discovery on the World Wide Web




...tructure. For selling advertisements on the World Wide Web, analyzing user access patterns helps in targeting ads to specific groups of users.

Most existing Web analysis tools [16, 23, 37] provide mechanisms for reporting user activity in the servers and various forms of data filtering. Using such tools it is possible to determine the number of accesses to the server and to individual files, the times of visits, and the domain names and URLs of users. However, these tools are designed to handle low to moderate traffic servers, and usually provide little or no analysis of data relationships among the accessed files and directories within the Web space.

More sophisticated systems and techniques for discovery and analysis of patterns are now emerging. These tools can be placed into two main categories, as discussed below.

2.2.1 Pattern Discovery Tools. The emerging tools for user pattern discovery use sophisticated techniques from AI, data mining, psychology, and information theory to mine for knowledge from collected data. For example, the WEBMINER system [12, 36] introduces a general architecture for Web usage mining. WEBMINER automatically discovers association rules and sequential patterns from server access logs. In [11], algorithms are introduced for finding maximal forward references and large reference sequences. These can, in turn, be used to perform various types of user traversal path analysis, such as identifying the most traversed paths through a Web locality. Pirolli et al. [43] use information foraging theory to combine path traversal patterns, Web page typing, and site topology information to categorize pages for easier access by users.

2.2.2 Pattern Analysis Tools. Once access patterns have been discovered, analysts need the appropriate tools and techniques to understand, visualize, and interpret these patterns, e.g. the WebViz system [45].
Others have proposed using OLAP techniques such as data cubes for the purpose of simplifying the analysis of usage statistics from server access logs [15]. The WEBMINER system [36] proposes an SQL-like query mechanism for querying the discovered knowledge (in the form of association rules and sequential patterns). These techniques and others are further discussed in the subsequent sections.

3 Pattern Discovery from Web Transactions

As discussed in section 2.2, analysis of how users are accessing a site is critical for determining effective marketing strategies and optimizing the logical structure of the Web site. Because of many unique characteristics of the client-server model in the World Wide Web, including differences between the physical topology of Web repositories and user access paths, and the difficulty in identification of unique users as well as user sessions or transactions, it is necessary to develop a new framework to enable the mining process. Specifically, there are a number of issues in preprocessing data for mining that must be addressed before the mining algorithms can be run. These include developing a model of access log data, developing techniques to clean/filter the raw data to eliminate outliers and/or irrelevant items, grouping individual page accesses into semantic unit...
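
The preprocessing steps named above (cleaning the raw log, then grouping page accesses into units) can be illustrated with a minimal Python sketch. It is not the paper's method. Everything specific in it is an assumption made for illustration: the simplified record layout, the sample hosts and paths, the list of file suffixes treated as irrelevant, the use of the host name as a stand-in for a user (which the text itself notes is unreliable), and the 30-minute idle gap used to split sessions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, simplified log records: (host, timestamp, requested path).
LOG = [
    ("a.example.com", "1997-03-01 10:00:00", "/index.html"),
    ("a.example.com", "1997-03-01 10:00:01", "/logo.gif"),
    ("a.example.com", "1997-03-01 10:02:00", "/products.html"),
    ("b.example.com", "1997-03-01 10:05:00", "/index.html"),
    ("a.example.com", "1997-03-01 11:30:00", "/index.html"),
]

# Assumed definition of "irrelevant items": embedded resources, not pages.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css")
# Assumed idle timeout for splitting sessions; not taken from the paper.
SESSION_GAP = timedelta(minutes=30)


def clean(records):
    """Drop requests whose path ends in one of the ignored suffixes."""
    return [r for r in records if not r[2].lower().endswith(IGNORED_SUFFIXES)]


def group_into_sessions(records, gap=SESSION_GAP):
    """Group each host's accesses into sessions, splitting at long idle gaps."""
    by_host = defaultdict(list)
    for host, stamp, path in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_host[host].append((when, path))

    sessions = []
    for host, hits in by_host.items():
        hits.sort()  # order each host's accesses by time
        current = [hits[0][1]]
        for (prev_time, _), (time, path) in zip(hits, hits[1:]):
            if time - prev_time > gap:
                # Idle gap exceeded: close the current session, start a new one.
                sessions.append((host, current))
                current = []
            current.append(path)
        sessions.append((host, current))
    return sessions


SESSIONS = group_into_sessions(clean(LOG))
for host, pages in SESSIONS:
    print(host, pages)
```

With the sample data, the image request is filtered out and the first host's accesses are split into two sessions because of the long idle period before its last request. A fixed timeout is only one possible heuristic for forming such units.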

This document was uploaded on 02/15/2014.
