tructure. For selling advertisements on the World Wide Web, analyzing user access patterns helps in
targeting ads to specific groups of users.
Most existing Web analysis tools [16, 23, 37] provide mechanisms for reporting user activity in the
servers and various forms of data filtering. Using such tools, it is possible to determine the number
of accesses to the server and to individual files, the times of visits, and the domain names and URLs
of users. However, these tools are designed to handle low- to moderate-traffic servers, and usually
provide little or no analysis of data relationships among the accessed files and directories within
the Web space.
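
The kind of reporting described above can be illustrated with a minimal sketch. It assumes the server writes entries in the Common Log Format; the sample entries, host names, and the `summarize` helper are hypothetical and are not taken from any of the cited tools.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path PROTO" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+$'
)

def summarize(lines):
    """Return (total accesses, hits per file, hits per client host)."""
    per_file, per_host = Counter(), Counter()
    total = 0
    for line in lines:
        m = CLF.match(line.strip())
        if m is None:          # skip entries that do not parse
            continue
        total += 1
        per_file[m["path"]] += 1
        per_host[m["host"]] += 1
    return total, per_file, per_host

# Hypothetical log entries, for illustration only.
SAMPLE_LOG = [
    'alpha.example.edu - - [10/Oct/1997:13:55:36 -0600] "GET /index.html HTTP/1.0" 200 2326',
    'alpha.example.edu - - [10/Oct/1997:13:56:01 -0600] "GET /papers/a.ps HTTP/1.0" 200 51200',
    'beta.example.com - - [10/Oct/1997:14:02:10 -0600] "GET /index.html HTTP/1.0" 200 2326',
    'not a log line',
]

total, per_file, per_host = summarize(SAMPLE_LOG)
```

Counts of this kind say how often each file was requested, but nothing about which files tend to be requested together, which is the limitation the paragraph above points out.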
More sophisticated systems and techniques for discovery and analysis of patterns are now emerging.
These tools can be placed into two main categories, as discussed below.

2.2.1 Pattern Discovery Tools. The emerging tools for user pattern discovery use sophisticated
techniques from AI, data mining, psychology, and information theory to mine for knowledge from
collected data. For example, the WEBMINER system [12, 36] introduces a general architecture for Web
usage mining. WEBMINER automatically discovers association rules and sequential patterns from server
access logs. In [11], algorithms are introduced for finding maximal forward references and large
reference sequences.
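
To make the notion of a maximal forward reference concrete, here is a minimal sketch of one common reading of it: a user's click sequence is cut each time the user moves back to a page already on the current path, and the path reached just before the backward move is emitted. The text here does not spell out the algorithm of [11], so the function below and the sample click sequence are illustrative assumptions, not a reproduction of that work.

```python
def maximal_forward_references(clicks):
    """Split one user's click sequence into maximal forward references."""
    path = []                # current forward path from the entry page
    moving_forward = False   # True if the last step extended the path
    result = []
    for page in clicks:
        if page in path:
            # Backward reference: the user returned to an earlier page.
            # Emit the path only if we were moving forward just before.
            if moving_forward:
                result.append(list(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        result.append(list(path))
    return result

# Hypothetical click sequence; each letter stands for one page.
clicks = list("ABCDCBEGHGWAOUOV")
refs = ["".join(p) for p in maximal_forward_references(clicks)]
# refs == ["ABCD", "ABEGH", "ABEGW", "AOU", "AOV"]
```

Counting how often each emitted path, or each contiguous subsequence of it, occurs across many users is one way to arrive at frequently traversed paths.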
Such reference sequences can, in turn, be used to perform various types of user traversal path
analysis, such as identifying the most traversed paths through a Web locality. Pirolli et al. [43]
use information foraging theory to combine path traversal patterns, Web page typing, and site
topology information to categorize pages for easier access by users.

2.2.2 Pattern Analysis Tools. Once access patterns have been discovered, analysts need the
appropriate tools and techniques to understand, visualize, and interpret these patterns, e.g., the
WebViz system [45]. Others have proposed using OLAP techniques such as data cubes for the purpose of
simplifying the analysis of usage statistics from server access logs [15].
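
As a rough illustration of the data cube idea, the sketch below counts hits for every combination of a few dimensions, so that any roll-up (for example, hits per page regardless of hour and domain) is a lookup. The dimension names, the records, and the `build_cube` helper are assumptions made for this example and do not describe the system in [15].

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ("page", "hour", "domain")   # assumed dimensions

def build_cube(records):
    """Count hits for every subset of DIMENSIONS (the full group-by lattice)."""
    cube = {}
    for r in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, r):
            # Key each record by its values on the chosen dimensions.
            cube[dims] = Counter(tuple(rec[d] for d in dims) for rec in records)
    return cube

# Hypothetical pre-parsed log records.
records = [
    {"page": "/index.html", "hour": 13, "domain": "edu"},
    {"page": "/index.html", "hour": 14, "domain": "com"},
    {"page": "/papers/a.ps", "hour": 13, "domain": "edu"},
]

cube = build_cube(records)
hits_total = cube[()][()]                              # roll-up over everything
hits_index = cube[("page",)][("/index.html",)]         # hits for one page
hits_13_edu = cube[("hour", "domain")][(13, "edu")]    # hits at 13h from .edu
```

Materializing every group-by is only practical for a handful of dimensions; it is used here because it keeps the roll-up idea visible.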
The WEBMINER system [36] proposes an SQL-like query mechanism for querying the discovered knowledge
(in the form of association rules and sequential patterns). These techniques and others are further
discussed in the subsequent sections.

3 Pattern Discovery from Web
As discussed in section 2.2, analysis of how users are accessing a site is critical for determining
effective marketing strategies and optimizing the logical structure of the Web site. Because of many
unique characteristics of the client-server model in the World Wide Web, including differences
between the physical topology of Web repositories and user access paths, and the difficulty in
identification of unique users as well as user sessions or transactions, it is necessary to develop a
new framework to enable the mining process. Specifically, there are a number of issues in
preprocessing data for mining that must be addressed before the mining algorithms can be run. These
include developing a model of access log data, developing techniques to clean/filter the raw data to
eliminate outliers and/or irrelevant items, grouping individual page accesses into semantic unit...
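
Two of the preprocessing steps named above, filtering irrelevant items and grouping page accesses, can be sketched as follows. This is only one possible reading: the list of suffixes to drop, the 30-minute gap used to split sessions, and the use of the client host as a stand-in for a user are all assumptions introduced for the example, and the last one is exactly the kind of simplification that makes identifying unique users difficult.

```python
from datetime import datetime, timedelta

IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css")  # assumed filter list
SESSION_TIMEOUT = timedelta(minutes=30)                          # assumed threshold

def clean(entries):
    """Drop requests for embedded resources, keeping page views only."""
    return [e for e in entries if not e[2].lower().endswith(IRRELEVANT_SUFFIXES)]

def sessionize(entries):
    """Group (host, time, path) entries into per-host sessions split on long gaps."""
    sessions = []
    last_seen = {}   # host -> (time of last request, that host's open session)
    for host, when, path in sorted(entries, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(host)
        if prev is None or when - prev[0] > SESSION_TIMEOUT:
            current = []                      # start a new session
            sessions.append((host, current))
        else:
            current = prev[1]                 # continue the open session
        current.append(path)
        last_seen[host] = (when, current)
    return sessions

# Hypothetical parsed log entries.
t0 = datetime(1997, 10, 10, 13, 0)
entries = [
    ("alpha.example.edu", t0, "/index.html"),
    ("alpha.example.edu", t0 + timedelta(seconds=2), "/logo.gif"),
    ("alpha.example.edu", t0 + timedelta(minutes=5), "/papers/a.html"),
    ("alpha.example.edu", t0 + timedelta(hours=2), "/index.html"),
    ("beta.example.com", t0 + timedelta(minutes=1), "/index.html"),
]

sessions = sessionize(clean(entries))
```

With these entries the image request is dropped and the first host's accesses split into two sessions, because its last request arrives two hours after the previous one.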
This document was uploaded on 02/15/2014.