Web Mining: Information and Pattern Discovery on the World Wide Web


Given the large number of patterns that may be mined, there appears to be a definite need for a mechanism to specify the focus of the analysis. Such focus may be provided in at least two ways. First, constraints may be placed on the database (perhaps in a declarative language) to restrict the portion of the database to be mined, e.g. [36]. Second, querying may be performed on the knowledge that has been extracted by the mining process, in which case a language for querying knowledge rather than data is needed. An SQL-like querying mechanism has been proposed for the WEBMINER system [36]. For example, the query

    SELECT association-rules(A*B*C*)
    FROM log.data
    WHERE date >= 970101 AND domain = "edu" AND
          support = 1.0 AND confidence = 90.0

extracts the rules involving the ".edu" domain after Jan 1, 1997, which start with URL A and contain B and C in that order, and that have a minimum support of 1% and a minimum confidence of 90%.

5 Web Usage Mining Architecture

We have developed a general architecture for Web usage mining, which is presented in [13] and [36]. WEBMINER is a system that implements parts of this general architecture. The architecture divides the Web usage mining process into two main parts. The first part includes the domain-dependent processes of transforming the Web data into suitable transaction form. This includes preprocessing, transaction identification, and data integration components. The second part includes the largely domain-independent application of generic data mining and pattern matching techniques (such as the discovery of association rules and sequential patterns) as part of the system's data mining engine. The overall architecture for the Web mining process is depicted in Figure 2.

Data cleaning is the first step performed in the Web usage mining process. Some low-level data integration tasks may also be performed at this stage, such as combining multiple logs, incorporating referrer logs, etc.
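As an illustration of the data cleaning step, the following is a minimal Python sketch that combines several logs and drops entries unlikely to matter for usage analysis. The text does not specify a log format or which entries count as irrelevant, so everything concrete here is an assumption: the Common Log Format layout, the `IGNORED_SUFFIXES` list, the rule that failed requests (status 400 and above) are dropped, and the names `parse_line` and `clean_logs`.

```python
import re
from datetime import datetime

# Assumption: entries follow the Common Log Format, e.g.
# 1.2.3.4 - - [01/Jan/1997:10:00:00 -0600] "GET /a.html HTTP/1.0" 200 1024
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Assumption: embedded resources are treated as noise for usage analysis.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")


def parse_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }


def clean_logs(*logs):
    """Combine several logs (iterables of lines) into one cleaned, time-ordered list."""
    entries = []
    for log in logs:
        for line in log:
            entry = parse_line(line)
            if entry is None:
                continue  # skip malformed lines
            if entry["status"] >= 400:
                continue  # skip failed requests (assumed cleaning rule)
            if entry["url"].lower().endswith(IGNORED_SUFFIXES):
                continue  # skip embedded resources (assumed cleaning rule)
            entries.append(entry)
    entries.sort(key=lambda e: e["time"])  # merge logs into one time order
    return entries
```

Sorting after concatenation is the simplest way to combine multiple logs into one sequence; a real system handling large logs might merge already-sorted streams instead.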
After the data cleaning, the log entries must be partitioned into logical clusters using one or a series of transaction identification modules. The goal of transaction identification is to create meaningful clusters of references for each user. The task of identifying transactions is one of either dividing a large transaction into multiple smaller ones or merging small transactions into fewer larger ones. The input and output transaction formats match, so that any number of modules can be combined in any order, as the data analyst sees fit.

Once the domain-dependent data transformation phase is completed, the resulting transaction data must be formatted to conform to the data model of the appropriate data mining task. For instance, the format of the data for the association rule discovery task may be different from the format necessary for mining sequential patterns. Finally, a query mechanism will allow the user (analyst) to provide more control over the discovery process by specifying various cons...

Figure 2: A General Architecture for Web Usage Mining
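The idea that transaction identification modules share one input and output format, and can therefore be chained in any order, can be sketched in Python as follows. This is only an illustration under stated assumptions: the transaction representation (a user paired with a list of timestamped URLs), the time-gap rule used for dividing, the length rule used for merging, and the names `divide_by_time_gap`, `merge_small`, and `pipeline` are all hypothetical and are not taken from WEBMINER.

```python
from functools import reduce

# Assumed format: a transaction is (user, [(timestamp_seconds, url), ...]).
# Every module maps a list of transactions to a list of transactions.


def divide_by_time_gap(max_gap):
    """Build a module that splits a transaction wherever two consecutive
    references are more than max_gap seconds apart (assumed dividing rule)."""
    def module(transactions):
        out = []
        for user, refs in transactions:
            current = []
            for ref in refs:
                if current and ref[0] - current[-1][0] > max_gap:
                    out.append((user, current))
                    current = []
                current.append(ref)
            if current:
                out.append((user, current))
        return out
    return module


def merge_small(min_length):
    """Build a module that merges a transaction shorter than min_length into
    the same user's immediately preceding transaction (assumed merging rule)."""
    def module(transactions):
        out = []
        for user, refs in transactions:
            if out and out[-1][0] == user and len(refs) < min_length:
                out[-1] = (user, out[-1][1] + list(refs))
            else:
                out.append((user, list(refs)))
        return out
    return module


def pipeline(*modules):
    """Chain modules; this works because input and output formats match."""
    def run(transactions):
        return reduce(lambda txns, module: module(txns), modules, transactions)
    return run


# Usage: one long per-user transaction, divided and then partially re-merged.
log = [("u1", [(0, "/a"), (10, "/b"), (4000, "/c"), (4005, "/d"), (9000, "/e")])]
identify = pipeline(divide_by_time_gap(1800), merge_small(2))
result = identify(log)
```

Because each module is a plain function from a list of transactions to a list of transactions, the analyst can reorder, repeat, or omit modules without changing any of them.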

This document was uploaded on 02/15/2014.
