Web Mining: Information and Pattern Discovery on the World Wide Web


Since such transaction databases usually contain extremely large amounts of data, current association rule discovery techniques try to prune the search space according to the support for the items under consideration. Support is a measure based on the number of occurrences of user transactions within transaction logs. For organizations engaged in electronic commerce, discovering such rules can help in the development of effective marketing strategies. In addition, association rules discovered from WWW access logs can indicate how best to organize the organization's Web space.

The problem of discovering sequential patterns [35, 52] is to find inter-transaction patterns such that the presence of a set of items is followed by another item in the time-stamp-ordered transaction set. In Web server transaction logs, a visit by a client is recorded over a period of time. The time stamp associated with a transaction in this case is a time interval that is determined and attached to the transaction during the data cleaning or transaction identification processes. The discovery of sequential patterns in Web server access logs allows Web-based organizations to predict user visit patterns and helps in targeting advertising at groups of users based on these patterns. By analyzing this information, the Web mining system can determine temporal relationships among data items such as the following: 30% of clients who visited /company/products had done a search in Yahoo within the past week on keyword w; or 60% of clients who placed an online order in /company/product1 also placed an online order in /company/product4 within 15 days.

Another important kind of data dependency that can be discovered using the temporal characteristics of the data is similar time sequences. For example, we may be interested in finding common characteristics of all clients that visited a particular file within the time period [t1, t2].
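A statistic of the form "60% of clients who placed an order in /company/product1 also placed an order in /company/product4 within 15 days" can be computed directly from a cleaned log. The following is only a minimal sketch, not the method of the cited systems: the function name followed_within, the (client, path, timestamp) record layout, and the sample data are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def followed_within(visits, first, second, window):
    """Fraction of clients who accessed `first` and then accessed `second`
    no later than `window` after some access to `first`."""
    # Group the (hypothetical) log records by client.
    by_client = {}
    for client, path, when in visits:
        by_client.setdefault(client, []).append((when, path))

    started = 0   # clients with at least one access to `first`
    followed = 0  # of those, clients who reached `second` in time
    for events in by_client.values():
        first_times = [t for t, p in events if p == first]
        if not first_times:
            continue
        started += 1
        second_times = [t for t, p in events if p == second]
        if any(timedelta(0) <= t2 - t1 <= window
               for t1 in first_times for t2 in second_times):
            followed += 1
    return followed / started if started else 0.0

# Invented sample data; a real log would come out of data cleaning
# and transaction identification.
visits = [
    ("c1", "/company/product1", datetime(1997, 1, 1)),
    ("c1", "/company/product4", datetime(1997, 1, 10)),
    ("c2", "/company/product1", datetime(1997, 1, 2)),
    ("c2", "/company/product4", datetime(1997, 2, 20)),
    ("c3", "/company/product1", datetime(1997, 1, 3)),
    ("c4", "/company/product4", datetime(1997, 1, 4)),
]

share = followed_within(visits, "/company/product1",
                        "/company/product4", timedelta(days=15))
print(f"{share:.0%} of product1 clients reached product4 within 15 days")
```

Only one of the three clients who accessed /company/product1 reaches /company/product4 inside the window, so the sketch reports about 33%. A real sequential-pattern miner would additionally search over candidate item sequences and prune them by support rather than evaluate a single fixed pair.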
Conversely, we may be interested in a time interval (within a day, within a week, etc.) in which a particular file is most accessed.

Discovering classification rules [20, 54] allows one to develop a profile of items belonging to a particular group according to their common attributes. This profile can then be used to classify new data items that are added to the database. In Web usage mining, classification techniques allow one to develop a profile for clients who access particular server files based on demographic information available on those clients, or based on their access patterns. For example, classification on WWW access logs may lead to the discovery of relationships such as the following: clients from state or government agencies who visit the site tend to be interested in the page /company/product1; or 50% of clients who placed an online order in /company/product2 were in the 20-25 age group and lived on the West Coast.

Clustering analysis [24, 38] allows one to group together clients or data items that have similar characteristics. C...
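The demographic profile in the classification example above ("50% of clients who placed an online order in /company/product2 were in the 20-25 age group and lived on the West Coast") amounts to a conditional frequency over client attributes. The sketch below illustrates only that descriptive step, not a classifier; the function name profile, the attribute keys age_group and region, and all sample records are hypothetical.

```python
from collections import Counter

def profile(orders, clients, page, attributes):
    """Map each combination of attribute values to the share of clients
    who ordered at `page` and have that combination."""
    buyers = {client for client, path in orders if path == page}
    counts = Counter(
        tuple(clients[c][a] for a in attributes)
        for c in buyers
        if c in clients  # skip clients with no demographic record
    )
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()} if total else {}

# Invented demographic records and order log, for illustration only.
clients = {
    "c1": {"age_group": "20-25", "region": "West Coast"},
    "c2": {"age_group": "20-25", "region": "West Coast"},
    "c3": {"age_group": "30-35", "region": "Midwest"},
    "c4": {"age_group": "40-45", "region": "East Coast"},
    "c5": {"age_group": "20-25", "region": "West Coast"},
}
orders = [
    ("c1", "/company/product2"),
    ("c2", "/company/product2"),
    ("c3", "/company/product2"),
    ("c4", "/company/product2"),
    ("c5", "/company/product1"),
]

shares = profile(orders, clients, "/company/product2",
                 ["age_group", "region"])
for combo, share in sorted(shares.items()):
    print(combo, f"{share:.0%}")
```

With this sample data, two of the four clients who ordered at /company/product2 fall in the 20-25, West Coast group, which gives the 50% share. A classification technique would go further and learn rules that predict group membership for new clients from such attributes.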