Web Mining: Information and Pattern Discovery on the World Wide Web


Clustering of client information or data items on Web transaction logs can facilitate the development and execution of future marketing strategies, both online and off-line, such as sending automated return mail to clients falling within a certain cluster, or dynamically changing a particular site for a client on a return visit based on that client's past classification.

4 Analysis of Discovered Patterns

The Web usage patterns discovered by the techniques described earlier would not be very useful unless there were mechanisms and tools to help an analyst understand them. Hence, in addition to techniques for mining usage patterns from Web logs, techniques and tools are needed for analyzing the discovered patterns. These are expected to draw on a number of fields, including statistics, graphics and visualization, usability analysis, and database querying. In this section we survey the existing tools and techniques. Because usage analysis of Web access behavior is a very new area, little work exists in it, and this survey is correspondingly brief.

Visualization has been used very successfully to help people understand various kinds of phenomena, both real and abstract. Hence it is a natural choice for understanding the behavior of Web users. Pitkow et al. [45] have developed the WebViz system for visualizing WWW access patterns. They propose a Web path paradigm, in which sets of server log entries are used to extract subsequences of Web traversal patterns called Web paths. WebViz allows the analyst to selectively analyze the portion of the Web that is of interest by filtering out the irrelevant portions. The Web is visualized as a directed graph with cycles, where nodes are pages and edges are (inter-page) hyperlinks.

On-Line Analytical Processing (OLAP) is emerging as a powerful paradigm for strategic analysis of databases in business settings.
It has recently been demonstrated that the functional and performance needs of OLAP require new information structures to be designed. This has led to the development of the data cube information model [18] and of techniques for its efficient implementation [2, 22, 50]. Recent work [15] has shown that the analysis needs of Web usage data have much in common with those of a data warehouse, and hence OLAP techniques are quite applicable. The access information in server logs is modeled as an append-only history, which grows over time. Since server logs grow quite rapidly, it may not be possible to provide on-line analysis of all of the data. Therefore, the log data need to be summarized, perhaps in various ways, to make on-line analysis feasible. Making portions of the log selectively (in)visible to various analysts may also be required for security reasons.

One of the reasons for the great success of relational database technology has been the existence of a high-level, declarative query language, which allows an application to express what conditions must be satisfied by the data it needs, rather than having to specify how to get the required data. Give...
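The client clustering mentioned at the opening of this excerpt can be illustrated with a minimal sketch, assuming each client has already been reduced to a vector of visit counts per site section. The section names, client ids, and the choice of plain k-means with a deterministic initialisation are all illustrative assumptions; the text does not say which clustering algorithm or features are used.

```python
import math
from typing import Dict, Tuple

Vector = Tuple[float, ...]


def normalize(v: Vector) -> Vector:
    """Turn visit counts into relative frequencies so heavy and light users are comparable."""
    total = sum(v)
    return tuple(x / total for x in v) if total else v


def kmeans(points: Dict[str, Vector], k: int, iters: int = 20) -> Dict[str, int]:
    """Plain k-means (an assumed choice of algorithm); returns a cluster index per client id."""
    ids = sorted(points)
    if not 0 < k <= len(ids):
        raise ValueError("k must be between 1 and the number of clients")
    # Deterministic initialisation: the first k clients (sorted by id) seed the centroids.
    centroids = [points[i] for i in ids[:k]]
    assign: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each client joins its nearest centroid.
        new_assign = {
            i: min(range(k), key=lambda c: math.dist(points[i], centroids[c]))
            for i in ids
        }
        if new_assign == assign:
            break  # no client changed cluster: converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in ids if assign[i] == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


# Hypothetical visit counts per site section: (products, support, jobs).
visits = {"c1": (9, 1, 0), "c2": (8, 2, 0), "c3": (0, 1, 9), "c4": (1, 0, 7)}
clusters = kmeans({c: normalize(v) for c, v in visits.items()}, k=2)
print(clusters)  # {'c1': 0, 'c2': 0, 'c3': 1, 'c4': 1}
```

Clients that end up in the same cluster could then be targeted together, for example with the automated return mail mentioned in the text.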
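The Web path idea behind WebViz [45] can be approximated roughly as follows: order each client's log entries by time to obtain a traversal path, count page-to-page transitions as edges of a directed graph (which may contain cycles), and let the analyst supply a filter that drops irrelevant pages. This is a sketch under simplifying assumptions (the log-entry tuple format, the `keep` filter, and treating all of a client's requests as a single path are hypothetical), not a description of how WebViz itself is implemented.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical, simplified log entry: (client id, timestamp in seconds, requested URL).
LogEntry = Tuple[str, int, str]


def web_paths(entries: Iterable[LogEntry]) -> Dict[str, List[str]]:
    """Group entries by client and sort them by time, giving one traversal path per client."""
    by_client: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for client, ts, url in entries:
        by_client[client].append((ts, url))
    return {c: [url for _, url in sorted(hits)] for c, hits in by_client.items()}


def traversal_graph(
    paths: Dict[str, List[str]],
    keep: Callable[[str], bool] = lambda url: True,
) -> Counter:
    """Count page-to-page transitions (directed edges); drop edges touching filtered-out pages."""
    edges: Counter = Counter()
    for path in paths.values():
        for src, dst in zip(path, path[1:]):
            if keep(src) and keep(dst):
                edges[(src, dst)] += 1
    return edges


log = [
    ("a", 1, "/index"), ("a", 5, "/products"), ("a", 9, "/index"),
    ("b", 2, "/index"), ("b", 4, "/products"), ("b", 7, "/jobs"),
]
paths = web_paths(log)
graph = traversal_graph(paths, keep=lambda url: url != "/jobs")
print(dict(graph))  # {('/index', '/products'): 2, ('/products', '/index'): 1}
```

Edges are kept only when both endpoints pass the filter, so hiding a page never creates a spurious link between its neighbours. In this example the cycle /index → /products → /index survives, while the transition into /jobs is filtered out.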
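The data cube model [18] can be sketched as an aggregate computed for every subset of the dimensions, with the dimensions left out rolled up to a special ALL value. The version below is a naive in-memory illustration over hypothetical log attributes (day, page, domain); it does not reflect the efficient implementation techniques of [2, 22, 50]. Because the log is an append-only history, each new entry can be folded into the existing cube, so the cube also serves as one possible summary of the log that answers on-line queries without rescanning the raw data.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

ALL = "ALL"  # marker for a dimension that has been rolled up


def add_row(cube: Counter, row: Dict[str, str], dims: List[str]) -> None:
    """Fold one log record into every cube cell it contributes to (2**len(dims) cells)."""
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            cell = tuple(row[d] if d in kept else ALL for d in dims)
            cube[cell] += 1


def build_cube(rows: List[Dict[str, str]], dims: List[str]) -> Counter:
    """Build a request-count cube from scratch."""
    cube: Counter = Counter()
    for row in rows:
        add_row(cube, row, dims)
    return cube


# Hypothetical, already-parsed log records.
DIMS = ["day", "page", "domain"]
hits = [
    {"day": "mon", "page": "/index", "domain": "edu"},
    {"day": "mon", "page": "/products", "domain": "com"},
    {"day": "tue", "page": "/index", "domain": "com"},
]
cube = build_cube(hits, DIMS)
print(cube[("mon", ALL, ALL)])     # 2: requests on Monday
print(cube[(ALL, "/index", ALL)])  # 2: requests for /index on any day

# A newly appended log entry updates the summary in place.
add_row(cube, {"day": "tue", "page": "/index", "domain": "edu"}, DIMS)
print(cube[(ALL, "/index", ALL)])  # 3
```

Storing all 2^d group-bys is what makes the naive approach expensive as the number of dimensions d grows, which is the problem the cited implementation work addresses.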
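To illustrate the declarative style, the sketch below loads a few hypothetical log records into an in-memory SQLite database using Python's standard `sqlite3` module and asks for successful hits per page without spelling out how to scan, group, or sort the log. It also defines a view that omits the client column, as one possible way of making part of the log invisible to analysts. The table, view, and column names are assumptions. SQLite itself has no user permissions, so in a multi-user DBMS the restriction would typically come from granting analysts access to the view but not to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (client TEXT, ts INTEGER, url TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?, ?)",
    [  # hypothetical log records
        ("10.0.0.1", 100, "/index", 200),
        ("10.0.0.1", 160, "/products", 200),
        ("10.0.0.2", 120, "/index", 200),
        ("10.0.0.2", 130, "/old-page", 404),
    ],
)

# Selective (in)visibility: analysts query a view that omits the client column.
conn.execute("CREATE VIEW analyst_log AS SELECT ts, url, status FROM access_log")

# Declarative query: say *what* is wanted (successful hits per page),
# not *how* to scan, group, or sort the log.
rows = conn.execute(
    """
    SELECT url, COUNT(*) AS hits
    FROM analyst_log
    WHERE status = 200
    GROUP BY url
    ORDER BY hits DESC, url
    """
).fetchall()
print(rows)  # [('/index', 2), ('/products', 1)]
```

The query engine is free to choose how to evaluate the request, which is exactly the separation of "what" from "how" that the text credits for the success of relational technology.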