Web Mining: Information and Pattern Discovery on the World Wide Web

traints. For more details on the WEBMINER system refer to [13, 36].

6 Research Directions

The techniques being applied to Web content mining draw heavily from the work on information retrieval, databases, intelligent agents, etc. Since most of these techniques are well known and reported elsewhere, we have focused on Web usage mining in this survey instead of Web content mining. In the following we provide some directions for future research.

6.1 Data Pre-Processing for Mining

Web usage data is collected in various ways, each mechanism collecting attributes relevant for its purpose. There is a need to pre-process the data to make it easier to mine for knowledge. Specifically, we believe that issues such as instrumentation and data collection, data integration, and transaction identification need to be addressed.

Clearly, improved data quality can improve the quality of any analysis on it. A problem in the Web domain is the inherent conflict between the analysis needs of the analysts (who want more detailed usage data collected) and the privacy needs of users (who want as little data collected as possible). This has led to the development of cookie files on one side and cache busting on the other. The emerging OPS standard on collecting profile data may be a compromise on what can and will be collected. However, it is not clear how much compliance with this standard can be expected. Hence, there will be a continual need to develop better instrumentation and data collection techniques, based on whatever is possible and allowable at any point in time.

Portions of Web usage data exist in sources as diverse as Web server logs, referral logs, registration files, and index server logs. Intelligent integration and correlation of information from these diverse sources can reveal usage information which may not be evident from any one of them. Techniques from data integration [33] should be examined for this purpose.
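As a rough illustration of correlating several usage sources, the following minimal sketch joins a server access log with a referral log and a registration file. The record layouts, field names, the sample values, and the `integrate` function are all assumptions made for this example; they are not taken from the survey or from any particular log format, and keying registration data on the host address is a deliberate simplification.

```python
# Hypothetical in-memory records; real logs would first need parsing.
access_log = [
    {"host": "10.0.0.1", "time": 100, "url": "/index.html"},
    {"host": "10.0.0.1", "time": 130, "url": "/products.html"},
    {"host": "10.0.0.2", "time": 140, "url": "/index.html"},
]
referral_log = [
    {"host": "10.0.0.1", "time": 130, "url": "/products.html",
     "referrer": "/index.html"},
]
# Assumption: registration data can be looked up by host address.
registration = {"10.0.0.1": {"user": "alice"}}


def integrate(access_log, referral_log, registration):
    """Attach referrer and registered-user fields to each access record."""
    # Index referral entries by the (host, time, url) triple they share
    # with the access log.
    referrers = {
        (r["host"], r["time"], r["url"]): r["referrer"] for r in referral_log
    }
    merged = []
    for record in access_log:
        key = (record["host"], record["time"], record["url"])
        row = dict(record)  # copy so the input records stay unchanged
        row["referrer"] = referrers.get(key)  # None when no referral entry
        row["user"] = registration.get(record["host"], {}).get("user")
        merged.append(row)
    return merged


merged = integrate(access_log, referral_log, registration)
```

Each merged record then carries information that no single source holds on its own, for example which registered user reached `/products.html` from `/index.html`.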
Web usage data collected in various logs is at a very fine granularity. Therefore, while it has the advantage of being extremely general and fairly detailed, it also has the corresponding drawback that it cannot be analyzed directly, since the analysis may start focusing on micro trends rather than on the macro trends. On the other hand, whether a trend is micro or macro depends on the purpose of a specific analysis. Hence, we believe there is a need to group individual data collection events into groups, called Web transactions [12], before feeding them to the mining system. While [11, 12, 36] have proposed techniques to do so, more attention needs to be given to this issue.

6.2 The Mining Process

The key component of Web mining is the mining process itself. As discussed in this paper, Web mining has adapted techniques from the fields of data mining, databases, and information retrieval, as well as developing some techniques of its own, e.g. path analysis. A lot of work still remains to be done in adapting known mining techniques as well as developing new ones. Web usage mining studies reported to date have mined for association rules, temporal sequences, clusters, and path expressio...
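The grouping of individual log events into Web transactions, raised in Section 6.1, can be sketched as follows. This is only an illustrative timeout heuristic chosen for the example, not the technique proposed in [11, 12, 36]; the function name `identify_transactions`, the 30-minute default, and the `(host, timestamp, url)` event layout are assumptions.

```python
from collections import defaultdict


def identify_transactions(events, timeout=1800):
    """Group (host, timestamp_seconds, url) events into transactions.

    A new transaction starts whenever the gap between two consecutive
    requests from the same host exceeds `timeout` seconds.
    """
    by_host = defaultdict(list)
    for host, timestamp, url in events:
        by_host[host].append((timestamp, url))

    transactions = []
    for host, visits in by_host.items():
        visits.sort()  # order each host's requests by time
        current = []
        last_timestamp = None
        for timestamp, url in visits:
            gap_too_long = (
                last_timestamp is not None
                and timestamp - last_timestamp > timeout
            )
            if gap_too_long:
                transactions.append((host, current))
                current = []
            current.append(url)
            last_timestamp = timestamp
        if current:
            transactions.append((host, current))
    return transactions


# Hypothetical events for illustration only.
events = [
    ("a", 0, "/x"),
    ("a", 100, "/y"),
    ("a", 5000, "/z"),
    ("b", 50, "/x"),
]
transactions = identify_transactions(events)
```

Varying `timeout` changes how coarse the resulting transactions are, which reflects the point above that whether a trend counts as micro or macro depends on the purpose of the analysis.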

This document was uploaded on 02/15/2014.