traints. For more details on the WEBMINER
system refer to [13, 36].

6 Research Directions
The techniques being applied to Web content mining draw heavily from the work on information retrieval, databases, intelligent agents, etc. Since most
of these techniques are well known and reported elsewhere, we have focused on Web usage mining in this
survey instead of Web content mining. In the following we provide some directions for future research.

6.1 Data Pre-Processing for Mining

Web usage data is collected in various ways, each
mechanism collecting attributes relevant for its purpose. There is a need to pre-process the data to make
it easier to mine for knowledge. Specifically, we believe that issues such as instrumentation and data collection, data integration, and transaction identification
need to be addressed.
Clearly, improved data quality can improve the
quality of any analysis on it. A problem in the Web
domain is the inherent conflict between the analysis
needs of the analysts (who want more detailed usage
data collected), and the privacy needs of users (who
want as little data collected as possible). This has
led to the development of cookie files on one side and
cache busting on the other. The emerging OPS standard on collecting profile data may be a compromise
on what can and will be collected. However, it is not
clear how much compliance with this can be expected.
Hence, there will be a continual need to develop better
instrumentation and data collection techniques, based on whatever is possible and allowable at any point in time.
Portions of Web usage data exist in sources as diverse as Web server logs, referral logs, registration
files, and index server logs. Intelligent integration and
correlation of information from these diverse sources
can reveal usage information which may not be evident from any one of them. Techniques from data
integration [33] should be examined for this purpose.
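As a rough illustration of what correlating these sources might involve, the following Python sketch attaches referrer and registration attributes to server access records. It is only a minimal sketch under stated assumptions: the record layouts, the field names (`ip`, `time`, `url`, `referrer`, `user`), the sample values, and the `integrate` helper are all hypothetical, and matching users by IP address is a simplification, not a technique taken from [33].

```python
# Hypothetical records; field names and values are illustrative assumptions.
access_log = [
    {"ip": "10.0.0.1", "time": 100, "url": "/index.html"},
    {"ip": "10.0.0.1", "time": 130, "url": "/products.html"},
    {"ip": "10.0.0.2", "time": 140, "url": "/index.html"},
]
referrer_log = [
    {"ip": "10.0.0.1", "time": 130, "url": "/products.html",
     "referrer": "/index.html"},
]
# Registration data keyed by IP only to keep the sketch short.
registration = {"10.0.0.1": {"user": "alice"}}


def integrate(access, referrers, registered):
    """Attach referrer and registration attributes to each access record."""
    # Index referrer entries by the fields they share with the access log.
    ref_index = {(r["ip"], r["time"], r["url"]): r["referrer"] for r in referrers}
    merged = []
    for rec in access:
        row = dict(rec)
        key = (rec["ip"], rec["time"], rec["url"])
        row["referrer"] = ref_index.get(key)  # None if no referrer entry matches
        row["user"] = registered.get(rec["ip"], {}).get("user")  # None if unregistered
        merged.append(row)
    return merged


integrated = integrate(access_log, referrer_log, registration)
```

In this sketch the second access record gains a referrer that the server log alone does not carry, which is the kind of information the paragraph above says may not be evident from any single source.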
Web usage data collected in various logs is at a very
fine granularity. Therefore, while it has the advantage
of being extremely general and fairly detailed, it also
has the corresponding drawback that it cannot be analyzed directly, since the analysis may start focusing
on micro trends rather than on the macro trends. On
the other hand, the issue of whether a trend is micro
or macro depends on the purpose of a specific analysis.
Hence, we believe there is a need to group individual
data collection events into groups, called Web transactions [12], before feeding them to the mining system.
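One simple way to picture such grouping is a per-user time gap: consecutive requests from the same user stay in one transaction until the gap between them exceeds a threshold. The sketch below assumes this gap-based rule purely for illustration. The function name, the `(user, timestamp, url)` event layout, and the 1800-second default are hypothetical choices, and this is not claimed to be the method of [12].

```python
def group_into_transactions(events, timeout=1800):
    """Group (user, timestamp, url) events into per-user transactions.

    A new transaction starts whenever the gap since the user's previous
    request exceeds `timeout` seconds (an assumed threshold).
    """
    transactions = []
    open_tx = {}  # user -> (last timestamp, url list of the open transaction)
    for user, ts, url in sorted(events, key=lambda e: (e[0], e[1])):
        last = open_tx.get(user)
        if last is None or ts - last[0] > timeout:
            urls = [url]  # start a new transaction for this user
            transactions.append((user, urls))
        else:
            urls = last[1]  # extend the user's open transaction
            urls.append(url)
        open_tx[user] = (ts, urls)
    return transactions


# Hypothetical click events: (user, seconds since start, requested URL).
events = [
    ("u1", 0, "/a"), ("u1", 60, "/b"), ("u2", 30, "/a"),
    ("u1", 4000, "/c"), ("u2", 90, "/d"),
]
transactions = group_into_transactions(events)
```

Changing `timeout` changes how coarse the resulting transactions are, which reflects the point above that whether a trend counts as micro or macro depends on the purpose of the analysis.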
While [11, 12, 36] have proposed techniques to do so,
more attention needs to be given to this issue.

6.2 The Mining Process

The key component of Web mining is the mining
process itself. As discussed in this paper, Web mining
has adapted techniques from the fields of data mining,
databases, and information retrieval, as well as developing some techniques of its own, e.g. path analysis. A
lot of work still remains to be done in adapting known
mining techniques as well as developing new ones.
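To make the idea of path analysis concrete, the sketch below counts how often each contiguous sequence of pages is traversed across a set of sessions. It is an assumed, simplified reading of path analysis for illustration only. The `frequent_paths` helper and the sample sessions are hypothetical and do not reproduce any particular system's algorithm.

```python
from collections import Counter


def frequent_paths(sessions, length=2):
    """Count contiguous page sequences of the given length across sessions."""
    counts = Counter()
    for pages in sessions:
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts


# Hypothetical sessions, each an ordered list of requested pages.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/specs"],
    ["/home", "/about"],
]
path_counts = frequent_paths(sessions)
top_path, top_count = path_counts.most_common(1)[0]
```

Here the most frequently traversed two-page path is `/home` followed by `/products`. Longer paths can be examined by raising `length`.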
Web usage mining studies reported to date have
mined for association rules, temporal sequences, clusters, and path expressio...