Web Mining: Information and Pattern Discovery on the World Wide Web

... [4]. For example, Syskill & Webert utilizes a user profile and learns to rate Web pages of interest using a Bayesian classifier.

2.1.2 Database Approach

Database approaches to Web mining have focused on techniques for organizing the semi-structured data on the Web into more structured collections of resources, and on using standard database querying mechanisms and data mining techniques to analyze it.

Multilevel Databases: The main idea behind this approach is that the lowest level of the database contains semi-structured information stored in various Web repositories, such as hypertext documents. At the higher level(s), metadata or generalizations are extracted from the lower levels and organized in structured collections, i.e., relational or object-oriented databases. For example, Han et al. [56] use a multilayered database in which each layer is obtained via generalization and transformation operations performed on the lower layers. Kholsa et al. [25] propose the creation and maintenance of meta-databases at each information-providing domain and the use of a global schema for the meta-database. King & Novak [26] propose the incremental integration of a portion of the schema from each information source, rather than relying on a global heterogeneous database schema. The ARANEUS system [40] extracts relevant information from hypertext documents and integrates it into higher-level derived Web Hypertexts, which are generalizations of the notion of database views.

Web Query Systems: Many Web-based query systems and languages utilize standard database query languages such as SQL, structural information about Web documents, and even natural language processing for the queries used in World Wide Web searches. W3QL [28] combines structure queries, based on the organization of hypertext documents, and content queries, based on information retrieval techniques.
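The idea of combining a structure query with a content query, as W3QL does, can be illustrated with a small sketch. This is not W3QL syntax. It assumes a hypothetical in-memory hypertext (`PAGES`, mapping each URL to its text and outgoing links) and hypothetical helper names. The structure part is a breadth-first traversal bounded by link depth; the content part is a plain case-insensitive substring test standing in for a real information retrieval model.

```python
from collections import deque

# Hypothetical in-memory hypertext: URL -> (page text, outgoing links).
PAGES = {
    "a.html": ("Web mining overview", ["b.html", "c.html"]),
    "b.html": ("Database approaches to web mining", ["d.html"]),
    "c.html": ("Travel photos", ["d.html"]),
    "d.html": ("Semi-structured data and database views", []),
}


def structure_query(pages, start, max_depth):
    """Structure part: pages reachable from `start` within `max_depth` link hops (BFS).

    Returns a dict mapping each reachable URL to its hop distance.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth == max_depth:
            continue  # do not follow links beyond the depth bound
        for link in pages.get(url, ("", []))[1]:
            if link in pages and link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen


def content_query(text, keywords):
    """Content part: true if every keyword occurs in the text (case-insensitive substring)."""
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)


def combined_query(pages, start, max_depth, keywords):
    """Pages that satisfy both the structure query and the content query."""
    reachable = structure_query(pages, start, max_depth)
    return sorted(url for url in reachable if content_query(pages[url][0], keywords))
```

For instance, `combined_query(PAGES, "a.html", 2, ["database"])` returns `["b.html", "d.html"]`: all four pages are reachable within two hops, but `a.html` and `c.html` fail the content test. Lowering the depth bound to 1 drops `d.html` from the result.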
WebLog [31], a logic-based query language for restructuring, extracts information from Web information sources. Lorel [46] and UnQL [7, 8] query heterogeneous and semi-structured information on the Web using a labeled graph data model. TSIMMIS [10] extracts data from heterogeneous and semi-structured information sources and correlates them to generate an integrated database representation of the extracted information.

2.2 Web Usage Mining

Web usage mining is the automatic discovery of user access patterns from Web servers. Organizations collect large volumes of data in their daily operations, generated automatically by Web servers and collected in server access logs. Other sources of user information include referrer logs, which contain information about the referring page for each page reference, and user registration or survey data gathered via CGI scripts. Analyzing such data can help organizations determine the lifetime value of customers, cross-marketing strategies across products, and the effectiveness of promotional campaigns, among other things. It can also provide information on how to restructure a Web site to create a more effective organizational presence, and shed light on more effective management of workgroup communication and organizational infras...
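A minimal sketch of the first steps of usage mining over a server access log, assuming entries in the Common Log Format. Treating each client host as one user and splitting its requests into sessions after 30 minutes of inactivity are simplifying assumptions: proxies and shared hosts break the first, and the timeout is a common heuristic rather than something the text prescribes. The "pattern" discovered here is only the frequency of consecutive page-to-page transitions; the function names and sample log lines are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "method url protocol" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')

# Hypothetical sample log entries.
LOG = [
    '10.0.0.1 - - [10/Oct/1997:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/1997:13:56:10 -0700] "GET /products.html HTTP/1.0" 200 1200',
    '10.0.0.2 - - [10/Oct/1997:14:01:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/1997:14:02:30 -0700] "GET /products.html HTTP/1.0" 200 1200',
    '10.0.0.2 - - [10/Oct/1997:14:03:00 -0700] "GET /missing.html HTTP/1.0" 404 0',
    '10.0.0.1 - - [10/Oct/1997:15:30:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
]


def parse(lines):
    """Yield (host, timestamp, url) for successful (2xx) GET requests."""
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed entries
        host, ts, method, url, status, _size = m.groups()
        if method != "GET" or not status.startswith("2"):
            continue
        yield host, datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z"), url


def sessions(records, timeout=timedelta(minutes=30)):
    """Group each host's requests into sessions, splitting on gaps longer than `timeout`."""
    by_host = defaultdict(list)
    for host, ts, url in records:
        by_host[host].append((ts, url))
    result = []
    for visits in by_host.values():
        visits.sort()  # chronological order within each host
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                result.append(current)  # inactivity gap closes the session
                current = []
            current.append(url)
        result.append(current)
    return result


def frequent_transitions(session_list, min_count=2):
    """Count consecutive page pairs (A, B) across sessions; keep pairs seen >= min_count times."""
    counts = Counter()
    for s in session_list:
        counts.update(zip(s, s[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

With the sample log, the 404 entry is discarded and host `10.0.0.1` is split into two sessions by the gap of more than 30 minutes, giving `['/index.html', '/products.html']`, `['/index.html']`, and `['/index.html', '/products.html']`. The only transition meeting `min_count=2` is `('/index.html', '/products.html')`. A real system would replace the transition count with a proper pattern-discovery step such as association rules or sequential patterns.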

This document was uploaded on 02/15/2014.
