Web Mining: Information and Pattern Discovery on the World Wide Web



This problem has been addressed in [11] and [12].

3.2 Discovery Techniques on Web Transactions

Once user transactions or sessions have been identified, there are several kinds of access pattern mining that can be performed depending on the needs of the analyst, such as path analysis, discovery of association rules and sequential patterns, and clustering and classification.

There are many different types of graphs that can be formed for performing path analysis, since a graph represents some relation defined on Web pages (or other objects). The most obvious is a graph representing the physical layout of a Web site, with Web pages as nodes and hypertext links between pages as directed edges. Other graphs could be formed based on the types of Web pages, with edges representing similarity between pages, or with edges that give the number of users that go from one page to another [43]. Most of the work to date involves determining frequent traversal patterns or large reference sequences from the physical layout type of graph.

Path analysis could be used to determine the most frequently visited paths in a Web site. Other examples of information that can be discovered through path analysis are:

- 70% of clients who accessed /company/product2 did so by starting at /company and proceeding through /company/new, /company/products, and /company/product1;
- 80% of clients who accessed the site started from /company/products; or
- 65% of clients left the site after four or fewer page references.

The first rule suggests that there is useful information in /company/product2, but since users tend to take a circuitous route to the page, it is not clearly marked. The second rule simply states that the majority of users are accessing the site through a page other than the main page (assumed to be /company in this example) and it might be a good idea to include directory type information on this page if it is not there already.
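As a rough illustration, the three kinds of path statistics listed above can each be computed by simple counting over sessions. The sketch below assumes each session is already available as an ordered list of requested pages; the sample sessions and the function names are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical sessions: the ordered pages requested by one client in one visit.
sessions = [
    ["/company", "/company/new", "/company/products",
     "/company/product1", "/company/product2"],
    ["/company/products", "/company/product1"],
    ["/company/products", "/company/product2"],
    ["/company/products"],
    ["/company", "/company/new"],
]


def entry_page_share(sessions):
    """Fraction of non-empty sessions that start at each page."""
    counts = Counter(s[0] for s in sessions if s)
    total = sum(counts.values())
    return {page: n / total for page, n in counts.items()}


def share_leaving_within(sessions, max_refs):
    """Fraction of sessions that contain at most max_refs page references."""
    return sum(1 for s in sessions if len(s) <= max_refs) / len(sessions)


def share_reaching_via(sessions, target, route):
    """Among sessions that reach target, the fraction whose pages before
    the first visit to target are exactly route."""
    reaching = [s for s in sessions if target in s]
    if not reaching:
        return 0.0
    via = sum(1 for s in reaching if s[: s.index(target)] == route)
    return via / len(reaching)


route = ["/company", "/company/new", "/company/products", "/company/product1"]
print(entry_page_share(sessions))
print(share_leaving_within(sessions, 4))
print(share_reaching_via(sessions, "/company/product2", route))
```

This only evaluates statistics for paths chosen in advance; finding all frequent traversal patterns, as the cited work does, requires a search over candidate paths.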
The last rule indicates an attrition rate for the site. Since many users don't browse further than four pages into the site, it would be prudent to ensure that important information is contained within four pages of the common site entry points.

Association rule discovery techniques [1, 48] are generally applied to databases of transactions where each transaction consists of a set of items. In such a framework the problem is to discover all associations and correlations among data items where the presence of one set of items in a transaction implies (with a certain degree of confidence) the presence of other items. In the context of Web usage mining, this problem amounts to discovering the correlations among references to various files available on the server by a given client. Each transaction comprises a set of URLs accessed by a client in one visit to the server. For example, using association rule discovery techniques we can find correlations such as the following:

- 40% of clients who accessed the Web page with URL /company/product1 also accessed /company/product2; or
- 30% of clients who accessed /company/special placed an online order in /company/product1...
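To make the percentages in such rules concrete, here is a minimal sketch of how the support and confidence of a single candidate rule could be measured, assuming each transaction is the set of URLs one client accessed in one visit. The sample visits and helper names are hypothetical; the sketch scores a rule that is already given and does not implement the discovery algorithms of [1, 48].

```python
# Hypothetical transactions: the set of URLs one client accessed in one visit.
visits = [
    {"/company/product1", "/company/product2"},
    {"/company/product1", "/company/product2", "/company/special"},
    {"/company/product1"},
    {"/company/product1", "/company/special"},
    {"/company/product1", "/company/new"},
    {"/company/special"},
]


def support(transactions, items):
    """Fraction of all transactions that contain every URL in items."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent URLs, the fraction
    that also contain the consequent URLs."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    if n_antecedent == 0:
        return 0.0
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_antecedent


# Rule: clients who accessed /company/product1 also accessed /company/product2.
print(confidence(visits, {"/company/product1"}, {"/company/product2"}))
print(support(visits, {"/company/product1", "/company/product2"}))
```

With this sample data, five visits include /company/product1 and two of those also include /company/product2, so the rule holds with 40% confidence.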