10.1.1.99.4245 - Web Usage Mining: Sequential Pattern...


Web Usage Mining: Sequential Pattern Extraction with a Very Low Support

F. Masseglia, D. Tanasa, and B. Trousse
INRIA Sophia Antipolis, 2004 route des lucioles - BP 93, 06902 Sophia Antipolis, France
{Florent.Masseglia, Doru.Tanasa, Brigitte.Trousse}@sophia.inria.fr

Abstract. The goal of this work is to increase the relevance and the interestingness of patterns discovered by a Web Usage Mining process. Indeed, the sequential patterns extracted from web log files, unless they are found under constraints, often lack interest because of their obvious content. Our goal is to discover minority user behaviors that have a coherence we want to be aware of (such as hacking activities on the Web site, or a user's activity limited to a specific part of the Web site). By means of a clustering method applied to the extracted sequential patterns, we propose a recursive division of the problem. The developed clustering method is based on pattern summaries and neural networks. Our experiments show that we obtain the targeted patterns, whereas their extraction by means of a classical process is impossible because of a very weak support (down to 0.006%). The diversity of user behaviors is so large that the minority ones are both numerous and difficult to locate.

Keywords: Web usage mining, sequential patterns, clustering, patterns summary, neural networks.

1 Introduction

Analyzing the behavior of a Web site's users, also known as Web Usage Mining, is a research field which consists in adapting data mining methods to the records of access log files. These files collect data such as the IP address of the connected host, the requested URL, the date and other information regarding the navigation of the user. Web Usage Mining techniques provide knowledge about the behavior of the users in order to extract relationships in the recorded data. Among the available techniques, sequential patterns are particularly well adapted to the study of logs.
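To make the notion of support concrete, the following is a minimal sketch of how the support of one sequential pattern could be measured over a set of user sessions. It is not the authors' method: it assumes that sessions have already been reconstructed from the log as ordered lists of requested URLs, and that a session supports a pattern when the pattern's pages appear in the same order, not necessarily adjacently. The function names and the example URLs are hypothetical.

```python
from typing import Sequence


def contains_in_order(session: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if every page of `pattern` occurs in `session` in the same order.

    Assumption: pages need not be adjacent, only ordered.
    """
    remaining = iter(session)
    # `page in remaining` advances the iterator, so each match must come
    # after the previous one.
    return all(page in remaining for page in pattern)


def support(sessions: Sequence[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sessions that contain `pattern` as an ordered subsequence."""
    if not sessions:
        return 0.0
    hits = sum(1 for session in sessions if contains_in_order(session, pattern))
    return hits / len(sessions)


# Hypothetical sessions, one list of requested URLs per user.
sessions = [
    ["/home", "/jobs", "/jobs/et", "/contact"],
    ["/home", "/research"],
    ["/home", "/news", "/jobs", "/jobs/et"],
    ["/jobs/et", "/home"],
]
pattern = ["/home", "/jobs", "/jobs/et"]

frequency = support(sessions, pattern)
# A pattern is called frequent when its support reaches a chosen minimum.
min_support = 0.25  # illustrative threshold only
is_frequent = frequency >= min_support
```

With a minimum support as low as the 0.006% mentioned in the abstract, a pattern followed by only a handful of users among very many sessions would still count as frequent, which is why the number of candidate patterns grows so quickly.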
Extracting sequential patterns from a log file is supposed to provide this kind of relationship: "On Inria's Web site, 10% of users visited consecutively the homepage, the available positions page, the ET¹ offers, the ET missions and finally the past ET competitive selection". This kind of behavior is just a supposition, because extracting sequential patterns from a log file also implies managing several problems:

- the cache (on the user's computer) and the proxies (which can be cache servers) can lower the number of records in the access log....

¹ ET: Engineers, Technicians

J.X. Yu, X. Lin, H. Lu, and Y. Zhang (Eds.): APWeb 2004, LNCS 3007, pp. 513–522, 2004. © Springer-Verlag Berlin Heidelberg 2004