Constructing Comprehensive Summaries of Large Event Sequences

Jerry Kiernan
IBM Almaden
San Jose, CA
firstname.lastname@example.org

Evimaria Terzi
IBM Almaden
San Jose, CA
email@example.com

ABSTRACT
Event sequences capture system and user activity over time. Prior research on sequence mining has mostly focused on discovering local patterns. Though interesting, these patterns reveal local associations and fail to give a comprehensive summary of the entire event sequence. Moreover, the number of patterns discovered can be large. In this paper, we take an alternative approach and build short summaries that describe the entire sequence, while revealing local associations among events.

We formally define the summarization problem as an optimization problem that balances between shortness of the summary and accuracy of the data description. We show that this problem can be solved optimally in polynomial time by using a combination of two dynamic-programming algorithms. We also explore more efficient greedy alternatives and demonstrate that they work well on large datasets. Experiments on both synthetic and real datasets illustrate that our algorithms are efficient and produce high-quality results, and reveal interesting local structures in the data.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data mining; I.5.3 [Pattern Recognition]: Clustering - Algorithms; E.4 [Coding and Information Theory]: Data compaction and compression

General Terms
Algorithms, Experimentation, Theory

Keywords
event sequences, summarization, log mining

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD'08, August 24-27, 2008, Las Vegas, Nevada, USA.
Copyright 2008 ACM 978-1-60558-193-4/08/08 ...$5.00.

1. INTRODUCTION
Monitoring of systems' and users' activities produces large event sequences, i.e., logs where each event has an associated time of occurrence. Network traffic data, alarms in telecommunication networks, and logging systems are examples of applications that produce large event sequences. Off-the-shelf data-mining methods for event sequences, though successful in finding recurring local structures, e.g., episodes, can prove inadequate to provide a global model of the data. Moreover, data-mining algorithms usually output too many patterns that may be overwhelming for the data analysts. In this paper, we bring up a new aspect of event sequence analysis, namely how to concisely summarize such event sequences....
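The abstract frames summarization as a trade-off between a short summary and an accurate description, solved by dynamic programming. The preview does not give the cost function, so the following is only a minimal sketch of that general idea under stated assumptions, not the authors' algorithm (which combines two dynamic programs). It assumes the timeline has already been reduced to per-interval event counts, measures accuracy by squared error around each segment's mean, and charges a constant `penalty` per segment. The names `segment_cost`, `summarize`, and `penalty` are hypothetical.

```python
from math import inf
from typing import List, Sequence, Tuple


def segment_cost(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Squared error of describing counts[i:j] by its mean (assumed accuracy term)."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def summarize(counts: Sequence[float], penalty: float) -> Tuple[float, List[Tuple[int, int]]]:
    """Split counts into contiguous segments minimizing error + penalty per segment.

    Returns (total cost, list of half-open segments (start, end)).
    Runs in O(n^2) time using prefix sums.
    """
    n = len(counts)
    prefix, prefix_sq = [0.0], [0.0]
    for c in counts:
        prefix.append(prefix[-1] + c)
        prefix_sq.append(prefix_sq[-1] + c * c)

    best = [0.0] + [inf] * n   # best[j]: minimal cost of summarizing counts[:j]
    back = [0] * (n + 1)       # back[j]: start index of the last segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(prefix, prefix_sq, i, j) + penalty
            if cost < best[j]:
                best[j], back[j] = cost, i

    # Walk the back-pointers to recover the chosen segments.
    segments: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    segments.reverse()
    return best[n], segments


if __name__ == "__main__":
    cost, segments = summarize([5, 5, 5, 0, 0, 0, 9, 9], penalty=1.0)
    print(cost, segments)
```

A small `penalty` favors many segments that fit the data closely, while a large one collapses the summary toward a single segment; this is the shortness-versus-accuracy balance the abstract describes, expressed with an assumed cost.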
This note was uploaded on 12/27/2011 for the course CMPSC 290a taught by Professor Vandam during the Fall '09 term at UCSB.