Space efficiency in Synopsis construction algorithms

Sudipto Guha*
Department of Computer and Information Sciences
University of Pennsylvania, Philadelphia PA 19104

* Supported in part by an Alfred P. Sloan Research Fellowship and by an NSF Award CCF-0430376. Email: [email protected]

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005

Abstract

Histograms and Wavelet synopses have been found to be useful in query optimization, approximate query answering and mining. Over the last few years several good synopsis algorithms have been proposed. These have mostly focused on the running time of the synopsis constructions, optimum or approximate, vis-a-vis their quality. However, the space complexity of synopsis construction algorithms has not been investigated as thoroughly. Many of the optimum synopsis construction algorithms (as well as a few of the approximate ones) are expensive in space. In this paper, we propose a general technique that reduces space complexity. We show that the notion of “working space” proposed in these contexts is redundant. We believe that our algorithm also generalizes to a broader range of dynamic programs beyond synopsis construction. Our modifications can be easily adapted to existing algorithms. We demonstrate the performance benefits through experiments on real-life and synthetic data.

1 Introduction

Wavelet and Histogram representations are important data analysis tools and have been used in image analysis and signal processing for a long time. Most applications of these techniques consider representing the input in terms of the broader characteristics of the data, referred to as a synopsis or signature. These synopses or signatures, typically constructed to minimize some desired error criterion, are used subsequently in a variety of ways. A few of the highlights include applications in OLAP/DSS systems by Haas et al., in approximate query answering by Amsaleg et al. and Acharya et al., and more recently in mining time series by Chakraborty et al.

Histograms were one of the earliest synopses used in the context of database query optimization [29, 25]. Since the introduction of serial histograms by Ioannidis, this area has been a focus of a significant body of research, e.g., [20, 28, 21, 11, 16] among many others. Matias, Vitter and Wang gave one of the first proposals for using Wavelet-based synopsis, and over the last few years this topic has also received significant attention from different groups of researchers [4, 12, 9, 8, 26]. Histograms and Wavelets are not the ...