Maximal Vector Computation in Large Data Sets

Parke Godfrey¹, Ryan Shipley², Jarek Gryz¹

¹ York University, Toronto, ON M3J 1P3, Canada, {godfrey, jarek}@cs.yorku.ca
² The College of William and Mary, Williamsburg, VA 23187-8795, USA

Part of this work was conducted at William & Mary, where Ryan Shipley was a student and Parke Godfrey was on faculty while on leave of absence from York.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005

Abstract

Finding the maximals in a collection of vectors is relevant to many applications. The maximal set is related to the convex hull (and hence to linear optimization) and to nearest neighbors. The maximal vector problem has resurfaced with the advent of skyline queries for relational databases and skyline algorithms that are external and relationally well behaved. The initial algorithms proposed for maximals are based on divide-and-conquer. These established good average-case and worst-case asymptotic running times, showing the problem to be $O(n)$ in the average case, where $n$ is the number of vectors. However, they are not amenable to externalizing. We prove, furthermore, that their performance is quite bad with respect to the dimensionality, $k$, of the problem. We demonstrate that the more recent external skyline algorithms are actually better behaved, although they do not have as good an apparent asymptotic complexity. We introduce a new external algorithm, LESS, that combines the best features of these, experimentally evaluate its effectiveness and improvement over the field, and prove its average-case running time is $O(kn)$.

1 Introduction

The maximal vector problem is to find the subset of the vectors such that each is not dominated by any of the vectors from the set. One vector dominates another if each of its components has an equal or higher value than the other vector's corresponding component, and it has a higher value on at least one of the corresponding components. One may equivalently consider points in a $k$-dimensional space instead of vectors. In this context, the maximals have also been called the admissible points, and the set of maximals called the Pareto set. This problem has been considered for many years, as identifying the maximal vectors, or admissible points, is useful in many applications. A number of algorithms have been proposed for efficiently finding the maximals. ...
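The dominance definition in the introduction translates directly into a componentwise test. The Python sketch below is illustrative only: it is not LESS or any other algorithm from the paper, just a naive in-memory filter that assumes larger component values are better and that all vectors fit in memory. The names `dominates` and `maximal_vectors` are hypothetical. It keeps a window of candidates that no vector seen so far dominates; because dominance is transitive, a new vector dominated by any earlier vector is also dominated by some vector still in the window.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u dominates v: u >= v on every component and u > v on at least one."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality k")
    strictly_better = False
    for a, b in zip(u, v):
        if a < b:
            return False  # u is worse on this component, so it cannot dominate v
        if a > b:
            strictly_better = True
    return strictly_better


def maximal_vectors(vectors: List[Vector]) -> List[Vector]:
    """Naive maximal-vector (Pareto set) filter; hypothetical helper, not from the paper.

    Invariant: `window` holds the maximals of the vectors processed so far.
    Worst case O(k * n^2) work, e.g. when every vector is maximal.
    """
    window: List[Vector] = []
    for v in vectors:
        if any(dominates(w, v) for w in window):
            continue  # v is dominated by a current candidate, so it is not maximal
        # v survives: evict the candidates it dominates, then keep it
        window = [w for w in window if not dominates(v, w)]
        window.append(v)
    return window


if __name__ == "__main__":
    points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (4, 1), (3, 4)]
    print(maximal_vectors(points))  # prints [(1, 5), (4, 1), (3, 4)]
```

Here (3, 4) evicts (2, 4) and (3, 3) when it arrives, while (1, 5) and (4, 1) survive because no other point is at least as large on both components. The quadratic worst case of this kind of approach is one reason the choice of algorithm matters for large data sets.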