External Memory Algorithms and Data Structures: Dealing with Massive Data

JEFFREY SCOTT VITTER
Duke University

Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory. For the batched problem of sorting and related problems such as permuting and the fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, …

Categories and Subject Descriptors: B.4.3 [Input/Output and Data Communications]: Interconnections (parallel I/O); E.1 [Data Structures]: graphs and networks, trees; E.5 [Files]: sorting/searching; F.1.1 [Computation by Abstract Devices]: Models of Computation (bounded action devices, relations between models); F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems (computations on discrete structures, geometrical problems and computations, sorting and searching); H.2.2 [Database Management]: Physical Design (access methods); H.2.8 [Database Management]: Database Applications (spatial databases and GIS); H.3.2 [Information Storage and Retrieval]: Information Storage (file organization); H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval (information filtering, search process)

General Terms: Algorithms, Design, Experimentation, Performance, Theory

Additional Key Words and Phrases: Batched, block, B-tree, disk, dynamic, extendible hashing, external memory, hierarchical memory, I/O, multidimensional access methods, multilevel memory, online, out-of-core, secondary storage, sorting

This work was supported in part by Army Research Office MURI grant DAAH04-96-1-0013 and by National Science Foundation research grants CCR-9522047, EIA-9870734, and CCR-9877133. Part of this work was done at BRICS, University of Aarhus, Aarhus, Denmark, and at INRIA, Sophia Antipolis, France. Earlier, shorter versions of some of this material appeared in Vitter [1998; 1999a; 1999b; 1999c].

Author's address: Department of Computer Science, Levine Science Research Center, Duke University, Box 90129, Durham, NC 27708-0129; e-mail: email@example.com; URL: http://www.cs.duke.edu/jsv....
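The merging paradigm the abstract names for sorting can be illustrated with a minimal, hypothetical simulation of external merge sort on a single disk. It assumes the conventional parameters N (number of items), M (items that fit in internal memory) and B (items per block), and it charges one I/O for every block read or written. The class BlockDisk and the function external_merge_sort are illustrative names, not taken from the article. The sketch first forms sorted runs of M items, then repeatedly merges up to M/B - 1 runs at a time (one input block buffer per run plus one output block).

```python
import heapq


class BlockDisk:
    """Hypothetical simulated disk: data lives in blocks of B items and
    every block read or written counts as one I/O."""

    def __init__(self, B):
        self.B = B
        self.io_count = 0

    def write_run(self, items):
        """Split a sorted list into blocks, charging one I/O per block."""
        blocks = [items[i:i + self.B] for i in range(0, len(items), self.B)]
        self.io_count += len(blocks)
        return blocks

    def read_run(self, blocks):
        """Stream items block by block, charging one I/O per block read."""
        for block in blocks:
            self.io_count += 1
            yield from block


def external_merge_sort(data, M, B):
    """Sort `data` with M items of internal memory and blocks of B items.

    Returns (sorted_list, number_of_block_IOs)."""
    disk = BlockDisk(B)
    # Assumption: the input already resides on disk as blocks (placing it there is free).
    input_blocks = [data[i:i + B] for i in range(0, len(data), B)]

    # Pass 0 (run formation): fill memory with M items, sort them, write a run.
    runs, buffer = [], []
    for item in disk.read_run(input_blocks):
        buffer.append(item)
        if len(buffer) == M:
            runs.append(disk.write_run(sorted(buffer)))
            buffer = []
    if buffer:
        runs.append(disk.write_run(sorted(buffer)))

    # Merge passes: one buffer block per input run plus one output block gives
    # a fan-in of M/B - 1 runs (at least 2, so every pass makes progress).
    fan_in = max(2, M // B - 1)
    while len(runs) > 1:
        next_runs = []
        for g in range(0, len(runs), fan_in):
            streams = [disk.read_run(run) for run in runs[g:g + fan_in]]
            # Simplification: the merged run is collected in a Python list before
            # being written; a real EM merge would flush each output block as it fills.
            next_runs.append(disk.write_run(list(heapq.merge(*streams))))
        runs = next_runs

    # The final run is flattened for the caller without charging another read.
    result = [x for block in runs[0] for x in block] if runs else []
    return result, disk.io_count
```

With N = 1000, M = 100 and B = 10, the sketch forms 10 runs and needs two merge passes at fan-in 9, so it makes 3 passes, each reading and writing all 100 blocks, for 600 I/Os in total. Because every pass touches every block, the cost is governed by the number of passes, about 1 + ⌈log_{M/B-1}(N/M)⌉, which has the same shape as the standard single-disk sorting bound of Θ((N/B) log_{M/B}(N/B)) I/Os.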