Chapter 13- External Sorting (9-23)

Chapter 13- External Sorting (9-23) - External Sorting...

Info iconThis preview shows pages 1–5. Sign up to view the full content.

View Full Document Right Arrow Icon
External Sorting Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? Used for eliminating duplicates Select DISTINCT Bulk loading B+ trees Need to first sort leaf level pages Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2 Data requested in sorted order SELECT S.name FROM Sailor S ORDER BY S.age Some join algorithms use sorting Sort -merge join Sorting: Main Challenge Sort 1 TB of data with 1 GB of RAM Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3 Why not just use QuickSort? (i.e., simply map disk pages to virtual memory)
Background image of page 1

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
2-Way External Merge Sort Phase 1: Read a page at a time, sort it, write it Only one buffer page used Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4 Main memory buffers Disk Disk 1 Page Two-Way External Merge Sort: Phase 1 Input file 1-page runs PHASE 1 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 5,6 2,6 4,9 7,8 1,3 2 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5 Assume input file with N data pages What is the cost of Phase 1 (in terms of # I/Os)? 2-Way External Merge Sort Phase 2: Make multiple passes to merge runs Pass 1: Merge two runs of length 1 (page) Pass 2: Merge two runs of length 2 (pages) until 1 run of length N Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6 … until 1 run of length N Three buffer pages used Main memory buffers INPUT 1 INPUT 2 OUTPUT Disk Disk
Background image of page 2
2-Way External Merge Sort Input file 1-page runs 2-page runs PHASE 1 PASS 1 PASS 2 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 2,6 4,9 7,8 2 2,3 4,6 4,7 8,9 5,6 1,3 1,3 5,6 2 2,3 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 4-page runs 8-page runs PASS 3 4,4 6,7 8,9 1,2 3,5 6 9 1,2 2,3 3,4 4,5 6,6 7,8 PHASE 2 2-Way External Merge Sort: Analysis Total I/O cost for sorting file with N pages Cost of Phase 1 = N 2 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8  N 2 log Number of passes in Phase 2 = Cost of each pass in Phase 2 = N 2 Cost of Phase 2 = N N 2 log 2  21 2 NN log Total cost = General External Merge Sort: Motivation 2-Way merge sort uses at most 3 buffer pages What if more buffer pages were available? Can we use these extra buffer pages to reduce Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9 Can we use these extra buffer pages to reduce sorting cost? Specifically, how would Phases 1 and 2 change?
Background image of page 3

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
2-Way External Merge Sort Phase 1: Read a page at a time, sort it, write it Only one buffer page used How can this be modified if B buffer pages are available? Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10 Main memory buffers Disk Disk 1 Page General External Merge Sort Phase 1: Read B pages at a time, sort B pages in main memory, and write out B pages Length of each run = B pages Assuming N input pages, number of runs = N/B Cost of Phase 1 = N 2 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11 Cost of Phase 1 = Main memory buffers Disk Disk B Pages General External Merge Sort: Phase 1
Background image of page 4
Image of page 5
This is the end of the preview. Sign up to access the rest of the document.

This note was uploaded on 02/12/2012 for the course CS 4320 taught by Professor Koch during the Fall '07 term at Cornell University (Engineering School).

Page1 / 16

Chapter 13- External Sorting (9-23) - External Sorting...

This preview shows document pages 1 - 5. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online