√2011-09-23-ExternalSorting

√2011-09-23-ExternalSorting - External...

Info iconThis preview shows pages 1–5. Sign up to view the full content.

View Full Document Right Arrow Icon
External Sorting Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Why Sort? Used for eliminating duplicates Select DISTINCT Bulk loading B+ trees Need to first sort leaf level pages Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2 Data requested in sorted order SELECT S.name FROM Sailor S ORDER BY S.age Some join algorithms use sorting Sort -merge join Sorting: Main Challenge Sort 1 TB of data with 1 GB of RAM Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3 Why not just use QuickSort? (i.e., simply map disk pages to virtual memory)
Background image of page 1

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
2-Way External Merge Sort Phase 1: Read a page at a time, sort it, write it Only one buffer page used Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4 Main memory buffers Disk Disk 1 Page Two-Way External Merge Sort: Phase 1 Input file 1-page runs PHASE 1 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 5,6 2,6 4,9 7,8 1,3 2 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5 Assume input file with N data pages What is the cost of Phase 1 (in terms of # I/Os)? Answer: N 2-Way External Merge Sort Phase 2: Make multiple passes to merge runs Pass 1: Merge two runs of length 1 (page) Pass 2: Merge two runs of length 2 (pages) until 1 run of length N Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6 … until 1 run of length N Three buffer pages used Main memory buffers INPUT 1 INPUT 2 OUTPUT Disk Disk
Background image of page 2
2-Way External Merge Sort Input file 1-page runs 2-page runs PHASE 1 PASS 1 PASS 2 3,4 6,2 9,4 8,7 5,6 3,1 2 3,4 2,6 4,9 7,8 2 2,3 4,6 4,7 8,9 5,6 1,3 1,3 5,6 2 2,3 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 4-page runs 8-page runs PASS 3 4,4 6,7 8,9 1,2 3,5 6 9 1,2 2,3 3,4 4,5 6,6 7,8 PHASE 2 2-Way External Merge Sort: Analysis Total I/O cost for sorting file with N pages Cost of Phase 1 = N 2 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8  N 2 log Number of passes in Phase 2 = Cost of each pass in Phase 2 = N 2 Cost of Phase 2 = N N 2 log 2  21 2 NN log Total cost = General External Merge Sort: Motivation 2-Way merge sort uses at most 3 buffer pages What if more buffer pages were available? Can we use these extra buffer pages to reduce Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9 Can we use these extra buffer pages to reduce sorting cost? Specifically, how would Phases 1 and 2 change?
Background image of page 3

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
2-Way External Merge Sort Phase 1: Read a page at a time, sort it, write it Only one buffer page used How can this be modified if B buffer pages are available? Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10 Main memory buffers Disk Disk 1 Page General External Merge Sort Phase 1: Read B pages at a time, sort B pages in main memory, and write out B pages Length of each run = B pages Assuming N input pages, number of runs = N/B Cost of Phase 1 = N 2 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11 Cost of Phase 1 = Main memory buffers Disk Disk B Pages General External Merge Sort: Phase 1
Background image of page 4
Image of page 5
This is the end of the preview. Sign up to access the rest of the document.

This note was uploaded on 12/07/2011 for the course CS 4410 taught by Professor Vollset during the Spring '07 term at Cornell.

Page1 / 16

√2011-09-23-ExternalSorting - External...

This preview shows document pages 1 - 5. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online