Lecture17 (1)

Advanced Database Systems: DBS CB, 2nd Edition
Advanced Topics of Interest: “MapReduce and SQL” & “SSD and DB”

Outline
- MapReduce and SQL
- SSD and SQL

MapReduce and SQL

Introduction
It is all about divide and conquer.
[Figure: the “Work” is partitioned into units w1, w2, w3; each unit goes to a “worker”, which produces a partial result r1, r2, r3; the partial results are combined into the “Result”.]

Introduction
Different workers:
- Different threads in the same core
- Different cores in the same CPU
- Different CPUs in a multi-processor system
- Different machines in a distributed system
Parallelization problems:
- How do we assign work units to workers?
- What if we have more work units than workers?
- What if workers need to share partial results?
- How do we aggregate partial results?
- How do we know all the workers have finished?
- What if workers die?
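The partition / worker / combine picture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the workers are threads in one process, the "work" is a list of numbers, and the per-unit job (summing squares) is made up for the example. The names `partition`, `worker`, `combine`, and `run` are hypothetical, not from the lecture.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n_units):
    """Split the work into at most n_units contiguous slices (w1, w2, ...)."""
    if not work:
        return []
    size = (len(work) + n_units - 1) // n_units  # ceiling division
    return [work[i:i + size] for i in range(0, len(work), size)]


def worker(unit):
    """Process one work unit and return its partial result (r1, r2, ...).

    The job itself (sum of squares) is an arbitrary stand-in.
    """
    return sum(x * x for x in unit)


def combine(partials):
    """Aggregate the partial results into the final result."""
    return sum(partials)


def run(work, n_workers=3):
    units = partition(work, n_workers)
    # Leaving the `with` block waits for every worker to finish,
    # which answers "how do we know all the workers have finished?"
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, units))
    return combine(partials)
```

For example, `run(list(range(10)))` splits the ten numbers into three units and returns 285, the same value as computing the sum of squares sequentially. The sketch does not address worker failure.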

Introduction
General themes:
- Parallelization problems arise from:
  - Communication between workers
  - Access to shared resources (e.g., data)
- Thus, we need a synchronization system!
- This is tricky:
  - Finding bugs is hard
  - Solving bugs is even harder
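As a small illustration of synchronizing access to a shared resource, the sketch below guards a shared counter with a lock. It assumes thread-based workers in a single process; `SharedCounter` and `count_in_parallel` are hypothetical names chosen for this example.

```python
import threading


class SharedCounter:
    """A shared resource whose updates are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one worker at a time may run the read-modify-write step.
        with self._lock:
            self.value += 1


def count_in_parallel(n_workers=4, increments_per_worker=10_000):
    counter = SharedCounter()

    def work():
        for _ in range(increments_per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
    return counter.value
```

With the lock in place the result is always `n_workers * increments_per_worker`. Without it, the read-modify-write in `increment` can interleave across threads and lose updates, which is the kind of bug that is hard to find because it does not show up on every run.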
Introduction
Patterns for parallelism:
- Master/Workers
- Producer/Consumer Flow
- Work Queues
[Figure: diagrams labeled “master” and “workers”, producers (P) and consumers (C), and a “shared queue”.]
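A minimal sketch of the master/workers pattern built on a shared work queue, assuming threads as workers and a made-up task (squaring a number). The function name `master` and the use of `None` as a stop signal are choices made for this example, so tasks themselves must not be `None`.

```python
import queue
import threading


def master(tasks, n_workers=3):
    """Hand out tasks through a shared queue and collect the results."""
    work_queue = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = work_queue.get()
            if item is None:  # stop signal from the master
                break
            results.put((item, item * item))  # the "work": square the input

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # There may be more tasks than workers; each worker simply pulls
    # the next task whenever it becomes free.
    for task in tasks:
        work_queue.put(task)
    for _ in threads:
        work_queue.put(None)  # one stop signal per worker
    for t in threads:
        t.join()

    collected = {}
    while not results.empty():
        key, value = results.get()
        collected[key] = value
    return collected
```

For example, `master([1, 2, 3, 4])` returns `{1: 1, 2: 4, 3: 9, 4: 16}` regardless of which worker handled which task. The queue decouples the master from the workers, so the assignment of work units to workers does not have to be decided in advance.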

Introduction: Evolution
- Functional Programming
- MapReduce
- Google File System (GFS)

Introduction
Functional programming:
- MapReduce = functional programming meets distributed processing on steroids
  - Not a new idea: it dates back to the 50’s (or even 30’s)
- What is functional programming?
  - Computation as application of functions
  - Theoretical foundation provided by lambda calculus
- How is it different?
  - Traditional notions of “data” and “instructions” are not applicable
  - Data flows are implicit in the program
  - Different orders of execution are possible
- Exemplified by LISP and ML

Introduction: Lisp → MapReduce?
What does this have to do with MapReduce?
- After all, Lisp is about processing lists
- Two important concepts in functional programming:
  - Map: do something to everything in a list
  - Fold: combine results of a list in some way

Introduction: Map
- Map is a higher-order function
- How map works:
  - Function is applied to every element in a list
  - Result is a new list
[Figure: the function f applied to each of five list elements.]
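To make the description of map concrete, here is a hand-written version in Python. The name `my_map` is hypothetical; Python's built-in `map` does the same job lazily.

```python
def my_map(f, items):
    """Apply f to every element of items and return the results as a new list."""
    return [f(x) for x in items]


numbers = [1, 2, 3, 4, 5]
squares = my_map(lambda x: x * x, numbers)
# squares is [1, 4, 9, 16, 25]; numbers itself is left unchanged
```

Each application of `f` depends only on its own element, so the calls could run in any order or at the same time.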

Introduction: Fold
- Fold is also a higher-order function
- How fold works:
  - Accumulator set to initial value
  - Function applied to list element and the accumulator
  - Result stored in the accumulator
  - Repeated for every item in the list
  - Result is the final value in the accumulator
[Figure: f applied at each list element, carrying the accumulator from the initial value to the final value.]
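The steps of fold listed above translate almost line for line into Python. The name `fold` and its argument order (function, initial value, list) are choices made for this sketch; the standard library offers the same operation as `functools.reduce`.

```python
from functools import reduce


def fold(f, initial, items):
    """Left fold: thread an accumulator through items using f."""
    acc = initial          # accumulator set to the initial value
    for x in items:        # repeated for every item in the list
        acc = f(acc, x)    # f applied to the accumulator and the element
    return acc             # the final value in the accumulator


total = fold(lambda acc, x: acc + x, 0, [1, 2, 3, 4, 5])
# total is 15, the same as reduce(lambda acc, x: acc + x, [1, 2, 3, 4, 5], 0)
```

Unlike map, each step of fold needs the accumulator produced by the previous step, so the steps of a single fold cannot be run independently of one another.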
Lisp → MapReduce
Let’s assume a long list of records: imagine if...
- We can distribute the execution of map operations to multiple [...]

This note was uploaded on 01/21/2012 for the course CS 610 taught by Professor Don'tknow during the Spring '11 term at Santa Clara.
