CS411 - MapReduce - Note 1 - 2



... this communication issue.

As programmers, we need to:
1. Prepare the data.
2. Provide a map function.
3. Provide a reduce function.

How do we benefit from this framework? It immediately makes the cluster of nodes available to us. It takes care of scheduling and makes it transparent to us.

MapReduce Architecture

Running example, two input documents:
  Document 1: A bad beginning makes a bad ending
  Document 2: I have a bad news

First phase: the framework partitions the documents.

(0) The framework shards the input file. Each shard is typically 16~64 MB in size.

(1) The user program creates processes on the master and worker threads.
    What a user of this framework needs to do:
    - Prepare the data, the map function, and the reduce function.

(2) The master picks idle workers and assigns them map or reduce tasks.

Second phase: map. Documents are fed into workers to map, which output key-value pairs.

(3) Each map worker reads its assigned input shard and outputs <key, value> pairs.
    Document 1: <bad, 2> <a, 1> <ending, 1> ...
    Document 2: <bad, 1> <a, 1> <news, 1> ...

(4) The intermediate <key, value> pairs are written to local disk.

Third phase: reduce. Pairs with the same key go to the same reducer.

(5) Each reduce worker reads the intermediate data and sorts it by key.
    One reducer:     <bad, (2, 1)> <ending, 1> ...
    Another reducer: <a, (1, 1)> <news, 1> ...

(6) The reduce workers write the result.
    One reducer:     <bad, 3> <ending, 1> ...
    Another reducer: <a, 2> <news, 1> ...
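The steps above (shard, map, write intermediate pairs, group by key, reduce) can be sketched as a single-process word count in Python. This is only an illustration under stated assumptions, not the API of any real MapReduce framework: the names shard, map_fn, partition, shuffle, reduce_fn and run_word_count are hypothetical, "shards" are whole documents rather than 16~64 MB file chunks, and words are split on whitespace and counted case-sensitively (so "A" and "a" are different keys, which matches the <a, 1> pair shown for the first document).

```python
from collections import Counter, defaultdict


def shard(documents, num_shards):
    """Step 0 (simplified): spread the documents over shards, round-robin."""
    shards = [[] for _ in range(num_shards)]
    for i, doc in enumerate(documents):
        shards[i % num_shards].append(doc)
    return shards


def map_fn(document):
    """Step 3: emit one <word, count> pair per distinct word in a document."""
    return list(Counter(document.split()).items())


def partition(key, num_reducers):
    """Pick a reducer for a key; the same key always maps to the same reducer.

    Assumption: a simple byte sum stands in for a real hash function
    (Python's built-in str hash is salted per process).
    """
    return sum(key.encode("utf-8")) % num_reducers


def shuffle(mapped_pairs, num_reducers):
    """Step 5: group intermediate values by key, one sorted bucket per reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return [dict(sorted(b.items(), key=lambda kv: kv[0])) for b in buckets]


def reduce_fn(key, values):
    """Step 6: combine all values that share one key."""
    return key, sum(values)


def run_word_count(documents, num_shards=2, num_reducers=2):
    # Map phase: every shard is processed independently.
    intermediate = []
    for docs_in_shard in shard(documents, num_shards):
        for doc in docs_in_shard:
            intermediate.extend(map_fn(doc))
    # Reduce phase: each bucket holds all values for the keys it owns.
    results = {}
    for bucket in shuffle(intermediate, num_reducers):
        for key, values in bucket.items():
            word, total = reduce_fn(key, values)
            results[word] = total
    return results


docs = ["A bad beginning makes a bad ending", "I have a bad news"]
counts = run_word_count(docs)
print(counts["bad"], counts["a"], counts["ending"], counts["news"])  # 3 2 1 1
```

In a real deployment the map and reduce calls would run on different workers and the intermediate pairs would go through local disk; here everything runs in one process so that the data flow is easy to follow.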
Back to Parallel Databases

Scenarios where MapReduce outperforms parallel databases. How does the framework help us in these scenarios?

• Scenario 1: Semi-structured data
  • The data model of MapReduce uses “key-value pair” data.
• Scenarios 2 & 3: ETL tasks and data mining applications
  • Fast data loading time.
  • Flexible user-defined map() and reduce() functions in MapReduce.

1. Data is not as rigorous as tables. Example: for image, key could...