Big Data Management and Analytics CSCI 436/636 Lecture 6
Outline • Big data processing – Processing Pipeline – Processing System
1 Big Data Processing Pipeline • A series of steps in which the output of each step is the input of the next step • Hadoop MapReduce pipeline – Split -> Map -> Shuffle and Sort -> Reduce – WordCount example [Figure: File 1, File 2, …, File N -> WordCount -> Result File]
1 Big Data Processing Pipeline • Hadoop MapReduce pipeline – WordCount example • Split/partition files into HDFS • Map, Shuffle and Sort, Reduce
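The four WordCount stages can be simulated on a single machine to show how each stage's output feeds the next. This is a minimal sketch, not Hadoop code: the function names are hypothetical, and `split_input` cuts the text into contiguous line chunks as a stand-in for HDFS blocks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def split_input(text: str, num_splits: int) -> List[List[str]]:
    """Split: cut the input lines into contiguous chunks (a stand-in for HDFS blocks)."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def map_chunk(chunk: List[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1

def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Shuffle and sort: gather all values for the same key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return sorted(groups.items())

def reduce_groups(groups: List[Tuple[str, List[int]]]) -> Dict[str, int]:
    """Reduce: sum the 1s collected for each word."""
    return {word: sum(ones) for word, ones in groups}

def word_count(text: str, num_splits: int = 2) -> Dict[str, int]:
    """Run split -> map -> shuffle and sort -> reduce in order."""
    mapped = [pair for chunk in split_input(text, num_splits) for pair in map_chunk(chunk)]
    return reduce_groups(shuffle_and_sort(mapped))

if __name__ == "__main__":
    files = "my apple is red\nmy rose is blue\nyou are the apple of my eye"
    print(word_count(files))
```

In a real cluster each map call runs on a different node and the shuffle moves data across the network; here the stages only run in sequence so the data flow is visible.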
1 Big Data Processing Pipeline • General big data programming model pipeline – Split -> Apply (do something) -> Combine (merge) – Batch processing • Collect data -> Clean data -> Feed in chunks (split) -> Wait (do something and merge) -> Act – Stream processing • Instantly capture stream data -> Feed to machines in real time -> Process in real time -> Act
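The contrast between batch and stream processing can be sketched with the same counting task. In this illustrative example (function names and the click/view events are hypothetical), the batch version waits for all records, splits them into chunks, applies a count to each chunk and combines the partial counts; the stream version updates its state per event and emits a result at once.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

def clean(records: Iterable[str]) -> List[str]:
    """Clean: drop blank records and normalize case."""
    return [r.strip().lower() for r in records if r.strip()]

def batch_count(records: Iterable[str], chunk_size: int = 2) -> Dict[str, int]:
    """Batch: collect everything first, then split -> apply -> combine."""
    data = clean(records)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]  # split
    partials = [Counter(chunk) for chunk in chunks]                             # apply
    return dict(sum(partials, Counter()))                                       # combine

def stream_count(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stream: update the running state per event and emit a result immediately."""
    state: Counter = Counter()
    for event in events:
        key = event.strip().lower()
        if not key:  # same cleaning rule, applied one event at a time
            continue
        state[key] += 1
        yield key, state[key]

if __name__ == "__main__":
    events = ["click", "view", "", "Click", "view"]
    print(batch_count(events))
    for update in stream_count(events):
        print(update)
```

The batch result is only available after the whole input has been read, while the stream version can act on each update as it is produced.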
1 Big Data Processing Pipeline • Big data processing pipeline examples [Figure: example big data pipeline] Source: l hare.net/ThoughtWorks/big-data-pipeline-with-scala
1 Big Data Processing Pipeline • Big data processing pipeline examples [Figures: example pipelines] Sources: …pipeline - apache-flink; http:// / BigDataCloud /big-data-analytics-with- google - platform
1.1 Big Data Processing Pipeline • Common data transformations within a big data pipeline – Map (one-to-one mapping): apply the same operation to each member of a collection • e.g., curving every student's grade, for instance increasing it by 5% – Reduce: perform a summary operation over a collection (e.g., counting the number of students in each queue, yielding name frequencies)
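Both transformations map directly onto Python's built-in `map` and `functools.reduce`. In this sketch the helper names are hypothetical, and "increase by 5%" is assumed to mean multiplying each grade by 1.05 (rather than adding 5 points).

```python
from functools import reduce
from typing import Dict, List

def curve(grades: Dict[str, float], pct: float = 5.0) -> Dict[str, float]:
    """Map: apply the same operation (raise by pct percent) to every grade."""
    return dict(map(lambda kv: (kv[0], round(kv[1] * (1 + pct / 100), 2)), grades.items()))

def name_frequencies(queue: List[str]) -> Dict[str, int]:
    """Reduce: fold the whole queue into one summary, a count per name."""
    def tally(counts: Dict[str, int], name: str) -> Dict[str, int]:
        counts[name] = counts.get(name, 0) + 1
        return counts
    return reduce(tally, queue, {})

if __name__ == "__main__":
    print(curve({"ana": 80.0, "ben": 90.0}))
    print(name_frequencies(["ana", "ben", "ana"]))
```

Map produces one output per input element, so it parallelizes trivially; reduce combines many elements into one result, which is why it needs the grouping done by the shuffle step in MapReduce.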
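1.1 Big Data Processing Pipeline • Common data transformations within a big data pipeline – Cross/Cartesian product • Multiplication: pair every element of one input with every element of the other – Match/Join • Selective multiplication: keep only the pairs whose keys match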
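The "multiplication" and "selective multiplication" views can be made concrete with `itertools.product`. This sketch (hypothetical student and enrollment data) shows a join written literally as a filtered cross product, next to a hash join that gives the same pairs without building the full product.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple

def cross(left: List, right: List) -> List[Tuple]:
    """Cartesian product: every left element paired with every right element."""
    return list(product(left, right))

def nested_loop_join(left: List, right: List, key_l: Callable, key_r: Callable) -> List[Tuple]:
    """Join as selective multiplication: keep only the cross pairs whose keys match."""
    return [(l, r) for l, r in product(left, right) if key_l(l) == key_r(r)]

def hash_join(left: List, right: List, key_l: Callable, key_r: Callable) -> List[Tuple]:
    """Same result, but index the right input by key instead of forming every pair."""
    index: Dict = defaultdict(list)
    for r in right:
        index[key_r(r)].append(r)
    return [(l, r) for l in left for r in index.get(key_l(l), [])]

if __name__ == "__main__":
    students = [("s1", "Ana"), ("s2", "Ben")]
    enrollments = [("s1", "DB"), ("s2", "DB"), ("s2", "ML")]
    print(len(cross(students, enrollments)))  # 2 x 3 pairs
    print(hash_join(students, enrollments, lambda s: s[0], lambda e: e[0]))
```

The cross product grows as |left| x |right|, which is why distributed systems typically avoid materializing it when a join key is available.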
1.1 Big Data Processing Pipeline • Common data transformations within a big data pipeline – Co-Group: binary operation (two inputs) • Group both inputs on a key • Process the groups with matching keys from both inputs – Filter: select the elements that match a criterion
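A co-group differs from a join in that the processing function sees the whole group from each input at once, not individual pairs. In this sketch the order/payment data and function names are hypothetical, and keys that appear in only one input are assumed to get an empty group on the other side. A filter then keeps only the customers that still owe money.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def co_group(left: List, right: List, key_l: Callable, key_r: Callable) -> Dict:
    """Group both inputs on a key; each key maps to (left group, right group)."""
    groups: Dict = defaultdict(lambda: ([], []))
    for item in left:
        groups[key_l(item)][0].append(item)
    for item in right:
        groups[key_r(item)][1].append(item)
    return dict(groups)

def outstanding_balances(orders: List[Tuple[str, int]],
                         payments: List[Tuple[str, int]]) -> Dict[str, int]:
    grouped = co_group(orders, payments, lambda o: o[0], lambda p: p[0])
    # Process each pair of groups: total ordered minus total paid per customer.
    balances = {cust: sum(o[1] for o in ords) - sum(p[1] for p in pays)
                for cust, (ords, pays) in grouped.items()}
    # Filter: keep only elements matching the criterion (a balance is still owed).
    return dict(filter(lambda kv: kv[1] > 0, balances.items()))

if __name__ == "__main__":
    orders = [("a", 30), ("a", 20), ("b", 10), ("c", 5)]
    payments = [("a", 50), ("b", 4), ("d", 7)]
    print(outstanding_balances(orders, payments))
```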
1.2 Big Data Processing Pipeline