Big Data Management and Analytics CSCI 436/636 Lecture 6
Outline
• Big data processing
  – Processing Pipeline
  – Processing System
1 Big Data Processing Pipeline
• A series of steps where the output of the previous step is the input of the next step
• Hadoop MapReduce pipeline
  – Split -> Map -> Shuffle and Sort -> Reduce
  – WordCount example
[Figure: File 1, File 2, File N -> WordCount -> Result File]
1 Big Data Processing Pipeline
• Hadoop MapReduce pipeline
  – WordCount example
    • Split/partition files into HDFS
    • Map, Shuffle and Sort, Reduce
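
The four WordCount stages can be sketched in plain Python to show what each one hands to the next. This is a single-process illustration, not Hadoop code: the function names, the in-memory `files` dictionary (standing in for File 1 ... File N stored in HDFS), and the lowercasing of words are all assumptions made for the example.

```python
from collections import defaultdict


def split_files(files):
    """Split: yield one (file_name, line) record per line of each input file."""
    for name, text in files.items():
        for line in text.splitlines():
            yield name, line


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every record."""
    for _, line in records:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle and Sort: collect the values of each key, then order by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped}


def word_count(files):
    """Chain the stages: the output of each step is the input of the next."""
    return reduce_phase(shuffle_and_sort(map_phase(split_files(files))))


# Hypothetical input standing in for File 1 ... File N.
files = {
    "file1.txt": "big data pipeline",
    "file2.txt": "big data processing\nbig pipeline",
}
result = word_count(files)
print(result)
```

In a real cluster each stage runs in parallel on many machines, and the shuffle moves data across the network; the data flow between stages is the same.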
1 Big Data Processing Pipeline
• General big data programming model pipeline
  – Split -> Apply (Do something) -> Combine (Merge)
  – Batch Processing
    • Collect Data -> Clean Data -> Feed in chunks (Split) -> Wait (Do something and Merge) -> Act
  – Stream Processing
    • Instantly capture stream data -> Feed real time to machines -> Process real time -> Act
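
A minimal sketch of the difference between the two modes, assuming the job is to total a list of numeric readings. The names `split`, `apply_step`, `combine`, `batch_total`, and `stream_totals`, and the sample data, are hypothetical. The batch version waits for all chunks before producing one answer; the stream version produces an updated answer after every event.

```python
from itertools import islice


def split(data, chunk_size):
    """Split: cut the collected data into chunks of at most chunk_size items."""
    it = iter(data)
    while chunk := list(islice(it, chunk_size)):
        yield chunk


def apply_step(chunk):
    """Apply: do something with one chunk (here, compute a partial sum)."""
    return sum(chunk)


def combine(partials):
    """Combine: merge the partial results into one answer."""
    return sum(partials)


def batch_total(raw, chunk_size=3):
    """Batch: collect -> clean -> split -> apply and combine -> one result."""
    cleaned = [x for x in raw if x is not None]  # drop missing readings
    return combine(apply_step(c) for c in split(cleaned, chunk_size))


def stream_totals(events):
    """Stream: update and emit the running total as each event arrives."""
    total = 0
    for x in events:
        total += x
        yield total


# Hypothetical readings; None marks a value removed during cleaning.
raw = [4, None, 8, 15, 16, None, 23, 42]
batch_result = batch_total(raw)
stream_result = list(stream_totals(x for x in raw if x is not None))
print(batch_result)
print(stream_result)
```

The last value emitted by the stream equals the batch result; the stream simply makes every intermediate total available to act on as it is produced.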
1 Big Data Processing Pipeline
• Big data processing pipeline examples
Source: l hare.net/ThoughtWorks/big-data-pipeline-with-scala
1 Big Data Processing Pipeline
• Big data processing pipeline examples
Source: pipeline - apache-flink http:// / BigDataCloud /big-data-analytics-with- google - platform
1.1 Big Data Processing Pipeline
• Common data transformations within a big data pipeline
  – Map (one-to-one mapping): apply the same operation to each member of a collection
    • Example: curving every student's grade, for instance increasing each one by 5%
  – Reduce: perform a summary operation (such as counting the number of students in each queue, yielding name frequencies)
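
Both transformations exist directly in Python, which makes the two slide examples easy to try. The sketch below assumes that "increase by 5%" means multiplying each grade by 1.05, and that the queue is a list of first names; the sample grades and names are invented.

```python
from functools import reduce


def curve(grade):
    """Raise one grade by 5% (assumed to mean multiplying by 1.05)."""
    return round(grade * 1.05, 2)


def count_name(freq, name):
    """Fold one more name into the running frequency table."""
    freq[name] = freq.get(name, 0) + 1
    return freq


# Map: one output per input, same operation on every member.
grades = [80.0, 72.0, 90.0]  # hypothetical grades
curved = list(map(curve, grades))

# Reduce: many inputs summarized into one result (name frequencies).
queue = ["Ana", "Ben", "Ana", "Cy", "Ben", "Ana"]  # hypothetical queue
name_freq = reduce(count_name, queue, {})

print(curved)
print(name_freq)
```

Map keeps the collection the same size, while Reduce collapses it: three grades in, three grades out, but six names in, one frequency table out.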
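1.1 Big Data Processing Pipeline
• Common data transformations within a big data pipeline
  – Cross/Cartesian
    • Multiplication: pairs every element of one input with every element of the other
  – Match/Join
    • Selective multiplication: pairs only the elements whose keys match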
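
To make "multiplication" and "selective multiplication" concrete, the sketch below builds both on two small inputs of (key, value) pairs. The `students` and `enrollments` data and the `join` helper are hypothetical; the helper uses a simple in-memory hash index and is only one of several ways to implement a join.

```python
from collections import defaultdict
from itertools import product

# Hypothetical inputs: (student_id, name) and (student_id, course).
students = [("s1", "Ana"), ("s2", "Ben")]
enrollments = [("s1", "CSCI 436"), ("s2", "CSCI 636"), ("s1", "CSCI 636")]

# Cross: every element of one input paired with every element of the other.
cross = list(product(students, enrollments))


def join(left, right):
    """Join: pair only the elements of left and right that share a key."""
    index = defaultdict(list)
    for key, value in left:
        index[key].append(value)
    return [(key, lv, rv) for key, rv in right for lv in index.get(key, [])]


joined = join(students, enrollments)

# The join is the subset of the cross product whose keys match.
filtered_cross = [(lk, lv, rv) for (lk, lv), (rk, rv) in cross if lk == rk]

print(len(cross))
print(joined)
```

The cross product always has `len(students) * len(enrollments)` pairs, which is why it is the expensive operation; the join keeps only the matching ones.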
1.1 Big Data Processing Pipeline
• Common data transformations within a big data pipeline
  – Co-Group: binary operation (two inputs)
    • Groups both inputs on a key
    • Processes the groups of both inputs that share a key
  – Filter: select the elements that match a criterion
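
A small sketch of both operations, assuming each input is a list of (key, value) pairs. The `co_group` helper and the `homework` and `exams` data are hypothetical. In this version a key that appears in only one input still gets a group, with an empty list on the other side; whether such keys are kept is a design choice that varies between systems.

```python
from collections import defaultdict


def co_group(left, right):
    """Group both inputs on their key; return {key: (left_values, right_values)}."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return dict(groups)


# Hypothetical inputs: (student, score) pairs from two sources.
homework = [("ana", 90), ("ben", 70), ("ana", 80)]
exams = [("ana", 85), ("cy", 60)]

grouped = co_group(homework, exams)

# Process each pair of groups together, for example one total per student.
totals = {key: sum(hw) + sum(ex) for key, (hw, ex) in grouped.items()}

# Filter: keep only the elements that satisfy a criterion.
high_homework = list(filter(lambda record: record[1] >= 80, homework))

print(grouped)
print(totals)
print(high_homework)
```

Unlike a join, which emits one record per matching pair, co-group hands the whole group from each side to the processing step at once.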
1.2 Big Data Processing Pipeline

  • Fall '09
  • D'MELLO
  • Computer program, Hadoop
