As you can see, just 5 lines of Pig program solve the word count problem very easily. There is no need to be concerned with the map, shuffle, and reduce phases when using Pig. It manages decomposing the operators in your script into the appropriate MapReduce phases.

Pig Latin, a Parallel Dataflow Language [7]

Pig Latin is a dataflow language. This means it allows users to describe how data from one or more inputs should be read, processed, and then stored to one or more outputs in parallel. These data flows can be simple linear flows like the word count example given previously. They can also be complex workflows that include points where multiple inputs are joined, and where data is split into multiple streams to be processed by different operators. To be mathematically precise, a Pig Latin script describes a directed acyclic graph (DAG), where the edges are data flows and the nodes are operators that process the data.

This means that Pig Latin looks different from many of the programming languages you have seen. There are no if statements or for loops in Pig Latin. This is because traditional procedural and object-oriented programming languages describe control flow, and data flow is a side effect of the program. Pig Latin instead focuses on data flow.
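The dataflow idea can be illustrated outside Pig with a minimal Python sketch. This is not Pig Latin and not how Pig executes scripts; the function names (`load`, `tokenize`, `count`, `run_flow`) and the sample input are hypothetical, chosen only to show operators as nodes and the data handed between them as edges in a simple linear flow.

```python
from collections import Counter


def load(lines):
    """Source node: emits the raw input lines one at a time."""
    yield from lines


def tokenize(lines):
    """Operator node: splits each line into lowercase words."""
    for line in lines:
        yield from line.lower().split()


def count(words):
    """Sink node: aggregates the word stream into per-word totals."""
    return Counter(words)


def run_flow(lines):
    # Each nested call is an edge of the flow: load -> tokenize -> count.
    # There is no if statement or for loop at this level, only data
    # moving from one operator to the next.
    return count(tokenize(load(lines)))


# Hypothetical sample input, for illustration only.
sample = ["the quick brown fox", "the lazy dog", "The fox"]
result = run_flow(sample)
```

Here `run_flow` only wires operators together, which is the part a dataflow language asks the user to write; how each operator is scheduled and parallelized is left to the engine.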
1Sunny Kumar, 2Eesha Goel Research Cell : An International Journal of Engineering Sciences, January 2016, Vol. 17 ISSN: 2229-6913 (Print), ISSN: 2320-0332 (Online) -, Web Presence: © 2016 Vidya Publications. Authors are responsible for any plagiarism issues.

Fig 7. Working of Pig

As shown in Fig 7, the steps in the working of Pig are: [3]

1. Parsing
2. Semantic checking
3. Logical optimizer
4. Logical to Physical optimizer
5. Physical to MapReduce translator
6. MapReduce launcher

3. Implementation

The implementation is first performed with a MapReduce program on raw data. The MapReduce program contains 200 lines of code to perform the mapping and reducing of the data, and it is written in Java. MapReduce, a Hadoop component, is a powerful concept that performs mapping and reducing on data. By comparison, if the same map and reduce concept is implemented on a traditional system without the Hadoop framework, it becomes very difficult and thousands of lines of code must be written. The same concept is therefore implemented with Hive, a part of the Hadoop ecosystem that runs on top of the Hadoop framework and performs the mapping. First, a Hive script consisting of 7 lines of code is written. Then "hive -f <name of the hive script file>" is entered at the command prompt; the ad hoc queries are run and MapReduce is performed automatically, so there is no need to write the full mapping code. Finally, the command "hive -e 'select * from word_count1'" is used to display the output.
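To make the mapping and reducing steps concrete, the following is a minimal single-process Python sketch of word count with the map, shuffle, and reduce phases written out by hand. It is not the authors' 200-line Java program or their Hive script; the function names and sample lines are assumptions for illustration, and a real Hadoop job would distribute these phases across a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the grouped values to get one total per word."""
    return {word: sum(ones) for word, ones in groups.items()}


def word_count(lines):
    # Chain the three phases in the order a MapReduce job runs them.
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, for illustration only.
counts = word_count(["hadoop hive pig", "pig hive", "pig"])
```

Tools such as Hive and Pig generate the equivalent of these three phases from a short script, which is why the script versions need far fewer lines than hand-written MapReduce code.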