Tutorial on Hadoop HDFS and MapReduce
Hortonworks, Inc. | 455 W. Maude Ave | Suite 200 | Sunnyvale, CA 94085 | Tel: (855) 8-HORTON | hortonworks.com
Copyright © 2012 Hortonworks, Inc. All Rights Reserved

Table Of Contents

    Introduction
    The Use Case
    Pre-Requisites
    Task 1: Access Your Hortonworks Virtual Sandbox
    Task 2: Create the MapReduce job
    Task 3: Import the input data in HDFS and Run MapReduce
    Task 4: Examine the MapReduce job’s output on HDFS
    Task 5: Tutorial Clean Up
Introduction

In this tutorial, you will execute a simple Hadoop MapReduce job. The job takes a semi-structured log file as input and generates an output file that lists each log level along with its frequency count.

Our input data consists of a semi-structured log4j file in the following format:

    . . . . . . . . . . .
    2012-02-03 20:26:41 SampleClass3 [TRACE] verbose detail for id 1527353937
    java.lang.Exception: 2012-02-03 20:26:41 SampleClass9 [ERROR] incorrect format for id 324411615
        at com.osa.mocklogger.MockLogger#2.run(MockLogger.java:83)
    2012-02-03 20:26:41 SampleClass2 [TRACE] verbose detail for id 191364434
    2012-02-03 20:26:41 SampleClass1 [DEBUG] detail for id 903114158
    2012-02-03 20:26:41 SampleClass8 [TRACE] verbose detail for id 1331132178
    2012-02-03 20:26:41 SampleClass8 [INFO] everything normal for id 1490351510
    2012-02-03 20:32:47 SampleClass8 [TRACE] verbose detail for id 1700820764
    2012-02-03 20:32:47 SampleClass2 [DEBUG] detail for id 364472047
    2012-02-03 20:32:47 SampleClass7 [TRACE] verbose detail for id 1006511432
    2012-02-03 20:32:47 SampleClass4 [TRACE] verbose detail for id 1252673849
    2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 881008264
    2012-02-03 20:32:47 SampleClass0 [TRACE] verbose detail for id 1104034268
    2012-02-03 20:32:47 SampleClass6 [TRACE] verbose detail for id 1527612691
    java.lang.Exception: 2012-02-03 20:32:47 SampleClass7 [WARN] problem finding id 484546105
        at com.osa.mocklogger.MockLogger#2.run(MockLogger.java:83)
    2012-02-03 20:32:47 SampleClass0 [DEBUG] detail for id 2521054
    2012-02-03 21:05:21 SampleClass6 [FATAL] system problem at id 1620503499
    . . . . . . . . . . . . . .

The output data will be written to a file showing the various log4j log levels along with their frequency of occurrence in our input file. A sample of these metrics is displayed below:

    [TRACE] 8
    [DEBUG] 4
    [INFO] 1
    [WARN] 1
    [ERROR] 1
    [FATAL] 1
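To make the counting logic concrete, here is a minimal plain-Python sketch of the map, shuffle, and reduce steps applied to a few of the sample log lines. It is not the tutorial's MapReduce job and does not use any Hadoop API; the function names (map_line, shuffle, reduce_counts, count_log_levels) and the regular expression are assumptions made only for illustration, and everything runs locally in one process.

```python
import re
from collections import defaultdict

# Assumed pattern: a log4j level wrapped in square brackets, e.g. [TRACE].
LEVEL_PATTERN = re.compile(r"\[(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\]")


def map_line(line):
    """Map step: emit a (level, 1) pair for each level marker in the line."""
    return [(f"[{level}]", 1) for level in LEVEL_PATTERN.findall(line)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


def count_log_levels(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


# A few lines copied from the sample input above.
SAMPLE_LINES = [
    "2012-02-03 20:26:41 SampleClass3 [TRACE] verbose detail for id 1527353937",
    "java.lang.Exception: 2012-02-03 20:26:41 SampleClass9 [ERROR] incorrect format for id 324411615",
    "    at com.osa.mocklogger.MockLogger#2.run(MockLogger.java:83)",
    "2012-02-03 20:26:41 SampleClass2 [TRACE] verbose detail for id 191364434",
    "2012-02-03 20:26:41 SampleClass1 [DEBUG] detail for id 903114158",
]

if __name__ == "__main__":
    for level, count in count_log_levels(SAMPLE_LINES).items():
        print(level, count)
```

Lines without a bracketed level, such as the stack-trace line, emit no pairs and therefore do not affect the counts. In a real Hadoop job the map and reduce steps would run as separate distributed tasks, with the framework performing the grouping between them.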
This tutorial takes about 30 minutes to complete and is divided into the following five tasks:

    Task 1: Access Your Hortonworks Virtual Sandbox
    Task 2: Create the MapReduce job
    Task 3: Import the input data in HDFS and Run the MapReduce job
    Task 4: Analyze the MapReduce job’s output on HDFS
    Task 5: Tutorial Clean Up

The visual representation of what you will accomplish in this tutorial is shown in the figure.

The Use Case

Generally, applications save errors, exceptions, and other coded issues in a log file so that administrators can review the problems or generate metrics from the log file data. These log files usually grow quite large and contain a wealth of data that must be processed and mined.