HDFS: HOW DOES IT WORK?
HDFS replicates each block 3 times (3x) across datanodes.

HDFS: HOW DOES IT WORK?
The namenode keeps track of where the data resides. For each block it records the datanodes that hold a copy:
(DN-0, DN-1, DN-3, DN-7)
(DN-0, DN-4, DN-8)
(DN-2, DN-5, DN-6)
(DN-7, DN-10, DN-11)
When a datanode fails, the namenode tells the other datanodes to copy the affected blocks to other live datanodes in order to maintain the 3x data rule.
Datanodes in the cluster: DN-0, DN-1, DN-2, DN-3, DN-4, DN-5, DN-6, DN-7, DN-8, DN-9, DN-10, DN-11
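The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not HDFS code: the function name `re_replicate`, the block names, and the "lowest-numbered live node first" placement policy are all assumptions made for the example (real HDFS placement also considers racks and load).

```python
from typing import Dict, List, Set

REPLICATION_FACTOR = 3  # the "3x data rule" from the slide


def node_id(dn: str) -> int:
    """Numeric part of a datanode name such as 'DN-10' (naming is hypothetical)."""
    return int(dn.split("-")[1])


def re_replicate(block_map: Dict[str, List[str]],
                 live_nodes: Set[str],
                 failed_node: str) -> Dict[str, List[str]]:
    """Return a new block map in which replicas lost on failed_node
    are re-created on other live datanodes."""
    live = sorted(live_nodes - {failed_node}, key=node_id)
    new_map: Dict[str, List[str]] = {}
    for block, holders in block_map.items():
        survivors = [dn for dn in holders if dn != failed_node]
        # Candidate targets: live nodes that do not already hold this block.
        candidates = [dn for dn in live if dn not in survivors]
        missing = max(0, REPLICATION_FACTOR - len(survivors))
        new_map[block] = survivors + candidates[:missing]
    return new_map


# Hypothetical block names; the datanode lists come from the slide.
block_map = {
    "block-A": ["DN-0", "DN-4", "DN-8"],
    "block-B": ["DN-7", "DN-10", "DN-11"],
}
all_nodes = {f"DN-{i}" for i in range(12)}
result = re_replicate(block_map, all_nodes, failed_node="DN-7")
print(result)
```

In this run block-A is untouched, while block-B loses its copy on DN-7 and gains a new copy on another live datanode, so both blocks end with three replicas.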
MAIN COMPONENTS OF HADOOP
HDFS: storage
MapReduce: computations
MAPREDUCE PROGRAMMING PARADIGM
A new way to think about your data representation: think of your data in terms of keys and values, <key, value>.
MAPREDUCE PROGRAMMING PARADIGM
Examples of <key, value> representations:
Log file data: <time stamp, access log entry>
Social network data: <user id, user profile> or <user id, list of friends>
Text data: <byte offset, portion of text>
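As a small illustration of the text-data case, the sketch below turns raw bytes into <byte offset, portion of text> records, one per line. The helper name `text_to_records` and the sample bytes are assumptions for the example; treating each line as the "portion of text" is one possible choice, not something the slides specify.

```python
from typing import Iterator, Tuple


def text_to_records(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line text) pairs, one per line of input."""
    offset = 0
    for line in data.splitlines(keepends=True):
        # The key is the offset of the line's first byte in the input.
        yield (offset, line.decode("utf-8").rstrip("\r\n"))
        offset += len(line)


# Hypothetical two-line input.
sample = b"World Bank IFC\nMIGA Social protection\n"
records = list(text_to_records(sample))
print(records)
# [(0, 'World Bank IFC'), (15, 'MIGA Social protection')]
```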
MAPREDUCE: HOW TO WRITE A MAPREDUCE PROGRAM?
1. Model your data in terms of keys and values.
2. Write a Map function: it takes a key and a value and generates more keys and values.
3. Write a Reduce function: it takes the keys and values generated by the mappers and produces other keys and values.
To draw an analogy to SQL, map can be visualized as the group-by clause of an aggregate query.
Map: <key1, val1> → <key2, val2>
Reduce: <key2, val2> → <key3, val3>
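The three steps above can be sketched as a tiny in-memory driver. This is not the Hadoop API: `run_mapreduce`, the mapper and reducer names, and the sample friend lists are hypothetical. The one detail the sketch adds is the grouping step between map and reduce, where the framework collects all values that share a key before calling the reducer.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def run_mapreduce(records: Iterable[Tuple[str, List[str]]],
                  mapper: Callable[[str, List[str]], Iterator[Pair]],
                  reducer: Callable[[str, List[int]], Iterator[Pair]]) -> List[Pair]:
    """Apply mapper to every record, group mapper output by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            groups[k2].append(v2)  # grouping by key, as a framework shuffle would
    output: List[Pair] = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


def friend_mapper(user_id: str, friends: List[str]) -> Iterator[Pair]:
    """<user id, list of friends> -> one <friend, 1> pair per listed friend."""
    for friend in friends:
        yield (friend, 1)


def sum_reducer(key: str, values: List[int]) -> Iterator[Pair]:
    """<friend, [1, 1, ...]> -> <friend, number of lists the friend appears in>."""
    yield (key, sum(values))


# Hypothetical social network data in <user id, list of friends> form.
data = [("alice", ["bob", "carol"]), ("bob", ["carol"]), ("carol", ["alice"])]
counts = run_mapreduce(data, friend_mapper, sum_reducer)
print(counts)
# [('alice', 1), ('bob', 1), ('carol', 2)]
```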
MAPREDUCE: HOW TO WRITE A MAPREDUCE PROGRAM? EXAMPLE 1
Word Count: count how many times each word occurs in a collection of documents stored across a number of datanodes (HDFS).
Document#0: World Bank IFC MIGA Social protection poverty reduction..
Step 1: represent the data as <key, value>
Doc#0: <0, World Bank IFC MIGA Social protection poverty reduction>
MAPREDUCE: HOW TO WRITE A MAPREDUCE PROGRAM? EXAMPLE 1
Step 2: create a mapper job
    words = value.split(" ")
    for every word in words:
        emit(word, 1)
Mapper output (on the data-node):
<World,1> <Bank,1> <IFC,1> <MIGA,1> <Social,1> <protection,1> <poverty,1> <reduction,1>
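A runnable version of the mapper pseudocode might look like the following. It is a plain Python sketch rather than a Hadoop job; the function name `word_count_mapper` is an assumption, and `yield` stands in for the slide's `emit`. The input is the Doc#0 record from Step 1.

```python
from typing import Iterator, Tuple


def word_count_mapper(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Emit <word, 1> for every word in the text; the key is not used."""
    # split() with no argument splits on any run of whitespace.
    for word in value.split():
        yield (word, 1)


doc0 = (0, "World Bank IFC MIGA Social protection poverty reduction")
pairs = list(word_count_mapper(*doc0))
print(pairs)
```

The mapper does not change case, so the pair for the fifth word is `("Social", 1)`, matching the spelling in Document#0.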