Hadoop MapReduce
INF 551
Wensheng Wu
Hadoop
A large-scale distributed batch-processing infrastructure
- Large-scale: handle a large amount of data and computation
- Distributed: distribute data and work across a number of machines
- Batch processing: process a series of jobs without human intervention
[Figure: racks of nodes, each node with its own CPU, memory, and disk; each rack has a switch, and the rack switches connect to a backbone switch]
- Each rack contains 16-64 nodes
- 1 Gbps between any pair of nodes in a rack
- 2-10 Gbps backbone between racks
- In 2011 it was guesstimated that Google had 1M machines
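To get a feel for the bandwidth figures above, here is a rough back-of-the-envelope sketch. It assumes an ideal link with no protocol overhead or contention. The 1 TB data size and the helper name `transfer_seconds` are illustrative assumptions, not part of the slides.

```python
def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Ideal time to move num_bytes over a link of link_gbps gigabits per second."""
    bits = num_bytes * 8
    return bits / (link_gbps * 1e9)


ONE_TB = 1e12  # hypothetical data size: 1 terabyte (decimal), chosen for illustration

# Link speeds taken from the slide: 1 Gbps within a rack, 2-10 Gbps backbone.
within_rack = transfer_seconds(ONE_TB, 1)
backbone_low = transfer_seconds(ONE_TB, 2)
backbone_high = transfer_seconds(ONE_TB, 10)

print(f"1 Gbps (within a rack): {within_rack / 60:.0f} minutes")
print(f"2 Gbps backbone:        {backbone_low / 60:.0f} minutes")
print(f"10 Gbps backbone:       {backbone_high / 60:.0f} minutes")
```

Even under these idealized assumptions, moving 1 TB over a single 1 Gbps link takes over two hours, which suggests why the network is treated as a scarce resource in a cluster of this kind.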