Comparing MapReduce and Databases
MapReduce versus relational databases:
We saw that parallel query processing is largely the same.
As for declarative query languages, these appear in Pig and especially
Schema in a relational database is the str
Large-scale data processing systems other than MapReduce:
First we will discuss the design space of possibilities here.
The space is divided along three axes: low latency, i.e., quick turnaround time,
versus things that maximize throughpu
MapReduce Text Examples
Another example: we need a histogram of word usage, so we are going to
group things based on the length of the word.
Here we might group words into big, medium, small, and tiny words.
The big words are everything that
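The word-length histogram could be sketched as below. This is a minimal illustration, not the lecture's code: the cutoffs in size_bucket, the sample documents, and the names map_doc, reduce_bucket, and run are all assumptions, since the notes do not give the actual thresholds.

```python
from collections import defaultdict

def size_bucket(word):
    # Hypothetical cutoffs; the notes do not state the real ones.
    n = len(word)
    if n >= 10:
        return "big"
    if n >= 5:
        return "medium"
    if n >= 2:
        return "small"
    return "tiny"

def map_doc(doc_name, text):
    # Map: emit one (bucket, 1) pair for every word in the document.
    for word in text.split():
        yield size_bucket(word), 1

def reduce_bucket(bucket, counts):
    # Reduce: sum the ones collected for a single bucket.
    return bucket, sum(counts)

def run(docs):
    # Shuffle step simulated in memory: group emitted values by key.
    groups = defaultdict(list)
    for name, text in docs.items():
        for key, value in map_doc(name, text):
            groups[key].append(value)
    return dict(reduce_bucket(k, v) for k, v in groups.items())

# Made-up sample documents.
docs = {"d1": "a scalable parallel system", "d2": "I like big distributed joins"}
histogram = run(docs)
```

Only the key emitted by the map function changes relative to a plain word count: the bucket label replaces the word itself.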
MapReduce Relational Join
How to implement a join operation from relational algebra in Map
Find every record in one relation that corresponds to a record in the other
relation where SSN = EmpSSN. So we have the result as shown in the
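A reduce-side join on SSN = EmpSSN might look like the sketch below. Only the two attribute names come from the notes. The relation names, the sample rows, and the function names map_record and reduce_join are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample relations; only SSN and EmpSSN appear in the notes.
employees = [("Sue", "999999999"), ("Tony", "777777777")]  # (Name, SSN)
assignments = [
    ("999999999", "Accounts"),
    ("777777777", "Sales"),
    ("777777777", "Marketing"),
]  # (EmpSSN, DepName)

def map_record(relation, record):
    # Map: key each tuple by its join attribute and tag it with its source.
    if relation == "Employee":
        yield record[1], ("Employee", record)
    else:
        yield record[0], ("Department", record)

def reduce_join(key, tagged):
    # Reduce: pair every Employee tuple with every Department tuple
    # that shares this key.
    left = [rec for tag, rec in tagged if tag == "Employee"]
    right = [rec for tag, rec in tagged if tag == "Department"]
    for l in left:
        for r in right:
            yield l + r

# Shuffle step simulated in memory.
groups = defaultdict(list)
for relation, records in (("Employee", employees), ("Department", assignments)):
    for record in records:
        for key, value in map_record(relation, record):
            groups[key].append(value)

joined = sorted(row for k, vals in groups.items() for row in reduce_join(k, vals))
```

Tagging each tuple with its source relation is what lets a single reduce call tell the two sides of the join apart.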
What does the term scalability mean? This can be viewed from two
perspectives: one is operational and the other is algorithmic.
Operationally, one way to think about this in the past was: it needs
to work on data that doesn't
Referring to the previous example, how many map tasks do we already have,
and how many reduce tasks are we going to have?
There is one map task per document. The number of invocations of the Reduce
function will be the number of groups
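The counting argument can be checked with a small simulation. This sketch assumes the previous example is a word count over documents, which the notes do not confirm here, and the two sample documents are made up.

```python
from collections import defaultdict

# Made-up input: two documents, so we expect two map invocations.
docs = {"doc1": "to be or not to be", "doc2": "to do"}

map_calls = 0
groups = defaultdict(list)
for name, text in docs.items():
    map_calls += 1  # one map invocation per document
    for word in text.split():
        groups[word].append(1)

reduce_calls = 0
counts = {}
for word, ones in groups.items():
    reduce_calls += 1  # one reduce invocation per distinct key (group)
    counts[word] = sum(ones)
```

Here map_calls equals the number of documents and reduce_calls equals the number of distinct words, regardless of how often each word occurs.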
MapReduce Implementation Overview
What kind of systems is MapReduce deployed on?
There are three types of architectures:
o shared memory
o shared disk and
o shared nothing
In these diagrams cylinders are the disks, rectangles are the memory and
MapReduce Matrix Multiply Example
A simple matrix multiplication algorithm in MapReduce.
Applying MapReduce
Matrix multiplication is a binary operation, so we need to lump the entries
of both matrices together. We need to tag them with the source.
In the map phase, for e
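One common one-pass formulation of matrix multiplication in MapReduce is sketched below. It may differ from the variant the lecture goes on to describe. The sparse (row, col, value) layout, the 2x2 sample matrices, and the names map_entry and reduce_cell are assumptions.

```python
from collections import defaultdict

# Sparse entries as (row, col, value); sample 2x2 matrices are made up.
A = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
B = [(0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)]
N_ROWS_A, N_COLS_B = 2, 2

def map_entry(source, entry):
    # Map: send each entry to every output cell (i, k) it contributes to,
    # tagged with the matrix it came from.
    if source == "A":
        i, j, v = entry
        for k in range(N_COLS_B):
            yield (i, k), ("A", j, v)
    else:
        j, k, v = entry
        for i in range(N_ROWS_A):
            yield (i, k), ("B", j, v)

def reduce_cell(cell, tagged):
    # Reduce: match A and B entries on the shared index j and sum products.
    a = {j: v for tag, j, v in tagged if tag == "A"}
    b = {j: v for tag, j, v in tagged if tag == "B"}
    return cell, sum(a[j] * b[j] for j in a if j in b)

# Shuffle step simulated in memory.
groups = defaultdict(list)
for source, entries in (("A", A), ("B", B)):
    for entry in entries:
        for key, value in map_entry(source, entry):
            groups[key].append(value)

product = dict(reduce_cell(cell, tagged) for cell, tagged in groups.items())
```

The source tag is what allows the reducer for cell (i, k) to separate the row of A from the column of B before multiplying.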
MapReduce operates on key-value pairs, so the input is going to be a big set of
key-value pairs. The Map function is going to operate on one of these key value
In this case, the key is the document name and the value is the document
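The key-value contract can be made concrete with a tiny in-memory driver. This is a sketch under assumptions, not a real framework: map_reduce, word_mapper, and word_reducer are hypothetical names, and word count is assumed as the task, with the document name as key and the document text as value.

```python
from collections import defaultdict

def map_reduce(inputs, mapper, reducer):
    # inputs: iterable of (key, value) pairs.
    # mapper(key, value) yields intermediate (key, value) pairs.
    # reducer(key, values) yields output pairs for one group.
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):  # sorted only to make output deterministic
        results.extend(reducer(out_key, groups[out_key]))
    return results

def word_mapper(doc_name, contents):
    # The key (document name) is ignored; one pair is emitted per word.
    for word in contents.lower().split():
        yield word, 1

def word_reducer(word, counts):
    yield word, sum(counts)

# Made-up sample input.
inputs = [("a.txt", "the cat saw the dog"), ("b.txt", "The dog ran")]
word_counts = map_reduce(inputs, word_mapper, word_reducer)
```

The mapper sees exactly one input pair at a time, which is the property that lets a real system run many map calls in parallel.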
Experimental Results: MR and DB
The first experiment conducted was:
This is much like the DNA sequence search task that we described as a
motivating example for describing scalability.
What were the results just to load this data in?
As you se
Parallel Processing Patterns
Another example that could be handled in the same fashion as the read
trimming problem is converting a bunch of TIFF images into a different
So there should be a pat
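Both tasks apply one function independently to every item, which can be sketched as a generic parallel map. Everything here is illustrative: convert is a stand-in that only rewrites the file extension rather than decoding an image, the .png target is an assumption because the notes do not name the output format, and the file names are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath

def convert(path):
    # Stand-in for a real conversion: only the extension is rewritten.
    # The ".png" target is hypothetical.
    return str(PurePath(path).with_suffix(".png"))

def parallel_map(func, items, workers=4):
    # Apply func to every item independently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))

converted = parallel_map(convert, ["scan1.tiff", "scan2.tiff", "scan3.tiff"])
```

Because no item depends on any other, the same parallel_map could be reused for read trimming by passing a different function.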