Problem Set 5: Simplified MapReduce
April 14, 2009 - 11:59PM
Don't change the mli files!
March 30: We have changed the build system for workers. First,
no longer runs
asynchronously, and instead blocks until the program terminates. Mappers/Reducers no longer obtain their
, and instead use
Similarly, when the mapper/reducer has
finished constructing its list of outputs, it now "returns" them using
Lastly, shared data is now accessed using
. We have updated the mapper
and reducer for word count.
Windows limits the number of connections a socket can accept, and the default limit is too low for this
assignment. Additionally, Windows keeps sockets alive too long after they are closed. We have therefore included
two registry fixes for these problems, which you can apply by running
. You should back up your
registry before doing this.
Obtaining a channel from a socket using
behaves differently on Windows
than on Linux/OS X. On Linux/OS X, closing the channel also closes the socket.
On Windows, however, closing the channel leaves the socket open, so you must still close the socket yourself. Therefore, whenever you
close a channel but still want to use the socket (for example, in
) you should
add the following code:
(* Close the channel only on Windows, where doing so leaves the socket open. *)
let inp = Unix.in_channel_of_descr socket in
if Sys.os_type = "Win32" then close_in_noerr inp else ()
A new release has been posted to CMS. If you have already started work, copy the new files
into your current working directory, except for the files that you have modified
(master/controller.ml, worker/worker.ml, master/master.ml, shared/hashtable.ml). Then, for
worker/worker.ml, copy over the new
variable. Lastly, if worker builds fail and you are
running Linux/OS X, you may need to update the variable
to point to your OCaml lib directory.
Part 1: Simplified MapReduce (45 points)
In this problem set, you will be implementing a simplified version of Google's MapReduce. Prior to starting
this problem set, you should read the short
to familiarize yourself with the basic MapReduce
architecture. This writeup assumes that you have read Sections 1 through 3.1 of the whitepaper.
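
As a rough orientation before the details, the map, group, and reduce phases can be sketched as a single-process OCaml program that counts words. This is only an illustration under stated assumptions: it runs sequentially in memory, with no master, workers, or sockets, and every name in it (word_count_map, group_by_key, word_count_reduce, map_reduce) is hypothetical and is not the interface used in this problem set.

```ocaml
(* Sequential, in-memory sketch of map -> group by key -> reduce.
   All names are hypothetical; this is not the problem set's API. *)

(* Map phase: turn one (document id, contents) pair into (word, 1) pairs. *)
let word_count_map (_doc_id, contents) =
  String.split_on_char ' ' contents
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> (String.lowercase_ascii w, 1))

(* Grouping phase: collect all intermediate values that share a key. *)
let group_by_key pairs =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (k, v) ->
      let existing = try Hashtbl.find tbl k with Not_found -> [] in
      Hashtbl.replace tbl k (v :: existing))
    pairs;
  Hashtbl.fold (fun k vs acc -> (k, vs) :: acc) tbl []

(* Reduce phase: sum the counts collected for one word. *)
let word_count_reduce (word, counts) =
  (word, List.fold_left ( + ) 0 counts)

(* Driver: apply the mapper to every input, group the intermediate pairs,
   reduce each group, and sort the results so the output order is stable. *)
let map_reduce mapper reducer inputs =
  List.map mapper inputs
  |> List.concat
  |> group_by_key
  |> List.map reducer
  |> List.sort compare
```

For example, map_reduce word_count_map word_count_reduce [("d1", "the cat the dog")] evaluates to [("cat", 1); ("dog", 1); ("the", 2)]. In the distributed setting, the mapper and reducer calls would be handed to separate workers rather than run in one loop.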