ps5 - Problem Set 5: Simplified MapReduce Due April 14,...

Problem Set 5: Simplified MapReduce
Due April 14, 2009 - 11:59PM

Note: Don't change the mli files!

Updates/Corrections

March 30:

We have changed the build system for workers. First, Program.run no longer runs asynchronously; it now blocks until the program terminates. Mappers and reducers no longer obtain their input from stdin and instead call Program.get_input(). Similarly, when a mapper or reducer has finished constructing its list of outputs, it now "returns" them by calling Program.set_output results. Lastly, shared data is now accessed using Program.get_shared_data(). We have updated the mapper and reducer for word count.

Windows limits the number of connections a socket can accept, and that limit is too low for this assignment. Additionally, sockets are kept alive too long after being closed. We have therefore included two registry fixes, which you can apply by simply running socket_fixes.reg. You should back up your registry before doing this.

Obtaining a channel from a socket using Unix.in_channel_of_descr behaves differently on Windows than on Linux/OS X. On Linux/OS X, closing the channel closes the socket as well. On Windows, however, closing the channel leaves the socket open, so you must still close the socket yourself. Therefore, whenever you close a channel but still want to use the socket (for example, in Worker.handle_request), you should add the following code:

    let inp = Unix.in_channel_of_descr socket in
    ...
    if Sys.os_type = "Win32" then close_in_noerr inp else ()

A new release has been posted to CMS. If you have already started work, copy the new files over your current working directory, except for the files that you have modified (master/controller.ml, worker/worker.ml, master/master.ml, shared/hashtable.ml). Then, for worker/worker.ml, copy over the new default_includes variable.

Lastly, if worker builds fail and you are running Linux/OS X, you may need to update the variable ocaml_lib_dir in worker/program.ml to point to your OCaml lib directory.

Part 1: Simplified MapReduce (45 points)

Overview

In this problem set, you will implement a simplified version of Google's MapReduce. Before starting, you should read the short whitepaper to familiarize yourself with the basic MapReduce architecture. This writeup assumes that you have read sections 1 - 3.1 of the whitepaper.
Map and Reduce Functions

The map and reduce functions have the following OCaml signatures:

    val map : 'a -> 'b -> ('c * 'd) list
    val reduce : 'c -> 'd list -> 'e list

However, note that the messaging protocol defined in shared/protocol.ml only allows the transmission of strings. Therefore, you must use OCaml's built-in marshalling and unmarshalling to transmit values of other types. (See below for a more thorough explanation.)

For a given MapReduce application
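As a rough illustration of the signatures and the marshalling requirement described above, the following sketch instantiates map and reduce for a word count and passes the intermediate pairs through a string using the standard Marshal module. It is not the mapper and reducer shipped with the assignment: the choice of types ('a and 'b as strings, 'c as a word, 'd and 'e as ints) and the helper names to_wire and of_wire are assumptions made for this example, and it does not use the Program or protocol modules at all.

```ocaml
(* Hypothetical word-count instance of the map/reduce signatures.
   Assumed types: 'a = string (document name), 'b = string (contents),
   'c = string (word), 'd = int (count), 'e = int (total). *)

(* map: emit (word, 1) for every space-separated word in the contents. *)
let map (_doc_name : string) (contents : string) : (string * int) list =
  String.split_on_char ' ' contents
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> (String.lowercase_ascii w, 1))

(* reduce: sum all counts collected for one word. *)
let reduce (_word : string) (counts : int list) : int list =
  [ List.fold_left ( + ) 0 counts ]

(* Hypothetical helpers: turn any value into a string and back.
   Marshal is not type-safe, so the type expected on the receiving side
   must match the type of the value that was marshalled. *)
let to_wire (v : 'a) : string = Marshal.to_string v []
let of_wire (s : string) : 'a = Marshal.from_string s 0

let () =
  (* "Mapper side": produce pairs and encode them as a string. *)
  let wire = to_wire (map "doc1" "the cat saw the dog") in
  (* "Receiving side": decode with an explicit type annotation. *)
  let (received : (string * int) list) = of_wire wire in
  (* Group the counts for one key and reduce them. *)
  let counts_for_the =
    received
    |> List.filter (fun (w, _) -> w = "the")
    |> List.map snd
  in
  match reduce "the" counts_for_the with
  | [ total ] -> Printf.printf "the: %d\n" total
  | _ -> assert false
```

The type annotation on received matters: Marshal.from_string returns a value of any type the caller asks for, so a mismatch between sender and receiver is not caught by the compiler and can crash the program at run time.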