... exception message about the output already existing - this is OK, this simply
means that this step will be skipped and the existing output will be used instead.
Use the reporter and the job tracker page to your advantage when running on
the large dataset. This can help you understand where the bottlenecks in your
If your renderer runs out of memory, try creating more rendering tasks (more
reducers) or try rendering a smaller subset of the US. Also be careful how many
features you emit at higher zoom levels - you do not need every road at high
zoom levels and trying to add every road to a renderer will fill up your heap
space in a hurry.

Image Extractor Tool
The output of the render step will be stored on the DFS in SequenceFiles. We have
provided a class that will extract individual tiles to PNG images on the local disk (the
machine from which the job is being run) with appropriate file names.

Map Visualization Web Page (for testing)
Also included is a small html file (viewer.html) that can display a 3x3 grid of tiles, can
scroll in all 4 directions, and can zoom in and out. This html page relies on the tile
images being located in the "tiles/<z>" directory relative to it. (For example, to display
tiles at zoom level 6, make sure that the tiles are located in "tiles/6/" relative to the
viewer.html file.) This viewer also relies on tiles being named in the format specified by the
image extractor tool, as well as on the static images in the 'html_img' directory.

What To Turn In
Turn in all your code for your complete pipeline. This must be capable of start-to-finish
parsing the data sets, joining the appropriate records, and rendering the tiles. It must
also generate the indices that allow you to look up the latitude and longitude coordinates
for addresses, cities, and states.
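
As a rough illustration of what such a lookup index does, here is a minimal sketch in Python. It assumes an in-memory dictionary keyed by a normalized name. The function names (normalize, build_index, lookup), the record layout (name, latitude, longitude), and the sample coordinates are all hypothetical and are not part of the provided code; the index your pipeline writes to the DFS may look quite different.

```python
def normalize(name):
    """Lower-case a name and collapse runs of whitespace (hypothetical rule)."""
    return " ".join(name.lower().split())


def build_index(records):
    """Build a name -> (lat, lon) dictionary from (name, lat, lon) tuples.

    If the same normalized name appears more than once, the first
    occurrence is kept; a real index would need a better policy.
    """
    index = {}
    for name, lat, lon in records:
        index.setdefault(normalize(name), (lat, lon))
    return index


def lookup(index, name):
    """Return (lat, lon) for a name, or None if it is not indexed."""
    return index.get(normalize(name))


# Made-up record purely for illustration; not real survey data.
example_index = build_index([("Example City", 10.0, 20.0)])
```

A lookup such as lookup(example_index, "  example   CITY ") goes through the same normalization as the indexed names, so differences in case and spacing do not cause a miss.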
Add a text file containing your writeup.
Add the tile image files at zoom levels 4 to 8 for the area centered at the University of
Washington (include a reasonable area, but not too large - 10 or so tiles per zoom level
should be enough). When you zip up your submission, put these in "tiles/4/", "tiles/5/",
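
Before zipping, it can be worth checking that every zoom level directory actually contains tiles. The following is a small sketch in Python, not part of the provided tools. It assumes the extracted images end in ".png" and sit directly under "tiles/<z>/"; the helper names count_tiles and missing_zoom_levels are made up for this example.

```python
from pathlib import Path


def count_tiles(root, zoom_levels=range(4, 9)):
    """Return {zoom: number of .png files found in root/tiles/<zoom>/}."""
    counts = {}
    for z in zoom_levels:
        zoom_dir = Path(root) / "tiles" / str(z)
        # A missing directory is reported as zero tiles instead of raising.
        counts[z] = len(list(zoom_dir.glob("*.png"))) if zoom_dir.is_dir() else 0
    return counts


def missing_zoom_levels(root, zoom_levels=range(4, 9)):
    """List the zoom levels that have no .png tiles at all."""
    return [z for z, n in count_tiles(root, zoom_levels).items() if n == 0]
```

Calling missing_zoom_levels(".") from the directory that holds viewer.html returns an empty list when zoom levels 4 to 8 each have at least one tile.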
You should run your full rendering pipeline on the cluster and leave your image files and
geocode indices in your /user/username directory. You will need this data for assignment
4; you should not need to rerender it at that time.

Appendix: Data Sets
TIGER -- The 2006 complete US TIGER/LINE survey data.
------------------------------------------------------
2006 TIGER/Line data home:
Complete TIGER/Line Technical Reference Document: http://www.census.gov/geo/www/
In particular, see section 6 (Data Dictionary), which defines all the record types.
Tiger CFCC's (Census Feature Class Codes)
BGN -- This ties place names to locations.
---------------------------------------
USGS BGN Home:
BGN File Format spec:
BGN Feature Classes: