...ption message about the output already existing - this is OK; it simply means that this step will be skipped and the existing output will be used instead.

Use the reporter and the job tracker page to your advantage when running on the large dataset. This can help you understand where the bottlenecks in your code are. If your renderer runs out of memory, try creating more rendering tasks (more reducers) or try rendering a smaller subset of the US. Also be careful how many features you emit at higher zoom levels - you do not need every road at high zoom levels, and trying to add every road to a renderer will fill up your heap space in a hurry.

Image Extractor Tool

The output of the render step will be stored on the DFS in SequenceFiles. We have provided a class that will extract individual tiles to PNG images on the local disk (the machine from which the job is being run) with appropriate file names.

Map Visualization Web Page (for testing)

Also included is a small HTML file (viewer.html) that can display a 3x3 grid of tiles, can scroll in all 4 directions, and can zoom in and out. This HTML page relies on the tile images being located in the "tiles/<z>" directory relative to it. (For example, to display tiles at zoom level 6, make sure that the tiles are located in "tiles/6/" relative to the viewer.html file.) This viewer also relies on tiles being named in the format specified by the image extractor tool, as well as on the static images in the 'html_img' directory.

What To Turn In

Turn in all your code for your complete pipeline. This must be capable of start-to-finish parsing the data sets, joining the appropriate records, and rendering the tiles. It must also generate the indices that allow you to look up the latitude and longitude coordinates for addresses, cities, and states.

Add a text file containing your writeup.
Add the tile image files at zoom levels 4 to 8 for the area centered at the University of Washington (include a reasonable area, but not too large - 10 or so tiles per zoom level should be enough). When you zip up your submission, put these in "tiles/4/", "tiles/5/", ..., "tiles/8/".

You should run your full rendering pipeline on the cluster and leave your image files and geocode indices in your /user/username directory. You will need this data for assignment 4; you should not need to rerender it at that time.

Appendix: Data Sets

TIGER -- The 2006 complete US TIGER/LINE survey data.
------------------------------------------------------
2006 TIGER/Line data home: http://www.census.gov/geo/www/tiger/tiger2006se/tgr2006se.html
Complete TIGER/Line Technical Reference Document: http://www.census.gov/geo/www/tiger/tiger2006se/TGR06SE.pdf
In particular, see section 6 (Data Dictionary), which defines all the record types.
TIGER CFCCs (Census Feature Class Codes): http://proximityone.com/tgrcfcc.htm

BGN -- This ties place names to locations.
----------------------------------------
USGS BGN Home: http://geonames.usgs.gov/
BGN File Format spec: http://geonames.usgs.gov/domestic/gaz_fileformat.htm
BGN Feature Classes: http://geonames.usgs.gov/domestic/feature_class.htm
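
Before zipping up a submission, it may help to confirm that the "tiles/<z>/" layout described under What To Turn In is actually in place. The following is only a sketch, not part of the provided tools: the function names count_tiles and missing_zooms are made up for illustration, and it assumes only that extracted tiles are PNG files sitting directly inside "tiles/4/" through "tiles/8/". It does not check the tile file names, since that format is defined by the image extractor tool.

```python
from pathlib import Path

# Zoom levels 4 to 8, as requested in the turn-in instructions.
REQUIRED_ZOOMS = range(4, 9)


def count_tiles(root, zooms=REQUIRED_ZOOMS, suffix=".png"):
    """Return {zoom: number of PNG files found directly in root/tiles/<zoom>/}.

    Hypothetical helper: it only counts files by extension and does not
    validate the file names produced by the image extractor tool.
    """
    counts = {}
    for z in zooms:
        zoom_dir = Path(root) / "tiles" / str(z)
        if zoom_dir.is_dir():
            counts[z] = sum(
                1
                for p in zoom_dir.iterdir()
                if p.is_file() and p.suffix.lower() == suffix
            )
        else:
            # A missing directory is reported as zero tiles.
            counts[z] = 0
    return counts


def missing_zooms(counts):
    """Return the sorted zoom levels that have no tile images at all."""
    return sorted(z for z, n in counts.items() if n == 0)


if __name__ == "__main__":
    # Run from the directory that contains viewer.html and tiles/.
    counts = count_tiles(".")
    for z, n in sorted(counts.items()):
        print(f"tiles/{z}/: {n} tile(s)")
    empty = missing_zooms(counts)
    if empty:
        print("No tiles found for zoom levels:", empty)
```

Running this from the directory that holds viewer.html prints one count per zoom level, which can be compared against the "10 or so tiles per zoom level" guideline.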