geocode index pipeline on King County, locally.
• Think about any extensions you might want to implement (fancier tile generation, polar latitude/longitude coordinates, other TIGER record types, other data sets, etc.). Obviously, this optional step is to be done only when you have finished everything else.
• Run the entire pipeline on the cluster for a larger area than just King County at an intermediate zoom level (i.e., 6 or so).

Final Misc Information and Hints

• The full TIGER dataset is in /shared/tiger
• The TIGER dataset for King County is in /shared/tiger-king; do your testing on this.
• The BGN dataset is in /shared/allstates
• The population dataset is in /shared/population
• The BGN and population datasets are small; there are no test-only versions of these.
• If you are excluding certain features at a particular zoom level, do your filtering in the mapper, so that you don't waste bandwidth (and time) sending records to the reducers that will only be discarded.
• Be aware of how the number of mappers and reducers at a particular stage affects performance.
• Zoom level has an exponential effect on the number of tiles to render (that is, on the amount of work to do).
• For testing, clamp your mappable range down to a very small area when checking how you render close-zoom tiles, or else you'll be doing a lot of work. Pick a range of, say, 4 to 12 tiles and render that while tweaking your renderer; don't render several hundred tiles each time until you are sure your renderer works well.
• Start your experimentation at around zoom level 6.
• When re-running the mapreduce pipeline, you do not need to re-run every pass every time. For example, it is fine to generate the filtered and joined data once (when you have gotten the filtering and joining right) and then run only the render step while developing and testing the renderer.
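To make the zoom-level hint concrete, here is a minimal sketch of how tile counts grow and how a small test block might be chosen. It assumes a quadtree-style scheme in which each zoom level splits every tile into 2 x 2 children; the handout does not state the actual tiling scheme, and the function names `tiles_at_zoom` and `small_test_block` are hypothetical.

```python
def tiles_at_zoom(zoom):
    """Total tiles at a zoom level, ASSUMING each level splits every tile 2 x 2.

    Under that assumption the count is 4 ** zoom, so every extra zoom level
    multiplies the rendering work by four.
    """
    return 4 ** zoom


def small_test_block(x0, y0, width=4, height=3):
    """Return (x, y) tile indices for a small block starting at tile (x0, y0).

    The defaults give 4 * 3 = 12 tiles, the upper end of the 4 to 12 tile
    range suggested for renderer testing.
    """
    return [(x, y)
            for y in range(y0, y0 + height)
            for x in range(x0, x0 + width)]


if __name__ == "__main__":
    # Show how quickly the work grows with zoom level.
    for zoom in range(0, 11, 2):
        print(f"zoom {zoom:2d}: {tiles_at_zoom(zoom)} tiles")

    # Render only this handful of tiles while tweaking the renderer.
    block = small_test_block(10, 20)
    print(f"test block has {len(block)} tiles, first is {block[0]}")
```

If the real pipeline uses a different subdivision factor, only the base of the exponent changes; the growth is still exponential in the zoom level.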
Note that when you run the full pipeline and some intermediate steps have already been computed (e.g., the TIGER data has already been filtered), that step in the pipeline will fail with an exce...
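One possible way to re-run only the later passes is to skip any stage whose output directory already exists. The sketch below is an illustration under stated assumptions, not the course's driver: the stage names and the `run_pipeline` function are hypothetical, and `os.path.exists` on a local directory stands in for whatever existence check the cluster's file system provides.

```python
import os
import tempfile


def run_pipeline(stages):
    """Run each stage in order, skipping those whose output already exists.

    `stages` is a list of (name, output_dir, run_fn) tuples, where run_fn
    takes the output directory and is expected to create it. Returns the
    names of the stages that were actually run. To force a stage to run
    again, delete its output directory first.
    """
    ran = []
    for name, output_dir, run_fn in stages:
        if os.path.exists(output_dir):
            # Assumption: an existing output directory means the stage
            # finished successfully on an earlier run.
            print(f"skipping {name}: {output_dir} already exists")
            continue
        run_fn(output_dir)
        ran.append(name)
    return ran


if __name__ == "__main__":
    # Demo with throwaway local directories and os.mkdir as a stand-in job.
    root = tempfile.mkdtemp()
    filtered = os.path.join(root, "filtered")
    rendered = os.path.join(root, "rendered")
    os.mkdir(filtered)  # pretend the filter pass has already been computed
    print(run_pipeline([("filter", filtered, os.mkdir),
                        ("render", rendered, os.mkdir)]))
```

This keeps the expensive filter and join outputs in place while the render stage is developed, at the cost of trusting that an existing directory holds complete output.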

This homework help was uploaded on 04/02/2014 for the course CSE 490 taught by Professor Staff during the Fall '08 term at University of Washington.
