Google2009vs1999

Challenges in Building Large-Scale Information Retrieval Systems
Jeff Dean
Google Fellow
jeff@google.com

Why Work on Retrieval Systems?
  - Challenging blend of science and engineering
  - Many interesting, unsolved problems
  - Spans many areas of CS: architecture, distributed systems, algorithms, compression, information retrieval, machine learning, UI, etc.
  - Scale far larger than most other systems
  - Small teams can create systems used by hundreds of millions

Retrieval System Dimensions
Must balance engineering tradeoffs between:
  - number of documents indexed
  - queries / sec
  - index freshness/update rate
  - query latency
  - information kept about each document
  - complexity/cost of scoring/retrieval algorithms
Engineering difficulty roughly equal to the product of these parameters
All of these affect overall performance, and performance per $

1999 vs. 2009
  # docs: ~70M to many billion            ~100X
  queries processed/day:                  ~1000X
  per doc info in index:                  ~3X
  update latency: months to minutes       ~10000X
  avg. query latency: <1s to <0.2s        ~5X
  More machines * faster machines:        ~1000X

Constant Change
  - Parameters change over time, often by many orders of magnitude
  - Right design at X may be very wrong at 10X or 100X ... design for ~10X growth, but plan to rewrite before ~100X
  - Continuous evolution: 7 significant revisions in last 10 years, often rolled out without users realizing we've made major changes

Rest of Talk
  - Evolution of Google's search systems
  - several gens of crawling/indexing/serving systems
  - brief description of supporting infrastructure
  - Joint work with many, many people
  - Interesting directions and challenges

Google Circa 1997 (google.stanford.edu)(google....
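
The claim that engineering difficulty is roughly the product of the system's dimensions can be made concrete with the 1999 vs. 2009 growth factors. The sketch below is illustrative arithmetic only, not a calculation from the talk. It assumes each factor pairs with the rows in the order the slide builds add them (docs ~100X, queries/day ~1000X, per-doc info ~3X, update latency ~10000X, query latency ~5X), and it picks "one 30-day month to 5 minutes" as a hypothetical reading of "months to minutes". The names `GROWTH`, `combined_growth`, and `improvement_ratio` are made up for this example.

```python
import math

# Approximate growth factors from the "1999 vs. 2009" slide.
# Assumption: the factor-to-row pairing follows the slide's build order.
GROWTH = {
    "docs indexed": 100,
    "queries per day": 1_000,
    "per-doc index info": 3,
    "update latency": 10_000,
    "query latency": 5,
}
HARDWARE_GROWTH = 1_000  # "more machines * faster machines"


def combined_growth(factors):
    """Multiply the per-dimension growth factors together."""
    return math.prod(factors.values())


def improvement_ratio(old_seconds, new_seconds):
    """How many times smaller the new duration is than the old one."""
    return old_seconds / new_seconds


if __name__ == "__main__":
    print(f"combined growth across dimensions: ~{combined_growth(GROWTH):.1e}X")
    print(f"hardware growth over the same period: ~{HARDWARE_GROWTH}X")
    # Endpoints stated on the slide: <1s down to <0.2s.
    print(f"query latency: {improvement_ratio(1.0, 0.2):.0f}X")
    # Hypothetical endpoints: one 30-day month down to 5 minutes.
    print(f"update latency: {improvement_ratio(30 * 24 * 3600, 5 * 60):.0f}X")
```

Under these assumptions the product of the five dimensions is far larger than the ~1000X gained from more and faster machines, which is one way to read the advice to design for ~10X growth and plan a rewrite before ~100X.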