cs1_databases

Searching the Web
Junghoo "John" Cho, UCLA Computer Science
(C) Junghoo "John" Cho, UCLA

Outline
- History and today
- Hardware infrastructure and challenges
- Search paradigm and challenges
- Search-engine architecture
- Fighting against spam

History of Web Search
- 1992: The Web was created
- 1994-1996: First search engines
  - Based on Information Retrieval
  - Based on statistical analysis of text
  - Infoseek, Lycos, Altavista, Excite, Inktomi
- 1996: Google prototype
  - Link analysis and anchor text
  - Significant improvement in result quality
- Today

A few companies dominate the market
- Top 3 companies have >85% market share (as of Sep 2007)

Scale of Web Search
- Billions of pages
- Billions of queries per day
- Q: How do we handle such a scale? What's behind the simple interface?
- A: Lots and lots of machines!

Hardware Infrastructure
- Warehouse-scale data center
- Hundreds of thousands of machines

Challenges?
- How to manage 100,000 machines with minimal effort?
  - Power and heat problem?
  - Replacing broken machines?
- How to deal with machine failures?
  - With 99.9% availability, about 100 of 100,000 machines are down at any given time.
  - Reliable service from unreliable machines?
- How to write massively parallel programs?
- And many more

Search Paradigm
- Simple search box
- Very short queries from users (2-4 words)
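
The availability figure in the hardware challenges above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes each machine is up 99.9% of the time independently of the others, which the slides do not state.

```python
def expected_down(num_machines: int, availability: float) -> float:
    """Expected number of machines down at any instant.

    Assumes every machine has the same availability (an assumption
    made for this sketch, not a claim from the slides).
    """
    return num_machines * (1.0 - availability)


def prob_all_up(num_machines: int, availability: float) -> float:
    """Probability that every machine is up at the same moment,
    assuming machines fail independently."""
    return availability ** num_machines


if __name__ == "__main__":
    machines = 100_000      # cluster size used in the slides
    availability = 0.999    # per-machine availability used in the slides

    # Roughly 100 machines are expected to be down at any time.
    print(f"expected down: {expected_down(machines, availability):.1f}")

    # The chance that the whole cluster is healthy is vanishingly small.
    print(f"P(all up): {prob_all_up(machines, availability):.3e}")
```

Under these assumptions a fully healthy cluster essentially never occurs, which is why the slides ask how to build a reliable service from unreliable machines.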