Atul Adya – Google
John Dunagan – Microsoft
Alec Wolman – Microsoft Research

Problems:
How to assign responsibility for items to app servers? (partitioning)
How to deal with addition, removal, & crashes of app servers?
How to avoid requests for the same item winding up at different servers? (use leases)
How to adapt to load changes?
[Diagram: front-end web servers receive two incoming requests, "Tell me D1's current IP addr" (from Device D2) and "store my current IP = A" (from Device D1). For each, the front-end locates the app server that stores the contact info for D1, then reads or stores D1's IP addr on one of the in-memory application servers.]

Targets class of services with these characteristics:
Interactive (needs low latency)
  ▪ App servers operate on in-memory state
Application tier operates on cached data: the truth is hosted on clients or back-end storage
Services use many small objects
Even the most popular object can be handled by one server
  ▪ Replication not needed to handle load

Prior systems implement leasing and partitioning separately
We show that integrating leasing and partitioning allows scaling to massive numbers of objects
This integration requires us to rethink the mechanisms and API for leasing
  ▪ Manager-directed leasing
  ▪ Non-traditional API where clients cannot request leases

Centrifuge design
Centrifuge internals
Results from live deployment

[Diagram: three tiers. Lookups: front-end web servers, each running the Front-end Lookup Library. Centrifuge Manager Service. Owners: middle-tier application servers, each an in-memory server running the Owner Library.]

Need to issue leases for very large # of objects
Lease per object will lead to prohibitive overhead
Centrifuge manager hands out leases on ranges
Use consistent hashing to partition a flat namespace
Assign leases on contiguous ranges of the hashed namespace
One lease (one range) per virtual node (64 per server)
Single mechanism: manager-directed leasing handles both leasing and partitioning
[Diagram: the Centrifuge Manager Service sends "Lease: 0-50, 100-200" to the Owner Library on an in-memory server.]

Lookup API
  URL Lookup(Key key)
  void LossNotificationUpcall(KeyRange lost)
Owner API
  bool CheckLeaseNow(Key key, out LeaseNum leaseNum)
  bool CheckLeaseContinuous(Key key, LeaseNum leaseNum)
[Diagram: incoming request "Find Device "D"" arrives at a front-end, whose Lookup Library returns Lookup("D") -> "http://m6/". Owner Library servers "m1", "m2", …, "m6" are shown; the owner then runs:]
  1. CheckLeaseNow("D") -> handle
  2. Perform application operation: find D's current IP addr
  3. CheckLeaseContinuous("D", handle)

Servers in datacenter environment are stable
Benefits
  Much cheaper to avoid holding multiple copies in RAM
  Avoids complexity/performance issues of quorum protocols
  Doesn't add extra complexity:
  ▪ Need a mechanism to tolerate correlated failures anyway (e.g. security vulnerabilities, patch installation)
Cost
  When an application server crashes, items are not available until clients republish

When application server crashes, Lookups receive Loss Notifications
Indicates which ranges are lost
Allows the application to determine which clients should republish their state
Live Mesh services use this model
Rely on clients to recover state

Partitioning: Manager spreads namespace across Owners by assigning leases
Consistency: Leases ensure single-copy guarantee: at any time t, for any key at most one Owner node
Recovery: Loss notifications enable app developer to detect and recover from Owner crashes
Membership: Owners indicate liveness by requesting leases
Load Balancing: Manager rebalances namespace based on reported load

Centrifuge design
Centrifuge internals
Results from live deployment

[Diagram: a Lookup holds a Cached Lease Table (Current LSN: 2) and tells the Manager "I am at LSN 2." The Manager holds the Lease Table (Current LSN: 4; [0-1: Owner=A] [1-2: Owner=B] [2-9: Owner=C]) and a Change Log, and replies "Here are changes LSN 2->4".]
Incremental protocol to synchronize Lookup and Manager lease tables
Lookups are fast: no need to contact Manager and incur delay
Manager load not dependent on incoming request load to Lookups

[Diagram: Owner to Manager: "Request Leases". Manager to Owner: "Leases granted/recalled".]
Robustness: Owners have multiple opportunities to retain their leases:
  Leases requested every 15 seconds
  Leases last 60 seconds
  Takes 3 consecutive lost/delayed requests to lose the lease
Safety: owner never thinks it has the lease when the manager disagrees
Similar to previous lease servers, rely on clock rate synchronization

[Diagram: the Manager Service consists of a Leader and Standbys backed by a Paxos Group; the Leader serves Lookups and Owners. A Standby asks "Can I have the leader lease?" and is answered "No." The Leader's "Renew leader lease and commit state update." is answered "Yes."]

Centrifuge designed to run in a single datacenter
Scalability target: ~1000 machines in 1 cluster
Beyond there, scale by deploying multiple clusters

Centrifuge design
Centrifuge internals
Results from live deployment

First deployed in April 2008
Results cover 2.5 months: Dec '08 – Mar '09
1000 Lookups, 130 Owners
Manager = 8 servers

Is the Centrifuge manager a scalability bottleneck in steady-state?
How well does Centrifuge handle high-churn events?
How stable are production servers?

From 1/15/09 through 3/2/09, no patch installations
How stable were the owners during this period?
Servers are very stable: only 10 lease-loss events
  7 cases, servers recovered < 10 minutes
  3 cases, servers recovered < 1 hour

Centrifuge simplifies building scalable application tiers with in-memory state
Combining leasing and partitioning leads to a simple and powerful protocol
Deployed within Live Mesh since April 2008, in use by 5 different Live Mesh Services
Data center server stability enables the single copy in RAM w/ loss notifications
...
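
The range-lease slide (consistent hashing over a flat namespace, one range per virtual node, 64 virtual nodes per server) can be illustrated with a minimal Python sketch. It is not Centrifuge's implementation: the 32-bit namespace, the SHA-1 hash, the `owner#index` naming of virtual nodes, and the function names `hash_key`, `build_lease_table` and `lookup` are all assumptions made for illustration.

```python
import bisect
import hashlib


def hash_key(key: str) -> int:
    """Map a key to a point in an assumed 32-bit hashed namespace."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_lease_table(owners, vnodes_per_owner=64):
    """Return a sorted list of (range_start, owner).

    Each virtual node contributes one range start; a range runs from its
    start up to the next start, and the last range wraps around to the
    first. This stands in for "one lease (one range) per virtual node".
    """
    points = []
    for owner in owners:
        for v in range(vnodes_per_owner):
            points.append((hash_key(f"{owner}#{v}"), owner))
    points.sort()
    return points


def lookup(table, key: str) -> str:
    """Find the owner whose leased range covers the hashed key."""
    starts = [start for start, _ in table]
    i = bisect.bisect_right(starts, hash_key(key)) - 1
    # i == -1 means the key hashes before the first start, so it falls
    # in the wrapped-around last range; table[-1] is exactly that entry.
    return table[i][1]
```

Because only range boundaries are stored, the table size depends on the number of servers rather than the number of objects, which is the point the slide makes about per-object leases being too expensive.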
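
The three-step owner pattern on the API slide (CheckLeaseNow, perform the operation, CheckLeaseContinuous) can be sketched as follows. This is a hypothetical Python rendering, not the real Owner Library: the class `OwnerLibrary`, its `grant` and `recall` methods, the integer keys, and the reading that CheckLeaseContinuous succeeds only if the same lease number is still held are all assumptions.

```python
import time


class OwnerLibrary:
    """Toy owner-side lease table keyed by [lo, hi) ranges of hashed keys."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}      # (lo, hi) -> (lease_num, expiry_time)
        self._next_num = 1

    def grant(self, lo, hi, duration=60.0):
        """Hypothetical hook: the manager grants a fresh lease on [lo, hi)."""
        self._leases[(lo, hi)] = (self._next_num, self._clock() + duration)
        self._next_num += 1

    def recall(self, lo, hi):
        """Hypothetical hook: the manager recalls the lease on [lo, hi)."""
        self._leases.pop((lo, hi), None)

    def _current_lease(self, key):
        now = self._clock()
        for (lo, hi), (num, expiry) in self._leases.items():
            if lo <= key < hi and now < expiry:
                return num
        return None

    def check_lease_now(self, key):
        """Return (held, lease_num), mirroring the `out LeaseNum` parameter."""
        num = self._current_lease(key)
        return (num is not None, num)

    def check_lease_continuous(self, key, lease_num):
        """True only if the lease seen earlier is still the one held."""
        current = self._current_lease(key)
        return current is not None and current == lease_num


def find_device_ip(owner, store, key):
    """Steps 1-3 from the slide; returns None if the lease check fails."""
    held, lease_num = owner.check_lease_now(key)          # step 1
    if not held:
        return None
    value = store.get(key)                                # step 2
    if not owner.check_lease_continuous(key, lease_num):  # step 3
        return None
    return value
```

Bracketing the operation with two checks means a result is returned only when the owner held the lease for the whole operation, which is one way to read the single-copy guarantee from the owner's side.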
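
The incremental synchronization between a Lookup's cached lease table and the Manager's lease table ("I am at LSN 2." / "Here are changes LSN 2->4") can be sketched in a few lines. The classes `Manager` and `LookupCache`, the in-memory change log, and the rule that each assignment advances the LSN by one are assumptions for illustration; the slides do not specify the wire format or log layout.

```python
class Manager:
    """Holds the authoritative lease table plus a log of changes."""

    def __init__(self):
        self.lsn = 0
        self.table = {}   # (lo, hi) -> owner
        self.log = []     # list of (lsn, (lo, hi), owner)

    def assign(self, key_range, owner):
        """Record one lease-table change and advance the LSN."""
        self.lsn += 1
        self.table[key_range] = owner
        self.log.append((self.lsn, key_range, owner))

    def changes_since(self, lsn):
        """Return the current LSN and every change newer than `lsn`."""
        return self.lsn, [entry for entry in self.log if entry[0] > lsn]


class LookupCache:
    """Front-end copy of the lease table, refreshed incrementally."""

    def __init__(self):
        self.lsn = 0
        self.table = {}

    def sync(self, manager):
        """Fetch and apply only the changes this cache has not seen."""
        new_lsn, changes = manager.changes_since(self.lsn)
        for _, key_range, owner in changes:
            self.table[key_range] = owner
        self.lsn = new_lsn
        return len(changes)

    def lookup(self, key):
        """Answer from the local cache without contacting the manager."""
        for (lo, hi), owner in self.table.items():
            if lo <= key < hi:
                return owner
        return None
```

Since `lookup` reads only the local table, the manager's load in this sketch depends on how often the lease table changes and how often caches sync, not on the request rate at the front-ends.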
This note was uploaded on 12/08/2011 for the course CS 525 taught by Professor Gupta during the Spring '08 term at University of Illinois, Urbana Champaign.