Beehive: Exploiting Power Law Query Distributions for O(1) Lookup Performance in Peer to Peer Overlays
Venugopalan Ramasubramanian and Emin Gün Sirer
Abstract
Structured peer-to-peer hash tables provide decentralization, self-organization, failure-resilience, and good worst-case lookup performance for applications, but suffer from high latencies (O(log N)) in the average case. Such high latencies prohibit them from being used in many relevant, demanding applications such as DNS. In this paper, we present a proactive replication framework that can achieve O(1) lookup performance for common Zipf-like query distributions. This framework is based around a closed-form solution that achieves O(1) lookup performance with low storage requirements, bandwidth overhead and network load. Simulations show that this replication framework can realistically achieve good latencies, outperform passive caching, and adapt efficiently to sudden changes in object popularity, also known as flash crowds. This framework provides a feasible substrate for high-performance, low-latency applications, such as peer-to-peer domain name service.
1 Introduction
Peer-to-peer distributed hash tables (DHTs) have recently emerged as a building-block for distributed applications. Unstructured DHTs, such as Freenet and the Gnutella network [5, 1], offer decentralization and simplicity of system construction, but may take up to O(N) hops to perform lookups in networks of N nodes. Structured DHTs, such as Chord, Pastry, Tapestry and others [24, 22, 26, 21, 18, 17, 14], are particularly well-suited for large scale distributed applications because they are self-organizing, resilient against denial-of-service attacks, and provide O(log N) lookup performance in both the worst and the average case. However, for large-scale, high-performance, latency-sensitive applications, such as the domain name service (DNS) and the world wide web, this logarithmic performance bound translates into high latencies. Previous work on serving DNS using a peer-to-peer lookup service concluded that high average-case lookup costs render current structured DHTs unsuitable for latency-sensitive applications, such as DNS [8].
In this paper, we describe how proactive replication can be used to achieve O(1) lookup performance efficiently on top of a standard O(log N) peer-to-peer distributed hash table for certain, commonly-encountered query distributions. It is well-known that the query distributions of several popular applications, including DNS and the web, follow a power law distribution [15, 2]. Such a well-characterized query distribution presents an opportunity to optimize the system according to the expected query stream. The critical insight in this paper is that, for query distributions based on a power law, proactive (model-driven) replication can enable a DHT system to achieve a small constant lookup latency on average. In contrast, we show that common techniques for passive (demand-driven) replication, such as caching objects along a lookup path, fail to make a significant impact on the average-case behavior of the system.
We outline the design of a replication framework, called Beehive, with the following three goals:
High Performance: Enable O(1) average-case lookup performance, effectively decoupling the performance of peer-to-peer DHT systems from the size of the network. Provide O(log N) worst-case lookup performance.
High Scalability: Minimize the background traffic in the network to reduce aggregate network load and per-node bandwidth consumption. Ensure that the amount of memory and/or disk space required of each peer in the network is kept to a minimum.
High Adaptivity: Promptly adjust the performance of the system in response to changes in the aggregate popularity distribution of objects. Further, cheaply track and maintain the popularity of individual objects in the system to quickly respond when a certain object becomes highly popular, as with flash crowds and the slashdot effect.
Beehive achieves these goals through efficient proactive replication. By proactive replication, we mean actively propagating copies of objects among the nodes participating in the network. There is a fundamental tradeoff between replication and resource consumption: more copies of an object will generally improve lookup performance at the cost of space, bandwidth and aggregate network load. In the limit, proactively copying all objects in the DHT to all nodes would enable every query to be satisfied in constant time. However, this would not scale to large systems since it would require prohibitive amounts of space on each node, the network would be overloaded during replica creation, and changes to mutable objects would require O(N) updates. In contrast, Beehive performs this tradeoff through an analytical model that provides a closed-form, optimal solution that achieves O(1) lookup performance for power law query distributions while minimizing the number of object copies, and hence reducing storage, bandwidth and load, in the network.
Beehive relies on cheap, local measurements and efficient lease-based protocols for replica coordination. Each node in Beehive continually performs local measurements to determine the relative popularity of the objects in the system, as well as
to estimate global properties of the aggregate query distribution function. Beehive nodes decide how many replicas of each object should be propagated by combining the closed-form solutions from the analytical model with their measurements of the aggregate query distribution function and estimates of object rank. This estimation is performed independently and periodically at each node, while a replica management protocol efficiently propagates or removes cached objects without excessive messaging, global synchronization or agreement.
Objects in Beehive may be modified dynamically. In general, mutable objects pose cache-coherency problems for any replication technique, as older, out-of-date copies of an object may remain cached throughout a system and keep clients from accessing more recent versions. To provide up to date views in the presence of updates, a system needs to track all replicas of an object and either invalidate old copies or propagate the changes when the object is modified. In Beehive, the structured nature of the underlying DHT allows the system to keep track of the placement of all replicas with a single integer. This enables Beehive to efficiently find and update all replicas when an object is modified. Consequently, objects may be updated at any time in Beehive, and lookups performed after an update has completed will return the latest copy of the object.
While this paper describes the Beehive proactive replication framework in its general form, we use the domain name system as a target application, perform our evaluation with DNS data, and demonstrate that serving DNS lookups with a peer-to-peer distributed hash table is feasible. Several shortcomings of the current, hierarchical structure of DNS make it an ideal application candidate for Beehive. First, DNS is highly latency-sensitive, and poses a significant challenge to serve efficiently. Second, the hierarchical organization of DNS leads to a disproportionate amount of load being placed at the higher levels of the hierarchy. Third, the higher nodes in the DNS hierarchy serve as easy targets for distributed denial-of-service attacks and form a security vulnerability for the entire system. Finally, nameservers required for the internal leaves of the DNS hierarchy incur expensive administrative costs, as they need to be manually administered, secure and constantly online. Peer-to-peer DHTs address all but the first critical problem; we show in this paper that Beehive's replication strategy can address the first.
We have implemented a prototype Beehive-based DNS server layered on top of the Pastry peer-to-peer hash table [22]. Our prototype implementation is compatible with current client resolver libraries deployed around the Internet. We envision that the DNS nameservers that are currently used to serve small, dedicated portions of the naming hierarchy would form a Beehive network and collectively manage the entire namespace. Our implementation supports the existing naming scheme by falling back on legacy DNS when Beehive-DNS lookups fail. Unlike legacy DNS, which relies on cache timeouts for loose coherency and incurs ongoing cache expiration and refill overheads, Beehive-DNS enables resource records to be updated at any time. While we use DNS as a guiding application for evaluating our system, we note that a full treatment of the implementation of an alternative peer-to-peer DNS system is beyond the scope of this paper, and focus instead on the general-purpose Beehive framework for proactive replication. The framework is sufficiently general to achieve O(1) lookup performance in other settings, including web caching, where the aggregate query distribution follows a power law, similar to DNS.
Overall, this paper describes the design of a replication framework that enables O(1) lookup performance in structured DHTs for common query distributions, applies it to a P2P DNS implementation, and makes the following contributions. First, it proposes proactive replication of objects and provides a closed form analytical solution for the number of replicas needed to achieve constant-time lookup performance with low costs. The storage, bandwidth and load placed on the network by this scheme are modest. In contrast, we show that simple caching strategies based on passive replication incur large ongoing costs. Second, it outlines the design of a complete system based around this analytical model. This system is layered on top of Pastry, an existing peer-to-peer substrate. It includes techniques for estimating the requisite inputs for the analytical model, mechanisms for replica propagation and deletion, and a strategy for mapping between the continuous solution in the analytical model and the discrete implementation in Pastry. Finally, it presents results from a prototype implementation of a peer-to-peer DNS service to show that the system achieves good performance, has low overhead, and can adapt quickly to flash crowds. In turn, these approaches enable the benefits of P2P systems, such as self-organization and resilience against denial of service attacks, to be applied to latency-sensitive applications, such as DNS.
The rest of this paper is organized as follows. Section 2 provides a broad overview of our approach and describes the storage- and bandwidth-efficient replication components of Beehive in detail. Section 3 describes our implementation of Beehive over Pastry. Section 4 presents the results and expected benefits of using Beehive to serve DNS queries. Section 5 surveys different DHT systems and summarizes other approaches to caching and replication in peer-to-peer systems. Section 6 describes future work and Section 7 summarizes our contributions.
2 The Beehive System
Beehive is a general replication framework that can be applied to structured DHTs based on prefix-routing [19], such as Chord, Pastry, Tapestry, and Kademlia. These DHTs operate in the following manner. Each node has a unique randomly assigned identifier in a circular identifier space. Each object also has a unique randomly selected identifier assigned from the same space and is stored at the closest node, called the home node. Routing is performed by successively matching a prefix of the object identifier against node identifiers. Generally, each step in the query processing takes the query to a node that has one more matching prefix than the previous node. A query traveling $k$ hops reaches a node that has $k$ matching prefixes. Since the search space is reduced exponentially, this query routing approach provides $O(\log_b N)$ lookup performance on average, where $N$ is the number of nodes in the DHT and $b$ is the base, or fanout, used in the system.
Footnote: Strictly speaking, the nodes encountered towards the end of the query routing process in a sparsely populated DHT may not share progressively more prefixes with the object, but remain numerically close. This detail does not significantly impact either the time complexity of standard DHTs or our replication algorithm. Section 3 discusses the issue in more detail.
Figure 1: This figure illustrates the levels of replication in Beehive. A query for object 0124 takes 3 hops from node Q to node E, the home node of the object. By replicating the object at level 2, that is at D and F, the query latency can be reduced to 2 hops. In general, an object replicated at level $i$ incurs at the most $i$ hops for a lookup.
The central observation behind Beehive is that the length of the average query path will be reduced by one hop when an object is proactively replicated at all nodes logically preceding that node on all query paths. We can apply this iteratively to disseminate objects widely throughout the system. Replicating an object at all nodes $k$ hops or fewer from the home node will reduce the lookup latency by $k$ hops. The Beehive replication mechanism is a general extension of this observation to find the appropriate amount of replication for each object based on its popularity.
Beehive strives to create the minimal number of replicas such that the expected number of nodes traversed during a query will match a targeted constant, $C$. It uses an analytical model to derive the number of replicas required to achieve O(1) lookup performance while minimizing per node storage, bandwidth requirements and network load. We note, however, that the model is driven by estimates of object popularity and, in a real implementation like the one we describe, may deviate from the optimal due to sampling errors.
Beehive controls the extent of replication in the system by assigning a replication level to each object. An object at level $i$ is replicated on all nodes that have at least $i$ matching prefixes with the object. Queries to objects replicated at level $i$ incur a lookup latency of at most $i$ hops. Objects stored only at their home nodes are at level $\log_b N$, while objects replicated at level 0 are cached at all the nodes in the system. Figure 1 illustrates the concept of replication levels.
The goal of Beehive's replication strategy is to find the minimal replication level for each object such that the average lookup performance for the system is a constant number of hops. Naturally, the optimal strategy involves replicating more popular objects at lower levels (on more nodes) and less popular objects at higher levels. By judiciously choosing the replication level for each object, we can achieve constant lookup time with minimal storage and bandwidth overhead.
Beehive employs several mechanisms and protocols to find and maintain appropriate levels of replication for its objects. First, an analytical model provides Beehive with closed form optimal solutions indicating the appropriate levels of replication for each object. Second, a monitoring protocol based on local measurements and limited aggregation estimates relative object popularity, and the global properties of the query distribution. These estimates are used, independently and in a distributed fashion, as inputs to the analytical model which yields the locally desired level of replication for each object. Finally, a replication protocol proactively makes copies of the desired objects around the network. The rest of this section describes each of these components in detail.
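The relationship between replication levels, replica placement and lookup cost described above can be sketched in a few lines of Python. This is only an illustration of the definitions, not Beehive's implementation: identifiers are treated as digit strings, the node identifiers used in the usage note are hypothetical, the function names are invented for this sketch, and routing is idealized so that every hop matches exactly one more digit of the object identifier.

```python
def shared_prefix_len(first: str, second: str) -> int:
    """Number of leading digits that two identifiers have in common."""
    count = 0
    for digit_a, digit_b in zip(first, second):
        if digit_a != digit_b:
            break
        count += 1
    return count


def holds_replica(node_id: str, object_id: str, level: int) -> bool:
    """An object at level i is stored on every node with at least i matching prefix digits."""
    return shared_prefix_len(node_id, object_id) >= level


def lookup_hops(start_node: str, object_id: str, level: int) -> int:
    """Hops until a replica is reached, assuming each hop matches one more digit."""
    return max(0, level - shared_prefix_len(start_node, object_id))


def expected_replica_count(num_nodes: int, base: int, level: int) -> float:
    """With uniformly random identifiers, roughly N / b^i nodes share i digits with an object."""
    return num_nodes / base ** level
```

For the object 0124, a query that starts at a hypothetical node 2012 (no matching digits) needs `lookup_hops("2012", "0124", 3)` hops when the object lives only at its home node, and one hop fewer when the object is replicated at level 2. Level 0 makes every node a replica holder, which is why `holds_replica` is true for any node at that level.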
2.1 Analytical Model
In this section, we provide a model that analyzes Zipf-like query distributions and provides closed form optimal replication levels for the objects in order to achieve constant average lookup performance with low storage and bandwidth overhead.
In Zipf-like, or power law, query distributions, the number of queries to the $i$-th most popular object is proportional to $i^{-\alpha}$, where $\alpha$ is the parameter of the distribution. The query distribution has a heavier tail for smaller values of the parameter $\alpha$. A Zipf distribution with parameter 0 corresponds to a uniform distribution. The total number of queries to the most popular $m$ objects, $Q(m)$, is approximately $\frac{m^{1-\alpha} - 1}{1 - \alpha}$ for $\alpha \neq 1$, and $Q(m) \approx \ln(m)$ for $\alpha = 1$.
Using the above estimate for the number of queries received by objects, we pose an optimization problem to minimize the total number of replicas with the constraint that the average lookup latency is a constant $C$.
Let $b$ be the base of the underlying DHT system, $M$ the number of objects, and $N$ the number of nodes in the system. Initially, all $M$ objects in the system are stored only at their home nodes, that is, they are replicated at level $k = \log_b N$. Let $x_i$ denote the fraction of objects replicated at level $i$ or lower. From this definition, $x_k$ is 1, since all objects are replicated at level $k$. The $M x_0$ most popular objects are replicated at all the nodes in the system.
Each object replicated at level $i$ is cached in $N / b^i$ nodes. $M x_i - M x_{i-1}$ objects are replicated on nodes that have exactly $i$ matching prefixes. Therefore, the average number of objects replicated at each node is given by $M x_0 + \frac{M (x_1 - x_0)}{b} + \cdots + \frac{M (x_k - x_{k-1})}{b^k}$. Simplifying this expression, the average per node storage requirement for replication is:
$$M \left[ \left(1 - \frac{1}{b}\right) \left( x_0 + \frac{x_1}{b} + \cdots + \frac{x_{k-1}}{b^{k-1}} \right) + \frac{1}{b^k} \right] \tag{1}$$
The fraction of queries, $Q(M x_i)$, that arrive for the most popular $M x_i$ objects is approximately $\frac{(M x_i)^{1-\alpha} - 1}{M^{1-\alpha} - 1}$. The number of objects that are replicated at level $i$ is $M x_i - M x_{i-1}$, $0 < i \le k$. Therefore, the number of queries that travel $i$ hops is $Q(M x_i) - Q(M x_{i-1})$, $0 < i \le k$. The average lookup latency of the entire system can be given by $\sum_{i=1}^{k} i \, (Q(M x_i) - Q(M x_{i-1}))$. The constraint on the average latency is $\sum_{i=1}^{k} i \, (Q(M x_i) - Q(M x_{i-1})) \le C$, where $C$ is the required constant lookup performance. After substituting the approximation for $Q(m)$ and simplifying, we arrive at the following optimization problem.
Minimize
$$x_0 + \frac{x_1}{b} + \cdots + \frac{x_{k-1}}{b^{k-1}} \tag{2}$$
such that
$$x_0^{1-\alpha} + x_1^{1-\alpha} + \cdots + x_{k-1}^{1-\alpha} \ge k - C \left(1 - \frac{1}{M^{1-\alpha}}\right) \tag{3}$$
and
$$x_0 \le x_1 \le \cdots \le x_{k-1} \le 1 \tag{4}$$
Note that the second constraint effectively reduces to $x_{k-1} \le 1$, since any optimal solution to the problem with just constraint 3 would satisfy $x_0 \le x_1 \le \cdots \le x_{k-1}$. We can use the Lagrange multiplier technique to find an analytical closed-form optimal solution to the above problem with just constraint 3, since it defines a convex feasible space. However, the resulting solution may not guarantee the second constraint, $x_{k-1} \le 1$. If the obtained solution violates the second constraint, we can force $x_{k-1}$ to 1 and apply the Lagrange multiplier technique to the modified problem. We can obtain the optimal solution by repeating this process iteratively until the second constraint is satisfied. However, the symmetric property of the first constraint facilitates an easier analytical approach to solve the optimization problem without iterations.
Assume that in the optimal solution to the problem, $x_0 \le x_1 \le \cdots \le x_{k'-1} < 1$, for some $k' \le k$, and $x_{k'} = x_{k'+1} = \cdots = x_k = 1$. Then we can restate the optimization problem as follows:
Minimize
$$x_0 + \frac{x_1}{b} + \cdots + \frac{x_{k'-1}}{b^{k'-1}} \tag{5}$$
such that
$$x_0^{1-\alpha} + x_1^{1-\alpha} + \cdots + x_{k'-1}^{1-\alpha} \ge k' - C' \tag{6}$$
where $C' = C \left(1 - \frac{1}{M^{1-\alpha}}\right)$.
Using the Lagrange multiplier technique to solve this optimization problem, we get the following closed form solution:
$$x_i^* = \left[ \frac{d^i \, (k' - C')}{1 + d + \cdots + d^{k'-1}} \right]^{\frac{1}{1-\alpha}}, \quad \forall \, 0 \le i < k' \tag{7}$$
$$x_i^* = 1, \quad \forall \, k' \le i \le k \tag{8}$$
where $d = b^{\frac{1-\alpha}{\alpha}}$.
We can derive the value of $k'$ by satisfying the condition that $x_{k'-1} < 1$, that is, $\frac{d^{k'-1} (k' - C')}{1 + d + \cdots + d^{k'-1}} < 1$.
As an example, consider a DHT with base […], […] nodes, and […] objects. Applying this analytical method to achieve an average lookup time, $C$, of […] hop, we obtain $k' = […]$ and the solution: $x_0 = […]$, $x_1 = […]$, and $x_2 = […]$. Thus, the most popular […] objects would be replicated at level 0, the next most popular […] objects would be replicated at level 1, and all the remaining objects at level 2. The average per node storage requirement of this system would be […] objects.
The optimal solution obtained by this model applies only to the case $\alpha < 1$. For $\alpha > 1$, the closed-form solution will yield a level of replication that will achieve the target lookup performance, but the amount of replication may not be optimal, because the feasible space is no longer convex. For $\alpha = 1$, we can obtain the optimal solution by using the approximation $Q(m) = \ln m$ and applying the same technique. The optimal solution for the case $\alpha = 1$ is as follows:
$$x_i^* = \frac{M^{-C/k'} \, b^i}{b^{(k'-1)/2}}, \quad \forall \, 0 \le i < k' \tag{9}$$
$$x_i^* = 1, \quad \forall \, k' \le i \le k \tag{10}$$
This analytical solution has three properties that are useful for guiding the extent of proactive replication. First, the analytical model provides a solution to achieve any desired constant lookup performance. The system can be tailored, and the amount of overall replication controlled, for any level of performance by adjusting $C$ over a continuous range. Since structured DHTs preferentially keep physically nearby hosts in their top-level routing tables, and since they consequently pay the highest per-hop latency costs as they get closer to the home node, selecting even a large target value for $C$ can dramatically speed up end-to-end query latencies [4]. Second, for a large class of query distributions ($\alpha \le 1$), the solution provided by this model achieves the optimal number of object replicas required to provide the desired performance. Minimizing the number of replicas reduces per-node storage requirements, bandwidth consumption and aggregate load on the network. Finally, $k'$ serves as an upper bound for the worst case lookup time for any successful query, since all objects are replicated at least in level $k'$.
We make two assumptions in the analytical model: all objects incur similar costs for replication, and objects do not change very frequently. For applications such as DNS, which have essentially homogeneous object sizes and whose update-driven traffic is a very small fraction of the replication overhead, the analytical model provides an efficient solution. Applying the Beehive approach to applications such as the web, which has a wide range of object sizes and frequent object updates, may require an extension of the model to incorporate size and update frequency information for each object.
2.2 Popularity and Zipf-Parameter Estimation
The analytical model described in the previous section requires the knowledge of the parameter $\alpha$ of the query distribution and
the relative popularities of the objects. In order to obtain accurate estimates of the popularity of objects and the parameter of the query distribution, Beehive needs efficient mechanisms to continuously monitor the access frequency of the objects. Beehive employs a combination of local measurement and limited aggregation to keep track of the changing parameters and adapt the replication appropriately.
Each node locally measures the number of queries received by an object replicated at that node in order to estimate its relative popularity. If objects are replicated only at their home nodes, all the queries for an object are routed to the home node, and local measurement of access frequency is sufficient to estimate the relative popularity. However, if the object is replicated at level $i$, the queries for that object are distributed across approximately $N / b^i$ nodes in a base-$b$ DHT with $N$ nodes. In order to estimate the relative popularity with the same accuracy, we need an $N / b^i$-fold increase in the measurement interval. But this prevents the system from reacting quickly to changes in the popularity of the objects. Beehive performs limited aggregation in order to alleviate this problem and improve the responsiveness of the system.
Aggregation in Beehive takes place periodically, once every aggregation interval. Each node A sends to node B in the $i$-th level of its routing table an aggregation message containing the access frequency of each object replicated at level $i$ or lower and having $i+1$ matching prefixes with B. Node B receives the aggregation messages from A as well as other nodes at level $i$ with which it shares $i$ prefixes. It then aggregates the estimates for access frequencies received from these nodes with its own local estimate, and sends the aggregated access frequency to all nodes in the $(i+1)$-th level of its routing table during the next round of aggregation. After $k - i$ rounds of aggregation, the home node of an object replicated at level $i$ obtains an accurate estimate of the access frequency.
In Beehive, each node is responsible for replicating an object at most one level lower. That is, nodes at level $i+1$ are responsible for replicating an object at level $i$. The nodes at level $i+1$ need to get the aggregated access frequencies of objects replicated at level $i$ from the home nodes. We enable this reverse information flow by sending the aggregated access frequencies in response to aggregation messages. The home node of an object sends the latest aggregated estimate of the access frequency in response to an aggregation message from a node. When node B receives an aggregation message from A, it sends a reply containing the aggregated access frequency of the objects listed in the aggregation message. In this manner, the access frequency of an object is aggregated at the home node and the aggregated estimate is disseminated to all the nodes containing a replica of the object. For an object replicated at level $i$, it takes $2(k - i)$ rounds of aggregation to complete the information flow.
In addition to the popularity of the objects, the analytical model needs an estimate of the parameter of the query distribution. The Zipf-parameter, $\alpha$, is also estimated using local measurement and limited aggregation. Each node locally computes $\alpha$ using the aggregated access frequency for different objects replicated at the node. We estimate $\alpha$ using linear regression techniques to compute the slope of the best fit line, since a Zipf-like popularity distribution is a straight line in log-scale. Since this local estimate is based on a small subset of the objects in the system, the estimate is refined by aggregating it with the local estimates of other nodes it communicates with during aggregation.
There will be fluctuations in the estimation of access frequency and the Zipf parameter due to randomness in the query distribution. In order to avoid large discontinuous changes to these estimates, we age them as follows: […], with […].
2.3 Replication Protocol
Beehive requires a protocol to replicate objects at the replication levels computed by the analytical model. In order to be deployable in wide area networks, the replication protocol should be asynchronous and not require expensive mechanisms such as distributed consensus or agreement. In this section, we develop an efficient protocol that enables Beehive to replicate objects across a DHT.
Beehive's replication protocol uses an asynchronous and distributed algorithm to implement the optimal solution provided by the analytical model. Each node is responsible for replicating an object on other nodes at most one hop away from itself; that is, at nodes that share one less prefix than the current node. Initially, each object is replicated only at the home node at a level $k = \log_b N$, where $N$ is the number of nodes in the system and $b$ is the base of the DHT, and shares $k$ prefixes with the object. If an object needs to be replicated at the next level $k - 1$, the home node pushes the object to all nodes that share one less prefix with the home node. Each of the level $k - 1$ nodes at which the object is currently replicated may independently decide to replicate the object further, and push the object to other nodes that share one less prefix with it. Nodes continue the process of independent and distributed replication until all the objects are replicated at appropriate levels. In this algorithm, nodes that share $i+1$ prefixes with an object are responsible for replicating that object at level $i$, and are called level $i$ deciding nodes for that object. For each object replicated at level $i$ at some node A, the level $i$ deciding node is that node in its routing table at level $i$ that has $i+1$ matching prefixes with the object. For some objects, the deciding node may be the node A itself.
This distributed replication algorithm is illustrated in Figure 2. Initially, an object with identifier 0124 is replicated at its home node E at level 3 and shares 3 prefixes with it. If the analytical model indicates that this object should be replicated at level 2, node E pushes the object to nodes B and I, with which it shares 2 prefixes. Node E is the level 2 deciding node for the object at nodes B, E, and I. Based on the popularity of the object, the level 2 nodes B, E, and I may independently decide to replicate the object at level 1. If node B decides to do so, it pushes a copy of the object to nodes A and C, with which it shares 1 prefix, and becomes the level 1 deciding node for the object at nodes A, B, and C. Similarly, node E may replicate the object at level 1 by pushing a copy to nodes D and F, and
2.4 Mutable Objects
Beehive directly supports mutable objects by proactively disseminating object updates to the replicas in the system. The
A
w
3
a
A
1 Aw
a
node to and . Our replication algorithm does not require any agreement in the estimation of relative popularity among the nodes. Consequently, some objects may be replicated partially due to small variations in the estimate of the relative popularity. For example in Figure 2, node might decide not to push object to level . We tolerate this inaccuracy to keep the replication protocol efcient and practical. In the evaluation section, we show that this inaccuracy in the replication protocol does not produce any noticeable difference in performance. Beehive implements this distributed replication algorithm in two phases, an analysis phase and a replicate phase, that follow the aggregation phase. During the analysis phase, each node uses the analytical model and the latest known estimate of the Zipf-parameter to obtain a new solution. Each node then locally changes the replication levels of the objects according to the solution. The solution species for each level , the fraction of objects, that need to be replicated at level or lower. Hence, fraction of objects replicated at level or lower should be replicated at level or lower. Based on the current popularity, each node sorts all the objects at level or lower for which it is the level deciding node. It chooses the most popular fraction of these objects and locally changes the replication level of the chosen objects to , if their current replication level is . The node also changes the replication level of the objects that are not chosen to , if their current replication level is or lower. After the analysis phase, the replication level of some objects could increase or decrease, since the popularity of objects changes with time. If the replication level of an object decreases from level to , it needs to be replicated in nodes that share one less prex with it. If the replication level of an object increases from level to , the nodes with only matching prexes need to delete the replica. 
The replicate phase is responsible for enforcing the correct extent of replication for an object as determined by the analysis phase. During the replicate phase, each node A sends to each node B in the i-th level of its routing table a replication message listing the identifiers of all objects for which B is the level i deciding node. When B receives this message from A, it checks the list of identifiers and pushes to node A any unlisted object whose current level of replication is i or lower. In addition, B sends back to A the identifiers of objects no longer replicated at level i. Upon receiving this message, A removes the listed objects.

Figure 2: This figure illustrates how the object 0124 at its home node E is replicated to level 1. For nodes A through I, the numbers indicate the prefixes that match the object identifier at different levels. Each node pushes the object independently to nodes with one less matching digit.

Beehive nodes invoke the analysis and the replicate phases periodically. The analysis phase is invoked once every analysis interval and the replicate phase once every replication interval. In order to improve the efficiency of the replication protocol and reduce load on the network, we integrate the replication phase with the aggregation protocol. We perform this integration by setting the same durations for the replication interval and the aggregation interval and combining the replication and the aggregation messages as follows: When node A sends an aggregation message to B, the message implicitly contains the list of objects replicated at A whose level i deciding node is B. Similarly, when node B replies to the replication message from A, it adds the aggregated access frequency information for all objects listed in the replication message.

The analysis phase estimates the relative popularity of the objects using the estimates for access frequency obtained through the aggregation protocol. Recall that, for an object replicated at level , it takes rounds of aggregation to obtain an accurate estimate of the access frequency. In order to allow time for the information flow during aggregation, we set the replication interval to at least times the aggregation interval. Random variations in the query distribution will lead to fluctuations in the relative popularity estimates of objects, and may cause frequent changes in the replication levels of objects. This behavior may increase the object transfer activity and impose substantial load on the network.
Increasing the duration of the aggregation interval is not an efficient solution because it decreases the responsiveness of the system to changes. Beehive limits the impact of fluctuations by employing hysteresis. During the analysis phase, when a node sorts the objects at level based on their popularity, the access frequencies of objects already replicated at level are increased by a small fraction. This biases the system towards maintaining already existing replicas when the popularity difference between two objects is small.

The replication protocol also enables Beehive to maintain appropriate replication levels for objects when new nodes join and others leave the system. When a new node joins the system, it obtains the replicas of objects it needs to store by initiating a replicate phase of the replication protocol. If the new node already has objects replicated from when it was previously part of the system, then these objects need not be fetched again from the deciding nodes. A node leaving the system does not directly affect Beehive. If the leaving node is a deciding node for some objects, the underlying DHT chooses a new deciding node for these objects when it repairs the routing table.
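The hysteresis described above can be sketched as a biased sort. This is an illustrative reading rather than Beehive's code: the function name, the tuple layout, and the default `bias` of 10% are assumptions, since the text only says that the frequency of already-replicated objects is increased "by a small fraction".

```python
def rank_with_hysteresis(objects, bias=0.1):
    """Return object ids ordered by popularity, favouring existing replicas.

    objects -- iterable of (object_id, access_frequency, already_replicated)
    bias    -- hypothetical fraction added to already-replicated objects
    """
    def biased_frequency(entry):
        _, frequency, already_replicated = entry
        # Inflate the estimate of objects that are already replicated so
        # that small popularity differences do not trigger transfers.
        return frequency * (1.0 + bias) if already_replicated else frequency

    ordered = sorted(objects, key=biased_frequency, reverse=True)
    return [object_id for object_id, _, _ in ordered]
```

With `bias=0.1`, an already-replicated object observed 95 times outranks a new candidate observed 100 times, so the existing replica is retained; a larger gap in popularity still causes the swap.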
The semantics of read and update operations on objects is an important issue to consider while supporting object mutability. Strong consistency semantics require that once an object is updated, all subsequent queries to that object only return the modified object. Achieving strong consistency is challenging in a distributed system with replicated objects, because each copy of the replicated object should be updated or invalidated upon object modification. In Beehive, we exploit the structure of the underlying DHT to efficiently disseminate object updates to all the nodes carrying replicas of the object. Our scheme guarantees that when an object is modified, all replicas will be updated consistently within a very short time if the system is stable, that is, nodes are not joining and leaving the system.

Beehive associates a 64-bit version number with each object to identify modified objects. An object replica with a higher version number is more recent than a replica with a lower version number. The owner of an object in the system can modify the object by inserting a fresh copy of the object with a higher version number at the home node. The home node proactively multicasts the update to all the replicas of the object using the routing table. If the object is replicated at level i, the home node sends a copy of the updated object to each node B in the i-th level of the routing table. Node B then propagates the update to each node in the (i+1)-th level of its routing table. The update propagation protocol ensures that each node A sharing at least i prefixes with the object obtains a copy of the modified object. The object update reaches the node A following exactly the same path a query issued at the object's home node for node A's identifier would follow. Because of this property, all nodes with a replica of the object get exactly one copy of the modified object. Hence, this scheme is both efficient and provides guaranteed cache coherency in the absence of nodes leaving the system.

Nodes leaving the system may cause temporary inconsistencies in the routing table. Consequently, updates may not reach some nodes where objects are replicated. Similarly, nodes joining the system but having older versions of the object replicated at them need to update their copies of the objects. We modify Beehive's replication protocol slightly to disseminate updates to nodes that have older versions due to churn in the system. During the replicate phase, each node includes the version number in addition to the object identifiers listed in the replication message. Upon receiving this message, the deciding node of an object pushes a copy of the object if it has a more recent version of the object.

3 Implementation

Beehive is a general replication mechanism that can be applied to any prefix-based distributed hash table. We have layered our implementation on top of Pastry, a freely available DHT with log(N) lookup performance. Our implementation is structured as a transparent layer on top of FreePastry 1.3, supports a traditional insert/modify/delete/query DHT interface for applications, and required no modifications to the underlying Pastry. However, converting the preceding discussion into a concrete implementation of the Beehive framework, building a DNS application on top, and combining the framework with Pastry required some practical considerations and identified some optimization opportunities.

Beehive needs to maintain some additional, modest amount of state in order to track the replication level, freshness, and popularity of objects. Each Beehive node stores all replicated objects in an object repository. Beehive associates the following meta-information with each object in the system, and each Beehive node maintains the following fields within each object in its repository:

Object-ID: A 128-bit field uniquely identifies the object and helps resolve queries. The object identifier is derived from the hash key at the time of insertion, just as in Pastry.
Version-ID: A 64-bit version number differentiates fresh copies of an object from older copies cached in the network.
Home-Node: A single bit specifies whether the current node is the home node of the object.
Replication-Level: A small integer specifies the current, local replication level of the object.
Access-Frequency: A small integer monitors the number of queries that have reached this node. It is incremented by one for each locally observed query, and reset at each aggregation.
Aggregate-Popularity: A small integer used in the aggregation phase to collect and sum up the access frequencies from all dependent nodes for which this node is the deciding node. We also maintain an older aggregate popularity count for aging.

In addition to the state associated with each object, Beehive nodes also maintain a running estimate of the Zipf parameter. The updates to this estimate are batched, and occur relatively infrequently compared to the query stream. Overall, the storage cost consists of several bytes per object, and the processing cost of keeping the meta-data up to date is small.

Pastry's query routing deviates from the model described earlier in the paper because it is not entirely prefix-based and uniform. Since Pastry maps each object to the numerically closest node in the identifier space, it is possible for an object to not share any prefixes with its home node. For example, in a network with two nodes and , Pastry will store an object with identifier on node . Since a query for object propagated by prefix matching alone cannot reach the home node, Pastry completes the query with the aid of an auxiliary data structure called the leaf set. The leaf set is used in the last few hops to directly locate the numerically closest node to the queried object. Pastry initially routes a query using entries in the routing table, and may route the last couple of hops using the leaf set entries. This required us to modify Beehive's replication protocol to replicate objects at the leaf set nodes as follows. Since the leaf set is most likely to be used for the last
hop, we replicate objects in the leaf set nodes only at the highest replication levels. Let be the highest replication level for Beehive, that is, the default replication level for an object replicated only at its home node. As part of the maintain phase, a node A sends a maintenance message to all nodes B in its routing table as well as its leaf set with a list of identifiers of objects replicated at level whose deciding node is B. B is the deciding node of an object homed at node A if A would forward a query to that object to node B next. Upon receiving a maintenance message at level , node B would push an object to node A only if node A and the object have at least matching prefixes. Once an object is replicated on a leaf set node at level , further replication to lower levels follows the replication protocol described in Section 2. This slight modification to Beehive enables it to work on top of Pastry. Other routing metrics for DHT substrates, such as the XOR metric [18], have been proposed that do not exhibit this non-uniformity, and where the Beehive implementation would be simpler.

Pastry's implementation provides two opportunities for optimization, which improve Beehive's impact and reduce its overhead. First, Pastry nodes preferentially populate their routing tables with nodes that are in physical proximity [4]. For instance, a node with identifier has the opportunity to pick either of two nodes and when routing based on the first digit. Pastry selects the node with the lowest network latency, as measured by the packet round-trip time. As the prefixes get longer, node density drops and each node has progressively less freedom to find and choose between nearby nodes. This means that a significant fraction of the lookup latency experienced by a Pastry lookup is incurred on the last hop. Consequently, selecting even a large number of constant hops, , as Beehive's performance target will have a significant effect on the real performance of the system.
While we pick in our implementation, note that is a continuous variable and may be set to a fractional value, to get average lookup performance that is a fraction of a hop. yields a solution that will replicate all objects at all hops, which is suitable only if the total hash table size is small.

The second optimization opportunity stems from the maintenance messages used by Beehive and Pastry. Beehive requires some inter-node communication for replica dissemination and data aggregation. This communication is confined to pairs of nodes where one member of the pair appears in the other member's routing table. This highly stylized communication pattern suggests a possible optimization. Pastry nodes periodically send heart-beat messages to nodes in their routing table and leaf set to detect node failures. They also perform periodic network latency measurements to nodes in their routing table in order to obtain closer routing table entries. We can improve Beehive's efficiency by combining the periodic heart-beat messages sent by Pastry with the periodic maintenance messages sent by Beehive. By piggy-backing the i-th row routing table entries on to the Beehive maintenance message at replication level i, a single message can simultaneously serve as a heart-beat message, a Pastry maintenance message, and a Beehive maintenance message.

We have built a prototype DNS name server on top of Beehive in order to evaluate the caching strategy proposed in this paper. Beehive-DNS uses the Beehive framework to proactively disseminate DNS resource records containing name to IP address bindings. The Beehive-DNS server currently supports UDP-based name (A) queries, is compatible with widely-deployed resolver libraries, and is designed to provide a migration path from legacy DNS. Queries that are not satisfied within the Beehive system are looked up in the legacy DNS by the home node and are inserted into the Beehive framework. The Beehive system stores and disseminates resource records to the appropriate replication levels by monitoring the DNS query stream. Clients are free to route their queries through any node that is part of the Beehive-DNS.

Since the DNS system relies entirely on aggressive caching in order to scale, it provides very loose coherency semantics, and limits the rate at which updates can be performed. Recall that the Beehive system enables resource records to be modified at any time, and disseminates the new resource records to all caching name servers as part of the update operation. However, for this process to be initiated, name owners would have to directly notify the home node of changes to the name to IP address binding. We expect that, for some time to come, Beehive will be an adjunct system layered on top of legacy DNS, and therefore name owners who are not part of Beehive will not know to contact the system. For this reason, our current implementation distinguishes between names that exist solely in Beehive and resource records originally inserted from legacy DNS. In the current implementation, the home node checks the validity of each legacy DNS entry by issuing a DNS query for the domain when the time-to-live field of that entry has expired. If the DNS mapping has changed, the home node detects the update and propagates it as usual.
Note that this strategy preserves DNS semantics and is quite efficient because only the home nodes check the validity of each entry, while replicas retain all mappings unless invalidated. Overall, the Beehive implementation adds only a modest amount of overhead and complexity to peer-to-peer distributed hash tables. Our prototype implementation of Beehive-DNS is only 3500 lines of code, compared to the 17500 lines of code for Pastry.
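The home-node validity check for legacy DNS entries can be sketched as follows. This is a simplified illustration under stated assumptions: `LegacyRecord`, `revalidate`, and the `resolve` and `propagate` callbacks are hypothetical names, not part of Beehive-DNS or of any DNS library, and time is passed in explicitly so the sketch stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class LegacyRecord:
    name: str
    address: str
    version: int       # Beehive version number of this record
    expires_at: float  # absolute expiry time derived from the TTL


def revalidate(record, now, resolve, propagate):
    """Re-check a legacy DNS entry at its home node once its TTL expires.

    resolve(name)     -- hypothetical callback returning (address, ttl_seconds)
    propagate(record) -- hypothetical callback that pushes the update to replicas
    Returns True if the mapping changed and an update was propagated.
    """
    if now < record.expires_at:
        return False                   # TTL not yet expired, nothing to do
    address, ttl = resolve(record.name)
    record.expires_at = now + ttl      # restart the TTL clock
    if address == record.address:
        return False                   # mapping unchanged, replicas stay valid
    record.address = address
    record.version += 1                # a higher version marks the fresher copy
    propagate(record)
    return True
```

Only the home node runs this check, so an unchanged mapping costs one legacy lookup per TTL period regardless of how many replicas exist.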
4 Evaluation
In this section, we evaluate the performance costs and benefits of the Beehive replication framework. We examine Beehive's performance in the context of a DNS system and show that Beehive can robustly and efficiently achieve its targeted lookup performance. We also show that Beehive can adapt to sudden, drastic changes in the popularity of objects as well as global shifts in the parameter of the query distribution, and continue to provide good lookup performance.

We compare the performance of Beehive with that of pure Pastry and Pastry enhanced by passive caching. By passive caching, we mean caching objects along all nodes on the query path, similar to the scheme proposed in [23]. We impose no restrictions on the size of the cache used in passive caching. We follow the DNS cache model to handle mutable objects, and associate a time to live with each object. Objects are removed from the cache upon expiration of the time to live.
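The passive-caching baseline described above, an unbounded cache whose entries expire with their time to live, can be sketched as follows. The class and method names are assumptions for illustration and do not come from the simulator used in the evaluation.

```python
class TTLCache:
    """Unbounded cache whose entries are dropped when their TTL expires."""

    def __init__(self):
        self._entries = {}  # key -> (value, absolute expiry time)

    def put(self, key, value, ttl, now):
        # No size limit is enforced, matching the unrestricted cache size.
        self._entries[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value
```

In a path-caching scheme of this kind, every node on the query path would call `put` when the answer passes through it, so short TTLs translate directly into repeated misses and transfers.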
4.1 Setup
We evaluate Beehive using simulations, driven by a DNS survey and trace data. The simulations were performed using the same source code as our implementation. Each simulation run was started by seeding the network with just a single copy of each object, and then querying for objects according to a DNS trace. We compared the proactive replication of Beehive to passive caching in Pastry (PC-Pastry), as well as regular Pastry.

Since passive caching relies on expiration times for coherency, and since both Beehive and Pastry need to perform extra work in the presence of updates, we conducted a large-scale survey to determine the distribution of TTL values for DNS resource records and to compute the rate of change of DNS entries. Our survey spanned July through September 2003, and periodically queried web servers for the resource records of 594059 unique domain names, collected by crawling the Yahoo! and the DMOZ.ORG web directories. We used the distribution of the returned time-to-live values to determine the lifetimes of the resource records in our simulation. We measured the rate of change in DNS entries by repeating the DNS survey periodically, and derived an object lifetime distribution.

We used the DNS trace [15] collected at MIT between 4 and 11 December 2000. This trace spans lookups over days featuring distinct clients and distinct fully-qualified names. In order to reduce the memory consumption of the simulations, we scale the number of distinct objects to , and issue queries at the same rate of queries per sec. The rate of issue for requests has little impact on the hit rate achieved by Beehive, which is dominated mostly by the performance of the analytical model, parameter estimation, and rate of updates. The overall query distribution of this trace follows an approximate Zipf-like distribution with parameter . We separately evaluate Beehive's robustness in the face of changes in this parameter.

We performed our evaluations by running the Beehive implementation on Pastry in simulator mode with 1024 nodes. For Pastry, we set the base to be 16, the leaf-set size to be 24, and the length of identifiers to be 128, as recommended in [22]. In all our evaluations, the Beehive maintenance interval was minutes and the replication interval was minutes. The replication phases at each node were randomly staggered to approximate the behavior of independent, non-synchronized hosts. We set the target lookup performance of Beehive to average hop.
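A Zipf-like query workload of the kind the setup refers to can be generated with a short sketch. The evaluation itself is trace-driven, so this is only an illustration of the distribution; the function names, the seed, and every parameter value a caller passes in are assumptions.

```python
import random
from itertools import accumulate


def zipf_weights(num_objects, alpha):
    """Unnormalised Zipf-like weights: the rank-r object gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]


def sample_queries(num_objects, alpha, num_queries, seed=0):
    """Draw object indices (0 = most popular) from a Zipf-like distribution."""
    rng = random.Random(seed)
    cumulative = list(accumulate(zipf_weights(num_objects, alpha)))
    return rng.choices(range(num_objects), cum_weights=cumulative, k=num_queries)
```

Calling `sample_queries(100, 0.9, 10000)` yields a stream in which the top-ranked object is requested far more often than the tail, while a large share of requests still goes to rarely queried objects.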
Figure 4: Object Transfers (cumulative) vs Time. The total amount of object transfers imposed by Beehive is significantly lower compared to caching. Passive caching incurs large costs in order to check freshness of entries in the presence of conservative timeouts.
Beehive Performance

Figure 3 shows the average lookup latency for Pastry, PC-Pastry, and Beehive over a query period spanning 40 hours. We plot the lookup latency as a moving average over minutes. The average lookup latency of pure Pastry is about hops. The average lookup latency of PC-Pastry drops steeply during the first hours and averages after hours. The average lookup performance of Beehive decreases steadily and converges to about hops, within of the target lookup performance. Beehive achieves the target performance in about hours and minutes, the time required for two replication phases followed by a maintain phase at each node. These three phases, combined, enable Beehive to propagate the popular objects to their respective replication levels. Once all level objects have been disseminated, Beehive's proactive replication achieves the expected payoff. In contrast, PC-Pastry provides limited benefits, despite an infinite-sized cache.

Figure 3: Latency (hops) vs Time. The average lookup performance of Beehive converges to the targeted hop after two replication phases.

There are two reasons for the relative ineffectiveness of passive caching. First, the heavy tail in Zipf-like distributions implies that there will be many objects for which there will be few requests, and queries will take many disjoint paths in the network until they collide on a node on which the object has been cached. Second, PC-Pastry relies on time-to-live values for cache coherency, instead of tracking the location of cached objects. The time-to-live values need to be set conservatively in order to reflect the worst case scenario under which the record may be updated, as opposed to the expected lifetime of the object. Consequently, passive caching suffers from a low hit rate as entries are evicted due to small values of TTL set by name owners.

Next, we examine the bandwidth consumed and network load incurred by PC-Pastry and Beehive for caching objects, and show that Beehive generates significantly lower background traffic due to object transfers compared to passive caching. Figure 4 shows the total amount of objects transferred by Beehive and PC-Pastry since the beginning of the experiment. PC-Pastry has a rate of object transfer proportional to its lookup latency, since it transfers an object to each node along the query path. Beehive incurs a high rate of object transfer during the initial period; but once Beehive achieves its target lookup performance, it incurs considerably lower overhead, as it needs to perform transfers only in response to changes in object popularity and, relatively infrequently for DNS, to object updates. Beehive continues to perform limited amounts of object replication, due to fluctuations in the popularity of the objects as well as estimation errors not dampened by hysteresis.

Beehive efficiently trades off storage and bandwidth for improved lookup latency. Our replication framework enables administrators to tune this trade off by varying the target lookup performance of the system. Figure 5 shows the trade off between storage requirement and estimated latency for different target lookup performance. We used the analytical model described in Section 2 to estimate the storage requirements. We estimated the expected lookup latency from round trip time obtained by pinging all pairs of nodes in PlanetLab, and adding to this ms for accessing the local DNS resolver. The average hop round trip time between nodes in PlanetLab is ms. In our large scale DNS survey, the average DNS lookup latency was ms. Beehive with a target performance of hop can provide better lookup latency than DNS.

Figure 5: Storage Requirement vs Latency. This graph shows the average per node storage required by Beehive and the estimated latency for different target lookup performance. This graph captures the trade off between the overhead incurred by Beehive and the lookup performance achieved.

The average number of objects stored at each node at the end of hours is for Beehive and for passive caching. PC-Pastry caches more objects than Beehive even though its lookup performance is worse, due to the heavy tailed nature of Zipf distributions. Our evaluation shows that Beehive provides hop average lookup latency with low storage and bandwidth overhead.

Flash Crowds

Next, we examine the performance of proactive and passive caching in response to changes in object popularity. We modify the trace to suddenly reverse the popularities of all the objects in the system. That is, the least popular object becomes the most popular object, the second least popular object becomes the second most popular object, and so on. This represents a worst case scenario for proactive replication, as objects that are least replicated suddenly need to be replicated widely, and vice versa, simulating, in essence, a set of flash crowds for the least popular objects. The switch occurs at , and we issue queries from the reversed popularity distribution for another hours.

Figure 6 shows the lookup performance of Pastry, PC-Pastry and Beehive in response to flash crowds. Popularity reversal causes a temporary increase in average latency for both Beehive and PC-Pastry. Beehive adjusts the replication levels of its objects appropriately and reduces the average lookup performance to about hop after two replication intervals. The lookup performance of passive caching also decreases to about hops.

Figure 6: Latency (hops) vs Time. This graph shows that Beehive quickly adapts to changes in the popularity of objects and brings the average lookup performance to 1 hop.

Figure 7 shows the instantaneous rate of object transfer induced by the popularity reversal for Beehive and PC-Pastry. The popularity reversal causes a temporary increase in the object transfer activity of Beehive as it adjusts the replication levels of the objects appropriately. Even though Beehive incurs this high rate of activity in response to a worst-case scenario, it consumes less bandwidth and imposes less aggregate load compared to passive caching.

Figure 7: Rate of Object Transfers vs Time. This graph shows that when the popularity of the objects changes, Beehive imposes extra bandwidth overhead temporarily to replicate the newly popular objects and maintain constant lookup time.

Zipf Parameter Change

Figure 8: Latency (hops) vs Time. This graph shows that Beehive quickly adapts to changes in the parameter of the query distribution and brings the average lookup performance to 1 hop.

Figure 9: Objects stored per node vs Time. This graph shows that when the parameter of the query distribution changes, Beehive adjusts the number of replicated objects to maintain O(1) lookup performance with storage efficiency.
Finally, we examine the adaptation of Beehive to global changes in the parameter of the overall query distribution. We issue queries from Zipf-like distributions generated with different values of the parameter at each hour interval. We start with , then increase it to after hours, then decrease the value to at , and finally increase it to the starting value of at . In order to shorten the completion time of our simulations, we performed this experiment with objects and issued queries at the rate of queries per sec.

Figure 8 shows the lookup performance of Beehive as it adapts to changes in the parameter of the query distribution. After we started the experiment, the average query latency converges rapidly to the target of hop. At hours, the increase in the value of to causes a temporary decrease in the average query latency, but Beehive adapts to the change in the Zipf parameter and brings the lookup performance close to the target. Similarly, Beehive refines the replication levels of the objects to meet the target lookup performance when the Zipf parameter changes to at hours and back to at hours.

Figure 9 shows the average number of objects replicated at each node in the system by Beehive. When the parameter of the query distribution is , Beehive achieves hop lookup performance by replicating about objects at each node on average. When Beehive observes the increase in the Zipf parameter to , it decreases the per node storage requirement to about objects in order to meet the target lookup performance efficiently. Similarly, when the parameter increases to , Beehive increases the number of objects stored to about per node in order to achieve the target. Overall, continuously monitoring and estimating the parameter of the query distribution enables Beehive to adjust the extent and level of replication to compensate for any global changes.
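One common way to estimate the parameter of a Zipf-like distribution from observed access counts is a least-squares fit of log frequency against log rank. The sketch below illustrates that general technique only; the estimator Beehive actually uses is not given in this section, and the function name and input format are assumptions.

```python
import math


def estimate_zipf_alpha(counts):
    """Estimate the Zipf parameter from per-object access counts.

    Fits log(frequency) = c - alpha * log(rank) by ordinary least squares
    and returns alpha. Objects with zero observed accesses are ignored.
    """
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two objects with non-zero counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance  # slope of the log-log line is -alpha
```

On real query streams the tail of the rank-frequency curve is noisy, so an estimate of this kind would typically be smoothed over several intervals before being fed back into the replication model.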
Summary
In this section, we have evaluated the performance of the Beehive replication framework for different scenarios in the context of DNS. Our evaluation indicates that Beehive achieves O(1) lookup performance with low storage and bandwidth overhead. In particular, it outperforms passive caching in terms of average latency, storage requirements, network load and bandwidth consumption. Beehive continuously monitors the popularity of the objects and the parameter of the query distribution, and quickly adapts its performance to changing conditions.
5 Related Work
Peer-to-peer lookup systems proposed to date fall into two categories, namely, unstructured systems, where the DHT constructs an unconstrained graph among the participating nodes, and structured systems, where the DHT imposes some structure on the underlying network.

Unstructured peer-to-peer systems, such as Freenet [5] and Gnutella [1], perform lookups for objects using graph traversal algorithms. Gnutella uses a flooding-based breadth-first search, while Freenet uses an iterative depth-first search technique. Both Gnutella and Freenet cache queried objects along the search path to improve the efficiency of the search algorithms. However, their lookup protocols are inefficient, do not scale well, and do not provide bounds on the average or worst case lookup performance.

Structured peer-to-peer systems are appealing because they can provide a worst-case bound on lookup performance. Several structured peer-to-peer systems have been designed in recent years. CAN [21] maps both objects and nodes on a d-dimensional torus and provides O( ) lookup performance by searching in a multi-dimensional space. Plaxton et al. [19] introduce a randomized lookup algorithm based on prefix matching to locate objects in a distributed network in O( ) probabilistic time. Chord [24], Pastry [22], and Tapestry [26] use consistent hashing to map objects to nodes and route lookup requests using Plaxton's prefix-matching algorithms to search for objects. An internal database of O( ) entries enables these systems to route lookup requests and achieve O( ) worst-case lookup performance. Kademlia [18] also provides O( ) lookup performance using a similar search technique, but uses the XOR metric to compute closeness of objects and nodes. Viceroy [17] provides O( ) lookup performance with a constant degree routing graph. De Bruijn graphs [16, 25] can achieve O( ) lookup performance with neighbors per node and O( ) with degree per node. Beehive can be applied on any overlay based on prefix matching.
A few recently introduced DHTs provide O( ) lookup performance by tolerating increased storage and bandwidth consumption. Kelips [12] provides O( ) lookup performance with probabilistic guarantees by replicating each object on O( ) nodes. It divides the nodes into O( ) groups of O( ) nodes each and maintains information about network membership and object updates using gossip-based protocols. It maps each object to a group and replicates the object on all nodes in the group, regardless of popularity. The background gossip communication consumes a constant amount of bandwidth, but incurs long convergence time. Consequently, Kelips may not disseminate object updates to all replicas quickly. An alternative method to achieve one hop lookups is described in [13], and relies on maintaining full routing state (i.e. a complete description of system membership) at each node. The space and bandwidth costs of this approach scale linearly with the size of the network. Farsite [10] also routes in a constant number of hops, but does not address rapid membership changes.

Beehive differs from these systems in three fundamental ways. First, Beehive operates as a separable layer on many DHTs without requiring structural changes. Second, it exploits the popularity distribution of objects to minimize the amount of replication. Unpopular objects are not replicated, reducing storage overhead, bandwidth consumption and network load. Finally, Beehive provides fine-grained control of the trade off between lookup performance and overhead by allowing users to choose the target lookup performance from a continuous range.

Several peer-to-peer applications have examined caching and replication to improve lookup performance, increase availability, and provide better failure resilience. PAST [23] and CFS [9] are examples of file backup applications built on top of Pastry and Chord, respectively. Both reserve a part of the storage space at each node to cache queried results on the lookup path and provide faster lookup. They also maintain a constant number of replicas of each object in the system in order to improve fault tolerance. These passive caching schemes do not provide any performance bounds.

Some systems employ a combination of caching with proactive object updates. In [6], the authors describe a proactive cache for DNS records.
Whenever a cached DNS record is about to expire, the cache issues a fresh query to check the validity of the record, and the result of the query is stored in the cache. While this technique reduces the impact of short expiration times on lookup performance, it introduces a large amount of overhead due to background object transfers, without providing bounded lookup performance.
CUP, Controlled Update Propagation [20], is a demand-based caching mechanism with proactive object updates. In CUP, the process of querying for an object and updating cached replicas of that object forms a tree-like structure rooted at the home node of the object. CUP nodes propagate object updates away from the home node in accordance with a popularity-based incentive that flows from the leaf nodes towards the home node. There are several similarities between the replication protocols of CUP and Beehive. However, the decisions to cache objects and propagate updates in CUP are based on heuristics, while the replication in Beehive is driven by an analytical model that enables it to provide constant lookup performance for power law query distributions.
The closest work to Beehive is [7], which presents a study of optimal strategies for replicating objects in unstructured peer-to-peer systems. This paper employs an analytical approach to find the best possible replication strategy for unstructured peer-to-peer systems, subject to storage constraints. The observations in this work are not directly applicable to structured DHTs, because it assumes that the lookup time for an object depends only on the number of replicas and not the placement strategy. Beehive exploits the structure of the overlay to place replicas at appropriate locations in the network to achieve the desired performance level.
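The role of placement can be made concrete with a small sketch. It assumes the usual prefix-routing model (as in Pastry): N nodes, identifiers in base b, and an object replicated at level i stored on every node that shares at least i leading digits with it, so that a lookup reaches a replica within i hops. The function names and example sizes are illustrative and are not taken from the Beehive implementation; the Kelips figure assumes one copy on each node of a group of roughly √N nodes.

```python
import math


def home_level(num_nodes: int, base: int) -> int:
    """Smallest level at which about one node matches the object's prefix."""
    level = 0
    while base ** level < num_nodes:
        level += 1
    return level


def replicas_at_level(num_nodes: int, base: int, level: int) -> float:
    """Expected number of nodes sharing `level` leading digits with an object."""
    # Assumption: node identifiers are spread uniformly over the ID space.
    return max(1.0, num_nodes / base ** level)


def kelips_replicas(num_nodes: int) -> int:
    """Replicas per object if every node in one of ~sqrt(N) groups holds a copy."""
    return math.isqrt(num_nodes)


if __name__ == "__main__":
    n, b = 1024, 16  # hypothetical network size and digit base
    for level in range(home_level(n, b) + 1):
        count = replicas_at_level(n, b, level)
        print(f"level {level}: at most {level} hops, ~{count:.0f} replicas")
    print(f"Kelips-style: ~{kelips_replicas(n)} replicas for every object")
```

Under these assumptions, each step down in level multiplies the replica count by b, which is why replicating only the popular objects at low levels keeps the overhead small.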
6 Future Work
This paper has investigated the potential performance benefits of model-driven proactive caching and has shown that it is feasible to use peer-to-peer systems in cooperative low-latency, high-performance environments. Deploying full-blown applications, such as a complete peer-to-peer DNS replacement, on top of this substrate will require substantial further effort. Most notably, security issues need to be addressed before peer-to-peer systems can be deployed widely. At the application level, this involves using some authentication technique, such as DNSSEC [11], to securely delegate name service to nodes in a peer-to-peer system. At the underlying DHT layer, secure routing techniques [3] are required to limit the impact of malicious nodes on the DHT. Both of these techniques will add latency, which may be offset at the cost of additional bandwidth, storage and load by setting Beehive's target performance level to a lower, fractional value. At the Beehive layer, the proactive replication mechanism needs to be protected from nodes that misreport the popularity of objects. Since a malicious peer in Beehive can replicate an object, or indirectly cause an object to be replicated, only at nodes that have that malicious node in their routing tables, we expect that one can limit the amount of damage that attackers can cause through misreported object popularities.
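The trade-off mentioned above, a lower target hop count bought with more storage, can be illustrated numerically. The sketch below is not the paper's closed-form solution; it is a greedy heuristic under assumed parameters (a Zipf exponent of 0.9, 200 objects, 1024 nodes, base 16) and under the assumed model that an object replicated at level l is found within l hops and occupies about N/b^l nodes. All function names are hypothetical.

```python
def zipf_weights(num_objects, alpha):
    """Query probability of each object, ranked from most to least popular."""
    raw = [1.0 / rank ** alpha for rank in range(1, num_objects + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def home_level(num_nodes, base):
    """Smallest level at which about one node matches an object's prefix."""
    level = 0
    while base ** level < num_nodes:
        level += 1
    return level


def replicas(num_nodes, base, level):
    """Approximate replica count for one object at the given level."""
    return max(1.0, num_nodes / base ** level)


def plan_levels(weights, num_nodes, base, target_hops):
    """Greedily lower levels until the weighted hop bound meets the target."""
    top = home_level(num_nodes, base)
    levels = [top] * len(weights)
    avg_hops = float(top)
    while avg_hops > target_hops:
        candidates = [j for j, lvl in enumerate(levels) if lvl > 0]
        if not candidates:
            break  # everything is already replicated everywhere

        def gain_per_replica(j):
            # Hops saved per additional replica if object j drops one level.
            extra = (replicas(num_nodes, base, levels[j] - 1)
                     - replicas(num_nodes, base, levels[j]))
            return weights[j] / extra

        best = max(candidates, key=gain_per_replica)
        levels[best] -= 1
        avg_hops -= weights[best]
    return levels


def average_hops(weights, levels):
    """Upper bound on expected hops: each query costs at most its object's level."""
    return sum(w * lvl for w, lvl in zip(weights, levels))


def total_replicas(levels, num_nodes, base):
    """Total number of object copies stored across the network."""
    return sum(replicas(num_nodes, base, lvl) for lvl in levels)


if __name__ == "__main__":
    w = zipf_weights(200, 0.9)
    for target in (2.0, 1.0, 0.5):
        lv = plan_levels(w, 1024, 16, target)
        print(target, round(average_hops(w, lv), 3), round(total_replicas(lv, 1024, 16)))
```

In this sketch the plan for a lower target extends the plan for a higher one, so tightening the target from 1.0 to 0.5 hops can only increase the number of replicas stored.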
7 Conclusion
Structured DHTs offer many unique properties desirable for a large class of applications, including self-organization, failure resilience, high scalability, and a worst-case performance bound. However, their O(log N) average-case performance has prohibited them from being deployed for latency-sensitive applications, including DNS. In this paper, we outline a framework for proactive replication that can improve the average-case lookup performance of prefix-based DHTs to O(1) for a frequently encountered class of query distributions. The Beehive framework consists of three components, layered on top of a standard DHT substrate, such as Pastry. An analytical model provides a closed-form solution for computing the requisite level of replication in order to achieve a targeted lookup performance. This analytical solution is optimal in the number of replicas for Zipf-like distributions with α < 1. An estimation technique, based on local measurements and limited aggregation to address statistical fluctuations, derives input parameters for the model. The estimation process is integrated with background traffic already present in the DHT. Computing the level of replication for each object is performed independently at each node, without costly consensus or synchronization. A replication algorithm proactively disseminates the objects throughout the system, along the routing tables already maintained by the underlying DHT.
Analysis of Beehive's performance in the context of a DNS application indicates that it can achieve a targeted performance level with low overhead. Beehive adapts quickly to flash crowds, which can alter the relative popularities of the objects in the system. It detects qualitative shifts in the global query distribution and adjusts replication parameters accordingly to compensate. The implementation is small, and the Beehive approach can be applied to other latency-sensitive applications. Overall, the system derives its efficiency by taking advantage of the underlying structure of the lower-layer DHT, and makes it feasible to use DHTs in low-latency applications where the query distribution follows a power law by decoupling lookup performance from the size of the network.
References
[1] The Gnutella Protocol Specification v.0.4. http://www9.limewire.com/developer/gnutella protocol 0.4 .pdf, March 2001.
[2] Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. Web Caching and Zipf-like Distributions: Evidence and Implications. IEEE International Conference on Computer Communications (INFOCOM) 1999, New York NY, March 1999.
[3] Miguel Castro, Peter Druschel, Ayalvadi Ganesh, Antony Rowstron, and Dan Wallach. Secure Routing for Structured Peer-to-Peer Overlay Networks. Symposium on Operating Systems Design and Implementation, OSDI 2002, Boston MA, December 2002.
[4] Miguel Castro, Peter Druschel, Charlie Hu, and Antony Rowstron. Exploiting Network Proximity in Peer-to-Peer Overlay Networks. Technical Report MSR-TR-2002-82, Microsoft Research, May 2002.
[5] Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Lecture Notes in Computer Science, vol 2009, pp 46-66, 2001.
[6] Edith Cohen and Haim Kaplan. Proactive Caching of DNS Records: Addressing a Performance Bottleneck. Symposium on Applications and the Internet SAINT 2001, San Diego-Mission Valley CA, January 2001.
[7] Edith Cohen and Scott Shenker. Replication Strategies in Unstructured Peer-to-Peer Networks. ACM SIGCOMM 2002, Pittsburgh PA, August 2002.
[8] Russ Cox, Athicha Muthitacharoen, and Robert Morris. Serving DNS using a Peer-to-Peer Lookup Service. International Workshop on Peer-To-Peer Systems 2002, Cambridge MA, March 2002.
[9] Frank Dabek, Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Wide-area cooperative storage with CFS. ACM Symposium on Operating System Principles SOSP 2001, Banff Alberta, Canada, October 2001.
[10] John R. Douceur, Atul Adya, William J. Bolosky, Dan Simon, Marvin Theimer. Reclaiming Space from Duplicate Files in a Serverless Distributed File System. International Conference on Distributed Computing Systems, ICDCS 2002, Vienna, Austria, July 2002.
[11] D. Eastlake. Domain Name System Security Extensions. Request for Comments (RFC) 2535, 3 ed., March 1999.
[12] Indranil Gupta, Ken Birman, Prakash Linga, Al Demers, and Robert van Rennesse. Kelips: Building an Efficient and Stable P2P DHT Through Increased Memory and Background Overhead. Second International Peer-To-Peer Systems Workshop, IPTPS 2003, Berkeley CA, February 2003.
[13] Anjali Gupta, Barbara Liskov, Rodrigo Rodrigues. One Hop Lookups for Peer-to-Peer Overlays. Ninth Workshop on Hot Topics in Operating Systems. Lihue, Hawaii, May 2003.
[14] Nicholas Harvey, Michael Jones, Stefan Saroiu, Marvin Theimer, and Alec Wolman. SkipNet: A Scalable Overlay Network with Practical Locality Properties. Fourth USENIX Symposium on Internet Technologies and Systems, USITS 2003, Seattle WA, March 2003.
[15] Jaeyon Jung, Emil Sit, Hari Balakrishnan, and Robert Morris. DNS Performance and Effectiveness of Caching. ACM SIGCOMM Internet Measurement Workshop 2001, San Francisco CA, November 2001.
[16] Frans Kaashoek and David Karger. Koorde: A Simple Degree-Optimal Distributed Hash Table. Second International Peer-To-Peer Systems Workshop, IPTPS 2003, Berkeley CA, February 2003.
[17] Dahlia Malkhi, Moni Naor, and David Ratajczak. Viceroy: A Scalable and Dynamic Emulation of the Butterfly. ACM Symposium on Principles of Distributed Computing, PODC 2002, Monterey CA, August 2002.
[18] Petar Maymounkov and David Mazières. Kademlia: A Peer-to-peer Information System Based on the XOR Metric. First International Peer-To-Peer Systems Workshop, IPTPS 2002, Cambridge MA, March 2002.
[19] Greg Plaxton, Rajmohan Rajaraman, and Andrea Richa. Accessing nearby copies of replicated objects in a distributed environment. Theory of Computing Systems, vol 32, pg 241-280, 1999.
[20] Mema Roussopoulos and Mary Baker. CUP: Controlled Update Propagation in Peer-to-Peer Networks. USENIX 2003 Annual Technical Conference, San Antonio TX, June 2003.
[21] Sylvia Ratnasamy, Paul Francis, Mark Hadley, Richard Karp, and Scott Shenker. A Scalable Content-Addressable Network. ACM SIGCOMM 2001, San Diego CA, August 2001.
[22] Antony Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. IFIP/ACM International Conference on Distributed Systems Platforms, Middleware 2001, Heidelberg, Germany, November 2001.
[23] Antony Rowstron and Peter Druschel. Storage management and caching in PAST, a large-scale persistent peer-to-peer storage utility. ACM Symposium on Operating System Principles, SOSP 2001, Banff Alberta, Canada, October 2001.
[24] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable Peer-to-peer Lookup Service for Internet Applications. ACM SIGCOMM 2001, San Diego CA, August 2001.
[25] Udi Wieder and Moni Naor. A Simple Fault Tolerant Distributed Hash Table. Second International Peer-To-Peer Systems Workshop, IPTPS 2003, Berkeley CA, February 2003.
[26] Ben Zhao, Ling Huang, Jeremy Stribling, Sean Rhea, Anthony Joseph, and John Kubiatowicz. Tapestry: A Resilient Global-scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications, JSAC, 2003.