IMPROVING THE PERFORMANCE OF THE GNUTELLA NETWORK

André Dufour
B.A.Sc., University of Ottawa, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in the School of Engineering Science

© André Dufour 2006
SIMON FRASER UNIVERSITY
Summer 2006

All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author.

APPROVAL

Name: André Dufour
Degree: Master of Applied Science
Title of thesis: Improving the Performance of the Gnutella Network
Examining Committee:
Dr. Rodney Vaughan, Chairman
Dr. Ljiljana Trajković, Professor, Engineering Science, SFU, Senior Supervisor
Dr. Joseph Peters, Professor, Computing Science, SFU, Supervisor
Dr. Mohamed Hefeeda, Assistant Professor, Computing Science, SFU, Examiner
Date Approved:

DECLARATION OF PARTIAL COPYRIGHT LICENCE

The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection, and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.

The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.
Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.

The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.

Simon Fraser University Library
Burnaby, BC, Canada

Abstract

In this thesis, the behaviour of the Gnutella peer-to-peer (P2P) file sharing network is examined and a proposal is put forth to improve its performance. Gnutella's overlay topology is not well matched to the underlying physical network and the network therefore exhibits sub-optimal performance in terms of message latency. In order to evaluate this performance, we modified an existing Gnutella simulation framework developed for the ns-2 simulator to gather information about query and query hit propagation. The protocol implemented in the simulation was then modified to use the Vivaldi synthetic coordinate system in order to bias neighbour selection to favour nodes that are "close" in the Euclidean sense. Simulations showed that the modified Gnutella protocol yielded an improvement in both query and query hit propagation times.

Keywords

Computer networks - computer simulation; computer network architectures; peer-to-peer architecture (computer networks)

Acknowledgements

I would like to thank my senior supervisor, Dr. Ljiljana Trajković, for her support and guidance throughout this degree. The members of the Communication Networks Laboratory have also been very supportive and I am deeply grateful to them. I am also thankful for the help of Dr. Joseph Peters and Dr. Mohamed Hefeeda as members of my committee. I would like to extend a special thanks to Dr.
Jason Carey of the University of Alberta for his thorough review and insightful comments on my thesis. Sincere thanks are also due to the organizations that have funded me throughout my studies: the Natural Sciences and Engineering Research Council (NSERC), the Advanced Systems Institute of British Columbia, Simon Fraser University, Agilent Technologies, and Nova-Tech Engineering Inc. Finally, I am eternally grateful to my parents for their support and encouragement throughout all my studies.

Contents

Approval
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
List of Symbols
List of Abbreviations
1 Introduction
  1.1 Objectives
  1.2 Organization of the Thesis
2 Background
  2.1 P2P Networks
    2.1.1 P2P Network Properties
    2.1.2 Applications of P2P Networks
  2.2 The Gnutella Network
    2.2.1 The Gnutella Protocol
    2.2.2 The Gnutella Network's Topology
  2.3 Network Simulation
    2.3.1 Opnet
    2.3.2 SSFNet
    2.3.3 Dedicated P2P Simulators
    2.3.4 ns-2
  2.4 The GnutellaSim Simulation Package
3 The Vivaldi Coordinate System
  3.1 The Need for Network Coordinates to Improve Network Performance
  3.2 Network Coordinate Systems
  3.3 Vivaldi Operation
    3.3.1 Error Minimization
    3.3.2 Spring Relaxation and Coordinate Adjustment
    3.3.3 Vivaldi Algorithm
    3.3.4 Vivaldi Coordinate Types
    3.3.5 Vivaldi Accuracy
4 Modifications to the Gnutella Protocol
  4.1 Syntactic Modifications
  4.2 Behavioural Modifications
    4.2.1 Initialization
    4.2.2 Coordinate Updates
    4.2.3 Message Forwarding
    4.2.4 Optimal Neighbour Selection
  4.3 Costs and Risks of the Modifications
    4.3.1 Costs of the Modifications
    4.3.2 Risks of the Modifications
5 System Architecture
  5.1 Gnutaldi Objectives
  5.2 Gnutaldi Architecture
    5.2.1 Protocol Layer Operation
    5.2.2 Application Layer Operation
    5.2.3 Messaging
    5.2.4 Vivaldi-Related Classes
    5.2.5 Gnutaldi Platform
6 Network Topology Generation
7 Evaluation of the Modified Gnutella Protocol
  7.1 Tuning Parameter Selection
  7.2 Neighbour Selection Behaviour
  7.3 Convergence of Vivaldi Coordinates
  7.4 Performance Evaluation
8 Conclusions
References

List of Tables

2.1 Header structure of Gnutella binary messages.
2.2 Gnutella pong message structure.
2.3 Gnutella push message structure.
2.4 Gnutella query message structure.
2.5 Gnutella query hit message structure.
2.6 Gnutella query hit result set structure.
4.1 Modified header structure of Gnutella binary messages.
7.1 Simulation Parameters for a Stable 42-Peer Network
7.2 Simulation Parameters for a Dynamic 42-Peer Network
7.3 Connectivity at 200 Seconds

List of Figures

2.1 Client-Server networking paradigm.
2.2 P2P networking paradigm.
2.3 Physical and logical topologies in P2P communication.
2.4 Gnutella connection exchange.
4.1 Gnutella connect message bearing Vivaldi data.
4.2 Neighbour selection algorithm.
5.1 Relationship between ns-2, GnutellaSim and Gnutaldi. ns-2 is the fundamental building block for the other two simulators. GnutellaSim is a set of classes that extends ns-2 to include simulation of the Gnutella network. Gnutaldi is an extension of GnutellaSim which adds higher performance and implements the Vivaldi coordinate system and the proposed neighbour selection algorithm.
5.2 Class hierarchy for Gnutaldi protocol layer modules.
5.3 Class hierarchy for Gnutaldi application layer modules.
5.4 Connection selection process implemented within GnutellaApp as part of the neighbour selection algorithm.
5.5 With every received Gnutella message, GnutellaApp updates its estimate of the distance to the originating node. It also updates the local coordinates and error estimate.
5.6 Class hierarchy for Gnutaldi message modules.
5.7 Class hierarchy for Gnutaldi's Vivaldi-related modules.
7.1 Median relative RTT prediction error as a function of time for c = 0.01.
7.2 Median relative RTT prediction error as a function of time for c = 0.10.
7.3 Median relative RTT prediction error as a function of time for c = 0.25.
7.4 Median relative RTT prediction error as a function of time for c = 0.50.
7.5 Median relative RTT prediction error as a function of time for c = 0.75.
7.6 Connection drop events for a stable network of 42 Gnutella servents.
7.7 Connection drop events for a dynamic network of 42 Gnutella servents.
7.8 Median relative RTT prediction error as a function of time for a 92-node network with 42 stable Gnutella servents running the neighbour selection algorithm. Each node has up to 8 neighbours.
7.9 Median relative RTT prediction error as a function of time for a 92-node network with 42 Gnutella servents running the neighbour selection algorithm. Each node has up to 8 neighbours. The peers have a 10% chance of leaving the network after a successful query. Each subfigure represents a different iteration of the simulation, operating on a different physical network.
7.10 Average number of nodes reached by queries sent early in the simulation with 42 stable Gnutella servents.
7.11 Average number of query hits received for queries sent early in the simulation with 42 stable Gnutella servents.
7.12 Average number of nodes reached by queries sent with 42 stable Gnutella servents when the median RTT prediction error is above 10%.
7.13 Average number of query hits received for queries sent with 42 stable Gnutella servents when the median RTT prediction error is above 10%.
7.14 Average number of nodes reached by queries sent with 42 stable Gnutella servents when the network has reached a higher degree of stability.
7.15 Average number of query hits received for queries sent with 42 stable Gnutella servents when the network has reached a higher degree of stability.
7.16 Average number of nodes reached by queries sent with 42 stable Gnutella servents when the network has completely stabilized.
7.17 Average number of query hits received for queries sent with 42 stable Gnutella servents when the network has completely stabilized.
7.18 Average number of nodes reached for queries sent with 42 dynamic Gnutella servents when the median relative RTT prediction error is above 100%. At this stage in the simulation, only 21 servents are online. The median relative RTT prediction error at instant 200 s, when these queries originated, is 497,760.252921.
7.19 Average number of query hits received for queries sent with 42 dynamic Gnutella servents when the median relative RTT prediction error is above 100%. At this stage in the simulation, only 21 servents are online. The median relative RTT prediction error at instant 200 s, when these queries originated, is 497,760.252921.
7.20 Average number of nodes reached for queries sent with 42 dynamic Gnutella servents when the median relative RTT prediction error is above 10%. At this stage in the simulation, only 21 servents are online. The median relative RTT prediction error at instant 300 s, when these queries originated, is 0.122698.
7.21 Average number of query hits received for queries sent with 42 dynamic Gnutella servents when the median relative RTT prediction error is above 10%. At this stage in the simulation, only 21 servents are online. The median relative RTT prediction error at instant 300 s, when these queries originated, is 0.122698.
7.22 Average number of nodes reached for queries sent with 42 dynamic Gnutella servents at instant 1,400 s. At this time, the network most closely resembles real-world network conditions. The median relative RTT prediction error is 0.051884.
7.23 Average number of query hits received for queries sent with 42 dynamic Gnutella servents at instant 1,400 s. At this time, the network most closely resembles real-world network conditions. The median relative RTT prediction error is 0.051884.
7.24 Average number of nodes reached for queries sent with 42 dynamic Gnutella servents at instant 1,900 s. At this time, the network most closely resembles real-world network conditions. The median relative RTT prediction error is 0.084133.
7.25 Average number of query hits received for queries sent with 42 dynamic Gnutella servents at instant 1,900 s. At this time, the network most closely resembles real-world network conditions. The median relative RTT prediction error is 0.084133.

List of Symbols

c_c: Vivaldi tuning parameter controlling the magnitude of the response to each new sample
c_e: Vivaldi tuning parameter controlling the weighting of new samples in error calculations
δ: Vivaldi timestep
d_j: Degree of node j
e^2: Squared error on a Vivaldi sample
E^2: Squared error for the entire Vivaldi system
e_i: Local Vivaldi error
e_s: Relative Vivaldi sample error
F_ij: Force node j exerts on node i
R: Sampled round trip time
w: Vivaldi sample weight

List of Abbreviations

AS: Autonomous system
BGP: Border gateway protocol
DNS: Domain name system
GUID: Globally unique identifier
HTTP: Hypertext transfer protocol
ICMP: Internet control message protocol
IP: Internet protocol, version 4
LAN: Local area network
NCS: Network coordinate system
ns-2: Network simulator 2
NSA: Neighbour selection algorithm
P2P: Peer-to-peer
PDNS: Parallel distributed network simulator
QoE: Quality of experience
QRP: Query routing protocol (in Gnutella)
RTT: Round trip time
TCP: Transmission control protocol
TTL: Time to live (field in a Gnutella message)
UMASS: University of Massachusetts (refers to a model for peer behaviour)

Chapter 1

Introduction

With the advent of peer-to-peer (P2P) networks, the landscape of data communications has been radically altered. Since the popularization of the technology through the Napster file sharing network [1], P2P has grown to be the leading source of traffic in the Internet [2].
Although file sharing is still the most popular application, P2P technology has found niches in distributed processing [3], [4], online chatting [5], [6], and gaming [7], [8], for example. Furthermore, because P2P networks rely on a distributed overlay network, they exhibit a high degree of tolerance to random node failure: an alternate path or location can usually be found for the desired resource.

Gnutella is one of the most popular P2P file sharing protocols. It is an open standard protocol implemented by many vendors, such as Limewire and Bearshare. In addition to sharing content, thousands of nodes on the Gnutella network collaborate to forward control messages, such as queries, through the Gnutella overlay topology. This topology is formed as nodes learn the addresses of other nodes from a bootstrap server and from the nodes they are already connected to. This process does not take into consideration the underlying physical topology and, as such, can lead to inefficient network utilization [9]. Nodes are as likely to connect to distant nodes as to close ones, which results in longer message latency.

1.1 Objectives

In this thesis, a modification to the Gnutella protocol is proposed in order to more closely align the overlay topology with the underlying physical topology. The proposal uses the Vivaldi coordinate system [10] to assign synthetic coordinates to each participating node in the Gnutella network. The Euclidean distance in this coordinate system may be used to predict the round trip time between two nodes. With these modifications, every time a node sends a message, it includes its Vivaldi coordinates in addition to the message payload. Thus, the receiving node, knowing its own coordinates, can estimate the round trip time to the sending node. Nodes therefore have the means to decide whether to accept connection requests based on node proximity.
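The round trip time estimate described in Section 1.1 can be illustrated with a minimal sketch. It assumes two-dimensional coordinates whose Euclidean distance is read directly as a predicted round trip time in milliseconds. The function names and the threshold rule in `accept_connection` are hypothetical and are used only for illustration; they are not the neighbour selection algorithm of the thesis.

```python
import math


def predicted_rtt(own_coords, sender_coords):
    """Predict the round trip time as the Euclidean distance
    between two synthetic coordinate vectors."""
    if len(own_coords) != len(sender_coords):
        raise ValueError("coordinates must have the same dimension")
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(own_coords, sender_coords))
    )


def accept_connection(own_coords, sender_coords, max_rtt_ms):
    """Hypothetical rule: accept a connection request only if the
    predicted round trip time does not exceed max_rtt_ms."""
    return predicted_rtt(own_coords, sender_coords) <= max_rtt_ms


# Illustrative coordinates (assumed values, not measured data).
own = (0.0, 0.0)
near_peer = (3.0, 4.0)
far_peer = (60.0, 80.0)

print(predicted_rtt(own, near_peer))
print(accept_connection(own, near_peer, max_rtt_ms=50.0))
print(accept_connection(own, far_peer, max_rtt_ms=50.0))
```

Because the coordinates travel with each message, the receiving node can make this estimate without sending any additional probe traffic.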
In order to evaluate the performance of the proposed protocol modification, we extended an existing network simulator called GnutellaSim [11] to create a new simulator: Gnutaldi [12]. This simulator models the Gnutella version 0.6 network. We rewrote significant portions of the code in order to implement statistics gathering logic and enhance the performance of the simulator. We also redesigned the message generation and parsing classes in order to make the code faster and more maintainable. The redesigned message classes included the proposed implementation of Vivaldi coordinates. The augmented protocol agent classes included routines for maintaining the coordinates at each node and inserting them into messages. The BRITE topology generator [13] was used to create a physical network topology and evaluate the speed with which queries were satisfied. For each scenario, the performance of the network with and without the proposed enhancements is compared.

1.2 Organization of the Thesis

Chapter 2 presents background information on themes relevant to this research. In Chapter 3, details are provided on the operation of the Vivaldi coordinate system. Chapter 4 introduces the proposed modifications to the Gnutella protocol. In Chapter 5, the architecture of the simulator is outlined. Next, in Chapter 6, the topic of synthetic network topology generation is discussed. Simulation results appear in Chapter 7. Finally, the conclusions of this work appear in Chapter 8.

Chapter 2

Background

This chapter provides background information on themes relevant to this thesis. The first subsection discusses the fundamentals of P2P communication. Subsection 2.2 provides details about the Gnutella P2P network. An introduction to network simulation tools and the ns-2 simulator [14] is given in Subsection 2.3. Finally, Subsection 2.4 presents the GnutellaSim simulation package, which was used in this research.
2.1 P2P Networks

In the last several years, peer-to-peer (P2P) networks have emerged as a new model for computer communication. Radically departing from the traditional client-server paradigm, P2P gives each network participant significant autonomy and an enhanced role in the fundamental operation of the network. P2P has achieved remarkable popularity in the short time it has been in use and has found applications in file sharing, distributed data processing, and online gaming, for example. P2P networks are also related to ad-hoc and sensor networks because they employ a decentralized, distributed mode of communication. P2P communication has been a strong disruptive force in networking and, as such, is an important research topic. This subsection presents important P2P properties and applications.

Figure 2.1: Client-Server networking paradigm.

2.1.1 P2P Network Properties

A fundamental premise of P2P networks is to allow nodes at the edge of the network to collaborate in a decentralized fashion. P2P networking causes systems to peer together and form a network where there is no concept of a client or server: nodes both provide and consume resources. P2P nodes also act as routers in the P2P topology, forwarding traffic destined for other peers through the network.

A traditional client-server configuration is shown in Fig. 2.1. With this model, the server holds the content or resource of interest to the clients. If the server fails or if its communication link ceases to function, all clients are deprived of the server's services. P2P networks, conversely, do not have a central point of failure. Resources and services are distributed amongst the peers participating in the network. The flat hierarchy through which P2P nodes relate is shown in Fig. 2.2. This hierarchy is in sharp contrast to the client-server paradigm represented in Fig. 2.1.
P2P networks are formed as an application-layer overlay, superimposed on the existing physical infrastructure of routers and links. This overlay, sometimes called the logical topology, is the conduit for all messages exchanged between participants in the network. Peers associate by forming neighbour relationships. Each peer is only aware of its neighbours and only communicates directly with them. The application-layer messages between neighbours are routed through the physical topology by network-layer devices (routers). An example of the relationship between physical and logical topologies is shown in Fig. 2.3. In some cases, logical links follow physical links, as in the connection between nodes 1 and 6. In other cases, nodes that are neighbours in the application-layer overlay, such as 1 and 5, are separated by numerous hops in the physical topology.

Figure 2.2: P2P networking paradigm.

A variety of devices may join a P2P network. The most common example is home computers connected to the Internet. Internet-enabled mobile phones may also participate in P2P exchanges. Because these devices at the edge of the network are not always in use, they are not always connected to the network. Transient node presence and the associated network variability are important characteristics of P2P communication. This property further sets P2P networks apart from the traditional client-server paradigm, where servers are reliable network entities that may always be contacted at the same address. While the steadfastness of the server approach may be an attractive feature, it leads to an architecture with a single point of failure. If a server malfunctions or its link to the network fails, the services provided by the server will be disrupted for all users. The distributed nature of P2P networks, conversely, guards against this type of failure scenario: there is no single point of failure.
Content or services may be provided by multiple nodes at different locations in the network.

The robustness of P2P networks has provided a haven for users interested in illicit activities such as the illegal distribution of copyrighted material. Since removing any single entity does not disable the network, it has proven very difficult for copyright owners to halt the undesired distribution of their works.

P2P file-sharing is of great concern to many network operators because of the large amount of traffic involved. Some Internet service providers have found that, at times, 90% of the traffic transmitted over their network is due to P2P applications [2]. This traffic is taxing network resources and making them less available for services that generate revenue for service providers, such as long-distance voice traffic or virtual private networks (VPNs) provisioned for customers. Furthermore, much of the P2P traffic is neither originating nor terminating in the particular provider's network: it is only being routed through a P2P node residing in that network on its way to its final destination. While policies may prevent this type of behaviour for layer-3 routing by not advertising routes through the provider's autonomous system (AS) to other ASs via the Border Gateway Protocol (BGP), routers are not aware of application-layer routing decisions made in the P2P network and, therefore, cannot intervene. Measurements in the Gnutella P2P network have shown that less than 5% of Gnutella connections link nodes that are in the same AS [9]. Thus, P2P traffic often crosses AS boundaries, which is more costly than intra-AS traffic from the service providers' point of view.

Figure 2.3: Physical and logical topologies in P2P communication.
Because of the providers' need to control P2P traffic, many network equipment manufacturers, such as Cisco/P-Cube [15] and Caspian [16], are developing devices that may identify P2P traffic flows and apply more stringent policies.

2.1.2 Applications of P2P Networks

While there are many applications of P2P technology, one of the most common is undoubtedly file-sharing. The popularity and notoriety of the Napster [1] file sharing application was largely responsible for making P2P a household word. Started in 1999, the Napster network aimed to help Internet users exchange digital music files [17]. It relied on a central server for processing queries, but the actual file exchange was done on a P2P basis, without the files ever passing through the server. Because of concerns surrounding illegal music trading on the Napster network, the Recording Industry Association of America filed a lawsuit against Napster, charging that the company had engaged in contributory copyright infringement [17]. Further to the lawsuit, Napster suspended operations for a time; nevertheless, the era of P2P networks was launched.

There are many examples of P2P technology being used for file-sharing. The Gnutella network [18] is an open-standard file-sharing community used by applications such as Limewire [19] and Bearshare [20]. Unlike Napster, it does not rely on a central server for query processing. Queries are forwarded through a logical overlay network consisting of Gnutella peers. These peers act not only as clients and servers, but also as forwarding agents for Gnutella control traffic. The actual downloading of files is done by direct communication between the peers involved.

BitTorrent [21], another popular file-swapping program, may download segments of the desired content from multiple peers at the same time.
It is also innovative in that BitTorrent peers penalize nodes that do not share sufficient content by reducing their download rate; so-called "file leeches" are, therefore, less successful. To further encourage sharing, users' BitTorrent clients offer the received parts of partially downloaded files for download by other peers: peers do not have to have the entire file to share parts of it.

Chord [22] is a P2P file sharing network that addresses the issue of efficiently locating the node or nodes that store content by employing a distributed hashing algorithm. Chord may resolve lookups by sending only O(log N) messages. Each Chord node maintains a routing database for other nodes that grows logarithmically with the size of the network. Although Chord is not widely used, its efficient lookup mechanism is very promising.

The Freenet network [23] protects the anonymity of those sharing and downloading content using its clients. The stated intent of its designers is to allow users to publish and download content without fear of censorship [23]. Version 0.7 features a scalable "darknet". Darknets are file sharing networks where nodes only connect to trusted nodes. Since human relationships (and consequently trust) create small-world networks [23], [24], Freenet may respect trust and still find a short path between two peers participating in the network [23].

Many terabytes of data are shared on these file-sharing networks and they account for an appreciable amount of the traffic on the Internet, as discussed in Subsection 2.1.1.

In addition to file sharing, P2P has found applications in online gaming. Researchers have developed a P2P version of Xiangqi (Chinese chess) [7], [8], for example, which allows users to interact with other users at the edge of the Internet without employing a central server. Online chatting systems such as MSN Messenger [5] and ICQ [6] are further applications of P2P technology.
While they do rely on central servers, they link resources (in this case people) at the edge of the network, which is the essence of P2P. Similarly, in keeping with this definition of P2P, applications that use the aggregate processing power of computers throughout the Internet are examples of P2P technology. Folding@Home [3] is an innovative P2P solution that uses the distributed processing power of thousands of computers to analyze complex protein folding and aggregation problems. SETI@Home [4] is a similar P2P system that uses the processing cycles to analyze radio signals.

There are also generic P2P frameworks, not attached to any particular application. Microsoft Research's Pastry [25] routing and location substrate forms a P2P overlay network and is the foundation for P2P applications such as the Scribe group communication system [26] and the SplitStream content distribution system [27]. Sun Microsystems' JXTA framework [28] is a set of open protocols that allow P2P communication between devices. It has been used for a range of applications, including chatting, gaming and file-sharing.

In short, P2P communication has found many applications in today's networks. It is not yet pervasive, but it has certainly been and continues to be a strong disruptive force in computer communication.

2.2 The Gnutella Network

The project described in this thesis involves improving the performance of the Gnutella P2P file-sharing network [29]. Accordingly, it seems appropriate to discuss some of the key characteristics of Gnutella. In Subsection 2.2.1 the syntax and semantics of the Gnutella protocol are discussed. In Subsection 2.2.2 details about the topology of the Gnutella network are provided, including a fundamental flaw in the way nodes associate to form the overlay.

2.2.1 The Gnutella Protocol

Gnutella is a very popular distributed file sharing protocol.
The number of nodes participating in the network was estimated at about 50,000 in 2001 [9], when P2P was still a nascent technology. The current widely deployed version is 0.6 [29], which is a two-tiered hierarchy of peers, termed servents, collaborating to share files and forward protocol traffic through the network. Gnutella is an open standard and, as such, lends itself well to study and simulation in academic circles.

Unlike the Napster network [1], which employed a central server to mediate communication between peers, Gnutella is a distributed network. When first connecting to the network, new nodes contact a "bootstrap" server to obtain the addresses of a few connected peers. However, further communication is handled through the P2P overlay without relying on servers. Once the new node has the addresses of existing nodes, it attempts to connect to them by sending Gnutella connect messages [18]. Two connected nodes are called neighbours. Several connection attempts may be necessary because nodes may not be willing to accept new connections or may have left the network.

Once at least one connection has been established, the new node begins sending periodic Gnutella ping messages. These probes (not to be mistaken for Internet Control Message Protocol (ICMP) ping messages) are used to search for other nodes willing to accept new connections. They are sent by the new node to all its neighbours, which, in turn, flood them to all their neighbours. This recursive flooding continues until the ping's time-to-live (TTL) field, which is decremented at each hop, reaches zero. Along the way, any node receiving a ping and willing to accept new connections responds with a pong message. The pong back-propagates to the originator of the ping, which may decide to attempt a connection with the pong's sender.

In order to locate content shared in the Gnutella network, nodes must send query messages.
Queries contain the search criteria (e.g., a file name) and are flooded similarly to ping messages. When a node receives a query that matches a resource it shares, it responds with a query hit message, which is back-propagated to the query originator. The originator may then decide to download the file from the node that sent the query hit. This is done directly, through an HTTP-like protocol [18], and the traffic does not pass through the P2P overlay network. Incidentally, this download activity represents a large proportion of the traffic on the Internet [2].

Gnutella messages are carried over a reliable TCP (transmission control protocol) transport [29]. The default TCP port is 6346, although servents may negotiate a different port. The initial connection messages are sent in clear-text form, in a format somewhat similar to HTTP. Figure 2.4 [29] shows a sample connection transaction. Servent A is attempting to establish a Gnutella connection with Servent B. After opening a TCP connection, A sends a Gnutella CONNECT message, indicating its version (0.6), the Gnutella application (Bearshare, version 1.0) and the protocol options it supports (pong-caching and GGEP). Servent B responds with an OK message and a list of its supported extensions. The final OK message from servent A concludes the three-way handshake and establishes the Gnutella connection. The Private-Data headers encapsulate vendor-specific information.

After the connection has been established, servents exchange binary messages. The header structure for these messages is shown in Table 2.1 [29].

Table 2.1: Header structure for Gnutella binary messages.

    Octets   Description
    0-15     Message globally unique identifier
    16       Payload Type
    17       TTL (Time To Live)
    18       Hops
    19-22    Payload Length

The message globally unique identifier (GUID) is used to avoid forwarding the same message twice: peers will drop messages with identifiers they have encountered before.
    Servent A                         Servent B

    GNUTELLA CONNECT/0.6
    User-Agent: BearShare/1.0
    Pong-Caching: 0.1
    GGEP: 0.5
                                      GNUTELLA/0.6 200 OK
                                      User-Agent: BearShare/1.0
                                      Pong-Caching: 0.1
                                      GGEP: 0.5
                                      Private-Data: 5ef89a
    GNUTELLA/0.6 200 OK
    Private-Data: a04fce

Figure 2.4: Gnutella connection exchange.

The payload type field identifies the type of message. The following values are applicable [29]:

- 0x00 = Ping
- 0x01 = Pong
- 0x02 = Bye
- 0x40 = Push
- 0x80 = Query
- 0x81 = Query Hit

The TTL field is a mechanism for limiting the scope of messages. It is initialized to a positive value, normally 7 [9]. Each time a message is forwarded by a servent, the TTL field is decremented. A servent will not forward a message with a TTL of zero. TTL fields are used widely in networking protocols such as IP.

The hops count stored in octet 18 of the header keeps track of how many times a Gnutella message has been forwarded. It is initialized to zero and incremented by every servent that floods the message to its neighbours.

The payload length field indicates the length of the message following the header.

Other than optional extensions, ping messages do not contain a payload. They are simply probes to find servents willing to accept new connections.

Pong messages are a reply to pings. They indicate that a servent is willing to accept a connection and they provide information about the responding peer. Table 2.2 shows the pong message structure [29].

Table 2.2: Gnutella pong message structure.

    Description (fields in payload order)
    Port number on which the peer will accept connections.
    IP address of the responding peer.
    Number of shared files.
    Number of kilobytes shared.
    Optional protocol extension.

Bye messages are an optional indication that a peer wishes to terminate a Gnutella connection. They bear no payload and are always sent with a TTL of 1, so that they are not accidentally propagated beyond the intended target [29].
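The header layout of Table 2.1, together with the GUID, TTL and hops rules described above, can be sketched in a few lines of Python. This is a minimal illustration rather than code from any Gnutella implementation: the names `Header`, `parse_header` and `prepare_forward` are hypothetical, and the little-endian encoding of the payload length is an assumption that the text above does not state.

```python
import struct
from dataclasses import dataclass, replace

HEADER_LEN = 23  # octets 0-22, as laid out in Table 2.1

PAYLOAD_TYPES = {
    0x00: "Ping", 0x01: "Pong", 0x02: "Bye",
    0x40: "Push", 0x80: "Query", 0x81: "Query Hit",
}


@dataclass
class Header:
    guid: bytes          # octets 0-15
    payload_type: int    # octet 16
    ttl: int             # octet 17
    hops: int            # octet 18
    payload_length: int  # octets 19-22


def parse_header(data: bytes) -> Header:
    """Decode the 23-octet binary message header."""
    if len(data) < HEADER_LEN:
        raise ValueError("a Gnutella header needs at least 23 octets")
    # Assumption: the payload length is little-endian ("<"); the layout is
    # 16-byte GUID, three single octets, then a 4-byte unsigned length.
    fields = struct.unpack("<16sBBBI", data[:HEADER_LEN])
    return Header(*fields)


def prepare_forward(header: Header, seen_guids: set):
    """Return the header to flood onward, or None if the message is dropped."""
    if header.guid in seen_guids:
        return None  # already encountered: never forward the same message twice
    seen_guids.add(header.guid)
    if header.ttl <= 1:
        return None  # decrementing would leave a TTL of zero: do not forward
    return replace(header, ttl=header.ttl - 1, hops=header.hops + 1)
```

Under this reading, a Bye message sent with a TTL of 1 is accepted by the neighbour that receives it but is never flooded further, which matches the behaviour described for Bye messages.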
Push messages are used to download files from servents that are unable to accept incoming connections [29]. Nodes that are protected by a firewall, for example, would fall into this category. The push message instructs the receiver to open a connection to the specified peer and transfer the indicated content. The structure of push messages is shown in Table 2.3.

Table 2.3: Gnutella push message structure.

    Octets   Description
    0-15     Servent identifier for the target of the push message.
    16-19    The index identifying the desired content.
    20-23    The IP address of the requesting peer.
    24-25    The TCP port the requesting peer is listening on.
    26-      Optional protocol extension.

Query messages are used to locate content in the Gnutella network. Their structure is shown in Table 2.4 [29].

Table 2.4: Gnutella query message structure.

    Octets   Description
    0-1      The minimum download speed required (in kbps).
    2-       NUL-terminated search criteria string.
    Others   Optional protocol extension.

Query hits are sent in response to query messages. They indicate that the requested content is available at the originator of the query hit. The structure of query hit messages is shown in Table 2.5 [29].

Table 2.5: Gnutella query hit message structure.

    Octets   Description
    0        Number of matching files.
    1-2      The TCP port to use for download requests.
    3-6      The IP address of the responding peer.
    7-10     The speed (in kbps) of the responding peer.
    11-      The matches, presented as shown in Table 2.6 [29].

The previous release of Gnutella, version 0.4 [18], was a flat hierarchy of servents where all peers were considered equal. Powerful computers with multi-megabit connections to the Internet were treated in the same way as piddling home computers with 56-kilobit dial-up connections.
With the increase in Gnutella's popularity, and the concomitant increase in network traffic, underpowered computers with slow connections began to be overwhelmed with the task of forwarding Gnutella control messages. As a result, a two-tiered network was implemented in version 0.6 of the protocol, with two classes of peers: ultrapeers and leaves. Ultrapeers shield their attached leaves from P2P network traffic that is not relevant to them. Leaves only keep a few connections to ultrapeers open [29] and these ultrapeers only send queries to leaves they think may satisfy them. Leaves do not forward queries or any other Gnutella control traffic, which reduces the load on their limited resources.

In order to determine which queries should be forwarded to leaves, ultrapeers normally use the query routing protocol (QRP). Leaves construct a hash table containing entries for each of the words in the names of the resources they are sharing. This hash table is communicated to a leaf's ultrapeers. Ultrapeers will forward queries to leaves according to search criteria matches against this table [29].

Table 2.6: Gnutella query hit result set structure.

    Octets   Description
    0-3      Index assigned by the responding peer that identifies the file.
    4-7      Size of the file in bytes.
    8-       The null-terminated file name string.
    Others   Optional protocol extensions.

According to the specification [29], ultrapeer election is based on the following criteria:

- Firewall protection: The peer must not be shielded by a firewall.
- Operating system: Certain operating systems have more scalable socket implementations than others [29]. These include Linux, Windows 2000/NT/XP, and Mac OS X.
- Bandwidth: Eligible peers should have at least 15 KB/s downstream capacity and 10 KB/s upstream bandwidth.
- Uptime, stability: Peers should have been in the network for at least a few hours.
- Sufficient memory and processing power: Forwarding control traffic requires memory and CPU cycles.
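The election criteria listed above can be read as a local predicate that each servent evaluates for itself. The Python sketch below is only one possible reading: the bandwidth thresholds and the operating system list come from the criteria above, while the class `PeerProfile`, the function `ultrapeer_eligible`, the two-hour uptime threshold (standing in for "a few hours") and the boolean resource flag are assumptions introduced for illustration.

```python
from dataclasses import dataclass

# Operating systems named in the criteria as having scalable socket support.
SCALABLE_OS = {"Linux", "Windows 2000", "Windows NT", "Windows XP", "Mac OS X"}

MIN_DOWNSTREAM_KB_S = 15.0   # from the bandwidth criterion
MIN_UPSTREAM_KB_S = 10.0     # from the bandwidth criterion
MIN_UPTIME_HOURS = 2.0       # assumption: stands in for "a few hours"


@dataclass
class PeerProfile:
    firewalled: bool
    operating_system: str
    downstream_kb_s: float
    upstream_kb_s: float
    uptime_hours: float
    has_spare_resources: bool  # assumption: memory/CPU reduced to one flag


def ultrapeer_eligible(peer: PeerProfile) -> bool:
    """Return True if the peer satisfies every election criterion."""
    return (
        not peer.firewalled
        and peer.operating_system in SCALABLE_OS
        and peer.downstream_kb_s >= MIN_DOWNSTREAM_KB_S
        and peer.upstream_kb_s >= MIN_UPSTREAM_KB_S
        and peer.uptime_hours >= MIN_UPTIME_HOURS
        and peer.has_spare_resources
    )
```

Because the specification leaves the precise conditions to each implementation, a real client would likely tune these thresholds or weigh the criteria differently.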
The above rules are quite loosely specified, and it is the responsibility of each implementation to define the precise conditions under which a Gnutella client should become an ultrapeer. Since ultrapeer election is done in a distributed system, each node bears the responsibility for choosing to become an ultrapeer or not [29]. Ultrapeers may change roles to become leaves once again if the implementation deems it appropriate.

The Gnutella 0.6 network may interwork with legacy 0.4 servents. These servents establish neighbour relationships directly with ultrapeers, and behave in exactly the same way as if they were peering with 0.4 implementations. Ultrapeers essentially treat legacy peers as ultrapeers.

2.2.2 The Gnutella Network's Topology

The Gnutella network has been extensively studied. Because it is an open standard, it lends itself particularly well to data collection and analysis. One especially comprehensive analysis was conducted at the University of Chicago [9]. By developing a crawler program, researchers were able to use the ping and pong messages exchanged in the Gnutella network to gather information about its topology. According to these measurements, made in 2001, the Gnutella network had about 50,000 nodes in its largest connected component. Furthermore, the crawler compiled a list of some 400,000 nodes that had been active at some point during the seven-month study [9].

The Gnutella network was found to exhibit power law, or scale-free, properties. Power law distributions were studied extensively by Pareto [30] and are governed by:

$$f_v \propto y_v^{-k}. \qquad (2.1)$$

The power law states that an attribute $f$ of vertex $v$ in a network is governed by the attribute $y$ of that vertex, raised to the constant negative power $k$. In the case of the Gnutella network, the node degree distribution obeys a power law [9]: the number of nodes in the network exhibiting a particular node degree diminishes according to a power law as the node degree increases.
If $D$ is a node degree and $f_D$ is the number of nodes with $D$ neighbours, the power law implies that

$$f_D = a D^{-k}, \qquad (2.2)$$

where $a$ and $k$ are constants. This is similar to one of the power laws observed by Faloutsos et al. [31] for the AS-level topology of the Internet. The World Wide Web graph, where the vertices are the web pages and the edges are the hyperlinks, also exhibits a power law distribution [32], [33].

Power law networks are common, even beyond the field of communication networks, and their properties have been well studied. Power laws have been observed in cellular participation in biochemical reactions, Hollywood actor collaboration, protein regulatory networks, technical paper co-authorship, and sexual contacts [24]. Power law networks are highly resilient in the face of random node failure [24]. It has indeed been shown that a large proportion of the nodes in a scale-free network may be removed without severely interrupting network connectivity. Conversely, these networks are highly susceptible to the selective removal of a small number of hub nodes. These are the rare nodes with a very high number of neighbours. The failure of these nodes may catastrophically disrupt the network.

Nodes in the Gnutella network are generally close to each other. It has been observed that 95% of node pairs are less than 7 hops apart [9]. Since the most common TTL value used in Gnutella messages is 7, this implies that almost all flooded messages reach almost all Gnutella nodes [9]. This traffic is significant, not only in the Gnutella network, but in the Internet as a whole [9]. Gnutella control traffic alone accounted for about 1.7% of the estimated total traffic on the United States core network in 2000 [9]. This does not even include file transfers, which consume orders of magnitude more bandwidth than control traffic.
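To make the cost of TTL-limited flooding concrete, the following Python sketch counts how many nodes a flooded message reaches and how many transmissions it generates on a small overlay. It is an idealized model, not a measurement tool: the function `flood` and the adjacency-list representation are hypothetical, every overlay link is assumed to have the same delay (so a breadth-first traversal gives the arrival order), and each node is assumed to forward only the first copy of a message it receives, to every neighbour except the one it came from.

```python
from collections import deque


def flood(neighbours, origin, ttl):
    """Simulate flooding one message; return (reached nodes, transmissions)."""
    seen = {origin}
    transmissions = 0
    # Each entry: (node, neighbour the message arrived from, TTL left to spend).
    queue = deque([(origin, None, ttl)])
    while queue:
        node, sender, remaining = queue.popleft()
        if remaining == 0:
            continue  # a message with a TTL of zero is not forwarded
        for peer in neighbours[node]:
            if peer == sender:
                continue  # do not send the message back where it came from
            transmissions += 1  # sent even if the peer later drops a duplicate
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, node, remaining - 1))
    return seen - {origin}, transmissions
```

On a fully connected triangle, for instance, the two other nodes are reached with four transmissions, two of which are duplicates that the receivers drop; this redundancy grows quickly in densely connected overlays.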
Given the sheer quantity of traffic being exchanged on the network, it seems appropriate to investigate the efficiency of the Gnutella overlay. Efficiency, in this connection, refers to the efficient utilization of network resources during message exchange.

An example of how the Gnutella network's topology may lead to inefficient use of network resources is shown in Fig. 2.3. All communications in the Gnutella network rely purely on the topological information known at the application layer. Hence, nodes may only send messages directly to their logical neighbors. If node 1 needs to retrieve content stored in node 2, its messages must first pass through node 5. This represents only two hops in the logical topology. In the physical topology, this corresponds to at least 5 hops: either (1, 6, 7, 5, 4, 2) or (1, 3, 4, 5, 4, 2). The inefficiency is particularly egregious in the second path, where the physical link between nodes 4 and 5 is traversed twice. If nodes 1 and 2 had elected to be neighbors in the logical topology, only a single physical hop would have been required and no links would have been traversed twice. This would have resulted in lower message latency and a more efficient use of network bandwidth.

It has been observed that less than 5% of Gnutella overlay links connect nodes that are in the same autonomous system (AS) [9]. Autonomous systems, being controlled by a single administrative authority, imply a certain notion of locality. Intra-AS communication is often faster than communication between ASs. It is also less expensive for network operators.

Another potential indicator of locality is the domain name hierarchy [9]. The round trip time for communication between two hosts on the same domain (e.g., sfu.ca) is expected to be smaller than for hosts in distinct domains. Thus, if Gnutella hosts established neighbour relationships with hosts on the same domain, they would incur lower communication costs.
Ripeanu and Foster analyzed network entropy to test if the Gnutella network contains a notion of hierarchy [9]. They define the entropy of a set $C$ of size $|C|$ as

$$S(C) = \sum_{i=1}^{n} \left( -p_i \log(p_i) - (1 - p_i) \log(1 - p_i) \right), \qquad (2.3)$$

where $p_i$ is the probability of randomly selecting a host with domain $i$ and $n$ is the number of distinct domain names [9]. The entropy of a network with $|C|$ nodes and $k$ clusters is defined as [equation (2.4)].

The entropy with clustering (2.4) around highly connected nodes in the Gnutella overlay was not lower than the entropy without clustering (2.3). Hence, Gnutella nodes cluster independently of the domain hierarchy [9].

The two experiments [9] show that the Gnutella overlay topology is not well matched to the underlying physical topology. As a result, Gnutella uses network resources inefficiently. Not only does this have an adverse effect on network utilization as a whole, but it also negatively impacts users because their messages take longer to circulate through the network, which delays their eventual download of the desired content. It is clearly desirable to improve the way the Gnutella overlay matches the underlying physical infrastructure in order to achieve better network utilization and improve users' quality of experience (QoE).

2.3 Network Simulation

The complexity of real-world networks often precludes the use of closed-form mathematical models [34]. It is not feasible, for example, to find an equation that describes the behaviour of the Internet. The use of testbeds of realistic scale is also generally not possible, given the large number of systems involved. It would be prohibitively expensive to construct a 50,000-node network to model the Gnutella network, for instance. Thus, research must often rely on simulation in order to approximate the networks of interest. Simulation presents a number of potential pitfalls.
In particular, using an overly simplified simulation model may cause critical aspects of Internet behaviour to be overlooked [35]. Also, if many researchers rely on the same simulator, they risk being affected by the same software defects and underlying assumptions [35], which may lead to erroneous conclusions. Nevertheless, simulation does play an important role when measurement and rigorous mathematical modelling are not possible.

In this subsection the simulation tools used in investigating the effects of the proposed enhancements to the Gnutella network are discussed.

2.3.1 Opnet

Opnet [36] is one of the most popular tools used in industry. It is a powerful simulator with a rich library of simulated network devices and support for user-defined state machines. Its flexibility is limited, however, because the user cannot access and modify the source code as with open-source simulation packages.

2.3.2 SSFNet

SSFNet [37] is a collection of open-source models for protocols and network elements. It is implemented in Java and utilizes SSF, the Scalable Simulation Framework. While the underlying framework has a high-performance C++ binding, the Java-based models raise important scalability concerns due to the inferior performance of Java in speed-sensitive applications.

2.3.3 Dedicated P2P Simulators

There are a number of tools dedicated to P2P network simulation. Among them is p-sim [38], which achieves high scalability by neglecting packet-level details. 3LS [39], the "three-layer simulator", conversely, is very much predicated on the importance of such details. Simp2 [40] is another P2P simulator, which is intended to simulate simple file sharing networks. It does not, however, take into account the transient presence of nodes, which is a defining characteristic of P2P networks. There are many other simulators, but overall, no dedicated P2P network simulator has achieved wide acceptance in the research community.
The ns-2 network simulator [14] is one of the most popular discrete event simulators [41]. It evolved through the contributions of researchers at the Information Sciences Institute at the University of Southern California, and elsewhere. ns-2 has received funding from DARPA and the National Science Foundation [14] and continues to grow as developers contribute to its open-source codebase. Because of its extensive support for IP, TCP, mobile and routing protocols, ns-2 has gained favour in the research community. Also, because the source code is freely available and may be modified, it is a versatile and flexible tool.

The realtime-critical portions of the ns-2 engine, such as packet and event processing, are written in the C++ programming language. Scripts that drive the tests are normally written in oTCL, which is the object-oriented version of the popular Tool Command Language (TCL). This combination of C++ and oTCL allows the user to benefit from the performance of a compiled language where needed and the flexibility of an interpreted language when appropriate.

Because they simulate packet-level exchanges, the processing and memory overhead of ns-2 simulations is quite high. Consequently, ns-2 is not particularly scalable. Simulations virtually grind to a halt with more than a few hundred nodes. A parallel version of ns known as PDNS [42] offers the possibility of distributing simulations on 8-16 workstations connected by a LAN.

ns-2 was employed in this research for several reasons. Although it introduces scalability issues, the level of detail supported by ns-2 is important. It has been shown that the performance of P2P networks is highly sensitive to the details of the underlying physical network [9], [11], [35], [43]. For this reason, more scalable P2P simulators that sacrifice the packet-level information modelled by ns-2 were ruled out.
Also, the fact that it was possible to customize the source code to implement arbitrary statistics gathering where most convenient was a key feature. ns-2 is well regarded in the research community and has a large base of users available for support; this weighed heavily in the selection process. Finally, the fact that there was a Gnutella simulation framework available for ns-2 made the choice to use ns-2 clear.

2.4 The GnutellaSim Simulation Package

GnutellaSim [44] is a packet-level simulator for the Gnutella network. It relies on a "scalable and extensible packet-level P2P" [11] simulation framework developed at Georgia Tech. This framework is in turn designed to run with the ns-2 simulator, among others. The framework comprises a number of concrete classes that provide basic P2P packet forwarding and infrastructure services. It also defines a number of abstract classes, intended to be used as bases for particular P2P protocol implementations, such as GnutellaSim.

GnutellaSim extends ns-2's existing TCP implementation by introducing several new features [11], including a socket-like interface, dynamic connection establishment of TCP sessions and real payload transfer.

The software is structured in a three-layer architecture. The application layer is responsible for the users' behaviour profile, and the initiation of protocol messages such as queries. The protocol layer is concerned with protocol message semantics, network formation and message forwarding. The socket adaptation layer is a bridge between the socket-like interface provided by GnutellaSim's framework to the application and the underlying ns-2 simulator.

GnutellaSim relies on the so-called UMASS model [45] to characterize peer behaviour.
This model, as implemented in GnutellaSim, specifies the following parameters for peers:

- the average time they are offline
- the average time they are idle (not sending queries)
- the probability of going offline after a successful query
- whether they share content or not (freeloaders)
- the number of files they share.

When a query arrives at a node, the probability of the node having the requested content, and generating a query hit, is conditioned by the number of files shared by that node. It is possible to define multiple classes of peers, with different parameter values, in the same simulation.

GnutellaSim implements the Gnutella 0.6 [29] protocol, with support for leaves, ultrapeers and legacy version 0.4 peers. The role of a node, however, is specified at configuration time and is static throughout the simulation. GnutellaSim was used as the basis for evaluating the proposed modifications to the Gnutella protocol.

Chapter 3

The Vivaldi Coordinate System

In this chapter, the Vivaldi coordinate system is discussed. Vivaldi was used in the proposed modifications to improve the performance of the Gnutella protocol. The need for network coordinates is presented in Subsection 3.1. Several network coordinate systems are then introduced in Subsection 3.2. Finally, in Subsection 3.3, the operational details of the Vivaldi coordinate system are presented.

3.1 The Need for Network Coordinates to Improve Network Performance

As discussed in Subsection 2.2.2, the Gnutella network exhibits a mismatch between the physical and logical (overlay) topologies. This mismatch causes inefficient network utilization as peers establish neighbour relationships without considering the underlying physical network. It would be more efficient to bias the selection of neighbours to favour nodes that are physically close, thus aligning the P2P overlay with the physical topology.
Since queries and query hits will propagate faster, the modifications should lead to an improved user QoE.

Round trip time (RTT) is a common measure of closeness in networks. RTT is the time it takes for a message to propagate from the sender to the receiver and to return to the sender. The RTT is of the order of milliseconds in wireline networks. It is reasonable to base the formation of the overlay on predicted RTT, choosing neighbors to which the RTT is low. Network coordinate systems provide a means to estimate inter-node RTT without the need for explicit measurement. They use regular protocol communication as a means to convey information, and nodes are therefore continually updating their coordinates and distance estimates to their neighbours.

One simple alternative to using network coordinate systems would be to use ICMP echo (ping) messages to measure inter-node latency. It is not, however, feasible for every node in the Gnutella network to measure the RTT to every other node in the network when evaluating prospective neighbours. This would lead to $O(N^2)$ ICMP echo messages for an $N$-node network. Even if that were not prohibitively costly, there is no single resource that stores all the addresses of Gnutella nodes. Hence, it would not be possible to determine all the participants in the network in order to measure a round trip time to each one.

Nevertheless, when a Gnutella node learns of a potential neighbor, it should be in a position to decide whether it should peer with that node. The node receiving the connection request could simply ping the request originator before deciding whether to connect. It would thus obtain an instantaneous RTT measurement to the originator. This measurement could then be compared to the RTT to other neighbours and the closest ones could be selected. In order to ensure a valid comparison, all neighbour RTT values should be updated regularly, through polling.
Such explicit measurements, especially when conducted on a regular basis, can be unattractive because their overhead cost often outweighs the benefit they yield [10].

Another potential approach would be to perform clustering based on one of the two locality indicators identified in Chapter 2: autonomous systems or domain names. Gnutella hosts are not aware of their AS. This information is only known by the routers propagating topology information through the network. For this reason, deliberately selecting neighbours within the same AS is not possible. Clustering based on domain names is possible, but would require a DNS lookup for every connection request, which could be onerous. With this method, it would be important to ensure that peers maintain a certain number of neighbors from outside their domain, otherwise the Gnutella network would become fragmented into islands corresponding to domain boundaries. Such a disconnected network would limit the horizon of searches and reduce the probability of locating desired content. Clustering based on domain names is not ruled out as a viable solution, but it is not explored in this research and remains an area for possible future investigation.

One final approach to be considered would be to use an RTT limit and only accept connections from nodes below a specific RTT value. By imposing a large enough minimum node degree $n$ (that is, by relaxing the maximum RTT requirement for the first $n$ connections), the probability of obtaining a connected network is high [46]. The difficulty with this approach is choosing an appropriate RTT threshold. This could be an area for further research.

Other methods notwithstanding, network coordinate systems appear to be a most appropriate tool for biasing the formation of the P2P overlay to favour connections between nodes that are close together.
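For comparison with the coordinate-based approach pursued in this chapter, the RTT-limit alternative mentioned above reduces to a very small admission rule. The Python sketch below is purely illustrative: the function `accept_connection` and its parameter names are hypothetical, and the text gives no guidance on suitable values for the threshold or the minimum degree.

```python
def accept_connection(current_degree, rtt_ms, min_degree, rtt_limit_ms):
    """Decide whether to accept a neighbour under the RTT-limit approach.

    The RTT requirement is relaxed until the node has `min_degree`
    neighbours, which keeps the overlay likely to stay connected;
    after that, only peers below the RTT limit are accepted.
    """
    if current_degree < min_degree:
        return True  # first n connections: accept regardless of RTT
    return rtt_ms < rtt_limit_ms
```

The sketch makes the stated difficulty visible: all of the behaviour hinges on the choice of `rtt_limit_ms`, for which no obvious value exists.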
3.2 Network Coordinate Systems

Network coordinate systems (NCSs) assign coordinates to each node in the system and use the distance (Euclidean, Manhattan, Mahalanobis) between those coordinates to estimate the physical distance (RTT) between nodes [10]. Consider a system where coordinates are represented as 4-dimensional vectors. There are two nodes: $N_1$, with coordinates (2, 3, 4, 7), and $N_2$, with coordinates (4, 3, 1, 2). Using the Euclidean distance function, the round trip time between the two nodes is estimated as:

$$RTT = \sqrt{(2 - 4)^2 + (3 - 3)^2 + (4 - 1)^2 + (7 - 2)^2} = \sqrt{38}. \qquad (3.1)$$

Many coordinate systems rely on fixed infrastructure nodes in order to calculate node coordinates. Global Network Positioning (GNP) [47], Network Positioning System (NPS) [48] and Lighthouse [49] use landmark or beacon nodes as reference points for coordinate calculation. Network participants derive their coordinates by measuring their RTT to landmarks. While these systems predict RTTs between nodes with some success, their reliance on fixed infrastructure nodes makes them incompatible with the P2P paradigm.

There are also coordinate systems that are fully decentralized, with no dependence on infrastructure nodes. The Practical Internet Coordinates (PIC) system [50] does not rely on fixed landmark nodes. Nevertheless, its oversensitivity to changing network conditions may make it unsuitable in dynamic network conditions [10] such as P2P environments. Vivaldi [10] is another decentralized coordinate system, which is used by the Chord [22] P2P network's lookup algorithm. Vivaldi is presented in detail in Subsection 3.3.

3.3 Vivaldi Operation

Vivaldi [10] is a decentralized system used to assign synthetic coordinates to nodes participating in a network. It does not rely on fixed infrastructure nodes and, as such, may be suitable for P2P networks.
The Vivaldi coordinate system employs the Euclidean distance between two nodes' coordinates to estimate the RTT between them (3.1). This subsection presents the operational details of the Vivaldi algorithm.

3.3.1 Error Minimization

The error for a coordinate pair is defined as the difference between the predicted RTT (derived from the coordinates) and the actual RTT [10]. The squared error function is

$$e = (R - \|x_i - x_j\|)^2, \qquad (3.2)$$

where $R$ is the actual RTT, and $x_i$ and $x_j$ are the two coordinates. If $R_{ij}$ is the actual RTT value between nodes i and j, and $x_i$ and $x_j$ are defined as the coordinates of i and j, then the squared error $E$ for the system is [10]

$$E = \sum_i \sum_j (R_{ij} - \|x_i - x_j\|)^2. \qquad (3.3)$$

The Vivaldi algorithm employs the squared error function because it is analogous to spring relaxation in a physical spring-mass system [10]. These associations follow if a spring is placed between each pair of nodes for which latency measurements exist [10]:

- Length of the spring: models the distance between the nodes given their current coordinates.
- Spring rest position: occurs when the coordinates predict the RTT with zero error.
- Potential energy of the spring: the square of its displacement from its rest position. Models the error in the coordinate pair.
- Potential energy of the system: the squared-error function (3.3). Minimizing this function gives optimal coordinates.

3.3.2 Spring Relaxation and Coordinate Adjustment

The Vivaldi algorithm models the movements of the nodes under the forces applied by the conceptual springs between them [10]. The algorithm seeks to minimize the potential energy of the spring system. Let $F_{ij}$ be the force that node j exerts on node i. Hooke's law [51] states that the force is proportional to the spring's displacement from its rest position, and in the opposite direction.
Hence, the force is [10]

$$F_{ij} = (R_{ij} - \|x_i - x_j\|) \times u(x_i - x_j), \qquad (3.4)$$

where $(R_{ij} - \|x_i - x_j\|)$ is the magnitude of the spring displacement and $u(x_i - x_j)$ is a unit vector in the direction of the force [10] (pushing i along a line connecting it to j, either closer or farther). If nodes have identical coordinates, $u(x_i - x_j)$ is defined as a unit vector in an arbitrary direction [10].

Since the actual round trip time $R_{ij}$ is not known, nodes adjust their coordinates in response to sampled RTT values learned from communication with other nodes. Based on these samples, nodes allow their coordinates to be "pushed" for a short time $\delta$ by the inter-node force (3.4) [10]. For a sample RTT $r_{ij}$ between nodes i and j, node i will adjust its coordinates to:

$$x_i = x_i + \delta \times (r_{ij} - \|x_i - x_j\|) \times u(x_i - x_j). \qquad (3.5)$$

3.3.3 Vivaldi Algorithm

With every message sent, nodes participating in the Vivaldi process append three additional values: their coordinates, their estimate of the error on those coordinates, and a timestamp so that the receiver can calculate the RTT. A node receiving a message will apply the Vivaldi algorithm, which can be summarized in the following steps [10]:

1. Calculate the credence to be given to the new sample: If the sample bears a high error, the receiving node will in response adjust its coordinates only slightly. Node movement is also conditioned by the local error. If a node has high local error, it will give more weight to reports from other nodes. The sample weight [10] is defined as the ratio of the local error to the sum of the local and sample errors:

$$w = \frac{e_i}{e_i + e_j}, \qquad (3.6)$$

where $e_i$ and $e_j$ are the local and remote errors, respectively.

2. Calculate the relative error of this sample: Based on the RTT predicted by the coordinates, $\|x_i - x_j\|$, and the measured RTT ($rtt$) [10]:

$$e_s = \frac{\left|\, \|x_i - x_j\| - rtt \,\right|}{rtt}.$$

3. Update the local error:

$$e_i = e_s \times c_e \times w + e_i \times (1 - c_e \times w),$$

where $c_e$ is a tuning parameter.

4. Update the local coordinates: The amount to move the coordinates, $\delta$, is a constant proportion $c_c$ of the calculated sample weight (3.6), that is, $\delta = c_c \times w$.
The recommended value $c_c = 0.25$ is based on empirical observation [10]. The suggested initial value for $\delta$ is 1. The coordinates are updated as [10]

$$x_i = x_i + \delta \times (rtt - \|x_i - x_j\|) \times u(x_i - x_j).$$

The adaptive timestep $\delta$ employed by Vivaldi helps to achieve fast convergence and low oscillation by "believing" nodes with relatively low error more than nodes with high error [10].

3.3.4 Vivaldi Coordinate Types

After examining Euclidean and spherical coordinates, and evaluating their performance, the authors of the seminal Vivaldi paper [10] chose to employ Euclidean coordinates with height vectors. These height vectors model packet propagation time through the access link into the Internet's core. Height vectors redefine some of the usual vector operations as follows [10]:

1. Difference is defined as [10]:

$$[x, x_h] - [y, y_h] = [(x - y),\; x_h + y_h],$$

where $x$ and $y$ are Euclidean coordinate vectors, and $x_h$ and $y_h$ are their height components. It is worth noting that whilst the Euclidean coordinates are subtracted as usual, the height components are added together in the difference operation. In Euclidean space, if a Vivaldi node is too close to nodes in opposite directions, the forces applied on it will cancel out [10]; with height vectors, the forces will push the node "up": its height component will increase even in the presence of equal and opposite forces.

2. Magnitude is defined as [10]:

$$\|[x, x_h]\| = \|x\| + x_h,$$

where $x_h$ is a positive value.

3. Scaling is defined as [10]:

$$\alpha \times [x, x_h] = [\alpha x,\; \alpha x_h],$$

where $\alpha$ is a scalar.

Height vectors were found to predict the RTT more precisely than 2- and 3-dimensional Euclidean or spherical coordinates [10].

3.3.5 Vivaldi Accuracy

In a study involving 1,740 hosts on the Internet, the Vivaldi coordinate system, with only two Euclidean dimensions and a height component, was found to predict round trip time with a median relative error of 11% [10]. This result was achieved using Internet domain name servers as the nodes for which the RTT was to be predicted.
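To make the quantities behind this accuracy figure concrete, the update steps of Subsection 3.3.3 and the height-vector operations of Subsection 3.3.4 can be sketched as follows. This is a minimal illustration and not the thesis implementation: the names `HeightVector` and `vivaldi_update` are hypothetical, the value of `C_E` is assumed equal to the recommended `C_C`, the clamping of the height to non-negative values is an added assumption, and the sketch assumes a positive measured RTT and positive error estimates.

```python
import math
import random
from dataclasses import dataclass

C_C = 0.25  # timestep constant c_c (value recommended in the text)
C_E = 0.25  # error constant c_e (assumed here; the text only calls it tunable)


@dataclass
class HeightVector:
    x: float
    y: float
    h: float  # height component

    def minus(self, other):
        # Difference: Euclidean parts subtract, heights add.
        return HeightVector(self.x - other.x, self.y - other.y, self.h + other.h)

    def plus(self, other):
        # Ordinary component-wise sum, used to apply a displacement.
        return HeightVector(self.x + other.x, self.y + other.y, self.h + other.h)

    def norm(self):
        # Magnitude: Euclidean norm plus height.
        return math.hypot(self.x, self.y) + self.h

    def scaled(self, a):
        # Scaling: every component, including the height, is multiplied by a.
        return HeightVector(a * self.x, a * self.y, a * self.h)


def unit(v):
    """Unit vector along v; an arbitrary direction if v has zero magnitude."""
    n = v.norm()
    if n == 0.0:
        angle = random.uniform(0.0, 2.0 * math.pi)
        return HeightVector(math.cos(angle), math.sin(angle), 0.0)
    return v.scaled(1.0 / n)


def vivaldi_update(xi, ei, xj, ej, rtt):
    """Return node i's new (coordinates, error) after one RTT sample from j."""
    diff = xi.minus(xj)
    predicted = diff.norm()
    w = ei / (ei + ej)                            # step 1, sample weight (3.6)
    es = abs(predicted - rtt) / rtt               # step 2, relative sample error
    ei_new = es * C_E * w + ei * (1.0 - C_E * w)  # step 3, local error update
    delta = C_C * w                               # step 4, adaptive timestep
    moved = xi.plus(unit(diff).scaled(delta * (rtt - predicted)))
    moved.h = max(moved.h, 0.0)                   # assumption: keep height >= 0
    return moved, ei_new
```

For example, a node at the origin with error 1 that measures an RTT of 10 to a node at (3, 4) with error 1 and zero height moves away from it, so the predicted RTT rises from 5 toward the measured value.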
Vivaldi's authors used the King [52] method in order to measure the actual RTT between each pair of nodes in the set. For example, to measure the RTT between A and B, the probing host first measures the RTT to A. It then requests that A resolve a domain served by B [10]. The difference between the two times is an estimate of the RTT between A and B. The RTT was continuously measured ($100 \times 10^6$ times) over the course of a week, and compared with the results produced by the Vivaldi algorithm. The errors observed were as low as those of GNP [47], which uses fixed infrastructure nodes to help assign synthetic coordinates [10].

Chapter 4

Modifications to the Gnutella Protocol

In this chapter, modifications to the Gnutella protocol are proposed in order to implement and use the Vivaldi coordinate system. It is worth noting that the nodes being considered when evaluating Vivaldi's accuracy (in Subsection 3.3.5) were stable DNS servers; P2P nodes are much more volatile. While DNS servers are virtually always available, P2P nodes join and leave the network frequently. It is worth exploring whether the use of Vivaldi synthetic coordinates can help mitigate the topological mismatch of the Gnutella P2P network discussed in Subsection 2.2.2. If Gnutella servents had coordinates that could reliably predict the round trip time between nodes, they could choose to form neighbour relationships with nodes that are near to them.

In Subsection 4.1, the syntactic modifications to the Gnutella messages are outlined. In Subsection 4.2, the behavioural modifications to the nodes using the Gnutella protocol are presented. Finally, in Subsection 4.3, the costs and risks associated with these proposed modifications are discussed.

4.1 Syntactic Modifications

The proposed modification to Gnutella messages is that they include Vivaldi coordinates.
As discussed in Subsection 3.3.4, 2D Euclidean coordinates augmented with height vectors appear to provide sufficiently accurate results; therefore, these three coordinate components may be inserted in Gnutella binary and text messages. It is important that they be in the text messages because these messages are the vehicle for connection requests, based upon which Gnutella nodes form neighbour relationships. This information must also be included in binary messages so that the communicating nodes' coordinates will converge as messages circulate in the network.

In addition to communicating nodes' coordinates, the Vivaldi algorithm requires the estimated coordinate error and a means to estimate the RTT (such as a "send timestamp"). This timestamp represents the time a message is sent and may employ a common time base to which all Gnutella nodes synchronize when they join the network. Assuming that the latency is symmetrical, an estimate of the RTT is twice the difference between the timestamp and the time a message is received. Therefore, the proposed modifications include nodes' estimated coordinate error and send timestamp in all Gnutella messages.

It is necessary to strike a balance between adequate precision for the coordinates, error, and timestamp, and the extra bandwidth required to transmit this information. For the purpose of evaluating the proposed protocol modifications, the implementation uses double precision floating point numbers for all quantities, whilst recognizing that the introduction of forty additional octets of overhead to all binary message headers (each of the coordinate components, the error, and the timestamp accounts for 8 octets) might unreasonably increase the load on the network. It might be feasible to use only 32-bit, single precision floating point representations of the coordinates, error, and timestamp, given the millisecond scale of RTT values.
The 32-bit representation is not investigated in this thesis. Instead, the thesis focuses on examining the viability of the synthetic coordinate approach itself. The modified structure of the Gnutella binary message header is shown in Table 4.1.

This information could be encoded into text (connection sequence) messages by adding an additional field named Vivaldi-Data. Because the data is ASCII text [29], each character added to the message requires 8 bits. It would therefore be economical to use a hexadecimal representation for the coordinates, error, and timestamp, and concatenate them into a single string to be included after the Vivaldi-Data header. This is shown in Fig. 4.1 for a connect message. The ten-hexadecimal-digit sequence aaaaaaaaaa represents the forty octets of information required to convey the Vivaldi 5-tuple of (X, Y, height, error, timestamp).

Table 4.1: Modified header structure of Gnutella binary messages.

Octets   Description
0-15     Message globally unique identifier
16       Payload type
17       TTL (Time to live)
18       Hops
19-22    Payload length
23-30    Vivaldi X coordinate
31-38    Vivaldi Y coordinate
39-46    Vivaldi height coordinate
47-54    Vivaldi coordinate error
55-62    Vivaldi send timestamp

GNUTELLA CONNECT/0.6
User-Agent: BearShare/1.0
Pong-Caching: 0.1
Vivaldi-Data: aaaaaaaaaa

Figure 4.1: Gnutella connect message bearing Vivaldi data.

Gnutella servents ignore headers they do not support in the text messages exchanged during the connection phase [29]. Thus, the use of the additional header Vivaldi-Data does not pose a backwards compatibility problem. The proposed binary messages, however, will be incompatible with servents that do not implement the modifications. For this reason, both servents involved in a connection must agree to use the Vivaldi enhancements during the connection phase, when they exchange capability headers.
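The header layout of Table 4.1 and the RTT estimate described in Subsection 4.1 can be sketched with Python's standard `struct` module. The function names are hypothetical, and the little-endian byte order of the five double precision values is an assumption of this sketch; the text does not specify an encoding. The sketch also shows that a plain hexadecimal encoding of the forty octets yields an 80-character string.

```python
import struct

# Table 4.1: 16-octet GUID, payload type, TTL, hops, 4-octet payload length,
# then five 8-octet doubles (X, Y, height, error, send timestamp).
# "<" means little-endian with no padding (byte order is assumed here).
HEADER = struct.Struct("<16sBBBI5d")


def pack_header(guid, payload_type, ttl, hops, payload_len,
                x, y, height, error, sent):
    """Serialize the modified binary message header into 63 octets."""
    return HEADER.pack(guid, payload_type, ttl, hops, payload_len,
                       x, y, height, error, sent)


def estimate_rtt(sent, received):
    """RTT estimate under the symmetric-latency assumption of Subsection 4.1."""
    return 2.0 * (received - sent)


def vivaldi_data_hex(x, y, height, error, sent):
    """Hexadecimal string for a Vivaldi-Data text header (hypothetical format)."""
    return struct.pack("<5d", x, y, height, error, sent).hex()


header = pack_header(bytes(16), 0x80, 7, 0, 0, 1.5, -2.0, 0.25, 0.4, 1000.0)
print(len(header), len(vivaldi_data_hex(1.5, -2.0, 0.25, 0.4, 1000.0)))
```

Packing the fields without padding matters here: with native alignment the same field list would occupy more than the 63 octets listed in Table 4.1.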
If either servent does not support Vivaldi, then they cannot communicate using the modified protocol.

4.2 Behavioural Modifications

To make use of the proposed Vivaldi enhancements to the Gnutella protocol, servents must behave differently. This subsection describes the proposed behavioural modifications.

Firstly, the proposed modifications only operate in ultrapeer-to-ultrapeer communication: leaves and legacy Gnutella 0.4 servents do not employ the enhancements. In the two-tiered Gnutella hierarchy, peers were divided into ultrapeers and leaves to conserve the networking and processing resources of leaves [29]. Hence, the proposed modifications are not applied to leaves. We therefore avoid adding extra bytes to the messages sent to leaves and spare leaves the burden of processing each message bearing Vivaldi coordinates. The modifications do not apply to legacy peers because they employ an old version of the protocol.

4.2.1 Initialization

The coordinates of a node joining the network are initialized to the origin (0, 0, 0) and its local error is set to 5 x 10" (an arbitrary large value). These initial values are used every time a node enters the network, regardless of whether it had previously been a participant. This is necessary because node coordinates will drift significantly over time due to the highly dynamic nature of the P2P overlay topology. Coordinates held previously by the node are of little use.

4.2.2 Coordinate Updates

When a node receives any Gnutella message bearing Vivaldi data, it uses the information to update its coordinates and local error according to the algorithm described in Chapter 3. It uses twice the difference between the current time and the send timestamp of the received message as an estimate of the RTT to the sending node.
Vivaldi information is "piggybacked" on Gnutella control traffic, and, hence, nodes' coordinates are continuously updated through the normal exchange of messages in the network.

4.2.3 Message Forwarding

When a Gnutella servent forwards a message to one of its neighbours either through flooding (ping or query) or backrouting (pong or query hit), it includes its coordinates, timestamp, and error estimate in the message. The updated values included in the received messages replace the previous values recorded at a node. Nodes must only consider the Vivaldi information of their immediate neighbours when updating their coordinates because the RTT estimate is only meaningful for nodes that are directly connected in the overlay topology.

4.2.4 Optimal Neighbour Selection

The topological mismatch between the overlay and physical topology is addressed by servents selecting neighbours that are physically close. Thus, nodes need to judiciously select how to respond to connect messages they receive. The number of connections (a finite maximum) a Gnutella servent can accept is decided by the client implementation. It is a small number: of the order of tens rather than hundreds of connections [9]. If this maximum has not been reached, Vivaldi-modified Gnutella clients will accept any connection request, as in the case of standard clients. The receiving clients form an estimate of the distance to the node on the other side of the new connection. The estimate is based on the Euclidean distance to that node's coordinates. Every time a new message with new coordinates is received from that node, the distance estimate is updated.

If a node reaches its maximum allowed number of connections and a new connection request is received, it will consider dropping an existing connection to accommodate the new request. It first estimates the distance to the requesting node based on its coordinates.
It then searches its own connection records for the node estimated to be the farthest. If this node is farther than the requesting node, the existing connection is terminated and the new connection is accepted. Otherwise, the new connection request is rejected. Thus, the Gnutella nodes use the Vivaldi information to form neighbour relationships with nodes that are physically close. The pseudocode for the neighbour selection algorithm (NSA) is shown in Fig. 4.2.

Peers initiate connection requests because they have not reached their maximum number of connections. They should, therefore, attempt to connect to any peer they are aware of: no special logic is required in this situation. When a peer receives a positive response to a connection request it initiated, it records a distance estimate to the responding node. The new connection will be eligible for discard if the servent reaches the maximum number of allowed connections.

In summary, when a node has room for more connections, it initiates and accepts connections exactly as a normal Gnutella client would. When the maximum number of connections has been reached, it only accepts connections to peers that are closer than the peers to which it is connected.

    // Accept or reject a new connection
    // request based on the estimated
    // RTT to the requesting node
    AcceptOrRejectConnectionRequest(RTT, NodeId)
    {
        if (neighborList.roomForMore()) {
            // We still have connection slots
            // available.
            acceptNewConnection(NodeId)
            // Record the distance estimate for the new neighbour.
            neighborList.add(NodeId, RTT)
            return
        }

        currentWorstDistance = 0
        foundNeighborToReplace = false
        neighborToReplace = null

        // Find the farthest neighbour. Must be farther than the RTT for
        // the requesting node.
        foreach neighbor in neighborList {
            if ((neighbor.Distance > RTT) and
                (neighbor.Distance > currentWorstDistance)) {
                currentWorstDistance = neighbor.Distance
                neighborToReplace = neighbor
                foundNeighborToReplace = true
            }
        }

        if (foundNeighborToReplace) {
            // Replace the farthest neighbour with the closer requester.
            disconnect(neighborToReplace)
            neighborList.erase(neighborToReplace)
            acceptNewConnection(NodeId)
            neighborList.add(NodeId, RTT)
        }
        else
            rejectNewConnection(NodeId)
    }

Figure 4.2: Neighbour selection algorithm.

4.3 Costs and Risks of the Modifications

There are costs and risks associated with the proposed modifications. The tradeoffs involved with the neighbour selection algorithm and its implementation in Gnutella are discussed in this subsection.

4.3.1 Costs of the Modifications

As with most protocol enhancements, the costs are mainly associated with increased message size and additional processing. By adding an additional forty octets to each Gnutella binary message, the header size would be almost tripled (from 23 to 63 octets). Even if only 32-bit values were used for the coordinates, error, and timestamp, the algorithm would still be adding an additional twenty octets, or almost doubling the header size. To keep this seemingly enormous cost in perspective, it is important to recall that Gnutella control traffic volumes are orders of magnitude less than the associated file transfer activity, which this proposal does not modify. Hence, even though the amount of control traffic would be increased, the overall effect would be slight: traffic volumes would not be significantly increased and the bit cost of the proposed modifications is not a major concern.

The processing costs at the nodes are far more important. Each node must update its coordinates with every message received.
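Besides the per-message coordinate update, the other processing introduced by the proposal is the neighbour selection of Fig. 4.2, which amounts to a single pass over a neighbour list of at most a few tens of entries. A minimal runnable sketch of that logic is given below. The function name, the use of a dictionary mapping node identifiers to distance estimates, and the returned tuple are choices made for this illustration and are not taken from the simulator.

```python
def accept_or_reject(neighbours, max_connections, node_id, rtt):
    """Apply the neighbour selection rule to one incoming connection request.

    neighbours: dict mapping node id -> estimated distance (RTT), updated in place.
    Returns (accepted, dropped_id); dropped_id is None if nothing was dropped.
    """
    if len(neighbours) < max_connections:
        # Free connection slot: accept as a standard client would.
        neighbours[node_id] = rtt
        return True, None

    if not neighbours:
        # No slots at all and nothing to replace.
        return False, None

    # Single pass to find the neighbour estimated to be the farthest.
    farthest = max(neighbours, key=neighbours.get)
    if neighbours[farthest] > rtt:
        # The requester is closer: drop the farthest neighbour and accept.
        del neighbours[farthest]
        neighbours[node_id] = rtt
        return True, farthest

    # Every existing neighbour is at least as close as the requester.
    return False, None
```

With a limit of three connections and neighbours at estimated distances 10, 50, and 80, a request from a node at distance 30 replaces the neighbour at distance 80, whereas a subsequent request from a node at distance 60 is rejected.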
Assuming the control traffic rate reported in [9], and knowing that the vast majority of messages (91%) [9] are queries with a minimum size of 25 bytes, a lower bound for the number of coordinate updates per second can be estimated. While this is a fair number of operations, it is not unreasonable for today's fast processors.

4.3.2 Risks of the Modifications

In addition to the costs, the proposed modifications do carry certain risks.

One of the most important risks is that the enhancements leave the network vulnerable to malicious nodes. A node that misreported its coordinates, especially with a very low reported error, could mislead other nodes and reduce the accuracy of their coordinates. Furthermore, a node could send connection requests with false coordinates engineered to be close to the target node's coordinates, and thus cause it to discard other legitimate connections. For this proposal to be safely deployed, it would have to be augmented with a trust management algorithm. Such security concerns are beyond the scope of this particular research project.

Another potential risk is that the dropping of existing connections (connection churn) and the resulting network instability may actually degrade the user experience and make it harder to locate content quickly. While the network is undoubtedly more unstable as a result of connections being dropped and new ones being established according to the neighbour selection algorithm, instability is not inherently detrimental to the location of content and is therefore not directly measured.

One final risk is that by causing nodes to preferentially associate with nodes that are physically close, the algorithm may be fragmenting the Gnutella network and limiting nodes' search horizon. If distant nodes are always rejected in favour of closer ones, disjoint networks may form (one on each continent, for example).
If some nodes have not filled all their available connection slots, they would still accept connections from distant servents. Thus, inter-continental connections would still be possible, although less frequent. The presence of disjoint networks is not directly tested, but the simulations discussed in subsequent chapters do measure the time required to locate content, which is the ultimate indication of the network's success and the user's QoE.

Chapter 5

System Architecture

In this chapter, the newly developed Gnutaldi (Gnutella + Vivaldi) simulation framework is presented. This simulator is based on the GnutellaSim work presented in Subsection 2.4 as well as the Vivaldi coordinate system described in Chapter 3. In Subsection 5.1, the purpose of the simulator is discussed. In Subsection 5.2, the simulator's software architecture is presented. In Subsection 5.2.5, the platform upon which the simulations were run is described.

5.1 Gnutaldi Objectives

The purpose of the Gnutaldi simulator is to evaluate the performance of the proposed modifications to the Gnutella protocol. It is neither feasible nor desirable to implement modified Gnutella clients in a deployed network on any meaningful scale. Hence, we rely on network simulations. In order to observe the effects of the proposed neighbour selection algorithm on the performance of the Gnutella network, we developed a new network simulator: Gnutaldi (Gnutella + Vivaldi). Gnutaldi evolved from the GnutellaSim simulator [11], which is based on ns-2 [14]. The choice of the ns-2 simulator was motivated by a desire to capture packet-level details. Its disadvantage is that it is a fairly unscalable platform. As discussed in Subsection 2.3.4, ns-2-based simulations do not scale to the tens of thousands of nodes required to model a realistic Gnutella network. Nevertheless, even simulating small networks with tens of nodes provided a useful test of the proposed modifications.
The GnutellaSim simulator [11] provides a starting point for evaluating the performance of the existing Gnutella network. With some modifications, it was possible to customize it to collect statistics on query propagation times and the time required for nodes to receive query hits, thus establishing a baseline for comparison with the modified version. After implementing the modifications to Gnutella in the simulator, it was possible to obtain results for comparison with the unmodified protocol, which was a major objective of the project.

A second objective of the Gnutaldi simulator is to provide a means to observe the effect of parameter modifications within the enhanced Gnutella protocol or the network environment. In summary, the Gnutaldi simulator is intended to model the modified Gnutella protocol and allow network observations.

5.2 Gnutaldi Architecture

The software architecture of the Gnutaldi simulator is discussed in this subsection. As stated, Gnutaldi is based on GnutellaSim [11], which, in turn, is based on ns-2 [14]. The relationship between the three simulators is illustrated in Fig. 5.1.

5.2.1 Protocol Layer Operation

The modules discussed in this subsection are largely unchanged from the original GnutellaSim implementation [11], but are included for completeness. These classes deal with forwarding control messages according to protocol specifications, but do not contain any logic related to the generation of these messages or to user behaviour. This is in the domain of the application layer modules.

PeerAgent

This class is the base for all protocol layer modules. The class hierarchy is shown in Fig. 5.2. It defines virtual methods for the basic operations of a peer in any P2P network, such as callbacks for packet reception and connection establishment. It provides no functionality for these callbacks; this is left to subclasses.
Figure 5.1: Relationship between ns-2, GnutellaSim and Gnutaldi. ns-2 is the fundamental building block for the other two simulators. GnutellaSim is a set of classes that extends ns-2 to include simulation of the Gnutella network. Gnutaldi is an extension of GnutellaSim which adds higher performance and implements the Vivaldi coordinate system and the proposed neighbour selection algorithm.

Figure 5.2: Class hierarchy for Gnutaldi protocol layer modules.

GnutellaAgent

This class is derived from PeerAgent, and encapsulates the protocol operation of Gnutella. It contains virtual methods for the sending and receiving of all Gnutella messages, such as ping, pong, push, query, and query hit. The GnutellaAgent class provides default implementations for these methods: they behave as for a legacy Gnutella v. 0.4 peer. The GnutellaAgent handles the forwarding of Gnutella messages in a way transparent to upper layers, and only notifies the application when a message of interest is received. The application, in turn, communicates with the GnutellaAgent by instructing it to connect to particular peers, or to send queries and query responses. The GnutellaAgent participates in statistics collection and interacts with the GnutStats class.

LeafAgent

This is a specialization of the GnutellaAgent class adapted for Gnutella 0.6 leaf nodes. It differs principally in that it will not consider forwarding received messages: leaves only terminate messages in Gnutella. It will also only accept connection requests from ultrapeers, as mandated by the protocol.

UltraAgent

This is another specialization of GnutellaAgent, adapted to encapsulate the behaviour of Gnutella 0.6 ultrapeer nodes. Ultrapeers do not forward ping messages to their shielded leaf nodes. Furthermore, they only forward queries to leaves with a certain probability.
This is done because GnutellaSim does not yet implement the query routing protocol (QRP).

5.2.2 Application Layer Operation

The modules described in this section encapsulate the functionality of Gnutella clients. They are responsible for the initiation of Gnutella connections and searches. They also model user behaviour.

Figure 5.3: Class hierarchy for Gnutaldi application layer modules.

IPeerApp

This interface class declares pure virtual methods which could be implemented by any application layer entity in a P2P network. These methods are: join, leave, search, share, setState, bootstrap (search for nodes to connect to), maintenance (update connections), and connect. In this work, it was added to the original GnutellaSim project because it helps make the class interface clearer and allows for the eventual manipulation of any P2P applications implementing the interface through pointers or references to the base class. It is the base for all other classes in this subsection. The class hierarchy is shown in Fig. 5.3.

PeerApp

This class, derived from IPeerApp, provides empty implementations for the methods declared in the interface. It is meant to be a null implementation: a peer that does nothing. It stores settings for the application such as the peer's address, the number of files it is sharing, and its connection speed.

IGnutellaApp

This interface class is derived from IPeerApp and defines additional pure virtual methods to be implemented by Gnutella application classes. These methods are divided into two categories: instructions to the protocol layer and callbacks invoked from the protocol layer to notify the application of events. Examples of instructions to the protocol layer include disconnecting from a particular node, sending a ping message, and replying to a query with a query hit.
Callbacks invoked by the protocol layer objects include notifications of bootstrap results, indications that a query has been received, and confirmation that a connection request has been accepted by another node.

GnutellaApp

This is the concrete implementation of IGnutellaApp, and also derives from PeerApp. This class encapsulates the behaviour of a Gnutella v. 0.4 servent. In addition to performing all the operations mandated by its parent interface, it stores connection information for each of its neighbours and interacts with the VivaldiManager in order to maintain its coordinates and make connection decisions based on nodes' coordinates. The flowchart illustrating this decision process is shown in Fig. 5.4. With each Gnutella message arrival, GnutellaApp updates its estimate of the distance to the originating node and its local coordinates and error, as shown in Fig. 5.5. GnutellaApp is also heavily involved in gathering statistics, which it maintains via the GnutStats manager.

Leaf

The Leaf class is derived from the GnutellaApp base and specializes it to encapsulate the behaviour of a Gnutella v. 0.6 leaf servent. Leaves will only attempt to connect to ultrapeers and will reject any other connection requests. This is the only difference, from an application layer perspective, between this class and GnutellaApp.

Figure 5.4: Connection selection process implemented within GnutellaApp as part of the neighbour selection algorithm.

Figure 5.5: With every received Gnutella message, GnutellaApp updates its estimate of the distance to the originating node.
It also updates the local coordinates and error estimate.

Ultrapeer

This class is another subclass of GnutellaApp, specialized to represent a Gnutella v. 0.6 ultrapeer servent. Ultrapeers have different maximum connection limits for legacy (v. 0.4) peers, ultrapeers, and leaves. Thus, this class contains logic for responding differently to each type of connection request. This is the main difference from the base class.

SmpBootServer

This class models a simple bootstrap server, which provides new nodes in the network with the addresses of a few existing servents. Nodes simply call this class directly instead of sending messages through the network to it. Since bootstrapping dynamics are not of interest in this research project, this is not a concern. The SmpBootServer implements methods to store peer addresses in its database and to respond to bootstrap requests from new nodes.

ActivityController

The original implementation of the ActivityController contained logic to distribute queries according to the UMASS model [45]. Since the comparison of the enhanced Gnutella protocol and the unmodified version required precise control over the timing of queries, this functionality was disabled. The ActivityController still retains code to probabilistically determine whether peers should be online or offline. This behaviour is governed by the parameters of the UMASS model.

5.2.3 Messaging

The messaging classes encapsulate the Gnutella protocol messages exchanged by peers. The message transmission and parsing algorithms were entirely rewritten for this work and, consequently, all these classes are new in the Gnutaldi simulator. The reason for redesigning this portion of the simulator is twofold. Firstly, the initial implementation was inefficient. It copied and simulated the transmission of a great deal of superfluous Gnutella fields that were never examined by any nodes in GnutellaSim.
While this was quite rigorous, the larger data structures and extensive use of C++ memory copying functions caused the simulation to scale poorly. With the new design, only the fields that are actually used in the simulation are implemented. Secondly, the initial implementation was fairly complex and unmaintainable. The implementation was rationalized by introducing a class hierarchy. Also, the maintainability of the code was improved by handling all the message parsing in a single class instead of throughout the code, as was the case prior to the modifications.

IGnutMsg
This interface declares the methods which all message classes must implement. The message class hierarchy is shown in Fig. 5.6. Declaring the virtual methods as high up the class hierarchy as possible allows the use of generic code in the parser class and elsewhere, which does not need to know what type of message it is dealing with. These pure virtual methods include primitives for writing a message to an ns-2 packet structure, getting the size of a message in bytes, and retrieving the Vivaldi tuple contained in the message.

GnutConnectMsgBC
This abstract base class is derived from IGnutMsg. It is a base for all Gnutella connection messages. GnutConnectMsgBC stores the Vivaldi data (coordinates, error, and time sent) for a message. It also contains logic for writing this information to raw ns-2 packets.

GnutBootcacheUpdateMsg, GnutBootstrapMsg, and GnutBootstrapResMsg
These three classes, derived from GnutConnectMsgBC, encapsulate the interaction between nodes and the SmpBootstrapServer. GnutBootcacheUpdateMsg is used by nodes to inform the server of their presence in the network, so that the server may give their address to new nodes seeking peers with which to connect.
GnutBootstrapMsg is "sent" (as discussed earlier, the message is not actually sent to the simplified server) to the bootstrap server to request the addresses of peers with available connections. The response, GnutBootstrapResMsg, contains a list of such peers.

Figure 5.6: Class hierarchy for Gnutaldi message modules.

GnutLeafConnMsg and GnutLeafOkMsg
These two classes, again derived from GnutConnectMsgBC, represent the messages sent by leaf nodes. The GnutLeafConnMsg is a request to connect to an ultrapeer. The GnutLeafOkMsg is a positive response to an ultrapeer's request for a connection.

GnutLegacyConnectMsg and GnutOkMsg
These message classes, subclasses of GnutConnectMsgBC, encapsulate the connection messages sent by a legacy Gnutella 0.4 peer. GnutLegacyConnectMsg is a message requesting a connection with another legacy peer or an ultrapeer. GnutOkMsg signals that a v. 0.4 peer is willing to accept a connection request sent by a legacy peer or an ultrapeer.

GnutUltraConnMsg and GnutUltraOkMsg
These subclasses of GnutConnectMsgBC are analogous to GnutLeafConnMsg and GnutLeafOkMsg, but for ultrapeers: they encapsulate requesting and accepting a Gnutella connection.

GnutRejMsg
This subclass of GnutConnectMsgBC encapsulates a connection rejection message sent by any Gnutella servent.

GnutBinaryMsgBC
This abstract base class is derived from IGnutMsg. It is a base for all Gnutella binary messages. GnutBinaryMsgBC stores the Vivaldi data (coordinates, error, and time sent) for a message. In addition to the Vivaldi data, it stores the Gnutella binary message header introduced in Table 2.1. This class contains the logic for writing the Vivaldi and header information to raw ns-2 packets.

GnutPingMsg
This subclass of GnutBinaryMsgBC represents a ping message sent by a Gnutella servent to discover potential neighbours.
The pong message class is a subclass of GnutBinaryMsgBC that encapsulates a pong, the response to a ping message. It stores the address of the initiating node and implements the logic to write this information to raw ns-2 packets.

GnutQueryMsg
This class, which is derived from GnutBinaryMsgBC, represents a query message used by Gnutella servents to search for content in the network. Normally, it should contain information about the search criteria, but in the simulation, all content is assumed to be identical (i.e., the same file). Thus, it carries no additional data associated with Gnutella. For the simulation, however, these objects are instrumented with the time the query was sent. This is used for gathering statistics about how fast queries sent at a particular time are traversing the network.

GnutQueryHitMsg
This subclass of GnutBinaryMsgBC represents a Gnutella query hit. This type of message, sent in response to a query, indicates that a node has the requested content. Normally, a query hit would contain a list of files matching the search criteria, but since all content in the simulation is identical, this is not required. Furthermore, the Gnutaldi simulator does not model the actual download of content, so this information would be superfluous. These query hit message objects are instrumented with the time the query that triggered them was sent. This is done so that the last node to receive the query hit (the one that initiated the query) will be able to record statistics about how long it took for the query to be satisfied.

GnutMsgParser
This singleton class is responsible for parsing all the Gnutella messages (subclasses of IGnutMsg) throughout the Gnutaldi application. When passed raw data, it segments the message into type, header, payload, Vivaldi coordinates, error, and send timestamp. It relies on the services of the message classes to reconstruct IGnutMsg-derived objects that are readily usable by the application code. The message parser acts as a factory for ns-2 PacketData objects, which are the raw packets. It also manages the allocation of message GUIDs.

5.2.4 Vivaldi-Related Classes
This subsection describes the modules related to implementing the Vivaldi algorithm.

IVivaldiManager
This interface is the base class which defines the behaviour of an entity that manages Vivaldi coordinates for a node. It defines only pure virtual methods to update coordinates, reset them to the initial value, retrieve the local node error, and estimate the RTT given a pair of coordinates. The class hierarchy for the Vivaldi classes is shown in Fig. 5.7.

Figure 5.7: Class hierarchy for Gnutaldi's Vivaldi-related modules.

VivaldiManagerBC
This class provides a null implementation of the IVivaldiManager: it implements all the methods, but they do nothing. This is used when running the normal Gnutella network simulations.

VivaldiManager
This subclass of IVivaldiManager fully implements the functionality to maintain coordinates according to the Vivaldi algorithm. An instance of this class is stored in each GnutellaApp instance in order to calculate that node's coordinates.

5.2.5 Gnutaldi Platform
The Gnutaldi simulations for this research were executed on a dual-Xeon processor Red Hat Linux workstation. The machine was equipped with 2 gigabytes of RAM. At the time of the simulations, this was considered a relatively high-end platform. It is important to note that ns-2 is a single-threaded process, so the performance of the simulations was not directly improved by virtue of having two processors. Nevertheless, the fact that processes other than ns-2 could be served by the additional CPU improved performance somewhat. CPU utilization was at 100% for the duration of the simulations. The scale of the simulations was bounded by the available memory on the system.
When too many nodes were included, the system failed to complete the simulation. After all available RAM had been filled, the system entered a thrashing state where data was constantly being swapped between RAM and the hard disk cache and progress of the simulation was halted. Because memory requirements increased exponentially with the number of peers sending queries (due to the flooding nature of query propagation), the limit on the number of peers was quickly reached. With 92 nodes, the simulation took several hours. With more than 150 nodes, the simulations never completed due to lack of memory.

Chapter 6
Network Topology Generation

In order to evaluate the performance of the Gnutella network through simulation, it was necessary to generate a synthetic network topology. There are a variety of tools available to accomplish this task, many of which are able to export their information to native ns-2 scripts. There are also many mathematical models employed by the tools to produce realistic topologies. A very simple model is the Erdős-Rényi model, which generates a graph $G = (V, E)$ where each of the possible $\|V\|(\|V\| - 1)/2$ edges has a probability $p$ of appearing in the graph [53]. This has the property of producing a graph that is not necessarily connected. Furthermore, the average degree of the vertices is $(\|V\| - 1)p$.

A more recent effort at generating random topologies is the random graph model introduced by Waxman [54]. With this model, nodes are randomly placed in a plane [55] and connected with the following probability [54]:

$P(x, y) = \alpha e^{-d(x, y) / (\beta L)}$,

where $d(x, y)$ is the distance between vertices $x$ and $y$, $L$ is the distance between the two farthest nodes in the graph, and $\alpha$ and $\beta$ are model parameters. Increasing $\alpha$ yields more edges, whilst increasing $\beta$ increases the ratio of long edges to short ones [53].
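As an illustration of the Waxman model described above, the following minimal Python sketch places nodes uniformly at random in the unit square and connects each pair with the Waxman probability. The function names, the unit-square placement, and the parameter values in the usage note are assumptions made for this example; they are not taken from the tools cited in the text.

```python
import math
import random

def waxman_probability(d, L, alpha, beta):
    """Waxman connection probability: alpha * exp(-d / (beta * L))."""
    return alpha * math.exp(-d / (beta * L))

def waxman_graph(n, alpha, beta, rng):
    """Return (points, edges) for n >= 2 nodes placed in the unit square.

    Assumption: node placement is uniform in [0, 1) x [0, 1).
    """
    points = [(rng.random(), rng.random()) for _ in range(n)]

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # L is the distance between the two farthest nodes in the graph.
    L = max(dist(points[i], points[j]) for i, j in pairs)
    edges = [
        (i, j)
        for i, j in pairs
        if rng.random() < waxman_probability(dist(points[i], points[j]), L, alpha, beta)
    ]
    return points, edges
```

For example, `waxman_graph(20, alpha=0.4, beta=0.2, rng=random.Random(1))` returns the node positions and a list of edges. Raising `alpha` adds edges overall, while raising `beta` makes long edges relatively more likely. As with the Erdős-Rényi model, the result is not guaranteed to be connected.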
Since the Internet includes a notion of hierarchy, a refinement to the "flat" models presented thus far was for them to incorporate that characteristic. The transit-stub [56] model uses this approach by recognizing that routing domains in the Internet are either transit or stub domains. Transit domains are typically operated by large service providers such as AT&T and Worldcom, and are used to connect the stub domains operated by their customers and to forward traffic between them. Stub domains terminate traffic but do not forward it. More precisely, a domain is a stub if and only if the path connecting two nodes goes through that domain only if either of the nodes is contained in the domain [56]. The following algorithm is used to build transit-stub topologies [56]:

1. Construct a connected random graph using any suitable method. The vertices in this graph represent transit domains.
2. Replace each node in the graph with another connected random graph. This graph represents the backbone of the transit domain.
3. For each node in the transit domains, generate a certain number of random connected graphs. These are the stub domains connected to the node in the transit domain.
4. Add a number of edges between nodes in transit domains and stub domains, and between nodes in different stub domains. The hierarchical topology is now complete.

Because there is strong evidence to suggest that the Internet topology exhibits power-law characteristics [31], Barabási and Albert proposed the model known as Barabási-Albert [32]. This scale-free (power-law) model is based on the premises of incremental growth and preferential attachment. Incremental growth refers to the fact that the network did not come online all at once: nodes were added progressively, over time. This is modelled by starting with a small number of vertices $m_0$ and, at each step, adding a vertex and connecting it to the existing vertices with $m$ edges.
Preferential attachment implies that popular nodes (i.e., ones with many incident links) get more popular as the network grows. More formally, the probability $\Pi$ of a new vertex $i$ connecting to vertex $j$, which is already in the network, is given by [32]:

$\Pi(d_j) = \frac{d_j}{\sum_{k \in V} d_k}$,

where $d_j$ is the degree of node $j$ and the overall equation is the ratio of $j$'s links to the sum of all the vertex degrees. The recommended $m$-value to most closely approximate the router-level topology of the Internet is 2 [55]. This topology represents the core of the network. The routers are connected by links with a latency uniformly distributed between 0 and 4 milliseconds. The technologies deployed in the core of the Internet at this time are connected with high speed links, with a bandwidth of 10 Gb/s being quite common. To be conservative, 100 Mb/s links were employed, although beyond a small threshold, the actual capacity of the links is irrelevant: no TCP window closures or traffic throttling would be observed even if a slow link speed such as 10 Mb/s were used. Link latency variation, however, is of significance. The intent was to create paths of different lengths (in terms of time) through the network so that the Vivaldi algorithm would have the opportunity to select the shortest one.

Using this core topology as a starting point, the sample network was completed by attaching a random number of nodes (either 1 or 2) to each leaf (nodes with the smallest degree) in the core topology. A similar approach was proposed by the authors of GnutellaSim in their usage notes [44] and the hierarchical transit-stub model [56]. These nodes, newly attached to the leaves, contain the Gnutella servents. They are connected by slower links, distributed according to observed peer bandwidths [43]. (Note that the bandwidth of the nodes is of little importance here.) The latency for the peer access links was distributed uniformly between [...] and [...] milliseconds.
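The construction just described (a Barabási-Albert core with m = 2, then 1 or 2 servent nodes attached to each minimum-degree core node) can be sketched in Python as follows. This is a simplified illustration, not the generator used for this work: the seed graph of m + 1 fully connected vertices, the function names, and the access-latency bounds supplied by the caller are assumptions, and link bandwidths are omitted.

```python
import random

def barabasi_albert_core(n, m, rng):
    """Grow an undirected core graph by preferential attachment.

    Assumption: the seed graph is m + 1 fully connected vertices.
    Returns a list of (u, v, latency_ms), latency uniform in [0, 4] ms.
    """
    pairs = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each vertex appears in `ends` once per incident edge, so a uniform
    # draw from `ends` selects vertex j with probability d_j / sum_k d_k.
    ends = [x for pair in pairs for x in pair]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # pick m distinct existing vertices
            targets.add(rng.choice(ends))
        for t in sorted(targets):
            pairs.append((t, new))
            ends += [t, new]
    return [(u, v, rng.uniform(0.0, 4.0)) for u, v in pairs]

def attach_servents(n_core, core_links, rng, access_latency_ms):
    """Attach 1 or 2 servent nodes to every minimum-degree core node."""
    degree = [0] * n_core
    for u, v, _ in core_links:
        degree[u] += 1
        degree[v] += 1
    smallest = min(degree)
    low, high = access_latency_ms  # hypothetical bounds chosen by the caller
    access_links, next_id = [], n_core
    for leaf in (v for v in range(n_core) if degree[v] == smallest):
        for _ in range(rng.choice((1, 2))):
            access_links.append((leaf, next_id, rng.uniform(low, high)))
            next_id += 1
    return access_links
```

A call such as `attach_servents(50, barabasi_albert_core(50, 2, rng), rng, (5.0, 25.0))` yields the access links for the servent nodes; the bounds `(5.0, 25.0)` are placeholders only and do not reproduce the values used in the simulations.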
The resulting network contained a total of 92 nodes and 42 Gnutella servents. Because of the scalability limitations of the ns-2 simulator, it was not possible to simulate networks with a larger number of nodes. Nevertheless, even with a small number of nodes, it was possible to examine the behaviour of the proposed algorithm. We generated 10 different networks and used these as the basis for the simulations described in Chapter 7.

Chapter 7
Evaluation of the Modified Gnutella Protocol

In this section, simulations using the 92-node network described in Chapter 6 are presented. In Subsection 7.1, a range of tuning parameters for the Vivaldi algorithm is explored. In Subsection 7.2, the observed operation of the neighbour selection algorithm (NSA) is discussed. In Subsection 7.3, the convergence properties of the Vivaldi coordinates in the simulation are reported. Finally, in Subsection 7.4, the performance evaluation results are shown and the extent to which the neighbour selection algorithm improves performance in the Gnutella network is discussed.

7.1 Tuning Parameter Selection
Although the Vivaldi algorithm tuning parameter c_c is recommended to be 0.25 [10], no value is specified for the tuning parameter c_e, which is used to balance the contribution of new samples to the weighted moving average of the local error (3.8). Higher values cause recent samples to be weighted more heavily. In order to determine the c_e value to use in subsequent simulations, the performance of different c_e values was explored for one of the 92-node networks described in Chapter 6. The median relative RTT prediction error, which is an indicator of coordinate accuracy, is shown for values of c_e ranging from 0.01 to 0.75 in Fig. 7.1 - 7.5. As shown in Fig. 7.1, with c_e = 0.01, approximately 1,000 seconds are required for the relative error to drop below 10%. The higher c_e values converge much more rapidly: they drop below 10% within 200 - 400 seconds. The steady-state errors for the c_e values shown in Fig. 7.2 - 7.5 are quite similar, but c_e = 0.10 (Fig. 7.2) causes less oscillation of the error. It is expected that lower c_e values would lead to less dramatic error variations since the impact of each new sample is smaller (3.8). The lower the coordinate error, the more likely the coordinates are to accurately predict RTT between nodes and lead to optimal neighbour selection behaviour. Rapid convergence and little oscillation are also desired. For these reasons, with the networks used in this simulation, the most suitable c_e value is 0.10. This value was used in all the simulations discussed in this chapter. Networks with distinct properties, in particular a different number of nodes, may require different values for the c_e tuning parameter.

Figure 7.1: Median relative RTT prediction error as a function of time for c_e = 0.01.

7.2 Neighbour Selection Behaviour
In this subsection, simulation results are examined in order to observe the behaviour of the neighbour selection algorithm.

Figure 7.2: Median relative RTT prediction error as a function of time for c_e = 0.10.

Table 7.1: Simulation Parameters for a Stable 42-Peer Network
Parameter                                               Value
Nodes                                                   92
Gnutella servents                                       42
Maximum number of connections per servent               8
Minimum node start time                                 10 s
Maximum node start time                                 50 s
Probability of going offline after a successful query   0%
Number of nodes with the desired content                12
Proportion of nodes sending queries                     25%
Query interval                                          1100 s
Simulation time                                         1,500 s

Figure 7.3: Median relative RTT prediction error as a function of time for c_e = 0.25.
The first simulation consisted of a 42-servent Gnutella network where nodes join the network and never disconnect. This is an unrealistically stable environment, but it is ideal for observing the raw behaviour of the neighbour selection algorithm. The parameters describing this network, which was generated using the method outlined in Chapter 6, are summarized in Table 7.1. Each dot in Fig. 7.6 marks the time where a Gnutella servent disconnected from one of its existing connections in order to select a peer deemed to be closer, in accordance with the neighbour selection algorithm. Although 10 different physical networks were actually used in the simulation iterations, only the results for a single network are presented in this subsection because many interesting characteristics of the neighbour selection behaviour are masked when the aggregate results are considered.

Figure 7.4: Median relative RTT prediction error as a function of time for c_e = 0.50.

There are no connection drop events early in the simulation: the first occurrence is at instant 72.034588 s. This initial period of calm is expected, because the nodes are still joining the network and have not exhausted all of their available connections. Hence, there is no need for them to drop existing connections in favour of closer nodes. Once the network contains more nodes and their connections are filled, however, a considerable number of drop events are observed, as shown by the densely packed dots between 100 and 500 seconds. This is the expected pattern, where the network is reorganizing itself in order to more closely match the underlying physical topology. Between instant 530 s and 675 s, no connection drop events are observed. This is followed by a period of moderate connection drop activity ending at instant 899.921195 s.
An inspection of the connection dynamics of the network explains this somewhat surprising result. When the statistics were gathered at instant 550 s, all but one of the 42 servents in the network had filled their 8 available connections. The remaining node was attempting to connect to any other node in the network, but could not find one that was close enough to be willing to drop one of its existing connections until instant 675.551901 s. At that time, one of the other nodes deemed that the orphan node was closer than at least one of its neighbours, dropped its connection to the most distant of its peers, and accepted the connection request from the orphan node. As a result, the "formerly orphan" node had 7 more connection slots to fill and the newly dropped neighbour had 1 vacant slot.

Figure 7.5: Median relative RTT prediction error as a function of time for c_e = 0.75.

As these nodes searched for neighbours, they found peers that were close enough to drop an existing connection, which in turn caused more nodes to search for connections. This domino effect caused the period of moderate connection drop activity observed up until instant 899.921195 s. During this period, a total of 18 connection drop events were observed. The activity was less intense than earlier in the simulation because the network had already reorganized into a form where the neighbour selection was quite optimized: nodes in search of peers willing to drop their connections succeeded less frequently. There were also fewer empty connection slots to fill. Ultimately, from instant 900 s onwards, the network remained in a stable state where all nodes except one filled their 8 connection slots and a single node had 6 connections.
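The accept-or-drop behaviour observed here follows the connection selection process of Fig. 5.4. A minimal Python sketch of that decision is given below, under the assumption that each servent keeps an estimated distance (the RTT predicted from the Vivaldi coordinates) for every neighbour. The function name, the dictionary representation, and the return values are hypothetical and do not mirror the GnutellaApp C++ interface.

```python
def handle_connection_request(neighbours, candidate, candidate_rtt, max_connections=8):
    """Decide whether to accept a connection request.

    neighbours: dict mapping neighbour id -> estimated RTT (modified in place).
    Returns (decision, dropped_neighbour_or_None).
    """
    # A free slot is available: accept without dropping anyone.
    if len(neighbours) < max_connections:
        neighbours[candidate] = candidate_rtt
        return "accept", None

    # All slots are full: compare the candidate with the most distant neighbour.
    farthest = max(neighbours, key=neighbours.get)
    if neighbours[farthest] > candidate_rtt:
        del neighbours[farthest]
        neighbours[candidate] = candidate_rtt
        return "accept", farthest

    # Every existing neighbour is at least as close as the candidate.
    return "refuse", None
```

With all slots full, a request from a closer node evicts the most distant neighbour, which then has a vacant slot of its own to fill. This is the mechanism behind the chain of drop events described above.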
This convergence to a stable overlay topology is an expected result for a scenario where the peers do not leave the network. After the initial turbulence in the network topology, peers eventually fill their connection slots with neighbours that are closer than any peers still seeking connections. Hence, no peers are willing to drop connections to accept connection requests from peers with available connection slots: the topology is frozen. It is not guaranteed that this is the optimal state, but it is certainly somewhat optimized, since neighbour relationships are guided by topological information. The extent to which this improves the performance of the network is evaluated in Subsection 7.4.

Figure 7.6: Connection drop events for a stable network of 42 Gnutella servents.

Figure 7.7: Connection drop events for a dynamic network of 42 Gnutella servents.

The simulation parameters for the second scenario are shown in Table 7.2. In this case, the 42-servent topology discussed in Chapter 6 was used as the basis for a more realistic dynamic Gnutella network. Since large P2P networks tend not to spring up all at once [10], a period of stability of 1,000 seconds was imposed at the beginning of the simulation, during which nodes never disconnect after a successful query (UMASS model [45]). Half of the 42 servents are started at the beginning of this period. This is intended to model a core topology of servents that have been in the Gnutella network long enough to have somewhat stable Vivaldi coordinates. After 1,000 seconds, nodes will disconnect after a successful query (i.e., the receipt of a query hit) with a probability of 10%. This 10% value is meant to be representative of realistic Gnutella node behaviour [11]. Once a node goes offline, it does not rejoin the network. Queries are sent at 100-second intervals throughout the simulation. Between 1,000 and 2,000 seconds of the simulation, at uniformly distributed random time intervals, the remaining 21 servents were introduced into the network. This period is meant to model the normal conditions in a Gnutella network, with nodes joining and leaving.

Table 7.2: Simulation Parameters for a Dynamic 42-Peer Network
Parameter                                               Value
Nodes                                                   92
Gnutella servents                                       42
Maximum number of connections per servent               8
Minimum node start time (core overlay)                  0 s
Maximum node start time (core overlay)                  50 s
Earliest allowed node disconnection time                1,000 s
Minimum new node start time                             1,000 s
Maximum new node start time                             1,999 s
Probability of going offline after a successful query   10%
Number of nodes with the desired content                12
Proportion of nodes sending queries                     25%
Query interval                                          100 s
Simulation time                                         2,500 s

The connection drop events for this scenario are shown in Fig. 7.7. From the beginning of the simulation up until instant 560.539315 s, there is fairly intense connection drop activity, as the core network of 21 servents reorganizes itself according to the neighbour selection algorithm. This is similar to the previous scenario, except that the network converges to a stable state faster because of the smaller number of nodes. The next connection drop event is at instant 1,034.465494 s, which is after the introduction of dynamic network behaviour. As more and more nodes are introduced into the network between instant 1,000 and 2,000 seconds, and as nodes leave due to successful queries during the same period, connection drop activity (as shown by the dots in Fig. 7.7) increases and continues throughout the period. This continuous reorganization of the network is the expected result of the neighbour selection algorithm in a dynamic, realistic P2P environment. After instant 2,000 s, no new nodes are introduced into the network, and connection drop activity decreases, as expected. Because nodes are still leaving the network after successful queries, some level of drop activity is indeed expected.

In summary, with a stable network, the neighbour selection algorithm causes the overlay to converge to a stable form. With a dynamic Gnutella network, connection drop activity continues throughout the simulation, as the network reorganizes itself to optimize the integration of the new nodes and replace the departing nodes with neighbours that are close.

7.3 Convergence of Vivaldi Coordinates
In this subsection, the convergence properties of the Vivaldi coordinate system implemented in the Gnutella protocol are examined. The experiments conducted in the seminal Vivaldi publication [10] show that with as few as 8 neighbours, most nodes exhibited a relative RTT estimate error of less than 20%. This result was obtained using a set of 1,740 DNS servers, which are considered to be stable nodes. In order to compare the proposed implementation to this result, the Gnutaldi tool was used to simulate the 92-node networks discussed in Chapter 6, with the simulation parameters shown in Table 7.1. The median relative RTT prediction error as a function of time for a stable network where nodes do not disconnect from the P2P overlay once they have joined is shown in Fig. 7.8. The neighbour selection algorithm described in Subsection 4.2 is operating on all the Gnutella servents. The Vivaldi coordinates converge with a median relative error of approximately 5% in the steady-state.
It is appropriate to consider the median error rather than the arithmetic mean (for example) because the error for new nodes is initialized to an extremely large value. This outlier value would disproportionately affect the evaluation if the mean were used. Even though the neighbour selection algorithm introduces additional instability in the network, as connections to distant nodes are replaced with connections to closer peers, it can be observed from Fig. 7.8 that the Vivaldi coordinates converge adequately. Note that this is not a realistic situation: P2P networks are quite dynamic in nature. It is, nevertheless, a solid basis for comparing the implementation of Vivaldi in Gnutella with the Vivaldi experiments conducted on the stable DNS servers [10]. The fact that the coordinates converge with a smaller error than in the original experiments may be attributed to the greater connectivity of the simulated overlay network: the 8 connections allowed for each peer reach a larger proportion of the network than the equivalent 8 connections would in the DNS experiment with a network containing forty times as many nodes.

Figure 7.8: Median relative RTT prediction error as a function of time for a 92-node network with 42 stable Gnutella servents running the neighbour selection algorithm. Each node has up to 8 neighbours.

Spikes such as the one observed near 400 seconds are not unexpected as the coordinates converge. When a node connects to a new node, it obtains information from different regions of the network, and must adjust its coordinates. During this adjustment period, its median RTT prediction error is high.
Furthermore, as it adjusts its coordinates, the nodes to which it is connected will also have to adjust their coordinates, as the Vivaldi spring system attempts to return to a state of minimal potential energy. The domino effect of these coordinate adjustments percolates through the network and the median RTT prediction error may rise quickly, only to fall as the coordinates converge based on the new information from their neighbours.

The convergence of the Vivaldi coordinates for the same network, but with peers joining and leaving the network according to the parameters in Table 7.2, is shown in Fig. 7.9. Each subfigure shows an iteration of the same simulation, differing only by the random seed used to generate the underlying physical network. The random seed also affects which nodes have the desired content, which nodes will send queries, as well as the times nodes join and leave the network. The subfigures show different median relative RTT prediction error values because the networks evolve differently over time as a result of random events. Overall, however, the error tends to vary between 5% and 10%, never exceeding 31% (observed at instant 2,050 s, in Fig. 7.9(c)) in any of the simulations. This is of the same magnitude as the error in the original Vivaldi experiments [10]. The fact that no significant spike or general increase in median relative RTT prediction error is observed immediately after time 1,000 s, when new nodes are introduced into the stable network, indicates that the new nodes' coordinates are converging very quickly. The convergence delay of several hundred seconds observed at the beginning of the simulation is for an entirely new network coming into existence all at once. This is not a realistic situation and is only depicted to show that the coordinates do indeed converge, even in this extreme case.
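The coordinate adjustments discussed in this subsection can be made concrete with a sketch of a single Vivaldi update. It assumes the adaptive-timestep form of the algorithm from the Vivaldi publication [10], with c_c = 0.25 and the c_e = 0.10 selected in Subsection 7.1. The function name and the list-based coordinates are illustrative, and this is not the VivaldiManager code.

```python
import math

def vivaldi_update(x_i, e_i, x_j, e_j, rtt, c_c=0.25, c_e=0.10):
    """Return the updated (coordinates, error) of node i after one RTT sample.

    x_i, x_j: coordinate lists of the local and remote node.
    e_i, e_j: their current error estimates (assumed positive).
    rtt:      measured round-trip time to node j (assumed positive).
    """
    dist = math.dist(x_i, x_j)              # RTT predicted by the coordinates
    w = e_i / (e_i + e_j)                   # weight: local vs. remote confidence
    e_s = abs(dist - rtt) / rtt             # relative error of this sample
    # Weighted moving average of the local error; c_e sets how strongly
    # the new sample counts.
    e_new = e_s * c_e * w + e_i * (1.0 - c_e * w)
    delta = c_c * w                         # adaptive timestep
    if dist > 0.0:
        unit = [(a - b) / dist for a, b in zip(x_i, x_j)]
    else:
        # Assumption: a fixed direction when the nodes coincide
        # (a random direction is the usual choice).
        unit = [1.0] + [0.0] * (len(x_i) - 1)
    # Move along the "spring" joining the two nodes, towards or away from j.
    x_new = [a + delta * (rtt - dist) * u for a, u in zip(x_i, unit)]
    return x_new, e_new
```

For instance, with both nodes at error 1.0, node i at the origin, node j at (3, 4), and a measured RTT of 10, the predicted distance of 5 is too short, so node i moves away from node j and its error estimate falls from 1.0 to 0.975. A larger c_e lets each sample change the error estimate more, which is consistent with the stronger oscillation reported for the higher c_e values.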
Evidently the coordinates in the simulated dynamic network converge quickly to within a moderate error when a core network of nodes with reliable coordinates exists. This would be the case in the real Gnutella network, where new nodes would encounter nodes that had been in the network for some time and had acquired reliable coordinates: the

(Figure 7.9 panels: (a) Network 1. (b) Network 2. (c) Network 3. (d) Network 4. Horizontal axis: Time (s).)
Figure 7.9: Median rel...

Allan Hancock College - CAOB - 2003313
Passed by both Houses New South WalesCrimes Amendment (Sexual Offences)Bill 2003Contents Page
East Los Angeles College - PPTPRES - 0304
QuantitativeMethodsCombiningcontinuousandcategorical variablesCombiningcategoricalandcontinuousvariablesRepriseofmodelsfittedsofarYIELD=FERTIL YIELDM=VARIETY VOLUME=HEIGHT MATHS=ESSAYS SPECIES2=SPECIES1 AMA=YEARS+HGHT FINALHT=INITHT+WATER WGHT
East Los Angeles College - PPTPRES - 0304
QuantitativeMethodsWhatliesbeyond?Whatliesbeyond?GeneralLinearModelWhatdoesGLMdoforus? partitioningofvarianceandDF testsforwhetherxvariablesmatter statisticalelimination bestfitequationshowinghowxvariablesmatter WhatisgeneralaboutGLM? categori
Allan Hancock College - CAOB - 2003313
Passed by both HousesNew South WalesCrimes Amendment (Sexual Offences) Bill 2003ContentsPage1 2 3 4Name of Act Commencement Amendment of Crimes Act 1900 No 40 Amendment of other Acts2 2 2 2Schedules1 2 Amendment of Crimes Act 1900 Ame
N.E. Illinois - MAT - 4275
Eastern Illinois UniversityMathematics and Computer Science InternshipGuidelines for Final Report The purpose of the final report is to provide the coordinator with an appreciation of the experience you have gained and the knowledge you have acquir
N.E. Illinois - MAT - 4275
Eastern Illinois UniversityMathematics and Computer Science InternshipFinal Internship Report Name of Intern: Company/Institution: Date internship work began: Date internship ended: Hours worked per week: Please rate the intern on the following ite
UC Davis - SS - 0809
Ideal Gas Law ProblemsUse the ideal gas law to solve the following problems: 1) If I have 4 moles of a gas at a pressure of 5.6 atm and a volume of 12 liters, what is the temperature?2)If I have an unknown quantity of gas at a pressure of 1.2 at
East Los Angeles College - PPTPRES - 0203
QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTSThis booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http:/users.ox.ac.uk/~grafen/QMnotes/index.html.ALAN GRAFEN Alan G
East Los Angeles College - PPTPRES - 0304
QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTSThis booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http:/users.ox.ac.uk/~grafen/QMnotes/index.html.ALAN GRAFEN Alan G
Allan Hancock College - CAOB - 2008313
[Page Break] New South WalesCrimes Amendment (Sexual Offences)Bill 2008Contents Page 1 Name of Act
BYU - CS - 330
CS 330 Lecture #30: Type Abstraction and Type InferencePage 1Type Equivalence Suppose you had the following: typedef int miles; typedef int kilometers; miles x; kilometers y; x = 100; y = x;Is this legal in C+? Should it be?CS 330 Lecture #30
BYU - LECT - 450
Review: Complex NumbersReview: Complex NumbersCS 450: Introduction to Digital Signal and Image ProcessingBryan Morse BYU Computer ScienceReview: Complex Numbers BasicsComplex NumbersA complex number is one of the form a + bi where i= a: rea
East Los Angeles College - KEBL - 2890
DEM Analysis with ArcViewLearn the basics of ArcView GIS by following the Quick Start Tutorial in the book titled Using ArcView (pp. 5-30). You may also wish to try the tutorial in Using the ArcView Spatial Analyst (this is the manual for the spatia
East Los Angeles College - BALL - 0888
Weak Twos in the Majors[A] DisciplineWe decided to play Weak Two bids in the majors1 because they combine two attractive features. When the opposition holds the balance of the points they are very effective as pre-empts. And on those rare occasion
East Los Angeles College - BODL - 0153
Report on the ALLC/ACH 2004 Joint International Conference of the Association for Literary and Linguistic Computing and the Association for Computers and the HumanitiesThe ALLC/ACH 2004 annual conference took place at the Centre for Humanities Compu
East Los Angeles College - SCAT - 3104
ThepoorarenolessfreethantherichtodineattheRitz.Theyjustcantaffordto. Discuss. Clashbetweenleftandrightwingviewsoffreedom Whatconstitutesfreedom Negativetheoristsconstraintsonfreedomexternalobstacles Freedomasasocialrelationnaturalcausesdonotconstitut
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy phonetics/phonology &amp; morphology:# MORPHOLOGICAL SEGMENTATION: ESRC-OX-05-CR102.# Malagasy i
UC Davis - LOG - 0302
Give Mr. HASEGAWA his priesthood back! Mr. Ytetsu HASEGAWA has been a regular visitor of the Gesshin-in Temple since 1972, when he began frequent visits to the late KASAHARA Gens, the chief priest of the Gesshin-in Temple at that time. This led him t
UC Davis - ATT - 0302
Give Mr. HASEGAWA his priesthood back! Mr. Ytetsu HASEGAWA has been a regular visitor of the Gesshin-in Temple since 1972, when he began frequent visits to the late KASAHARA Gens, the chief priest of the Gesshin-in Temple at that time. This led him t
Sveriges lantbruksuniversitet - IR - 3467
ND LEGISLATED POVERTYN E W S L E T T E RrnSmallwood and Sihota get your letters!In February the ELP newsletter had two form letters about welfare and wages. One letter asked Moe Sihota, the government person in charge of minimum wage, to increas
Sveriges lantbruksuniversitet - LIB - 3467
ND LEGISLATED POVERTYN E W S L E T T E RrnSmallwood and Sihota get your letters!In February the ELP newsletter had two form letters about welfare and wages. One letter asked Moe Sihota, the government person in charge of minimum wage, to increas
Cal Poly - ELE - 4301
Features High-performance, Low-power AVR 8-bit Microcontroller RISC Architecture 118 Powerful Instructions Most Single Clock Cycle Execution 32 x 8 General Purpose Working Registers Fully Static Operation Up to 16 MIPS Throughput at 16 MHz Dat
Sveriges lantbruksuniversitet - IR - 264
Sweeping graphs and digraphsbyDaniel Dyer B.Sc. (Hons), Memorial University of Newfoundland, 1998. M.Sc., Simon Fraser University, 2001.ATHESIS SUBMITTEDIN PARTIAL FULFILLMENTO F T H E REQUIREMENTS FOR T H E D E G R E E O Fin the Departm
Allan Hancock College - DSOA - 20061
Western Australia Dangerous Sexual Offenders Act 2006 Western Australia Dangerous Sexual Offenders Act 2006 CONTENTS Part 1 -
Sveriges lantbruksuniversitet - IR - 3859
Vancouver/Richmond Health BoardL-.d Levi Chair David Esworthy John P. Kennedy David Khan Ken Leighton Bert Massiah Margant McPhe Bud Osbom Roberta Price Sheila Rowswell Jim Sinclair Marjorie Stewart Renee Taylor Jim Thonteinson Patricia Wilkinson Si
Sveriges lantbruksuniversitet - LIB - 3859
Vancouver/Richmond Health BoardL-.d Levi Chair David Esworthy John P. Kennedy David Khan Ken Leighton Bert Massiah Margant McPhe Bud Osbom Roberta Price Sheila Rowswell Jim Sinclair Marjorie Stewart Renee Taylor Jim Thonteinson Patricia Wilkinson Si
East Los Angeles College - CPGL - 0015
Porting Grammars between Typologically Similar Languages: Japanese to KoreanRoger KIM Palo Alto Research Center 3333 Coyote Hill Rd. Palo Alto, CA 94304 USA rkim@parc.com Ronald M. KAPLAN Palo Alto Research Center 3333 Coyote Hill Rd. Palo Alto, CA
East Los Angeles College - CPGL - 0015
Syntax of natural and accidental coordination: Evidence from agreementMary Dalrymple (mary.dalrymple@ling-phil.ox.ac.uk) Irina Nikolaeva (irina a nikolaeva@yahoo.com) Centre for Linguistics and Philology University of Oxford September 7, 20051Sy
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy syntax &amp; semantics:# Malagasy Passives: ESRC-OX-05-CR105/203.# Malagasy passive stems can tak
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy phonetics/phonology &amp; morphology:# Malagasy ka, tra, na endings of words: ESRC-OX-05-CR104.#
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy lexicon:# Malagasy Roots without a passive meaning: ESRC-OX-05-CR303.# These are roots witho
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy phonetics/phonology &amp; morphology:# Malagasy Epenthesis: ESRC-OX-05-CR103.# Epenthesis is a w
Sveriges lantbruksuniversitet - IR - 4065
Usage statistics and scholarly communications Heather Morrison, Project Coordinator, BC Electronic Library Network (BC ELN)Based on: Morrison, Heather (2005) The implications of usage statistics as an economic factor in scholarly communications, in
Sveriges lantbruksuniversitet - LIB - 4065
Usage statistics and scholarly communications Heather Morrison, Project Coordinator, BC Electronic Library Network (BC ELN)Based on: Morrison, Heather (2005) The implications of usage statistics as an economic factor in scholarly communications, in
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy phonetics/phonology &amp; morphology:# Stress assignment in Malagasy: ESRC-OX-05-CR109.# Stress a
Sveriges lantbruksuniversitet - IR - 2669
ND LEGISLATED POVERTYN E-w'S L E T T E RSe~tember2. 1987I-Low income people tak.e o n RichmondGAIN? &quot; I t ' s a complex subj;ect ,I1 r e l j l i e d RichmonQ, b r i n g i n g .out some charts and a p o i n t e r .iLow i c m e - people' a
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy phonetics/phonology &amp; morphology:# Prefixation in Malagasy: ESRC-OX-05-CR108.# In Malagasy,
Sveriges lantbruksuniversitet - IR - 2771
DOWNTOWN EASTSIDE WOMEN'S CENTRE 44 East Cordova Street, Vancouver JANUARY 1995ISUNDAY 2 HAPPY NEW YEAR! 12:OO Bingo 5:00 CircleMONDAYTUESDAY3 1-3 Health nurse 1:30 Women'sWEDNESDAYTHURSDAYFRIDAYSATURDAYCENTRE CLOSEDvoice 5:00 Wo
Sveriges lantbruksuniversitet - LIB - 2771
DOWNTOWN EASTSIDE WOMEN'S CENTRE 44 East Cordova Street, Vancouver JANUARY 1995ISUNDAY 2 HAPPY NEW YEAR! 12:OO Bingo 5:00 CircleMONDAYTUESDAY3 1-3 Health nurse 1:30 Women'sWEDNESDAYTHURSDAYFRIDAYSATURDAYCENTRE CLOSEDvoice 5:00 Wo
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy phonetics/phonology &amp; morphology:# Passive imperative ending in Malagasy: ESRC-OX-05-CR107.#
Sveriges lantbruksuniversitet - IR - 2773
DOWNTOWN EASTSIDE WOMEN'S CENTRE 44 East Cordova Street, Vancouver FEBRUARY 'II-SUNDAY1MONDAYTUESDAYWEDNESDAY1 1 3 Health nurse 1:00 Beading 3:00 educ. videoTHURSDAY2 1-3 MargaretlAlDS Vancouver 1030 learning grpFRIDAYSATURDAY
Sveriges lantbruksuniversitet - LIB - 2773
DOWNTOWN EASTSIDE WOMEN'S CENTRE 44 East Cordova Street, Vancouver FEBRUARY 'II-SUNDAY1MONDAYTUESDAYWEDNESDAY1 1 3 Health nurse 1:00 Beading 3:00 educ. videoTHURSDAY2 1-3 MargaretlAlDS Vancouver 1030 learning grpFRIDAYSATURDAY
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy lexicon:# Malagasy Roots without a passive meaning: ESRC-OX-05-CR302.# These are roots witho
Sveriges lantbruksuniversitet - IR - 2637
Creators of the CommonsHeather Morrison BC Electronic Library Network &amp; The Imaginary Journal of Poetic EconomicsCopyright in Libraries: The Digital onundrum CLA Information Commons Interest Group Preconference to Canadian Library Association Conf
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy syntax/semantics: # Nominals in Malagasy - ESRC-OX-06-CR210.# Malagasy Nominals.testfile.
Sveriges lantbruksuniversitet - IR - 3580
Wecandos o r n e t h i n a -People d o not end up o n the streets by accident. Public policy and urban development trends can either contribute to poverty and homelessness or prevent it. Vancouver is at a turning point; w e n o w have
Sveriges lantbruksuniversitet - LIB - 3580
Wecandos o r n e t h i n a -People d o not end up o n the streets by accident. Public policy and urban development trends can either contribute to poverty and homelessness or prevent it. Vancouver is at a turning point; w e n o w have
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy syntax/semantics: # Malagasy Coordination - ESRC-OX-06-CR208.# Malagasy Coordination.testf
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy syntax/semantics: # Grammatical functions, roots and stems in Malagasy - ESRC-OX-06-CR209.# Gr
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy syntax/semantics: # Malagasy verbs of saying - - ESRC-OX-05-CR205.# This document will explor
East Los Angeles College - CPGL - 0015
# Verb-initial grammars: A multilingual/parallel perspective# ESRC Project RES-000-23-0505# Oxford University# Charles Randriamasimanana# Malagasy syntax &amp; semantics:# Malagasy Causatives &amp; CONTROL: ESRC-OX-05-CR202.# Causative prefixes amp(