Notes02_21 - Parallel Processing class notes 2/21/11 ...


Class outline
1) Intro to distributed systems - I am the scribe
2) Team work, MPI project
3) P2P overview
4) Define “botnets”
5) Read MapReduce paper for next class

1. Distributed Systems

A distributed system is a collection of hardware and software resources that coordinate to achieve something (typically resource sharing). The components communicate via message passing.

Examples
1. the Internet (the infrastructure for the web)
2. the web itself
3. the CIRCE RC cluster

Consequences
1. concurrency - different things happen at the same time on different nodes
2. no completely precise global clock - synchronization between nodes is usually pretty good, but never perfect
3. independent failures are possible - it is hard to tell a high-latency connection from a failed node

Each node has: an application on top of an OS on top of hardware.

Challenges
1. Heterogeneity
   a. of network - ethernet and ??net via IP
   b. of hardware - different data representations, such as little- vs. big-endian byte order, different word sizes, etc.
   c. of OS's - different system calls for exchanging messages
   d. of languages - different character representations
   e. of data types -
   f. solution to heterogeneity: standards and protocols! We love these.
2. Openness - the idea that a program can be extended with new resources and new functions; a system is “open” if it is “extensible”. Can new services be added? Can new computers be added? The internet is extensible; extensions include e-commerce and RSS. Its designers published all the protocols so that others could build on them. They used RFCs (Requests for Comments) like a suggestion box, so that the services they rolled out were what users wanted.
3. Security - the internet started as just a club of trustworthy people, but it has now expanded far beyond anyone's expectations
   a. confidentiality - a message cannot be seen by non-recipients
   b. integrity - no one can edit messages they didn't send
   c. availability - resources must stay available for use
   d. denial of service attacks - very common; a malicious user automates the sending of requests at such a high rate that the resource becomes inaccessible to normal users
   e. security of mobile code - one may download some code or program expecting it to do one thing, and then it does another
4. Scalability - the property of a system such that it stays effective as the number of users and/or the number of resources grows
   a. resource cost may be a concern - consider a network that requires a clique topology; adding the nth node requires adding n-1 connections
   b. performance loss - solving a problem of size n with O(log n) time and resource cost is considered good scaling
   c. preventing software resources from running out - e.g. an IPv4 address is only 32 bits, chosen to save space on computers, and IPv6 now increases it to 128 bits
   d. bottlenecks in design - the original DNS service was just one computer with one text file on it; as the number of sites grew, this computer became ridiculously overloaded
5. Failure handling
   a. detect failures when possible, e.g. with a checksum
   b. redundancy - makes it less likely that a failure will matter
   c. tolerate failures - THIS DOESN'T SOUND VERY COMPUTER SCIENCY!
   d. recovery - make a checkpoint at intervals, and “roll back” to it in the case of a failure
6. Concurrency
   a. race conditions - precautions need to be taken
   b. the challenge only applies to shared resources
7. Transparency
   a. access - the same interface is used for internal and external resources, e.g. viewing files on another server may use the same interface as viewing local files
   b. location - it shouldn't matter where a resource is physically; it must always present the same resources and information
   c. mobility - ???
   d. performance - performance should be independent of changing load
   e. scaling - performance should be independent of the extent of the resource
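The byte-order problem mentioned under heterogeneity of hardware can be made concrete with a small sketch. It uses Python's standard `struct` module; the value `0x12345678` is only an illustrative choice, not something from the lecture. It shows the same 32-bit integer laid out in both byte orders, and what happens when a receiver guesses the wrong one.

```python
import struct

value = 0x12345678  # arbitrary example value (an assumption for illustration)

# "<" forces little-endian, ">" forces big-endian; "I" is a 4-byte unsigned int.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Network protocols conventionally agree on big-endian ("network byte order"),
# which struct spells "!". Agreeing on one order is the "standards" solution.
wire = struct.pack("!I", value)
decoded = struct.unpack("!I", wire)[0]

# A receiver that wrongly assumes little-endian reads a different number.
misread = struct.unpack("<I", wire)[0]
print(hex(decoded))  # 0x12345678
print(hex(misread))  # 0x78563412
```

Fixing the byte order in the protocol, rather than letting each machine use its native layout, is one small instance of the "standards and protocols" answer to heterogeneity.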
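The "detect failures with a checksum" idea from the failure-handling list can be sketched as follows. This is a minimal illustration, assuming CRC-32 from Python's standard `zlib` module as the checksum; the notes do not say which checksum was meant, and the helper names `frame` and `verify` are hypothetical.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (hypothetical framing)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def verify(message: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the one carried in the message."""
    if len(message) < 4:
        return False
    payload, received = message[:-4], message[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

msg = frame(b"hello, node 7")

# Simulate a transmission error by flipping one bit of the first byte.
corrupted = bytes([msg[0] ^ 0x01]) + msg[1:]

print(verify(msg))        # True
print(verify(corrupted))  # False
```

A checksum only detects the failure; the redundancy and recovery items in the same list cover what to do once it has been detected.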
4. Botnets (see powerpoint as well)

Definition: a network of compromised computers that have been infected with malicious code and can be controlled remotely by a botmaster.
- often used to send and propagate spam (80% of all spam)
- an Estonian government site crashed due to a botnet in 2007
- cheating in online polls and games
- bots can be recruited via email attachments containing malware, which installs software that contacts the botmaster
- Google and Microsoft have each conducted a study in which they scoured websites for botnets and found that many are infected
- many millions of computers are currently infected

3. Peer to peer systems (see powerpoint as well)

System architecture - a division of responsibilities between system components and the ??? of the components on the systems on the net
a. client server - the client makes requests to the server, which sends back replies. A server can also be a client to other servers.
b. peer to peer architecture - also called a flat architecture; each user is equal and has the same responsibilities and privileges
c. P2P systems have passed the test of deployment and usability; they are currently in use, like KaZaA and eDonkey
d. definition one: an application that uses the resources on the edge of the internet (end users' computers rather than central supercomputers)
e. definition two:
f. Napster had a central server that stored an index of which nodes had which files; modern P2P networks have no such central index
   a. decentralization of storage was a huge boon to Napster, a great advance in computing!
   b. every participating node acted as both a client and a server, a “servent”
   c. weakness of Napster: one central node with the index
g. Gnutella had no central server. It was developed by Nullsoft in 14 days as a quick hack to share cooking recipes; AOL shut it down after just a few hours of running.
   a. The protocol was published and third parties reverse engineered new clients.
   b. Gnutella ended up having a big impact - lots of files exchanged, papers written, etc.
   c. Search occurs in an unstructured overlay (message flooding). Each node gets, rather randomly, four neighbors. One node sends a search to its neighbors, who send it to their neighbors, until someone has the file. TTL starts at 7 and is decremented with each step; the query is dropped when TTL = 0 (TTL = time to live).
   d. Researchers have been very interested in Gnutella and have studied it using eavesdropping and crawling.
   e. Originally, there was great heterogeneity of connection speed.
   f. Gnutella protocol 0.6 interacted with so-called “super-nodes” differently than with “leaves” - it set up something more like a network of super-nodes, each of which acted like a server to a subset of leaves.
   g. Gnutella had a free-rider problem - 75% of users shared < 100 files
   h. Message flooding may involve many nodes seeing the same request twice; some strategies for diminishing that inefficiency exist ...
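The flooding search with a TTL described for Gnutella can be sketched as a small simulation. This is a toy model under stated assumptions, not the real protocol: the five-node graph, the node names, and the function `flood_search` are all hypothetical, nodes keep forwarding even after a hit, and duplicate copies are dropped by remembering which nodes have already seen the query (one of the strategies for reducing repeated requests).

```python
from collections import deque

def flood_search(neighbors, has_file, start, ttl=7):
    """Flood a query from `start`; return (nodes with the file, messages sent)."""
    seen = {start}              # nodes that have already received the query
    hits = []                   # reached nodes that hold the file
    messages = 0                # every copy sent, including duplicates
    queue = deque([(start, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node in has_file:
            hits.append(node)
        if remaining == 0:
            continue            # TTL exhausted: the query is dropped here
        for peer in neighbors[node]:
            messages += 1
            if peer in seen:
                continue        # duplicate copy, ignored by the receiver
            seen.add(peer)
            queue.append((peer, remaining - 1))  # TTL decremented per hop
    return hits, messages

# Toy overlay (an assumption; real nodes would have about four neighbors).
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
has_file = {"E"}

print(flood_search(neighbors, has_file, "A", ttl=2))  # ([], 8)
print(flood_search(neighbors, has_file, "A", ttl=3))  # (['E'], 11)
```

With TTL 2 the query dies one hop short of the file; with TTL 3 it is found, but 11 messages are sent to reach only four other nodes, which illustrates the redundancy cost of flooding.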

This note was uploaded on 02/18/2012 for the course CIS 4930 taught by Professor Staff during the Spring '08 term at University of South Florida.
