Parallel Processing class notes, 2/21/11

Class outline
  1) Intro to distributed systems (I am the scribe)
  2) Team work, MPI project
  3) P2P overview
  4) Define "botnets"
  5) Read the MapReduce paper for next class

1. Distributed Systems: a collection of hardware and software resources that coordinate to achieve something (typically resource sharing). The components communicate via message passing.

   Examples
     1. the Internet (the infrastructure for the web)
     2. the web itself
     3. the Circe RC cluster

   Consequences
     1. Concurrency: different things happen at the same time on different nodes.
     2. No completely precise global clock: synchronization between nodes is usually pretty good, but not perfect.
     3. Independent failures are possible: it is hard to tell a high-latency connection apart from a failed node.

   Each node has an application on top of an OS on top of hardware.

   Challenges
     1. Heterogeneity
        a. of networks: Ethernet and ??net via IP
        b. of hardware: different data representations, such as little- vs. big-endian byte order, different word sizes, etc.
        c. of operating systems: different system calls for exchanging messages
        d. of languages: different character representations
        e. of data types:
        f. Solution to heterogeneity: standards and protocols! We love these.
     2. Openness: the idea that a program can be extended with new resources and new functions; a system is "open" if it is "extensible." Can new services be added? Can new computers be added? The Internet is extensible; extensions include e-commerce and RSS. Its designers published all the protocols so that others could build on them, and they used RFCs (Requests for Comments), like a suggestion box, so that the services they rolled out were what users wanted.
     3. Security: the Internet started as a club of trustworthy people, but it has since expanded far beyond anyone's expectations.
        a. confidentiality: a message cannot be seen by non-recipients
        b. integrity: no one can edit messages they didn't send
        c. availability: resources must stay available for use
        d. denial-of-service attacks: very common; a malicious user can automate sending requests at such a high rate that the resource becomes inaccessible to normal users
        e. security of mobile code: one may download code or a program expecting it to do one thing, and then it does another
     4. Scalability: the property of a system that it stays effective as the number of users and/or resources grows.
        a. resource cost may be a concern: consider a network that requires a clique topology; adding the nth node requires adding n-1 connections
        b. performance loss: solving a problem of size n with log(n) time and resource cost is considered good scaling
        c. preventing software resources from running out: e.g., an IP (IPv4) address was made 32 bits to save space on computers, but now we're increasing it (IPv6 addresses are 128 bits)
        d. bottlenecks in design: the original name service (before DNS) was just one computer with one text file, HOSTS.TXT, on it; as the number of sites grew, this computer became ridiculously overloaded
     5. Failure handling
        a. detect failures when possible, e.g., with a checksum
        b. redundancy: makes it less likely that a failure will matter
        c. tolerate failures (this doesn't sound very computer-sciencey!)
        d. recovery: make a checkpoint at intervals, and "roll back" to it in the case of a failure
     6. Concurrency
        a. race conditions: precautions need to be taken
        b. this challenge only applies to shared resources
     7. Transparency
        a. access: the same interface accesses internal and external resources, e.g., viewing files on another server may use the same interface as local files
        b. location: it shouldn't matter where a resource physically is; the system must always present the same resources and information
        c. mobility: ???
        d. performance: performance should be independent of changing load
        e. scaling: performance should be independent of the extent of the resource
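
The hardware-heterogeneity point about little- vs. big-endian byte order can be made concrete. This is a minimal Python sketch using the standard `struct` module; `to_wire` and `from_wire` are hypothetical helper names, and fixing network (big-endian) order as the wire format is an assumption chosen to illustrate the "standards and protocols" solution.

```python
import struct
import sys

def to_wire(value: int) -> bytes:
    """Hypothetical helper: encode an unsigned 32-bit int in network (big-endian) order."""
    return struct.pack("!I", value)

def from_wire(data: bytes) -> int:
    """Decode 4 bytes in network order, independent of the host's native byte order."""
    (value,) = struct.unpack("!I", data)
    return value

value = 0x01020304
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

print("host byte order:", sys.byteorder)
print("big-endian:   ", big.hex())     # 01020304
print("little-endian:", little.hex())  # 04030201

# Reading little-endian bytes as if they were big-endian gives a different number.
print(hex(struct.unpack(">I", little)[0]))  # 0x4030201

# Round-tripping through the agreed wire format preserves the value.
assert from_wire(to_wire(value)) == value
```

Because both sides pack and unpack with the same agreed format, the result no longer depends on either machine's native byte order.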
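
The clique example under Scalability is arithmetic worth checking: the nth node must link to each of the n-1 nodes already present, so the total number of links is 1 + 2 + ... + (n-1) = n(n-1)/2, which grows quadratically. A small sketch with hypothetical function names:

```python
def new_links_for_nth_node(n: int) -> int:
    """Links needed when the nth node joins a clique of n - 1 existing nodes."""
    return n - 1

def total_links(n: int) -> int:
    """Total links in a clique (complete graph) of n nodes: n(n-1)/2."""
    return n * (n - 1) // 2

# Summing the per-node cost reproduces the closed form.
for n in (2, 10, 100, 1000):
    incremental = sum(new_links_for_nth_node(k) for k in range(1, n + 1))
    assert incremental == total_links(n)
    print(f"{n:>5} nodes -> {total_links(n):>7} links")  # 1000 nodes need 499500 links
```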
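
The address-exhaustion example is about address size: an IPv4 address is 32 bits, giving 2^32 (about 4.3 billion) addresses, while IPv6 uses 128 bits. Python's standard `ipaddress` module can confirm the arithmetic; the sample addresses are taken from the ranges reserved for documentation.

```python
import ipaddress

# Documentation-range sample addresses (192.0.2.0/24 and 2001:db8::/32).
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

print(len(v4.packed) * 8)  # 32  (bits per IPv4 address)
print(len(v6.packed) * 8)  # 128 (bits per IPv6 address)

# Total size of each address space.
ipv4_space = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_space = ipaddress.ip_network("::/0").num_addresses
print(ipv4_space)  # 4294967296, about 4.3 billion
assert ipv4_space == 2 ** 32
assert ipv6_space == 2 ** 128
```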
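
For detecting failures with a checksum, a minimal sketch follows. It assumes CRC-32 from Python's standard `zlib` module as the checksum (the notes don't name one) and a hypothetical framing in which `make_message` appends a 4-byte CRC trailer that `verify` checks on receipt.

```python
import struct
import zlib

def make_message(payload: bytes) -> bytes:
    """Hypothetical framing: the payload followed by its CRC-32 as 4 big-endian bytes."""
    return payload + struct.pack("!I", zlib.crc32(payload))

def verify(message: bytes) -> bytes:
    """Return the payload if the CRC-32 trailer matches; raise ValueError otherwise."""
    if len(message) < 4:
        raise ValueError("message too short to carry a checksum")
    payload, trailer = message[:-4], message[-4:]
    (expected,) = struct.unpack("!I", trailer)
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: message was corrupted")
    return payload

msg = make_message(b"hello from node 7")
assert verify(msg) == b"hello from node 7"

# Flip one bit of the first byte to simulate a transmission error.
corrupted = bytes([msg[0] ^ 0x01]) + msg[1:]
try:
    verify(corrupted)
except ValueError as err:
    print("detected:", err)
```

The receiver can then ask for a retransmission instead of acting on bad data.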
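
Recovery by checkpointing at intervals and rolling back can be sketched the same way. `CheckpointedCounter` is a hypothetical stand-in for a node's state; it keeps its checkpoint in memory only for illustration, whereas a real system would write checkpoints to stable storage so they survive the failure.

```python
import copy

class CheckpointedCounter:
    """Hypothetical node state that is checkpointed every `interval` steps."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.state = {"step": 0, "total": 0}
        self._checkpoint = copy.deepcopy(self.state)

    def step(self, value: int) -> None:
        """Do one unit of work; take a checkpoint every `interval` steps."""
        self.state["step"] += 1
        self.state["total"] += value
        if self.state["step"] % self.interval == 0:
            # Snapshot a consistent state (kept in memory here, for illustration only).
            self._checkpoint = copy.deepcopy(self.state)

    def roll_back(self) -> None:
        """After a failure, discard all work done since the last checkpoint."""
        self.state = copy.deepcopy(self._checkpoint)


node = CheckpointedCounter(interval=3)
for v in range(1, 6):  # steps 1..5; a checkpoint is taken after step 3
    node.step(v)
print(node.state)      # {'step': 5, 'total': 15}
node.roll_back()       # simulate recovery from a failure
print(node.state)      # {'step': 3, 'total': 6}
```

Work done after the last checkpoint (steps 4 and 5 here) is lost on rollback and must be redone; checkpointing more often loses less work but costs more during normal operation.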
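
Race conditions on shared resources can be illustrated on a single machine, with threads standing in for concurrent activities (a simplification: in a distributed system the shared resource is usually remote). In this hypothetical `SharedCounter`, the lock serializes the read-modify-write in `value += 1`, so no updates are lost.

```python
import threading

class SharedCounter:
    """Hypothetical shared resource: a counter updated by several threads."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self, times: int) -> None:
        for _ in range(times):
            # `self.value += 1` is a read-modify-write; without the lock,
            # two threads can read the same old value and one update is lost.
            with self._lock:
                self.value += 1


counter = SharedCounter()
workers = [threading.Thread(target=counter.increment, args=(100_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.value)  # 400000: every increment is kept
```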

4. Botnets (see PowerPoint as well)
   Definition: a network of compromised computers that have been infected with malicious code and can be controlled by a remote botmaster.
     - often used for sending and propagating spam (80% of all spam)
     - an Estonian government site crashed due to a botnet in 2007
     - cheating in online polls and games
     - bots can be recruited via email attachments containing malware, which installs software that contacts the botmaster
     - Google and Microsoft have each conducted a study in which they scoured websites for botnets, and they found that many are infected
     - many millions of computers are currently infected

3. Peer-to-peer systems (see PowerPoint as well)
   System architecture: a division of responsibilities between system components and the ??? of the components on the systems on the net
     a. Client-server: the client makes requests to a server, which sends back replies. A server can also be a client of other servers.
     b. Peer-to-peer architecture: also called a flat architecture; each user is equal and has the same responsibilities and privileges.
     c. P2P systems have passed the test of deployment and usability; they are currently in use, e.g., KaZaA and eDonkey.
     d. Definition one: an application that uses the resources at the edge of the Internet (end users' computers rather than central supercomputers)
     e. Definition two:
     f. Napster had a central server that stored an index of which nodes had which files; modern P2P networks have no such central index.
        a. decentralization of storage was a huge boon to Napster, a great advance in computing!
        b. every participating node acted as both a client and a server, a "servent"
        c. weakness of Napster: one central node with the index
     g. Gnutella had no central server. It was developed by Nullsoft in 14 days as a quick hack to share cooking recipes; AOL shut it down after just a few hours of running.
        a. The protocol was published, and third parties reverse-engineered new clients.
        b. Gnutella ended up having a big impact: lots of files exchanged, papers written, etc.
        c. Search occurs in an unstructured overlay (message flooding). Each node gets, rather randomly, four neighbors. A node sends a search to its neighbors, who send it to their neighbors, and so on until someone has the file. The TTL starts at 7 and is decremented at each step; the query is dropped when TTL = 0 (TTL = time to live).
        d. Researchers have been very interested in Gnutella and have conducted research using eavesdropping and crawling.
        e. Originally, there was great heterogeneity of connection speed.
        f. Gnutella protocol 0.6 treated so-called "super-nodes" differently from "leaves": it set up something more like a network of super-nodes, each of which acted like a server to a subset of leaves.
        g. Gnutella had a free-rider problem: 75% of users shared < 100 files.
        h. Message flooding may cause many nodes to see the same request twice; some strategies for reducing that inefficiency exist ...
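
The Napster item describes a central index that maps each file to the nodes holding it, while the download itself happens peer to peer. A minimal in-memory sketch under that assumption; `CentralIndex`, `register`, `unregister`, and `lookup` are hypothetical names, not Napster's actual protocol. The single `CentralIndex` object is also the weakness the notes point out: if it goes down, no one can find anything.

```python
from collections import defaultdict

class CentralIndex:
    """Hypothetical Napster-style index server: file name -> set of node addresses."""

    def __init__(self) -> None:
        self._holders: defaultdict[str, set[str]] = defaultdict(set)

    def register(self, node: str, files: list[str]) -> None:
        """A node announces the files it is sharing."""
        for name in files:
            self._holders[name].add(node)

    def unregister(self, node: str) -> None:
        """Remove a node (e.g., when it disconnects) from every entry."""
        for holders in self._holders.values():
            holders.discard(node)

    def lookup(self, name: str) -> set[str]:
        """Return the nodes holding `name`; the client then fetches from one of them directly."""
        return set(self._holders.get(name, set()))


index = CentralIndex()
index.register("10.0.0.5", ["song.mp3", "notes.pdf"])
index.register("10.0.0.9", ["song.mp3"])
print(sorted(index.lookup("song.mp3")))  # ['10.0.0.5', '10.0.0.9']
index.unregister("10.0.0.5")
print(sorted(index.lookup("song.mp3")))  # ['10.0.0.9']
```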
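
The Gnutella search item describes flooding over an unstructured overlay with a TTL that starts at 7. The sketch below simulates it in memory, assuming a hypothetical random overlay in which each node links to about four peers. It also suppresses duplicates by having each node remember the query, one possible answer to the inefficiency in the last item; in this sketch the query keeps spreading until its TTL runs out, so it can find several holders. The names (`build_overlay`, `flood_search`) and the graph model are illustrative assumptions, not the Gnutella wire protocol.

```python
import random
from collections import deque

def build_overlay(num_nodes: int, degree: int = 4, seed: int = 0) -> dict[int, set[int]]:
    """Hypothetical unstructured overlay: each node links to `degree` random peers.

    Links are bidirectional, so some nodes end up with more than `degree` neighbors.
    """
    rng = random.Random(seed)
    graph: dict[int, set[int]] = {n: set() for n in range(num_nodes)}
    for node in range(num_nodes):
        others = [p for p in range(num_nodes) if p != node]
        for peer in rng.sample(others, degree):
            graph[node].add(peer)
            graph[peer].add(node)
    return graph


def flood_search(graph: dict[int, set[int]], files: dict[int, set[str]],
                 start: int, wanted: str, ttl: int = 7) -> tuple[list[int], int]:
    """Flood a query outward from `start`.

    Returns (nodes that hold `wanted`, number of query messages sent).
    The TTL drops by one per hop; a node that receives TTL 0 does not forward.
    Each node remembers the query, so a duplicate copy is dropped on arrival.
    """
    seen = {start}
    hits: list[int] = []
    messages = 0
    queue = deque([(start, ttl, None)])  # (node, remaining TTL, node it came from)
    while queue:
        node, remaining, came_from = queue.popleft()
        if wanted in files.get(node, set()):
            hits.append(node)  # a real node would send a hit back toward `start`
        if remaining == 0:
            continue  # TTL exhausted: the query goes no further
        for peer in graph[node]:
            if peer == came_from:
                continue  # don't echo the query back to its sender
            messages += 1
            if peer not in seen:  # duplicate suppression at the receiver
                seen.add(peer)
                queue.append((peer, remaining - 1, node))
    return hits, messages


overlay = build_overlay(num_nodes=200, degree=4, seed=42)
files = {17: {"recipe.txt"}, 150: {"recipe.txt"}}
hits, sent = flood_search(overlay, files, start=0, wanted="recipe.txt", ttl=7)
print(sorted(hits), sent)
```

Counting `messages` alongside `hits` shows what flooding costs: duplicate copies are still sent across the overlay, but with suppression they are no longer forwarded a second time.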