Practical NAP: Fault-Tolerance for Itinerant Computations

Dag Johansen*   Keith Marzullo†   Fred B. Schneider‡   Dmitrii Zagorodnov†   Kjetil Jacobsen*

* Department of Computer Science, University of Tromsø, Tromsø, Norway. This work was supported by NSF (Norway) grant No. 112578/431 (DITS program).
† Department of Computer Science and Engineering, University of California San Diego, La Jolla 92093-0114, California, USA. In doing this work, Marzullo was supported by NSF (Norway) grant No. 112578/431 (DITS program).
‡ Department of Computer Science, Cornell University, Ithaca 14853-7501, New York, USA. Supported in part by ARPA/RADC grant F30602-96-1-0317 and AFOSR grant F49620-94-1-0198. The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of these organizations or the U.S. Government.

Abstract

NAP is a protocol for supporting fault-tolerance in itinerant computations. It employs a form of failure detection and recovery, and it generalizes the primary-backup approach to a new computational model. The guarantees offered by NAP as well as an implementation for NAP in TACOMA are discussed.

1 Introduction

One use of mobile agents is support for itinerant computation [5]. An itinerant computation is a program that moves from host to host in a network. Which hosts the program visits is determined by the program. The program can have a pre-defined itinerary or can dynamically compute the next host to visit as it visits each successive host; it can visit the same host repeatedly or it can even create multiple concurrent copies of itself on a single host.

Itinerant computations are susceptible to processor failures, communications failures, and crashes due to program bugs. Prior work in fault-tolerance for itinerant computations has focused on the use of replication and masking. For example, [14] discusses a technique for replicating (on independently failing processors) the environment (herein called a landing pad) in which an itinerant computation executes. Thus, failures are masked below the landing pad and the programmer of an itinerant computation need not be concerned with handling them.

Replication and masking, however, have limitations because replication requires redundant processing, which is expensive. Furthermore, preserving the necessary consistency between replicas can be done efficiently only within a local-area network. Replication and masking approaches are also unable to tolerate program bugs. Thus, a fault-tolerance method based on failure detection and recovery seems the better choice when itinerant computations must operate beyond a local-area network and must employ potentially buggy software.

We present such a fault-tolerance method in this paper. It has roots in the primary-backup approach [1, 4], only with the fixed backup processors being replaced by mobile agents called rear guards [9]. With our method, a rear guard performs some recovery action and continues the itinerant computation after a failure is detected. The key differences between our approach and the primary-backup approach are:

  - Unlike a backup which, in response to a failure, continues executing the program that was running, a recovering rear guard executes recovery code. The recovery code can be identical to the code that was executing when the failure occurred, but it need not be.
  - Rear guards are not executed by a single, fixed set of backups. Instead, rear guards are hosted by landing pads where the itinerant computation recently executed.

Much of what is novel about NAP stems from the need to orchestrate rear guards as the itinerant computation moves from host to host. We call our protocol NAP (footnote 1). The idea for such a protocol was first discussed in [9].
This paper fleshes out the idea, describing the TACOMA [8] landing-pad support for NAP and the guarantees that NAP can provide to programmers. We also discuss an actual Python-based implementation of NAP.

Footnote 1: NAP stands for Norwegian Army Protocol. The protocol was motivated by a strategy employed by the first author's Army troop for moving in a hostile territory.

2 Assumptions

In TACOMA, an itinerant computation is structured from mobile agents. Each host in the network is assumed to run a landing pad; a mobile agent is started on host H by giving the landing pad at H the program text and the initial state of the agent.

A program running on a host can crash, and a host or landing pad can crash, thereby crashing all programs running on that host or landing pad. When a mobile agent terminates as a result of one of these crashes, we say that execution of the agent has experienced a fault. We assume that a fault is eventually detected by one or more of a small, well-defined set of landing pads. This is equivalent to assuming the fail-stop failure model of [13].

Replication of data and control is what enables an itinerant computation to recover from faults. We can characterize how much replication is needed in terms of a parameter, f. One simple characterization is given by:

  Bounded Crash Rate. For any integer 0 ≤ i ≤ f, there can be no more than i crashes of hosts or landing pads during the maximum period of time it takes the agent to traverse i distinct hosts.

This characterization is convenient because f remains fixed during the entire itinerant computation. However, a more practical characterization would have f depending on the host currently being visited (how reliable is it?) and on the current state of the itinerant computation. We use the Bounded Crash Rate characterization in this paper for expository simplicity; extending our protocols to more realistic characterizations is straightforward.

Finally, each pair of hosts in the network is assumed to be connected by a FIFO communications link that masks communications failures. In Section 6, we revisit this assumption and discuss how to adapt NAP to networks that can partition.

3 Fault-Tolerant Itinerant Computations in TACOMA

A TACOMA mobile agent can move to another host using a move operation, or continue executing on the current host and create a new agent on another host using a spawn operation. This means that execution of a TACOMA mobile agent defines a sequence of actions, where a mobile agent executing its ith action is said to be version i of that mobile agent. For a mobile agent a, we denote version i of this agent as a[i].

In TACOMA, a fault during the execution of an action terminates that agent in an undefined state, except that an option to TACOMA's move can specify that the interrupted action be re-executed when (and if) the affected landing pad restarts [8]. To accommodate the guarantees that NAP implements, we therefore now extend the definition of a TACOMA action along lines first proposed in connection with fault-tolerant actions [13]. A fault-tolerant action FTA

  FTA: action A recovery Ā

where A is called a regular action and Ā is called the recovery action associated with A, executes according to the following:

  1. A executes at most once, either with or without failing.
  2. If A fails, then Ā executes at least once and executes without failing exactly once.
  3. Ā executes only if A fails.

An action fails if that action experiences a fault during its execution. A fault that occurs between the execution of two fault-tolerant actions is attributed to one or the other. So, it is possible for all of the user's code in A to execute, yet to have Ā also execute because a fault occurs just after A finishes. However, once a subsequent action A′ starts executing, a fault will result in Ā′ executing rather than Ā executing.

Fault-tolerant actions are general enough to program any kind of fault-tolerance scheme that is based on detection and recovery.
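The three execution rules for a fault-tolerant action can be illustrated with a minimal Python sketch. It is not the TACOMA implementation: the names `Fault`, `run_fta`, and `make_flaky` are hypothetical, a fault is modeled as an exception, and the case of a fault that strikes just after the regular action finishes is not modeled.

```python
class Fault(Exception):
    """Models a crash that interrupts an action (hypothetical name)."""


def run_fta(action, recovery):
    """Run one fault-tolerant action and return a trace of what executed."""
    trace = []
    try:
        action()                      # rule 1: the regular action runs at most once
        trace.append("action ok")
        return trace                  # rule 3: no failure, so recovery never runs
    except Fault:
        trace.append("action failed")
    while True:                       # rule 2: recovery is re-executed until one
        try:                          # execution completes without failing
            recovery()
            trace.append("recovery ok")
            return trace
        except Fault:
            trace.append("recovery failed")


def make_flaky(failures):
    """Build a callable that raises Fault on its first `failures` calls."""
    remaining = [failures]

    def run():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise Fault()

    return run


if __name__ == "__main__":
    # The regular action fails once; the recovery action fails twice, then succeeds.
    print(run_fta(make_flaky(1), make_flaky(2)))
```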
For example, given an operation undo/redo mechanism [3], fault-tolerant actions can be used to implement atomic transactions.

The recovery action that an agent should take will most likely be changed when that agent moves or spawns a new agent. Hence, move and spawn both are defined as terminating an action (footnote 2). For example, Figure 1 shows an itinerant computation originating with a1[1]. The second version of agent a1, a1[2], starts when a1[1] executes move naming host H2 and terminates by executing spawn. The spawn creates both the third version a1[3] of a1, still on H2, and the third version a2[3] of a new agent a2 (on H4). By convention, we define a1[2] to be the second version of a1 and a2.

Footnote 2: A third operation, checkpoint, also terminates an action. This operation is described later in this section.

[Figure 1: Versions of Mobile Agents. The diagram shows versions a1[1] through a1[5] and a2[3] through a2[5] placed on hosts H1 through H5 and linked by move, spawn, and checkpoint operations.]

TACOMA agents can be written in many different languages, so fault-tolerant actions are encoded rather than being programmed using the syntax given above. For this encoding, the state of a TACOMA mobile agent is described in a data structure called a briefcase. A briefcase stores a named set of folders, ⟨name, value⟩ pairs; each of the names in the briefcase is unique. A TACOMA mobile agent's briefcase would have five folders associated with fault-tolerant actions and two additional folders associated with recovery actions. The purpose of these folders is summarized in Table 1.

The effect of move and spawn can be described operationally in terms of folders. For example, move(b) starts executing the program given as the head (footnote 3) of b.code at the landing pad named in the head of b.host (footnote 4). This code starts executing as a regular action, and it is given a briefcase b′ identical to b except that:

  - b′.host is the tail of b.host.
  - b′.code is the tail of b.code.
  - b′.recovery is the tail of b.recovery.
  - b′.version is b.version + 1.

Footnote 3: Given a list ℓ, the head of ℓ is the first element of the list and the tail of ℓ is the list with the head removed.

Footnote 4: The list of hosts b.host can be changed at any time, so the itinerary of a mobile agent changes under program control.

A fault during execution of a regular action invokes the associated recovery action, and a fault during execution of a recovery action causes that recovery action to be re-executed. With NAP, the recovery action executes on some landing pad that was recently visited by the itinerant computation. When a regular action executing with a briefcase b experiences a fault, the code for the recovery action is the head of b.recovery. The briefcase b″ this recovery action gets is identical to b except that two new folders are added:

  - b″.recovery_host is the identity of the host upon which the recovery action is executing.
  - b″.failure_status is information about the nature of the failure of the regular action.

A mobile agent can interact with its environment, and, at times, the mobile agent will need to change its recovery action for such interaction. For example, suppose a mobile agent finds some information on a host it is visiting, and because of this information the mobile agent decides to delete a local file. If this file should be deleted no matter how the local information changes, then the recovery action should change to ensure that the file is eventually deleted. This need to change recovery actions is a manifestation of the output commit problem [6]: before taking an irrevocable action, the mobile agent ensures that its current state is stable so that any recovery action will have the information that led to the irrevocable action and will be able to complete the action (even if the regular action was interrupted by a fault).

A third TACOMA operation, checkpoint, can be used to ensure that saved state is stable, so that state is available to recovery actions.
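The folder-level effect of move(b) and of starting a recovery action, as described earlier in this section, can be sketched in Python. This is only an illustration under stated assumptions: a briefcase is modeled as a plain dictionary of lists, the function names `move` and `start_recovery` and the sample folder contents are hypothetical, and nothing is actually shipped to another host.

```python
def move(b):
    """Return (landing_pad, program, b_prime) describing the effect of move(b)."""
    landing_pad = b["host"][0]            # head of b.host: where the action runs
    program = b["code"][0]                # head of b.code: the regular action to run
    b_prime = dict(b)                     # b' is identical to b except for:
    b_prime["host"] = b["host"][1:]       #   tail of b.host
    b_prime["code"] = b["code"][1:]       #   tail of b.code
    b_prime["recovery"] = b["recovery"][1:]   # tail of b.recovery
    b_prime["version"] = b["version"] + 1
    return landing_pad, program, b_prime


def start_recovery(b, recovery_host, failure_status):
    """Return (program, b2) for recovering a regular action that ran with b."""
    program = b["recovery"][0]            # head of b.recovery: the recovery code
    b2 = dict(b)                          # b'' is b plus two new folders
    b2["recovery_host"] = recovery_host
    b2["failure_status"] = failure_status
    return program, b2


if __name__ == "__main__":
    # Hypothetical briefcase; the head of "recovery" belongs to the current action.
    b = {
        "host": ["H2", "H3"],
        "code": ["visit", "report"],
        "recovery": ["recover_start", "recover_visit", "recover_report"],
        "version": 1,
    }
    pad, program, b1 = move(b)
    print(pad, program, b1["version"])
    print(start_recovery(b1, "H1", "landing pad crashed")[0])
```

Copying the dictionary rather than mutating it keeps the old briefcase b available, which mirrors the idea that rear guards retain earlier state for recovery.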
Figure 1, for example, shows version a1[4] creating version a1[5] by executing checkpoint. Operationally, checkpoint(b) is like move(b) but the new action head(b.code) is executed at the current landing pad rather than at head(b.host) and, therefore, the implementation of checkpoint can be cheaper than implementing it directly with move.

Appendix A contains a TACOMA mobile agent that illustrates the implementation of fault-tolerant actions by use of the TACOMA move, spawn, and checkpoint operations.

4 Protocol

At a high level, implementing NAP is simple. Consider a regular action a[i] executing at a landing pad $L_i$. When a[i] terminates, the identity of the next landing pad $L_{i+1}$ is the head of the host folder in the current briefcase b. We can thus achieve the desired behavior for NAP if $L_i$ uses a reliable broadcast protocol [7] to send b to a set G(a[i]) of landing pads, where the rear guards for a[i] and the landing pad $L_{i+1}$ are in G(a[i]). Reliable broadcast guarantees that all nonfaulty landing pads in G(a[i]) either deliver b or do not deliver b. Three outcomes are possible from the reliable broadcast:

  1. No landing pad delivers b. This implies that the landing pad $L_i$ crashed. The recovery action ā[i] should be executed by one of the rear guards in G(a[i]).
  2. $L_{i+1}$ delivers b. This implies that all nonfaulty landing pads in G(a[i]) have delivered b. The regular action a[i+1] should thus begin to execute.
  3. Some landing pad delivers b, but $L_{i+1}$ does not. This implies that $L_{i+1}$ crashed. A rear guard for a[i+1] in G(a[i]) will determine this fact and execute the recovery action ā[i+1].

4.1 Runtime Architecture

Each host has, in one process, a landing pad thread and a failure detection thread. The landing pad maintains a NAP state object that stores information about mobile agents the host is executing or for which the host serves as a rear guard. The landing pad thread informs the failure detection thread which landing pads to monitor. (See below.) Each mobile agent at a host executes in its own process; that process is created by the host's landing pad and, therefore, the reliable broadcast is initiated when the mobile agent process exits.

4.2 Reliable Broadcast

The reliable broadcast protocol we use for our implementation of NAP is a refinement of the one presented in [15], instantiated with a linear "broadcast strategy." Here is how that works.

Consider a process $p_0$ that broadcasts a value b to a group $G = \{p_0, p_1, \ldots, p_{n-1}\}$. For process $p_0$ to ensure that all nonfaulty processes in G either deliver b or do not deliver b, $p_0$ sends b to $p_1$ and waits for an acknowledgment from $p_1$. Process $p_1$, upon receipt of b from $p_0$, ensures that, assuming it does not fail, all nonfaulty processes in $G - \{p_0\}$ deliver b. In general, when $p_i$ receives b it becomes responsible for ensuring that b is delivered by all nonfaulty processes in $G - \{p_0, p_1, \ldots, p_{i-1}\} = \{p_i, p_{i+1}, \ldots, p_{n-1}\}$. And when this obligation is discharged, $p_i$ sends an acknowledgment to $p_{i-1}$. Thus, if there are no crashes, then message b will travel from $p_0$ to $p_1$ to $p_2$ and so on to $p_{n-1}$, and then the acknowledgment will travel back from $p_{n-1}$ to $p_{n-2}$ to $p_{n-3}$ and so on back to $p_0$.

After $p_i$ sends b to $p_{i+1}$, process $p_i$ monitors $p_{i+1}$ for a crash. If $p_i$ detects $p_{i+1}$'s crash before receiving an acknowledgment from $p_{i+1}$, then $p_i$ takes over the task of establishing that the nonfaulty processes in $\{p_{i+1}, p_{i+2}, \ldots, p_{n-1}\}$ deliver b. In particular, $p_i$ sends b to $p_{i+2}$ and waits for an acknowledgment from $p_{i+2}$. $p_{i+2}$ sends the acknowledgment to $p_i$ when it can. (For example, $p_{i+2}$ can immediately send the acknowledgment if it had already sent an acknowledgment to $p_{i+1}$.) If $p_i$ detects $p_{i+2}$'s crash before receiving this acknowledgment, then $p_i$ continues by sending b to $p_{i+3}$, and so on.
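The linear broadcast strategy just described can be illustrated with a small, sequential Python simulation. It is a sketch under simplifying assumptions rather than the NAP implementation: the function name `linear_broadcast` is hypothetical, $p_0$ is assumed nonfaulty, every process listed in `crashed` is assumed to have failed before receiving b, and crashes that occur in the middle of a relay are not modeled.

```python
def linear_broadcast(n, crashed=frozenset()):
    """Simulate one linear broadcast of b from p_0 to p_0 .. p_{n-1}.

    Returns (delivered, log): the indices that delivered b, in order, and a
    list of the messages and crash detections that occurred.
    """
    delivered = [0]                  # the broadcaster p_0 already holds b
    log = []

    def relay(i):
        # p_i holds b and is responsible for every nonfaulty p_j with j > i.
        j = i + 1
        while j < n:
            log.append(f"p{i} -> p{j}: b")
            if j in crashed:
                # p_i monitors p_j, detects the crash, and tries the next one.
                log.append(f"p{i} detects crash of p{j}")
                j += 1
                continue
            delivered.append(j)
            relay(j)                 # p_j now covers all of its own successors
            log.append(f"p{j} -> p{i}: ack")
            return
        # No nonfaulty successor is left, so p_i's obligation is discharged.

    relay(0)
    return delivered, log


if __name__ == "__main__":
    delivered, log = linear_broadcast(4, crashed={2})
    print(delivered)
    for entry in log:
        print(entry)
```

With no crashes the log shows b travelling down the chain and the acknowledgments travelling back, which is the chain of messages that a new agent version waits on in the unoptimized protocol.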
The reliable broadcast protocol in [15] also implements an election protocol: there is always eventually one process (initially $p_0$) that knows itself to be elected. A process remains elected until it fails. This is important when using arbitrary broadcast strategies, because if $p_0$ fails, then a process must take over to complete the broadcast. The election protocol used in NAP is as follows [3]:

  1. Upon receiving b from $p_{i-k}$, process $p_i$ monitors for the crash of $p_{i-k}$.
  2. If, while monitoring $p_{i-k}$, process $p_i$ detects the crash of $p_{i-k}$, then $p_i$ either monitors for the crash of $p_{i-k-1}$ (if $k \neq i$) or elects itself (if $k = i$).

Table 1: Folders relevant to Fault-Tolerant Actions

  folder           use
  host             list of hosts to be visited (head is the next host to visit)
  code             list of regular actions (head is next to be executed)
  recovery         list of recovery actions (head is associated with this action)
  version          the version of the current action
  num_guards       minimum number of rear guards
  rally_point      list of hosts to retreat to in case of disaster
  recovery_host    host on which recovery action is executing
  failure_status   information regarding failure of regular action

4.3 NAP

NAP builds on the reliable broadcast protocol just given. Process $p_\ell$ in the reliable broadcast protocol is assigned to the landing pad $L_{i+1-\ell}$ that executed regular action a[i+1-ℓ]. Two simple changes are:

  1. By the Bounded Crash Rate assumption, once f + 1 landing pads have b, then b cannot be lost due to crashes. Thus, once f + 1 landing pads have b, it is safe for $L_{i+1}$ to start executing a[i+1]. Therefore, once a landing pad L determines that f + 1 landing pads have received b (equivalently, that b.num_guards rear guards have b), L sends a "b stable" message to $L_{i+1}$. $L_{i+1}$ does not start executing a[i+1] until it receives this message.
  2. If a landing pad finds itself elected after having last received b, then it starts executing the recovery action ā[i].

In the remainder of this section, we describe other changes.
Appendix B gives the complete protocol in pseudocode.

Membership. One can think of NAP as a reliable broadcast protocol to a process group, where the group changes with each broadcast. The changes are determined by membership rules: G(a[i]) is defined to be G(a[i-1]) plus a set of landing pads that join G(a[i]) and minus a set of landing pads that leave G(a[i]). The only requirement that we place on group membership is that G(a[i]) include $L_{i+1}$.

Group G(a[i]) must contain at least f + 1 members. Thus, any landing pad that receives b after f + 1 landing pads have received b need not deliver b nor need be in G(a[i+1]). Since landing pads sequentially learn the number of landing pads that have b, we can use the following rule: when a landing pad receives b, if previously f + 1 other landing pads have already delivered b, then it leaves G(a[i]). This rule is attractive, because it is simple to implement and has an intuitive appeal.

There are other plausible rules for choosing which landing pads leave G(a[i]). The oldest landing pads might be required to remain in G(a[i]), since they have not failed recently and thus appear to be stable. With this rule, the latest rear guard would drop out of G(a[i]) once it receives the acknowledgment that the broadcast of b is complete. More generally, landing pads could piggyback information with their NAP acknowledgments. The information, for example, might include performance measurements provided by the failure detection thread. $L_{i+1}$ could use this information to determine which rear guard is introducing the most latency and therefore should leave G(a[i+2]). This rear guard's identity could be included in the broadcast of $b_{i+1}$.

One additional membership rule is required for when a mobile agent revisits a landing pad. That landing pad may find itself twice in the broadcast strategy. For example, consider agent a2 in Figure 1. If f = 3, then G(a2[5]) = {H1, H2, H4, H5}, where H4 both precedes and follows H5 in the broadcast strategy. When this happens, the second entry is dropped from the broadcast strategy. For example, the broadcast to G(a2[5]) uses the broadcast strategy H4, H5, H2, H1.

Catastrophic Failure. Although not admitted by our failure model, in practice there will be situations (such as programming bugs) in which recovery action ā[i] will fail repeatedly. All rear guards thus fail. A reasonable response for this case is to pass the briefcase b of the failing agent to a well-known host; we call this host the rally point. The identity of the rally point is specified in the rally_point folder. One implementation would have rally point $p_{rp}$ be a member of the group G(a[i]) for each version i, and have $p_{rp}$ take over should it detect all of the other members of the group as having crashed. A more efficient implementation is to have at least f + 1 rather than f rear guards. If a rear guard finds that all other rear guards have failed, it passes the briefcase to $p_{rp}$.

Termination. When a mobile agent terminates, the NAP for this agent must also terminate. Surprisingly, even though the reliable broadcast protocol that NAP is based on cannot terminate [15], orchestrating termination of NAP is straightforward. The TACOMA operation exit is a command that instructs a landing pad to terminate support for the corresponding mobile agent. Suppose the last user-defined action of some mobile agent is:

  FTA_ω: action A_ω recovery Ā_ω

To orchestrate termination of NAP, FTA_ω can then be replaced by two actions:

  action {A_ω; checkpoint} recovery Ā_ω;
  action exit recovery exit;

When the last landing pad executes exit, it will appear to have crashed, resulting in a failure detection (footnote 5). The election protocol in NAP will then choose a rear guard to execute the recovery action. The agent that executes the recovery action will then terminate executing NAP, causing another failure detection and another rear guard executing the recovery action. This will continue until all rear guards have terminated executing NAP for this program.

When a rally point is defined, this termination protocol will pass the final briefcase b_ω to b_ω.rally_point. Hence, all executions end up at the rally point at termination. The reason for termination (abnormal or regular) can be recorded in the final briefcase b_ω.

Footnote 5: The failure detection latency can be reduced by sending an explicit message indicating that the landing pad is terminating. An illustration of this optimization appears in Appendix A.

Reducing Latency. Using a linear broadcast strategy yields a simple protocol, but has the worst latency of all broadcast strategies. With a linear broadcast strategy, before a version of a mobile agent can start executing, a chain of f + 1 messages must be sent and received. As we show in Section 5, for a move operation and for reasonably small values of f, the latency of the reliable broadcast is subsumed by the latency of initializing the new agent version, but for spawn and checkpoint the latency can be significant.

For spawn and checkpoint, optimistic execution can mask some of the latency imposed by the reliable broadcast. Instead of blocking the execution of a new mobile agent version a[i+1] until a "b stable" message is delivered locally, a[i+1] starts executing as soon as possible. This creates the danger that crashes may cause ā[i] to be executed after user code associated with a[i+1] starts executing. If this does pose a problem, then a[i+1] can use the TACOMA wait_stable operation to block until b has been delivered by at least f + 1 landing pads. If a[i+1] does not explicitly execute wait_stable, then wait_stable is implicitly executed at the end of a[i+1].

5 Implementation

We have implemented NAP in a Python-based (footnote 6) version of TACOMA. We chose Python because it is a convenient language for prototyping. Of primary concern was deciding how we would integrate NAP into the existing TACOMA architecture. The performance of this first version of NAP in TACOMA was of less importance.

Footnote 6: A reference manual for Python can be found at http://www.python.org/doc/ref/.

The cost of doing a move with NAP is given in Table 2. These values were obtained on a system comprising Pentium Pro processors with 200 MHz clocks. Each machine had 128 MB of RAM and 100 Mb/s Ethernet. Each was running FreeBSD 2.2.7. To compute each value in Table 2, 100 measurements were made; the standard deviation was within 5 percent of the averages. A least-squares fit to these values gives the cost of a move given g rear guards as 51.6 + 87.5g msec. We expect to be able to reduce this cost significantly.

Table 2: Cost of NAP as a function of number of rear guards

  number of rear guards    0     1     2     3     4
  time (msec)              54    138   235   311   405

6 Conclusions

NAP provides fault-tolerance for itinerant computations at low cost. The replication needed for fault-tolerance is obtained by leaving some code running at landing pads the mobile agent visited recently. No additional processors are required, and the recovery that a mobile agent performs in response to a crash is something that can be specified by the programmer. Thus, when a low-cost method of recovery is possible, the programmer can use that method (rather than, for example, active replication [14] or primary-backup [12]). We believe that this flexibility is especially important when partitioning is possible.

NAP is based on a reliable broadcast that uses a linear broadcast strategy. A linear broadcast strategy results in a simple rule for determining when a landing pad should be dropped from the rear guards. For small values of f, the latency of NAP is subsumed by the cost of a move, the most common method of terminating a regular action. The latency is not subsumed by the cost of a spawn, though. The latency could be reduced by using a broadcast strategy with a larger fanout than our linear broadcast strategy. We are examining versions of NAP built using such broadcast strategies for itinerant computations that frequently use spawn and checkpoint.

NAP, as presented here, cannot be implemented in a system that can experience partitions, because no crash failure-detector can be implemented in such a system. However, in systems that can partition, processes within the same partition can agree on which processes are unreachable (even though they cannot distinguish between the case of the unreachable process being crashed or being partitioned away [16]). With such a failure detector, a network partitioning into two connected components may lead to a regular action and its recovery action both executing without failing.

We are currently designing a version of NAP that provides better support for partitioned operation. The failure detection thread for this version is as described above: it implements consistent detection, within a set of connected landing pads, of the unreachability of the other landing pads. This version also has a set of tools that aid the TACOMA programmer in writing a TACOMA mobile agent that executes in a partitionable environment. For example, TACOMA already provides a mechanism for the transactional update of collections of folders on stable storage. We plan to use this mechanism to allow applications to have the same measure of fault-tolerance that, for example, the protocol of [12] gives. It will also allow for applications more demanding than those supported by [12], such as those for which a transaction spans many landing pads. For mobile agents that do not require such strict semantics, we will have tools that provide information on the network's topology and current performance. Such tools allow one to write "partition-aware" [2] mobile agents.
The mobile agent described in Appendix A is one that we believe would fit well into this second class of applications.

Acknowledgements

We would like to thank the other members of the TACOMA research group and the anonymous referees for insightful comments on earlier versions of the paper.

References

[1] P. A. Alsberg and J. D. Day. A principle for resilient sharing of distributed resources. In Proceedings of the Second International Conference on Software Engineering, San Francisco, California, USA, 13-15 October 1976, pp. 627-644.
[2] O. Babaoglu, R. Davoli, A. Montresor, and R. Segala. System support for partition-aware network applications. In Proceedings of the Eighteenth International Conference on Distributed Computing Systems, Amsterdam, The Netherlands, 26-29 May 1998, pp. 184-191.
[3] Philip A. Bernstein, Nathan Goodman, and Vassos Hadzilacos. Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.
[4] Navin Budhiraja, Keith Marzullo, Fred B. Schneider, and Sam Toueg. Primary-backup protocols: lower bounds and optimal implementations. In Proceedings of the Third IFIP Working Conference on Dependable Computing for Critical Applications (DCCA-3), Mondello, Italy, 14-16 September 1992, pp. 321-343.
[5] David Chess, Benjamin Grosof, Colin Harrison, David Levine, Colin Parris, and Gene Tsudik. Itinerant Agents for Mobile Computing. IEEE Personal Communications 2(5):34-49, October 1995.
[6] E. N. Elnozahy and W. Zwaenepoel. On the use and implementation of message logging. In Digest of Papers, The Twenty-Fourth International Symposium on Fault-Tolerant Computing, Austin, TX, USA, 15-17 June 1994, pp. 298-307.
[7] Vassos Hadzilacos and Sam Toueg. Fault-tolerant broadcasts and related problems. In Distributed Systems, Second Edition, Sape Mullender, editor, ACM Press Frontier Series, Addison-Wesley, 1993.
[8] Dag Johansen, Robbert van Renesse, and Fred B. Schneider. An Introduction to TACOMA distributed system version 1.0. University of Tromsø Department of Computer Science Technical Report 95-23, June 1995.
[9] Dag Johansen, Robbert van Renesse, and F. B. Schneider. Operating systems support for mobile agents. In Proceedings of the Fifth IEEE Workshop on Hot Topics in Operating Systems, Orcas Island, Washington, USA, 4-5 May 1995, pp. 42-45.
[10] Friedemann Mattern. Virtual time and global states of distributed systems. In Parallel and Distributed Algorithms (M. Cosnard et al., editors), Elsevier Science Publishers B. V., 1989, pp. 215-226.
[11] G. van Rossum and J. de Boer. Linking a stub generator (AIL) to a prototyping language (Python). In Proceedings of the Spring 1991 EurOpen Conference, Tromsø, Norway, 20-24 May 1991, pp. 229-247.
[12] Kurt Rothermel and Markus Straßer. A fault-tolerant protocol for providing the exactly-once property of mobile agents. In Proceedings of the Seventeenth IEEE Symposium on Reliable Distributed Systems, 20-23 October 1998, pp. 100-108.
[13] R. D. Schlichting and F. B. Schneider. Fail-stop processors: An approach to designing fault-tolerant computing systems. ACM Transactions on Computer Systems 1(3):222-238, August 1983.
[14] F. B. Schneider. Towards fault-tolerant and secure agentry. In Proceedings of the Eleventh Workshop on Distributed Algorithms, Saarbrücken, Germany, 24-26 September 1997, pp. 1-14.
[15] F. B. Schneider, D. Gries, and R. D. Schlichting. Fault-tolerant broadcasts. Science of Computer Programming 4(1):1-15, April 1984.
[16] J. Sussman and K. Marzullo. The Bancomat problem: An example of resource allocation in a partitionable asynchronous system. In Proceedings of DISC'98: Twelfth International Symposium on Distributed Computing, 23-25 September 1998, Andros, Greece, pp. 363-377.

A Example: License Checker

The following description of a TACOMA mobile agent illustrates the programming and use of fault-tolerant actions. The mobile agent visits a set of hosts, specified as a parameter. For each host visited, the mobile agent creates a folder that describes the action the agent took there or whether it found the host to be unavailable. This folder is returned to the originating host.

The mobile agent takes the following actions for each host it visits:

  - If file license exists and contains the word "customer", then the mobile agent renames the file program to old_program and writes a new file program.
  - If file license exists and contains the word "demo", then the mobile agent takes no action.
  - Otherwise, the mobile agent deletes the file program.

We wish the agent to update the host with some care, however. ...
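The per-host rules listed in this appendix can be sketched as a single Python function. This is only an illustration and not the appendix's TACOMA agent: the function name `apply_license_policy`, the directory argument, and the returned strings are hypothetical, the "demo" check is assumed to take precedence when both words appear, and the sketch covers only the decision itself, not the fault-tolerant actions that would wrap it.

```python
from pathlib import Path


def apply_license_policy(host_dir, new_program):
    """Apply the license rules to the files in host_dir; return what was done."""
    directory = Path(host_dir)
    license_file = directory / "license"
    program = directory / "program"

    # Treat a missing license file as containing no words at all.
    words = license_file.read_text().split() if license_file.exists() else []

    if "demo" in words:
        return "no action"            # demo hosts are left untouched
    if "customer" in words:
        if program.exists():
            program.replace(directory / "old_program")   # keep the old copy
        program.write_bytes(new_program)                 # install the new one
        return "updated"
    if program.exists():
        program.unlink()              # no valid license: remove the program
    return "deleted"
```

The returned string could serve as the content of the per-host folder that the agent carries back to the originating host.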
