12. The Byzantine Generals

DANNY DOLEV, LESLIE LAMPORT, MARSHALL PEASE, and ROBERT SHOSTAK

ABSTRACT

Reliable computer systems must handle malfunctioning components that give conflicting information to different parts of the system. This situation can be expressed abstractly in terms of a group of generals of the Byzantine army camped with their troops around an enemy city. Communicating only by messenger, the generals must agree upon a common battle plan. However, one or more of them may be traitors who will try to confuse the others. The problem is to find an algorithm to ensure that the loyal generals will reach agreement. It is shown that, using only oral messages, this problem is solvable if and only if more than two-thirds of the generals are loyal; so a single traitor can confound two loyal generals. With unforgeable written messages, the problem is solvable for any number of generals and possible traitors. The solution for a general distributed system requires connectivity of more than twice the number of traitors, while in the case of unforgeable written messages, connectivity larger than the number of traitors suffices. Applications of the solutions to reliable computer systems are then discussed.

1. INTRODUCTION

A reliable computer system must be able to cope with the failure of one or more of its components. A failed component may exhibit a type of behavior that is often overlooked, namely, sending conflicting information to different parts of the system. The problem of coping with this type of failure is expressed abstractly as the Byzantine Generals Problem. We devote the major part of the text to a discussion of this abstract problem, and conclude by indicating how our solutions can be used in implementing a reliable computer system.*

We imagine that several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. The generals can communicate with one another only by messenger. After observing the enemy, they must decide upon a common plan of action. However, some of the generals may be traitors, trying to prevent the loyal generals from reaching agreement. The generals must have an algorithm to guarantee that:

CONDITION A. All loyal generals decide upon the same plan of action.

The loyal generals will all do what the algorithm says they should, but the traitors may do anything they wish. The algorithm must guarantee Condition A regardless of what the traitors do. The loyal generals should not only reach agreement, but should agree upon a reasonable plan. We therefore also want to insure that:

CONDITION B. A small number of traitors cannot cause the loyal generals to adopt a bad plan.

Condition B is hard to formalize, since it requires saying precisely what a bad plan is, and we will not attempt to do so. Instead, we consider how the generals reach a decision. Each general observes the enemy and communicates his observations to the others. Let v(i) be the information communicated by the ith general. Each general uses some method for combining the values v(1), . . . , v(n) into a single plan of action, where n is the number of generals. Condition A is achieved by having all generals use the same method for combining the information, and Condition B is achieved by using a robust method. For example, if the only decision to be made is whether to attack or retreat, then v(i) can be General i's opinion of which option is best, and the final decision can be based upon a majority vote among them.

*This work was supported in part by the National Aeronautics and Space Administration under contract number NAS1-15428 Mod. 3, the Ballistic Missile Defense Systems Command under contract number DASG60-78-C-0046, and the Army Research Office under contract number DAAG29-79-C-0102.
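The majority-vote combination can be sketched in code (a sketch of our own for illustration; the function name and the fallback plan are our assumptions, not part of the original text):

```python
from collections import Counter

def combine_plans(values, default="retreat"):
    """Combine the reported plans v(1), ..., v(n) by majority vote.

    If no plan has a strict majority, fall back to a default,
    mirroring the robust-combination idea behind Condition B.
    """
    plan, count = Counter(values).most_common(1)[0]
    return plan if count * 2 > len(values) else default

# Every loyal general applies the same function to the same values,
# so Condition A (same decision) follows once they share the values.
print(combine_plans(["attack", "attack", "retreat"]))  # strict majority
print(combine_plans(["attack", "retreat"]))            # no majority
```

A robust combining method like this also limits the influence of a few bad values, which is the point of Condition B.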
A small number of traitors can affect the decision only if the loyal generals were almost equally divided between the two possibilities, in which case neither decision could be called bad.

While this approach may not be the only way to satisfy Conditions A and B, it is the only one we know of. It assumes a method by which the generals communicate their values v(i) to one another. The obvious method is for the ith general to send v(i) by messenger to each other general. However, this does not work, because satisfying Condition A requires that every loyal general obtain the same values v(1), . . . , v(n), and a traitorous general may send different values to different generals. For Condition A to be satisfied, the following must be true:

CONDITION 1. Every loyal general must obtain the same information v(1), . . . , v(n).

Condition 1 implies that a general cannot necessarily use a value of v(i) obtained directly from the ith general, since a traitorous ith general may send different values to different generals. This means that, unless we are careful, in meeting Condition 1 we might introduce the possibility that the generals use a value of v(i) different from the one sent by the ith general, even though the ith general is loyal. We must not allow this to happen if Condition B is to be met. For example, we cannot permit a few traitors to cause the loyal generals to base their decision upon the values "retreat", . . . , "retreat" if every loyal general sent the value "attack." We therefore have the following requirement for each i:

CONDITION 2. If the ith general is loyal, then the value that he sends must be used by every loyal general as the value of v(i).

We can rewrite Condition 1 as the condition that, for every i (whether or not the ith general is loyal):

CONDITION 1'. Any two loyal generals use the same value of v(i).
Conditions 1' and 2 are both conditions on the single value sent by the ith general. We can therefore restrict our consideration to the problem of how a single general sends his value to the others. We phrase this in terms of a commanding general sending an order to his lieutenants, obtaining the following problem.

Byzantine Generals Problem: A commanding general must send an order to his n − 1 lieutenant generals such that:

CONDITION IC1. All loyal lieutenants obey the same order.

CONDITION IC2. If the commanding general is loyal, then every loyal lieutenant obeys the order he sends.

Conditions IC1 and IC2 are called the interactive consistency conditions. Note that if the commander is loyal, then IC1 follows from IC2. However, the commander need not be loyal. To solve our original problem, the ith general sends his value of v(i) by using a solution to the Byzantine Generals Problem to send the order "use v(i) as my value," with the other generals acting as the lieutenants.

2. IMPOSSIBILITY RESULTS

The Byzantine Generals Problem seems deceptively simple. Its difficulty is indicated by the surprising fact that, if the generals can send only oral messages, then no solution will work unless more than two-thirds of the generals are loyal. In particular, with only three generals, no solution can work in the presence of a single traitor. An oral message is one whose contents are completely under the control of the sender, so that a traitorous sender can transmit any possible message. Such a message corresponds to the type of message that computers normally send to one another. In Section 4, we will consider signed, written messages, for which this is not true.

We now show that, with oral messages, no solution for three generals can handle a single traitor. For simplicity, we consider the case in which the only possible decisions are "attack" or "retreat." Let us first examine the scenario pictured in Fig. 12.1, in which the commander is loyal and sends an "attack" order, but Lieutenant 2 is a traitor and reports to Lieutenant 1 that he received a "retreat" order. For Condition IC2 to be satisfied, Lieutenant 1 must obey the order to attack.

Fig. 12.1. Lieutenant 2 a traitor.

Now consider another scenario, shown in Fig. 12.2, in which the commander is a traitor and sends an "attack" order to Lieutenant 1 and a "retreat" order to Lieutenant 2. Lieutenant 1 does not know who the traitor is, and cannot tell what message the commander actually sent to Lieutenant 2. Hence, the scenarios in these two pictures appear exactly the same to Lieutenant 1. If the traitor lies consistently, then there is no way for Lieutenant 1 to distinguish between these two situations, so he must obey the "attack" order in both of them. Hence, whenever Lieutenant 1 receives an "attack" order from the commander, he must obey it. However, a similar argument shows that if Lieutenant 2 receives a "retreat" order from the commander, then he must obey it even if Lieutenant 1 tells him that the commander said "attack." Therefore, in the scenario of Fig. 12.2, Lieutenant 2 must obey the "retreat" order while Lieutenant 1 obeys the "attack" order, thereby violating Condition IC1. Hence, no solution exists for three generals that works in the presence of a single traitor.

Fig. 12.2. The commander a traitor.

This argument may appear convincing, but we strongly advise the reader to be very suspicious of such nonrigorous reasoning. Although this result is indeed correct, we have seen equally plausible "proofs" of invalid results. We know of no area in computer science or mathematics in which informal reasoning is more likely to lead to errors than in the study of this type of algorithm.
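The indistinguishability at the heart of the argument can be checked mechanically. The following sketch (our own illustration; the tuple encoding of a lieutenant's view is an assumption for the example) builds Lieutenant 1's view in both scenarios and confirms they are identical:

```python
def lieutenant1_view(from_commander, lieutenant2_report):
    """Everything Lieutenant 1 can observe: the commander's order to him
    and what Lieutenant 2 claims the commander said."""
    return (from_commander, lieutenant2_report)

# Fig. 12.1: loyal commander orders "attack"; traitorous Lieutenant 2
# falsely reports that he received "retreat".
scenario_1 = lieutenant1_view("attack", "he said 'retreat'")

# Fig. 12.2: traitorous commander sends "attack" to Lieutenant 1 and
# "retreat" to Lieutenant 2; loyal Lieutenant 2 truthfully reports it.
scenario_2 = lieutenant1_view("attack", "he said 'retreat'")

# Lieutenant 1 cannot distinguish the scenarios, yet IC2 forces him to
# attack in the first, while IC2 forces Lieutenant 2 to retreat in the
# second, so IC1 must fail in one of them.
print("views identical:", scenario_1 == scenario_2)
```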
For a rigorous proof of the impossibility of a three-general solution that can handle a single traitor, we refer the reader to reference 12. Using this result, we can show that no solution with fewer than 3m + 1 generals can cope with m traitors.* The proof is by contradiction: we assume such a solution for a group of 3m or fewer generals, and use it to construct a three-general solution to the Byzantine Generals Problem that works with one traitor, which we know to be impossible. To avoid confusion between the two algorithms, we will call the generals of the assumed solution Albanian generals, and those of the constructed solution will be called Byzantine generals. Thus, starting from an algorithm that allows 3m or fewer Albanian generals to cope with m traitors, we will construct a solution that allows three Byzantine generals to handle a single traitor.

The three-general solution is obtained by having each of the Byzantine generals simulate approximately one-third of the Albanian generals, so that each Byzantine general is simulating at most m Albanian generals. The Byzantine commander simulates the Albanian commander plus at most m − 1 Albanian lieutenants, and each of the two Byzantine lieutenants simulates at most m Albanian lieutenants. Since only one Byzantine general can be a traitor, and he simulates at most m Albanians, at most m of the Albanian generals are traitors. Hence, the assumed solution guarantees that IC1 and IC2 hold for the Albanian generals. By IC1, all the Albanian lieutenants being simulated by a loyal Byzantine lieutenant obey the same order, which is the order he is to obey. It is easy to check that Conditions IC1 and IC2 of the Albanian generals solution imply the corresponding conditions for the Byzantine generals, so we have constructed the required impossible solution.

*More precisely, no such solution exists for three or more generals, since the problem is trivial for two generals.
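The bookkeeping of the simulation can be sketched as follows (an illustration of the counting argument only; the names and the choice of exactly 3m Albanians are our assumptions):

```python
def partition_albanians(m):
    """Assign 3m Albanian generals (general 0 is the Albanian commander)
    to three Byzantine generals so that each simulates at most m of them,
    as in the proof."""
    albanians = list(range(3 * m))
    return {
        "byz_commander": albanians[:m],        # commander + at most m - 1 lieutenants
        "byz_lieutenant_1": albanians[m:2*m],  # at most m lieutenants
        "byz_lieutenant_2": albanians[2*m:],   # at most m lieutenants
    }

groups = partition_albanians(m=2)
# A single traitorous Byzantine general corrupts at most m Albanians,
# so the assumed Albanian solution still guarantees IC1 and IC2.
print(groups)
```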
One might think that the difficulty in solving the Byzantine Generals Problem stems from the requirement of reaching exact agreement. We now demonstrate that this is not the case by showing that reaching approximate agreement is just as hard as reaching exact agreement. Let us assume that instead of trying to agree on a precise battle plan, the generals must agree only upon an approximate time of attack. More precisely, we assume that the commander orders the time of the attack, and we require the following two conditions to hold:

CONDITION IC1'. All loyal lieutenants attack within ten minutes of one another.

CONDITION IC2'. If the commanding general is loyal, then every loyal lieutenant attacks within ten minutes of the time given in the commander's order.

(We assume that the orders are given and processed the day before the attack, and the time at which an order is received is irrelevant; only the attack time given in the order matters.)

Like the Byzantine Generals Problem, this problem is unsolvable unless more than two-thirds of the generals are loyal. We prove this by first showing that if there were a solution for three generals that coped with one traitor, then we could construct a three-general solution to the Byzantine Generals Problem that also worked in the presence of one traitor. Suppose the commander wishes to send an "attack" or "retreat" order. He orders an attack by sending an attack time of 1:00, and orders a retreat by sending an attack time of 2:00, using the assumed algorithm. Each lieutenant uses the following procedure to obtain his order.

1. After receiving the attack time from the commander, a lieutenant does one of the following: If the time is 1:10 or earlier, then attack. If the time is 1:50 or later, then retreat. Otherwise, continue to Step 2.

2. Ask the other lieutenant what decision he reached in Step 1. If the other lieutenant reached a decision, then make the same decision he did. Otherwise, retreat.
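The two-step procedure can be sketched directly (our own encoding: times are represented as minutes past midnight, and ASK stands for the deferred case):

```python
ASK = "ask"  # placeholder meaning "defer to Step 2"

def step1(attack_time):
    """Step 1: decide from the commander's time alone if possible.
    Times are minutes past midnight, so 1:10 -> 70 and 1:50 -> 110."""
    if attack_time <= 70:    # 1:10 or earlier
        return "attack"
    if attack_time >= 110:   # 1:50 or later
        return "retreat"
    return ASK               # otherwise continue to Step 2

def step2(my_step1, other_step1):
    """Step 2: if deferred, copy the other lieutenant's Step 1 decision;
    if he also deferred, retreat."""
    if my_step1 != ASK:
        return my_step1
    return other_step1 if other_step1 != ASK else "retreat"

# A traitorous commander sends 1:00 (60) to one lieutenant and
# 1:30 (90) to the other; both lieutenants still reach one decision.
d1, d2 = step1(60), step1(90)
print(step2(d1, d2), step2(d2, d1))
```

By IC1' the two lieutenants cannot split between "attack" and "retreat" in Step 1, which is what makes the copy rule in Step 2 safe.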
It follows from IC2' that, if the commander is loyal, then a loyal lieutenant will obtain the correct order in Step 1, so IC2 is satisfied. If the commander is loyal, then IC1 follows from IC2, so we need only prove IC1 under the assumption that the commander is a traitor. Since there is at most one traitor, this means that both lieutenants are loyal. It follows from IC1' that, if one lieutenant decided to attack in Step 1, then the other cannot decide to retreat in Step 1. Hence, they will both either come to the same decision in Step 1, or at least one of them will defer his decision until Step 2. In this case, it is easy to see that they both arrive at the same decision, so IC1 is satisfied. We have therefore constructed a three-general solution to the Byzantine Generals Problem that handles one traitor, which is impossible. Hence, we cannot have a three-general algorithm that maintains IC1' and IC2' in the presence of a traitor.

The method of having one general simulate m others can now be used to prove that no solution with fewer than 3m + 1 generals can cope with m traitors. The proof is similar to the one for the original Byzantine Generals Problem, and is left to the reader.

3. A SOLUTION WITH ORAL MESSAGES

We have shown above that, for a solution to the Byzantine Generals Problem using oral messages to cope with m traitors, there must be at least 3m + 1 generals. We now give a solution that works for 3m + 1 or more generals. However, we first specify exactly what we mean by "oral messages." Each general is supposed to execute some algorithm that involves sending messages to the other generals, and we assume that a loyal general correctly executes his algorithm. The definition of an oral message is embodied in the following assumptions which we make for the generals' message system.

A1. Every message that is sent is delivered correctly.

A2. The receiver of a message knows who sent it.

A3. The absence of a message can be detected.

Assumptions A1 and A2 prevent a traitor from interfering with the communication between two other generals: by A1 he cannot interfere with the messages they do send, and by A2 he cannot confuse their intercourse by introducing spurious messages. Assumption A3 will foil a traitor who tries to prevent a decision by simply not sending messages. The practical implementation of these assumptions is discussed in Section 6. Note that assumptions A1-A3 do not imply that a general hears any message sent between two other generals.

The algorithms in this section and in the following one require that each general be able to send messages directly to every other general. In Section 5, we describe algorithms which do not have this requirement.

A traitorous commander may decide not to send any order. Since the lieutenants must obey some order, they need some default order to obey in this case. We let RETREAT be this default order.

We inductively define the Oral Message algorithms OM(m), for all nonnegative integers m, by which a commander sends an order to n − 1 lieutenants. We will show that OM(m) solves the Byzantine Generals Problem for 3m + 1 or more generals in the presence of at most m traitors. We will find it more convenient to describe this algorithm in terms of the lieutenants "obtaining a value" rather than "obeying an order."

The algorithm assumes a function majority with the property that, if a majority of the values v_i equal v, then majority(v_1, . . . , v_{n−1}) equals v. (Actually, it assumes a sequence of such functions, one for each n.) There are two natural choices for the value of majority(v_1, . . . , v_{n−1}):

1. The majority value among the v_i if it exists, otherwise the value RETREAT.

2. The median of the v_i, assuming that they come from an ordered set.

The following algorithm requires only the aforementioned property of majority.

Algorithm OM(0):

1. The commander sends his value to every lieutenant.

2. Each lieutenant uses the value he receives from the commander, or uses the value RETREAT if he receives no value.

Algorithm OM(m), m > 0:

1. The commander sends his value to every lieutenant.

2. For each i, let v_i be the value Lieutenant i receives from the commander, or else RETREAT if he receives no value. Lieutenant i acts as the commander in Algorithm OM(m − 1) to send the value v_i to each of the n − 2 other lieutenants.

3. For each i, and each j ≠ i, let v_j be the value Lieutenant i received from Lieutenant j in Step 2 (using Algorithm OM(m − 1)), or else RETREAT if he received no such value. Lieutenant i uses the value majority(v_1, . . . , v_{n−1}).

To execute Step 3, every processor must know when to apply the majority function, in other words, when to stop waiting for more values to come. To do this, one can use some sort of time-out technique, as we will discuss in Section 6. Note that recently, Fischer, Lynch, and Paterson (reference 8) proved that there is no way to reach any agreement unless we assume some bound on the time at which a reliable processor responds.

To understand how Algorithm OM(m) works, we consider the case m = 1, n = 4. Figure 12.3 illustrates the messages received by Lieutenant 2 when the commander sends the value v and Lieutenant 3 is a traitor. In the first step of OM(1), the commander sends v to all three lieutenants. In the second step, Lieutenant 1 sends the value v to Lieutenant 2, using the trivial algorithm OM(0). Also in the second step, the traitorous Lieutenant 3 sends Lieutenant 2 some other value x. In Step 3, Lieutenant 2 then has v_1 = v_2 = v and v_3 = x, so he obtains the correct value v = majority(v, v, x).

Fig. 12.3. Algorithm OM(1); Lieutenant 3 a traitor.

Next, we see what happens if the commander is a traitor.
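The recursion itself can be sketched in code (a simplified model of our own, not the authors' presentation: a traitor here always substitutes the fixed wrong value "x", and message delivery is modeled by direct function calls):

```python
from collections import Counter

def majority(values):
    # Choice 1 above: the majority value if it exists, otherwise RETREAT.
    v, c = Counter(values).most_common(1)[0]
    return v if c * 2 > len(values) else "RETREAT"

def om(m, commander, lieutenants, value, traitors):
    """Value obtained by each lieutenant when `commander` runs OM(m)."""
    def sends(v):
        # Crude traitor model: a traitorous sender garbles every value.
        return "x" if commander in traitors else v
    if m == 0:
        # OM(0): each lieutenant just uses the value he receives.
        return {l: sends(value) for l in lieutenants}
    result = {}
    for i in lieutenants:
        # Step 3: v_i from the commander, plus v_j relayed by each
        # Lieutenant j acting as commander in OM(m - 1).
        received = [sends(value)]
        for j in lieutenants:
            if j != i:
                sub = om(m - 1, j, [l for l in lieutenants if l != j],
                         sends(value), traitors)
                received.append(sub[i])
        result[i] = majority(received)
    return result

# m = 1, n = 4, Lieutenant 3 a traitor (the Fig. 12.3 situation):
# each loyal lieutenant obtains majority(v, v, x) = v.
print(om(1, "C", ["L1", "L2", "L3"], "attack", traitors={"L3"}))
```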
Figure 12.4 shows the values received by the lieutenants if a traitorous commander sends three arbitrary values x, y, and z to the three lieutenants. Each lieutenant obtains v_1 = x, v_2 = y, and v_3 = z, so they all obtain the same value majority(x, y, z) in Step 3, regardless of whether or not any of the three values x, y, and z are equal.

Fig. 12.4. Algorithm OM(1); the commander a traitor.

The recursive algorithm OM(m) invokes n − 1 separate executions of the algorithm OM(m − 1), each of which invokes n − 2 executions of OM(m − 2), etc. This means that, for m > 1, a lieutenant sends many separate messages to each other lieutenant. There must be some way to distinguish between these different messages. The reader can verify that all ambiguity is removed if each Lieutenant i prefixes the number i to the value v_i that he sends in Step 2. As the recursion "unfolds," the algorithm OM(m − k) will be called (n − 1) . . . (n − k) times to send a value prefixed by a sequence of k lieutenants' numbers. This implies that the algorithm requires sending an exponential number of messages. There exist algorithms which require only a polynomial number of messages (reference 4), but they are substantially more complex than the one we present.

To prove the correctness of the algorithm OM(m) for arbitrary m, we first prove the following lemma.

Lemma 1: For any m and k, Algorithm OM(m) satisfies Condition IC2 if there are more than 2k + m generals and at most k traitors.

Proof: The proof is by induction on m. Condition IC2 only specifies what must happen if the commander is loyal. Using A1, it is easy to see that the trivial algorithm OM(0) works if the commander is loyal, so the lemma is true for m = 0. We now assume it is true for m − 1, m > 0, and prove it for m. In Step 1, the loyal commander sends a value v to all n − 1 lieutenants. In Step 2, each loyal lieutenant applies OM(m − 1) with n − 1 generals. Since by hypothesis n > 2k + m, we have n − 1 > 2k + (m − 1), so we can apply the induction hypothesis to conclude that every loyal lieutenant gets v_j = v for each loyal Lieutenant j. Since there are at most k traitors, and n − 1 > 2k + (m − 1) ≥ 2k, a majority of the n − 1 lieutenants are loyal. Hence, each loyal lieutenant has majority(v_1, . . . , v_{n−1}) = v in Step 3, proving IC2.

The following theorem asserts that Algorithm OM(m) solves the Byzantine Generals Problem.

Theorem 1: For any m, Algorithm OM(m) satisfies Conditions IC1 and IC2 if there are more than 3m generals and at most m traitors.

Proof: The proof is by induction on m. If there are no traitors, then it is easy to see that OM(0) satisfies IC1 and IC2. We therefore assume that the theorem is true for OM(m − 1) and prove it for OM(m), m > 0. We first consider the case in which the commander is loyal. By taking k equal to m in Lemma 1, we see that OM(m) satisfies IC2. Condition IC1 follows from IC2 if the commander is loyal, so we need only verify IC1 in the case that the commander is a traitor. There are at most m traitors and the commander is one of them, so at most m − 1 of the lieutenants are traitors. Since there are more than 3m generals, there are more than 3m − 1 lieutenants, and 3m − 1 > 3(m − 1). We may therefore apply the induction hypothesis to conclude that OM(m − 1) satisfies Conditions IC1 and IC2. Hence, for each j, any two loyal lieutenants get the same value for v_j in Step 3. (This follows from IC2 if one of the two lieutenants is Lieutenant j, and from IC1 otherwise.) Hence, any two loyal lieutenants get the same vector of values v_1, . . . , v_{n−1}, and therefore obtain the same value majority(v_1, . . . , v_{n−1}) in Step 3, proving IC1.

4. A SOLUTION WITH SIGNED MESSAGES

As we saw from the scenarios of Figs. 12.1 and 12.2, it is the traitors' ability to lie that makes the Byzantine Generals Problem so difficult. The problem becomes easier to solve if we can restrict that ability. One way to do this is to allow the generals to send unforgeable signed messages. More precisely, we add to A1-A3 the following assumption:

A4. (a) A loyal general's signature cannot be forged, and any alteration of the contents of his signed messages can be detected. (b) Anyone can verify the authenticity of a general's signature.

Note that we make no assumptions about a traitorous general's signature. In particular, we allow his signature to be forged by another traitor, thereby permitting collusion among the traitors.

Having introduced signed messages, our previous argument that four generals are required to cope with one traitor no longer holds. In fact, a three-general solution does exist. We now give an algorithm that copes with m traitors for any number of generals. (The problem is vacuous if there are fewer than m + 2 generals.)

In our algorithm, the commander sends a signed order to each of his lieutenants. Each lieutenant then adds his signature to that order and sends it to the other lieutenants, who add their signatures and send it to others, and so on. This means that a lieutenant must effectively receive one signed message, make several copies of it, and sign and send these copies. It does not matter how these copies are obtained: a single message might be photocopied, or else each message might consist of a stack of identical messages which are signed and distributed as required.

Our algorithm uses a function choice, which is applied to a set of orders to obtain a single one. It is defined as follows: If the set V consists of the single element v, then choice(V) = v; otherwise choice(V) = RETREAT.

In the following algorithm, we let x:i denote the value x signed by General i. Thus, v:j:i denotes the value v signed by j, and then that value v:j signed by i. We let General 0 be the commander. In this algorithm, each Lieutenant i maintains a set V_i, containing the set of properly signed orders he has received so far. (If the commander is loyal, then this set should never contain more than a single element.) Do not confuse V_i, the set of orders he has received, with the set of messages that he has received. There may be many different messages with the same order.

We assume the existence of a bound on the time it takes correct processors to sign and relay a message. This implies the existence of phases such that, if a message with r signatures arrives after phase r, then only faulty processors relayed it, so it can be ignored. This assumption does not necessarily mean complete synchronization of the processors.

Algorithm SM(m): Initially V_i = ∅.

1. The commander signs and sends his value to every lieutenant at phase 0.

2. For each i:
   A. If Lieutenant i receives a message of the form v:0 from the commander at phase 0, and he has not yet received any order, then: (i) he lets V_i equal {v}; (ii) he sends the message v:0:i to every other lieutenant.
   B. If Lieutenant i receives a message of the form v:0:j_1: . . . :j_k at phase k, 1 ≤ k ≤ m, V_i contains at most one value, v is not in the set V_i, and the signatures belong to k different lieutenants, then: (i) he adds v to V_i; (ii) if k < m, then he sends the message v:0:j_1: . . . :j_k:i to every lieutenant other than j_1, . . . , j_k.

3. For each i: At the end of phase m, he obeys the order choice(V_i).

Observe that the algorithm requires m + 1 phases of message exchange. Note that in Step 2, Lieutenant i ignores any message containing an order v that is already in the set V_i, and accepts at most two different orders originated by the commander.
Moreover, Lieutenant t' ignores any messages that do not have the proper form ofa value followed by a string of dili'crent signatures. If packets of identical messages are used to avoid having to copy messages, this means that-he throws away any packet that does not consist of a sullicicnt number of identical. prop~ crly signed messages. (There should be (it —— k — 2)(n * t't — 3). . . . (It — m h 2) copies of the message if it has been signed by k lieutenants.) Figure 12.5 illustrates algorithm SMU) for the case of three generals. when the commander is a traitor. The commander sends an "attack" order to one lieutenant and a “retreat” order to the other. Both lieutenants receive the two 360 CONCURRENCY CONTROL AND RELIABILITY IN DISTRIBUTED SYSTEMS COMMANDER "macaw" Fig. 12.5. Algorithm SMl‘l lmThe commander a traitor. orders in Step 2, so after step 2 V] = V3 2 {“attaek,” “retreat"}, and they both obey the order choice ({“attack,” “retreat”}). Observe that here, unlike the situation in Fig. 12.2, the lieutenants know the commander is a traitor because his signature appears on two different orders, and A4 states that only he could have generated those signatures. In algorithm SM(m), a lieutenant signs his name to acknowledge his receipt of an order. If he is the mth lieutenant to add his signature to the order, then that signature is not relayed to anyone else by its recipient, so it is superfluous. (More precisely, assumption A2 makes it unnecessary.) In particular, the lieu- tenants need not sign their messages in SMG). We now prove the correctness of our algorithm. Theorem 2: For any in: Algorithm SM( m) solves the Byzantine Generals Problem, if there are at most In traitors. Proof} We first prove 1C2. If the commander is loyal, then he sends his signed order 0:0 to every lieutenant in Step l. Every loyal lieutenant will therefore receive the order v on time in Step 2A. 
Moreover, since no traitorous lieutenant can forge any other message of the form v:0, a loyal lieutenant can receive no additional order in Step 2B. Hence, for each loyal lieutenant i, the set Vi obtained in Step 2 consists of the single order v, which he will obey in Step 3 by property 1 of the choice function. This proves IC2.

Since IC1 follows from IC2 if the commander is loyal, to prove IC1 we need only consider the case in which the commander is a traitor. Two loyal lieutenants i and j obey the same order in Step 3 if the function choice applied to the sets of orders Vi and Vj that they receive in Step 2 induces the same value. Therefore, to prove IC1 it suffices to prove two parts: one, if a loyal lieutenant i puts exactly one order v into Vi in Step 2, then every loyal lieutenant will put exactly the same order v into his set in Step 2; two, if Vj has two elements for some loyal lieutenant j, then Vk has two elements for any other loyal lieutenant k.

To prove the first part, we must show that j receives a properly signed message containing that order. If i receives the order v in Step 2A on time, then he sends it to j in Step 2A(ii), so that j receives it on time (by A1). If i adds the order to Vi in Step 2B, then he must receive a first message of the form v:0:j1: . . . :jk. If j is one of the jr, then by A4 he must already have received the order v. If not, we consider two cases:

1. k < m: In this case, i sends the message v:0:j1: . . . :jk:i to j, so j must receive the order v.
2. k = m: Since the commander is a traitor, at most m - 1 of the lieutenants are traitors. Hence, at least one of the lieutenants j1, . . . , jm is loyal. This loyal lieutenant must have sent j the value v when he first received it, so j must, therefore, receive that value.

Similar arguments prove that if any loyal lieutenant i decides to put two orders in Vi, then every other loyal lieutenant will decide to do so. This completes the proof.

During the algorithm, every loyal lieutenant relays to every other lieutenant at most two orders. Therefore, the total number of messages exchanged is bounded by 2n(n - 1), where n is the total number of generals. By using more phases and more sophisticated algorithms, one can reduce the total number of messages to O(n + m^2), as shown in reference 5.

5. MISSING COMMUNICATION PATHS

Thus far, we have assumed that a general (or lieutenant) can send messages directly to every other general (or lieutenant). We now remove this assumption. Instead, we suppose that physical barriers place some restrictions on who can send messages to whom. We consider the generals to form the nodes of a simple,* finite, undirected network graph G, where an arc between two nodes indicates that those two generals can send messages directly to one another. We now extend algorithms OM(m) and SM(m), which assumed G to be completely connected, to more general graphs.

The commander sends his value through routes in the network. For simplicity, assume that every message contains the information about the route through which it is supposed to be delivered. Thus, before sending a message, the commander chooses a route and sends the message containing the route. The receiving lieutenant, however, does not know in advance the route through which it is going to receive the message. Notice that a traitor may also change the routing through which the message is supposed to be delivered. Moreover, a traitor may also produce many false copies of the message it is supposed to relay, then send them through various routes of its own choice. A traitor may change the record of the route to prevent the receiving lieutenant from identifying it as the source of faulty messages.

*A simple graph is one in which there is at most one arc joining any two nodes, and every arc connects two distinct nodes.
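A minimal sketch of such route-carrying messages follows; the `Msg` structure, the `relay` function, and the general names are all hypothetical, not notation from the text:

```python
from dataclasses import dataclass

@dataclass
class Msg:
    value: str
    route: tuple  # full route chosen by the commander, e.g. ("s", "4", "1")
    hop: int      # index into route of the general currently holding it

def relay(msg, me):
    # A loyal relay accepts the message only if it really is the next
    # stop on the declared route, then passes it one hop further; a
    # mis-routed message is simply discarded.
    if msg.route[msg.hop + 1] != me:
        return None
    return Msg(msg.value, msg.route, msg.hop + 1)

m = Msg("attack", ("s", "4", "1"), hop=0)  # commander "s" starts it
m = relay(m, "4")
assert m is not None and m.hop == 1
assert relay(m, "7") is None               # "7" is not on this route
```

A traitor, of course, is free to ignore this discipline; the point of recording the route is that any tampered copy must then carry at least one traitor's name, which is what the Purifying Algorithm below exploits.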
To ensure the inclusion of traitors' names in the routes, assume that, after a loyal lieutenant receives a message to relay, he makes sure the lieutenant from which the message has arrived is supposed to relay it to him. Only then does he relay the message to the next lieutenant along the route to the receiving lieutenant.

A network has connectivity k if, for every pair of nodes, there exist k node-independent paths connecting them. To extend our oral message algorithm OM(m), we need the following definition, where two generals are said to be neighbors if they are joined by an arc.

Definition: Let {a1, . . . , ar} be the set of copies of the commander's value received by Lieutenant i. Let Ui be a set of lieutenants that does not contain the commander himself. A set Ui is called a set of suspicious lieutenants determined by Lieutenant i if every message aj that did not pass through lieutenants in Ui carries the same value.

Algorithm Purifying(m, a1, . . . , ar, i):
1. If a set Ui of up to m suspicious generals exists, then the purified value is the value of the messages that did not pass through Ui. If no message is left, the value is RETREAT.
2. If there is no set Ui of cardinality up to m, then the purified value is RETREAT.

Notice that if more than one set of suspicious generals exists, then there may be many purified values, but because of the way the algorithm will be used, a plurality of possible values will pose no problem.

Before proving that the Purifying Algorithm actually does the right filtration, consider the application of the Purifying Algorithm to the network shown in Fig. 12.6. The network contains 10 generals, and at most 2 traitors. Assume that s and u are the faulty generals. The commander s sends the value a to Lieutenants 1 and 2, and the value b to the other lieutenants. Assume that Lieutenant 1 receives s's value through the following paths:

1. a:s:1
2. a:s:2:1
3. a:s:u:1
4. b:s:7:4:1
5. b:s:8:5:1

Fig. 12.6. Ten generals with two traitors; s is the commander.

The Purifying Algorithm provides the purified value a to Lieutenant 1, by choosing {7, 8} as the set of suspicious generals. Similarly, Lieutenant 2 obtains the value a. But the rest of the network obtain the value b by choosing {1, 2} as the set of suspicious generals.

The following theorem proves that, with sufficient connectivity, all of the loyal lieutenants obtain the same value if the commander is loyal.

Theorem 3: Let G be a network of generals which contains at most m traitors, and the connectivity of which is at least 2m + 1. If a loyal commander sends 2m + 1 copies of its value to every lieutenant, through disjoint paths, then, by use of the Purifying Algorithm, every loyal lieutenant can obtain the commander's value.

Proof: The loyal commander sends every lieutenant 2m + 1 copies of a value, through disjoint paths. It sends the same value to all lieutenants. Let a1, . . . , ar be the set of all of the copies of the commander's value that Lieutenant i receives. There are at most m traitors; therefore, at most m values might be lost. This implies that the number of copies, r, is at least m + 1. At least m + 1 of the messages are relayed through routes which contain only loyal generals; each one of the loyal lieutenants relays the message faithfully without changing it. This implies that at least m + 1 of the received copies carry the original value.

Note that, if the commander were a traitor, then the above reasoning would fail to hold. It may be that the number of copies received is much more than m + 1, and even that the majority of them carry a faulty value. The task of Lieutenant i is to find the correct value out of this mess. It does this by applying the Purifying Algorithm.
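The Purifying Algorithm and the Fig. 12.6 scenario can be written as a short sketch. Routes are recorded here as tuples of relaying lieutenants, and the concrete copy values follow the reconstruction of the example above (in particular, that the traitor u happened to relay faithfully); the function name and data layout are hypothetical:

```python
from itertools import combinations

def purify(copies, m):
    # Sketch of the Purifying Algorithm. `copies` is a list of
    # (value, route) pairs; `route` is the tuple of relaying lieutenants
    # (commander and receiver omitted). Search for a suspicious set U of
    # at most m lieutenants such that every copy avoiding U carries one
    # single value; if none exists, or no copy survives, RETREAT.
    relays = sorted({g for _, route in copies for g in route})
    for size in range(m + 1):
        for U in combinations(relays, size):
            surviving = {v for v, route in copies if not set(route) & set(U)}
            if len(surviving) <= 1:
                return surviving.pop() if surviving else "retreat"
    return "retreat"

# Lieutenant 1's five copies from the Fig. 12.6 scenario.
copies = [
    ("a", ()), ("a", ("2",)), ("a", ("u",)),
    ("b", ("7", "4")), ("b", ("8", "5")),
]
assert purify(copies, m=2) == "a"  # e.g. U = {"7", "8"} is suspicious
```

The brute-force search over candidate sets U is exponential in m; it is meant only to make the definition concrete, not to be efficient.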
Observe that the technique, described at the beginning of the section, of adding the names of the generals along the route to the message enables i to differentiate among the values. Every message which passed through traitors contains at least one name of a traitor; more precisely, every list of generals added to a message contains at least the name of the last traitor that relayed it. Step 1 of the Purifying Algorithm requires one to look for a set Ui of up to m generals with the property that all of the values which have not been relayed by generals from this set are the same. The network contains at most m traitor generals, and by assumption, the commander is loyal. Therefore, Lieutenant i should be able to find such a set Ui; it may be that the set he finds is not exactly the set of traitors, but Ui necessarily eliminates the wrong values. The set Ui cannot eliminate the correct values, because there are at least m + 1 independent copies of them and Ui can eliminate at most m independent copies. This completes the proof of the theorem.

In the case where the commander is a traitor, Theorem 3 does not ensure the ability to reach a unique agreement on a value. But the way we will use it in algorithm OM(m) will overcome the faultiness of the commander. To obtain Byzantine Agreement in a network with connectivity k, k >= 2m + 1, we improve algorithm OM(m) as follows: whenever a general sends a message to another, he sends it through 2m + 1 disjoint paths; whenever a lieutenant has to receive a message, he uses the Purifying Algorithm to decide on a purified value. Call the improved algorithm OM'(m).

To prove the validity of the algorithm OM'(m), observe that the same general can be used again and again as a relay in the disjoint paths between pairs of generals, even if he was a commander in previous recursions.
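The counting argument behind Theorem 3 can be checked mechanically. The sketch below (all names hypothetical) assumes a loyal commander sending over 2m + 1 node-disjoint routes, with every copy whose route touches a traitor replaced by a forged value:

```python
from collections import Counter

def deliver(value, routes, traitors, forged):
    # Hypothetical delivery over node-disjoint routes: a copy whose
    # route touches a traitor may arrive carrying anything at all;
    # copies relayed only by loyal generals arrive unaltered.
    return [forged if set(r) & traitors else value for r in routes]

m = 2
routes = [(), ("2",), ("3",), ("4", "5"), ("6", "7")]  # 2m + 1 = 5 disjoint routes
copies = deliver("attack", routes, {"3", "6"}, forged="retreat")
# Traitors sit on at most m of the disjoint routes, so at least m + 1
# copies are genuine -- exactly the margin Theorem 3 relies on.
assert Counter(copies)["attack"] >= m + 1
```

Because the routes are node-disjoint, each traitor can touch at most one of them, which is why m traitors can corrupt at most m of the 2m + 1 copies.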
Moreover, even being a traitor does not matter, for the simple reason that the total number of independent paths that would be affected by traitors will never exceed m.

Theorem 4: Let G be a network of n generals with connectivity k >= 2m + 1, where n >= 3m + 1. If there are at most m traitors, then Algorithm OM'(m) (with the above modifications) solves the Byzantine Generals Problem.

Proof: The proof is essentially the same as the proof of Theorem 1, using Theorem 3 everywhere to show that, when a loyal lieutenant sends a value, every other loyal lieutenant agrees on it. The fact that we use the whole network to relay the information again and again eliminates any loss of connectivity, and enables us to obtain the desired result. The details are left to the reader.

To show that Theorem 4 is the best possible, we prove that connectivity of 2m + 1 is necessary for solving the Byzantine Generals Problem. The case in which the number of traitors is not less than half of the connectivity is easier to visualize, and is proved in Lemma 2. Figure 12.7 describes the case schematically. The basic idea is that, if the traitors are not less than half of the bottleneck, then they can prevent the loyal generals from reaching an agreement by behaving as a filter. Every message that passes from right to left would be changed to carry one value, and every message in the reverse direction would carry another value. This behavior can cause all of the generals on the right side to agree on a different value from those on the left side.

Fig. 12.7. T is the set of traitors.

Lemma 2: The Byzantine Generals Problem cannot be solved in a network of n generals if the number of traitors is not less than half the connectivity of the network.

Proof: Let G be a network with connectivity k, and let s1, . . . , sk be a set of generals whose removal disconnects the network into two nonempty parts, G1 and G2. Assume that the subset s1, . . . , sm is the set of traitors, where m >= k/2. Consider the following cases for the various locations in which the commander can be.

Assume the commander s is in the subnetwork G1, and that he sends the value a to all of the lieutenants in the network. The traitors can follow the doctrine: change every message which passes from G1 to G2 to carry the value b, and leave every other value as a; change the messages passing back from G2 to G1 to carry the value a. In this situation, every lieutenant in G1 can consider s to be a loyal general, and thus agrees on a. Similarly, the processors sm+1, . . . , sk choose a. But every receiver in G2 cannot consider s a traitor. They are able to ignore the conflicting values they have received by ignoring either the set s1, . . . , sm or sm+1, . . . , sk. On the other hand, they cannot agree on a value, because each of the values can be correct, depending upon what the commander has said and which generals are traitors. Since m >= k/2, the lieutenants in G2 will choose b, in contradiction to IC2. The case where the commander is in G2 is identical by symmetry.

Assume now that the commander is in the set s1, . . . , sk. If the commander is loyal and sends the same value a to every lieutenant, then, by reasoning similar to the previous case, the traitors can prevent agreement. If the commander is a traitor, he can send the value a to G1 and b to G2. Thus, similarly to the previous case, every decision implies violation of IC1. For a more rigorous proof see reference 3.

Our extension of Algorithm OM(m) requires that the graph G be 2m + 1 connected, which is a rather strong connectivity hypothesis. In contrast, Algorithm SM(m) is easily extended to allow the weakest possible connectivity hypothesis. Let us first consider how much connectivity is needed for the Byzantine Generals Problem to be solvable. IC2 requires that a loyal lieutenant obey a loyal commander.
This is clearly impossible if the commander cannot communicate with the lieutenant. In particular, if every message from the commander to the lieutenant must be relayed by traitors, then there is no way to guarantee that the lieutenant gets the commander's order. Similarly, IC1 cannot be guaranteed if there are two lieutenants who can only communicate with one another via traitorous intermediaries.

The weakest connectivity hypothesis for which the Byzantine Generals Problem is solvable is that the subnetwork formed by the loyal generals be connected. We will show that, under this hypothesis, the algorithm SM(n - 2) is a solution, where n is the number of generals, regardless of the number of traitors. Of course, we must modify the algorithm so that generals only send messages to where they can be sent. More precisely, in Step 1, the commander sends his signed order only to his neighboring lieutenants; and in Step 2B, Lieutenant i only sends the message to every neighboring lieutenant not among the jr. We prove the following more general result, where the diameter of a network is the smallest number d such that any two nodes are connected by a path containing at most d arcs.

Theorem 5: For any m and d, if there are at most m traitors and the network of loyal generals has diameter d, then Algorithm SM(m + d - 1) (with the above modification) solves the Byzantine Generals Problem.

Proof: The proof is quite similar to that of Theorem 2, and will just be sketched. To prove IC2, observe that, by hypothesis, there is a path from the loyal commander to a lieutenant i going through d - 1 or fewer loyal lieutenants. Those lieutenants will correctly relay the order until it reaches i. As before, assumption A4 prevents a traitor from forging a different order.

To prove IC1, we assume that the commander is a traitor and must show that all loyal lieutenants have received a unique order, or every one decides on RETREAT.
The idea is exactly as above. Suppose i receives an order v:0:j1: . . . :jk not signed by j. If k < m, then i will send it to every neighbor who has not already received that order, and it will be relayed to j within d - 1 more steps. If k >= m, then one of the first m signers must be loyal, and must have sent it to all of his neighbors, whereupon it will be relayed by loyal generals and will reach j within d - 1 steps.

Corollary: If the network of loyal generals is connected, then SM(n - 2) (as modified above) solves the Byzantine Generals Problem for n generals.

Proof: Let d be the diameter of the network of loyal generals. Since the diameter of a connected graph is less than the number of nodes, there must be more than d loyal generals, and fewer than n - d traitors. The result follows from the theorem by letting m = n - d - 1.

Theorem 5 assumes that the subnetwork of loyal generals is connected. Its proof is easily extended to show that, even if this is not the case, if there are at most m traitors, then the algorithm SM(m + d - 1) has the following two properties: 1) any two loyal generals connected by a path of length at most d passing through only loyal generals will obey the same order; and 2) if the commander is loyal, then any loyal lieutenant connected to him by a path of length at most m + d passing only through loyal generals will obey his order.

6. RELIABLE SYSTEMS

Other than using intrinsically reliable circuit components, the only way we know for implementing a reliable computer system is to use several different "processors" to compute the same result, and perform a majority vote on their outputs to obtain a single value. (The voting may be performed within the system, or externally by the users of the output.)
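The majority vote just described can be sketched in a few lines; the function name and the strict-majority rule used here are illustrative assumptions, not a scheme fixed by the text:

```python
from collections import Counter

def vote(outputs):
    # Strict majority vote over replicated "processor" outputs; returns
    # None when no value is held by more than half of the replicas.
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

assert vote([5, 5, 9]) == 5      # one faulty replica is outvoted
assert vote([1, 2, 3]) is None   # no majority: replicas disagreed
```

The second assertion is the failure mode the rest of this section is about: voting only masks faults when the nonfaulty replicas agree, which is exactly what interactive consistency on the inputs must guarantee.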
This is true whether one is implementing a reliable computer using redundant circuitry to protect against the failure of individual chips, or a ballistic missile defense system using redundant computing sites to protect against the destruction of individual sites by a nuclear attack. The only difference is in the size of the replicated "processor."

The use of majority voting to achieve reliability is based upon the assumption that all the nonfaulty processors will produce the same output. This is true so long as they all use the same input. However, any single input datum comes from a single physical component (e.g., from some other circuit in the reliable computer, or from some radar site in the missile defense system), and a malfunctioning component can give different values to different processors. Moreover, different processors can get different values even from a nonfaulty input unit if they read the value while it is changing. For example, if two processors read a clock while it is advancing, then one may get the old time and the other the new time. This can only be prevented by synchronizing the reads with the advancing of the clock.

In order for majority voting to yield a reliable system, the following two conditions should be satisfied: 1) all nonfaulty processors must use the same input value (so that they produce the same output); and 2) if the input unit is nonfaulty, then all nonfaulty processors use the value it provides as input (so that they produce the correct output). These are just our interactive consistency conditions IC1 and IC2, where the "commander" is the unit generating the input, the "lieutenants" are the processors, and "loyal" means nonfaulty.

It is tempting to try to circumvent the problem with a "hardware" solution. For example, one might try to insure that all processors obtain the same input value by having them all read it from the same wire.
However, a faulty input unit could send a marginal signal along the wire, a signal that can be interpreted by some processors as a 0 and by others as a 1. There is no way to guarantee that different processors will get the same value from a possibly faulty input device except by having the processors communicate among themselves to solve the Byzantine Generals Problem.

Of course, a faulty input device may provide meaningless input values. All that a Byzantine Generals solution can do is guarantee that all processors use the same input value. If the input is an important one, then there should be several separate input devices providing redundant values. For example, there should be redundant radars as well as redundant processing sites in a missile defense system. However, redundant inputs cannot achieve reliability; it is still necessary to insure that the nonfaulty processors use the redundant data to produce the same output.

In case the input device is nonfaulty but gives different values because it is read while its value is changing, we still want the nonfaulty processors to obtain a reasonable input value. It can be shown that if the functions majority and choice are taken to be the median functions, then our algorithms have the property that the value obtained by the nonfaulty processors lies within the range of values provided by the input unit. Thus, the nonfaulty processors will obtain a reasonable value so long as the input unit produces a reasonable range of values.

We have given several solutions, but they have been stated in terms of Byzantine Generals rather than in terms of computing systems.

REFERENCES

1. DeMillo, R. A., N. A. Lynch, and M. Merritt, Cryptographic protocols, in Proc. 14th ACM SIGACT Symp. on Theory of Computing, pp. 383-400, May 1982.
2. Diffie, W. and M. E. Hellman, New directions in cryptography, IEEE Trans. Inform. Theory, vol. IT-22, pp. 644-654, Nov. 1976.
3. Dolev, D., The Byzantine generals strike again, J. Algorithms, vol. 3, pp. 14-30, Jan. 1982.
4. Dolev, D., M. Fischer, R. Fowler, N. Lynch, and R. Strong, Efficient Byzantine agreement without authentication, Info. and Control, vol. 52, pp. 257-274, 1982.
5. Dolev, D. and R. Reischuk, Bounds on information exchange for Byzantine agreement, JACM, vol. 32, pp. 191-204, 1985.
6. Dolev, D., R. Reischuk, and H. R. Strong, 'Eventual' is earlier than 'immediate,' Proc. 23rd Annual IEEE Symp. on Foundations of Computer Science, pp. 196-203, 1982.
7. Dolev, D. and H. R. Strong, Authenticated algorithms for Byzantine agreement, SIAM J. on Comp., vol. 12, pp. 656-666, 1983.
8. Fischer, M., N. Lynch, and M. Paterson, Impossibility of distributed consensus with one faulty processor, JACM, vol. 32, pp. 374-382, 1985.
9. Lamport, L. and P. M. Melliar-Smith, Synchronizing Clocks in the Presence of Faults, Tech. Rep., Computer Science Lab, SRI International, 1984.
10. Lamport, L., R. Shostak, and M. Pease, The Byzantine generals problem, ACM Trans. on Programming Languages and Systems, vol. 4, pp. 382-401, July 1982.
11. Lynch, N. and M. Fischer, A lower bound for the time to assure interactive consistency, Information Processing Letters, vol. 14, pp. 183-186, 1982.
12. Pease, M., R. Shostak, and L. Lamport, Reaching agreement in the presence of faults, JACM, vol. 27, pp. 228-234, Apr. 1980.
13. Rivest, R. L., A. Shamir, and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, CACM, vol. 21, pp. 120-126, Feb. 1978.

CONCURRENCY CONTROL AND RELIABILITY IN DISTRIBUTED SYSTEMS
Edited by Bharat K. Bhargava
Department of Computer Science, Purdue University, West Lafayette, Indiana
VAN NOSTRAND REINHOLD COMPANY, New York