12. The Byzantine Generals

DANNY DOLEV, LESLIE LAMPORT, MARSHALL PEASE, and ROBERT SHOSTAK

ABSTRACT

Reliable computer systems must handle malfunctioning components that
give conﬂicting information to different parts of the system. This situation
can be expressed abstractly in terms of a group of generals of the Byzantine
army camped with their troops around an enemy city. Communicating only
by messenger, the generals must agree upon a common battle plan. However,
one or more of them may be traitors who will try to confuse the others.
The problem is to find an algorithm to ensure that the loyal generals will
reach agreement. It is shown that, using only oral messages, this problem
is solvable if and only if more than two-thirds of the generals are loyal, so
a single traitor can confound two loyal generals. With unforgeable written
messages, the problem is solvable for any number of generals and possible
traitors. The solution for a general distributed system requires connectivity
of more than twice the number of traitors, while in the case of unforgeable
written messages, connectivity larger than the number of traitors suffices.
Applications of the solutions to reliable computer systems are then discussed.

1. INTRODUCTION

A reliable computer system must be able to cope with the failure of one or
more of its components. A failed component may exhibit a type of behavior
that is often overlooked, namely, sending conflicting information to different
parts of the system. The problem of coping with this type of failure is expressed
abstractly as the Byzantine Generals Problem. We devote the major part of the
paper to a discussion of this abstract problem, and conclude by indicating how
the solutions can be used in implementing a reliable computer system.

This work was supported in part by the National Aeronautics and Space Administration under
contract number NAS1-15428 Mod. 3, the Ballistic Missile Defense Systems Command under
contract number DASG60-78-C-0046, and the Army Research Office under contract number
DAAG29-79-C-0102.

We imagine that several divisions of the Byzantine army are camped outside
an enemy city, each division commanded by its own general. The generals can
communicate with one another only by messenger. After observing the enemy,
they must decide upon a common plan of action. However, some of the generals
may be traitors, trying to prevent loyal generals from reaching agreement. The
generals must have an algorithm to guarantee that:

CONDITION A. All loyal generals decide upon the same plan of action.

The loyal generals will all do what the algorithm says they should, but the
traitors may do anything they wish. The algorithm must guarantee Condition A
regardless of what the traitors do.

The loyal generals should not only reach agreement, but should agree upon
a reasonable plan. We therefore also want to ensure that:

CONDITION B. A small number of traitors cannot cause the loyal generals to
adopt a bad plan.

Condition B is hard to formalize, since it requires saying precisely what a
bad plan is, and we will not attempt to do so. Instead, we consider how the
generals reach a decision. Each general observes the enemy and communicates
his observations to the others. Let v(i) be the information communicated by
the ith general. Each general uses some method for combining the values
v(1), ..., v(n) into a single plan of action, where n is the number of generals.
Condition A is achieved by having all generals use the same method for
combining the information, and Condition B is achieved by using a robust method.
For example, if the only decision to be made is whether to attack or retreat,
then v(i) can be General i’s opinion of which option is best, and the final
decision can be based upon a majority vote among them. A small number of
traitors can affect the decision only if the loyal generals were almost equally
divided between the two possibilities, in which case neither decision could be
called bad.

While this approach may not be the only way to satisfy Conditions A and B,
it is the only one that we know of. It assumes a method by which the generals
communicate their values v(i) to one another. The obvious method is for the
ith general to send v(i) by messenger to each other general. However, this does
not work because satisfying Condition A requires that every loyal general obtain
the same values v(1), ..., v(n), and a traitorous general may send different
values to different generals. For Condition A to be satisfied, the following must
be true.

CONDITION 1. Every loyal general must obtain the same information v(1),
..., v(n).

Condition 1 implies that a general cannot necessarily use a value of v(i)
obtained directly from the ith general, since a traitorous ith general may send
different values to different generals. This means that, unless we are careful, in
meeting Condition 1 we might introduce the possibility that the generals use a
value of v(i) different from the one sent by the ith general, even though the
ith general is loyal. We must not allow this to happen if Condition B is to be
met. For example, we cannot permit a few traitors to cause the loyal generals
to base their decision upon the values “retreat”, ..., “retreat” if every loyal
general sent the value “attack.” We therefore have the following requirement
for each i:

CONDITION 2. If the ith general is loyal, then the value that he sends must be
used by every loyal general as the value of v(i).

We can rewrite Condition 1 as the condition that, for every i (whether or not
the ith general is loyal):

CONDITION 1'. Any two loyal generals use the same value of v(i).

Conditions 1' and 2 are both conditions on the single value sent by the ith
general. We can therefore restrict our consideration to the problem of how a
single general sends his value to the others. We phrase this in terms of a
commanding general sending an order to his lieutenants, obtaining the following
problem.

Byzantine Generals Problem: A commanding general must send an order to
his n - 1 lieutenant generals such that:

CONDITION IC1. All loyal lieutenants obey the same order.

CONDITION IC2. If the commanding general is loyal, then every loyal lieutenant
obeys the order he sends.

Conditions IC1 and IC2 are called the interactive consistency conditions. Note
that if the commander is loyal, then IC1 follows from IC2. However, the
commander need not be loyal.

To solve our original problem, the ith general sends his value of v(i) by
using a solution to the Byzantine Generals Problem to send the order “use v(i)
as my value,” with the other generals acting as the lieutenants.

2. IMPOSSIBILITY RESULTS

The Byzantine Generals Problem seems deceptively simple. Its difficulty is
indicated by the surprising fact that, if the generals can send only oral messages,
then no solution will work unless more than two-thirds of the generals are loyal.
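
The two-thirds bound can be restated as plain arithmetic: with n generals of whom m are traitors, "more than two-thirds loyal" means n - m > 2n/3, that is, n > 3m, or equivalently n >= 3m + 1. The following is only an illustrative sketch of that arithmetic; the function names are assumptions made for this example and do not come from the text.

```python
def oral_messages_solvable(n: int, m: int) -> bool:
    """True when more than two-thirds of the n generals are loyal, i.e. n > 3m.

    This encodes only the bound stated in the text for oral messages;
    it does not run any agreement algorithm.
    """
    return n > 3 * m


def max_traitors_oral(n: int) -> int:
    """Largest m with n >= 3m + 1 (0 when n is too small to tolerate any traitor)."""
    return max((n - 1) // 3, 0)
```

Under this arithmetic, three generals tolerate no traitor and four generals tolerate one, which matches the three-general case discussed next.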
In particular, with only three generals, no solution can work in the presence of
a single traitor. An oral message is one whose contents are completely under
the control of the sender, so that a traitorous sender can transmit any possible
message. Such a message corresponds to the type of message that computers
normally send to one another. In Section 4, we will consider signed, written
messages, for which this is not true. We now show that, with oral messages, no solution for three generals can
handle a single traitor. For simplicity, we consider the case in which the only
possible decisions are “attack” or “retreat.” Let us ﬁrst examine the scenario
pictured in Fig. 12.1, in which the commander is loyal and sends an “attack”
order, but Lieutenant 2 is a traitor and reports to Lieutenant 1 that he received
a “retreat” order. For Condition IC2 to be satisfied, Lieutenant 1 must obey
the order to attack. Now consider another scenario, shown in Fig. 12.2, in which the commander
is a traitor and sends an “attack” order to Lieutenant 1 and a “retreat” order
to Lieutenant 2. Lieutenant 1 does not know who the traitor is, and cannot tell
what message the commander actually sent to Lieutenant 2. Hence, the scenarios
in these two pictures appear exactly the same to Lieutenant 1. If the traitor lies
consistently, then there is no way for Lieutenant 1 to distinguish between these
two situations, so he must obey the “attack” order in both of them. Hence,
whenever Lieutenant 1 receives an “attack” order from the commander, he
must obey it.

Fig. 12.1. Lieutenant 2 a traitor.
Fig. 12.2. The commander a traitor.

However, a similar argument shows that if Lieutenant 2 receives a “retreat”
order from the commander, then he must obey it even if Lieutenant 1 tells him
that the commander said “attack.” Therefore, in the scenario of Fig. 12.2,
Lieutenant 2 must obey the “retreat” order while Lieutenant 1 obeys the
“attack” order, thereby violating Condition IC1. Hence, no solution exists for
three generals that works in the presence of a single traitor. This argument may appear convincing, but we strongly advise the reader to
be very suspicious of such nonrigorous reasoning. Although this result is indeed
correct, we have seen equally plausible “proofs” of invalid results. We know
of no area in computer science or mathematics in which informal reasoning is
more likely to lead to errors than in the study of this type of algorithm. For a
rigorous proof of the impossibility of a three-general solution that can handle a
single traitor, we refer the reader to reference 12.

Using this result, we can show that no solution with fewer than 3m + 1
generals can cope with m traitors.* The proof is by contradiction: we assume
such a solution for a group of 3m or fewer generals, and use it to construct a
three-general solution to the Byzantine Generals Problem that works with one
traitor, which we know to be impossible. To avoid confusion between the two
algorithms, we will call the generals of the assumed solution Albanian generals,
and those of the constructed solution will be called Byzantine generals. Thus,
starting from an algorithm that allows 3m or fewer Albanian generals to cope
with m traitors, we will construct a solution that allows three Byzantine generals
to handle a single traitor.

The three-general solution is obtained by having each of the Byzantine
generals simulate approximately one-third of the Albanian generals, so that each
Byzantine general is simulating at most m Albanian generals. The Byzantine
commander simulates the Albanian commander plus at most m - 1 Albanian
lieutenants, and each of the two Byzantine lieutenants simulates at most m
Albanian lieutenants. Since only one Byzantine general can be a traitor, and he
simulates at most m Albanians, at most m of the Albanian generals are traitors.
Hence, the assumed solution guarantees that IC1 and IC2 hold for the Albanian
generals. By IC1, all the Albanian lieutenants being simulated by a loyal
Byzantine lieutenant obey the same order, which is the order he is to obey. It is
easy to check that Conditions IC1 and IC2 of the Albanian generals solution
imply the corresponding conditions for the Byzantine generals, so we have
constructed the required impossible solution.

*More precisely, no such solution exists for three or more generals, since the problem is trivial
for two generals.

One might think that the difficulty in solving the Byzantine Generals Problem
stems from the requirement of reaching exact agreement. We now demonstrate
that this is not the case by showing that reaching approximate agreement is just
as hard as reaching exact agreement. Let us assume that instead of trying to
agree on a precise battle plan, the generals must agree only upon an approximate
time of attack. More precisely, we assume that the commander orders the time
of the attack, and we require the following two conditions to hold:

CONDITION IC1'. All loyal lieutenants attack within ten minutes of one another.

CONDITION IC2'. If the commanding general is loyal, then every loyal
lieutenant attacks within ten minutes of the time given in the commander’s order.

(We assume that the orders are given and processed the day before the attack,
and the time at which an order is received is irrelevant; only the attack time
given in the order matters.)

Like the Byzantine Generals Problem, this problem is unsolvable unless more
than two-thirds of the generals are loyal. We prove this by first showing that if
there were a solution for three generals that coped with one traitor, then we
could construct a three-general solution to the Byzantine Generals Problem that
also worked in the presence of one traitor. Suppose the commander wishes to
send an “attack” or “retreat” order. He orders an attack by sending an attack
time of 1:00, and orders a retreat by sending an attack time of 2:00, using the
assumed algorithm. Each lieutenant uses the following procedure to obtain his
order.

1. After receiving the attack time from the commander, a lieutenant does one
   of the following:
   If the time is 1:10 or earlier, then attack.
   If the time is 1:50 or later, then retreat.
   Otherwise, continue to Step 2.

2. Ask the other lieutenant what decision he reached in Step 1.
   If the other lieutenant reached a decision, then make
   the same decision he did.
   Otherwise, retreat.

It follows from IC2' that, if the commander is loyal, then a loyal lieutenant
will obtain the correct order in Step 1, so IC2 is satisfied. If the commander is
loyal, then IC1 follows from IC2, so we need only prove IC1 under the
assumption that the commander is a traitor. Since there is at most one traitor, this
means that both lieutenants are loyal. It follows from IC1' that, if one lieutenant
decided to attack in Step 1, then the other cannot decide to retreat in Step 1.
Hence, they will both either come to the same decision in Step 1, or at least
one of them will defer his decision until Step 2. In this case, it is easy to see
that they both arrive at the same decision, so IC1 is satisfied. We have therefore
constructed a three-general solution to the Byzantine Generals Problem that
handles one traitor, which is impossible. Hence, we cannot have a three-general
algorithm that maintains IC1' and IC2' in the presence of a traitor.

The method of having one general simulate m others can now be used to
prove that no solution with fewer than 3m + 1 generals can cope with m traitors.
The proof is similar to the one for the original Byzantine Generals Problem,
and is left to the reader.

3. A SOLUTION WITH ORAL MESSAGES

We have shown above that, for a solution to the Byzantine Generals Problem
using oral messages to cope with m traitors, there must be at least 3m + 1
generals. We now give a solution that works for 3m + 1 or more generals.
However, we first specify exactly what we mean by “oral messages.” Each
general is supposed to execute some algorithm that involves sending messages
to the other generals, and we assume that a loyal general correctly executes his
algorithm. The definition of an oral message is embodied in the following
assumptions which we make for the generals’ message system.

A1. Every message that is sent is delivered correctly.
A2. The receiver of a message knows who sent it.
A3. The absence of a message can be detected.

Assumptions A1 and A2 prevent a traitor from interfering with the
communication between two other generals, since by A1 he cannot interfere with the
messages they do send, and by A2 he cannot confuse their intercourse by
introducing spurious messages. Assumption A3 will foil a traitor who tries to prevent
a decision by simply not sending messages. The practical implementation of
these assumptions is discussed in Section 6. Note that assumptions A1-A3 do
not imply that a general hears any message sent between two other generals.

The algorithms in this section and in the following one require that each
general be able to send messages directly to every other general. In Section 5,
we describe algorithms which do not have this requirement.

A traitorous commander may decide not to send any order. Since the
lieutenants must obey some order, they need some default order to obey in this case.
We let RETREAT be this default order.

We inductively define the Oral Message algorithms OM(m), for all
nonnegative integers m, by which a commander sends an order to n - 1 lieutenants.
We will show that OM(m) solves the Byzantine Generals Problem for 3m + 1
or more generals in the presence of at most m traitors. We will find it more
convenient to describe this algorithm in terms of the lieutenants “obtaining a
value” rather than “obeying an order.”
The algorithm assumes a function majority with the property that, if a majority
of the values v_i equal v, then majority(v_1, ..., v_{n-1}) equals v. (Actually, it
assumes a sequence of such functions, one for each n.) There are two natural
choices for the value of majority(v_1, ..., v_{n-1}):

1. The majority value among the v_i if it exists, otherwise the value RETREAT.
2. The median of the v_i, assuming that they come from an ordered set.

The following algorithm requires only the aforementioned property of majority.

Algorithm OM(0):

1. The commander sends his value to every lieutenant.
2. Each lieutenant uses the value he receives from the commander, or uses the
   value RETREAT if he receives no value.

Algorithm OM(m), m > 0:

1. The commander sends his value to every lieutenant.
2. For each i, let v_i be the value Lieutenant i receives from the commander,
   or else be RETREAT if he receives no value. Lieutenant i acts as the
   commander in algorithm OM(m - 1) to send the value v_i to each of
   the n - 2 other lieutenants.
3. For each i, and each j ≠ i, let v_j be the value Lieutenant i received
   from Lieutenant j in Step 2 (using Algorithm OM(m - 1)), or else
   RETREAT if he received no such value. Lieutenant i uses the value
   majority(v_1, ..., v_{n-1}).

To execute Step 3, every processor must know when to apply the majority function,
in other words, when to stop waiting for more values to come. To do this, one
can use some sort of timeout technique, as we will discuss in Section 6. Note
that recently, Fischer, Lynch, and Paterson (reference 8) proved that there is no way to reach
any agreement unless we assume some bound on the time at which a reliable
processor responds.

Fig. 12.3. Algorithm OM(1); Lieutenant 3 a traitor.

To understand how Algorithm OM(m) works, we consider the case m = 1,
n = 4. Figure 12.3 illustrates the messages received by Lieutenant 2 when the
commander sends the value v and Lieutenant 3 is a traitor. In the first step of
OM(1), the commander sends v to all three lieutenants. In the second step,
Lieutenant 1 sends the value v to Lieutenant 2, using the trivial algorithm
OM(0). Also in the second step, the traitorous Lieutenant 3 sends Lieutenant
2 some other value x. In Step 3, Lieutenant 2 then has v_1 = v_2 = v, and v_3 =
x, so he obtains the correct value v = majority(v, v, x).

Next, we see what happens if the commander is a traitor. Figure 12.4 shows
the values received by the lieutenants if a traitorous commander sends three
arbitrary values x, y, and z to the three lieutenants. Each lieutenant obtains v_1
= x, v_2 = y, and v_3 = z, so they all obtain the same value majority(x, y, z)
in Step 3, regardless of whether or not any of the three values x, y, and z are
equal.

Fig. 12.4. Algorithm OM(1); the commander a traitor.

The recursive algorithm OM(m) invokes n - 1 separate executions of the
algorithm OM(m - 1), each of which invokes n - 2 executions of OM(m -
2), etc. This means that for m > 1, a lieutenant sends many separate messages
to each other lieutenant. There must be some way to distinguish between these
different messages. The reader can verify that all ambiguity is removed if each
lieutenant i prefixes the number i to the value v_i that he sends in Step 2. As the
recursion “unfolds,” the algorithm OM(m - k) will be called (n - 1) ... (n - k)
times to send a value prefixed by a sequence of k lieutenants’
numbers. This implies that the algorithm requires sending an exponential number
of messages. There exist algorithms which require only a polynomial number
of messages (reference 4), but they are substantially more complex than the one we present.

To prove the correctness of the algorithm OM(m) for arbitrary m, we first
prove the following lemma.

Lemma 1: For any m and k, Algorithm OM(m) satisfies Condition IC2 if
there are more than 2k + m generals, and at most k traitors.

Proof: The proof is by induction on m. Condition IC2 only specifies what
must happen if the commander is loyal. Using A1, it is easy to see that the trivial
algorithm OM(0) works if the commander is loyal, so the lemma is true for m = 0.
We now assume it is true for m - 1, m > 0, and prove it for m.

In Step 1, the loyal commander sends a value v to all n - 1 lieutenants. In
Step 2, each loyal lieutenant applies OM(m - 1) with n - 1 generals. Since by
hypothesis n > 2k + m, we have n - 1 > 2k + (m - 1), so we can apply
the induction hypothesis to conclude that every loyal lieutenant gets v_j = v for
each loyal Lieutenant j. Since there are at most k traitors, and n - 1 > 2k +
(m - 1) ≥ 2k, a majority of the n - 1 lieutenants are loyal. Hence, each loyal
lieutenant has majority(v_1, ..., v_{n-1}) = v in Step 3, proving IC2.

The following theorem asserts that Algorithm OM(m) solves the Byzantine
Generals Problem.

Theorem 1: For any m, Algorithm OM(m) satisfies Conditions IC1 and IC2
if there are more than 3m generals, and at most m traitors.

Proof: The proof is by induction on m. If there are no traitors, then it is
easy to see that OM(0) satisfies IC1 and IC2. We therefore assume that the
theorem is true for OM(m - 1) and prove it for OM(m), m > 0.

We first consider the case in which the commander is loyal. By taking k equal
to m in Lemma 1, we see that OM(m) satisfies IC2. Condition IC1 follows
from IC2 if the commander is loyal, so we need only verify IC1 in the case
that the commander is a traitor.

There are at most m traitors and the commander is one of them, so at most
m - 1 of the lieutenants are traitors. Since there are more than 3m generals,
there are more than 3m - 1 lieutenants, and 3m - 1 > 3(m - 1). We may
therefore apply the induction hypothesis to conclude that OM(m - 1) satisfies
conditions IC1 and IC2. Hence, for each j, any two loyal lieutenants get the
same value for v_j in Step 3. (This follows from IC2 if one of the two lieutenants
is Lieutenant j, and from IC1 otherwise.) Hence, any two loyal lieutenants get
the same vector of values v_1, ..., v_{n-1}, and therefore obtain the same value
majority(v_1, ..., v_{n-1}) in Step 3, proving IC1.

4. A SOLUTION WITH SIGNED MESSAGES

As we saw from the scenarios of Fig. 12.1 and 12.2, it is the traitors’ ability to
lie that makes the Byzantine Generals Problem so difficult. The problem
becomes easier to solve if we can restrict that ability. One way to do this is to
allow the generals to send unforgeable signed messages. More precisely, we
add to A1-A3 the following assumption.

A4. (a) A loyal general’s signature cannot be forged, and any alteration of
        the contents of his signed messages can be detected.
    (b) Anyone can verify the authenticity of a general’s signature.

Note that we make no assumptions about a traitorous general’s signature. In
particular, we allow his signature to be forged by another traitor, thereby
permitting collusion among the traitors.

Having introduced signed messages, our previous argument that four generals
are required to cope with one traitor no longer holds. In fact, a three-general
solution does exist. We now give an algorithm that copes with m traitors for
any number of generals. (The problem is vacuous if there are fewer than m +
2 generals.)

In our algorithm, the commander sends a signed order to each of his
lieutenants. Each lieutenant then adds his signature to that order and sends it to the
other lieutenants, who add their signatures and send it to others, and so on.
This means that a lieutenant must effectively receive one signed message, make
several copies of it, and sign and send these copies. It does not matter how
these copies are obtained; a single message might be photocopied, or else each
message might consist of a stack of identical messages which are signed and
distributed as required.

Our algorithm uses a function choice, which is applied to a set of orders to
obtain a single one. It is defined as follows:

If the set V consists of the single element v,
then choice(V) = v,
otherwise choice(V) = RETREAT.

In the following algorithm, we let x:i denote the value x signed by general i.
Thus, v:j:i denotes the value v signed by j, and then that value v:j signed
by i. We let general 0 be the commander. In this algorithm, each lieutenant i
maintains a set V_i, containing the set of properly signed orders he has received
so far. (If the commander is loyal, then this set should never contain more than
a single element.) Do not confuse V_i, the set of orders he has received, with
the set of messages that he has received. There may be many different messages
with the same order. We assume the existence of a bound on the time it takes
correct processors to sign and relay a message. This implies the existence
of some phases such that, if a message with r signatures arrives after phase r,
then only faulty processors relayed it, so it can be ignored. This assumption
does not necessarily mean complete synchronization of the processors.

Algorithm SM(m)
Initially V_i = ∅.

1. The commander signs and sends his value to every lieutenant at phase 0.
2. For each i:
   A. If Lieutenant i receives a message of the form v:0 from the commander
      at phase 0, and he has not yet received any order, then: (i) He lets V_i
      equal {v}. (ii) He sends the message v:0:i to every other lieutenant.
   B. If Lieutenant i receives a message of the form v:0:j_1:...:j_k at phase k,
      1 ≤ k ≤ m, V_i contains at most one value, v is not in the set V_i, and
      the signatures belong to different lieutenants, then: (i) He adds v
      to V_i. (ii) If k < m, then he sends the message v:0:j_1:...:j_k:i to every
      lieutenant other than j_1, ..., j_k.
3. For each i: At the end of phase m he obeys the order choice(V_i).

Observe that the algorithm requires m + 1 phases of message exchange. Note
that in Step 2, Lieutenant i ignores any message containing an order v that is
already in the set V_i, and accepts at most two different orders originated by the
commander.

Moreover, Lieutenant i ignores any messages that do not have the proper
form of a value followed by a string of different signatures. If packets of identical
messages are used to avoid having to copy messages, this means that he throws
away any packet that does not consist of a sufficient number of identical, properly
signed messages. (There should be (n - k - 2)(n - k - 3) ... (n -
m - 2) copies of the message if it has been signed by k lieutenants.)

Figure 12.5 illustrates algorithm SM(1) for the case of three generals, when
the commander is a traitor. The commander sends an “attack” order to one
lieutenant and a “retreat” order to the other. Both lieutenants receive the two
orders in Step 2, so after Step 2 V_1 = V_2 = {“attack,” “retreat”}, and they
both obey the order choice({“attack,” “retreat”}). Observe that here, unlike
the situation in Fig. 12.2, the lieutenants know the commander is a traitor
because his signature appears on two different orders, and A4 states that only
he could have generated those signatures.

Fig. 12.5. Algorithm SM(1); the commander a traitor.

In algorithm SM(m), a lieutenant signs his name to acknowledge his receipt
of an order. If he is the mth lieutenant to add his signature to the order, then
that signature is not relayed to anyone else by its recipient, so it is superfluous.
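
The phase structure of SM(m) can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration rather than the authors' implementation: signatures are modeled simply as a tuple of signer numbers that only the simulation itself extends (so forgery cannot occur), a traitorous commander is modeled by the per-lieutenant `commander_orders` map, and traitorous lieutenants are modeled only as staying silent. The names `choice`, `sm`, and `commander_orders` are hypothetical.

```python
RETREAT = "retreat"


def choice(orders):
    """choice(V): the single element of V if V has one element, else RETREAT."""
    return next(iter(orders)) if len(orders) == 1 else RETREAT


def sm(m, n, commander_orders, traitors=frozenset()):
    """Simulate SM(m) with general 0 as commander and lieutenants 1..n-1.

    commander_orders maps a lieutenant to the signed order sent to him at
    phase 0; a loyal commander sends every lieutenant the same order.
    Returns {loyal lieutenant: order he obeys}.
    """
    lieutenants = range(1, n)
    V = {i: set() for i in lieutenants}
    # A message is (order, signers), where signers is the tuple 0, j1, ..., jk.
    inbox = {
        i: [(commander_orders[i], (0,))]
        for i in lieutenants
        if i in commander_orders
    }
    for phase in range(m + 1):
        outbox = {i: [] for i in lieutenants}
        for i in lieutenants:
            for order, signers in inbox.get(i, []):
                if order in V[i] or len(V[i]) >= 2:
                    continue  # already known, or two orders already accepted
                V[i].add(order)
                # Relay with i's signature appended, unless this is the last
                # phase or i is a (silent) traitor in this toy model.
                if phase < m and i not in traitors:
                    for r in lieutenants:
                        if r != i and r not in signers:
                            outbox[r].append((order, signers + (i,)))
        inbox = outbox
    return {i: choice(V[i]) for i in lieutenants if i not in traitors}
```

Calling `sm(1, 3, {1: "attack", 2: "retreat"}, traitors={0})` reproduces the three-general scenario of Fig. 12.5: each lieutenant ends up holding both orders and falls back to the default.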
(More precisely, assumption A2 makes it unnecessary.) In particular, the
lieutenants need not sign their messages in SM(1).

We now prove the correctness of our algorithm.

Theorem 2: For any m, Algorithm SM(m) solves the Byzantine Generals
Problem, if there are at most m traitors.

Proof: We first prove IC2. If the commander is loyal, then he sends his
signed order v:0 to every lieutenant in Step 1. Every loyal lieutenant will
therefore receive the order v on time in Step 2A. Moreover, since no traitorous
lieutenant can forge any other message of the form v':0, a loyal lieutenant can
receive no additional order in Step 2B. Hence, for each loyal lieutenant i, the
set V_i obtained in Step 2 consists of the single order v, which he will obey in
Step 3 by the definition of the choice function. This proves IC2.

Since IC1 follows from IC2 if the commander is loyal, to prove IC1 we need
only consider the case in which the commander is a traitor. Two loyal lieutenants
i and j obey the same order in Step 3 if the function choice applied to the sets
of orders V_i and V_j that they receive in Step 2 induces the same value. Therefore,
to prove IC1 it suffices to prove two parts: one, if a loyal lieutenant i puts
exactly one order v into V_i in Step 2, then every loyal lieutenant j will put exactly
the same order v into V_j in Step 2; two, if V_j has two elements for some loyal
lieutenant j, then V_k has two elements for any other loyal lieutenant k.

To prove the first part, we must show that j receives a properly signed message
containing that order. If i receives the order v in Step 2A on time, then he sends
it to j in Step 2A(ii), so that j receives it on time (by A1). If i adds the order
to V_i in Step 2B, then he must receive a first message of the form v:0:j_1:...:j_k.
If j is one of the j_r, then by A4 he must already have received the order v. If
not, we consider two cases:

1. k < m: In this case, i sends the message v:0:j_1:...:j_k:i to j, so j must
   receive the order v.
2. k = m: Since the commander is a traitor, at most m - 1 of the lieutenants
   are traitors. Hence, at least one of the lieutenants j_1, ..., j_m is loyal.
   This loyal lieutenant must have sent j the value v when he first received
   it, so j must, therefore, receive that value.

Similar arguments prove that if any loyal lieutenant i decides to put two orders
in V_i, then every other loyal lieutenant will decide to do so. This completes the proof.

During the algorithm, every loyal lieutenant relays to every other lieutenant
at most two orders. Therefore, the total number of messages exchanged is
bounded by 2n(n - 1), where n is the total number of generals. By using more
phases and more sophisticated algorithms, one can reduce the total number of
messages to O(n + m^2), as shown in reference 5.

5. MISSING COMMUNICATION PATHS

Thus far, we have assumed that a general (or lieutenant) can send messages
directly to every other general (or lieutenant). We now remove this assumption. Instead, we suppose that physical barriers place some restrictions on who can send messages to whom. We consider the generals to form the nodes of a simple,* finite, undirected network graph G, where an arc between two nodes indicates that those two generals can send messages directly to one another. We now extend algorithms OM(m) and SM(m), which assumed G to be completely connected, to more general graphs.
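As a minimal sketch of this network model, the generals can be stored as nodes of an adjacency-set graph. The class name `Network` and its methods are hypothetical and are not part of the original text. The sketch rejects an arc from a node to itself and treats a repeated arc as a no-op, so the graph stays simple and undirected; `is_completely_connected` tests the assumption that OM(m) and SM(m) have relied on so far.

```python
from collections import defaultdict


class Network:
    """Simple, finite, undirected graph of generals (hypothetical helper)."""

    def __init__(self, generals):
        self.generals = set(generals)
        self.neighbors = defaultdict(set)

    def add_arc(self, a, b):
        # An arc means a and b can send messages directly to one another.
        if a == b:
            raise ValueError("a simple graph has no arc from a node to itself")
        if a not in self.generals or b not in self.generals:
            raise ValueError("both endpoints must be generals in the network")
        # Sets make a repeated arc a no-op: at most one arc per pair of nodes.
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def can_send_directly(self, a, b):
        return b in self.neighbors[a]

    def is_completely_connected(self):
        # True when every general is a neighbor of every other general.
        return all(
            self.neighbors[g] == self.generals - {g} for g in self.generals
        )


# Hypothetical three-general network in which s and 2 are not neighbors.
net = Network(["s", 1, 2])
net.add_arc("s", 1)
net.add_arc(1, 2)
print(net.can_send_directly("s", 2))   # False
print(net.is_completely_connected())   # False
```

Adding the arc between s and 2 would make this small network completely connected, which is the case the earlier algorithms assumed.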
*A simple graph is one in which there is at most one arc joining any two nodes, and every arc connects two distinct nodes.

The commander sends his value through routes in the network. For simplicity, assume that every message contains the information about the route through which it is supposed to be delivered. Thus, before sending a message, the commander chooses a route and sends the message containing the route. The receiving lieutenant, however, does not know in advance the route through which it is going to receive the message. Notice that a traitor may also change the routing through which the message is supposed to be delivered. Moreover, a traitor may also produce many false copies of the message it is supposed to
relay, then send them through various routes of its own choice.

A traitor may change the record of the route to prevent the receiving lieutenant from identifying it as the source of faulty messages. To ensure the inclusion of traitors' names in the routes, assume that, after a loyal lieutenant receives a message to relay, he makes sure the lieutenant from which the message has arrived is supposed to relay it to him. Only then does he relay the message to the next lieutenant along the route to the receiving lieutenant.

A network has connectivity k if, for every pair of nodes, there exist k node-independent paths connecting them.

To extend our oral message algorithm OM(m), we need the following definition, where two generals are said to be neighbors if they are joined by an arc.

Definition: Let $\{a_1, \ldots, a_r\}$ be the set of copies of the commander's value received by Lieutenant i. Let $U_i$ be a set of lieutenants that does not contain the commander himself. A set $U_i$ is called a set of suspicious lieutenants determined by Lieutenant i if every message $a_j$ that did not pass through lieutenants in $U_i$
carries the same value.

Algorithm Purifying $(m, a_1, \ldots, a_r, i)$

1. If a set $U_i$ of up to m suspicious generals exists, then the purified value is the value of the messages that did not pass through $U_i$. If no message is left, the value is RETREAT.
2. If there is no set $U_i$ of cardinality up to m, then the purified value is RETREAT.

Notice that if more than one set of suspicious generals exists, then there may be many purified values, but because of the way the algorithm will be used, a plurality of possible values will pose no problem. Before proving that the Purifying Algorithm actually does the right filtration, consider application of the Purifying Algorithm to the network shown in Fig. 12.6.

The network contains 10 generals, and at most 2 traitors. Assume that s and u are the faulty generals. The commander s sends the value a to Lieutenants 1 and 2, and the value b to the other lieutenants. Assume that Lieutenant 1 receives
s's value through the following paths:

s 1
s 2 1
s … 1
s 7 4 1
s 8 5 1
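The steps of the Purifying Algorithm can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: each copy is modeled as a hypothetical `(value, route)` pair whose `route` lists only the relaying lieutenants (the commander and the receiver are left out), candidate sets are tried smallest first, and the first suspicious set found is used. The sample copies are invented for the example; their values and the relay `3` are assumptions rather than a transcription of the paths above.

```python
from itertools import combinations

RETREAT = "RETREAT"


def purify(m, copies):
    """Return (purified_value, suspicious_set) for one receiving lieutenant.

    copies: list of (value, route) pairs; route is a tuple of the relaying
    lieutenants, excluding the commander and the receiver (an assumption of
    this sketch).
    """
    relays = sorted({g for _, route in copies for g in route}, key=str)
    for size in range(m + 1):  # try the smallest candidate sets first
        for candidate in combinations(relays, size):
            suspects = set(candidate)
            # Values of the copies that did not pass through the suspects.
            kept = {v for v, route in copies if suspects.isdisjoint(route)}
            if len(kept) == 1:
                return kept.pop(), suspects  # Step 1: one value survives
            if not kept:
                return RETREAT, suspects  # Step 1: no message is left
    return RETREAT, None  # Step 2: no suspicious set of size <= m exists


# Hypothetical copies received by Lieutenant 1, with m = 2.
copies = [
    ("a", ()),       # received directly from the commander
    ("a", (2,)),
    ("a", (3,)),
    ("b", (7, 4)),
    ("b", (8, 5)),
]
value, suspects = purify(2, copies)
print(value, sorted(suspects))  # a [4, 5]
```

With these assumed copies the sketch settles on the set {4, 5}. The set {7, 8} would serve equally well, since either set covers both routes that carry b; this is the situation noted earlier in which more than one suspicious set exists but the purified value is unaffected.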
Fig. 12.6. Ten generals with two traitors; s is the commander.

The Purifying Algorithm provides the purified value a to Lieutenant 1 by choosing {7, 8} as the set of suspicious generals. Similarly, Lieutenant 2 obtains the value a. But the rest of the network obtain the value b by choosing {1, 2} as the set of suspicious generals.

The following theorem proves that, with sufficient connectivity, all of the
loyal lieutenants obtain the same value if the commander is loyal.

Theorem 3: Let G be a network of generals which contains at most m traitors, and the connectivity of which is at least $2m + 1$. If a loyal commander sends $2m + 1$ copies of its value to every lieutenant, through disjoint paths, then, by use of the Purifying Algorithm, every loyal lieutenant can obtain the commander's value.

Proof: The loyal commander sends every lieutenant $2m + 1$ copies of a value, through disjoint paths. It sends the same value to all lieutenants. Let $a_1, \ldots, a_r$ be the set of all of the copies of the commander's value that Lieutenant i receives. There are at most m traitors; therefore, at most m values might be lost. This implies that the number of copies, r, is at least $m + 1$. At least $m + 1$ of the messages are relayed through routes which contain only loyal generals; each one of the loyal lieutenants relays the message faithfully without changing it. This implies that at least $m + 1$ of the received copies carry the original
value. Note that, if the commander were a traitor, then the above reasoning
would fail to hold.

It may be that the number of copies received is much more than $m + 1$, and even that the majority of them carry a faulty value. The task of Lieutenant i is to find the correct value out of this mess. It does this by applying the Purifying Algorithm. Observe that the technique, described at the beginning of the section, of adding the names of the generals along the route to the message, enables i to differentiate among the values. Every message which passed through traitors contains at least one name of a traitor; more precisely, every list of generals added to a message contains at least the name of the last traitor that relayed it.

Step 1 of the Purifying Algorithm requires one to look for a set $U_i$ of up to
m generals with the property that all of the values which have not been relayed
by generals from this set are the same. The network contains at most m traitor generals, and by assumption, the commander is loyal. Therefore, Lieutenant i should be able to find such a set $U_i$; it may be that the set he finds is not exactly the set of traitors, but $U_i$ necessarily eliminates the wrong values. The set $U_i$ cannot eliminate the correct values, because there are at least $m + 1$ independent copies of them and $U_i$ can eliminate at most m independent copies. This completes the proof of the theorem.

In the case where the commander is a traitor, Theorem 3 does not ensure the
ability to reach a unique agreement on a value. But the way we will use it in
algorithm OM(m) will overcome the faultiness of the commander.

To obtain Byzantine Agreement in a network with connectivity k, $k \ge 2m + 1$, we improve algorithm OM(m) as follows: whenever a general sends a message to another, he sends it through $2m + 1$ disjoint paths; whenever a lieutenant has to receive a message, he uses the Purifying Algorithm to decide on a purified value. Call the improved algorithm OM'(m).

To prove the validity of the algorithm OM'(m), observe that the same general can be used again and again as a relay in the disjoint paths between pairs of generals, even if he was a commander in previous recursions. Moreover, even being a traitor does not matter for the simple reason that the total number of independent paths that would be affected by traitors will never exceed m.

Theorem 4. Let G be a network of n generals with connectivity $k \ge 2m + 1$, where $n \ge 3m + 1$. If there are at most m traitors, then Algorithm OM'(m) (with the above modifications) solves the Byzantine Generals Problem.

Fig. 12.7. T is the set of traitors.

Proof. The proof is essentially the same as the proof of Theorem 1, using Theorem 3 everywhere to show that, when a loyal lieutenant sends a value,
every other loyal lieutenant agrees on it. The fact that we use the whole network
to relay the information again and again eliminates any loss of connectivity,
and enables us to obtain the desired result. The details are left to the reader. To show that Theorem 4 is the best possible, we prove that the connectivity
of $2m + 1$ is necessary for solving the Byzantine Generals Problem.

The case in which the number of traitors is not less than half of the connectivity is easier to visualize, and is proved in Lemma 2. Figure 12.7 describes the case schematically. The basic idea is that, if the traitors are not less than half of the bottleneck, then they can prevent the loyal generals from reaching an agreement by behaving as a filter. Every message that passes from right to left would be changed to carry one value, and every message in the reverse direction would carry another value. This behavior can cause all of the generals on the right side to agree on a different value from those on the left side.

Lemma 2: The Byzantine Generals Problem cannot be solved in a network of n generals if the number of traitors is not less than half the connectivity of the network.

Proof: Let G be a network with connectivity k, and let $s_1, \ldots, s_k$ be a set of generals which disconnect the network into two nonempty parts, $G_1$ and $G_2$. Assume that the subset $s_1, \ldots, s_m$ is the set of traitors, where $m \ge k/2$.
Consider the following cases for the various locations in which the commander
can be.

Assume the commander s is in the subnetwork $G_1$, and that he sends the value a to all of the lieutenants in the network. The traitors can follow the doctrine: change every message which passes from $G_1$ to $G_2$ to carry the value b, and leave every other value as a. Change the messages passing back from $G_2$ to $G_1$ to carry the value a. In this situation, every lieutenant in $G_1$ can consider s to be a loyal general, and thus agree on a. Similarly, the processors $s_{m+1}, \ldots, s_k$ choose a. But every receiver in $G_2$ cannot consider s a traitor. They are able to ignore the conflicting values they have received by ignoring either the set $s_1, \ldots, s_m$ or $s_{m+1}, \ldots, s_k$. On the other hand, they cannot agree on a value, because each of the values can be correct, depending upon what the commander has said and which generals are traitors. Since $m \ge k/2$, the lieutenants in $G_2$ will choose b, in contradiction to IC2. The case where the
commander is in $G_2$ is identical by symmetry.

Assume now that the commander is in the set $s_1, \ldots, s_k$. If the commander is loyal and sends the same value a to every lieutenant, then by reasoning similar to the previous case, the traitors can prevent agreement. If the commander is a traitor, he can send the value a to $G_1$ and b to $G_2$. Thus, similarly to the previous case, every decision implies violation of IC2. For a more rigorous proof see reference 3.

Our extension of Algorithm OM(m) requires that the graph G be $(2m + 1)$-connected, which is a rather strong connectivity hypothesis. In contrast, Algorithm SM(m) is easily extended to allow the weakest possible connectivity hypothesis. Let us first consider how much connectivity is needed for the Byzantine Generals Problem to be solvable. IC2 requires that a loyal lieutenant obey a loyal commander. This is clearly impossible if the commander cannot communicate with the lieutenant. In particular, if every message from the commander to the lieutenant must be relayed by traitors, then there is no way to guarantee that the lieutenant gets the commander's order. Similarly, IC1 cannot be guaranteed if there are two lieutenants who can only communicate with one another via traitorous intermediaries.

The weakest connectivity hypothesis for which the Byzantine Generals Problem is solvable is that the subnetwork formed by the loyal generals be connected.
We will show that, under this hypothesis, the algorithm SM(n - 2) is a solution, where n is the number of generals, regardless of the number of traitors.
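The connectivity hypothesis can be checked mechanically when the set of loyal generals is known, which is realistic only in analysis or testing, since the generals themselves cannot identify the traitors. The sketch below is illustrative: the function names and the arc-list input format are assumptions, not part of the original text. It restricts the network to the loyal generals and uses breadth-first search to test whether that subnetwork is connected; the same search also yields its diameter, the quantity used in the more general result that follows.

```python
from collections import deque


def loyal_subnetwork(arcs, loyal):
    """Adjacency sets of the network restricted to the loyal generals."""
    adj = {g: set() for g in loyal}
    for a, b in arcs:
        if a in adj and b in adj:  # keep only arcs between loyal generals
            adj[a].add(b)
            adj[b].add(a)
    return adj


def distances_from(adj, start):
    """Breadth-first search: arcs needed to reach each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        g = queue.popleft()
        for h in adj[g]:
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return dist


def loyal_diameter(arcs, loyal):
    """Diameter of the loyal subnetwork, or None if it is not connected."""
    adj = loyal_subnetwork(arcs, loyal)
    diameter = 0
    for g in adj:
        dist = distances_from(adj, g)
        if len(dist) < len(adj):
            return None  # some loyal general is unreachable via loyal relays
        diameter = max(diameter, max(dist.values()))
    return diameter


# Hypothetical network of four generals: s, 1, 2, and t.
arcs = [("s", 1), (1, 2), ("s", "t"), ("t", 2)]
print(loyal_diameter(arcs, {"s", 1, 2}))  # 2: s reaches 2 through loyal 1
print(loyal_diameter(arcs, {"s", 2}))     # None: every s-2 route uses a traitor
```

In the first call the loyal generals form a chain, so the hypothesis holds; in the second, 1 and t are both treated as traitors and the two loyal generals can communicate only through them.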
Of course, we must modify the algorithm so that generals only send messages
to where they can be sent. More precisely, in Step 1, the commander sends his
signed order only to his neighboring lieutenants; and in Step 2B, lieutenant i
only sends the message to every neighboring lieutenant not among the $j_r$.

We prove the following more general result, where the diameter of a network is the smallest number d such that any two nodes are connected by a path containing at most d arcs.

Theorem 5: For any m and d, if there are at most m traitors and the network of loyal generals has diameter d, then Algorithm SM(m + d - 1) (with the above modification) solves the Byzantine Generals Problem.

Proof: The proof is quite similar to that of Theorem 2, and will just be sketched. To prove IC2, observe that, by hypothesis, there is a path from the loyal commander to a lieutenant i going through $d - 1$ or fewer loyal lieutenants.
Those lieutenants will correctly relay the order until it reaches i. As before,
assumption A4 prevents a traitor from forging a different order.

To prove IC1, we assume that the commander is a traitor and must show that all loyal lieutenants have received a unique order, or every one decides on RETREAT. The idea is exactly as above. Suppose i receives an order $v:0:j_1:\cdots:j_k$ not signed by j. If $k < m$, then i will send it to every neighbor who has not already received that order, and it will be relayed to j within $d - 1$ more steps. If $k \ge m$, then one of the first m signers must be loyal, and must have sent it to all of his neighbors, whereupon it will be relayed by loyal generals and will reach j within $d - 1$ steps.

Corollary. If the network of loyal generals is connected, then SM(n - 2) (as modified above) solves the Byzantine Generals Problem for n generals.

Proof: Let d be the diameter of the network of loyal generals. Since the diameter of a connected graph is less than the number of nodes, there must be more than d loyal generals, and fewer than $n - d$ traitors. The result follows from the theorem by letting $m = n - d - 1$.

Theorem 5 assumes that the subnetwork of loyal generals is connected. Its
proof is easily extended to show that, even if this is not the case, if there are
at most m traitors, then the algorithm SM(m + d - 1) has the following two properties: 1) any two loyal generals connected by a path of length at most d passing through only loyal generals will obey the same order; and 2) if the commander is loyal, then any loyal lieutenant connected to him by a path of length at most $m + d$ passing only through loyal generals will obey his order.

6. RELIABLE SYSTEMS

Other than using intrinsically reliable circuit components, the only way we know
for implementing a reliable computer system is to use several different “processors” to compute the same result, and perform a majority vote on their
outputs to obtain a single value. (The voting may be performed within the
system, or externally by the users of the output.) This is true whether one is
implementing a reliable computer using redundant circuitry to protect against
the failure of individual chips, or a ballistic missile defense system using redundant computing sites to protect against the destruction of individual sites by a nuclear attack. The only difference is in the size of the replicated “processor.”

The use of majority voting to achieve reliability is based upon the assumption
that all the nonfaulty processors will produce the same output. This is true so
long as they all use the same input. However, any single input datum comes from a single physical component (e.g., from some other circuit in the reliable computer, or from some radar site in the missile defense system), and a malfunctioning component can give different values to different processors. Moreover, different processors can get different values even from a nonfaulty input unit, if they read the value while it is changing. For example, if two processors
read a clock while it is advancing, then one may get the old time and the other
the new time. This can only be prevented by synchronizing the reads with the
advancing of the clock. In order for majority voting to yield a reliable system, the following two
conditions should be satisfied: 1) all nonfaulty processors must use the same input value (so that they produce the same output); and 2) if the input unit is nonfaulty, then all nonfaulty processors use the value it provides as input (so that they produce the correct output).

These are just our interactive consistency conditions IC1 and IC2, where the “commander” is the unit generating the input, the “lieutenants” are the processors, and “loyal” means nonfaulty.

It is tempting to try to circumvent the problem with a “hardware” solution.
For example, one might try to insure that all processors obtain the same input
value by having them all read it from the same wire. However, a faulty input
unit could send a marginal signal along the wire: a signal that can be interpreted by some processors as a 0 and by others as a 1. There is no way to guarantee that different processors will get the same value from a possibly faulty input device except by having the processors communicate among themselves to solve the Byzantine Generals Problem.

Of course, a faulty input device may provide meaningless input values. All that a Byzantine Generals solution can do is guarantee that all processors use
the same input value. If the input is an important one, then there should be
several separate input devices providing redundant values. For example, there
should be redundant radars as well as redundant processing sites in a missile
defense system. However, redundant inputs cannot achieve reliability; it is still necessary to insure that the nonfaulty processors use the redundant data to produce the same output.

In case the input device is nonfaulty but gives different values because it is
read while its value is changing, we still want the nonfaulty processors to obtain
a reasonable input value. It can be shown that if the functions majority and
choice are taken to be the median functions, then our algorithms have the property that the value obtained by the nonfaulty processors lies within the range of
values provided by the input unit. Thus, the nonfaulty processors will obtain a
reasonable value so long as the input unit produces a reasonable range of values.

We have given several solutions, but they have been stated in terms of Byzantine Generals rather than in terms of computing systems.

REFERENCES

1. DeMillo, R. A., N. A. Lynch, and M. Merritt, Cryptographic protocols, in Proc. 14th ACM SIGACT Symp. on Theory of Computing, pp. 383–400, May 1982.
2. Diffie, W. and M. E. Hellman, New directions in cryptography, IEEE Trans. Inform. Theory IT-22, pp. 644–654, Nov. 1976.
3. Dolev, D., The Byzantine generals strike again, J. Algorithms, vol. 3, pp. 14–30, Jan. 1982.
4. Dolev, D., M. Fischer, R. Fowler, N. Lynch, and R. Strong, Efficient Byzantine agreement without authentication, Info. and Control, vol. 3, pp. 257–274, 1983.
5. Dolev, D. and R. Reischuk, Bounds on information exchange for Byzantine agreement, JACM, vol. 32, pp. 191–204, 1985.
6. Dolev, D., R. Reischuk, and H. R. Strong, ‘Eventual’ is earlier than ‘immediate,’ Proc. 23rd Annual IEEE Symp. on Foundations of Computer Science, pp. 196–203, 1982.
7. Dolev, D. and H. R. Strong, Authenticated algorithms for Byzantine agreement, SIAM J. on Comp., vol. 12, pp. 656–666, 1983.
8. Fischer, M., N. Lynch, and M. Paterson, Impossibility of distributed consensus with one faulty processor, JACM, vol. 32, pp. 374–382, 1985.
9. Lamport, L. and P. M. Melliar-Smith, Synchronizing Clocks in the Presence of Faults, Tech. Rep., Computer Science Lab, SRI International, 1984.
10. Lamport, L., R. Shostak, and M. Pease, The Byzantine generals problem, ACM Trans. on Programming Languages and Systems, vol. 4, pp. 382–401, July 1982.
11. Lynch, N. and M. Fischer, A lower bound for the time to assure interactive consistency, Information Processing Letters, vol. 14, pp. 182–186, 1982.
12. Pease, M., R. Shostak, and L. Lamport, “Reaching Agreement in the Presence of Faults,” J. ACM, vol. 27, no. 2, pp. 228–234, Apr. 1980.
13. Rivest, R. L., A. Shamir, and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, CACM, vol. 21, pp. 120–126, Feb. 1978.

CONCURRENCY CONTROL AND RELIABILITY IN DISTRIBUTED SYSTEMS. Edited by Bharat K. Bhargava, Department of Computer Science, Purdue University, West Lafayette, Indiana. Van Nostrand Reinhold Company, New York ...