A Parallel Algorithm for Constructing Binary Decision Diagrams

Shinji Kimura
Dept. of Electronics Engineering
Kobe University
Kobe, 657 Japan

Edmund M. Clarke
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

The second author was partially supported by NSF grant CCR-877226-33 and by the Defense Advanced Research Projects Agency, ARPA Order No. 4976.

CH2909-0/90/0000/0220$01.00 © 1990 IEEE

Abstract

Ordered binary decision diagrams [1] are widely used for representing Boolean functions in various CAD applications. This paper gives a parallel algorithm for constructing such graphs and describes the performance of this algorithm on a 16 processor Encore Multimax. The execution statistics that we have obtained for a number of examples show that our algorithm achieves a high degree of parallelism. In particular, with fifteen processors our algorithm is almost an order of magnitude faster on some examples than a previously described sequential program. When we construct a binary decision graph, our parallel algorithm follows the syntactic structure of the Boolean formula. First, the level of each Boolean operation is determined. Operations at the same level can be performed in parallel. If there are few operations at some level, then these operations are divided into a sequence of suboperations that can be processed in parallel.

1 Introduction

The ordered binary decision diagram [1] is an acyclic graph representation for Boolean functions. Because this representation provides a canonical form (i.e., two functions are logically equivalent if and only if they have the same form) and is quite succinct in most cases, it has become widely used in CAD applications. However, the construction of binary decision diagrams for certain large or particularly complex Boolean functions can be very time-consuming. Consequently, it is important to find ways of speeding up the construction process. This paper describes a parallel algorithm for this task. The algorithm has been implemented on a 16 processor Encore Multimax and tested on several standard examples.

Our approach to binary decision diagrams uses some simple ideas from finite automata theory. An n-argument Boolean function f can be identified with the set of Boolean vectors that make it true. If we regard each Boolean vector as a string, then f can be represented by a finite set of strings. Since all finite languages are regular, there is a minimal finite automaton that accepts the set. This automaton provides a canonical representation for the original Boolean function. Logical operations on Boolean functions can be implemented by set operations on the languages accepted by the finite automata, and standard constructions from elementary automata theory can be used to build the binary decision diagram for the result of logical operations.

In the construction of a binary decision diagram corresponding to a Boolean function, a parse tree of the function is used, where leaf nodes correspond to input variables and nonleaf nodes correspond to Boolean operations. The level of each node is defined from the leaf nodes to the top of the tree, and operations at the same level are performed in parallel. If there are only a few operations at some level, these operations are divided into several suboperations to extract additional parallelism.

2 Binary Decision Diagrams

We start with some simple definitions on finite automata and
binary decision diagrams. A string is a sequence of symbols over some alphabet $\Sigma$. In this paper, the alphabet will always be $\Sigma = \{0,1\}$, where 0 represents False and 1 represents True. The length of a string is the number of symbols in the string.

A finite automaton $M$ is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ is the alphabet for strings, $\delta$ is the state transition function from $Q \times \Sigma$ to $Q$, $q_0$ is the initial state in $Q$, and $F$ is a set of final states in $Q$. $M$ accepts a string $a_1 a_2 \ldots a_n$, where each $a_i \in \Sigma$, if and only if there exists a sequence of states $q_0, q_1, \ldots, q_n$ such that $q_i = \delta(q_{i-1}, a_i)$ and $q_n \in F$. The set of strings accepted by $M$ is called the language of $M$ and will be denoted by $L(M)$.

For example, $M = (\{q_0, q_1, q_2, q_3, q_4, q_5, \bot\}, \{0,1\}, \delta, q_0, \{q_5\})$ accepts $\{010, 110, 111\}$, where $\delta$ is defined as $\delta(q_0,0) = q_1$, $\delta(q_0,1) = q_2$, $\delta(q_1,0) = \bot$, $\delta(q_1,1) = q_3$, $\delta(q_2,0) = \bot$, $\delta(q_2,1) = q_4$, $\delta(q_3,0) = q_5$, $\delta(q_3,1) = \bot$, $\delta(q_4,0) = q_5$, $\delta(q_4,1) = q_5$, $\delta(q_5,0) = \bot$, $\delta(q_5,1) = \bot$, $\delta(\bot,0) = \bot$, and $\delta(\bot,1) = \bot$. $\bot$ is called a sink state. The representation of $\delta$ as a directed graph is shown in Figure 1. The sink state is not shown in the figure for simplicity.

A Boolean function $f$ with $n$ variables is a function from
$\{0,1\}^n$ to $\{0,1\}$. The set of elements in $\{0,1\}^n$ for which $f$ is 1 can be used to represent $f$. If we associate the $n$-tuple $(a_1, a_2, \ldots, a_n)$ with the string $a_1 a_2 \ldots a_n$, then each set of $n$-tuples from $\{0,1\}^n$ will correspond to a set of strings over $\Sigma = \{0,1\}$ with length $n$. This correspondence allows us to associate a finite language contained in $\Sigma^n = \{0,1\}^n$ with each $n$-variable Boolean function $f$. Since all finite languages are regular, there is a minimal finite automaton accepting the language corresponding to $f$. The minimal automaton provides a canonical form for $f$: two $n$-variable Boolean functions will have the same minimal automaton if and only if they are logically equivalent. Since each node in the state-transition graph for a Boolean function will have at most two successors (one for each symbol of $\Sigma$), we can view this graph as a binary decision diagram for the function.

For example, the binary decision diagram in Figure 1 represents $f(x_1, x_2, x_3) = (\neg x_1 \wedge x_2 \wedge \neg x_3) \vee (x_1 \wedge x_2)$, the one in Figure 2 represents $f_U(x_1, \ldots, x_n) = 1$ for all inputs, and the one in Figure 3 represents $f(x_1, \ldots, x_n) = x_i$.
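
As a quick check of the example automaton in this section, the sketch below simulates $M$ on every string of length 3 and compares the accepted set with the truth set of $f(x_1, x_2, x_3) = (\neg x_1 \wedge x_2 \wedge \neg x_3) \vee (x_1 \wedge x_2)$. It is only an illustration: the Python encoding, the string state names, and the helper names `accepts` and `f` are choices made here and are not taken from the authors' C program.

```python
from itertools import product

SINK = "sink"  # illustrative name for the sink state

# Transition function of the example automaton M (states q0..q5 plus the sink).
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): SINK, ("q1", "1"): "q3",
    ("q2", "0"): SINK, ("q2", "1"): "q4",
    ("q3", "0"): "q5", ("q3", "1"): SINK,
    ("q4", "0"): "q5", ("q4", "1"): "q5",
    ("q5", "0"): SINK, ("q5", "1"): SINK,
    (SINK, "0"): SINK, (SINK, "1"): SINK,
}
INITIAL, FINAL = "q0", {"q5"}


def accepts(string):
    """Run M on the string and report whether it ends in a final state."""
    state = INITIAL
    for symbol in string:
        state = DELTA[(state, symbol)]
    return state in FINAL


def f(x1, x2, x3):
    """The Boolean function that Figure 1 is said to represent."""
    return (not x1 and x2 and not x3) or (x1 and x2)


all_strings = ["".join(bits) for bits in product("01", repeat=3)]
language = [s for s in all_strings if accepts(s)]
truth_set = [s for s in all_strings if f(*(c == "1" for c in s))]
print(language)   # ['010', '110', '111']
print(truth_set)  # ['010', '110', '111']
```

Both lists coincide, which is the correspondence between accepted strings and satisfying assignments that the rest of the paper relies on.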
Fig. 1 A binary decision diagram accepting $\{010, 110, 111\}$.

Fig. 2 A binary decision diagram accepting all strings.

Fig. 3 A binary decision diagram corresponding to $x_i$.

3 Boolean Operations

Let $M_1 = (Q_1, \{0,1\}, \delta_1, q_0^1, F_1)$ and $M_2 = (Q_2, \{0,1\}, \delta_2, q_0^2, F_2)$ be the binary decision diagrams for two $n$-variable Boolean functions $f_1$ and $f_2$, let $\bot_1$ be the sink state in $Q_1$, and let $\bot_2$ be the sink state in $Q_2$.

We consider the AND operation first. The set of strings over
$\{0,1\}$ that satisfy $f_1 \wedge f_2$ corresponds to the intersection of the sets accepted by $M_1$ and $M_2$. The standard construction of a finite automaton $M$ that accepts $L(M_1) \cap L(M_2)$ may be used in this case: $M = (Q_1 \times Q_2 \cup \{\bot\}, \{0,1\}, \delta_\wedge, (q_0^1, q_0^2), F_1 \times F_2)$, where $\bot$ denotes the sink state for the product automaton. $\delta_\wedge$ is defined as $\delta_\wedge((q_1, q_2), a) = (\delta_1(q_1, a), \delta_2(q_2, a))$ if $\delta_1(q_1, a) \neq \bot_1$ and $\delta_2(q_2, a) \neq \bot_2$, and $\bot$ otherwise.

The OR operation is similar. The OR of two Boolean functions represented by $M_1$ and $M_2$ corresponds to the union of the sets accepted by $M_1$ and $M_2$. The standard construction for such an $M$ can also be used in this case: $M = (Q_1 \times Q_2 \cup \{\bot\}, \{0,1\}, \delta_\vee, (q_0^1, q_0^2), (F_1 \times Q_2) \cup (Q_1 \times F_2))$, where $\delta_\vee$ is defined as $\delta_\vee((q_1, q_2), a) = (\delta_1(q_1, a), \delta_2(q_2, a))$ if $\delta_1(q_1, a) \neq \bot_1$ or $\delta_2(q_2, a) \neq \bot_2$, and $\bot$ otherwise.

The NOT operation corresponds to the set difference. Let $U$ be
the set of all strings with length $n$; then $U - L(M_1)$ corresponds to the negation of the Boolean function represented by $M_1$. A finite automaton accepting $U - L(M_1)$ can be constructed from $M_U = (Q_U, \{0,1\}, \delta_U, q_0^U, F_U)$ and $M_1$ as $M = (Q_U \times Q_1 \cup \{\bot\}, \{0,1\}, \delta_\neg, (q_0^U, q_0^1), F_U \times (Q_1 - F_1))$, where $\delta_\neg$ is defined in the same manner as for the OR operation.

The EXOR operation $\oplus$ is also similar to the OR operation. The finite automaton for this operation is given by $M = (Q_1 \times Q_2 \cup \{\bot\}, \{0,1\}, \delta_\oplus, (q_0^1, q_0^2), F_1 \times (Q_2 - F_2) \cup (Q_1 - F_1) \times F_2)$, where $\delta_\oplus$ is defined in the same manner as for the OR operation.

Note that determining the state set of the finite automaton for each of these four operations involves a product construction $M_1 \times M_2$. Also note that in each case the resulting automaton $M$ may not be minimal, even if both $M_1$ and $M_2$ are minimal.
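
The four constructions above differ only in when a successor pair collapses to the sink and in which pairs are final, so they can be sketched with one routine. The following is a minimal illustration under stated assumptions, not the authors' C implementation: automata are Python dictionaries, `None` stands for every sink state, only reachable pairs are built, and the helper names `variable`, `universe`, `combine`, `negate` and `language` are hypothetical.

```python
import itertools
from collections import deque

SINK = None  # stands for the sink state of every automaton in this sketch


def variable(i, n):
    """Automaton for f = x_i over n variables (the shape of Figure 3)."""
    delta = {k: (k + 1, k + 1) for k in range(n)}
    delta[i - 1] = (SINK, i)      # reading x_i = 0 leads to the sink
    delta[n] = (SINK, SINK)
    return {"init": 0, "delta": delta, "finals": {n}}


def universe(n):
    """Automaton accepting all strings of length n (the shape of Figure 2)."""
    delta = {k: (k + 1, k + 1) for k in range(n)}
    delta[n] = (SINK, SINK)
    return {"init": 0, "delta": delta, "finals": {n}}


def step(m, q, a):
    return SINK if q is SINK else m["delta"][q][a]


def combine(m1, m2, op):
    """Reachable product automaton for op in {"and", "or", "xor"}."""
    is_final = {"and": lambda a, b: a and b,
                "or": lambda a, b: a or b,
                "xor": lambda a, b: a != b}[op]
    init = (m1["init"], m2["init"])
    delta, finals, seen, queue = {}, set(), {init}, deque([init])
    while queue:
        pair = queue.popleft()
        if is_final(pair[0] in m1["finals"], pair[1] in m2["finals"]):
            finals.add(pair)
        successors = []
        for a in (0, 1):
            p, q = step(m1, pair[0], a), step(m2, pair[1], a)
            # AND dies when either side is in its sink; OR and XOR only when both are.
            dead = (p is SINK or q is SINK) if op == "and" else (p is SINK and q is SINK)
            nxt = SINK if dead else (p, q)
            successors.append(nxt)
            if nxt is not SINK and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
        delta[pair] = tuple(successors)
    return {"init": init, "delta": delta, "finals": finals}


def negate(m, n):
    """U - L(m): final pairs are F_U x (Q - F), which XOR with U yields."""
    return combine(universe(n), m, "xor")


def language(m, n):
    """All strings of length n accepted by m, in lexicographic order."""
    accepted = []
    for bits in itertools.product((0, 1), repeat=n):
        q = m["init"]
        for a in bits:
            q = step(m, q, a)
        if q in m["finals"]:
            accepted.append("".join(map(str, bits)))
    return accepted


x1, x2, x3 = (variable(i, 3) for i in (1, 2, 3))
t1 = combine(combine(negate(x1, 3), x2, "and"), negate(x3, 3), "and")
result = language(combine(t1, combine(x1, x2, "and"), "or"), 3)
print(result)  # ['010', '110', '111']
```

The product built this way keeps every reachable pair, including pairs from which no final state can be reached, so its state count is generally larger than necessary.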
Consequently, a final minimization stage is needed.

4 Product Automaton Generation

In generating the product automaton for the result of some two-argument Boolean operation applied to $M_1$ and $M_2$, the initial product state is given by $(q_0^1, q_0^2)$, where $q_0^1$ is the initial state of $M_1$ and $q_0^2$ is that of $M_2$. The successors of this state are determined for the inputs 0 and 1, and this process is repeated until no new state pairs are generated. The process is shown in Figure 4.

    Let the initial pair be (q0^1, q0^2);
    Put the pair in the queue S, and allocate a new state for it;
    While (S is not empty) Do Begin
        Dequeue a pair (q1, q2) from S;
        For symbol a ∈ {0,1} Do Begin
            Compute (δ1(q1,a), δ2(q2,a));
            If this pair is new, then
                add the pair to S, and allocate a new state.
            Connect an edge from the state
                corresponding to (q1,q2) to
                the state corresponding to (δ1(q1,a), δ2(q2,a));
        End;
    End;

Fig. 4 Construction of the product automaton.

Fig. 5 Product generation and minimization. Panels: $M_1 \cap M_2$ (product machine) and $M_1 \cap M_2$ (minimized).

Note that there are only two places where we need to take into
account the type of the Boolean operation: the computation of $(\delta_1(q_1, 0), \delta_2(q_2, 0))$ and $(\delta_1(q_1, 1), \delta_2(q_2, 1))$. The most time-consuming part of this procedure is deciding whether a pair is new or not. By using a hash table with chaining, we can make this test take essentially constant time. The hash function that we use is given by

$\mathrm{hash}(q_1, q_2) = \mathrm{mod}(q_1 \times (\mathit{hash\_size}/2) + q_2,\ \mathit{hash\_size})$,

where $q_1$ and $q_2$ are integer values for the state pointers.

An example illustrating this phase is shown in Figure 5, where the intersection of $M_1$ and $M_2$ is computed (corresponding to the AND operation in the original formula). $M_1$ corresponds
to (le A 1:3) V 2:}, and M2 corresponds to (pm A 2:2 A 9:3) V
(owl AIx2) V (11 A12) V (221 Awa:2Au273). The result of the AND
operation is (or; A I:c2 A n13) V(131 A nx2 A oz3) V ($1 A 2:2). In
the example, states are generated in the order $q_{14}, q_{15}, \ldots, q_{21}$.

Fig. 7 Levels of Boolean operations for $(x_1 \wedge x_2) \vee (x_3 \wedge x_4) \vee (\neg x_1 \wedge x_4)$.

5 Minimization

After the product generation phase, we must minimize the resulting automaton. In the minimization phase, states are processed starting at the bottom level and working upward, since the determination
of whether two states should be merged into an equivalence class is based on the equivalence of their successor states. First, the final states (the bottom level nodes) are processed. Next, the states which have an edge to the final state are processed, and so on. Thus the order in which the states are processed in this phase is the reverse of the order in which they were generated during the product phase.

For the product automaton in Figure 5, the states are processed in the order $q_{21}, q_{20}, \ldots, q_{14}$. In the following, the edge pair of $q$ denotes the ordered pair $(\delta(q, 0), \delta(q, 1))$. At first, $q_{21}$ is processed and is registered as a unique final state. Next, $q_{20}$ is processed and is registered as a unique state, since the edge pair $(q_{21}, q_{21})$ of $q_{20}$ is unique. $q_{19}$ is also registered as unique. It is impossible to reach the final state from $q_{18}$, thus $q_{18}$ is deleted. $q_{17}$ is marked as the same as $q_{19}$, since the edge pairs of these states are the same. $q_{16}$, $q_{15}$ and $q_{14}$ are also processed, and a minimal finite automaton as shown in Figure 5 is obtained.

The minimization algorithm is summarized in Figure 6. The same hash function is used as in the product generation phase.
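
The bottom-up pass walked through above can be sketched as follows. This is a simplified illustration under several assumptions: the automaton is levelled (all accepted strings have the same length), `None` stands for the sink, final states have no useful successors, and a Python dictionary keyed by the edge pair plays the role of the hash table. The function name `minimize` and the dictionary layout are hypothetical, and the table here is local to one call rather than shared across operations.

```python
from collections import deque

SINK = None  # stands for the sink state


def minimize(m):
    """Merge states with equal edge pairs, working from the bottom level upward."""
    # Recover the generation order with a breadth-first traversal.
    order, seen, queue = [], {m["init"]}, deque([m["init"]])
    while queue:
        q = queue.popleft()
        order.append(q)
        for s in m["delta"][q]:
            if s is not SINK and s not in seen:
                seen.add(s)
                queue.append(s)

    rep = {SINK: SINK}   # original state -> registered state (SINK if deleted)
    unique = {}          # edge pair (or "final") -> registered state id
    delta, finals = {}, set()
    for q in reversed(order):  # reverse of the generation order
        if q in m["finals"]:
            key = "final"
        else:
            key = tuple(rep[s] for s in m["delta"][q])
            if key == (SINK, SINK):
                rep[q] = SINK  # no final state is reachable: delete the state
                continue
        if key not in unique:  # a new edge pair: register a new state
            gid = len(unique)
            unique[key] = gid
            if key == "final":
                finals.add(gid)
                delta[gid] = (SINK, SINK)
            else:
                delta[gid] = key
        rep[q] = unique[key]   # otherwise point at the registered state
    return {"init": rep[m["init"]], "delta": delta, "finals": finals}


# A non-minimal automaton for the strings 01 and 11 (state names are made up).
example = {
    "init": "a",
    "delta": {
        "a": ("b", "c"),
        "b": ("d", "f1"),
        "c": (SINK, "f2"),
        "d": (SINK, SINK),   # dead end: cannot reach a final state
        "f1": (SINK, SINK),
        "f2": (SINK, SINK),
    },
    "finals": {"f1", "f2"},
}
reduced = minimize(example)
print(len(example["delta"]), "->", len(reduced["delta"]))  # 6 -> 3
```

In the example, the two final states are merged, the dead state is deleted, and the two middle states end up with the same edge pair and are merged as well.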
To reduce the memory consumption, we keep a global binary decision diagram whose states represent equivalence classes of states of the reduced automaton.

    For each state of the product automaton,
    starting at the bottom and working upward, do Begin
        Check whether the state has already been registered
            as a global state;
        If the state is new, then register the state
            as a global state;
        Otherwise, mark the state as previously registered,
            and store a pointer to the corresponding global state;
    End;

Fig. 6 Minimization algorithm.

6 Parallel Implementation

We now describe how the basic algorithm outlined in the previous section can be implemented on a shared memory multiprocessor. To illustrate the procedure we consider the following example:

$f(x_1, x_2, x_3, x_4) = (x_1 \wedge x_2) \vee (x_3 \wedge x_4) \vee (\neg x_1 \wedge x_4)$
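
As a small sketch of how operations of this formula could be grouped for scheduling, the code below assigns level 0 to input variables and places each operation one level above its highest operand. The tuple encoding of the parse tree, the left-to-right grouping of the three-way OR into two binary ORs, the treatment of NOT as a one-operand operation, and the names `level` and `operations_by_level` are all assumptions made for illustration.

```python
from collections import defaultdict

# Parse tree as nested tuples: ("op", operand, ...); a string is an input variable.
# Splitting the three-way OR into two binary ORs is an assumption of this sketch.
FORMULA = ("or",
           ("or", ("and", "x1", "x2"), ("and", "x3", "x4")),
           ("and", ("not", "x1"), "x4"))


def level(node):
    """Input variables are level 0; an operation is one above its highest operand."""
    if isinstance(node, str):
        return 0
    return 1 + max(level(operand) for operand in node[1:])


def operations_by_level(node, table=None):
    """Group every operation node of the tree by its level."""
    if table is None:
        table = defaultdict(list)
    if not isinstance(node, str):
        for operand in node[1:]:
            operations_by_level(operand, table)
        table[level(node)].append(node[0])
    return table


schedule = operations_by_level(FORMULA)
for lvl in sorted(schedule):
    print(lvl, schedule[lvl])
```

Under these assumptions there are three operations at level 1, two at level 2 and one at level 3, so the higher levels are the ones that leave processors idle unless operations are subdivided.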
The first step is to determine the level of each node in the parse tree for the formula (see Figure 7). The leaf nodes of the tree are input variables; the nonleaf nodes correspond to the Boolean operations that occur in the formula. The level of each node is determined by the rule:

1. The level of an input variable is 0.
2. The level of a nonleaf node is $\max(l_1, l_2) + 1$, where $l_1$ and $l_2$ are levels of its operands.

Fig. 8 Decomposition of an operation. Panels: product construction phase and minimization phase, by processor.

Since we initially generate binary decision diagrams for input
variables, we can process operations at level 1 immediately. After the level 1 operations have been completed, we can process Boolean operations at level 2, and so on. In general, we can process level $i$ nodes as soon as the level $i-1$ nodes have been completed. Operations at the same level in the tree can be performed in parallel, since they do not conflict.

Some levels have only a few operations that can be performed in parallel. We divide operations on such levels into several suboperations so that there will not be as many idle processors. The method is as follows.

In the product generation and minimization phase corresponding to a Boolean operation, the 0- and 1-successors of the initial pair $(q_0^1, q_0^2)$ are generated. Then the product generation and minimization are done for these two successors. After the minimization for these two successors is performed, the minimization of the root state corresponding to the initial pair is done. Thus the product and minimization phases for these two successors (the 0- and 1-successors of $(q_0^1, q_0^2)$) can be performed in parallel. Note that the minimization phase guarantees the uniqueness of global states.

An example of this procedure is shown in Figure 8. First,
processor P1 expands the 0- and 1-successors of the initial pair. Processor P2 takes the 0-successor, generates the product automaton and minimizes this automaton. Processor P3 takes the 1-successor and does the same thing. After P2 and P3 have completed the minimization phase for their product automata, processor P1 minimizes $q_{14}$.

If, in the example, we compute the 00-, 01-, 10- and 11-successors of the initial pair, then the original operation can be divided into four parts with three merges. In a similar manner we can divide a single operation into 8 parts, 16 parts, etc.

In the implementation, all processors execute the same program:

    take one operation;
    wait until the operands have been calculated;
    do the operation: product generation & minimization;

Operations (including divide and merge operations) are ordered from smaller levels to larger levels.

Table 1 Evaluation of multiplier examples on Multimax.

                   7 bit      8 bit      9 bit      10 bit
    9 processors   4.4 sec    12.9 sec   47.4 sec   196.2 sec
   12 processors   3.7 sec    10.9 sec   38.4 sec   163.2 sec
   15 processors   3.0 sec     9.2 sec   32.4 sec

7 Performance Evaluation

Our program for constructing binary decision diagrams is implemented in C and uses the C-threads package [4] for parallel
programming under the Mach operating system. Interlocks are used for process synchronization instead of general semaphores in order to avoid the expense associated with system calls. The program is organized so that locks are only needed for the global hash table and the global tree nodes. Consequently, contention for shared memory is light. The performance statistics that we describe below were obtained for an Encore Multimax with 16 processors and 96 megabytes of shared memory. Each processor is a National Semiconductor 32332 and is rated at roughly 2 MIPS.

Two-dimensional adder array multipliers were used to evaluate the program since the binary decision diagrams for these circuits are known to grow quite rapidly (exponentially in the size of the operands, in fact). Table 1 shows the execution time to construct binary decision diagrams for multipliers with 7 to 10 bits (14 to 20 Boolean variables). In the evaluation, a hash table with 8191 entries is used for the product generation, and a hash table with
32727 entries is used for the minimization.

The table shows that the minimum execution time on the Multimax with 15 processors is about 10 times smaller than the execution time with a single processor. The time for a single processor is roughly the same as that of an existing sequential program for constructing binary decision diagrams. The graph in Figure 9 shows how the execution time varies with the number of processors for the 10-bit multiplier. The execution time is inversely proportional to the number of processors. Figure 9 also shows the rate of speedup (= (the execution time using 1 processor) / (the execution time using $n$ processors)). The rate is almost linear in the number of processors. Other examples show almost the same graphs.

8 Summary and Future Research

This paper describes a parallel algorithm for constructing binary decision diagrams. The algorithm treats binary decision graphs
as minimal finite automata. The automaton for a Boolean function with AND (OR) as its main operation is obtained by forming the intersection (union) of the regular sets associated with its operands. The union and intersection operations are implemented by a product construction on the minimal automata for the regular sets. After each product construction step the automaton must be reminimized.

The parallel algorithm is designed so that it is possible to find the minimal representations for several Boolean operations in parallel. The level of each operation is determined. Operations at the same level can be performed in parallel without any communication between processors. If there are relatively few operations at one level, then we divide the product generation step into several suboperations and merge the results. Preliminary experiments show that our parallel algorithm running on 15 processors is roughly 10 times faster than on a single processor.

We plan to use this algorithm as part of a verification system for finite state concurrent systems (hardware controllers, communications protocols, etc.) that uses a technique called Symbolic Model Checking [2, 3]. Since constructing binary decision diagrams is the most time-consuming part of the verification procedure, we should be able to handle even larger finite state systems
in the future.

References

[1] Randal E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, C-35(8):677-691, August 1986.

[2] J. R. Burch, E. M. Clarke, K. L. McMillan, and D. L. Dill. Sequential Circuit Verification Using Symbolic Model Checking. In Proceedings of Design Automation Conf., 1990.

[3] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and J. Hwang. Symbolic Model Checking: 10^20 States and Beyond. In Proceedings of Logic in Computer Science, 1990.

[4] E. C. Cooper. C threads. Technical Report CMU-CS-88-154, Carnegie Mellon University, Pittsburgh, PA 15213, June 1988.

[5] Allan L. Fisher and Randal E. Bryant. Performance of COSMOS on the IFIP Workshop Benchmarks. In Proceedings of IMEC Conference, 1989.

Fig. 9 Execution time and speed-up rate for the 10-bit multiplier. Panels: execution time (sec) versus number of processors; speed-up rate versus number of processors.