A parallel algorithm for constructing binary decision diagrams


Shinji Kimura
Dept. of Electronics Engineering
Kobe University
Kobe, 657 Japan

Abstract

Ordered binary decision diagrams [1] are widely used for representing Boolean functions in various CAD applications. This paper gives a parallel algorithm for constructing such graphs and describes the performance of this algorithm on a 16-processor Encore Multimax. The execution statistics that we have obtained for a number of examples show that our algorithm achieves a high degree of parallelism. In particular, with fifteen processors our algorithm is almost an order of magnitude faster on some examples than the program described in [5].

When we construct a binary decision graph, our parallel algorithm follows the syntactic structure of the Boolean formula. First, the level of each Boolean operation is determined. Operations in the same level can be performed in parallel. If there are few operations at some level, then these operations are divided into a sequence of sub-operations that can be processed in parallel.

1 Introduction

The ordered binary decision diagram [1] is an acyclic graph representation for Boolean functions. Because this representation provides a canonical form (i.e. two functions are logically equivalent if and only if they have the same form) and is quite succinct in most cases, it has become widely used in CAD applications. However, the construction of binary decision diagrams for certain large or particularly complex Boolean functions can be very time consuming. Consequently, it is important to find ways of speeding up the construction process. This paper describes a parallel algorithm for this task. The algorithm has been implemented on a 16-processor Encore Multimax and tested on several standard examples.

Our approach to binary decision diagrams uses some simple ideas from finite automata theory.
An n-argument Boolean function $f$ can be identified with the set of Boolean vectors that make it true. If we regard each Boolean vector as a string, then $f$ can be represented by a finite set of strings. Since all finite languages are regular, there is a minimal finite automaton that accepts the set. This automaton provides a canonical representation for the original Boolean function. Logical operations on Boolean functions can be implemented by set operations on the languages accepted by the finite automata, and standard constructions from elementary automata theory can be used to build the binary decision diagram for the result of logical operations.

In the construction of a binary decision diagram corresponding to a Boolean function, a parse tree of the function is used, where leaf nodes correspond to input variables, and non-leaf nodes correspond to Boolean operations. The level of each node is defined from leaf nodes to the top of the tree, and operations at the same level are performed in parallel. If there are only a few operations in some level, these operations are divided into several sub-operations to extract additional parallelism.

Second author: Edmund M. Clarke, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213. The second author was partially supported by NSF grant CCR-87-226-33 and by the Defense Advanced Research Projects Agency, ARPA Order No. 4976.
CH2909-0/90/0000/0220$01.00 © 1990 IEEE

2 Binary Decision Diagrams

We start with some simple definitions on finite automata and binary decision diagrams. A string is a sequence of symbols over some alphabet $\Sigma$. In this paper, the alphabet will always be $\Sigma = \{0, 1\}$, where 0 represents False and 1 represents True. The length of a string is the number of symbols in the string.
A finite automaton $M$ is a 5-tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ is the alphabet for strings, $\delta$ is the state transition function from $Q \times \Sigma$ to $Q$, $q_0$ is the initial state in $Q$, and $F$ is a set of final states in $Q$. $M$ accepts a string $a_1 a_2 \ldots a_n$, where each $a_i \in \Sigma$, if and only if there exists a sequence of states $q_0, q_1, \ldots, q_n$ such that $q_i = \delta(q_{i-1}, a_i)$ and $q_n \in F$. The set of strings accepted by $M$ is called the language of $M$ and will be denoted by $L(M)$.

For example, $M = (\{q_0, q_1, q_2, q_3, q_4, q_5, \bot\}, \{0, 1\}, \delta, q_0, \{q_5\})$ accepts $\{010, 110, 111\}$, where $\delta$ is defined as
$\delta(q_0, 0) = q_1$, $\delta(q_0, 1) = q_2$,
$\delta(q_1, 0) = \bot$, $\delta(q_1, 1) = q_3$,
$\delta(q_2, 0) = \bot$, $\delta(q_2, 1) = q_4$,
$\delta(q_3, 0) = q_5$, $\delta(q_3, 1) = \bot$,
$\delta(q_4, 0) = q_5$, $\delta(q_4, 1) = q_5$,
$\delta(q_5, 0) = \bot$, $\delta(q_5, 1) = \bot$,
$\delta(\bot, 0) = \bot$, and $\delta(\bot, 1) = \bot$.
$\bot$ is called a sink state. The representation of $\delta$ as a directed graph is shown in Figure 1. The sink state is not shown in the figure for simplicity.

A Boolean function $f$ with $n$ variables is a function from $\{0, 1\}^n$ to $\{0, 1\}$. The set of elements in $\{0, 1\}^n$ for which $f$ is 1 can be used to represent $f$. If we associate the n-tuple $(a_1, a_2, \ldots, a_n)$ with the string $a_1 a_2 \ldots a_n$, then each set of n-tuples from $\{0, 1\}^n$ will correspond to a set of strings over $\Sigma = \{0, 1\}$ with length $n$. This correspondence allows us to associate a finite language contained in $\Sigma^n = \{0, 1\}^n$ with each n-variable Boolean function $f$. Since all finite languages are regular, there is a minimal finite automaton accepting the language corresponding to $f$. The minimal automaton provides a canonical form for $f$: two n-variable Boolean functions will have the same minimal automaton if and only if they are logically equivalent. Since each node in the state-transition graph for a Boolean function will have at most two successors (one for each symbol in $\Sigma$), we can view this graph as a binary decision diagram for the function.
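The acceptance condition can be checked mechanically. The following is a minimal Python sketch of the example automaton, assuming a dictionary encoding of the transition function; the names `DELTA`, `SINK` and `accepts` are illustrative and do not come from the paper.

```python
from itertools import product

# Transition table of the example automaton; SINK stands for the sink state.
# The dictionary encoding and all names are illustrative assumptions.
SINK = "sink"
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): SINK, ("q1", "1"): "q3",
    ("q2", "0"): SINK, ("q2", "1"): "q4",
    ("q3", "0"): "q5", ("q3", "1"): SINK,
    ("q4", "0"): "q5", ("q4", "1"): "q5",
    ("q5", "0"): SINK, ("q5", "1"): SINK,
    (SINK, "0"): SINK, (SINK, "1"): SINK,
}
INITIAL, FINAL = "q0", {"q5"}


def accepts(string):
    """Follow the transition function from the initial state and test the last state."""
    state = INITIAL
    for symbol in string:
        state = DELTA[(state, symbol)]
    return state in FINAL


# Enumerate every string of length 3 and keep the accepted ones.
accepted = {"".join(bits) for bits in product("01", repeat=3) if accepts("".join(bits))}
print(sorted(accepted))
```

Enumerating all strings of length 3 in this way recovers the language of the automaton, which is exactly the set of input vectors on which the corresponding Boolean function is 1.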
For example, the binary decision diagram in Figure 1 represents $f(x_1, x_2, x_3) = (\neg x_1 \wedge x_2 \wedge \neg x_3) \vee (x_1 \wedge x_2)$, the one in Figure 2 represents $f_U(x_1, \ldots, x_n) = 1$ for all inputs, and the one in Figure 3 represents $f(x_1, \ldots, x_n) = x_i$.

Fig.1 A binary decision diagram accepting {010, 110, 111}.
Fig.2 A binary decision diagram accepting all strings.
Fig.3 A binary decision diagram corresponding to $x_i$.

3 Boolean Operations

Let $M_1 = (Q_1, \{0, 1\}, \delta_1, q_0^1, F_1)$ and $M_2 = (Q_2, \{0, 1\}, \delta_2, q_0^2, F_2)$ be the binary decision diagrams for two n-variable Boolean functions $f_1$ and $f_2$, let $\bot_1$ be the sink state in $Q_1$, and let $\bot_2$ be the sink state in $Q_2$.

We consider the AND operation first. The set of strings over $\{0, 1\}$ that satisfy $f_1 \wedge f_2$ corresponds to the intersection of the sets accepted by $M_1$ and $M_2$. The standard construction of a finite automaton $M$ that accepts $L(M_1) \cap L(M_2)$ may be used in this case:
$M = (Q_1 \times Q_2 \cup \{\bot\}, \{0, 1\}, \delta_\wedge, (q_0^1, q_0^2), F_1 \times F_2)$,
where $\bot$ denotes the sink state for the product automaton. $\delta_\wedge$ is defined as $\delta_\wedge((q_1, q_2), a) = (\delta_1(q_1, a), \delta_2(q_2, a))$ if $\delta_1(q_1, a) \neq \bot_1$ and $\delta_2(q_2, a) \neq \bot_2$, and $\bot$ otherwise.

The OR operation is similar. The OR of two Boolean functions represented by $M_1$ and $M_2$ corresponds to the union of the sets accepted by $M_1$ and $M_2$. The standard construction for such an $M$ can also be used in this case:
$M = (Q_1 \times Q_2 \cup \{\bot\}, \{0, 1\}, \delta_\vee, (q_0^1, q_0^2), (F_1 \times Q_2) \cup (Q_1 \times F_2))$,
where $\delta_\vee$ is defined as $\delta_\vee((q_1, q_2), a) = (\delta_1(q_1, a), \delta_2(q_2, a))$ if $\delta_1(q_1, a) \neq \bot_1$ or $\delta_2(q_2, a) \neq \bot_2$, and $\bot$ otherwise.

The NOT operation corresponds to set difference. Let $U$ be the set of all strings with length $n$; then $U - L(M_1)$ corresponds to the negation of the Boolean function represented by $M_1$. A finite automaton accepting $U - L(M_1)$ can be constructed from $M_U = (Q_U, \{0, 1\}, \delta_U, q_0^U, F_U)$ and $M_1$ as
$M = (Q_U \times Q_1 \cup \{\bot\}, \{0, 1\}, \delta_\neg, (q_0^U, q_0^1), F_U \times (Q_1 - F_1))$,
where $\delta_\neg$ is defined in the same manner as for the OR operation.
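The correspondence between logical operations and set operations can be checked by brute force on small functions. The sketch below works on plain sets of strings rather than automata; the functions `f1` and `f2` and the helper `truth_set` are illustrative assumptions, not examples from the paper.

```python
from itertools import product


def truth_set(f, n):
    """Strings a1...an over {0,1} on which the n-variable function f is true."""
    return {"".join(map(str, bits)) for bits in product((0, 1), repeat=n) if f(*bits)}


n = 3
U = truth_set(lambda *xs: True, n)        # all strings of length n
f1 = lambda x1, x2, x3: x1 and x2         # illustrative function
f2 = lambda x1, x2, x3: x2 or x3          # illustrative function
L1, L2 = truth_set(f1, n), truth_set(f2, n)

# AND is intersection, OR is union, NOT is the difference from U.
and_ok = truth_set(lambda *xs: f1(*xs) and f2(*xs), n) == L1 & L2
or_ok = truth_set(lambda *xs: f1(*xs) or f2(*xs), n) == L1 | L2
not_ok = truth_set(lambda *xs: not f1(*xs), n) == U - L1
print(and_ok, or_ok, not_ok)
```

The product constructions compute the same sets, but on the automata themselves, so the truth sets never have to be enumerated.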
The EXOR operation $\oplus$ is also similar to the OR operation. The finite automaton for this operation is given by
$M = (Q_1 \times Q_2 \cup \{\bot\}, \{0, 1\}, \delta_\oplus, (q_0^1, q_0^2), F_1 \times (Q_2 - F_2) \cup (Q_1 - F_1) \times F_2)$,
where $\delta_\oplus$ is defined in the same manner as for the OR operation.

Note that determining the state set of the finite automaton for each of these four operations involves a product construction $M_1 \times M_2$. Also note that in each case the resulting automaton $M$ may not be minimal, even if both $M_1$ and $M_2$ are minimal. Consequently, a final minimization stage is needed.

4 Product Automaton Generation

In generating the product automaton for the result of some two-argument Boolean operation applied to $M_1$ and $M_2$, the initial product state is given by $(q_0^1, q_0^2)$, where $q_0^1$ is the initial state of $M_1$ and $q_0^2$ is that of $M_2$. The successors of this state are determined for the inputs 0 and 1, and this process is repeated until no new state pairs are generated. The process is shown in Figure 4.

  Let the initial pair be $(q_0^1, q_0^2)$;
  Put the pair in the queue S, and allocate a new state for it;
  While (S is not empty) Do Begin
    Dequeue a pair $(q_1, q_2)$ from S;
    For symbol $a \in \{0, 1\}$ Do Begin
      Compute $(\delta_1(q_1, a), \delta_2(q_2, a))$;
      If this pair is new, then add the pair to S, and allocate a new state;
      Connect the edge from the state corresponding to $(q_1, q_2)$ to the state corresponding to $(\delta_1(q_1, a), \delta_2(q_2, a))$;
    End;
  End;

Fig.4 Construction of the product automaton.

Fig.5 Product generation and minimization. (Panels: $M_1 \cap M_2$ (product machine) and $M_1 \cap M_2$ (minimized).)

Note that there are only two places where we need to take into account the type of the Boolean operation: the computation of $(\delta_1(q_1, 0), \delta_2(q_2, 0))$ and of $(\delta_1(q_1, 1), \delta_2(q_2, 1))$. The most time-consuming part of this procedure is deciding whether a pair is new or not. By using a hash table with chaining, we can make this test take essentially constant time.
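The queue-driven construction of Figure 4 can be sketched in Python as follows. This is a minimal illustration under several assumptions: an automaton is a triple `(delta, initial, finals)`, the sink is left implicit as a missing edge, and a Python dictionary stands in for the hash table with chaining. The helpers `trie` and `language` exist only to exercise the sketch.

```python
from collections import deque
from itertools import product

SINK = None  # assumption: a missing edge means "go to the sink"


def step(delta, q, a):
    """One transition; once the sink is reached it is never left."""
    return SINK if q is SINK else delta.get((q, a), SINK)


def product_automaton(m1, m2, op):
    """Breadth-first product construction for op in {'and', 'or', 'xor'}."""
    (d1, i1, f1), (d2, i2, f2) = m1, m2
    start = (i1, i2)
    ids = {start: 0}                 # pair -> allocated state number
    delta, finals = {}, set()
    queue = deque([start])
    while queue:
        q1, q2 = pair = queue.popleft()
        in1, in2 = q1 in f1, q2 in f2
        if {"and": in1 and in2, "or": in1 or in2, "xor": in1 != in2}[op]:
            finals.add(ids[pair])
        for a in "01":
            s1, s2 = step(d1, q1, a), step(d2, q2, a)
            if op == "and":
                dead = s1 is SINK or s2 is SINK
            else:
                dead = s1 is SINK and s2 is SINK
            if dead:
                continue             # edge to the product sink stays implicit
            nxt = (s1, s2)
            if nxt not in ids:       # new pair: allocate a state and enqueue it
                ids[nxt] = len(ids)
                queue.append(nxt)
            delta[(ids[pair], a)] = ids[nxt]
    return delta, 0, finals


def trie(strings):
    """Unminimized automaton for a finite set of strings (test helper)."""
    delta = {(s[:i], s[i]): s[:i + 1] for s in strings for i in range(len(s))}
    return delta, "", set(strings)


def language(m, n):
    """All strings of length n accepted by m (test helper)."""
    delta, init, finals = m
    result = set()
    for bits in product("01", repeat=n):
        q = init
        for a in bits:
            q = step(delta, q, a)
        if q in finals:
            result.add("".join(bits))
    return result


A, B = {"010", "110", "111"}, {"110", "011"}
print(sorted(language(product_automaton(trie(A), trie(B), "and"), 3)))
```

As in the text, the operation type matters only when deciding whether a successor pair goes to the sink and which pairs are final; the traversal itself is the same for all operations.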
The hash function that we use is given by
$\mathrm{hash}(q_1, q_2) = \mathrm{mod}(q_1 \cdot (\mathrm{hash\_size}/2) + q_2,\ \mathrm{hash\_size})$,
where $q_1$ and $q_2$ are integer values for the state pointers.

An example illustrating this phase is shown in Figure 5, where the intersection of $M_1$ and $M_2$ is computed (corresponding to the AND operation in the original formula). $M_1$ corresponds to $(\neg x_1 \wedge \neg x_3) \vee x_1$, and $M_2$ corresponds to $(\neg x_1 \wedge x_2 \wedge x_3) \vee (\neg x_1 \wedge \neg x_2) \vee (x_1 \wedge x_2) \vee (x_1 \wedge \neg x_2 \wedge \neg x_3)$. The result of the AND operation is $(\neg x_1 \wedge \neg x_2 \wedge \neg x_3) \vee (x_1 \wedge \neg x_2 \wedge \neg x_3) \vee (x_1 \wedge x_2)$. In the example, states are generated in the order $q_{14}, q_{15}, \ldots, q_{21}$.

5 Minimization

After the product generation phase, we must minimize the resulting automaton. In the minimization phase, states are processed starting at the bottom level and working upward, since the determination of whether two states should be merged into an equivalence class is based on the equivalence of their successor states. First, the final states (the bottom-level nodes) are processed. Next, the states which have an edge to the final state are processed, and so on. Thus the order in which the states are processed in this phase is the reverse of the order in which they were generated during the product phase.

Fig.7 Levels of Boolean operations for $(x_1 \wedge x_2) \vee (x_3 \wedge x_4) \vee (\neg x_1 \wedge x_4)$.

For the product automaton in Figure 5, the states are processed in the order $q_{21}, q_{20}, \ldots, q_{14}$. In the following, the edge-pair of $q$ denotes the ordered pair $(\delta(q, 0), \delta(q, 1))$. At first, $q_{21}$ is processed and is registered as a unique final state. Next, $q_{20}$ is processed and is registered as a unique state, since the edge-pair $(q_{21}, q_{21})$ of $q_{20}$ is unique. $q_{19}$ is also registered as unique. It is impossible to reach the final state from $q_{18}$, thus $q_{18}$ is deleted. $q_{17}$ is marked as the same as $q_{19}$, since the edge-pairs of these states are the same.
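The bottom-up pass can be sketched in Python as follows. This is an illustration rather than the paper's implementation: it assumes that states are integers numbered in generation order (so every edge leads to a larger number), that a missing edge leads to the sink, and that a dictionary keyed on the edge-pair plays the role of the hash table. The function name `minimize` and the example automaton are hypothetical.

```python
def minimize(delta, initial, finals):
    """Merge states with equal edge-pairs, visiting states in reverse generation order."""
    unique = {}   # (is_final, edge-pair) -> global state number
    rep = {}      # original state -> global state, or None if it was deleted
    states = {initial} | {q for q, _ in delta} | set(delta.values()) | set(finals)
    for q in sorted(states, reverse=True):
        # Edge-pair expressed in terms of already registered global states.
        succ = tuple(rep.get(delta.get((q, a))) for a in "01")
        if q not in finals and succ == (None, None):
            rep[q] = None            # no final state is reachable: delete
            continue
        key = (q in finals, succ)
        rep[q] = unique.setdefault(key, len(unique))

    new_delta, new_finals = {}, set()
    for (is_final, succ), g in unique.items():
        if is_final:
            new_finals.add(g)
        for a, s in zip("01", succ):
            if s is not None:
                new_delta[(g, a)] = s
    return new_delta, rep[initial], new_finals


# Unminimized automaton for {010, 110, 111}, plus a dead-end state 8.
example_delta = {
    (0, "0"): 1, (0, "1"): 2,
    (1, "1"): 3,
    (2, "0"): 8, (2, "1"): 4,
    (3, "0"): 5,
    (4, "0"): 6, (4, "1"): 7,
}
min_delta, min_initial, min_finals = minimize(example_delta, 0, {5, 6, 7})
min_states = {min_initial} | {q for q, _ in min_delta} | set(min_delta.values())
print(len(min_states), min_initial, min_finals)
```

In this example the three final states collapse into one, the dead-end state is deleted, and six states remain, the same number of non-sink states as the automaton given earlier for {010, 110, 111}.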
$q_{16}$, $q_{15}$ and $q_{14}$ are also processed, and a minimal finite automaton as shown in Figure 5 is obtained.

The minimization algorithm is summarized in Figure 6. The same hash function is used as in the product generation phase. To reduce the memory consumption, we keep a global binary decision diagram whose states represent equivalence classes of states of the reduced automaton.

  For each state of the product automaton, starting at the bottom and working upward, do Begin
    Check whether the state has already been registered as a global state;
    If the state is new, then register the state as a global state;
    Otherwise, mark the state as previously registered, and store a pointer to the corresponding global state;
  End;

Fig.6 Minimization algorithm.

6 Parallel Implementation

We now describe how the basic algorithm outlined in the previous section can be implemented on a shared memory multiprocessor. To illustrate the procedure we consider the following example:
$f(x_1, x_2, x_3, x_4) = (x_1 \wedge x_2) \vee (x_3 \wedge x_4) \vee (\neg x_1 \wedge x_4)$

The first step is to determine the level of each node in the parse tree for the formula (see Figure 7). The leaf nodes of the tree are input variables; the non-leaf nodes correspond to the Boolean operations that occur in the formula. The level of each node is determined by the rule:

1. The level of an input variable is 0.
2. The level of a non-leaf node is $\max(l_1, l_2) + 1$, where $l_1$ and $l_2$ are the levels of its operands.

Fig.8 Decomposition of an operation. (Columns: product construction phase, processor, minimization phase.)

Since we initially generate binary decision diagrams for input variables, we can process operations at level 1 immediately. After the level 1 operations have been completed, we can process Boolean operations at level 2, and so on. In general, we can process level $i$ nodes as soon as the level $i - 1$ nodes have been completed. Operations at the same level in the tree can be performed in parallel, since they do not conflict.
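The level rule can be sketched in a few lines of Python. The nested-tuple encoding of the parse tree, the left-to-right grouping of the three-way OR, and the treatment of NOT as an operation with a single operand are assumptions made for this illustration.

```python
def level(node):
    """Level 0 for an input variable, otherwise 1 + the largest operand level."""
    if isinstance(node, str):
        return 0
    _op, *operands = node
    return 1 + max(level(x) for x in operands)


def operations_by_level(node, table=None):
    """Group the operators of the parse tree by level (post-order traversal)."""
    table = {} if table is None else table
    if not isinstance(node, str):
        for operand in node[1:]:
            operations_by_level(operand, table)
        table.setdefault(level(node), []).append(node[0])
    return table


# (x1 AND x2) OR (x3 AND x4) OR (NOT x1 AND x4), grouped left to right.
formula = ("or",
           ("or", ("and", "x1", "x2"), ("and", "x3", "x4")),
           ("and", ("not", "x1"), "x4"))
print(level(formula), operations_by_level(formula))
```

Under these assumptions the root operation is at level 3, which agrees with the top level shown in Figure 7, and all operations listed under one level can be started as soon as the previous level is finished.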
Some levels have only a few operations that can be performed in parallel. We divide operations on such levels into several sub-operations so that fewer processors are idle. The method is as follows.

In the product generation and minimization phase corresponding to a Boolean operation, the 0- and 1-successors of the initial pair $(q_0^1, q_0^2)$ are generated. Then the product generation and minimization are done for these two successors. After the minimization for these two successors is performed, the minimization of the root state corresponding to the initial pair is done. Thus the product and minimization phases for these two successors (the 0- and 1-successors of $(q_0^1, q_0^2)$) can be performed in parallel. Note that the minimization phase guarantees the uniqueness of global states.

An example of this procedure is shown in Figure 8. First, processor P1 expands the 0- and 1-successors of the initial pair. Processor P2 takes the 0-successor $(q_2, q_9)$, generates the product automaton and minimizes this automaton. Processor P3 takes the 1-successor and does the same thing. After P2 and P3 have completed the minimization phase for their product automata, processor P1 minimizes $q_{14}$.

If, in the example, we compute the 00-, 01-, 10- and 11-successors of the initial pair, then the original operation can be divided into four parts with three merges. In a similar manner we can divide a single operation into 8 parts, 16 parts, etc.

In the implementation, all processors execute the same program:

  take one operation;
  wait until the operands have been calculated;
  do the operation: product generation & minimization;

Operations (including divide and merge operations) are ordered from smaller levels to larger levels.

Table 1 Evaluation of multiplier examples on Multimax. (Only part of the table is legible in this copy; the rows for fewer processors and the last entry are missing, and the row labels are read from a damaged header.)

                  7 bits     8 bits     9 bits     10 bits
  9 processors    4.4 sec    12.9 sec   47.4 sec   196.2 sec
  12 processors   3.7 sec    10.9 sec   38.4 sec   163.2 sec
  15 processors   3.0 sec    9.2 sec    32.4 sec   (illegible)
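The worker loop above (take an operation, wait for its operands, perform it) can be imitated with a thread pool. The sketch below is only an illustration of the ordering discipline: it uses Python's `concurrent.futures` rather than C threads, it replaces product generation and minimization by plain set operations on truth sets, and it does not show the splitting of one operation into sub-operations. The operation names `a` to `f` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N = 4
UNIVERSE = {"".join(bits) for bits in product("01", repeat=N)}


def variable(i):
    """Truth set of the input variable x_i (1-based) over N variables."""
    return {s for s in UNIVERSE if s[i - 1] == "1"}


# Stand-ins for "product generation & minimization" on truth sets.
APPLY = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "not": lambda a: UNIVERSE - a,
}

# (name, operator, operand names), ordered from smaller to larger levels.
OPERATIONS = [
    ("a", "and", ("x1", "x2")),   # level 1
    ("b", "and", ("x3", "x4")),   # level 1
    ("c", "not", ("x1",)),        # level 1
    ("d", "or", ("a", "b")),      # level 2
    ("e", "and", ("c", "x4")),    # level 2
    ("f", "or", ("d", "e")),      # level 3
]


def run(op, operand_futures):
    # Wait until the operands have been calculated, then do the operation.
    operands = [future.result() for future in operand_futures]
    return APPLY[op](*operands)


def build(num_workers=3):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = {f"x{i}": pool.submit(variable, i) for i in range(1, N + 1)}
        for name, op, operand_names in OPERATIONS:
            futures = [results[operand] for operand in operand_names]
            results[name] = pool.submit(run, op, futures)
        return results["f"].result()


expected = {s for s in UNIVERSE
            if (s[0] == "1" and s[1] == "1")
            or (s[2] == "1" and s[3] == "1")
            or (s[0] == "0" and s[3] == "1")}
print(build() == expected)
```

Because operations are submitted in level order and the pool serves them first in, first out, every operation that is waiting has operands that were started earlier, so the scheme cannot deadlock even with a single worker. Python threads do not give a real speed-up for this kind of work; the sketch shows the scheduling order only.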
7 Performance Evaluation

Our program for constructing binary decision diagrams is implemented in C and uses the C-threads package [4] for parallel programming under the Mach operating system. Interlocks are used for process synchronization instead of general semaphores in order to avoid the expense associated with system calls. The program is organized so that locks are only needed for the global hash table and the global tree nodes. Consequently, contention for shared memory is light. The performance statistics that we describe below were obtained for an Encore Multimax with 16 processors and 96 megabytes of shared memory. Each processor is a National Semiconductor 32332 and is rated at roughly 2 MIPS.

Two-dimensional adder array multipliers were used to evaluate the program, since the binary decision diagrams for these circuits are known to grow quite rapidly (exponentially in the size of the operands, in fact). Table 1 shows the execution time to construct binary decision diagrams for multipliers with 7 to 10 bits (14 to 20 Boolean variables). In the evaluation, a hash table with 8191 entries is used for the product generation, and a hash table with 32727 entries is used for the minimization.

The table shows that the minimum execution time on the Multimax with 15 processors is about 10 times smaller than the execution time with a single processor. The time for a single processor is roughly the same as that of the (sequential) program for constructing binary decision diagrams that is described in [5].

The graph in Figure 9 shows how the execution time varies with the number of processors for the 10-bit multiplier. The execution time is inversely proportional to the number of processors. Figure 9 also shows the rate of speed-up (= (the execution time using 1 processor) / (the execution time using $n$ processors)). The rate is almost linear in the number of processors. Other examples show almost the same graphs.
8 Summary and Future Research

This paper describes a parallel algorithm for constructing binary decision diagrams. The algorithm treats binary decision graphs as minimal finite automata. The automaton for a Boolean function with AND as its main operation (OR operation) is obtained by forming the intersection (union) of the regular sets associated with its operands. The union and intersection operations are implemented by a product construction on the minimal automata for the regular sets. After each product construction step the automaton must be re-minimized.

The parallel algorithm is designed so that it is possible to find the minimal representations for several Boolean operations in parallel. The level of each operation is determined. Operations at the same level can be performed in parallel without any communication between processors. If there are relatively few operations in one level, then we divide the product generation step into several sub-operations and merge the results. Preliminary experiments show that our parallel algorithm is roughly 10 times faster than with a single processor.

We plan to use this algorithm as part of a verification system for finite state concurrent systems (hardware controllers, communications protocols, etc.) that uses a technique called Symbolic Model Checking [2, 3]. Since constructing binary decision diagrams is the most time-consuming part of the verification procedure, we should be able to handle even larger finite state systems in the future.

Fig.9 Execution time and speed-up rate for 10-bit multiplier. (Two panels: execution time in seconds versus number of processors, and speed-up rate versus number of processors, both for the 10-bit multiplier example.)

References

[1] Randal E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, C-35(8):677-691, August 1986.
[2] J. R. Burch, E. M. Clarke, K. L. McMillan, and D. L. Dill. Sequential Circuit Verification Using Symbolic Model Checking. In Proceedings of Design Automation Conf., 1990.
[3] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and J. Hwang. Symbolic Model Checking: $10^{20}$ States and Beyond. In Proceedings of Logic in Computer Science, 1990.
[4] E. C. Cooper. C threads. Technical Report CMU-CS-88-154, Carnegie Mellon University, Pittsburgh, PA 15213, June 1988.
[5] Allan L. Fisher and Randal E. Bryant. Performance of COSMOS on the IFIP Workshop Benchmarks. In Proceedings of IMEC Conference, 1989.

This note was uploaded on 02/29/2012 for the course CS 15-453 taught by Professor Edmundm.clarke during the Spring '09 term at Carnegie Mellon.
