This preview shows page 1. Sign up to view the full content.
Unformatted text preview: AN APPROXIMATE MINIMUM DEGREE ORDERING ALGORITHM
TIMOTHY A. DAVIS , PATRICK AMESTOYy , AND IAIN S. DUFFz Computer and Information Sciences Dept., University of Florida,
Technical Report TR94039, December, 1994 (revised July 1995).
Abstract. An Approximate Minimum Degree ordering algorithm (AMD) for preordering a symmetric sparse matrix prior to numerical factorization is presented. We use techniques based on
the quotient graph for matrix factorization that allow us to obtain computationally cheap bounds
for the minimum degree. We show that these bounds are often equal to the actual degree. The
resulting algorithm is typically much faster than previous minimum degree ordering algorithms, and
produces results that are comparable in quality with the best orderings from other minimum degree
algorithms. Keywords: approximate minimum degree ordering algorithm, quotient graph,
sparse matrices, graph algorithms, ordering algorithms
AMS classi cations: 65F50, 65F05.
1. Introduction. When solving large sparse symmetric linear systems of the
form Ax = b, it is common to precede the numerical factorization by a symmetric
reordering. This reordering is chosen so that pivoting down the diagonal in order on
the resulting permuted matrix PAPT = LLT produces much less llin and work than
computing the factors of A by pivoting down the diagonal in the original order. This
reordering is computed using only information on the matrix structure without taking
account of numerical values and so may not be stable for general matrices. However,
if the matrix A is positivede nite 21], a Cholesky factorization can safely be used.
This technique of preceding the numerical factorization with a symbolic analysis can
also be extended to unsymmetric systems although the numerical factorization phase
must allow for subsequent numerical pivoting 1, 2, 16]. The goal of the preordering
is to nd a permutation matrix P so that the subsequent factorization has the least
llin. Unfortunately, this problem is NPcomplete 31], so heuristics are used.
The minimum degree ordering algorithm is one of the most widely used heuristics,
since it produces factors with relatively low llin on a wide range of matrices. Because
of this, the algorithm has received much attention over the past three decades. The
algorithm is a symmetric analogue of Markowitz' method 26] and was rst proposed
by Tinney and Walker 30] as algorithm S2. Rose 27, 28] developed a graph theoretical model of Tinney and Walker's algorithm and renamed it the minimum degree
algorithm, since it performs its pivot selection by selecting from a graph a node of
minimum degree. Later implementations have dramatically improved the time and
memory requirements of Tinney and Walker's method, while maintaining the basic
idea of selecting a node or set of nodes of minimum degree. These improvements have
reduced the memory complexity so that the algorithm can operate within the storage
of the original matrix, and have reduced the amount of work needed to keep track of
the degrees of nodes in the graph (which is the most computationally intensive part
Computer and Information Sciences Department University of Florida, Gainesville, Florida,
USA. phone: (904) 3921481, email: davis@cis.u .edu. Support for this project was provided by
the National Science Foundation (ASC9111263 and DMS9223088). Portions of this work were
supported by a postdoctoral grant from CERFACS.
y ENSEEIHTIRIT, Toulouse, France. email: amestoy@enseeiht.fr.
z Rutherford Appleton Laboratory, Chilton, Didcot, Oxon. 0X11 0QX England, and European Center for Research and Advanced Training in Scienti c Computation (CERFACS), Toulouse,
France. email: isd@letterbox.rl.ac.uk.
1 2 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF of the algorithm). This work includes that of Du and Reid 10, 13, 14, 15]; George
and McIntyre 23]; Eisenstat, Gursky, Schultz, and Sherman 17, 18]; George and Liu
19, 20, 21, 22]; and Liu 25]. More recently, several researchers have relaxed this
heuristic by computing upper bounds on the degrees, rather than the exact degrees,
and selecting a node of minimum upper bound on the degree. This work includes
that of Gilbert, Moler, and Schreiber 24], and Davis and Du 7, 8]. Davis and Du
use degree bounds in the unsymmetricpattern multifrontal method (UMFPACK), an
unsymmetric Markowitzstyle algorithm. In this paper, we describe an approximate
minimum degree ordering algorithm based on the symmetric analogue of the degree
bounds used in UMFPACK.
Section 2 presents the original minimum degree algorithm of Tinney and Walker
in the context of the graph model of Rose. Section 3 discusses the quotient graph
(or element graph) model and the use of that model to reduce the time taken by
the algorithm. In this context, we present our notation for the quotient graph, and
present a small example matrix and its graphs. We then use the notation to describe
our approximate degree bounds in Section 4. The Approximate Minimum Degree
(AMD) algorithm and its time complexity is presented in Section 5. In Section 6,
we rst analyse the performance and accuracy of our approximate degree bounds on
a set of test matrices from a wide range of disciplines. The AMD algorithm is then
compared with other established codes that compute minimum degree orderings.
2. Elimination graphs. The nonzero pattern of a symmetric nbyn matrix,
A, can be represented by a graph G0 = (V 0; E0), with nodes V 0 = f1; :::; ng and
edges E 0. An edge (i; j ) is in E 0 if and only if aij 6= 0. Since A is symmetric, G0 is
undirected.
The elimination graph, Gk = (V k ; E k ), describes the nonzero pattern of the submatrix still to be factorized after the rst k pivots have been chosen. It is undirected,
since the matrix remains symmetric as it is factorized. At step k, the graph Gk depends on Gk 1 and the selection of the kth pivot. To nd Gk , the kth pivot node p
is selected from V k 1. Edges are added to E k 1 to make the nodes adjacent to p in
Gk 1 a clique (a fully connected subgraph). This addition of edges ( llin) means that
we cannot know the storage requirements in advance. The edges added correspond to
llin caused by the kth step of factorization. A llin is a nonzero entry Lij , where
(PAPT )ij is zero. The pivot node p and its incident edges are then removed from the
graph Gk 1 to yield the graph Gk . Let AdjG (i) denote the set of nodes adjacent to
i in the graph Gk . Throughout this paper, we will use the superscript k to denote a
graph, set, or other structure obtained after the rst k pivots have been chosen. For
simplicity, we will drop the superscript when the context is clear.
The minimum degree algorithm selects node p as the kth pivot such that the
degree of p, tp jAdjG 1 (p)j, is minimized (where j:::j denotes the size of a set or the
number of nonzeros in a matrix). The minimum degree algorithm is a nonoptimal
greedy heuristic for reducing the number of new edges ( llins) introduced during the
factorization. We have already noted that the optimal solution is NPcomplete 31].
By minimizing the degree, the algorithm minimizes the upper bound on the llin
caused by the kth pivot. Selecting p as pivot creates at most (t2 tp )=2 new edges in
p
G.
3. Quotient graphs. In contrast to the elimination graph, the quotient graph
models the factorization of A using an amount of storage that never exceeds the
storage for the original graph, G0 21]. The quotient graph is also referred to as the
k k APPROXIMATE MINIMUM DEGREE 3 generalized element model 13, 14, 15, 29]. An important component of a quotient
graph is a clique. It is a particularly economic structure since a clique is represented
by a list of its members rather than by a list of all the edges in the clique. Following
the generalized element model, we refer to nodes removed from the elimination graph
as elements (George and Liu refer to them as eliminated nodes). We use the term
variable to refer to uneliminated nodes.
The quotient graph, G k = (V k ; V k ; E k ; Ek ), implicitly represents the elimination
graph Gk , where G 0 = G0. For clarity, we drop the superscript k in the following. The
nodes in G consist of variables (the set V ), and elements (the set V ). The edges are
divided into two sets: edges between variables E V V , and between variables and
elements E V V . There are no edges between elements since they are removed
by element absorption. The sets V 0 and E 0 are empty.
We use the following set notation (A, E , and L) to describe the quotient graph
model and our approximate degree bounds. Let Ai be the set of variables adjacent to
variable i in G , and let Ei be the set of elements adjacent to variable i in G (we refer
to Ei as element list i). That is, if i is a variable in V , then
Ai fj : (i; j ) 2 E g V ; Ei fe : (i; e) 2 E g V ;
and
AdjG (i) Ai Ei V V :
The set Ai refers to a subset of the nonzero entries in row i of the original matrix A
(thus the notation A). That is, A0 fj : aij 6= 0g, and Ak Ak 1, for 1 k n.
i
i
i
Let Le denote the set of variables adjacent to element e in G . That is, if e is an
element in V , then we de ne
Le AdjG (e) = fi : (i; e) 2 E g V :
The edges E and E in the quotient graph are represented explicitly as the sets Ai
and Ei for each variable in G , and the sets Le for each element in G . We will use A, E ,
and L to denote three sets containing all Ai , Ei , and Le, respectively, for all variables
i and all elements e. George and Liu 21] show that the quotient graph takes no more
storage than the original graph (jAk j + jE k j + jLkj jA0 j for all k).
The quotient graph G and the elimination graph G are closely related. If i is a
variable in G, it is also a variable in G , and
(3.1) AdjG (i) = Ai ! e2Ei Le n fig; where the \n" is the standard set subtraction operator.
When variable p is selected as the kth pivot, element p is formed (variable p is removed from V and added to V ). The set Lp = AdjG (p) is found using Equation (3.1).
The set Lp represents a permuted nonzero pattern of the kth column of L (thus the
notation L). If i 2 Lp, where p is the kth pivot, and variable i will become the mth
pivot (for some m > k), then the entry Lmk will be nonzero. Equation (3.1) implies
that Le n fpg Lp for all elements e adjacent to variable p. This means that all variables adjacent to an element e 2 Ep are adjacent to the element p and these elements 4 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF
Fig. 3.1. Elimination graph, quotient graph, and matrix for rst three steps. G 0 G 2 1 G 2 5 2 G
5 5 9 7 3 6 9 7 10 8 4 1 10 8 3 9 7 3 10 3 8 4 6 4 5
9 7 10 6 8 6
4 (a) Elimination graph 0 1 2 2 2 5 3 2 5 2 5 5 9 7 3 6 9 7 3 6 9 7 3 6 9 7 3 6 10 8 4 1 10 8 4 1 10 8 4 1 10 8 4 1 (b) Quotient graph 1 1
2 1
2 3 3
4 2
3 4
5 1
2 3
4 5
6 6
7 7
8 9
10 6
7 8
9 5
6 7
8 4
5 8
9 10 9
10 10 (c) Factors and active submatrix e 2 Ep are no longer needed. They are absorbed into the new element p and deleted
15], and reference to them is replaced by reference to the new element p. The new
element p is added to the element lists, Ei, for all variables i adjacent to element p.
Absorbed elements, e 2 Ep , are removed from all element lists. The sets Ap and Ep ,
and Le for all e in Ep , are deleted. Finally, any entry j in Ai , where both i and j are
in Lp , is redundant and is deleted. The set Ai is thus disjoint with any set Le for
e 2 Ei . In other words, Ak is the pattern of those entries in row i of A that are not
i
modi ed by steps 1 through k of the Cholesky factorization of PAPT . The net result
is that the new graph G takes the same, or less, storage than before the kth pivot was
selected. 3.1. Quotient graph example. We illustrate the sequence of elimination graphs
and quotient graphs of a 10by10 sparse matrix in Figures 3.1 and 3.2. The example
is ordered so that a minimum degree algorithm recommends pivoting down the diagonal in the natural order (that is, the permutation matrix is the identity). In Figures
3.1 and 3.2, variables and elements are shown as thinlined and heavylined circles,
respectively. In the matrices in these gures, diagonal entries are numbered, unmodi ed original nonzero entries (entries in A) are shown as a solid squares. The solid
squares in row i form the set Ai . The variables in current unabsorbed elements (sets
Le) are indicated by solid circles in the columns of L corresponding to the unabsorbed 5 APPROXIMATE MINIMUM DEGREE
Fig. 3.2. Elimination graph, quotient graph, and matrix for steps 4 to 7. G 4 G 5 G 6 G7 5
9 7 10 8 9 7 10 6 8 9 7 9 10 6 8 10 8 (a) Elimination graph 2 4 5
9 7 8 5 3 10 9 6 5 9 4 8 9 6
4 7 10 7 10 6 8 9 7 10 6 8 (b) Quotient graph 1 1
2 1
2 3 3
4 2
3 4
5 1
2 3
4 5
6 6
7 7
8 9
10 6
7 8
9 5
6 7
8 4
5 8
9 10 9
10 10 (c) Factors and active submatrix elements. The solid circles in row i form the set Ei . Entries that do not correspond
to edges in the quotient graph are shown as an . Figure 3.1 shows the elimination
graph, quotient graph, and the matrix prior to elimination (in the left column) and
after the rst three steps (from left to right). Figure 3.2 continues the example for
the next four steps.
Consider the transformation of the graph G 2 to the graph G 3 . Variable 3 is
selected as pivot. We have L3 = A3 = f5; 6; 7g (a simple case of Equation (3.1)). The
new element 3 represents the pairwise adjacency of variables 5, 6, and 7. The explicit
edge (5,7) is now redundant, and is deleted from A5 and A7 .
Also consider the transformation of the graph G 4 to the graph G 5 . Variable 5 is
selected as pivot. The set A5 is empty and E5 = f2; 3g. Following Equation (3.1), L5 = (A5 L2 L3 ) n f5g
= (; f5; 6; 9g f5; 6; 7g) n f5g
= f6; 7; 9g;
which is the pattern of column 5 of L (excluding the diagonal). Since the new element
5 implies that variables 6, 7, and, 9 are pairwise adjacent, elements 2 and 3 do not
add any information to the graph. They are removed, having been \absorbed" into
element 5. Additionally, the edge (7, 9) is redundant, and is removed from A7 and 6 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF A9 . In G 4, we have A6 = ;
E6 = f2; 3; 4g
A7 = f9; 10g
E7 = f3; 4g :
A9 = f7; 8; 10g E9 = f2g
After these transformations, we have in G 5 ,
A6 = ;
E6 = f4; 5g
A7 = f10g
E7 = f4; 5g ;
A9 = f8; 10g E9 = f5g
and the new element in G 5 ,
L5 = f6; 7; 9g:
3.2. Indistinguishable variables and external degree. Two variables i and
j are indistinguishable in G if AdjG (i) fig = AdjG (j ) fj g. They will have the same degree until one is selected as pivot. If i is selected, then j can be selected
next without causing any additional llin. Selecting i and j together is called mass
elimination 23]. Variables i and j are replaced in G by a supervariable containing
both i and j , labeled by its principal variable (i, say) 13, 14, 15]. Variables that
are not supervariables are called simple variables. In practice, new supervariables are
constructed at step k only if both i and j are in Lp (where p is the pivot selected at
step k). In addition, rather than checking the graph G for indistinguishability, we use
the quotient graph G so that two variables i and j are found to be indistinguishable
if AdjG (i) fig = AdjG (j ) fj g. This comparison is faster than determining if
two variables are indistinguishable in G, but may miss some identi cations because,
although indistinguishability in G implies indistinguishability in G, the reverse is not
true.
We denote the set of simple variables in the supervariable with principal variable
i as i, and de ne i = fig if i is a simple variable. When p is selected as pivot at the
kth step, all variables in p are eliminated. The use of supervariables greatly reduces
the number of degree computations performed, which is the most costly part of the
algorithm. Nonprincipal variables and their incident edges are removed from the
quotient graph data structure when they are detected. The set notation A and L refers
either to a set of supervariables or to the variables represented by the supervariables,
depending on the context. In degree computations and when used in representing
elimination graphs, the sets refer to variables; otherwise they refer to supervariables.
In Figure 3.2, detected supervariables are circled by dashed lines. Nonprincipal
variables are left inside the dashed supervariables. These are, however, removed from
the quotient graph. The last quotient graph in Figure 3.2 represents the selection of
pivots 7, 8, and 9, and thus the right column of the gure depicts G7, G 9 , and the
matrix after the ninth pivot step.
The external degree di ti jij + 1 of a principal variable i is
(3.2) di = jAdjG (i) n ij = jAi n ij + ! e2Ei Le n i ; since the set Ai is disjoint from any set Le for e 2 Ei . At most (d2 di)=2 llins occur
i
if all variables in i are selected as pivots. We refer to ti as the true degree of variable i.
Selecting the pivot with minimum external degree tends to produce a better ordering
than selecting the pivot with minimum true degree 25] (also see Section 6.2). APPROXIMATE MINIMUM DEGREE 7 Algorithm 1 (Minimum degree algorithm, based on quotient graph)
V = f1:::ng
V =;
for i = 1 to n do
Ai = fj : aij = 0 and i = j g
6
6
Ei = ;
di = jAi j
i = fig
end for
k=1
while k n do
mass elimination:
select variable p 2 V that minimizes dp
Lp = Ap Se2E Le n p
for each i 2 Lp do
remove redundant entries:
Ai = (Ai n Lp ) n p
element absorption:
Ei = (Ei n Ep ) fpg
compute externalSdegree:
di = jAi n ij + e2E Le n i
end for
supervariable detection, pairs found via hash function:
for each pair i and j 2 Lp do
if i and j are indistinguishable then
remove the supervariable j:
i=i j
di = di jjj
V = V n fj g
Aj = ;
Ej = ;
end if
end for
p i convert variable p to element p: V = (V fpg) n Ep
V = V n fpg
Ap = ;
Ep = ;
k = k + jpj end while 3.3. Quotientgraphbased minimum degree algorithm. A minimum degree algorithm based on the quotient graph is shown in Algorithm 1. It includes
element absorption, mass elimination, supervariables, and external degrees. Supervariable detection is simpli ed by computing a hash function on each variable, so that
not all pairs of variables need be compared 3]. Algorithm 1 does not include two 8 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF important features of Liu's Multiple Minimum Degree algorithm (MMD): incomplete
update 17, 18] and multiple elimination 25]. With multiple elimination, an independent set of pivots with minimum degree is selected before any degrees are updated. If
a variable is adjacent to two or more pivot elements, its degree is computed only once.
A variable j is outmatched if AdjG (i) AdjG (j ). With incomplete degree update,
the degree update of the outmatched variable j is avoided until variable i is selected
as pivot. These two features further reduce the amount of work needed for the degree
computation in MMD. We will discuss their relationship to the AMD algorithm in
the next section.
The time taken to compute di using Equation (3.2) by a quotientgraphbased
minimum degree algorithm is
(jAi j + (3.3) X e2Ei jLej); which is (jAdjG (i)j) if all variables are simple.1 This degree computation is the
most costly part of the minimum degree algorithm. When supervariables are present,
the time taken is in the best case proportional to the degree of the variable in the
\compressed" elimination graph, where all nonprincipal variables and their incident
edges are removed.
4. Approximate degree. Having now discussed the data structures and the
standard minimum degree implementations, we now consider our approximation for
the minimum degree and indicate its lower complexity.
We assume that p is the kth pivot, and that we compute the bounds only for
supervariables i 2 Lp. Rather than computing the exact external degree, di , our
Approximate Minimum Degree algorithm (AMD) computes an upper bound 7, 8],
k (4.1) 8 n k;
9
> k1
>
< di + jLp n ij;
=
k
X
:
di = min >
jAi n ij + jLp n ij +
jLe n Lp j >
:
;
e2E nfpg
i The rst two terms (n k, the size of the active submatrix, and dk 1 + jLp n ij,
i
the worst case llin) are usually not as tight as the third term in Equation (4.1).
Algorithm 2 computes jLe n Lp j for all elements e in the entire quotient graph. The
set Le splits into two disjoint subsets: the external subset Le n Lp and the internal
subset Le \ Lp . If Algorithm 2 scans element e, the term w(e) is initialized to jLej
and then decremented once for each variable i in the internal subset Le \ Lp , and, at
the end of Algorithm 2, we have w(e) = jLej jLe \ Lp j = jLe n Lpj. If Algorithm 2
does not scan element e, the term w(e) is less than zero. Combining these two cases,
we obtain
w
if w(e)
(4.2)
jLe n Lp j = jL(eej) otherwise 0 ; for all e 2 V :
1 Asymptotic complexity notation is de ned in 6]. We write f (n) =
(g(n)) if there exist
positive constants c1 , c2 , and n0 such that 0 c1 g(n) f (n) c2 g(n) for all n > n0 . Similarly,
f (n) = (g(n)) if there exist positive constants c and n0 such that 0 cg(n) f (n) for all n > n0 ;
and f (n) = O(g(n)) if there exist positive constants c and n0 such that 0 f (n) cg(n) for all
n > n0 . APPROXIMATE MINIMUM DEGREE 9 Algorithm 2 (Computation of jLe n Lp j for all e 2 V )
assume w(1 : : :n) < 0
for each supervariable i 2 Lp do
for each element e 2 Ei do
if (w(e) < 0) then w(e) = jLej
w(e) = w(e) jij end for
end for Algorithm 2 is followed by a second loop to compute our upper bound degree, di
for each supervariable i 2 Lp , using Equations (4.1) and (4.2). The total time for
Algorithm 2 is
( X i2L jEij): p The second loop to compute the upper bound degrees takes time
(4.3) ( X i2L (jAij + jEij)); p which is thus equal to the total asymptotic time.
Multiple elimination 25] improves the minimumdegree algorithm by updating the
degree of a variable only once for each set of independent pivots. Incomplete degree
update 17, 18] skips the degree update of outmatched variables. We cannot take full
advantage of the incomplete degree update since it avoids the degree update for some
supervariables adjacent to the pivot element. With our technique (Algorithm 2), we
must scan the element lists for all supervariables i in Lp . If the degree update of one
of the supervariables is to be skipped, its element list must still be scanned so that the
external subset terms can be computed for the degree update of other supervariables
in Lp . The only advantage of multiple elimination or incomplete degree update would
be to skip the second loop that computes the upper bound degrees for outmatched
variables or supervariables for which the degree has already been computed.
If the total time in Equation (4.3) is amortized across the computation of all
supervariables i 2 Lp, then the time taken to compute di is
(jAi j + jEi j) = O(jA0j);
i
which is (jAdjG (i)j) if all variables are simple. Computing our bound takes time
proportional to the degree of the variable in the quotient graph, G . This is much faster
than the time taken to compute the exact external degree (see Equation (3.3)).
4.1. Accuracy of our approximate degrees. Gilbert, Moler, and Schreiber
24] also use approximate external degrees that they can compute in the same time
b
as our degree bound d. In our notation, their bound di is
k b
di = jAi n ij + X e2Ei jLe n ij: 10 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF Since many pivotal variables are adjacent to two or fewer elements when selected,
b
Ashcraft and Eisenstat 4] have suggested a combination of d and d, e
d= d if jEij = 2 :
b
d otherwise e
b
Computing d takes the same time as d or d, except when jEij = 2. In this case, it
e, whereas computing d or d takes (jAij) time.
b
takes O(jAij + jLej) time to compute d
In the Yale Sparse Matrix Package 17] the jLe n Lp j term for the Ei = fe; pg case
is computed by scanning Le once. It is then used to compute di for all i 2 Lp for
e
which Ei = fe; pg. This technique can also be used to compute d, and thus the time
e
to compute d is O(jAi j + jLej) and not (jAi j + jLej).
Theorem 1: Relationship between external degree and the three apeb
proximate degree bounds. The equality di = di = di = di holds when jEi j 1.
ei di holds when jEij = 2. Finally, the inequality
b
The inequality di = di = d
eb
di di di = di holds when jEij > 2.
Proof:
b
The bound di is equal to the exact degree when variable i is adjacent to at most
one element (jEij 1). The accuracy of their bound is una ected by the size of Ai ,
since entries are removed from A that fall within the pattern L of an element. Thus,
b
if there is just one element (the current element p, say), the bound di is tight. If jEij
is two (the current element, p, and a prior element e, say), we have b
di = jAi n ij + jLp n ij + jLe n ij = di + j(Le \ Lp ) n ij: b
b
The bound di counts entries in the set (Le \ Lp ) n i twice, and so di will be an
overestimate in the possible (even likely) case that a variable j 6= i exists that is
e
eb
adjacent to both e and p. Combined with the de nition of d, we have di = di = di
ei di when jEij = 2, and di di = di when jEij > 2.
b
eb
when jEi j 1, di = d
b
If jEij 1 our bound di is exact for the same reason that di is exact. If jEij is two
we have
di = jAi n ij + jLp n ij + jLe n Lpj = di :
No entry is in both Ai and any element L, since these redundant entries are removed
from Ai . Any entry in Lp does not appear in the external subset (Le n Lp ). Thus, no
b
entry is counted twice, and di = di when jEij 2. Finally, consider both di and di
when jEi j > 2. We have
di = jAi n ij + jLp n ij +
and X e2Ei nfpg b
di = jAi n ij + jLp n ij + jLe n Lp j X
e2Ei nfpg jLe n ij: Since these degree bounds are only used when computing the degree of a supervariable
b
i 2 Lp , we have i Lp. Thus, di di when jEij > 2.
2 APPROXIMATE MINIMUM DEGREE 11 eb
Combining the three inequalities in Theorem 1, the inequality di di di di
holds for all values of jEij.
Note that, if a variable i is adjacent to two elements or less then our bound is
equal to the exact external degree. This is very important, since most variables of
minimum degree are adjacent to two elements or less. Additionally, our degree bounds
take advantage of element absorption, since the bound depends on jEij after elements
are absorbed.
4.2. Degree computation example. We illustrate the computation of our approximate external degree bound in Figures 3.1 and 3.2. Variable 6 is adjacent to three
elements in G 3 and G 4. All other variables are adjacent to two or less elements. In
G 3 , the bound d6 is tight, since the two sets jL1 n L3j and jL2 n L3 j are disjoint.
In graph G 4, the current pivot element is p = 4. We compute
d6 = jAi n ij + jLp n ij + (
=
=
=
= X e2Ei nfpg jLe n Lpj) j; n f6gj + jf6; 7; 8gn f6gj + (jL2 n L4j + jL3 n L4 j)
jf7; 8gj + (jf5; 6; 9g n f6; 7; 8gj + jf5; 6; 7gn f6; 7; 8gj)
jf7; 8gj + (jf5; 9gj + jf5gj) 5: The exact external degree of variable 6 is d6 = 4, as can be seen in the elimination
graph G4 on the left of Figure 3.2(a). Our bound is one more than the exact external
degree, since the variable 5 appears in both L2 n L4 and L3 n L4, but is one less than
b
the bound di which is equal to 6 in this case. Our bound on the degree of variable
6 is again tight after the next pivot step, since elements 2 and 3 are absorbed into
element 5.
5. The approximate minimum degree algorithm. The Approximate Minimum Degree algorithm is identical to Algorithm 1, except that the external degree, di ,
is replaced with di, throughout. The bound on the external degree, di , is computed
using Algorithm 2 and Equations (4.1) and (4.2). In addition to absorbing elements
in Ep , any element with an empty external subset (jLe n Lpj = 0) is also absorbed
into element p, even if e is not adjacent to p. This \aggressive" element absorption
improves the degree bounds by reducing jEj.
As in many other minimum degree algorithms, we use linked lists to assist the
search for a variable of minimum degree. List d holds all supervariables i with degree
bound di = d. Maintaining this data structure takes time proportional to the total
number of degree computations, or O(jLj).
Computing the pattern of each pivot element, Lp, takes a total of O(jLj) time
overall, since each element is used in the computation of at most one other element,
and the total sizes of all elements constructed is O(jLj).
The AMD algorithm is based on the quotient graph data structure used in the
MA27 minimum degree algorithm 13, 14, 15]. Initially, the sets A are stored, followed
by a small amount of elbow room. When the set Lp is formed, it is placed in the elbow
room (or in place of Ap if jEp j = 0). Garbage collection occurs if the elbow room is
exhausted. During garbage collection, the space taken by Ai and Ei is reduced to
exactly jAij + jEi j for each supervariable i (which is less than or equal to jA0j) and
i
the extra space is reclaimed. The space for Ae and Ee for all elements e 2 V is fully
reclaimed, as is the space for Le of any absorbed elements e. Each garbage collection 12 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF takes time that is proportional to the size of the workspace (normally (jAj)). In
practice, elbow room of size n is su cient.
During the computation of our degree bounds, we compute the following hash
function for supervariable detection 3], 80
9
< X X1
=
Hash(i) = :@
j + eA mod (n 1); + 1;
j2A e2E
i i which increases the degree computation time by a small constant factor. We place
each supervariable i in a hash bucket according to Hash(i), taking time O(jLj) overall.
If two or more supervariables are placed in the same hash bucket, then each pair of
supervariables i and j in the hash bucket are tested for indistinguishability. If the
hash function results in no collisions then the total time taken by the comparison is
O(jAj).
Ashcraft 3] uses this hash function as a preprocessing step on the entire matrix
(without the mod(n 1) term, and with an O(jV j log jV j) sort instead of jV j hash
buckets). In contrast, we use this function during the ordering, and only hash those
variables adjacent to the current pivot element.
For example, variables 7, 8, and 9 are indistinguishable in G5 , in Figure 3.2(a).
The AMD algorithm would not consider variable 8 at step 5, since it is not adjacent
to the pivot element 5 (refer to quotient graph G 5 in Figure 3.2(b)). AMD would
not construct 7 = f7; 9g at step 5, since 7 and 9 are distinguishable in G 5. It would
construct 7 = f7; 8; 9g at step 6, however.
The total number of times the approximate degree di of variable i is computed
during elimination is no more than the number of nonzero entries in row k of L, where
variable i is the kth pivot. The time taken to compute di is O(jA0j), or equivalently
i
O(j(PAPT )k j), the number of nonzero entries in row k of the permuted matrix.
The total time taken by the entire AMD algorithm is thus bounded by the degree
computation,
(5.1) O n
X k=1 jLk j j(PAPT )k ! j: This bound assumes no (or few) supervariable hash collisions and a constant number of
garbage collections. In practice these assumptions seem to hold, but the asymptotic
time would be higher if they did not. In many problem domains, the number of
nonzeros per row of A is a constant, independent of n. For matrices in these domains,
our AMD algorithm takes time O(jLj) (with the same assumptions).
6. Performance results. In this section, we present the results of our experiments with AMD on a wide range of test matrices. We rst compare the degree
e
b
computations discussed above (t, d, d, d, and d), as well as an upper bound on the
true degree, t d + jij 1. We then compare the AMD algorithm with other established minimum degree codes (MMD and MA27).
6.1. Test Matrices. We tested all degree bounds and codes on all matrices
in the Harwell/Boeing collection of type PUA, RUA, PSA, and RSA 11, 12] (at
orion.cerfacs.fr or numerical.cc.rl.ac.uk), all nonsingular matrices in Saad's
SPARSKIT2 collection (at ftp.cs.umn.edu), all matrices in the University of Florida
collection (available from ftp.cis.ufl.edu in the directory pub/umfpack/matrices), APPROXIMATE MINIMUM DEGREE 13 Table 6.1 Selected matrices in test set Matrix
RAEFSKY3
VENKAT01
BCSSTK32
EX19
BCSSTK30
CT20STIF
NASASRB
OLAF
RAEFSKY1
CRYSTK03
RAEFSKY4
CRYSTK02
BCSSTK33
BCSSTK31
EX11
FINAN512
RIM
BBMAT
EX40
WANG4
LHR34
WANG3
LHR71
ORANI678
PSMIGR1
APPU n nz 21,200
62,424
44,609
12,005
28,924
52,329
54,870
16,146
3,242
24,696
19,779
13,965
8,738
35,588
16,614
74,752
22,560
38,744
7,740
26,068
35,152
26,064
70,304
2,529
3,140
14,000 733,784
827,684
985,046
123,937
1,007,284
1,323,067
1,311,227
499,505
145,517
863,241
654,416
477,309
291,583
572,914
540,167
261,120
862,411
1,274,141
225,136
75,564
608,830
75,552
1,199,704
85,426
410,781
1,789,392 Percentage of Description 0.00
0.71
0.20
1.57
0.66
0.77
0.06
0.41
0.00
0.00
0.00
0.00
0.00
0.60
0.04
1.32
2.34
5.81
17.45
15.32
7.69
15.29
8.47
6.68
6.65
15.64 uid/structure interaction, turbulence
unstructured 2D Euler solver
structural eng., automobile chassis
2D developing pipe ow (turbulent)
structural eng., o shore platform
structural eng., CT20 engine block
shuttle rocket booster
NASA test problem
incompressible ow, pressuredriven pipe
structural eng., crystal vibration
buckling problem for container model
structural eng., crystal vibration
structural eng., auto steering mech.
structural eng., automobile component
CFD, 3D cylinder & at plate heat exch.
economics, portfolio optimization
chemical eng., uid mechanics problem
CFD, 2D airfoil with turbulence
CFD, 3D die swell problem on square die
3D MOSFET semicond. (30x30x30 grid)
chemical eng., light hydrocarbon recovery
3D diode semiconductor (30x30x30 grid)
chemical eng., light hydrocarbon recovery
Australian economic model
US countybycounty migration
NASA test problem (random matrix) jEp j > 2 jEi j > 2
13.4
15.7
27.3
29.4
31.8
33.2
35.0
35.2
38.9
40.9
41.4
42.0
42.6
43.1
43.3
46.6
63.2
64.4
64.7
78.3
78.7
79.2
81.1
86.9
91.0
94.4 and several other matrices from NASA and Boeing. Of those 378 matrices, we present
results below on those matrices requiring 500 million or more oatingpoint operations
for the Cholesky factorization, as well as the ORANI678 matrix in the Harwell/Boeing
collection and the EX19 in Saad's collection (a total of 26 matrices). The latter two
are bestcase and worstcase examples from the set of smaller matrices.
For the unsymmetric matrices in the test set, we rst used the maximumtransversal algorithm MC21 from the Harwell Subroutine Library 9] to reorder the matrix so
that the permuted matrix has a zerofree diagonal. We then formed the symmetric
pattern of the permuted matrix plus its transpose. This is how a minimum degree
ordering algorithm is used in MUPS 1, 2]. For these matrices, Table 6.1 lists the
statistics for the symmetrized pattern.
Table 6.1 lists the matrix name, the order, the number of nonzeros in lower
triangular part, two statistics obtained with an exact minimumdegree ordering (using
d), and a description. In column 4, we report the percentage of pivots p such that
jEp j > 2. Column 4 shows that there is only a small percentage of pivots selected
using an exact minimum degree ordering that have more than two elements in their
adjacency list. Therefore, we can expect a good quality ordering with an algorithm
based on our approximate degree bound. In column 5, we indicate how often a
degree di is computed when jEij > 2 (as a percentage of the total number of degree
updates). Table 6.1 is sorted according to this degree update percentage. Column 5
thus reports the percentage of \costly" degree updates performed by a minimum
degree algorithm based on the exact degree. For matrices with relatively large values 14 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF in column 5, signi cant time reductions can be expected with an approximate degree
based algorithm.
Since any minimum degree algorithm is sensitive to tiebreaking issues, we randomly permuted all matrices and their adjacency lists 21 times (except for the random
APPU matrix, which we ran only once). All methods were given the same set of 21
randomized matrices. We also ran each method on the original matrix. On some matrices, the original matrix gives better ordering time and llin results for all methods
than the best result obtained with the randomized matrices. The overall comparisons
are not however dependent on whether original or randomized matrices are used. We
thus report only the median ordering time and llin obtained for the randomized
matrices.
The APPU matrix is a random matrix used in a NASA benchmark, and is thus
not representative of sparse matrices from real problems. We include it in our test
set as a pathological case that demonstrates how well AMD handles a very irregular
problem. Its factors are about 90% dense. It was not practical to run the APPU
matrix 21 times because the exact degree update algorithms took too much time.
6.2. Comparing the exact and approximate degrees. To make a valid comparison between degree update methods, we modi ed our code for the AMD algorithm
so that we could compute the exact external degree (d), our bound (d), Ashcraft and
e
b
Eisenstat's bound (d), Gilbert, Moler, and Schreiber's bound (d), the exact true dee
gree (t), and our upper bound on the true degree (t). The six codes based on d, d, d,
b
d, t, and t (columns 3 to 8 of Table 2) di er only in how they compute the degree.
Since aggressive absorption is more di cult when using some bounds than others,
we switched o aggressive absorption for these six codes. The actual AMD code (in
column 2 of Table 2) uses d with aggressive absorption.
Table 6.2 lists the median number of nonzeros below the diagonal in L (in thousands) for each method. Results 20% higher than the lowest median jLj in the table (or
higher) are underlined. Our upper bound on the true degree (t) and the exact true degree (t) give nearly identical results. As expected, using minimum degree algorithms
based on external degree noticeably improves the quality of the ordering (compare
eb
columns 3 and 7, or columns 4 and 8). From the inequality d d d d, we
would expect a similar ranking in the quality of ordering produced by these methods.
Table 6.2 con rms this. The bound d and the exact external degree d produce nearly
identical results. Comparing the AMD results and the d column, aggressive absorption tends to result in slightly lower llin, since it reduces jEj and thus improves the
e
accuracy of our bound. The d bound is often accurate enough to produce good results, but can fail catastrophically for matrices with a high percentage of approximate
b
pivots (see column 4 in Table 6.1). The less accurate d bound produces notably worse
results for many matrices.
Comparing all 378 matrices, the median jLj when using d is never more than 9%
higher than the median llin obtained when using the exact external degree, d (with
the exception of the FINAN512 matrix). The llin results for d and d are identical
for nearly half of the 378 matrices. The approximate degree bound d thus gives a very
reliable estimation of the degree in the context of a minimum degree algorithm.
The FINAN512 matrix is highly sensitive to tiebreaking variations. Its graph
consists of two types of nodes: \constraint" nodes and \linking" nodes 5]. The
constraint nodes form independent sparse subgraphs, connected together via a tree of
linking nodes. This matrix is a pathological worstcase matrix for any minimumdegree APPROXIMATE MINIMUM DEGREE 15 Table 6.2 Median llin results of the degree update methods Number of nonzeros below diagonal in L, in thousands
e
b
AMD
d
d
d
d
t
t
RAEFSKY3 4709 4709 4709
4709
5114 4992 4992
5798
6399 6245 6261
VENKAT01 5789 5771 5789
5080 5081 5079
5083
5721 5693 5665
BCSSTK32
EX19
319
319
319
318
366
343
343
BCSSTK30
3752 3751 3753
3759
4332 4483 4502
CT20STIF 10858 10758 10801 11057 13367 12877 12846
NASASRB 12282 12306 12284 12676 14909 14348 14227
OLAF
2860 2858 2860
2860
3271 3089 3090
RAEFSKY1 1151 1151 1151
1151
1318 1262 1262
CRYSTK03 13836 13836 13836 13836 17550 15507 15507
7685
9294 8196 8196
RAEFSKY4 7685 7685 7685
CRYSTK02
6007 6007 6007
6007
7366 6449 6449
BCSSTK33
2624 2624 2624
2640
3236 2788 2787
5115 5096 5132
5225
6194 6079 6057
BCSSTK31
6014 6016 6014
6014
7619 6673 6721
EX11
4778 4036 6042 11418 11505 8235 8486
FINAN512
RIM
3948 3898 3952
3955
4645 4268 4210
BBMAT
19673 19880 19673 21422 37820 21197 21445
1418 1386 1417
1687
1966 1526 1530
EX40
WANG4
6547 6808 6548
6566
7871 7779 7598
LHR34
3618 3743 3879 11909 27125 4383 4435
WANG3
6545 6697 6545
6497
7896 7555 7358
7933 8127 8499 28241 60175 9437 9623
LHR71
ORANI678
147
147
146
150
150
147
146
3020 3025 3011
3031
3176 2966 2975
PSMIGR1
APPU
87648 87613 87648 87566 87562 87605 87631
Matrix method. All constraint nodes should be ordered rst, but linking nodes have low
degree and tend to be selected rst, which causes high llin. Using a tree dissection
algorithm, Berger, Mulvey, Rothberg, and Vanderbei 5] obtain an ordering with only
1.83 million nonzeros in L.
Table 6.3 lists the median ordering time (in seconds on a SUN SPARCstation
10) for each method. Ordering time twice that of the minimum median ordering
b
time listed in the table (or higher) is underlined. Computing the d bound is often
the fastest, since it requires a single pass over the element lists instead of the two
passes required for the d bound. It is, however, sometimes slower than d because it
can generate more llin, which increases the ordering time (see Equation 5.1). The
ordering time of the two exact degree updates (d and t) increases dramatically as the
percentage of \costly" degree updates increases (those for which jEij > 2).
Garbage collection has little e ect on the ordering time obtained. In the above
runs, we gave each method elbow room of size n. Usually a single garbage collection
occurred. At most two garbage collections occurred for AMD, and at most three for
the other methods (aggressive absorption reduces the memory requirements).
6.3. Comparing algorithms. In this section, we compare AMD with two other
established minimum degree codes: Liu's Multiple Minimum Degree (MMD) code
25] and Du and Reid's MA27 code 15]. MMD stores the element patterns L in a
fragmented manner and requires no elbow room 20, 21]. It uses the exact external
degree, d. MMD creates supervariables only when two variables i and j have no 16 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF
Table 6.3 Median ordering time of the degree update methods Matrix AMD
RAEFSKY3 1.05
4.07
VENKAT01
4.67
BCSSTK32
EX19
0.87
BCSSTK30
3.51
CT20STIF
6.62
7.69
NASASRB
OLAF
1.83
RAEFSKY1 0.27
CRYSTK03
3.30
RAEFSKY4 2.32
CRYSTK02
1.49
BCSSTK33
0.91
4.55
BCSSTK31
2.70
EX11
15.03
FINAN512
RIM
5.74
BBMAT
27.80
1.04
EX40
WANG4
5.45
LHR34
19.56
WANG3
5.02
46.03
LHR71
ORANI678
5.49
10.61
PSMIGR1
APPU
41.75 d 1.10
4.95
5.64
1.12
5.30
8.66
11.03
2.56
0.34
4.84
2.90
2.34
1.36
7.53
4.06
34.11
10.38
115.75
1.56
11.45
109.10
10.45
349.58
196.01
334.27
2970.54 Ordering time, in seconds d 1.09
4.11
4.54
0.89
3.55
6.54
7.73
1.90
0.28
3.08
2.18
1.55
1.05
4.92
2.77
14.45
5.69
27.44
1.10
5.56
25.62
5.49
58.25
8.13
10.07
39.83 e
d 1.05
4.47
4.91
1.01
3.65
7.07
9.23
2.16
0.32
3.68
2.45
1.64
0.99
5.68
3.00
17.79
6.12
42.17
1.09
6.98
45.36
6.52
129.85
6.97
14.20
43.20 b
d 1.02
3.88
4.35
0.86
3.51
6.31
7.78
1.83
0.25
3.14
2.08
1.45
0.85
4.56
2.60
15.84
5.72
23.02
0.95
5.21
43.70
4.81
121.96
7.23
8.16
40.64 t 1.15
4.32
5.55
1.09
4.38
8.63
11.78
2.33
0.35
5.23
3.12
2.04
1.62
7.41
4.23
46.49
10.01
129.32
1.46
11.59
125.41
11.02
389.70
199.01
339.28
3074.44 t 1.09
3.85
4.48
0.87
3.38
6.45
7.99
1.78
0.28
3.30
2.07
1.52
0.91
4.92
2.89
18.58
5.58
28.33
1.12
5.88
24.73
5.02
60.40
8.45
9.94
38.93 adjacent variables and exactly two adjacent elements (Ei = Ej = fe; pg, and Ai =
Aj = ;, where p is the current pivot element). A hash function is not required. MMD
takes advantage of multiple elimination and incomplete update.
MA27 uses the true degree, t, and the same data structures as AMD. It detects
supervariables whenever two variables are adjacent to the current pivot element and
have the same structure in the quotient graph (as does AMD). MA27 uses the true
degree as the hash function for supervariable detection, and does aggressive absorption. Neither AMD nor MA27 take advantage of multiple elimination or incomplete
update.
Structural engineering matrices tend to have many rows of identical nonzero pattern. Ashcraft has found that the total ordering time of MMD can be signi cantly
improved by detecting these initial supervariables before starting the elimination 3].
We implemented Ashcraft's precompression algorithm, and modi ed MMD to allow
for initial supervariables. We call the resulting code CMMD (\compressed" MMD).
Precompression has little e ect on AMD, since it nds these supervariables when their
degrees are rst updated. The AMD algorithm on compressed matrices together with
the cost of precompression was never faster than AMD.
Table 6.4 lists the median number of nonzeros below the diagonal in L (in thousands) for each code. Results 20% higher than the lowest median jLj in the table
(or higher) are underlined. AMD, MMD, and CMMD nd orderings of about the
same quality. MA27 is slightly worse because it uses the true degree (t) instead of the
external degree (d). APPROXIMATE MINIMUM DEGREE 17 Table 6.4 Median llin results of the four codes Matrix Number of nonzeros below diagonal
in L, in thousands
AMD MMD CMMD MA27
RAEFSKY3 4709 4779
4724
5041
VENKAT01 5789 5768
5811
6303
BCSSTK32
5080 5157
5154
5710
EX19
319
322
324
345
3752 3788
3712
4529
BCSSTK30
CT20STIF 10858 11212
10833 12760
NASASRB 12282 12490
12483 14068
2860 2876
2872
3063
OLAF
1177
1255
RAEFSKY1 1151 1165
14066 15496
CRYSTK03 13836 13812
RAEFSKY4 7685 7539
7582
8245
CRYSTK02
6007 5980
6155
6507
2624 2599
2604
2766
BCSSTK33
5115 5231
5216
6056
BCSSTK31
EX11
6014 5947
6022
6619
FINAN512
4778 8180
8180
8159
RIM
3948 3947
3914
4283
19673 19876
19876 21139
BBMAT
1418 1408
1401
1521
EX40
WANG4
6547 6619
6619
7598
LHR34
3618 4162
4162
4384
WANG3
6545 6657
6657
7707
7933 9190
9190
9438
LHR71
147
147
147
147
ORANI678
PSMIGR1
3020 2974
2974
2966
APPU
87648 87647
87647 87605 Considering the entire set of 378 matrices, AMD produces a better median llin
than MMD, CMMD, and MA27 for 62%, 61%, and 81% of the matrices, respectively.
AMD never generates more than 7%, 7%, and 4% more nonzeros in L than MMD,
CMMD, and MA27, respectively. We have shown empirically that AMD produces an
ordering at least as good as these other three methods for this large test set.
If the apparent slight di erence in ordering quality between AMD and MMD is
statistically signi cant, we conjecture that it has more to do with earlier supervariable
detection (which a ects the external degree) than with the di erences between the
external degree and our upper bound.
Table 6.5 lists the median ordering time (in seconds on a SUN SPARCstation 10)
for each method. The ordering time for CMMD includes the time taken by the precompression algorithm. Ordering time twice that of the minimum median ordering
time listed in the table (or higher) is underlined. On certain classes of matrices,
typically those from structural analysis applications, CMMD is signi cantly faster
than MMD. AMD is the fastest method for all but the EX19 matrix. For the other
352 matrices in our full test set, the di erences in ordering time between these various
methods is typically less. If we compare the ordering time of AMD with the other
methods on all matrices in our test set requiring at least a tenth of a second of
ordering time, then AMD is slower than MMD, CMMD, and MA27 only for 6, 15,
and 8 matrices respectively. For the full set of matrices, AMD is never more than 30%
slower than these other methods. The best and worst cases for the relative runtime 18 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF
Table 6.5 Median ordering time of the four codes Matrix AMD
RAEFSKY3 1.05
4.07
VENKAT01
BCSSTK32
4.67
EX19
0.87
BCSSTK30
3.51
6.62
CT20STIF
7.69
NASASRB
OLAF
1.83
RAEFSKY1 0.27
3.30
CRYSTK03
RAEFSKY4 2.32
CRYSTK02
1.49
BCSSTK33
0.91
4.55
BCSSTK31
2.70
EX11
FINAN512
15.03
RIM
5.74
27.80
BBMAT
EX40
1.04
5.45
WANG4
LHR34
19.56
WANG3
5.02
LHR71
46.03
ORANI678
5.49
PSMIGR1
10.61
APPU
41.75 Ordering time, in seconds
MMD CMMD
MA27
2.79
1.18
1.23
9.01
4.50
5.08
12.47
5.51
6.21
0.69
0.83
1.03
7.78
3.71
4.40
26.00
9.59
9.81
22.47
11.28
12.75
5.67
4.41
2.64
0.82
0.28
0.40
10.63
3.86
5.27
5.24
2.36
2.91
3.89
1.53
2.37
2.24
1.32
1.31
11.60
7.76
7.92
7.45
5.05
3.90
895.23
897.15
40.31
9.09
8.11
10.13
200.86
201.03
134.58
2.13
2.04
1.30
10.79
11.60
9.86
139.49
141.16
77.83
10.37
10.62
8.23
396.03
400.40
215.01
124.99
127.10
124.66
186.07
185.74
229.51
5423.23 5339.24 2683.27 of AMD for the smaller matrices are included in Table 6.5 (the EX19 and ORANI678
matrices).
7. Summary. We have described a new upper bound for the degree of nodes
in the elimination graph that can be easily computed in the context of a minimum
degree algorithm. We have demonstrated that this upperbound for the degree is more
accurate than all previously used degree approximations. We have experimentally
shown that we can replace an exact degree update by our approximate degree update
and obtain almost identical llin.
An Approximate Minimum Degree (AMD) based on external degree approximation has been described. We have shown that the AMD algorithm is highly competitive with other ordering algorithms. It is typically faster than other minimum
degree algorithms, and produces comparable results to MMD (which is also based on
external degree) in terms of llin. AMD typically produces better results, in terms
of llin and computing time, than the MA27 minimum degree algorithm (based on
true degrees).
8. Acknowledgments. We would like to thank John Gilbert for outlining the
b
di di portion of the proof to Theorem 1, Joseph Liu for providing a copy of the
MMD algorithm, and Cleve Ashcraft and Stan Eisenstat for their comments on a
draft of this paper.
REFERENCES APPROXIMATE MINIMUM DEGREE 19 1] P. R. Amestoy, Factorization of large sparse matrices based on a multifrontal approach
in a multiprocessor environment, INPT PhD Thesis TH/PA/91/2, CERFACS, Toulouse,
France, 1991.
2] P. R. Amestoy, M. Dayde, and I. S. Duff, Use of level 3 BLAS in the solution of full and
sparse linear equations, in High Performance Computing: Proceedings of the International
Symposium on High Performance Computing, Montpellier, France, 22{24 March, 1989,
J.L. Delhaye and E. Gelenbe, eds., Amsterdam, 1989, North Holland, pp. 19{31.
3] C. Ashcraft, Compressed graphs and the minimum degree algorithm, SIAM Journal on Scienti c Computing, (1995, to appear).
4] C. Ashcraft and S. C. Eisenstat. personal communication.
5] A. Berger, J. Mulvey, E. Rothberg, and R. Vanderbei, Solving multistage stochachastic programs using tree dissection, Tech. Report SOR9707, Program in Statistics and
Operations Research, Princeton University, Princeton, New Jersey, 1995.
6] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, MIT Press,
Cambridge, Massachusets, and McGrawHill, New York, 1990.
7] T. A. Davis and I. S. Duff, Unsymmetricpattern multifrontal methods for parallel sparse
LU factorization, Tech. Report TR91023, CIS Dept., Univ. of Florida, Gainesville, FL,
1991.
8]
, An unsymmetricpattern multifrontal method for sparse LU factorization, Tech. Report
TR94038, CIS Dept., Univ. of Florida, Gainesville, FL, 1994. (submitted to the SIAM
Journal on Matrix Analysis and Applications in March 1993, revised).
9] I. S. Duff, On algorithms for obtaining a maximum transversal, ACM Transactions on Mathematical Software, 7 (1981), pp. 315{330.
10] I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices, London:
Oxford Univ. Press, 1986.
11] I. S. Duff, R. G. Grimes, and J. G. Lewis, Sparse matrix test problems, ACM Trans. Math.
Softw., 15 (1989), pp. 1{14.
, Users' guide for the HarwellBoeing sparse matrix collection (Release 1), Tech. Report
12]
RAL92086, Rutherford Appleton Laboratory, Didcot, Oxon, England, Dec. 1992.
13] I. S. Duff and J. K. Reid, A comparison of sparsity orderings for obtaining a pivotal sequence
in Gaussian elimination, Journal of the Institute of Mathematics and its Applications, 14
(1974), pp. 281{291.
, MA27 { A set of Fortran subroutines for solving sparse symmetric sets of linear equa14]
tions, Tech. Report AERE R10533, HMSO, London, 1982.
, The multifrontal solution of inde nite sparse symmetric linear equations, ACM Trans.
15]
Math. Softw., 9 (1983), pp. 302{325.
16]
, The multifrontal solution of unsymmetric sets of linear equations, SIAM J. Sci. Statist.
Comput., 5 (1984), pp. 633{641.
17] S. C. Eisenstat, M. C. Gursky, M. H. Schultz, and A. H. Sherman, Yale sparse matrix package, I: The symmetric codes, International Journal for Numerical Methods in
Engineering, 18 (1982), pp. 1145{1151.
18] S. C. Eisenstat, M. H. Schultz, and A. H. Sherman, Algorithms and data structures
for sparse symmetric Gaussian elimination, SIAM Journal on Scienti c and Statistical
Computing, 2 (1981), pp. 225{237.
19] A. George and J. W. H. Liu, A fast implementation of the minimum degree algorithm using
quotient graphs, ACM Transactions on Mathematical Software, 6 (1980), pp. 337{358.
, A minimal storage implementation of the minimum degree algorithm, SIAM J. Numer.
20]
Anal., 17 (1980), pp. 282{299.
21]
, Computer Solution of Large Sparse Positive De nite Systems, Englewood Cli s, New
Jersey: PrenticeHall, 1981.
, The evolution of the minimum degree ordering algorithm, SIAM Review, 31 (1989),
22]
pp. 1{19.
23] A. George and D. R. McIntyre, On the application of the minimum degree algorithm to
nite element systems, SIAM J. Numer. Anal., 15 (1978), pp. 90{111.
24] J. R. Gilbert, C. Moler, and R. Schreiber, Sparse matrices in MATLAB: design and
implementation, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 333{356.
25] J. W. H. Liu, Modi cation of the minimumdegree algorithm by multiple elimination, ACM
Trans. Math. Softw., 11 (1985), pp. 141{153.
26] H. M. Markowitz, The elimination form of the inverse and its application to linear programming, Management Science, 3 (1957), pp. 255{269.
27] D. J. Rose, Symmetric Elimination on Sparse Positive De nite Systems and the Potential
Flow Network Problem, PhD thesis, Applied Math., Harvard Univ., 1970. 20 P. AMESTOY, T. A. DAVIS, , AND I. S. DUFF 28] , A graphtheoretic study of the numerical solution of sparse positive de nite systems of
linear equations, in Graph Theory and Computing, R. C. Read, ed., New York: Academic
Press, 1973, pp. 183{217.
29] B. Speelpenning, The generalized element method, Tech. Report Technical Report UIUCDCSR78946, Dept. of Computer Science, Univ. of Illinois, Urbana, IL, 1978.
30] W. F. Tinney and J. W. Walker, Direct solutions of sparse network equations by optimally
ordered triangular factorization, Proc. of the IEEE, 55 (1967), pp. 1801{1809.
31] M. Yannakakis, Computing the minimum llin is NPcomplete, SIAM J. Algebraic and
Discrete Methods, 2 (1981), pp. 77{79. Note: all University of Florida technical reports in this list of references are
available in postscript form via anonymous ftp to ftp.cis.ufl.edu in the directory
cis/techreports. ...
View Full
Document
 Spring '08
 Staff

Click to edit the document details