IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-24, NO. 6, DECEMBER 1979

A Hessenberg–Schur Method for the Problem AX + XB = C

G. H. GOLUB, S. NASH, AND C. VAN LOAN

Abstract—One of the most effective methods for solving the matrix equation AX + XB = C is the Bartels–Stewart algorithm. Key to this technique is the orthogonal reduction of A and B to triangular form using the QR algorithm for eigenvalues. A new method is proposed which differs from the Bartels–Stewart algorithm in that A is only reduced to Hessenberg form. The resulting algorithm is between 30 and 70 percent faster depending upon the dimensions of the matrices A and B. The stability of the new method is demonstrated through a roundoff error analysis and supported by numerical tests. Finally, it is shown how the techniques described can be applied and generalized to other matrix equation problems.

I. INTRODUCTION

Let A ∈ R^{m×m} and B ∈ R^{n×n} be given matrices and define the linear
transformation φ: R^{m×n} → R^{m×n} by

φ(X) = AX + XB.   (1.1)

This linear transformation is nonsingular if and only if A and −B have no eigenvalues in common, which we shall hereafter assume. Linear equations of the form

φ(X) = AX + XB = C   (1.2)

arise in many problem areas and numerous algorithms have been proposed [4], [10]. Among them, the Bartels–Stewart algorithm [1] has enjoyed considerable success [2]. In this paper we discuss a modification of their technique which is just as accurate and considerably faster. This new method is called the "Hessenberg–Schur algorithm" and, like the Bartels–Stewart algorithm, is an example of a "transformation method." Such methods are based upon the equivalence of the problems

AX + XB = C

and

(U^{-1}AU)(U^{-1}XV) + (U^{-1}XV)(V^{-1}BV) = U^{-1}CV

and generally involve the following four steps.

Step 1: Transform A and B into "simple" form via the similarity transformations A1 = U^{-1}AU and B1 = V^{-1}BV.
Step 2: Solve UF = CV for F.
Step 3: Solve the transformed system A1 Y + Y B1 = F for Y.
Step 4: Solve XV = UY for X.

A brief look at the effect of roundoff error in Steps 2 and 4 serves as a
nice introduction to both the Bartels–Stewart and Hessenberg–Schur algorithms.

In these steps linear systems are solved which involve the transformation matrices U and V. Suppose Gaussian elimination with partial pivoting is used for this purpose and that the computations are done on a computer whose floating-point numbers have t base-β digits in the mantissa. Using the standard Wilkinson inverse error analysis [12], [13], it follows that relative errors of order u[κ2(U) + κ2(V)] can be expected to contaminate the computed solution X̂, where u = β^{1−t} is the relative machine precision and κ2(·) is defined by

κ2(W)² = [ max_{x≠0} (Wx)ᵀ(Wx)/xᵀx ] · [ max_{y≠0} yᵀy/(Wy)ᵀ(Wy) ].

(Manuscript received December 18, 1978; revised July 26, 1979. Paper recommended by A. J. Laub, Chairman of the Computational Methods and Discrete Systems Committee. The work of G. H. Golub was supported in part by DOE Contract EY-76-S-03-0326 PA #30. G. H. Golub and S. Nash are with the Department of Computer Science, Stanford University, Stanford, CA 94305. C. Van Loan is with the Department of Computer Science, Cornell University, Ithaca, NY 14853.)

When κ2(W) is large with respect to u, then we say that W is "ill-conditioned." Unfortunately, several of the possible reductions in Step 1 can lead to
ill-conditioned U and V matrices. For example, if A and B are diagonalizable, then there exist U and V so that

U^{-1}AU = diag(α1, α2, ⋯, αm) = A1
V^{-1}BV = diag(β1, β2, ⋯, βn) = B1.

The matrix Y = (y_ij) in Step 3 is then prescribed by the simple formulas y_ij = f_ij/(α_i + β_j). If we apply this algorithm to the problem

A = [ 1.234567891   3.515935621 ]    B = [ 0.3458963425   0            ]
    [ 0             1.234031826 ]        [ 0.6521859685   0.3450509462 ]

C = [ 5.348636323   1.095604458 ]
    [ 2.232161079   1.579129214 ]

and use HP-67 arithmetic (u = 10^{-10}), we find

X̂ = [ 1.003948200   0.999995000 ]
    [ 0.999992700   1.000000000 ].

Now in this example u[κ2(U) + κ2(V)] ≈ 10^{-3}, and so we should not be surprised to learn that to full working precision

X = [ 1.000000000   1.000000000 ]
    [ 1.000000000   1.000000000 ].

Conclusion: We should avoid ill-conditioned transformation matrices.
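A minimal sketch of this diagonalization method (not the algorithm the paper advocates, and the routine name is ours):

```python
import numpy as np

def solve_sylvester_diag(A, B, C):
    """Solve AX + XB = C by the diagonalization method sketched above.
    Accuracy degrades when the eigenvector matrices U, V are
    ill-conditioned, which is the failure mode the 2 x 2 example shows."""
    alpha, U = np.linalg.eig(A)                # Step 1: A = U diag(alpha) U^{-1}
    beta, V = np.linalg.eig(B)                 #         B = V diag(beta) V^{-1}
    F = np.linalg.solve(U, C @ V)              # Step 2: F = U^{-1} C V
    Y = F / (alpha[:, None] + beta[None, :])   # Step 3: y_ij = f_ij/(alpha_i + beta_j)
    X = U @ Y @ np.linalg.inv(V)               # Step 4: X = U Y V^{-1}
    return X.real                              # real data: imaginary part is roundoff
```

For matrices with well-conditioned eigenvector bases this recovers the solution to near full precision; for nearly defective A or B it can lose many digits, as in the example.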
Methods which involve the computation of Jordan or companion forms in Step 1 do not do this (cf. [6], [9]).

This leads us to consider transformation methods which rely on orthogonal U and V. (Recall that UᵀU = I implies κ2(U) = 1.) In the next two sections we describe two such techniques: one old and one new. The first of these is the Bartels–Stewart algorithm. This method involves the orthogonal reduction of A and B to triangular form using the QR algorithm. The main point of this paper is to show how this algorithm can be streamlined by only reducing A to Hessenberg form. The resulting algorithm is described in Section III and its roundoff properties are shown to be very desirable in Sections IV and V. The superior efficiency of the new method for a large class of problems is substantiated in Section VI, where we report on several numerical tests. Finally, we conclude by showing how the techniques developed can be extended to
other matrix problems.

II. THE BARTELS–STEWART ALGORITHM

The crux of the Bartels–Stewart algorithm [1] is to use the QR algorithm to compute the real Schur decompositions

UᵀAU = R,   VᵀBᵀV = S   (2.1)

where R and S are upper quasi-triangular and U and V are orthogonal. (A quasi-triangular matrix is triangular with possible nonzero 2×2 blocks along the diagonal.) From our remarks in Section I, the reductions (2.1) lead to the transformed system

RY + YSᵀ = F   (F = UᵀCV, Y = UᵀXV).   (2.2)

Assuming s_{k,k-1} is zero, it follows that

(R + s_{kk} I) y_k = f_k − Σ_{j=k+1}^{n} s_{kj} y_j   (2.3)

where Y = [y_1 | y_2 | ⋯ | y_n] and F = [f_1 | f_2 | ⋯ | f_n]. Thus y_k can be found from y_{k+1}, ⋯, y_n by solving an upper quasi-triangular system, a very easy computation requiring m²/2 operations once the right-hand side is known. If s_{k,k-1} is nonzero, then y_k and y_{k-1} are "simultaneously" found in an analogous fashion.

If we assume that the Schur decompositions in (2.1) require 10m³ and 10n³ operations, respectively, then the overall work count for the Bartels–Stewart algorithm is given by

w_BS(m, n) = 10m³ + 10n³ + 2.5(m²n + mn²).
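For reference, SciPy's `solve_sylvester` implements the Bartels–Stewart algorithm via real Schur decompositions; a minimal usage sketch with arbitrary illustrative data:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)  # shifts keep spectra of A and -B apart
B = rng.standard_normal((3, 3)) + 5 * np.eye(3)
C = rng.standard_normal((5, 3))

X = solve_sylvester(A, B, C)                     # solves AX + XB = C
residual = np.linalg.norm(A @ X + X @ B - C, 'fro')
```

The residual is of roundoff order regardless of conditioning, in line with the residual analysis of Section V.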
The technique requires 2m² + 2n² + mn storage, assuming the data are overwritten.

III. THE HESSENBERG–SCHUR ALGORITHM

In this section we describe a new algorithm, called the Hessenberg–Schur algorithm, which differs from the Bartels–Stewart method in that the decompositions (2.1) are replaced by

H = UᵀAU    (H upper Hessenberg, U orthogonal)
S = VᵀBᵀV   (S quasi-upper triangular, V orthogonal).   (3.1)

A matrix H = (h_ij) is upper Hessenberg if h_ij = 0 for all i > j + 1. The orthogonal reduction of A to upper Hessenberg form can be accomplished with Householder matrices in (5/3)m³ operations. See [12, p. 347] for a description of this algorithm. The reductions (3.1) lead to a system of the form

HY + YSᵀ = F   (3.2)

which may be solved in a manner similar to what is done in the Bartels–Stewart algorithm. In particular, assume that y_{k+1}, ⋯, y_n have been computed. If s_{k,k-1} = 0, then y_k can be determined by solving the m × m Hessenberg system

(H + s_{kk} I) y_k = f_k − Σ_{j=k+1}^{n} s_{kj} y_j.   (3.3)

When Gaussian elimination with partial pivoting is used for this purpose, m² operations are needed once the right-hand side is known.

If s_{k,k-1} is nonzero, then by equating columns k − 1 and k in (3.2) we find

[ H + s_{k-1,k-1} I    s_{k-1,k} I  ] [ y_{k-1} ]   [ f_{k-1} ]     n     [ s_{k-1,j} y_j ]   [ g ]
[ s_{k,k-1} I          H + s_{kk} I ] [ y_k     ] = [ f_k     ] −   Σ     [ s_{kj} y_j    ] ≡ [ w ].   (3.4)
                                                                  j=k+1

This is a 2m-by-2m linear system for y_k and y_{k-1}. By suitably ordering the variables, the matrix of coefficients is upper triangular with two nonzero subdiagonals. Using Gaussian elimination with partial pivoting, this system can be solved in 6m² operations once the right-hand side is formed. Unfortunately, a 2m² workspace is required to carry out the calculations.

Part of this increase in storage is compensated for by the fact that the orthogonal matrix U can be stored in factored form below the subdiagonal of H [12, p. 350]. This implies that we do not need an m × m array for U as in the Bartels–Stewart algorithm. Summarizing the Hessenberg–Schur algorithm and the associated work counts, we have the following.

1) Reduce A to upper Hessenberg form and Bᵀ to quasi-upper triangular form:
     H = UᵀAU (store U in factored form)   (5/3)m³
     S = VᵀBᵀV (save V)                    10n³
2) Update the right-hand side: F = UᵀCV    m²n + mn²
3) Back substitute for Y: HY + YSᵀ = F     3m²n + (1/2)mn²
4) Obtain the solution: X = UYVᵀ           m²n + mn²

w_HS(m, n) = (5/3)m³ + 10n³ + 5m²n + (5/2)mn².
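A condensed sketch of steps 1)–4) in Python, using SciPy for the two reductions. For simplicity it assumes S has no 2 × 2 bumps along its diagonal (guaranteed in the test below by taking B symmetric), so the paper's system (3.4) never arises; a production code would also exploit the Hessenberg structure when solving (3.3) rather than calling a general solver.

```python
import numpy as np
from scipy.linalg import hessenberg, schur, solve

def hessenberg_schur(A, B, C):
    """Sketch of the Hessenberg-Schur algorithm for AX + XB = C,
    restricted to the case where the real Schur form S of B^T is
    genuinely triangular (no 2 x 2 bumps)."""
    H, U = hessenberg(A, calc_q=True)   # H = U^T A U, upper Hessenberg
    S, V = schur(B.T)                   # S = V^T B^T V, quasi-triangular
    F = U.T @ C @ V                     # step 2: transformed right-hand side
    m, n = C.shape
    Y = np.zeros((m, n))
    for k in range(n - 1, -1, -1):      # step 3: back substitution, eq. (3.3)
        rhs = F[:, k] - Y[:, k + 1:] @ S[k, k + 1:]
        Y[:, k] = solve(H + S[k, k] * np.eye(m), rhs)
    return U @ Y @ V.T                  # step 4: X = U Y V^T
```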
To obtain the operation count associated with the determination of Y, we assumed that S has n/2 2×2 bumps along its diagonal. (This is the "worst" case.)

Unlike the work count for the Bartels–Stewart algorithm, w_HS(m, n) is not symmetric in m and n. Indeed, scrutiny of w_HS(m, n) reveals that it clearly pays to have m ≥ n. This can always be assured, for if m < n, we merely apply the Hessenberg–Schur algorithm to the transposed problem BᵀXᵀ + XᵀAᵀ = Cᵀ.
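The transposition trick amounts to a two-line wrapper; the sketch below uses SciPy's Sylvester solver as a stand-in for a solver that prefers m ≥ n (the wrapper name is ours):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def solve_axxbc(A, B, C):
    """Solve AX + XB = C, switching to the transposed problem
    B^T X^T + X^T A^T = C^T whenever m < n, so that the larger of the
    two matrices is the one reduced only to Hessenberg form."""
    m, n = C.shape
    if m >= n:
        return solve_sylvester(A, B, C)
    return solve_sylvester(B.T, A.T, C.T).T
```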
Comparing w_BS(m, n) and w_HS(m, n) we find

w_HS(m, n)/w_BS(m, n) = [1 + 3(n/m) + (3/2)(n/m)² + 6(n/m)³] / [6 + (3/2)(n/m) + (3/2)(n/m)² + 6(n/m)³]   (3.5)

which indicates that substantial savings accrue when the Hessenberg–Schur method is favored. For example, if m = 4n, then w_HS(m, n) ≈ 0.30 w_BS(m, n).

The storage requirements of the new method are a little greater than those for the Bartels–Stewart algorithm:

A (m × m)   for the original A and subsequently H and U
B (n × n)   for the original B and subsequently S
V (n × n)   for the orthogonal matrix V
C (m × n)   for the original C and subsequently Y and X
W (2m²)     for handling the possible system (3.4).

IV. A PERTURBATION ANALYSIS

In the next section we shall assess the effect of rounding errors on the
Hessenberg–Schur algorithm. The assessment will largely be in the form of a bound on the relative error in the computed solution X̂. To ensure a correct interpretation of our results, it is first necessary to investigate the amount of error which we can expect any algorithm to generate given finite precision arithmetic.

To do this we need to make some observations about the sensitivity of the underlying problem φ(X) = C. This system of equations can be written in the form

Px = c   (4.1)

where

P = (I_n ⊗ A) + (Bᵀ ⊗ I_m)   (4.2)

and

x = vec(X) = (x_11, x_21, ⋯, x_m1, x_12, x_22, ⋯, x_m2, ⋯, x_1n, ⋯, x_mn)ᵀ
c = vec(C) = (c_11, c_21, ⋯, c_m1, c_12, c_22, ⋯, c_m2, ⋯, c_1n, ⋯, c_mn)ᵀ.

Here, the Kronecker product W ⊗ Z of two matrices W and Z is the block matrix whose (i, j) block is w_ij Z.

Based on our knowledge of linear system sensitivity, we know that if P
is ill-conditioned, then small changes in A, B, and/or C can induce relatively large changes in the solution. To relate this to the transformation φ, we need to define a norm on the space of linear transformations from R^{m×n} to R^{m×n}:

||f|| = max_{X ≠ 0} ||f(X)||_F / ||X||_F,   f: R^{m×n} → R^{m×n}.

Here, the Frobenius norm || · ||_F is defined by ||W||_F² = Σ_{ij} w_ij². Notice that for the linear transformation φ defined by (1.1) we have

||φ|| = ||P||_2 ≤ ||A||_2 + ||B||_2

where P is defined by (4.2). If φ is nonsingular, then

||φ^{-1}|| = ||P^{-1}||_2 = max_{X ≠ 0} ||X||_F / ||φ(X)||_F.

Now consider solving AX + XB = C on a computer having machine precision u. In general, rounding errors of order u||A||_F, u||B||_F, and u||C||_F will normally be present in A, B, and C, respectively, before any algorithm even begins execution. Hence, the "best" thing we can say about a computed solution X̂ is that it satisfies

(A + E)X̂ + X̂(B + F) = C + G   (4.3)
where

||E||_F ≤ u||A||_F   (4.4)
||F||_F ≤ u||B||_F   (4.5)
||G||_F ≤ u||C||_F.  (4.6)

How accurate would such an X̂ be? By applying standard linear system perturbation analysis [3], it is possible to establish the following result.

Theorem: Assume that AX + XB = C, (A + E)X̂ + X̂(B + F) = C + G, and (4.4), (4.5), and (4.6) hold. If φ(Z) = AZ + ZB is nonsingular, C is nonzero, and

u[||A||_F + ||B||_F] ||φ^{-1}|| < 1/2,   (4.7)

then

||X − X̂||_F / ||X||_F ≤ 4u[||A||_F + ||B||_F] ||φ^{-1}||.   (4.8)

For the 2 × 2 example given in Section I, the upper bound in (4.8) has
a value of about 10^{-6}. This indicates that an AX + XB = C problem can be very well-conditioned even if the eigenvector matrices for A and B are poorly conditioned.

We conclude this section with the remark that the bound in (4.8) can roughly be attained. Setting

A = [ 1  2 ]    B = [ δ−1  0 ]    C = [ 3+δ  6 ]
    [ 0  1 ]        [ 1    3 ]        [ 1+δ  4 ]

it is easy to verify that ||φ^{-1}|| = O(1/δ²) and

X = [ 1  1 ]
    [ 1  1 ].
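The example can be checked numerically via the Kronecker formulation (4.1)–(4.2); the sketch below estimates ||φ^{-1}|| as the reciprocal of the smallest singular value of P and reproduces the u/δ response to a perturbation of c_11 (the particular values of δ and u are ours, chosen only for illustration):

```python
import numpy as np

delta, u = 1e-3, 1e-8                     # illustrative values, not from the paper
A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[delta - 1.0, 0.0], [1.0, 3.0]])
C = np.array([[3.0 + delta, 6.0], [1.0 + delta, 4.0]])

# P = (I_n (x) A) + (B^T (x) I_m) from (4.2), acting on column-stacked vec(X)
P = np.kron(np.eye(2), A) + np.kron(B.T, np.eye(2))
sigma_min = np.linalg.svd(P, compute_uv=False)[-1]
phi_inv_norm = 1.0 / sigma_min            # blows up as delta -> 0

X = np.linalg.solve(P, C.ravel(order='F')).reshape((2, 2), order='F')  # ones matrix

G = np.array([[u, 0.0], [0.0, 0.0]])      # perturb c_11 by u
Xp = np.linalg.solve(P, (C + G).ravel(order='F')).reshape((2, 2), order='F')
shift = Xp[0, 0] - X[0, 0]                # equals u/delta up to roundoff
```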
(Think of δ as a small positive constant.) Now if

AX̂ + X̂B = C + [ u  0 ]
               [ 0  0 ]

it is easy to show

X̂ = X + [ u/δ  0 ]
        [ 0    0 ].

Thus, if ||φ^{-1}|| is large, then small changes in A, B, or C can induce relatively large changes in X.

In general, given the random nature of roundoff error, we conclude from the analysis in this section that errors of order u||φ^{-1}|| can be expected to contaminate the computed solution no matter what algorithm we employ to solve AX + XB = C. An estimate of ||φ^{-1}|| is therefore something to be valued in practice. An expensive though reliable method for doing this would be to compute the singular value decomposition of P = (I_n ⊗ A) + (Bᵀ ⊗ I_m) using EISPACK [11]. A far cheaper alternative, and one which we are currently investigating, involves the condition estimator developed in [14].

V. ROUNDOFF ERROR ANALYSIS OF THE HESSENBERG–SCHUR ALGORITHM

We now take a detailed look at the roundoff errors associated with the
Hessenberg–Schur algorithm. This amounts to applying some standard results from Wilkinson [12]. His inverse error analysis pertaining to orthogonal matrices can be applied to the computations H = UᵀAU, S = VᵀBᵀV, F = UᵀCV, and X = UYVᵀ, while his well-known analysis of Gaussian elimination and back substitution can be used in connection with the determination of Y. (Y is essentially obtained by using Gaussian elimination and back substitution to solve the system [(I_n ⊗ H) + (S ⊗ I_m)] vec(Y) = vec(F).) Denoting computed quantities with the "hat" notation, we can account for the rounding errors in the Hessenberg–Schur algorithm in the following way:

Û = U1 + E_U,   U1ᵀU1 = I,   ||E_U||_2 ≤ ε   (5.1)
V̂ = V1 + E_V,   V1ᵀV1 = I,   ||E_V||_2 ≤ ε   (5.2)
Ĥ = U1ᵀ(A + E1)U1,    ||E1||_F ≤ ε||A||_F    (5.3)
Ŝ = V1ᵀ(B + E2)ᵀV1,   ||E2||_F ≤ ε||B||_F    (5.4)
F̂ = U1ᵀ(C + E3)V1,    ||E3||_F ≤ ε||C||_F    (5.5)
(P̂ + E4)ŷ = f̂,        ||E4||_2 ≤ ε||P̂||_2    (5.6)
X̂ = U1(Ŷ + E5)V1ᵀ,    ||E5||_F ≤ ε||Ŷ||_F    (5.7)

where

P̂ = (I_n ⊗ Ĥ) + (Ŝ ⊗ I_m)   (5.8)
ŷ = vec(Ŷ)   (5.9)
f̂ = vec(F̂)   (5.10)

and ε is a small multiple of the machine precision u. (We have used the
2-norm for convenience.) A straightforward error analysis shows that if

||φ^{-1}||[ε² + ε(||A||_F + ||B||_F)] < 1/2,   (5.11)

then

||X − X̂||_F / ||X||_F ≤ (9ε + 2ε²)||φ^{-1}||[||A||_F + ||B||_F].   (5.12)

Inequality (5.12) indicates that errors no worse in magnitude than O(ε||φ^{-1}|| · ||φ||) will contaminate the computed X̂. Since ε is a small multiple of the machine precision u, we see that (5.12) is essentially the same result as (4.8), which was established under the "ideal" assumptions (4.3)–(4.6). Likewise, assumption (5.11) corresponds to assumption (4.7). Conclusion: the roundoff properties of the Hessenberg–Schur algorithm are as good as can be expected from any algorithm designed to solve AX + XB = C.

We finish this section with two remarks. First, the entire analysis is applicable to the Bartels–Stewart algorithm. We simply replace (5.3) with

R̂ = U1ᵀ(A + E1)U1,   ||E1||_F ≤ ε||A||_F   (5.3')

where R̂ is now quasi-triangular instead of Hessenberg.

Our second remark concerns another standard by which the quality of X̂ can be judged. In some applications, one may be more interested in the norm of the residual ||AX̂ + X̂B − C||_F than the relative error. An
analysis similar to that above reveals that if (5.1)–(5.11) hold, then

||AX̂ + X̂B − C||_F ≤ 10ε[||A||_F + ||B||_F]||X̂||_F.   (5.13)

Notice that the bound does not involve ||φ^{-1}||.

VI. THE FORTRAN CODES AND THEIR PERFORMANCE

A collection of Fortran subroutines has been written which implement the Hessenberg–Schur algorithm. Here is a summary of what the chief routines in the package do.

AXXBC—This is the main calling subroutine and the only one which the user "sees." It assumes m ≥ n.

ORTHES—This subroutine reduces a matrix to upper Hessenberg form using Householder matrices. All the information pertaining to the reduction is stored below the main diagonal of the reduced matrix.

ORTRAN—This subroutine is used to explicitly form the orthogonal matrix obtained by ORTHES.

HQR2—This subroutine reduces an upper Hessenberg matrix to upper quasi-triangular form using the QR algorithm. (It is an adaptation of the EISPACK routine having the same name [11].)

TRANSF—This subroutine computes products of the form UᵀCV and UYVᵀ where U and V are orthogonal.

NSOLVE, HESOLV, BACKSB—These routines combine to solve upper Hessenberg systems using Gaussian elimination with partial pivoting.

N2SOLV, H2SOLV, BACKSB—These routines combine to solve the 2m-by-2m block Hessenberg systems encountered whenever S has a 2-by-2 bump.

The above codes are designed to handle double precision A, B, and C and require about 23 000 bytes of storage. This amount of memory is put into perspective with the remark that when a 25 × 25 problem is solved, the program itself accounts for one-half of the total storage.

To assess the effectiveness of our subroutines we ran two sets of tests.
In the first set we compared the execution times for our method and the Bartels–Stewart algorithm. For a given value of n/m, approximately 20 randomly generated examples were run, ranging in dimension from 10 to 50. The timing ratios were then averaged. Table I summarizes what we found. Although the predicted savings (second column) are a little greater than those actually obtained (third column), the results certainly confirm the superior efficiency of the Hessenberg–Schur algorithm.

We also compared the accuracy of the two methods on the same class of examples and found them indistinguishable. This is to be expected because the favorable error analysis in the previous section applies to both algorithms.

The second class of test problems was designed to examine the behavior of the algorithm on ill-conditioned AX + XB = C examples. This was accomplished by letting A and B have the form

A = diag(1, 2, 3, ⋯, m) + N_m
B = 2^{-i} I_n − diag(n, n−1, ⋯, 1) + N_nᵀ

where N_k is the k × k matrix with ones everywhere below the main diagonal and zeros elsewhere. Notice that there is a coalescence among the eigenvalues of A and −B as i gets large. This enables us to control the sensitivity of the transformation φ(X) = AX + XB. (In particular, it is easy to show that ||φ^{-1}|| ≥ 2^i.) To facilitate the checking of errors, C is chosen so that the solution X is the matrix whose entries are each one. Using the same computer and compiler as above (u = 16^{-13}), we obtained the results for an m = 10, n = 4 problem, as shown in Table II.

TABLE I
TIMINGS. Averaged ratios of HS execution time to BS execution time: .84, .70, .54, .35. (All computations were performed using long arithmetic on the IBM 370/168 with the Fortran H compiler, OPT = 2.)

TABLE II
ERROR AND RESIDUALS for the m = 10, n = 4 problems: ||φ^{-1}||, the relative error in X̂, and the residual norm ||AX̂ + X̂B − C||_F; the residual entries are all of order 10^{-16}.

The quantity ||φ^{-1}|| is the reciprocal of the smallest singular value of the matrix P = (I_4 ⊗ A) + (Bᵀ ⊗ I_10) and for this modestly sized problem
was found using the subroutine SVD in EISPACK [11]. The results of Table II affirm the key results (5.12) and (5.13). In particular, we see that small residuals are obtained regardless of the norm of φ^{-1}. In contrast, the accuracy of X̂ deteriorates as ||φ^{-1}|| becomes large.

We conclude this section with the remark that the Hessenberg–Schur algorithm offers no advantage over the Bartels–Stewart method for the important case when B = Aᵀ, i.e., the Lyapunov problem. This is because the latter algorithm requires only one Schur decomposition to solve AX + XAᵀ = C.

VII. EXTENSIONS TO OTHER MATRIX EQUATION PROBLEMS

In this final section we indica...