Unformatted text preview: Computing RankRevealing QR Factorizations of Dense Matrices
Christian H. Bischof Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL 60439. [email protected] and Gregorio QuintanaOrt Departamento de Informatica, Universidad Jaime I, Campus Penyeta Roja, 12071 Castellon, Spain. [email protected] We develop algorithms and implementations for computing rankrevealing QR (RRQR) factorizations of dense matrices. First, we develop an e cient block algorithm for approximating an RRQR factorization,employinga windowed version of the commonlyused Golub pivoting strategy, safeguarded by incremental condition estimation. Second, we develop e ciently implementable variants of guaranteed reliable RRQR algorithms for triangular matrices originally suggested by Chandrasekaran and Ipsen, and by Pan and Tang. We suggest algorithmic improvements with respect to condition estimation, termination criteria, and Givens updating. By combining the block algorithm with one of the triangular postprocessing steps, we arrive at an e cient and reliable algorithm for computing an RRQR factorization of a dense matrix. Experimental results on IBM RS/6000 and SGI R8000 platforms show that this approach performs up to three times faster than the less reliable QR factorization with column pivoting as it is currently implemented in LAPACK, and comes within 15% of the performance of the LAPACK block algorithm for computing a QR factorization without any column exchanges. Thus, we expect this routine to be useful in many circumstances where numerical rank de ciency cannot be ruled out but currently has been ignored because of the computational cost of dealing with it. Categories and Subject Descriptors: G.1.3 Numerical Analysis]: Numerical Linear Algebra; G.4 Mathematical Software] Additional Key Words and Phrases: rankrevealingorthogonalfactorization, numericalrank, block algorithm, QR factorization, leastsquares systems This work was supported by the Applied and Computational Mathematics Program, Advanced Research Projects Agency, under contracts DM28E04120 and P95006. Bischof was also supported by the Mathematical, Information, and Computational Sciences Division subprogram of the O ce of Computational and Technology Research, U. S. Department of Energy, under Contract W31109Eng38. Quintana also received support through the European ESPRIT Project 9072{GEPPCOM and the Spanish Research Agency CICYT under grant TIC911157C0302. During part of this work, Quintana was a research fellow of the Spanish Ministry of Education and Science of the Valencian Governmentat the Universidad Politecnica de Valencia and a visiting scientist at the Mathematics and Computer Science Division at Argonne National Laboratory. 2 We brie y summarize the properties of a rankrevealing QR (RRQR) factorization. Let A be an m n matrix (w.l.o.g. m n) with singular values : : : n 0; (1) 1 2 and de ne the numerical rank r of A with respect to a threshold as follows: 1 < 1: Also, let A have a QR factorization of the form 11 R (2) A P = QR = Q R0 R12 ; 22 where P is a permutation matrix, Q has orthonormal columns, R is upper triangular, and R11 is of order r. Further, let (A) denote the twonorm condition number of a matrix A. We then say that (2) is an RRQR factorization of A if the following properties are satis ed: (R11) 1 = r and kR22k2 = max (R22) r+1 : (3) Whenever there is a welldetermined gap in the singularvalue spectrum between r and r+1 , and hence the numerical rank r is well de ned, the RRQR factorization (2) reveals the numerical rank of A by having a wellconditioned leading submatrix R11 and a trailing submatrix R22 of small norm. We also note that the matrix ?1 R P T R11 I 12 ; ? which can be easily computed from (2), is usually a good approximation of the nullvectors, and a few steps of subspace iteration su ce to compute nullvectors that are correct to working precision Chan and Hansen 1992]. The RRQR factorization is a valuable tool in numerical linear algebra because it provides accurate information about rank and numerical nullspace. Its main use arises in the solution of rankde cient leastsquares problems, for example, in geodesy Golub et al. 1986], computeraided design Grandine 1989], nonlinear leastsquares problems More 1978], the solution of integral equations Elden and Schreiber 1986], and the calculation of splines Grandine 1987]. Other applications arise in beam forming Bischof and Shro 1992], spectral estimation Hsieh et al. 1991], and regularization Hansen 1990; Hansen et al. 1992; Walden 1991]. Stewart 1990] suggested another alternative to the singular value decomposition, a complete orthogonal decomposition called URV decomposition. This factorization decomposes R A = U R011 R12 V T ; 22 where U and V are orthogonal and both kR12k2 and kR22k2 are of the order r+1 . In particular, compared with RRQR factorizations, URV decompositions employ a general orthogonal matrix V instead of the permutation matrix P . URV decompositions are more expensive to compute, but they are well suited for nullspace updating. RRQR factorizations, on the other hand, are more suited for the leastsquares
r r+1 1. INTRODUCTION 3 setting, since one need not store the orthogonal matrix V (the other orthogonal matrix is usually applied to the righthand side \on the y"). Of course, RRQR factorizations can be used to compute an initial URV decomposition, where U = Q and V = P . We brie y review the history of RRQR algorithms. From the interlacing theorem for singular values Golub and Loan 1983, Corollary 8.3.3], we have min (R(1 : k; 1 : k)) k (A) and max (R(k + 1 : n; k + 1 : n)) k+1(A) : (4) Hence, to satisfy condition (3), we need to pursue two tasks: Task 1. Find a permutation P that maximizes min(R11). Task 2. Find a permutation P that minimizes max (R22). Golub 1965] suggested what is commonly called the \QR factorization with column pivoting." Given a set of already selected columns, this algorithm chooses as the next pivot column the one that is \farthest away" in the Euclidean norm from the subspace spanned by the columns already chosen Golub and Loan 1983, p.168, P.6.4{5]. This intuitive strategy addresses task 1. While this greedy algorithm is known to fail on the socalled Kahan matrices Golub and Loan 1989, p. 245, Example 5.5.1], it works well in practice and forms the basis of the LINPACK Dongarra et al. 1979] and LAPACK Anderson et al. 1992a; Anderson et al. 1994b] implementations. Recently, QuintanaOrt , Sun, and Bischof 1995] developed an implementation of the Golub algorithm that allows half of the work to be performed with BLAS3 kernels. Bischof also had developed restrictedpivoting variants of the Golub strategy to enable the use of BLAS3 type kernels Bischof 1989] for almost all of the work and to reduce communication cost on distributedmemory machines Bischof 1991]. One approach to task 2 is based, in essence, on the following fact, which is proved in Chan and Hansen 1992]. Lemma 1. For any R 2 IRn n and any W = W1 2 IRn p with a nonsingular W2 W2 2 IRp p , we have kR(n ? p + 1: n; n ? p + 1: n)k2 kRW k2 kW2?1k2: (5) This means that if we can determine a matrix W with p linearly independent columns, all of which lie approximately in the nullspace of R (i.e., kR W k2 is small), and if W2 is well conditioned such that ( min (W2 ))?1 = kW2?1 k2 is not large, we are guaranteed that the elements of the bottom right p p block of R will be small. Algorithms based on computing wellconditioned nullspace bases for A include these by Golub, Klema, and Stewart 1976], Chan 1987], and Foster 1986]. Other algorithms addressing task 2 are these by Stewart 1984] and Gragg and Stewart 1976]. Algorithms addressing task 1 include those of Chan and Hansen 1994] and Golub, Klema, and Stewart 1976]. In fact, the latter achieves both task 1 and task 2 and, therefore, reveals the rank, but it is too expensive in comparison with the others. Bischof and Hansen combined a restrictedpivoting strategy with Chan's algorithm Chan 1987] to arrive at an algorithm for sparse matrices Bischof and Hansen 4 where p1 and p2 are loworder polynomials in n and r (versus an exponential factor in Chan's algorithm). Chandrasekaran and Ipsen 1994] were the rst to develop RRQR algorithms that satisfy (8) and (9). Their paper also reviews and provides a common framework for the previously devised strategies. In particular, they introduce the socalled uni cation principle, which says that running a task1 algorithm on the rows of the inverse of the matrix yields a task2 algorithm. They suggest hybrid algorithms that alternate between task1 and task2 steps to re ne the separation of the singular values of R. Pan and Tang 1992] and Gu and Eisenstat 1992] presented di erent classes of algorithms for achieving (8) and (9), addressing the possibility of nontermination of the algorithms because of oatingpoint inaccuracies. The goal of our work was to develop an e cient and reliable RRQR algorithm and implementation suitable for inclusion in a numerical library such as LAPACK. Speci cally, we wished to develop an implementation that was both reliable and close in performance to the QR factorization without any pivoting. Such an implementation would provide algorithm developers with an e cient tool for addressing potential numerical rank de ciency by minimizing the computational penalty for addressing potential rank de ciency. Our strategy involves the following ingredients: an e cient block algorithm for computing an approximate RRQR factorization, based on the work by Bischof 1989], and e cient implementations of RRQR algorithms well suited for triangular matrices, based on the work by Chandrasekaran and Ipsen 1994] and Pan and Tang 1992]. These algorithms seemed better suited for triangular matrices than those suggested by Gu and Eisenstat 1992]. We nd that 1991] and also developed a block variant of Chan's algorithm Bischof and Hansen 1992]. A Fortran 77 implementation of Chan's algorithm was provided by Reichel and Gragg 1990]. Chan's algorithm Chan 1987] guaranteed i p (R(1 : i; 1 : i)) i (6) n(n ? i + 1)2n?i min and p (7) i max (R(i : n; i : n)) i n(n ? i + 1)2n?i : That is, as long as the rank of the matrix is close to n, the algorithm is guaranteed to produce reliable bounds, but reliability may decrease with the rank of the matrix. Hong and Pan 1992] then showed that there exists a permutation matrix P such that for the triangular factor R partitioned as in (2), we have jjR22jj2 r+1 (A)p1 (r; n) (8) and 1 (9) min (R11) r (A) p (r; n) ;
2 5 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. P = I; foreach i 2 f1; : :: ; ng do resi = ka(:;i)k2 end do for i = 1 to min(m; n) do end foreach end for Let i pvt n be such that respvt is maximal P (:;i) $ P (:;pvt) ; a(: ; i) $ a(: ;pvt) ; respvt := resi ; ui ; a(i : m; i)] := genhh(a(i : m ; i)); a(i : m;i +1: n) := apphh(ui ; a(i : m; i +1: n)); foreach j 2 fi + 1; : :: ; ng do p resj := res2 ? a(i;j )2 ; j Fig. 1. The QR Factorization Algorithm with Traditional Column Pivoting in most cases the approximate RRQR factorization computed by the block algorithm is very close to the desired RRQR factorization, requiring little postprocessing, and the almost entirely BLAS3 preprocessing algorithm performs considerably faster than the QR factorization with column pivoting and close to the performance of the QR factorization without pivoting. The paper is structured as follows. In the next section, we review the block algorithm for computing an approximate RRQR factorization based on a restrictedpivoting approach. In Section 3, we describe our modi cations to Chandrasekaran and Ipsen's \HybridIII" algorithm and Pan and Tang's \Algorithm 3." Section 4 presents our experimental results on IBM RS/6000 and SGI R8000 platforms. In Section 5, we summarize our results. In this section, we describe a block QR factorization algorithm that employs a restricted pivoting strategy to approximately compute an RRQR factorization, employing the ideas described by Bischof 1989]. We compute Q by a sequence of Householder matrices H H (u) = I ? 2uuT ; kuk2 = 1: (10) For any given vector x, we can choose a vector u so that H (u)x = e1 , where e1 is the rst canonical unit vector and j j = kxk2 (see, for example, Golub and Loan 1989, p. 196]). The application of a Householder matrix B := H (u)A involves a matrixvector product z := AT u and a rankone update B := A ? 2uz T . Figure 1 describes the Golub Householder QR factorization algorithm with traditional column pivoting Golub 1965] for computing the QR decomposition of an m n matrix A. The primitive operation u; y] := genhh(x) computes u such that y = H (u)x is a multiple of e1 , while the primitive operation B := apphh(u; A) overwrites B with H (u)A. After step i is completed, the values resj ; j = i + 1; : : :; n are the length of the projections of the j th column of the currently permuted AP onto the orthogonal complement of the subspace spanned by the rst i columns of AP . The values resj can be updated easily and do not have to be recomputed at every step, although
2. A BLOCK QR FACTORIZATION WITH RESTRICTED PIVOTING 6 XX % @@ XX % \ee @@ BLOCK d XX % \ e @ o XXpivot\ UP @ n % \\\ee XX e XX %.eDATE \\ %.. e \ \ Fig. 2. Restricting Pivoting for a Block Algorithm roundo errors may make it necessary to recompute resj = k(a(i : m; j ))k2 ; j = i + 1; : : :; n periodically Dongarra et al. 1979, p. 9.17] (we suppressed this detail in line 9 of Figure 1). The bulk of the computationalwork in this algorithmis performed in the apphh kernel, which relies on matrixvector operations. However, on today's cachebased architectures (ranging from workstations to supercomputers) matrixmatrix operations perform much better. Matrixmatrix operations are exploited by using socalled block algorithms, whose toplevel unit of computation is matrix blocks instead of vectors. Such algorithms play a central role, for example, in the LAPACK implementations Anderson et al. 1992a; Anderson et al. 1994b]. LAPACK employs the socalled compact WY representation of products of Householder matrices Schreiber and Van Loan 1989], which expresses the product Q = H1 Hnb of a series of m m Householder matrices (10) as Q = I + Y TY T ; (11) where Y is an m nb matrix and T is an nb nb upper triangular matrix. Stable implementations for generating Householder vectors as well as forming and applying compact WY factors are provided in LAPACK. To arrive at a block QR factorization algorithm, we would like to avoid updating part of A until several Householder transformations have been computed. This strategy is impossible with traditional pivoting, since we must update resj before we can choose the next pivot column. While we can modify the traditional approach to do half of the work using block transformations, this is the best we can do (these issues are discussed in detail in QuintanaOrt et al. 1995]). Therefore, we instead limit the scope of pivoting as suggested in Bischof 1989]. Thus, we do not have to update the remaining columns until we have computed enough Householder transformations to make a block update worthwhile. The idea is graphically depicted in Figure 2. At a given stage we are done with the columns to the left of the pivot window. We then try to select the next pivot columns exclusively from the columns in the pivot window, not touching the part of A to the right of the pivot window. Only when we have combined the Householder vectors de ned by the next batch of pivot columns into a compact WY factor do we apply this block update to the columns on the right. Since the leading block of R is supposed to approximate the large singular values 7 of A, we must be able to guard against pivot columns that are close to the span of columns already selected. That is, given the upper triangular matrix Ri de ned by the rst i columns of QT AP and a new column v determined by the new candidate pivot column, we must determine whether Ri+1 = Ri v 0 has a condition number that is larger than a threshold , which de nes what we consider a rankde cient matrix. We approximate 1 (12) max (Ri+1 ) bmax (Ri+1 ) n 3 1maxi kR(1 : k; k)k2 ; k which is easy to compute. To cheaply estimate min(Ri+1 ), we employ incremental condition estimation (ICE) Bischof 1990; Bischof and Tang 1991]. Given a good estimate bmin (Ri) = 1=kxk2 de ned by a large norm solution x to RT x = d ; kdk2 = i v , incremental condition estimation, with only 3k ops, 1 and a new column computes s and c, s2 + c2 = 1, such that sx (13) min (Ri+1) bmin(Ri+1 ) = 1=k c k2 : A stable implementation of ICE based on the formulation in Bischof and Tang 1991] is provided by the LAPACK routine xLAIC1.1 ICE is an order of magnitude cheaper than other condition estimators (see, for example, Higham 1986]). Moreover, it is considerably more reliable than simply using j j as an estimate for min (Ri+1) (see, for example, Bischof 1991]). We also de ne (R ) (14) b(Ri) bmax(R i) : bmin i The restricted block pivoting algorithm proceeds in four phases: Phase 1: Pivoting of largest column into rst position. This phase is motivated by the fact that the norm of the largest column of A is usually a good estimate for 1(A). Phase 2: Block QR factorization with restricted pivoting. Given a desired block size nb and a window size ws, ws nb, we try to generate nb Householder transformations by applying the Golub pivoting strategy only to the columns in the pivot window, using ICE to assess the impact of a column selection on the condition number. When the pivot column chosen from the pivot window would lead to a leading triangular factor whose condition number exceeds , we mark all remaining columns in the pivot window (k, say) as \rejected," pivot them to the end of the matrix, generate a block transformation (of width not more than nb), apply it to the remainder of the matrix, and then reposition the pivot window to encompass
one of the four di erent precision instantiations: SLAIC1, DLAIC1, CLAIC1, or ZLAIC1. 1 Here as in the sequel we use the conventionthat the pre x \x" genericallyrefers to the appropriate 8 the next ws notyetrejected columns. When all columns have been either accepted as part of the leading triangular factor or rejected at some stage of the algorithm, this phase stops. Assuming we have included r2 columns in the leading triangular factor, we have at this point computed an r2 r2 upper triangular matrix Rr2 = R(1 : r2; 1 : r2) that satis es (15) b(Rr2 ) : That is, r2 is our estimate of the numerical rank with respect to the threshold at this point. In our experiments, we chose (16) ws = nb + maxf10; nb + 0:05ng : 2 This choice tries to ensure a suitable pivot window and \loosens up" a bit as the matrix size increases. A pivot window that is too large will reduce performance because of the overhead in generating block orthogonal transformations and the larger number of unblocked operations. On the other hand, a pivot window that is too small will reduce the pivoting exibility and thus increase the likelihood that the restricted pivoting strategy will fail to produce a good approximate RRQR factorization. In our experiments, the choice of w had only a small impact (not more than 5%) on overall performance and negligible impact on the numerical behavior. Phase 3: Traditional pivoting strategy among \rejected" columns. Since phase 2 rejects all remaining columns in the pivot window when the pivot candidate is rejected, a column may have been pivoted to the end that should not have been rejected. Hence, we now continue with the traditional Golub pivoting strategy on the remaining n ? r2 columns, updating (14) as an estimate of the condition number. This phase ends at column r3, say, where (17) b(Rr3 ) ; and the inclusion of the next pivot column would have pushed the condition number beyond the thr...
View
Full Document
 Fall '08
 Chandrasekara
 Computer Science, Golub, QR factorization, Bischof

Click to edit the document details