Ans 5 - v2

# Ans 5 - v2 - Computing Rank-Revealing QR Factorizations of...

This preview shows page 1. Sign up to view the full content.

This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: Computing Rank-Revealing QR Factorizations of Dense Matrices Christian H. Bischof Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Ave., Argonne, IL 60439. bischof@mcs.anl.gov and Gregorio Quintana-Ort Departamento de Informatica, Universidad Jaime I, Campus Penyeta Roja, 12071 Castellon, Spain. gquintan@inf.uji.es. We develop algorithms and implementations for computing rank-revealing QR (RRQR) factorizations of dense matrices. First, we develop an e cient block algorithm for approximating an RRQR factorization,employinga windowed version of the commonlyused Golub pivoting strategy, safeguarded by incremental condition estimation. Second, we develop e ciently implementable variants of guaranteed reliable RRQR algorithms for triangular matrices originally suggested by Chandrasekaran and Ipsen, and by Pan and Tang. We suggest algorithmic improvements with respect to condition estimation, termination criteria, and Givens updating. By combining the block algorithm with one of the triangular postprocessing steps, we arrive at an e cient and reliable algorithm for computing an RRQR factorization of a dense matrix. Experimental results on IBM RS/6000 and SGI R8000 platforms show that this approach performs up to three times faster than the less reliable QR factorization with column pivoting as it is currently implemented in LAPACK, and comes within 15% of the performance of the LAPACK block algorithm for computing a QR factorization without any column exchanges. Thus, we expect this routine to be useful in many circumstances where numerical rank de ciency cannot be ruled out but currently has been ignored because of the computational cost of dealing with it. Categories and Subject Descriptors: G.1.3 Numerical Analysis]: Numerical Linear Algebra; G.4 Mathematical Software] Additional Key Words and Phrases: rank-revealingorthogonalfactorization, numericalrank, block algorithm, QR factorization, least-squares systems This work was supported by the Applied and Computational Mathematics Program, Advanced Research Projects Agency, under contracts DM28E04120 and P-95006. Bischof was also supported by the Mathematical, Information, and Computational Sciences Division subprogram of the O ce of Computational and Technology Research, U. S. Department of Energy, under Contract W-31-109-Eng-38. Quintana also received support through the European ESPRIT Project 9072{GEPPCOM and the Spanish Research Agency CICYT under grant TIC-91-1157-C03-02. During part of this work, Quintana was a research fellow of the Spanish Ministry of Education and Science of the Valencian Governmentat the Universidad Politecnica de Valencia and a visiting scientist at the Mathematics and Computer Science Division at Argonne National Laboratory. 2 We brie y summarize the properties of a rank-revealing QR (RRQR) factorization. Let A be an m n matrix (w.l.o.g. m n) with singular values : : : n 0; (1) 1 2 and de ne the numerical rank r of A with respect to a threshold as follows: 1 < 1: Also, let A have a QR factorization of the form 11 R (2) A P = QR = Q R0 R12 ; 22 where P is a permutation matrix, Q has orthonormal columns, R is upper triangular, and R11 is of order r. Further, let (A) denote the two-norm condition number of a matrix A. We then say that (2) is an RRQR factorization of A if the following properties are satis ed: (R11) 1 = r and kR22k2 = max (R22) r+1 : (3) Whenever there is a well-determined gap in the singular-value spectrum between r and r+1 , and hence the numerical rank r is well de ned, the RRQR factorization (2) reveals the numerical rank of A by having a well-conditioned leading submatrix R11 and a trailing submatrix R22 of small norm. We also note that the matrix ?1 R P T R11 I 12 ; ? which can be easily computed from (2), is usually a good approximation of the nullvectors, and a few steps of subspace iteration su ce to compute nullvectors that are correct to working precision Chan and Hansen 1992]. The RRQR factorization is a valuable tool in numerical linear algebra because it provides accurate information about rank and numerical nullspace. Its main use arises in the solution of rank-de cient least-squares problems, for example, in geodesy Golub et al. 1986], computer-aided design Grandine 1989], nonlinear least-squares problems More 1978], the solution of integral equations Elden and Schreiber 1986], and the calculation of splines Grandine 1987]. Other applications arise in beam forming Bischof and Shro 1992], spectral estimation Hsieh et al. 1991], and regularization Hansen 1990; Hansen et al. 1992; Walden 1991]. Stewart 1990] suggested another alternative to the singular value decomposition, a complete orthogonal decomposition called URV decomposition. This factorization decomposes R A = U R011 R12 V T ; 22 where U and V are orthogonal and both kR12k2 and kR22k2 are of the order r+1 . In particular, compared with RRQR factorizations, URV decompositions employ a general orthogonal matrix V instead of the permutation matrix P . URV decompositions are more expensive to compute, but they are well suited for nullspace updating. RRQR factorizations, on the other hand, are more suited for the least-squares r r+1 1. INTRODUCTION 3 setting, since one need not store the orthogonal matrix V (the other orthogonal matrix is usually applied to the right-hand side \on the y"). Of course, RRQR factorizations can be used to compute an initial URV decomposition, where U = Q and V = P . We brie y review the history of RRQR algorithms. From the interlacing theorem for singular values Golub and Loan 1983, Corollary 8.3.3], we have min (R(1 : k; 1 : k)) k (A) and max (R(k + 1 : n; k + 1 : n)) k+1(A) : (4) Hence, to satisfy condition (3), we need to pursue two tasks: Task 1. Find a permutation P that maximizes min(R11). Task 2. Find a permutation P that minimizes max (R22). Golub 1965] suggested what is commonly called the \QR factorization with column pivoting." Given a set of already selected columns, this algorithm chooses as the next pivot column the one that is \farthest away" in the Euclidean norm from the subspace spanned by the columns already chosen Golub and Loan 1983, p.168, P.6.4{5]. This intuitive strategy addresses task 1. While this greedy algorithm is known to fail on the so-called Kahan matrices Golub and Loan 1989, p. 245, Example 5.5.1], it works well in practice and forms the basis of the LINPACK Dongarra et al. 1979] and LAPACK Anderson et al. 1992a; Anderson et al. 1994b] implementations. Recently, Quintana-Ort , Sun, and Bischof 1995] developed an implementation of the Golub algorithm that allows half of the work to be performed with BLAS-3 kernels. Bischof also had developed restricted-pivoting variants of the Golub strategy to enable the use of BLAS-3 type kernels Bischof 1989] for almost all of the work and to reduce communication cost on distributed-memory machines Bischof 1991]. One approach to task 2 is based, in essence, on the following fact, which is proved in Chan and Hansen 1992]. Lemma 1. For any R 2 IRn n and any W = W1 2 IRn p with a nonsingular W2 W2 2 IRp p , we have kR(n ? p + 1: n; n ? p + 1: n)k2 kRW k2 kW2?1k2: (5) This means that if we can determine a matrix W with p linearly independent columns, all of which lie approximately in the nullspace of R (i.e., kR W k2 is small), and if W2 is well conditioned such that ( min (W2 ))?1 = kW2?1 k2 is not large, we are guaranteed that the elements of the bottom right p p block of R will be small. Algorithms based on computing well-conditioned nullspace bases for A include these by Golub, Klema, and Stewart 1976], Chan 1987], and Foster 1986]. Other algorithms addressing task 2 are these by Stewart 1984] and Gragg and Stewart 1976]. Algorithms addressing task 1 include those of Chan and Hansen 1994] and Golub, Klema, and Stewart 1976]. In fact, the latter achieves both task 1 and task 2 and, therefore, reveals the rank, but it is too expensive in comparison with the others. Bischof and Hansen combined a restricted-pivoting strategy with Chan's algorithm Chan 1987] to arrive at an algorithm for sparse matrices Bischof and Hansen 4 where p1 and p2 are low-order polynomials in n and r (versus an exponential factor in Chan's algorithm). Chandrasekaran and Ipsen 1994] were the rst to develop RRQR algorithms that satisfy (8) and (9). Their paper also reviews and provides a common framework for the previously devised strategies. In particular, they introduce the so-called uni cation principle, which says that running a task-1 algorithm on the rows of the inverse of the matrix yields a task-2 algorithm. They suggest hybrid algorithms that alternate between task-1 and task-2 steps to re ne the separation of the singular values of R. Pan and Tang 1992] and Gu and Eisenstat 1992] presented di erent classes of algorithms for achieving (8) and (9), addressing the possibility of nontermination of the algorithms because of oating-point inaccuracies. The goal of our work was to develop an e cient and reliable RRQR algorithm and implementation suitable for inclusion in a numerical library such as LAPACK. Speci cally, we wished to develop an implementation that was both reliable and close in performance to the QR factorization without any pivoting. Such an implementation would provide algorithm developers with an e cient tool for addressing potential numerical rank de ciency by minimizing the computational penalty for addressing potential rank de ciency. Our strategy involves the following ingredients: |an e cient block algorithm for computing an approximate RRQR factorization, based on the work by Bischof 1989], and |e cient implementations of RRQR algorithms well suited for triangular matrices, based on the work by Chandrasekaran and Ipsen 1994] and Pan and Tang 1992]. These algorithms seemed better suited for triangular matrices than those suggested by Gu and Eisenstat 1992]. We nd that 1991] and also developed a block variant of Chan's algorithm Bischof and Hansen 1992]. A Fortran 77 implementation of Chan's algorithm was provided by Reichel and Gragg 1990]. Chan's algorithm Chan 1987] guaranteed i p (R(1 : i; 1 : i)) i (6) n(n ? i + 1)2n?i min and p (7) i max (R(i : n; i : n)) i n(n ? i + 1)2n?i : That is, as long as the rank of the matrix is close to n, the algorithm is guaranteed to produce reliable bounds, but reliability may decrease with the rank of the matrix. Hong and Pan 1992] then showed that there exists a permutation matrix P such that for the triangular factor R partitioned as in (2), we have jjR22jj2 r+1 (A)p1 (r; n) (8) and 1 (9) min (R11) r (A) p (r; n) ; 2 5 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. P = I; foreach i 2 f1; : :: ; ng do resi = ka(:;i)k2 end do for i = 1 to min(m; n) do end foreach end for Let i pvt n be such that respvt is maximal P (:;i) $P (:;pvt) ; a(: ; i)$ a(: ;pvt) ; respvt := resi ; ui ; a(i : m; i)] := genhh(a(i : m ; i)); a(i : m;i +1: n) := apphh(ui ; a(i : m; i +1: n)); foreach j 2 fi + 1; : :: ; ng do p resj := res2 ? a(i;j )2 ; j Fig. 1. The QR Factorization Algorithm with Traditional Column Pivoting |in most cases the approximate RRQR factorization computed by the block algorithm is very close to the desired RRQR factorization, requiring little postprocessing, and |the almost entirely BLAS-3 preprocessing algorithm performs considerably faster than the QR factorization with column pivoting and close to the performance of the QR factorization without pivoting. The paper is structured as follows. In the next section, we review the block algorithm for computing an approximate RRQR factorization based on a restrictedpivoting approach. In Section 3, we describe our modi cations to Chandrasekaran and Ipsen's \Hybrid-III" algorithm and Pan and Tang's \Algorithm 3." Section 4 presents our experimental results on IBM RS/6000 and SGI R8000 platforms. In Section 5, we summarize our results. In this section, we describe a block QR factorization algorithm that employs a restricted pivoting strategy to approximately compute an RRQR factorization, employing the ideas described by Bischof 1989]. We compute Q by a sequence of Householder matrices H H (u) = I ? 2uuT ; kuk2 = 1: (10) For any given vector x, we can choose a vector u so that H (u)x = e1 , where e1 is the rst canonical unit vector and j j = kxk2 (see, for example, Golub and Loan 1989, p. 196]). The application of a Householder matrix B := H (u)A involves a matrix-vector product z := AT u and a rank-one update B := A ? 2uz T . Figure 1 describes the Golub Householder QR factorization algorithm with traditional column pivoting Golub 1965] for computing the QR decomposition of an m n matrix A. The primitive operation u; y] := genhh(x) computes u such that y = H (u)x is a multiple of e1 , while the primitive operation B := apphh(u; A) overwrites B with H (u)A. After step i is completed, the values resj ; j = i + 1; : : :; n are the length of the projections of the j th column of the currently permuted AP onto the orthogonal complement of the subspace spanned by the rst i columns of AP . The values resj can be updated easily and do not have to be recomputed at every step, although 2. A BLOCK QR FACTORIZATION WITH RESTRICTED PIVOTING 6 XX % @@ XX % \ee @@ BLOCK d XX % \ e @ o XXpivot\ UP- @ n % \\\ee XX e XX %.eDATE \\ %.. e \ \ Fig. 2. Restricting Pivoting for a Block Algorithm roundo errors may make it necessary to recompute resj = k(a(i : m; j ))k2 ; j = i + 1; : : :; n periodically Dongarra et al. 1979, p. 9.17] (we suppressed this detail in line 9 of Figure 1). The bulk of the computationalwork in this algorithmis performed in the apphh kernel, which relies on matrix-vector operations. However, on today's cache-based architectures (ranging from workstations to supercomputers) matrix-matrix operations perform much better. Matrix-matrix operations are exploited by using so-called block algorithms, whose top-level unit of computation is matrix blocks instead of vectors. Such algorithms play a central role, for example, in the LAPACK implementations Anderson et al. 1992a; Anderson et al. 1994b]. LAPACK employs the so-called compact WY representation of products of Householder matrices Schreiber and Van Loan 1989], which expresses the product Q = H1 Hnb of a series of m m Householder matrices (10) as Q = I + Y TY T ; (11) where Y is an m nb matrix and T is an nb nb upper triangular matrix. Stable implementations for generating Householder vectors as well as forming and applying compact WY factors are provided in LAPACK. To arrive at a block QR factorization algorithm, we would like to avoid updating part of A until several Householder transformations have been computed. This strategy is impossible with traditional pivoting, since we must update resj before we can choose the next pivot column. While we can modify the traditional approach to do half of the work using block transformations, this is the best we can do (these issues are discussed in detail in Quintana-Ort et al. 1995]). Therefore, we instead limit the scope of pivoting as suggested in Bischof 1989]. Thus, we do not have to update the remaining columns until we have computed enough Householder transformations to make a block update worthwhile. The idea is graphically depicted in Figure 2. At a given stage we are done with the columns to the left of the pivot window. We then try to select the next pivot columns exclusively from the columns in the pivot window, not touching the part of A to the right of the pivot window. Only when we have combined the Householder vectors de ned by the next batch of pivot columns into a compact WY factor do we apply this block update to the columns on the right. Since the leading block of R is supposed to approximate the large singular values 7 of A, we must be able to guard against pivot columns that are close to the span of columns already selected. That is, given the upper triangular matrix Ri de ned by the rst i columns of QT AP and a new column v determined by the new candidate pivot column, we must determine whether Ri+1 = Ri v 0 has a condition number that is larger than a threshold , which de nes what we consider a rank-de cient matrix. We approximate 1 (12) max (Ri+1 ) bmax (Ri+1 ) n 3 1maxi kR(1 : k; k)k2 ; k which is easy to compute. To cheaply estimate min(Ri+1 ), we employ incremental condition estimation (ICE) Bischof 1990; Bischof and Tang 1991]. Given a good estimate bmin (Ri) = 1=kxk2 de ned by a large norm solution x to RT x = d ; kdk2 = i v , incremental condition estimation, with only 3k ops, 1 and a new column computes s and c, s2 + c2 = 1, such that sx (13) min (Ri+1) bmin(Ri+1 ) = 1=k c k2 : A stable implementation of ICE based on the formulation in Bischof and Tang 1991] is provided by the LAPACK routine xLAIC1.1 ICE is an order of magnitude cheaper than other condition estimators (see, for example, Higham 1986]). Moreover, it is considerably more reliable than simply using j j as an estimate for min (Ri+1) (see, for example, Bischof 1991]). We also de ne (R ) (14) b(Ri) bmax(R i) : bmin i The restricted block pivoting algorithm proceeds in four phases: Phase 1: Pivoting of largest column into rst position. This phase is motivated by the fact that the norm of the largest column of A is usually a good estimate for 1(A). Phase 2: Block QR factorization with restricted pivoting. Given a desired block size nb and a window size ws, ws nb, we try to generate nb Householder transformations by applying the Golub pivoting strategy only to the columns in the pivot window, using ICE to assess the impact of a column selection on the condition number. When the pivot column chosen from the pivot window would lead to a leading triangular factor whose condition number exceeds , we mark all remaining columns in the pivot window (k, say) as \rejected," pivot them to the end of the matrix, generate a block transformation (of width not more than nb), apply it to the remainder of the matrix, and then reposition the pivot window to encompass one of the four di erent precision instantiations: SLAIC1, DLAIC1, CLAIC1, or ZLAIC1. 1 Here as in the sequel we use the conventionthat the pre x \x" genericallyrefers to the appropriate 8 the next ws not-yet-rejected columns. When all columns have been either accepted as part of the leading triangular factor or rejected at some stage of the algorithm, this phase stops. Assuming we have included r2 columns in the leading triangular factor, we have at this point computed an r2 r2 upper triangular matrix Rr2 = R(1 : r2; 1 : r2) that satis es (15) b(Rr2 ) : That is, r2 is our estimate of the numerical rank with respect to the threshold at this point. In our experiments, we chose (16) ws = nb + maxf10; nb + 0:05ng : 2 This choice tries to ensure a suitable pivot window and \loosens up" a bit as the matrix size increases. A pivot window that is too large will reduce performance because of the overhead in generating block orthogonal transformations and the larger number of unblocked operations. On the other hand, a pivot window that is too small will reduce the pivoting exibility and thus increase the likelihood that the restricted pivoting strategy will fail to produce a good approximate RRQR factorization. In our experiments, the choice of w had only a small impact (not more than 5%) on overall performance and negligible impact on the numerical behavior. Phase 3: Traditional pivoting strategy among \rejected" columns. Since phase 2 rejects all remaining columns in the pivot window when the pivot candidate is rejected, a column may have been pivoted to the end that should not have been rejected. Hence, we now continue with the traditional Golub pivoting strategy on the remaining n ? r2 columns, updating (14) as an estimate of the condition number. This phase ends at column r3, say, where (17) b(Rr3 ) ; and the inclusion of the next pivot column would have pushed the condition number beyond the threshold. We do not expect many columns (if any) to be selected in this phase. It is mainly intended as a cheap safeguard against possible failure of the initial restricted-pivoting strategy. Phase 4: Block QR factorization without pivoting on remaining columns. The columns not yet factored (columns r3 + 1 : n) are with great probability linearly dependent on the previous ones, since they have been rejected in both phase 2 and phase 3. Hence, it is unlikely that any kind of column exchanges among the remaining columns would change our rank estimate, and the standard BLAS-3 block QR factorization as implemented in the LAPACK routine xGEQRF is the fastest way to complete the triangularization. After the completion of phase 4, we have computed a QR factorization that satis es b(Rr3 ) ; 9 and for any column y in R(:; r3 + 1 : n) we have b( R0r3 ; y) > : This result suggests that this QR factorization is a good approximation to an RRQR factorization and r3 is a good estimate of the rank. However, this QR factorization does not guarantee to reveal the numerical rank correctly. Thus, we back up this algorithm with the guaranteed reliable RRQR implementations introduced in the next two sections. In 1991, Chandrasekaran and Ipsen 1994] introduced a uni ed framework for RRQR algorithms and developed an algorithm guaranteed to satisfy (8) and (9) and thus to properly reveal the rank. Their algorithm assumes that the initial matrix is triangular and thus is well suited as a postprocessing step to the algorithm presented in the preceding section. Shortly thereafter, Pan and Tang 1992] introduced another guaranteed reliable RRQR algorithm for triangular matrices. In the following subsections, we describe our improvements and implementations of these algorithms. We implement a variant of what Pan and Tang 1992] call \Algorithm 3." Pseudocode for our algorithm is shown in Figure 3. It assumes as input an upper triangular matrix R. R (i; j ); i < j , denotes a right cyclic permutation that exchanges columns i and j , in other words, i ! i + 1; : : :; j ? 1 ! j; j ! i, whereas L ; i < j denotes a left cyclic permutation that exchanges columns i and j , in i;j other words, j i; i i + 1; : : :; j ? 1 j . In the algorithm, triu(A) denotes the upper triangular factor R in a QR factorization A = QR of A. As can be seen from Figure 3, we use this notation as shorthand for retriangularizations of R after column exchanges. p Given a value for k, and a so-called f -factor 0 < f 1= k + 1, the algorithm is guaranteed to halt and produce a triangular factorization that satis es pf (A) (18) min (R11) k(n ? k + 1) k p (k + 1)(n ? k) (19) max (R22) k+1(A) : f Our implementation incorporates the following features: (1) Incremental condition estimation is used to arrive at estimates for smallest singular values and vectors. Thus, (line 5) and v (line 9) of Figure 3 can be computed inexpensively from u (line 2). The use of ICE signi cantly reduces implementation cost. (2) The QR factorization update (line 4) must be performed only when the if-test (line 6) is false. Thus, we delay it if possible. 3.1 The RRQR Algorithm by Pan and Tang 3. POSTPROCESSING ALGORITHMS FOR AN APPROXIMATE RRQR FACTORIZATION 10 Algorithm PT3M(f,k) 1. i = k + 1; accepted col = 0; = I ; 2. u := left singular vector corresponding to min(R(1: k; 1: k)) 3. while ( accepted col n ? k ) do R 4. R := triu(R R+1;i ) ; := k k+1;i 5. Compute min (R(1: k + 1; 1: k + 1)) 6. if ( f jR(k + 1;k + 1)j) then 7. accepted col := accepted col + 1; 8. else 9. v := right singular vector corresponding to 10. Find index q, 1 q k + 1, such that: jvq j = maxj jvj j L ; accepted col = 0; 11. R := triu(R L +1 ) ; := q;k q;k+1 12. u := left singular vector corresponding to min (R(1: k; 1: k)) 13. end if 14. if (i == n) then i := k + 1 else i := i + 1 end if 15. end while Fig. 3. Variant of Pan/Tang RRQR Algorithm (3) For the algorithm to terminate, all columns need to be checked, and no new permutations must occur. In Pan and Tang's algorithm, rechecking of columns after a permutation always starts at column k + 1. We instead begin checking at the column right after the one that just caused a permutation. Thus, we rst concentrate on the columns that have not just been \worked over." (4) The left cyclic shift permutes the triangular matrix into an upper Hessenberg form, which is then retriangularized with Givens rotations. Applying Givens rotations to rows of R in the obvious fashion (as done, for example, in Reichel and Gragg 1990]) is expensive in terms of data movement, because of the column-oriented nature of Fortran data layout. Thus, we apply Givens rotations in an aggregated fashion, updating matrix strips (R(1 : jb; (j ? 1)b + 1 : jb)) of width b with all previously computed Givens rotations. Similarly, the right cyclic shift introduces a \spike" in column j , which is eliminated with Givens rotations in a bottom-up fashion. To aggregate Givens rotations, we rst compute all rotations only touching the \spike" and the diagonal of R, and then apply all of them one block column at a time. In our experiments, we choose the width b of the matrix strips to be the same as the blocksize nb of the preprocessing. Compared with a straightforward implementation of Pan and Tang's \Algorithm 3," improvements (1) through (3) on average decreased runtime by a factor of ve on 200 200 matrices on an Alliant FX/80. When retriangularizations were frequent, improvement (4) had the most noticeable impact, resulting in a twofold to fourfold performance gain on matrices of order 500 and 1000 on an IBM RS/6000-370. Pan and Tang introduced the f -factor to prevent cycling of the algorithm. The higher f is, the tighter are the bounds in (18) and (19), and the better the approximations to the k and k + 1st singular values of R. However, if f is too large, it introduces more column exchanges and therefore more iterations; and, because of p round-o errors, it might present convergence problems. We used f = 0:9= k + 1 11 Algorithm Hybrid-III-sf(f,k) 1. = I 2. repeat 3. Golub-I-sf(f,k) 4. Golub-I-sf(f,k+1) 5. Chan-II-sf(f,k+1) 6. Chan-II-sf(f,k) 7. until none of the four subalgorithms modi ed the column ordering Fig. 4. Variant of Chandrasekaran/Ipsen Hybrid-III algorithm Algorithm Golub-I-sf(f,k) 1. Find smallest index j , k j n, such that 2. kR(k: j; j )k2 = maxk i n kR(k: i; i)k2 3. if f kR(k: j; j )k2 >j R(k;k) j then R 4. R := triu(R R ); := k;j k;j 5. end if Fig. 5. \f-factor" Variant of Golub-I Algorithm in our work. Chandrasekaran and Ipsen introduced algorithms that achieve bounds (18) and (19) with f = 1. We implemented a variant of the so-called Hybrid-III algorithm, pseudocode for which is shown in Figures 4{6. Compared with the original Hybrid-III algorithm, our implementation incorporates the following features: (1) We employ the Chan-II strategy (an O(n2) algorithm) instead of the so-called Stewart-II strategy (an O(n3) algorithm because of the need for the inversion of R(1 : k; 1 : k)) that Ipsen and Chandrasekaran employed in their experiments. (2) The original Hybrid-III algorithm contained two subloops, with the rst one looping over Golub-I(k) and Chan-II(k) till convergence, the second one looping over Golub-I(k+1) and Chan-II(k+1). We present a di erent loop ordering in our variant, one that is simpler and seems to enhance convergence. On matrices that required considerable postprocessing, the new loop ordering required about 7% fewer steps for 1000 1000 matrices (one step being a call to Golub-I or Chan-II) than Chandrasekaran and Ipsen's original algorithm. In addition, the Algorithm Chan-II-sf(k) 1. v := right singular vector corresponding to min (R(1: k; 1: k)). 2. Find largest index j , 1 j k, such that: jvj j = max1 i k jvi j 3. if f jvj j > jvk j then L 4. R := triu(R L ); := j;k j;k 5. end if Fig. 6. \f-factor" Variant of Chan-II Algorithm 3.2 The RRQR Algorithm by Chandrasekaran and Ipsen 12 new ordering speeds detection of convergence, as shown below. (3) As in our implementationof the Pan/Tang algorithm, we use ICE for estimating singular values and vectors, and the e cient \aggregated" Givens scheme for the retriangularizations. (4) We employ a generalization of the f -factor technique to guarantee termination in the presence of rounding errors. The pivoting method assigns to every column a \weight," namely, kR(k: i; i)k2 in Golub-I(k) and vi in Chan-II(k), where v is the right singular vector corresponding to the smallest singular value of R(1: k; 1: k). To ensure termination, Chandrasekaran and Ipsen suggested pivoting a column only when its weight exceeded that of the current column by at least n2 , where is the computer precision; they did not analyze the impact of this change on the bounds obtained by the algorithm. In contrast, we use a multiplicative tolerance factor f like that of Pan and Tang; the analysis in Quintana-Ort and Quintana-Ort 1996] proves that our algorithm achieves the bounds 2 pf (A); and (20) min (R11) k(n ? k + 1) k p (k + 1)(n ? k) (21) max (R22) k+1(A) : f2 These bounds are identical to (18) and (19), except that an f 2 instead of an f enters into the equation and that now 0 < f 1. We used f = 0:5 in our implementation. We claimed before that the new loop ordering can avoid unnecessary steps when the algorithm is about to terminate. To illustrate, we apply Chandrasekaran and Ipsen's original ordering to a matrix that almost reveals the rank: 1. 2. 3. 4. 5. 6. 7. Golub-I(k) Chan-II(k) Golub-I(k) Final permutation occurs here. Now the rank is revealed. Another iteration of inner k-loop since permutation occurred. Chan-II(k) Golub-I(k+1) Inner loop for k + 1 Chan-II(k+1) Golub-I(k) Another iteration of the main loop since permutation occurred in last pass. 8. Chan-II(k) 9. Golub-I(k+1) 10. Chan-II(k+1) Termination In contrast, the Hybrid-III-sf algorithm terminates in four steps: 13 repeat call Hybrid-III-sf(f,k) or PT3M(f,k) Algorithm RRQR(f,k) := (R(1: k; 1: k)) := (R(1: k + 1; 1: k + 1)) if (( ) and ( > )) then rank := k; stop else if ( ( ) and ( ) )then k := k + 1 else if ( ( ) and ( ) )then k := k ? 1 end if Fig. 7. Algorithm for Computing Rank-Revealing QR Factorization 1. 2. 3. 4. Golub-I-sf(k) Final permutation Golub-I-sf(k+1) Chan-II-sf(k+1) Chan-II-sf(k) Termination 3.3 Determining the Numerical Rank As Stewart 1993] pointed out, both the Chandrasekaran/Ipsen and Pan/Tang algorithms, as well as our versions of those algorithms, do not reveal the rank of a matrix per se. Rather, given an integer k, they compute tight estimates for k (A) min (R(1: k; 1: k)) and k+1 (A) max (R(k + 1: n; k + 1: n)). To obtain the numerical rank with respect to a given threshold , given an initial estimate for the rank (as provided, for example, by the algorithm described in Section 2), we employ the algorithm shown in Figure 7. In our actual implementation, and are computed in Hybrid-III-sf or PT3M. We report in this section experimental results with the double-precision implementations of the algorithms presented in the preceding section. We consider the following codes: DGEQPF. The implementation of the QR factorization with column pivoting provided in LAPACK. DGEQPB. An implementation of the \windowed" QR factorization scheme described in Section 2. DGEQPX. DGEQPB followed by an implementation of the variant of the Chandrasekaran/Ipsen algorithm described in Subsections 3.2 and 3.3. DGEQPY. DGEQPB followed by an implementation of the variant of the Pan/Tang algorithm described in Subsections 3.1 and 3.3. DGEQRF. The block QR factorization without any pivoting provided in LAPACK. In the implementation of our algorithms, we make heavy use of available LAPACK infrastructure. The code used in our experiments, including test and timing 4. EXPERIMENTAL RESULTS 14 drivers and test matrix generators, is available as rrqr.tar.gz in pub/prism on ftp.super.org. We tested matrices of size 100; 150; 250; 500, and 1000 on an IBM RS/6000 Model 370 and SGI R8000. In each case, we employed the vendor-supplied BLAS in the ESSL and SGIMATH libraries, respectively. We employed 18 di erent matrix types to test the algorithms, with various singular value distributions and numerical rank ranging from 3 to full rank. Details of the test matrix generation are beyond the scope of this paper, and we give only a brief synopsis here. For details, the reader is referred to the code. Test matrices 1 through 5 were designed to exercise column pivoting. Matrix 6 was designed to test the behavior of the condition estimation in the presence of clusters for the smallest singular value. For the other cases, we employed the LAPACK matrix generator xLATMS, which generates random symmetric matrices by multiplyinga diagonal matrix with prescribed singular values by random orthogonal matrices from the left and right. For the break1 distribution, all singular values are 1.0 except for one. In the arithmetic and geometric distributions, they decay from 1.0 to a speci ed smallest singular value in an arithmetic and geometric fashion, respectively. In the \reversed" distributions, the order of the diagonal entries was reversed. For test cases 7 though 12, we used xLATMS to generate a matrix of order n + 1 with smallest singular value 5.0e-4, and then interspersed random 2 linear combinations of these \full-rank" columns to pad the matrix to order n. For test cases 13 through 18, we used xLATMS to generate matrices of order n with the smallest singular value being 2.0e-7. We believe this set to be representative of matrices that can be encountered in practice. We report in this section on results for matrices of size n = 1000, noting that identical qualitative behavior was observed for smaller matrix sizes. We decided to report on the largest matrix sizes because the possibility for failure in general increases with the number of numerical steps involved. Numerical results obtained on the three platforms agreed to machine precision. For this case, we list in Table 1 the numerical rank r with respect to a condition threshold of = 105, the largest singular value max , the r-th singular value r , the (r + 1)st singular value r+1 , and the smallest singular value min for our test cases. Figures 8 and 9 display the ratio 1= := (b(R r ) ; (22) r) where b(R) as de ned in (14) is the computed estimate of the condition number of R after DGEQPB (Figure 8) and DGEQPX and DGEQPY (Figure 9). Thus, is the ratio between the ideal condition number and the estimate of the condition number of the leading triangular factor identi ed in the RRQR factorization. If this ratio is close to 1, and b is a good condition estimate, our RRQR factorizations do a good job of capturing the \large" singular values of A. Since the pivoting strategy (and hence the numerical behavior of DGEQPB) is potentially a ected by the block size chosen, Figures 8 and 9 contain seven panels, each of which shows the results obtained with the 18 test matrices and a block size ranging from 1 to 4.1 Numerical Reliability 15 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Description r max r r+1 min Matrix with rank min(2m;n) ? 1 499 1.0e0 1.0e0 2.0e-7 1.2e-19 A(:; 2 : min(m; n)) has full rank 999 1.0e0 5.0e-4 6.7e-19 6.7e-19 R(A) = R(A(:; 2 : min(m; n))) Full rank 1000 1.0e0 5.0e-4 5.0e-4 5.0e-4 A(:; 1 : 3) small in norm 997 2.9e+1 5.0e-4 2.4e-4 4.2e-5 A(:; 4 : n) of full rank A(:; 1 : 3) small in norm 3 1.0e0 5.0e-4 5.5e-14 7.6e-21 R(A) = R(A(:; 1 : 3)) 5 smallest sing. values clustered 1000 1.0e0 7.0e-4 7.0e-4-3 7.0e-4 Break1 distribution 501 1.0e0 5.0e-4 1.7e-15 1.0e-26 Reversed break1 distribution 501 1.0e0 5.0e-4 1.7e-15 1.2e-27 Geometric distribution 501 1.0e0 5.0e-4 3.3e-16 1.9e-35 Reversed geometric distribution 501 1.0e0 5.0e-4 3.2e-16 5.4e-35 Arithmetic distribution 501 1.0e0 5.0e-4 9.7e-16 1.4e-34 Reversed arithmetic distribution 501 1.0e0 5.0e-4 9.7e-16 1.2e-34 Break1 distribution 999 1.0e0 1.0e0 2.0e-7 2.0e-7 Reversed break1 distribution 999 1.0e0 1.0e0 2.0e-7 2.0e-7 Geometric distribution 746 1.0e0 5.0e-5 9.9e-6 2.0e-7 Reversed geometric distribution 746 1.0e0 5.0e-5 9.9e-6 2.0e-7 Arithmetic distribution 999 1.0e0 1.0e-1 2.0e-7 2.0e-7 Reversed arithmetic distribution 999 1.0e0 1.0e-1 2.0e-7 2.0e-7 Table 1. Test Matrix Types (r = rank for n = 1000) 24 (shown in the top of each panel). We see that except for matrix type 1 in Figure 8, the block size does not play much of a rule numerically, although close inspection reveals subtle variations in both Figure 8 and 9. With block size 1, DGEQPB just becomes the standard Golub pivoting strategy. Thus, the rst panel in Figure 8 corroborates the experimentally robust behavior of this algorithm. We also see that except for matrix type 1, the restricted pivoting strategy employed in DGEQPB does not have much impact on numerical behavior. For matrix type 1, however, it performs much worse. Matrix 1 is constructed by generating n ? 1 independent columns and generating the leading 2 n + 1 as random linear combinations of those columns, scaled by 1 , where is the 4 2 machine precision. Thus, the restricted pivoting strategy, in its myopic view of the matrix, gets stuck (so to speak) in these columns. The postprocessing of these approximate RRQR factorizations, on the other hand, remedies potential shortcomings in the preprocessing step. As can be seen from Figure 9, the inaccurate factorization of matrix 1 is corrected, while the other (in essence correct) factorizations get improved only slightly. Except for small variations, DGEQPX and DGEQPY deliver identical results. We also computed the exact condition number of the leading triangular submatrices identi ed in the triangularizations by DGEQPB, DGEQPX, and DGEQPY, and compared it with our condition estimate. Figure 10 shows the ratio of the exact condition number to the estimated condition number of the leading triangular 16 10 1 1 10 0 5 8 12 16 20 24 Optimal cond_no. / Estimated cond_no. 10 -1 10 -2 10 -3 10 -4 10 -5 10 -6 20 40 60 Tests 80 100 120 Fig. 8. Ratio between Optimal and Estimated Condition Number for DGEQPB 10 1 1 5 8 12 16 20 24 Optimal cond_no. / Estimated cond_no. 10 0 10 -1 10 -2 -- QPX .. QPY -3 10 20 40 60 Tests 80 100 120 Fig. 9. Ratio between Optimal and Estimated Condition Number for DGEQPX (solid line) and DGEQPY (dashed) 17 10 1 1 5 8 12 16 20 24 Exact cond_of_R11 / Estimated cond_of_R11 10 0 10 -1 -- QPB -. QPX ... QPY 10 -2 20 40 60 Tests 80 100 120 Fig. 10. Ratio between Exact and Estimated Condition Number of Leading Triangular Factor for DGEQPB (dashed), DGEQPX (dashed-dotted), and DGEQPY (dotted) factor. We observe excellent agreement, within an order of magnitude in all cases. Hence, the \spikes" for test matrices 13 and 14 in Figures 8 and 9 are not due to errors in our estimators. Rather, they show that all algorithms have di culties when confronted with dense clusters of singular values. We also note that in this context, the notion of rank is numerically ill de ned, since there is no sensible place to draw the line. The \rank" derived via the SVD is 746 in both cases, and our algorithms deliver estimates between 680 and 710, with minimal changes in the condition number of their corresponding leading triangular factors. In summary, these results show that DGEQPX and DGEQPY are reliable algorithms for revealing numerical rank. They produce RRQR factorizations whose leading triangular factors accurately capture the desired part of the spectrum of A, and thus reliable and numerically sensible rank estimates. Thus, the RRQR factorization takes advantage of the e ciency and simplicity of the QR factorization, yet it produces information that is almost as reliable as that computed by means of the more expensive singular value decomposition. In this section we report on the performance of the LAPACK codes DGEQPF and DGEQRF as well as the new DGEQPB, DGEQPX, and DGEQPY codes. For these codes, as well as all others presented in this section, the M op rate was obtained by dividing the number of operations required for the unblocked version of DGEQRF by the runtime. This normalized M op rate readily allows for timing comparisons. We report on matrix sizes 100, 250, 500, and 1000, using block sizes (nb ) of 1, 5, 8, 12, 16, 20, and 24. Figures 11 and 12 show the M op performance (averaged over the 18 matrix 4.2 Computing Performance 18 n = 100 50 65 n = 250 60 45 55 Performance (in Mflops) 40 Performance (in Mflops) 10 20 Block size 30 50 35 45 40 30 35 25 30 20 0 25 0 10 20 Block size 30 n = 500 75 70 65 75 n = 1000 70 65 Performance (in Mflops) 60 55 50 45 40 35 30 0 Performance (in Mflops) 10 20 Block size 30 60 55 50 45 40 35 0 10 20 Block size 30 Fig. 11. Performance versus Block Size on IBM RS/6000-370: DGEQPF ( ), DGEQRF (|), DGEQPB (- -), DGEQPX (- -x), DGEQPY (- -+) types) versus block size on the IBM and SGI platforms. The dotted line denotes the performance of DGEQPF, the solid one that of DGEQRF and the dashed one that of DGEQPB; the and + symbols indicate DGEQPX and DGEQPY, respectively. On all three machines, the performance of the two new algorithms for computing RRQR is robust with respect to variations in the block size. The two new block algorithms for computing RRQR factorization are, except for small matrices on the SGI, faster than LAPACK's DGEQPF for all matrix sizes. We note that the SGI has a data cache of 4 MB, while the IBM platform has only a 32 KB data cache. Thus, matrices up to order 500 t into the SGI cache, but matrices of order 1000 do not. Therefore, for matrices of size 500 or less, we observe limited bene ts from the better inherent data locality of the BLAS 3 implementationon this computer. These results also show that DGEQPX and DGEQPY exhibit comparable performance. Figures 13 through 14 o er a closer look at the performance of the various test 19 n = 100 100 160 150 90 140 80 n = 250 Performance (in Mflops) Performance (in Mflops) 10 20 Block size 30 130 120 110 100 90 70 60 50 40 80 30 0 70 0 10 20 Block size 30 n = 500 190 180 160 170 160 150 140 130 120 110 60 100 90 0 40 0 140 180 n = 1000 Performance (in Mflops) Performance (in Mflops) 10 20 Block size 30 120 100 80 10 20 Block size 30 Fig. 12. Performance versus Block Size on SGI R8000: DGEQPF ( ), DGEQRF (|), DGEQPB (- -), DGEQPX (- -x), DGEQPY (- -+) 20 60 50 Performance (in Mflops) 40 30 20 10 0 2 4 6 8 10 Matrix type 12 14 16 18 Fig. 13. Performance versus Matrix Type on an IBM RS/6000-370 for n = 250 and nb = 16: DGEQPF ( ), DGEQRF (|), DGEQPB (- -), DGEQPX (x), DGEQPY (+) matrices. We chose nb = 16 and n = 250 as a representative example. Similar behavior was observed in the other cases. We see that on the IBM platforms (Figure 13), the performance of DGEQRF and DGEQPF does not depend on the matrix type. We also see that, except for matrix types 1, 5, 15, and 16, the postprocessing of the initial approximate RRQR factorization takes very little time, with DGEQPX and DGEQPY performing similarly. For matrix type 1, considerable work is required to improve the initial QR factorization. For matrix types 5 and 15, the performances of DGEQPX and DGEQPY di er noticeably on the IBM platform, but there is no clear winner. We also note that matrix type 5 is suitable for DGEQPB, since the independent columns are up front and thus are revealed quickly, and the rest of the matrix is factored with DGEQRF. The SGI platform (Figure 14) o ers a di erent picture. The performance of all algorithms shows more dependence on the matrix type, and DGEQPB performs worse on matrix type 5 than on all others. Nonetheless, except for matrix 1, DGEQPX and DGEQPY do not require much postprocessing e ort. The pictures for other matrix sizes are similar. The cost for DGEQPX and DGEQPY decreases as the matrix size increases, except for matrix type 1, where it increases as expected. We also note that Figures 11 and 12 would have looked even more favorable for our algorithm had we omitted matrix 1 or chosen the median (instead of the average) performance. Figure 15 shows the percentage of the actual amount of ops spent in monitoring the rank in DGEQPB and in postprocessing the initial QR factorization for di erent matrix sizes on the IBM RS/6000. We show only matrix types 2 through 18, since the behavior of matrix type 1 is rather di erent: in this special case, roughly 21 140 120 Performance (in Mflops) 100 80 60 40 20 0 2 4 6 8 10 Matrix type 12 14 16 18 Fig. 14. Performance versus Matrix Type on an SGI R8000 for n = 250 and nb = 16: DGEQPF ( ), DGEQRF (|), DGEQPB (- -), DGEQPX (x), DGEQPY (+) DGEQPX 10 9 8 10 9 8 DGEQPY % in flops of pivoting 6 5 4 3 2 1 0 % in flops of pivoting 5 10 Matrix type 15 7 7 6 5 4 3 2 1 0 5 10 Matrix type 15 Fig. 15. Cost of Pivoting (in % of ops) versus Matrix Types of Algorithms DGEQPX and DGEQPY on an IBM RS/6000-370 for Matrix Sizes 100 (+), 250 (x), 500 (*) and 1000 (o). 50% of the overall ops is expended in the postprocessing. Note that the actual performance penalty due to these operations is, while small, still considerably higher than the op count suggests. This is not surprising given the relatively ne-grained nature of the condition estimation and postprocessing operations. One may wonder whether the use of DGEQRF to compute the initial QR factorization would lead to better results, since DGEQRF is the fastest QR factorization 22 algorithm. This is not the case, since DGEQRF does not provide any rank preordering, and thus performance gains from DGEQRF are annihilated in the postprocessing steps. For example, for matrices of order 250 on an IBM RS/6000-370, the average M op rate, excluding matrix 5, was 4.5, with a standard deviation of 1.4. The percentage of ops spent in postprocessing in these cases was on average 76.8%, with a standard deviation of 6.7. For matrix 5, we are lucky, since the matrix is of low rank and all independent columns are at the front of the matrix. Thus, we spend only 3% in postprocessing, obtaining a performance of 49.1 M ops overall. In all other cases, though, considerable e ort is expended in the postprocessing phase, leading to overall disappointing performance. These results show that the preordering done by DGEQPB is essential for the e ciency of the overall algorithm. In this paper, we presented rank-revealing QR (RRQR) factorization algorithms that combine an initial QR factorization employing a restricted pivoting scheme with postprocessing steps based on variants of algorithms suggested by Chandrasekaran and Ipsen and Pan and Tang. The restricted-pivoting strategy results in an initial QR factorization that is almost entirely based on BLAS-3 kernels, yet still achieves at a good approximation of an RRQR factorization most of the time. To guarantee the reliability of the initial RRQR factorization and improve it if need be, we improved an algorithm suggested by Pan and Tang, relying heavily on incremental condition estimation and \blocked" Givens rotation updates for computational e ciency. As an alternative, we implemented a version of an algorithm by Chandrasekaran and Ipsen, which among other improvements uses the f -factor technique suggested by Pan and Tang to avoid cycling in the presence of roundo errors. Numerical experiments on eighteen di erent matrix types with matrices ranging in size from 100 to 1000 on IBM RS/6000 and SGI R8000 platforms show that this approach produces reliable rank estimates while outperforming the (less reliable) QR factorization with column pivoting, the currently most common approach for computing an RRQR factorization of a dense matrix. We thank Xiaobai Sun, Peter Tang, and Enrique S. Quintana-Ort for stimulating discussions on the subject. References Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., DuCroz, J., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., and Sorensen, D. 1992a . Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., DuCroz, J., Greenbaum, A., Hammarling, S., McKenney, A., Ostrouchov, S., and Sorensen, D. 1994b . Bischof, C. H. 1989 . 5. CONCLUSIONS ACKNOWLEDGMENTS LAPACK User's Guide. SIAM, Philadelphia. LAPACK User's Guide Release 2.0. SIAM, Philadelphia. A block QR factorization algorithm using restricted pivoting. In Proceedings SUPERCOMPUTING '89 (Baltimore, Md., 1989), pp. 248{256. ACM Press. Bischof, C. H. 1990 . Incremental condition estimation. SIAM Journal on Matrix Analysis and Applications 11, 2, 312{322. 23 A parallel QR factorization algorithm with controlled local pivoting. SIAM Journal on Scienti c and Statistical Computing 12, 1, 36{57. Bischof, C. H. and Hansen, P. C. 1991 . Structure-preserving and rank-revealing QR factorizations. SIAM Journal on Scienti c and Statistical Computing 12, 6 (November), 1332{1350. Bischof, C. H. and Hansen, P. C. 1992 . A block algorithm for computing rank-revealing QR factorizations. Numerical Algorithms 2, 3-4, 371{392. Bischof, C. H. and Shroff, G. M. 1992 . On updating signal subspaces. IEEE Transactions on Signal Processing 40, 1, 96{105. Bischof, C. H. and Tang, P. T. P. 1991 . Robust incremental condition estimation. Preprint MCS-P225-0391, Mathematics and Computer Science Division, Argonne National Laboratory. Chan, T. F. 1987 . Rank revealing QR factorizations. Linear Algebra and Its Applications 88/89, 67{82. Chan, T. F. and Hansen, P. C. 1992 . Some applications of the rank revealing QR factorization. SIAM Journal on Scienti c and Statistical Computing 13, 3, 727{741. Chan, T. F. and Hansen, P. C. 1994 . Low-rank revealing QR factorizations. Numerical Linear Algebra and Applications 1, 1, 33{44. Chandrasekaran, S. and Ipsen, I. 1994 . On rank-revealing QR factorizations. SIAM Journal on Matrix Analysis and Applications 15, 2, 592{622. Dongarra, J. J., Bunch, J. R., Moler, C. B., and Stewart, G. W. 1979 . LINPACK Users' Guide. SIAM, Philadelphia. Elden, L. and Schreiber, R. 1986 . An application of systolic arrays to linear discrete ill-posed problems. SIAM Journal on Scienti c and Statistical Computing 7, 892{903. Foster, L. V. 1986 . Rank and null space calculations using matrix decomposition without column interchanges. Linear Algebra and Its Applications 74, 47{71. Golub, G. H. 1965 . Numerical methods for solving linear least squares problems. Numerische Mathematik 7, 206{216. Golub, G. H., Klema, V., and Stewart, G. W. 1976 . Rank degeneracy and least squares problems. Technical Report TR{456, University of Maryland, Dept. of Computer Science. Golub, G. H. and Loan, C. F. V. 1983 . Matrix Computations. The Johns Hopkins University Press, Baltimore. Golub, G. H. and Loan, C. F. V. 1989 . Matrix Computations (2nd ed.). The Johns Hopkins University Press, Baltimore. Golub, G. H., Manneback, P., and Toint, P. L. 1986 . A comparison between some direct and iterative methods for certain large scale geodetic least-squares problem. SIAM Journal on Scienti c and Statistical Computing 7, 799{816. Gragg, W. B. and Stewart, G. W. 1976 . A stable variant of the secant method for solving nonlinear equations. SIAM Journal on Numerical Analysis 13, 6, 889{903. Grandine, T. A. 1987 . An iterative method for computing multivariate C 1 piecewise polynomial interpolants. Computer Aided Geometric Design 4, 307{319. Grandine, T. A. 1989 . Rank de cient interpolation and optimal design: An example. Technical Report SCA{TR{113 (February), Boeing Computer Services, Engineering and Scienti c Services Division. Gu, M. and Eisenstat, S. 1992 . A stable and e cient algorithm for the rank-one modi cation of the symmmetric eigenproblem. Technical Report YALEU/DCS/RR-916, Yale University, Department of Computer Science. Hansen, P. C. 1990 . Truncated SVD solutions to discrete ill-posed problems with illdetermined numerical rank. SIAM Journal on Matrix Analysis and Applications 11, 3, 503 { 518. Hansen, P. C., Sekii, T., and Shibahashi, H. 1992 . The modi ed truncated SVD-method for regularization in general form. SIAM Journal on Scienti c Computing 13, 1142{1150. Higham, N. J. 1986 . E cient algorithms for computing the condition number of a tridiagonal matrix. SIAM Journal on Scienti c and Statistical Computing 7, 150{165. Bischof, C. H. 1991 . 24 The rank revealing QR decomposition and SVD. Mathematics of Computation 58, 213{232. Hsieh, S. F., Liu, K. J. R., and Yao, K. 1991 . Comparisons of truncated QR and SVD methods for AR spectral estimations. In R. J. Vaccaro Ed., SVD and Signal Processing II (Amsterdam, 1991), pp. 403{418. Elsevier Science Publishers. More, J. 1978 . The Levenberg-Marquardtalgorithm: Implementationand theory. In G. A. Watson Ed., Proceedings of the Dundee Conference on Numerical Analysis (Berlin, 1978). Springer Verlag. Pan, C.-T. and Tang, P. T. P. 1992 . Bounds on singular values revealed by QR factorizaton. Technical Report MCS-P332-1092, Mathematics and Computer Science Division, Argonne National Laboratory. Quintana-Ort , G. and Quintana-Ort , E. S. 1996 . Guaranteeing termination of Chandrasekaran & Ipsen's algorithm for computing rank-revealing QR factorizations. Preprint MCS-P564-0196, Mathematics and Computer Science Division, Argonne National Laboratory. Quintana-Ort , G., Sun, X., and Bischof, C. H. 1995 . A BLAS-3 version of the QR factorization with column pivoting. Preprint MCS-P551-1295, Mathematics and Computer Science Division, Argonne National Laboratory. Reichel, L. and Gragg, W. 1990 . Fortran subroutines for updating the QR factorization. ACM Transactions on Mathematical Software 16, 369{377. Schreiber, R. and Van Loan, C. 1989 . A storage e cient WY representation for products of Householder transformations. SIAM Journal on Scienti c and Statistical Computing 10, 1, 53{57. Stewart, G. W. 1984 . Rank degeneracy. SIAM Journal on Scienti c and Statistical Computing 5, 403{413. Stewart, G. W. 1990 . An updating algorithm for subspace tracking. Technical Report CS-TR-2494, University of Maryland, Department of Computer Science. Stewart, G. W. 1993 . Determining rank in the presence of error. Technical Report CSTR-2972, Dept. of Computer Science, University of Maryland. Walden, B. 1991 . Using a fast signal processor to solve the inverse kinematic problem with special emphasis on the singularity problem. Ph. D. thesis, Linkoping University, Dept. of Mathematics. Hong, Y. P. and Pan, C.-T. 1992 . ...
View Full Document

## This note was uploaded on 01/14/2011 for the course ECE 210a taught by Professor Chandrasekara during the Fall '08 term at UCSB.

Ask a homework question - tutors are online