Two sided Matrix Algorithms

The triangular and the rectangular parts separately

This preview shows pages 13–14. Sign up to view the full content.

the triangular and the rectangular parts separately is penalized by the extra function call and matrix setup. Subroutine setup costs are minimized by the use of our superscalar GEMM routine for problems that fit into the level-1 cache, that is, problems solved by the tr* subsystem solvers. In the algorithm, this is controlled by the parameter blks , the same parameter used in the recursive algorithms to switch from the recursive blocked algorithms to the subsystem solvers. In the implementation, fix-up code for handling any 2 × 2 diagonal blocks of the quasitriangular matrices is implemented using level-1 BLAS routines. For the symmetric case, ˜ C C - AXA T , X = X T , and C = C T , the SLICOT MB01RD subroutine is used [SLICOT 2001]. It reduces the complexity from 2 M 3 to (3 / 2) M 3 , by calculating the upper triangular part of the symmetric matrix triu( ˜ C ) = V + triu( B ) + stril( B ) T + diag( V ) + diag(triu( B )), where B = ATA T , T = triu( X ) - 1 2 diag( X ), and V = triu( C ) - 1 2 diag( C ), and triu, stril, and diag denote the upper triangular, the strictly lower triangular, and the diagonal parts, respectively. The MB01RD routine is implemented using level-3 BLAS routines and requires no workspace. However, for some problems, the complexity can be lowered even more, at the cost of extra workspace. In rtrglydt , Algorithm 2, the temporary result W = A 12 X 22 in the operation C 12 = axb ( - A 12 X 22 A T 22 ) can be reused in the oper- ation C 11 = axb ( - A 12 X 22 A T 22 ), and analogously for the E matrix. Although this triples the workspace needed, it lowers the overall complexity of the algorithm from (25 / 6) N 3 to (23 / 6) N 3 . 5. PERFORMANCE RESULTS OF RECURSIVE BLOCKED ALGORITHMS The recursive blocked algorithms for solving two-sided triangular matrix equa- tions have been implemented in Fortran 90, using the facilities for recursive calls of subprograms, dynamic memory allocation, and threads. In this sec- tion, sample performance results of these implementations executing on IBM Power, MIPS, and Intel Pentium processor-based systems are presented. We have selected a few processors representing different vendors and different architecture characteristics and memory hierarchies. The details of the proces- sors, memory subsystems, and operating systems are all described in Part I [Jonsson and K˚agstr¨om 2002]. The only difference from Part I is the Power3 machine, which is running at 375 MHz, giving each processor a theoretical peak rate of 1500 Mflops/s. The performance results (measured in Mflops/s) are com- puted using the flopcounts presented in Table I of Section 3.6. The results are displayed in graphs and tables, where the performances of different variants of the recursive blocked implementations are visualized together with existing routines in the state-of-the-art SLICOT library [SLICOT 2001]. Moreover, the relative performances (measured as speedup) between different algorithms in- cluding sequential as well as parallel implementations are presented in tables.

This preview has intentionally blurred sections. Sign up to view the full version.

View Full Document
This is the end of the preview. Sign up to access the rest of the document.

{[ snackBarMessage ]}

What students are saying

• As a current student on this bumpy collegiate pathway, I stumbled upon Course Hero, where I can find study resources for nearly all my courses, get online help from tutors 24/7, and even share my old projects, papers, and lecture notes with other students.

Kiran Temple University Fox School of Business ‘17, Course Hero Intern

• I cannot even describe how much Course Hero helped me this summer. It’s truly become something I can always rely on and help me. In the end, I was not only able to survive summer classes, but I was able to thrive thanks to Course Hero.

Dana University of Pennsylvania ‘17, Course Hero Intern

• The ability to access any university’s resources through Course Hero proved invaluable in my case. I was behind on Tulane coursework and actually used UCLA’s materials to help me move forward and get everything together on time.

Jill Tulane University ‘16, Course Hero Intern