18.335 Fall 2008
Performance Experiments
with Matrix Multiplication
Steven G. Johnson
Hardware: 2.66GHz Intel Core 2 Duo
64-bit mode, double precision, gcc 4.1.2
optimized BLAS dgemm: ATLAS 3.6.0
http://math-atlas.sourceforge.net/
A trivial problem?
Cp =mA
18.335 Problem Set 4 Solutions
Problem 1: Hessenberg ahead! (10+10+10 points)
(a) (Essentially the same recurrence is explained in equation 30.8 in the text.) You can derive this
recurrence relation by recalling the expression for determinants in terms o
18.335 Problem Set 3 Solutions
Problem 1: SVD and low-rank approximations (5+10+10+10 pts)
(a) $A = QR$, where the columns of $Q$ are orthonormal and hence $Q^* Q = I$. Therefore,
$A^* A = (QR)^* (QR) = R^* (Q^* Q) R = R^* R$. But the singular values of $A$ and $R$ are the sq
18.335 Problem Set 2 Solutions
Problem 1: Floating-point
(a) The smallest integer that cannot be exactly represented is $n = \beta^t + 1$ (for base-$\beta$ with a $t$-digit
mantissa). You might be tempted to think that $\beta^t$ cannot be represented, since a $t$-digit number, a
18.335 Problem Set 1 Solutions
Problem 1: Gaussian elimination
The inner loop of LU, the loop over rows, subtracts from each row a different multiple of the pivot
row. But this is exactly a rank-1 update $U \leftarrow U - x y^T$, where $x$ is the column-vector of multiplier
18.335 Midterm Solutions, Fall 2010
Problem 1: SVD Stability (20 points)
Consider the problem of computing the SVD $A = U \Sigma V^*$ from a matrix $A$ (the input). In this case, we are
computing the function $f(A) = (U, \Sigma, V)$: the outputs of the SVD are 3 matric
18.335 Mid-term Exam (Fall 2009)
Problem 1: Caches and QR (30 pts)
In class, we learned the Gram-Schmidt and modified Gram-Schmidt algorithms
to form the (reduced) $A = QR$ factorization of an $m \times n$ matrix $A$ (with
independent columns $a_1, a_2, \ldots$ and $n \le m$).
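
For reference, modified Gram-Schmidt can be sketched in a few lines of pure Python. This is only an illustrative sketch, not code from the exam: it assumes a real matrix stored as a list of rows with linearly independent columns, and the function name `modified_gram_schmidt` is made up for this example.

```python
import math

def modified_gram_schmidt(A):
    """Reduced QR factorization of a real m x n matrix A (list of rows, n <= m).

    Assumes the columns of A are linearly independent.
    Returns (Q, R): Q is m x n with orthonormal columns, R is n x n upper triangular.
    """
    m, n = len(A), len(A[0])
    # V[j] holds the j-th column of A; it is orthogonalized in place.
    V = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    Qcols = [None] * n
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Normalize the current (already orthogonalized) column.
        R[i][i] = math.sqrt(sum(x * x for x in V[i]))
        Qcols[i] = [x / R[i][i] for x in V[i]]
        # Immediately remove the q_i component from all remaining columns.
        # This ordering is what distinguishes modified from classical Gram-Schmidt.
        for j in range(i + 1, n):
            R[i][j] = sum(q * v for q, v in zip(Qcols[i], V[j]))
            V[j] = [v - R[i][j] * q for q, v in zip(Qcols[i], V[j])]
    # Convert the stored columns back to a list of rows.
    Q = [[Qcols[j][i] for j in range(n)] for i in range(m)]
    return Q, R
```

Calling `modified_gram_schmidt([[1, 1], [1, 0], [0, 1]])` returns a 3 x 2 `Q` and a 2 x 2 upper-triangular `R` whose product reproduces the input up to rounding.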
18.335 Midterm Solutions
Problem 1: Schur, backsubstitution, complexity (20 points)
You are given matrices $A$ ($m \times m$), $B$ ($n \times n$), and $C$ ($m \times n$), and want to solve for an unknown matrix $X$
($m \times n$) solving:
$AX - XB = C$.
We will do this using the Schur decompositions o
Experiments with
Cache-Oblivious
Matrix Multiplication
for 18.335
Steven G. Johnson
MIT Applied Math
platform: 2.66GHz Intel Core 2 Duo,
GNU/Linux + gcc 4.1.2 (-O3) (64-bit), double precision
(optimal) Cache-Oblivious
Matrix Multiply
$C = A B$, where $C$ is $m \times p$, $A$ is $m \times n$, and $B$ is $n \times p$
divi
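
A common way to make matrix multiplication cache-oblivious is divide and conquer: split the largest of the three dimensions in half and recurse until the blocks are small. The pure-Python sketch below illustrates only that recursion pattern. It is not the code used for these experiments; the function names and the `base` cutoff are assumptions made for this example.

```python
def _matmul_add(C, A, B, i0, i1, j0, j1, k0, k1, base):
    """Accumulate C[i0:i1, j0:j1] += A[i0:i1, k0:k1] * B[k0:k1, j0:j1]."""
    m, n, p = i1 - i0, k1 - k0, j1 - j0
    if max(m, n, p) <= base:
        # Base case: a small block, multiplied with plain loops.
        for i in range(i0, i1):
            for k in range(k0, k1):
                a = A[i][k]
                for j in range(j0, j1):
                    C[i][j] += a * B[k][j]
        return
    if m >= n and m >= p:
        # Split the rows of A (and of C).
        mid = i0 + m // 2
        _matmul_add(C, A, B, i0, mid, j0, j1, k0, k1, base)
        _matmul_add(C, A, B, mid, i1, j0, j1, k0, k1, base)
    elif n >= p:
        # Split the shared inner dimension; both halves add into the same C block.
        mid = k0 + n // 2
        _matmul_add(C, A, B, i0, i1, j0, j1, k0, mid, base)
        _matmul_add(C, A, B, i0, i1, j0, j1, mid, k1, base)
    else:
        # Split the columns of B (and of C).
        mid = j0 + p // 2
        _matmul_add(C, A, B, i0, i1, j0, mid, k0, k1, base)
        _matmul_add(C, A, B, i0, i1, mid, j1, k0, k1, base)

def cache_oblivious_matmul(A, B, base=16):
    """Return C = A B for A (m x n) and B (n x p), both given as lists of rows."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    _matmul_add(C, A, B, 0, m, 0, p, 0, n, base)
    return C
```

The recursion never refers to a cache size: blocks simply keep shrinking until they fit in whatever cache is present. The `base` cutoff here only limits recursion overhead and is not tuned to any hardware.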
[Diagram: CPU, ideal cache holding Z items, main memory]
cache hit: CPU needs item in cache (fast)
cache miss: CPU needs item not in cache
item loaded into cache for future use, replacing some other item
optimal replacement: on cache miss, loaded item replaces item that will