cs240a-cilkmatrixhyperobject

1 CS 240A: Numerical Examples in Shared Memory with Cilk++
  Matrix-matrix multiplication
  Matrix-vector multiplication
  Hyperobjects
  Thanks to Charles E. Leiserson for some of these slides.

2 Work and Span (Recap)
  T_P = execution time on P processors
  T_1 = work
  T_∞ = span*
  Speedup on P processors: T_1/T_P
  Parallelism: T_1/T_∞
  * Also called critical-path length or computational depth.

3 Cilk Loops: Divide and Conquer
  Vector addition:
    cilk_for (int i=0; i<n; ++i) {
      A[i] += B[i];
    }
  Implementation: divide and conquer over the iteration range, with grain size G.
  Assume that G = Θ(1).
  Work: T_1 = Θ(n)
  Span: T_∞ = Θ(lg n)
  Parallelism: T_1/T_∞ = Θ(n/lg n)
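
The divide-and-conquer implementation of the loop can be sketched as follows. This is an illustration in plain C++ so that it compiles without a Cilk compiler, not the actual Cilk++ runtime code. The names vector_add and vector_add_range and the grain parameter are hypothetical. The comments mark where a Cilk runtime would spawn and sync; here both halves simply run one after the other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: recursively halve the index range [lo, hi) until at
// most `grain` iterations remain, then run those iterations serially.
void vector_add_range(std::vector<double>& A, const std::vector<double>& B,
                      std::size_t lo, std::size_t hi, std::size_t grain) {
    if (hi - lo <= grain) {
        for (std::size_t i = lo; i < hi; ++i) {
            A[i] += B[i];
        }
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    // In Cilk++ the next call would be prefixed with cilk_spawn.
    vector_add_range(A, B, lo, mid, grain);
    vector_add_range(A, B, mid, hi, grain);
    // In Cilk++ a cilk_sync would go here, before returning.
}

// Hypothetical entry point: A[i] += B[i] for every index i.
void vector_add(std::vector<double>& A, const std::vector<double>& B,
                std::size_t grain) {
    assert(A.size() == B.size());
    assert(grain >= 1);  // a grain of 0 would never reach the base case
    vector_add_range(A, B, 0, A.size(), grain);
}
```

With a constant grain size the recursion tree has depth Θ(lg n), which is where the Θ(lg n) span comes from. A call such as vector_add(A, B, 1) on two equal-length vectors adds B into A element by element.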

4 Square-Matrix Multiplication
  C = A · B, where C = (c_{ij}), A = (a_{ij}), and B = (b_{ij}) are n × n matrices.
  c_{ij} = Σ_{k=1}^{n} a_{ik} b_{kj}
  Assume for simplicity that n = 2^k.

5 Parallelizing Matrix Multiply
    cilk_for (int i=0; i<n; ++i) {
      cilk_for (int j=0; j<n; ++j) {
        for (int k=0; k<n; ++k) {
          C[i][j] += A[i][k] * B[k][j];
        }
      }
    }
  Work: T_1 = Θ(n^3)
  Span: T_∞ = Θ(n)
  Parallelism: T_1/T_∞ = Θ(n^2)
  For 1000 × 1000 matrices, parallelism ≈ (10^3)^2 = 10^6.
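
A runnable version of the triple loop, as a minimal sketch in plain C++. It assumes flat row-major storage in std::vector rather than the two-dimensional arrays on the slide, and the name matmul_accumulate is hypothetical. Comments mark the two loops that the slide runs as cilk_for; here they run serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: C += A * B for n x n matrices stored row-major in
// flat vectors, so element (i, j) lives at index i * n + j.
void matmul_accumulate(std::vector<double>& C, const std::vector<double>& A,
                       const std::vector<double>& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {          // cilk_for on the slide
        for (std::size_t j = 0; j < n; ++j) {      // cilk_for on the slide
            for (std::size_t k = 0; k < n; ++k) {  // serial on the slide
                C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}
```

Each (i, j) pair writes a distinct entry of C, so the two outer loops are free of races. The k loop accumulates into a single entry, so running it in parallel without further care would introduce a race on that entry. The serial k loop is what contributes the Θ(n) term to the span.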

6 Recursive Matrix Multiplication
  Divide and conquer:
    [ C11 C12 ]   [ A11 A12 ]   [ B11 B12 ]
    [ C21 C22 ] = [ A21 A22 ] · [ B21 B22 ]

                  [ A11B11 A11B12 ]   [ A12B21 A12B22 ]
                = [ A21B11 A21B12 ] + [ A22B21 A22B22 ]
  8 multiplications of n/2 × n/2 matrices.
  1 addition of n × n matrices.

7
    template <typename T>
    void MMult(T *C, T *A, T *B, int n) {
      T * D = new T[n*n];
      //
      cilk_spawn MMult(C11, A11, B11, n/2);
      cilk_spawn MMult(C12, A11, B12, n/2);
      cilk_spawn MMult(C22, A21, B12, n/2);
      cilk_spawn MMult(C21, A21, B11, n/2);
      cilk_spawn MMult(D11, A12, B21, n/2);
      cilk_spawn MMult(D12, A12, B22, n/2);
      cilk_spawn MMult(D22, A22, B22, n/2);
                 MMult(D21, A22, B21, n/2);
      cilk_sync;
      MAdd(C, D, n);  // C += D;
    }
  Callouts:
    Row/column length of matrices
    Determine submatrices by index calculation
    Coarsen for efficiency
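
One way the index calculation and the coarsened base case could look, as a hedged sketch in plain C++. The slide's signature has no stride arguments; the row strides ldc, lda, and ldb, the cutoff kThreshold, and the use of std::vector for the temporary D are all assumptions made here so that quadrants of a row-major matrix can be addressed by pointer arithmetic. The eight recursive calls keep the slide's order; in Cilk++ the first seven would be spawned.

```cpp
#include <cassert>
#include <vector>

const int kThreshold = 2;  // hypothetical cutoff for the coarsened base case

// Computes C = A * B for n x n blocks stored row-major. Each pointer addresses
// the top-left element of its block; ldc, lda, ldb are the row strides.
template <typename T>
void MMult(T* C, int ldc, const T* A, int lda, const T* B, int ldb, int n) {
    assert(n >= 1 && (n & (n - 1)) == 0);  // n is assumed to be a power of two
    if (n <= kThreshold) {
        // Base case: ordinary triple loop on a small block.
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                T sum = T();
                for (int k = 0; k < n; ++k) {
                    sum += A[i * lda + k] * B[k * ldb + j];
                }
                C[i * ldc + j] = sum;
            }
        }
        return;
    }
    const int h = n / 2;
    std::vector<T> D(static_cast<std::size_t>(n) * n);  // temporary, stride n
    // Submatrices by index calculation: offset h columns and/or h rows.
    T* C11 = C;            T* C12 = C + h;
    T* C21 = C + h * ldc;  T* C22 = C + h * ldc + h;
    const T* A11 = A;            const T* A12 = A + h;
    const T* A21 = A + h * lda;  const T* A22 = A + h * lda + h;
    const T* B11 = B;            const T* B12 = B + h;
    const T* B21 = B + h * ldb;  const T* B22 = B + h * ldb + h;
    T* D11 = D.data();          T* D12 = D.data() + h;
    T* D21 = D.data() + h * n;  T* D22 = D.data() + h * n + h;

    // The first seven calls would carry cilk_spawn in Cilk++.
    MMult(C11, ldc, A11, lda, B11, ldb, h);
    MMult(C12, ldc, A11, lda, B12, ldb, h);
    MMult(C22, ldc, A21, lda, B12, ldb, h);
    MMult(C21, ldc, A21, lda, B11, ldb, h);
    MMult(D11, n, A12, lda, B21, ldb, h);
    MMult(D12, n, A12, lda, B22, ldb, h);
    MMult(D22, n, A22, lda, B22, ldb, h);
    MMult(D21, n, A22, lda, B21, ldb, h);
    // cilk_sync would go here in Cilk++.

    // MAdd step: C += D.
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            C[i * ldc + j] += D[i * n + j];
        }
    }
}
```

For a full n × n matrix stored contiguously, all three strides equal n, for example MMult(C, n, A, n, B, n, n). The eight products write to disjoint quadrants of C and D, which is why they can run in parallel before the single addition.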


This note was uploaded on 12/27/2011 for the course CMPSC 240A taught by Professor Gilbert during the Fall '09 term at UCSB.
