

shared / shared matmult

•  A and B in global memory
•  2D grid, each 16x16 thread block computes a 16x16 C block
   –  coalesced fetch of a 16x16 A block into shared memory
   –  coalesced fetch of a 16x16 B block into shared memory
   –  each thread computes one inner product, adding it to the one C element it is responsible for
   –  etcetera
   –  initial code from CUDA Programming Guide

[Figure: matrices A, B, and C, with block Cij marked in C]

struct Matrix // Matrices are st...
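
The tiling scheme described above (16x16 thread blocks that stage A and B blocks through shared memory) might be sketched roughly as follows. This is not the CUDA Programming Guide listing the slides refer to. The kernel name matmulShared, the host helper runTiledMatmul, and the restriction to square n x n row-major matrices with n a multiple of 16 are all assumptions made for illustration.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // 16x16 thread block, matching the block size on the slide

// Hypothetical kernel: C = A * B for square n x n row-major matrices.
// Assumes n is a multiple of TILE, so no bounds checks are needed.
__global__ void matmulShared(const float* A, const float* B, float* C, int n)
{
    __shared__ float As[TILE][TILE];  // current 16x16 block of A
    __shared__ float Bs[TILE][TILE];  // current 16x16 block of B

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C owned by this thread
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C owned by this thread
    float acc = 0.0f;

    for (int t = 0; t < n / TILE; ++t) {
        // Threads with consecutive threadIdx.x read consecutive addresses,
        // so both global-memory fetches are coalesced.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();  // both tiles fully loaded before anyone reads them

        // Partial inner product over this pair of tiles.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish reading before the next iteration overwrites
    }
    C[row * n + col] = acc;  // each thread writes its one C element
}

// Hypothetical host helper: runs the kernel on small integer-valued inputs and
// returns the maximum absolute difference from a CPU reference (-1 on CUDA error).
float runTiledMatmul(int n)
{
    size_t count = (size_t)n * n, bytes = count * sizeof(float);
    std::vector<float> hA(count), hB(count), hC(count);
    for (size_t i = 0; i < count; ++i) {
        hA[i] = (float)(i % 7);
        hB[i] = (float)(i % 5);
    }

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    if (cudaMalloc((void**)&dA, bytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void**)&dB, bytes) != cudaSuccess) { cudaFree(dA); return -1.0f; }
    if (cudaMalloc((void**)&dC, bytes) != cudaSuccess) { cudaFree(dA); cudaFree(dB); return -1.0f; }
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);          // 16x16 threads per block
    dim3 grid(n / TILE, n / TILE);   // 2D grid, one block per 16x16 C block
    matmulShared<<<grid, block>>>(dA, dB, dC, n);
    cudaError_t err = cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (err != cudaSuccess) return -1.0f;

    // CPU reference check.
    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += hA[i * n + k] * hB[k * n + j];
            float d = hC[i * n + j] - ref;
            if (d < 0.0f) d = -d;
            if (d > maxErr) maxErr = d;
        }
    return maxErr;
}
```

Calling runTiledMatmul(64) from a main function exercises a 4x4 grid of blocks. Because the test inputs are small integers, every partial sum is exactly representable in single precision, so the GPU and CPU results should agree exactly. Staging through shared memory means each element of A and B is read from global memory once per tile rather than once per thread that uses it.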