
// [...] stored in row-major order:
//   M[row, col] = *(M.elements + row * M.stride + col)
// A sub matrix takes the stride of the total matrix, avoiding copies.
// Matrix is really a MATRIX DESCRIPTOR. Stride is the number of elements
// (floats, since the formula above indexes a float*) from one element of
// the matrix to the element in the same column but one row down. It is
// present so that we may pad the rows in a matrix with additional
// elements in order to control word boundary alignment for the elements
// of the matrix.
typedef struct {
    int width;
    int height;
    int stride;
    float* elements;
} Matrix;

host
•  create host matrices h_A, h_B, h_C
•  create device matrices d_A, d_B, d_C
     memcpy h_A -> d_A
     memcpy h_B -> d_B
•  call device kernel
•  do timing
•  memcpy d_C -> h_C
•  check results

device kernel code

for (int m = 0; m < (A.width / BLOCK_SIZE); ++m) {
    // Get Asub and Bsub
    …
    // Notice: every thread declares shared_A and shared_B in shared memory
    // even though a thread block has only one shared_A and one shared_B
    __shared__ float shared_A[BLOCK_SIZE][BLOCK_SIZE];
    __shared__ float shared_B[BLOCK_SIZE][BLOCK_SIZE];

    // Each thread copies just one element of shared_A and one element of shared_B
    shared_A[thread_row][thread_col] = GetElement(Asub, thread_row, thread_co...
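The descriptor and the tile loop can be tried out on the CPU before moving to the GPU. What follows is a minimal sketch in plain C, not the course's kernel: the two inner loops over r and c stand in for the threads of one block, and the local arrays tile_A and tile_B stand in for shared_A and shared_B. Matrix, GetElement, Asub and Bsub come from the notes; GetSubMatrix, SetElement, TiledMatMul, the accumulator array and BLOCK_SIZE = 2 are assumptions made for illustration.

```c
#include <assert.h>

#define BLOCK_SIZE 2   /* assumed tile edge, kept tiny for the demo */

/* Matrix descriptor from the notes; stride is counted in floats. */
typedef struct {
    int width;
    int height;
    int stride;
    float *elements;
} Matrix;

/* M[row, col] = *(M.elements + row * M.stride + col) */
float GetElement(Matrix M, int row, int col) {
    return M.elements[row * M.stride + col];
}

/* Hypothetical helper: write one element through the descriptor. */
void SetElement(Matrix M, int row, int col, float value) {
    M.elements[row * M.stride + col] = value;
}

/* Hypothetical helper: BLOCK_SIZE x BLOCK_SIZE view of tile (row, col).
 * No data is copied; the view keeps the parent's stride. */
Matrix GetSubMatrix(Matrix M, int row, int col) {
    Matrix sub;
    sub.width = BLOCK_SIZE;
    sub.height = BLOCK_SIZE;
    sub.stride = M.stride;
    sub.elements = M.elements + M.stride * BLOCK_SIZE * row + BLOCK_SIZE * col;
    return sub;
}

/* CPU emulation of the tiled product C = A * B. */
void TiledMatMul(Matrix A, Matrix B, Matrix C) {
    assert(A.width == B.height);
    assert(C.height == A.height && C.width == B.width);
    assert(A.width % BLOCK_SIZE == 0);
    assert(C.height % BLOCK_SIZE == 0 && C.width % BLOCK_SIZE == 0);

    for (int block_row = 0; block_row < C.height / BLOCK_SIZE; ++block_row) {
        for (int block_col = 0; block_col < C.width / BLOCK_SIZE; ++block_col) {
            Matrix Csub = GetSubMatrix(C, block_row, block_col);
            /* One accumulator per emulated thread. */
            float acc[BLOCK_SIZE][BLOCK_SIZE] = {{0.0f}};

            for (int m = 0; m < (A.width / BLOCK_SIZE); ++m) {
                Matrix Asub = GetSubMatrix(A, block_row, m);
                Matrix Bsub = GetSubMatrix(B, m, block_col);
                float tile_A[BLOCK_SIZE][BLOCK_SIZE];
                float tile_B[BLOCK_SIZE][BLOCK_SIZE];

                /* Load phase: each (r, c) copies one element of each tile. */
                for (int r = 0; r < BLOCK_SIZE; ++r) {
                    for (int c = 0; c < BLOCK_SIZE; ++c) {
                        tile_A[r][c] = GetElement(Asub, r, c);
                        tile_B[r][c] = GetElement(Bsub, r, c);
                    }
                }
                /* Compute phase: runs only after the whole tile is loaded. */
                for (int r = 0; r < BLOCK_SIZE; ++r) {
                    for (int c = 0; c < BLOCK_SIZE; ++c) {
                        for (int e = 0; e < BLOCK_SIZE; ++e) {
                            acc[r][c] += tile_A[r][e] * tile_B[e][c];
                        }
                    }
                }
            }
            /* Each emulated thread writes its one result element. */
            for (int r = 0; r < BLOCK_SIZE; ++r) {
                for (int c = 0; c < BLOCK_SIZE; ++c) {
                    SetElement(Csub, r, c, acc[r][c]);
                }
            }
        }
    }
}
```

Because the load loops finish before the compute loops start, the sequential version needs no synchronization. In a real kernel the threads of a block run concurrently, so a barrier such as __syncthreads() would be needed between the two phases and again before the next tile overwrites shared memory.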
This note was uploaded on 02/12/2014 for the course CS 475 taught by Professor Staff during the Fall '08 term at Colorado State.
