

... br . bc using a thread block of tr . tc threads

•  We cannot choose m, k, and n, but we can choose br and bc, and tr and tc
•  How does the complexity of the algorithm change with br, bc, tr, tc?

[Figure: A is m x k, B is k x n, C is m x n. Block Cij (br x bc) of C is computed from row block Ai* (br rows of A) and column block B*j (bc columns of B).]

Each thread block with blockIdx = (i,j) computes block Cij of size br . bc

Each thread block
- computes k.br.bc multiply-adds
- communicates k.br + k.bc values

The whole computation
- takes m.n / (br.bc) C blocks
- computes m.k.n multiply-adds
- communicates m.k.n (br+bc) / (br.bc) values

We cannot optimize the computes, but we can optimize the communicates.

Communication
•  ...
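The counts above can be checked numerically with a small sketch. It is not part of the original slides. The function name `blocked_matmul_cost` and the 1024 x 1024 example sizes are made up for illustration, and the sketch assumes that br divides m and bc divides n. It only evaluates the cost model and does not multiply any matrices.

```python
def blocked_matmul_cost(m, k, n, br, bc):
    """Evaluate the slide's cost model for C (m x n) = A (m x k) * B (k x n).

    Returns (num_blocks, multiply_adds, values_communicated).
    Hypothetical helper; assumes br divides m and bc divides n.
    """
    if m % br or n % bc:
        raise ValueError("this sketch assumes br divides m and bc divides n")
    # One thread block per br x bc block of C.
    num_blocks = (m // br) * (n // bc)
    # Each block does k.br.bc multiply-adds ...
    madds_per_block = k * br * bc
    # ... and reads a br x k strip of A plus a k x bc strip of B.
    comm_per_block = k * br + k * bc
    return num_blocks, num_blocks * madds_per_block, num_blocks * comm_per_block


if __name__ == "__main__":
    m = k = n = 1024  # illustrative sizes, not from the slides
    for br, bc in [(1, 1), (16, 16), (32, 32), (64, 16)]:
        blocks, madds, comm = blocked_matmul_cost(m, k, n, br, bc)
        print(f"br={br:3d} bc={bc:3d} blocks={blocks:8d} "
              f"madds={madds} comm={comm} madds/comm={madds / comm:.2f}")
```

In this model the multiply-add total stays at m.k.n for every block shape, while the communicated total shrinks as br and bc grow. For a fixed block area br.bc, a square block gives the smallest value of (br+bc)/(br.bc).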