Shared / shared matmult

• Communication takes m·k·n·(br + bc) / (br·bc)
• Some extremes:
  – br = m and bc = n (1 thread block): comms = m·k + n·k (like sequential matmult)
  – br = 1 and bc = 1 (fine grain): comms = 2·m·k·n (one order higher: O(n³) instead of O(n²) when m = k = n)

Optimizing Communication
• What is the optimum Cij tile shape? Minimize (br + bc) / (br·bc).
  – For a fixed br + bc, write br = s + h and bc = s - h:
    br + bc = 2·s, br·bc = s² - h²
  – The ratio 2·s / (s² - h²) is minimal when h = 0 ⇒ square tile!

[Figure: shared / shared matmult. Tile Cij of C is computed from row block Ai* of A and column block B*j of B.]
• A and B in global memory
• Blocks of A and B copied into shared memory
• 2D grid of 2D thread blocks; each 16x16 thread block computes a 16x16 C block (tr = br, tc = bc)
• C elements: registers sha...
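The communication model can be checked with a small Python sketch. It assumes br divides m and bc divides n, and the function names `communication`, `tile_cost`, and `best_shape` are hypothetical. It counts global-memory element reads tile by tile (each br×bc tile of C reads a br×k block of A and a k×bc block of B), reproduces the two extremes, and searches all shapes with br + bc = 2·s to confirm that the square tile minimizes (br + bc) / (br·bc).

```python
from fractions import Fraction


def communication(m: int, k: int, n: int, br: int, bc: int) -> int:
    """Global-memory element reads for C = A*B (A is m x k, B is k x n).

    Each br x bc tile of C reads a br x k row block of A and a k x bc
    column block of B. Assumes br divides m and bc divides n.
    """
    assert m % br == 0 and n % bc == 0, "tile sizes must divide m and n"
    tiles = (m // br) * (n // bc)   # number of C tiles
    per_tile = (br + bc) * k        # A row block + B column block
    return tiles * per_tile         # equals m*k*n*(br + bc) / (br*bc)


def tile_cost(br: int, bc: int) -> Fraction:
    """Cost factor (br + bc) / (br * bc) = 1/br + 1/bc, kept exact."""
    return Fraction(br + bc, br * bc)


def best_shape(s: int) -> tuple[int, int]:
    """Among tiles br = s + h, bc = s - h (0 <= h < s), i.e. br + bc = 2s,
    return the (br, bc) pair with the smallest tile_cost."""
    shapes = [(s + h, s - h) for h in range(s)]
    return min(shapes, key=lambda t: tile_cost(*t))


if __name__ == "__main__":
    m = k = n = 64
    print(communication(m, k, n, m, n))    # one thread block: m*k + n*k = 8192
    print(communication(m, k, n, 1, 1))    # fine grain: 2*m*k*n = 524288
    print(communication(m, k, n, 16, 16))  # 16x16 tiles: 32768
    print(best_shape(16))                  # (16, 16): the square tile
```

Exact `Fraction` arithmetic is used so that comparing tile shapes is not affected by floating-point rounding.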
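The shared/shared scheme in the bullets can be simulated on the CPU to see where the communication count comes from. This is a hypothetical Python sketch, not CUDA code: nested lists stand in for global memory, per-tile copies stand in for shared memory, and a local accumulator array stands in for the per-thread registers holding C elements. The chunk size `bk` along k is an assumption (the slides only say that blocks of A and B are copied into shared memory); the total read count is k·(br + bc) per tile regardless of `bk`.

```python
import random


def tiled_matmul(A, B, br, bc, bk):
    """Simulate shared/shared tiled matmult; return (C, global_reads).

    A is m x k and B is k x n (lists of lists, playing global memory).
    Assumes br divides m, bc divides n, and bk divides k.
    """
    m, k, n = len(A), len(B), len(B[0])
    assert m % br == 0 and n % bc == 0 and k % bk == 0
    C = [[0] * n for _ in range(m)]
    reads = 0
    for ti in range(0, m, br):              # one "thread block" per C tile
        for tj in range(0, n, bc):
            acc = [[0] * bc for _ in range(br)]   # stands in for registers
            for t in range(0, k, bk):
                # Copy A[ti:ti+br, t:t+bk] and B[t:t+bk, tj:tj+bc] to "shared"
                As = [A[ti + i][t:t + bk] for i in range(br)]
                Bs = [B[t + p][tj:tj + bc] for p in range(bk)]
                reads += br * bk + bk * bc
                for i in range(br):
                    for j in range(bc):
                        acc[i][j] += sum(As[i][p] * Bs[p][j] for p in range(bk))
            for i in range(br):             # write the finished tile back
                for j in range(bc):
                    C[ti + i][tj + j] = acc[i][j]
    return C, reads


if __name__ == "__main__":
    m = k = n = 32
    A = [[random.randint(-3, 3) for _ in range(k)] for _ in range(m)]
    B = [[random.randint(-3, 3) for _ in range(n)] for _ in range(k)]
    C, reads = tiled_matmul(A, B, 16, 16, 16)
    ref = [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
           for i in range(m)]
    assert C == ref
    print(reads)  # 4096 = m*k*n*(br + bc) / (br*bc)
```

With 16x16 tiles each C tile is produced by one simulated thread block, matching the 16x16 configuration on the slide.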