2 Dense Matrix Multiplication

Given a matrix A of size N × N and a vector x of size N, the value y = Ax is given by y[i] = Σ_j A[i][j] x[j].

In other words, to compute y[i], multiply the ith row of the matrix element-wise by x and sum the values. (Assume the network topology is a clique.)
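As a point of reference, the definition above can be sketched sequentially as follows. This is a minimal single-node sketch, not one of the partitioned algorithms the questions ask for; the function name `matvec` and the list-of-rows representation of A are assumptions made here for illustration, and the input vector is called x.

```python
def matvec(A, x):
    """Return y = A x for a dense N x N matrix A (list of rows) and a vector x of length N.

    Hypothetical helper for illustration; not part of the assignment text.
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        # y[i] is the element-wise product of row i of A with x, summed over j
        for j in range(n):
            y[i] += A[i][j] * x[j]
    return y


A = [[1.0, 2.0],
     [3.0, 4.0]]
x = [1.0, 1.0]
y = matvec(A, x)
print(y)  # prints [3.0, 7.0]
```

A sequential version like this can serve as a baseline for checking the result of a distributed implementation on small inputs.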

Use only blocking point-to-point communication.

2.1 1D partitioning: Horizontal stripes

Figure: Horizontal Data Partitioning

Question: Write the algorithm that performs y = Ax; x = y; 10 times in a loop if the data is partitioned horizontally.

Question: How much memory does each node need if the data is partitioned horizontally?

Question: How much communication does the algorithm do per iteration if the data is partitioned horizontally?

2.2 1D partitioning: vertical stripes

Figure: Vertical Data Partitioning

Question: Write the algorithm that performs y = Ax; x = y; 10 times in a loop if the data is partitioned vertically.

Question: How much memory does each node need if the data is partitioned vertically?

Question: How much communication does the algorithm do per iteration if the data is partitioned vertically?

2.3 2D partitioning: blocks

Figure: Block Partitioning

Question: Write the algorithm that performs y = Ax; x = y; 10 times in a loop if the data is partitioned in blocks.

Question: How much memory does each node need if the data is partitioned in blocks?

Question: How much communication does the algorithm do per iteration if the data is partitioned in blocks?