Dense Matrix Algorithms
• Matrix multiplication, systems of linear equations, …
• Few or no usable zero elements
• Data decomposition techniques give efficient task partitioning
• Partitioning schemes for matrices
  – 1- and 2-dimensional block, cyclic, and block-cyclic partitionings
• One task per process, generally
  – Plenty of work per task
[Figures: 1D Block Partitioning; 2D Block Partitioning]
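The partitioning schemes named on this slide can be sketched as index-to-process maps. This is a minimal illustration, not taken from the slides: the function names, the 0-based indexing, and the assumption that the process count divides the matrix dimension evenly are all hypothetical choices made for the example.

```python
def block_owner(i, n, p):
    """Owner of row i under 1D block partitioning: contiguous chunks of n/p rows.
    Assumes p divides n (an assumption of this sketch)."""
    return i // (n // p)


def cyclic_owner(i, p):
    """Owner of row i under 1D cyclic partitioning: rows dealt out round-robin."""
    return i % p


def block_cyclic_owner(i, p, b):
    """Owner of row i under 1D block-cyclic partitioning:
    blocks of b consecutive rows dealt out round-robin."""
    return (i // b) % p


def block_owner_2d(i, j, n, q):
    """Owner (grid row, grid column) of element (i, j) under 2D block
    partitioning on a q x q process grid. Assumes q divides n."""
    s = n // q  # side length of each square block
    return (i // s, j // s)


n, p = 8, 4
block = [block_owner(i, n, p) for i in range(n)]
cyclic = [cyclic_owner(i, p) for i in range(n)]
block_cyclic = [block_cyclic_owner(i, 2, 2) for i in range(n)]
print(block)         # rows grouped in contiguous pairs
print(cyclic)        # rows dealt one at a time
print(block_cyclic)  # blocks of 2 rows dealt to 2 processes
```

Block partitioning keeps neighbouring rows together, while the cyclic variants spread them out; which is preferable depends on how the work per row varies over the course of the algorithm.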
CS 6643, F '11, Lec 23
Matrix-Vector Multiplication
• Dot product of two vectors is the key operation
• An $n \times n$ matrix and an $n \times 1$ vector are multiplied to obtain an $n \times 1$ vector
• $T_s = W = \Theta(n^2)$
• Consider 1D row-wise block partitioning
• 1 row per process
  – Process i, $1 \le i \le n$, has row i of matrix A and element i of vector x
• Steps
  – Distribute vector x so that each process has the entire x
  – Each process performs a dot product to get an element of the result vector
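The two steps above can be sketched as a sequential simulation. This is only an illustration under stated assumptions: processes are modelled as list indices (0-based, whereas the slide numbers them from 1), the helper names are hypothetical, and the distribution step is modelled by simply copying x rather than by real message passing.

```python
def distribute_x(local_elements):
    """Step 1 (simulated): every process ends up with a full copy of x.
    local_elements[i] is the single element of x initially held by process i."""
    full_x = list(local_elements)
    return [list(full_x) for _ in local_elements]


def rowwise_matvec(A, x):
    """Simulate y = A x with one row of A per process."""
    n = len(A)
    x_copies = distribute_x(x)  # process i now holds x_copies[i], the entire x
    # Step 2: process i computes the dot product of its row with x,
    # which is element i of the result vector.
    return [sum(a * b for a, b in zip(A[i], x_copies[i])) for i in range(n)]


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 0, -1]
y = rowwise_matvec(A, x)
print(y)
```

Each simulated process touches only its own row of A, which is why the only communication needed is the distribution of x.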
Matrix-Vector Multiplication
• All-to-all broadcast (AAB) is used to distribute the x vector; message size $m = n/p$
• For $n = p$,
  – AAB takes $\Theta(n)$ time under the single-port model; the dot product takes $\Theta(n)$ time
  – $T_p = \Theta(n)$
  – $p T_p = \Theta(n^2) = W$ ⇒ cost-optimal
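To see where the $\Theta(n)$ communication time comes from, the following sketch simulates an all-to-all broadcast by recursive doubling and counts steps and words sent. The slides do not name a topology; the hypercube exchange pattern, the power-of-two process count, and the function name are assumptions of this example.

```python
def hypercube_all_to_all_broadcast(messages):
    """Simulate AAB on a hypercube of p = 2^d processes (assumed topology).

    messages[i] is the list of words initially held by process i.
    Returns (buffers, steps, words_sent), where buffers[i] maps each source
    process to its message and words_sent is the number of words one process
    transmits over the whole operation.
    """
    p = len(messages)
    assert p > 0 and p & (p - 1) == 0, "sketch assumes p is a power of two"
    buffers = [{i: list(m)} for i, m in enumerate(messages)]
    steps = 0
    words_sent = 0
    d = p.bit_length() - 1  # hypercube dimension, log2(p)
    for k in range(d):
        # With equal-sized messages every process sends the same amount
        # in a given step, so it is enough to track process 0.
        words_sent += sum(len(m) for m in buffers[0].values())
        new_buffers = []
        for i in range(p):
            partner = i ^ (1 << k)  # neighbour along dimension k
            merged = dict(buffers[i])
            merged.update(buffers[partner])  # exchange everything held so far
            new_buffers.append(merged)
        buffers = new_buffers
        steps += 1
    return buffers, steps, words_sent


p, m = 8, 2
msgs = [[10 * i + w for w in range(m)] for i in range(p)]
buffers, steps, words_sent = hypercube_all_to_all_broadcast(msgs)
print(steps, words_sent)
```

Under these assumptions the operation takes log p steps and each process sends m(p-1) words in total; with n = p and m = 1 that is n-1 words, consistent with the $\Theta(n)$ bound on the slide.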
Matrix-Vector Product
• If $n > p$, each process
  – has $n/p$ rows and $n/p$ elements of x at the beginning
  – computes $n/p$ elements of the result vector
• AAB with $m = n/p$ takes $t_s \log p + t_w (n/p)(p-1) \approx t_s \log p + t_w n$
• Each process spends $\Theta(n^2/p)$ time to compute $n/p$ elements
• $T_p = n^2/p + t_s \log p + t_w n$
• $p T_p = n^2 + t_s p \log p + t_w n p$
• $T_o = p T_p - W = t_s p \log p + t_w n p$
• For isoefficiency, equate each term of $T_o$ to W and get
  – $W = K_1 t_s p \log p$
  – $W = K_2 t_w n p$; with $W = n^2$ this gives $n = K_2 t_w p$ ⇒ $W = \Theta(p^2)$
• Since max. degree of concurrency, $C(W) = n$, $p = O(n)$ ⇒ $W = n^2 = \Omega(p^2)$
• Isoefficiency function is the maximum of these three, which is $\Theta(p^2)$
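The cost model on this slide can be evaluated numerically to check the isoefficiency result. The sketch below is illustrative only: the values of t_s and t_w are hypothetical, log is taken as base 2, and the unit of time is one multiply-add, none of which is stated on the slides.

```python
import math


def parallel_time(n, p, ts, tw):
    """T_p = n^2/p + t_s log p + t_w n (log base 2 is an assumption)."""
    return n * n / p + ts * math.log2(p) + tw * n


def overhead(n, p, ts, tw):
    """T_o = p T_p - W, with W = n^2."""
    return p * parallel_time(n, p, ts, tw) - n * n


def efficiency(n, p, ts, tw):
    """E = W / (p T_p)."""
    return (n * n) / (p * parallel_time(n, p, ts, tw))


ts, tw = 10.0, 1.0  # hypothetical startup and per-word costs

# Scale n in proportion to p, so that W = n^2 grows as p^2.
scaled = [efficiency(64 * p, p, ts, tw) for p in (4, 16, 64, 256)]

# Keep n fixed while p grows.
fixed = [efficiency(1024, p, ts, tw) for p in (4, 64, 1024)]

print([round(e, 4) for e in scaled])
print([round(e, 4) for e in fixed])
```

With n proportional to p the efficiency stays nearly constant, whereas with n fixed it falls as p grows, which is the behaviour an isoefficiency function of $\Theta(p^2)$ predicts.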
Matrix-Vector Product
• 2D partitioning
• $n^2$ processes, each with one element of matrix A
• Let the processes be in a grid-like topology
• n elements of x are distributed one per process among the last column of processes
• Steps
  – Align x so that x[i] is at process [i,i]
  – One-to-all broadcasts (OABs) among groups of n processes: process [j,j] broadcasts its x element to processes [*,j]
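The two steps listed above can be traced with a small simulation of the process grid. This is a sketch under assumptions: indices are 0-based (so the last column is n-1), the grid is a plain nested list, and the function name is hypothetical. Only the alignment and broadcast steps visible on the slide are modelled.

```python
def align_and_broadcast(x):
    """Return grid, where grid[i][j] is the x element held by process [i, j]
    after the alignment and column-broadcast steps."""
    n = len(x)
    # None means the process holds no element of x.
    grid = [[None] * n for _ in range(n)]

    # Initial placement: x[i] is at process [i, n-1] (last column).
    for i in range(n):
        grid[i][n - 1] = x[i]

    # Step 1 (align): process [i, n-1] sends x[i] to diagonal process [i, i].
    for i in range(n):
        value = grid[i][n - 1]
        grid[i][n - 1] = None
        grid[i][i] = value

    # Step 2 (broadcast): in each column j, diagonal process [j, j]
    # broadcasts its element to every process [*, j].
    for j in range(n):
        for i in range(n):
            grid[i][j] = grid[j][j]
    return grid


x = [3, 5, 7]
grid = align_and_broadcast(x)
for row in grid:
    print(row)
```

After these steps process [i, j] holds x[j] alongside its matrix element A[i][j], so the product of the two can be formed locally.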