CME 323: Distributed Algorithms and Optimization, Spring 2016
~rezab/dao

Instructor: Reza Zadeh, Matroid and Stanford.
Lecture 5, 4/17/2017. Scribed by Andreas Santucci.
Lecture contents
1. QuickSort
2. Parallel algorithm for minimum spanning trees (Boruvka)
3. Parallel connected components (random mates)
1  QuickSort
First, we'll finish the analysis of QuickSort. The algorithm is as follows.

Algorithm 1: QuickSort
Input: An array A
Output: Sorted A

1  p ← element of A chosen uniformly at random
2  L ← [a : a ∈ A s.t. a < p]   // Implicitly: B_L ← {a_i < p}_{i=1}^n, prefixSum(B_L),
3  R ← [a : a ∈ A s.t. a > p]   // which requires Θ(n) work and O(log n) depth.
4  return [QuickSort(L), p, QuickSort(R)]
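A minimal serial Python sketch of the same recursive structure may help (the two parallel partitioning passes become ordinary list comprehensions here; keeping copies of the pivot in a separate list `E` is my addition, since lines 2-3 above would otherwise drop duplicates of p):

```python
import random

def quicksort(A):
    """Serial sketch of randomized QuickSort; the two partitioning
    passes below are the steps done in parallel in the lecture."""
    if len(A) <= 1:
        return list(A)
    p = random.choice(A)             # step 1: uniform random pivot
    L = [a for a in A if a < p]      # step 2: elements smaller than p
    E = [a for a in A if a == p]     # copies of the pivot itself
    R = [a for a in A if a > p]      # step 3: elements larger than p
    return quicksort(L) + E + quicksort(R)
```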
1.1  Analysis on Memory Management
Recall that in Lecture 4, we designed an algorithm to construct L and R in O(n) work and O(log n) depth. Since we know the algorithm used to construct L and R (which is the main work required of QuickSort), let us take this opportunity to take a closer look at memory management during the algorithm.
Selecting a pivot uniformly at random   We denote the size of our input array A by n. To be precise, we can perform step 1 in Θ(log n) work and O(1) depth. That is, to generate a number uniformly from the set {1, 2, . . . , n}, we can assign log n processors to independently flip a bit "on" with probability 1/2. The resulting bit sequence can be interpreted as the base-2 representation of a number from {1, . . . , n}.
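A sketch of this bit-flipping scheme in Python (serial here; in the lecture's model each of the log n flips is done by its own processor, giving O(1) depth). The restriction that n be a power of two is my assumption, so that the log n bits exactly cover the range:

```python
import math
import random

def random_index(n):
    """Sketch of step 1: draw an index uniformly from {1, ..., n} by
    flipping log2(n) independent fair bits (assumes n is a power of 2)."""
    k = int(math.log2(n))
    bits = [random.randint(0, 1) for _ in range(k)]   # k independent fair coin flips
    value = sum(b << i for i, b in enumerate(bits))   # read the flips as a base-2 number
    return value + 1                                  # shift {0,...,n-1} to {1,...,n}
```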
Allocating storage for L and R   Start by making a call to the OS to allocate an array of n elements; this requires O(1) work and depth, since we do not require the elements to be initialized. We compare each element of the array with the pivot, p, and write a 1 to the corresponding position if the element belongs in L (i.e. it is smaller) and a 0 otherwise. This requires Θ(n) work but can be done in parallel, i.e. O(1) depth. We are left with an array of 1's and 0's indicating whether each element belongs in L or not; call it B_L, so that B_L[i] = 1 exactly when a_i < p.
We then apply prefixSum to the indicator array B_L, which requires O(n) work and O(log n) depth. Then, we may examine the value of the last element in the output array from prefixSum to learn the size of L. Looking up the last element of this array requires O(1) work and depth. We can then allocate a new array for L in constant time and depth. Since we know |L| and we know n, we also know |R| = n − |L|; computing |R| and allocating the corresponding storage requires O(1) work and depth.

Thus, allocating space for L and R requires O(n) work and O(log n) depth.
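The indicator-plus-prefix-sum bookkeeping can be sketched serially in Python (the function name and return convention are illustrative, not from the notes; in parallel the comparisons take O(1) depth and the scan O(log n) depth):

```python
def allocate_sizes(A, p):
    """Sketch of the allocation step: build the 0/1 indicator array B_L,
    prefix-sum it, and read off |L| and |R| = n - |L|."""
    n = len(A)
    B_L = [1 if a < p else 0 for a in A]   # indicator: does a belong in L?
    prefix = []                             # inclusive prefix sums of B_L
    running = 0
    for b in B_L:
        running += b
        prefix.append(running)
    size_L = prefix[-1]                     # last prefix-sum entry = |L|
    size_R = n - size_L                     # as in the notes, |R| = n - |L|
    return B_L, prefix, size_L, size_R
```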
Filling L and R   Now, we use n processors, assigning each to exactly one element of our input array A, and in parallel we perform the following steps. Each processor 1, 2, . . . , n is assigned to its corresponding entry in A. Suppose we fix attention on the kth processor, which is responsible for placing the kth entry of A into its appropriate location in either L or R. We first examine B_L[k] to determine whether the element belongs in L or R. In addition, we examine the corresponding entry of the prefixSum output; denote this value by i = prefixSum(B_L)[k]. If the kth entry of A belongs in L, it is written to position i of L; an analogous prefix sum over the complementary indicator array gives each entry's position in R.
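Putting the pieces together, the fill step might be sketched as follows in serial Python; the body of the second loop is exactly what processor k would execute independently of the others. Names here are illustrative, and elements equal to the pivot are simply set aside with it:

```python
def partition_into_L_R(A, p):
    """Sketch of the fill step: each 'processor' k uses the prefix sums
    of the indicator arrays to compute where A[k] lands in L or in R."""
    n = len(A)
    B_L = [1 if a < p else 0 for a in A]
    B_R = [1 if a > p else 0 for a in A]
    # inclusive prefix sums (O(log n) depth in parallel)
    pre_L, pre_R = [], []
    sL = sR = 0
    for k in range(n):
        sL += B_L[k]; pre_L.append(sL)
        sR += B_R[k]; pre_R.append(sR)
    L = [None] * sL
    R = [None] * sR
    # each iteration below is independent: processor k's work
    for k in range(n):
        if B_L[k]:
            L[pre_L[k] - 1] = A[k]   # i = prefixSum(B_L)[k] gives the slot in L
        elif B_R[k]:
            R[pre_R[k] - 1] = A[k]   # analogous index into R
    return L, R
```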