th percentile: half the numbers are bigger than it,
and half are smaller. For instance, the median of $[45, 1, 10, 30, 25]$ is $25$, since this is the middle
element when the numbers are arranged in order. If the list has even length, there are two
choices for what the middle element could be, in which case we pick the smaller of the two,
say.
The purpose of the median is to summarize a set of numbers by a single, typical value.
The mean, or average, is also very commonly used for this, but the median is in a sense more
typical of the data: it is always one of the data values, unlike the mean, and it is less sensitive
to outliers. For instance, the median of a list of a hundred 1’s is (rightly) 1, as is the mean.
However, if just one of these numbers gets accidentally corrupted to 10,000, the mean shoots
up above 100, while the median is unaffected.
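The outlier example above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the variable names `clean` and `corrupted` are illustrative and not from the text. `median_low` is used because, for even-length lists, it returns the smaller of the two middle values, which matches the convention adopted here.

```python
from statistics import mean, median_low

# A hundred 1's, and the same list with one value corrupted to 10,000.
clean = [1] * 100
corrupted = [1] * 99 + [10_000]

# median_low returns the smaller of the two middle values for even-length data.
print("clean:     mean =", mean(clean), " median =", median_low(clean))
print("corrupted: mean =", mean(corrupted), " median =", median_low(corrupted))
```

The mean of the corrupted list is (99 + 10,000)/100, which exceeds 100, while the median stays at 1.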
Computing the median of $n$ numbers is easy: just sort them. The drawback is that this
takes $O(n \log n)$ time, whereas we would ideally like something linear. We have reason to be
hopeful, because sorting is doing far more work than we really need—we just want the middle
element and don’t care about the relative ordering of the rest of them.
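The sort-based approach just described can be sketched as follows. The function name `median_by_sorting` is hypothetical; the index `(n - 1) // 2` is chosen so that, for even-length input, the smaller of the two middle elements is returned, in line with the convention stated earlier.

```python
def median_by_sorting(numbers):
    """Return the median of a non-empty list by sorting it: O(n log n) time."""
    ordered = sorted(numbers)
    # For even length, (n - 1) // 2 picks the smaller of the two middle elements.
    return ordered[(len(ordered) - 1) // 2]


print(median_by_sorting([45, 1, 10, 30, 25]))  # 25
```

Sorting arranges every element relative to every other, even though only the middle position is ever read. That extra work is what a linear-time method would avoid.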
When looking for a recursive solution, it is paradoxically often easier to work with a more
general version of the problem, for the simple reason that this gives a more powerful step to
recurse upon. In our case, the generalization we will consider is selection.
SELECTION
Input: A list of numbers $S$; an integer $k$
Output: The $k$th smallest element of $S$
For instance, if $k = 1$, the minimum of $S$ is sought, whereas if $k = \lfloor |S|/2 \rfloor$, it is the median.
A randomized divide-and-conquer algorithm for selection
Here’s a divide-and-conquer approach to selection. For any number $v$, imagine splitting list $S$
into three categories: elements smaller than $v$, those equal to $v$ (there might be duplicates),
and those greater than $v$. Call these $S_L$, $S_v$, and $S_R$ respectively. For instance, if the array

$S$: 2 36 5 21 8 13 11 20 5 4 1

is split on $v = 5$, the three subarrays generated are

$S_L$: 2 4 1
$S_v$: 5 5
$S_R$: 36 21 8 13 11 20
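The split on $v$ can be sketched in a few lines. The function name `split` is hypothetical, and this version makes three passes over the list and allocates new lists for clarity, so it illustrates the idea rather than the most economical way to do it.

```python
def split(S, v):
    """Split S into elements smaller than v, equal to v, and greater than v."""
    S_L = [x for x in S if x < v]
    S_v = [x for x in S if x == v]
    S_R = [x for x in S if x > v]
    return S_L, S_v, S_R


S = [2, 36, 5, 21, 8, 13, 11, 20, 5, 4, 1]
S_L, S_v, S_R = split(S, 5)
print(S_L)  # [2, 4, 1]
print(S_v)  # [5, 5]
print(S_R)  # [36, 21, 8, 13, 11, 20]
```

Each pass looks at every element once, so the total work is linear in the length of $S$.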
The search can instantly be narrowed down to one of these sublists.
If we want, say, the eighth smallest element of $S$, we know it must be the third smallest element of $S_R$
since $|S_L| + |S_v| = 5$. That is, $\text{selection}(S, 8) = \text{selection}(S_R, 3)$.
More generally, by checking $k$
against the sizes of the subarrays, we can quickly determine which of them holds the desired
element:

$$
\text{selection}(S, k) =
\begin{cases}
\text{selection}(S_L, k) & \text{if } k \le |S_L| \\
v & \text{if } |S_L| < k \le |S_L| + |S_v| \\
\text{selection}(S_R,\, k - |S_L| - |S_v|) & \text{if } k > |S_L| + |S_v|.
\end{cases}
$$
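The three-case rule translates almost directly into a recursive function. The sketch below is one possible rendering, not necessarily the authors' implementation. It assumes $k$ is 1-indexed (so $k = 1$ gives the minimum) and that the splitting value $v$ is drawn uniformly at random from $S$, which is an assumption suggested by the section title rather than something stated up to this point.

```python
import random


def selection(S, k):
    """Return the k-th smallest element of S, where 1 <= k <= len(S)."""
    if not 1 <= k <= len(S):
        raise ValueError("k must satisfy 1 <= k <= len(S)")

    # Assumption: choose the splitting value uniformly at random from S.
    v = random.choice(S)

    S_L = [x for x in S if x < v]
    S_v = [x for x in S if x == v]
    S_R = [x for x in S if x > v]

    if k <= len(S_L):
        return selection(S_L, k)
    if k <= len(S_L) + len(S_v):
        return v
    return selection(S_R, k - len(S_L) - len(S_v))


S = [2, 36, 5, 21, 8, 13, 11, 20, 5, 4, 1]
print(selection(S, 8))  # 13, the eighth smallest element
```

Because $v$ is taken from $S$, the sublist $S_v$ is never empty, so every recursive call works on a strictly shorter list and the recursion terminates whatever values are drawn.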
S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani
The three sublists $S_L$, $S_v$, and $S_R$ can be computed from $S$ in linear time; in fact, this computation
can even be done in place, that is, without allocating new memory (Exercise 2.15). We
then recurse on the appropriate sublist. The effect of the split is thus to shrink the number of
elements from $|S|$ to at most $\max\{|S_L|, |S_R|\}$.
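One way the in-place split might be done is a single-pass three-way partition, often called the Dutch national flag scheme. This is a sketch of one possible approach and is not presented as the intended solution to the exercise; the function name `partition_in_place` and its return convention are assumptions made for illustration.

```python
def partition_in_place(S, v):
    """Rearrange S so elements < v come first, then those == v, then those > v.

    Returns (lo, hi) such that S[:lo] < v, S[lo:hi] == v, and S[hi:] > v.
    Uses only swaps within S, so no new list is allocated.
    """
    lo, mid, hi = 0, 0, len(S)
    # Invariant: S[:lo] < v, S[lo:mid] == v, S[hi:] > v; S[mid:hi] is unexamined.
    while mid < hi:
        if S[mid] < v:
            S[lo], S[mid] = S[mid], S[lo]
            lo += 1
            mid += 1
        elif S[mid] > v:
            hi -= 1
            S[mid], S[hi] = S[hi], S[mid]
        else:
            mid += 1
    return lo, hi


S = [2, 36, 5, 21, 8, 13, 11, 20, 5, 4, 1]
lo, hi = partition_in_place(S, 5)
print(S[:lo], S[lo:hi], S[hi:])
```

Each iteration either advances `mid` or decreases `hi`, so the loop runs at most $|S|$ times. The order of elements within each group is not preserved, which does not matter for selection.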