CS 70
Discrete Mathematics for CS
Spring 2005
Clancy/Wagner
Notes 5
Divide-and-Conquer and Mergesort
One of the most basic tasks performed by computers is sorting: i.e., given $n$ items from a totally ordered
set, put the items into ascending order. You can think of the items as English words, and the order as
alphabetical order: given $n$ English words, sorting requires that we put these words into alphabetical order.
To avoid tiresome details, we will assume throughout this lecture that the $n$ items to be sorted are all distinct.
How can we devise a general procedure for doing this? After a little thought, most people usually come up
with something like one of the following:
Method 1
By scanning through the list once, find the largest item and place this at the end of the output list.
Then scan the remaining items again to find the second largest, and so on. (This method is generally
known as selection sort.)
Method 2
Take the first item and put it in the output list. Then take the second item and insert it in the
correct order with respect to the first item. Continue in this way, each time inserting the next item in
the correct position among the previously inserted items; this position can be found by a linear scan
through these items. (This method is known as insertion sort.)
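The two methods above can be sketched in Python as follows. This is only an illustrative sketch, not code from the lecture: the function names `selection_sort` and `insertion_sort` are hypothetical, both functions build a new output list rather than sorting in place, and the insertion scan is assumed to run from the front of the output list.

```python
def selection_sort(items):
    """Method 1: repeatedly move the largest remaining item to the output."""
    remaining = list(items)        # work on a copy; the input is left untouched
    output = []
    while remaining:
        largest = remaining[0]
        for x in remaining[1:]:    # one scan to find the largest remaining item
            if x > largest:
                largest = x
        remaining.remove(largest)
        # The largest item ends up last; each later (smaller) item goes before it.
        output.insert(0, largest)
    return output


def insertion_sort(items):
    """Method 2: insert each item into its place among those already output."""
    output = []
    for x in items:
        pos = 0
        # Linear scan (assumed front to back) for the first item larger than x.
        while pos < len(output) and output[pos] < x:
            pos += 1
        output.insert(pos, x)
    return output
```

For example, `selection_sort(["pear", "apple", "fig"])` and `insertion_sort(["pear", "apple", "fig"])` both return the words in alphabetical order.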
How good are these methods? Well, it’s not too hard to see that they are both correct, i.e., they both result
in a correctly sorted version of the original list. [As an exercise, you might like to state this fact formally
and prove it by induction on $n$, for each method.] But how efficient are the methods? Let’s look at selection
sort first. It’s easy to see that the first scan takes exactly $n-1$ item comparisons to find the largest element;
similarly, the second scan takes $n-2$ comparisons; and so on. The total number of comparisons is thus
$$(n-1) + (n-2) + \cdots + 2 + 1 = \sum_{i=1}^{n-1} i = \frac{1}{2}n(n-1),$$
(where we have used a formula for the sum that we proved by induction in Lecture Notes 1).
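As a quick sanity check on this count, here is a minimal Python sketch that tallies item comparisons. It assumes an in-place variant of selection sort (the largest item is swapped to the end of the unsorted part), and the name `selection_sort_comparisons` is hypothetical.

```python
def selection_sort_comparisons(items):
    """Sort a copy of `items` by selection sort; return (sorted_list, comparisons)."""
    a = list(items)
    comparisons = 0
    # a[0..end] is the part still to be sorted; it shrinks by one item per scan.
    for end in range(len(a) - 1, 0, -1):
        largest = 0                      # index of the largest item seen so far
        for j in range(1, end + 1):
            comparisons += 1             # one item comparison per step of the scan
            if a[j] > a[largest]:
                largest = j
        a[largest], a[end] = a[end], a[largest]
    return a, comparisons


for n in (1, 2, 5, 10):
    _, count = selection_sort_comparisons(range(n, 0, -1))
    print(n, count, n * (n - 1) // 2)    # the last two columns should agree
```

The count does not depend on the input order: every scan looks at all the remaining items, so any list of $n$ distinct items costs the same number of comparisons.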
What about insertion sort? Well, to insert the second item requires one comparison; to insert the third item
requires (in the worst case) two comparisons; and in general, to insert the $i$th item requires (in the worst
case, where we have to scan the whole list) $i-1$ comparisons. Thus the number of comparisons used by the
entire procedure in the worst case is
$$\sum_{i=1}^{n-1} i = \frac{1}{2}n(n-1),$$
exactly the same as for selection sort.
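The worst case can be observed directly with a small Python sketch. It assumes the linear scan runs from the front of the output list, so the worst case is an input that is already in ascending order (each new item is compared against every item inserted before it). The name `insertion_sort_comparisons` is hypothetical.

```python
def insertion_sort_comparisons(items):
    """Sort `items` by insertion sort; return (sorted_list, comparisons)."""
    output = []
    comparisons = 0
    for x in items:
        pos = 0
        # Scan from the front until we meet an item larger than x.
        while pos < len(output):
            comparisons += 1
            if x < output[pos]:
                break
            pos += 1
        output.insert(pos, x)
    return output, comparisons


n = 10
_, worst = insertion_sort_comparisons(range(1, n + 1))   # ascending input
_, best = insertion_sort_comparisons(range(n, 0, -1))    # descending input
print(worst, n * (n - 1) // 2)   # worst case matches the formula
print(best, n - 1)               # one comparison per inserted item
```

Unlike selection sort, the count here depends on the input: under the front-scan assumption, a descending input needs only one comparison per insertion.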
Thus the number of comparisons performed by both methods is at most $\frac{1}{2}n^2 - \frac{1}{2}n \approx \frac{1}{2}n^2$ for large $n$. Since
comparisons constitute the bulk of the work performed by the algorithm, we can think of $n^2$ as a measure of