CSE 101  ALGORITHMS  SUMMER 2000
Lecture Notes 4
Wednesday, July 12, 2000

3.3.3 Bucket Sort
Like counting sort and radix sort, bucket sort also assumes that the input verifies some property. More precisely, it assumes that the n elements are uniformly distributed over a certain domain; to simplify the exposition, we suppose that they are uniformly distributed over the interval [0, 1). Thus, if we divide the interval [0, 1) into n equal-sized subintervals [0, 1/n), [1/n, 2/n), ..., [(n − 1)/n, 1), called buckets, then we expect only a few numbers to fall into any one bucket.
Let B[0..n − 1] be an array of n lists, initially empty. Then the following algorithm runs in linear time on the average:

BUCKETSORT(A)
1  for i ← 1 to n do
2      insert A[i] at its position into the list B[⌊n · A[i]⌋]
3  concatenate all the lists B[0], B[1], ..., B[n − 1]

Exercise 3.1 Illustrate the execution of BUCKETSORT(⟨0.87, 0.21, 0.17, 0.92, 0.81, 0.12, 0.67, 0.52, 0.77, 0.85⟩).
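As an illustration (a sketch, not part of the original notes), the pseudocode above can be rendered in Python; the inner loop mirrors line 2, inserting each element at its sorted position within its bucket:

```python
import math

def bucket_sort(a):
    # Assumes the values of a are (roughly uniformly) distributed over [0, 1).
    n = len(a)
    buckets = [[] for _ in range(n)]          # B[0 .. n-1], n empty lists
    for x in a:
        b = buckets[math.floor(n * x)]        # bucket B[floor(n * A[i])]
        # Insert x at its position in the bucket (insertion-sort style).
        i = len(b)
        while i > 0 and b[i - 1] > x:
            i -= 1
        b.insert(i, x)
    # Concatenate all the lists B[0], B[1], ..., B[n-1].
    return [x for b in buckets for x in b]
```

Running it on the input of Exercise 3.1 gives the ten values back in increasing order.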
Exercise 3.2 Informally analyze BUCKETSORT.

3.4 Problems Related to Sorting

This section presents two interesting and useful problems which are, or seem to be, closely related to sorting.

3.4.1 Binary Search
Searching is a common operation on databases. The search problem can be formulated in various ways, depending on the type of information to be returned. We only consider an oversimplified version in this subsection:

SEARCH
INPUT: A sequence A = ⟨a1, a2, ..., an⟩ and an element x.
OUTPUT: If x occurs in A then x, otherwise not found.

The method seems obvious: visit each element in A and compare it with x; if such an element is found then return it, otherwise return not found. This algorithm obviously runs in O(n) time, which seems good. Unfortunately, linear algorithms are too slow when billions of items are involved (imagine Yahoo's databases). For that reason, databases are in general organized into efficient data structures, making operations like search (and many others) run in logarithmic time. We only consider one of the simplest data structures in this subsection, a sorted array. Therefore, let us consider the following more refined problem:
SORTEDSEARCH
INPUT: A sorted sequence A = ⟨a1, a2, ..., an⟩ and an element x.
OUTPUT: If x occurs in A then x, otherwise not found.

We can now use a simple degenerate divide-and-conquer algorithm to solve this problem, called binary search:

BINARYSEARCH(A, i, k, x)
1  if i > k then return not found
2  j ← ⌊(i + k)/2⌋
3  if x = A[j] then return x
4  if x < A[j] then return BINARYSEARCH(A, i, j − 1, x)
5  return BINARYSEARCH(A, j + 1, k, x)

Then BINARYSEARCH(A, 1, n, x) is a solution algorithm for the problem SORTEDSEARCH. From now on in the course, we'll write BINARYSEARCH(A, x) instead of BINARYSEARCH(A, 1, n, x) whenever possible.
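A Python sketch of the recursive pseudocode (an illustration, not part of the original notes; note it uses 0-based indices where the pseudocode is 1-based):

```python
def binary_search(a, x, i=0, k=None):
    """Recursive binary search on a sorted list a; returns x if present,
    else None (playing the role of "not found")."""
    if k is None:
        k = len(a) - 1
    if i > k:
        return None                     # not found
    j = (i + k) // 2                    # j <- floor((i + k) / 2)
    if x == a[j]:
        return x
    if x < a[j]:
        return binary_search(a, x, i, j - 1)
    return binary_search(a, x, j + 1, k)
```

Calling binary_search(a, x) corresponds to BINARYSEARCH(A, 1, n, x) in the notes.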
Exercise 3.3 Illustrate the execution of BINARYSEARCH(A, 3.14) for A = ⟨0.17, 1, 1.27, 2.2, 2.9, 3.14, 3.9, 4.2, 5⟩. Do the same thing for A = ⟨0.17, 1, 1.27, 2.2, 2.9, 3.13, 3.9, 4.2, 5⟩.

Exercise 3.4 Why is the input array of BINARYSEARCH required to be sorted?

Exercise 3.5 Write the recurrence for BINARYSEARCH and then show that its running time is O(log n).

3.4.2 Medians and Order Statistics

3.5 Exercises with Solutions
Exercise 3.6 Let A = ⟨3, 9, 5, 3, 1, 4, 8, 7⟩. Illustrate the execution of

1. INSERTIONSORT(A);
2. SELECTIONSORT(A);
3. QUICKSORT(A, 1, 8); do not illustrate the execution of PARTITION;
4. MERGESORT(A, 1, 8); do not illustrate the execution of MERGE;
5. HEAPSORT(A); do not illustrate the executions of BUILDHEAP and HEAPIFY;
6. COUNTINGSORT(A, 9).
Proof: We only show how the array A is modified; this is enough to show that you understood the sorting algorithms. There were many different solutions in your quizzes; I considered them all correct if it was clear that you understood the algorithms.
1. INSERTIONSORT
⟨3, 9, 5, 3, 1, 4, 8, 7⟩
⟨3, 9, 5, 3, 1, 4, 8, 7⟩
⟨3, 5, 9, 3, 1, 4, 8, 7⟩
⟨3, 3, 5, 9, 1, 4, 8, 7⟩
⟨1, 3, 3, 5, 9, 4, 8, 7⟩
⟨1, 3, 3, 4, 5, 9, 8, 7⟩
⟨1, 3, 3, 4, 5, 8, 9, 7⟩
⟨1, 3, 3, 4, 5, 7, 8, 9⟩

2. SELECTIONSORT
⟨3, 9, 5, 3, 1, 4, 8, 7⟩
⟨1, 9, 5, 3, 3, 4, 8, 7⟩
⟨1, 5, 9, 3, 3, 4, 8, 7⟩
⟨1, 3, 9, 5, 3, 4, 8, 7⟩
⟨1, 3, 5, 9, 3, 4, 8, 7⟩
⟨1, 3, 3, 9, 5, 4, 8, 7⟩
⟨1, 3, 3, 5, 9, 4, 8, 7⟩
⟨1, 3, 3, 4, 9, 5, 8, 7⟩
⟨1, 3, 3, 4, 5, 9, 8, 7⟩
⟨1, 3, 3, 4, 5, 8, 9, 7⟩
⟨1, 3, 3, 4, 5, 7, 9, 8⟩
⟨1, 3, 3, 4, 5, 7, 8, 9⟩
3. QUICKSORT
The array A is first partitioned (consider that the pivot is A[1]) as ⟨1, 3, 3, 9, 5, 4, 8, 7⟩. Then QUICKSORT(A, 1, 3) and QUICKSORT(A, 4, 8) are called. The first is not going to change the array A, so we only illustrate the second. The subarray ⟨9, 5, 4, 8, 7⟩ is again partitioned (the pivot is 9 now), modifying A to ⟨1, 3, 3, 7, 5, 4, 8, 9⟩, and then QUICKSORT(A, 4, 7) is called. We keep doing this and obtain ⟨1, 3, 3, 4, 5, 7, 8, 9⟩.

4. MERGESORT
⟨3, 9, 5, 3, 1, 4, 8, 7⟩
⟨3, 9, 5, 3⟩ ⟨1, 4, 8, 7⟩
⟨3, 9⟩ ⟨5, 3⟩ ⟨1, 4⟩ ⟨8, 7⟩
⟨3⟩ ⟨9⟩ ⟨5⟩ ⟨3⟩ ⟨1⟩ ⟨4⟩ ⟨8⟩ ⟨7⟩
⟨3, 9⟩ ⟨3, 5⟩ ⟨1, 4⟩ ⟨7, 8⟩
⟨3, 3, 5, 9⟩ ⟨1, 4, 7, 8⟩
⟨1, 3, 3, 4, 5, 7, 8, 9⟩

5. HEAPSORT
The procedure BUILDHEAP yields A = ⟨9, 7, 8, 3, 1, 4, 5, 3⟩. Then the following changes, generated by swaps and heapifies, end up with the sorted array:
⟨3, 7, 8, 3, 1, 4, 5, 9⟩
⟨8, 7, 5, 3, 1, 4, 3, 9⟩
⟨3, 7, 5, 3, 1, 4, 8, 9⟩
⟨7, 3, 5, 3, 1, 4, 8, 9⟩
⟨4, 3, 5, 3, 1, 7, 8, 9⟩
⟨5, 3, 4, 3, 1, 7, 8, 9⟩
⟨1, 3, 4, 3, 5, 7, 8, 9⟩
⟨4, 3, 1, 3, 5, 7, 8, 9⟩
⟨3, 3, 1, 4, 5, 7, 8, 9⟩
⟨1, 3, 3, 4, 5, 7, 8, 9⟩
⟨1, 3, 3, 4, 5, 7, 8, 9⟩
⟨1, 3, 3, 4, 5, 7, 8, 9⟩

6. COUNTINGSORT
The frequency array is F = ⟨1, 0, 2, 1, 1, 0, 1, 1, 1⟩. After the next step, it becomes F = ⟨1, 1, 3, 4, 5, 5, 6, 7, 8⟩.
Next, we visit the elements of A from the last one toward the first and output them into B according to F, appropriately decreasing the frequencies. We get B = ⟨1, 3, 3, 4, 5, 7, 8, 9⟩ and F = ⟨0, 1, 1, 3, 4, 5, 5, 6, 7⟩.

Exercise 3.7 Insertion sort can be expressed as a recursive procedure as follows. In order to sort A[1..n], we recursively sort A[1..n − 1] and then insert A[n] into the sorted array A[1..n − 1]. Write a recurrence for the running time of this recursive version of insertion sort.
Proof: Let T(n) be the time needed to sort an array of n elements using this recursive version of insertion sort. In the recursive step, the same algorithm is run on an array of n − 1 elements, and there is an additional step of at most n − 1 comparisons in order to insert the last input element into the sorted array of n − 1 elements. Then the recurrence for the running time is T(n) = T(n − 1) + O(n).

Exercise 3.8 (Exercise 26 in Skiena) You can use any sorting algorithms presented so far as subroutines.

1. Let S be an unsorted array of n integers. Give an algorithm which finds the pair x, y ∈ S that maximizes |x − y|. Your algorithm must run in O(n) worst-case time.

2. Let S be a sorted array of n integers. Give an algorithm which finds the pair x, y ∈ S that maximizes |x − y|. Your algorithm must run in O(1) worst-case time.

3. Let S be an unsorted array of n integers. Give an algorithm which finds the pair x, y ∈ S that minimizes |x − y|, for x ≠ y. Your algorithm must run in O(n lg n) worst-case time.

4. Let S be a sorted array of n integers. Give an algorithm which finds the pair x, y ∈ S that minimizes |x − y|, for x ≠ y. Your algorithm must run in O(n) worst-case time.

Proof:
1. We scan through the array once, keeping track of the smallest and the largest integers found so far. At the end we have the smallest and largest integers, which when subtracted give the maximum absolute difference. The worst-case running time of the algorithm is T(n) = 2n = O(n), because we compare each integer in the array with the smallest and largest integers found so far.
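The one-pass scan can be sketched as follows (an illustration, not from the notes):

```python
def max_abs_diff(s):
    """One scan, tracking the smallest and largest integers seen so far;
    their difference maximizes |x - y|. O(n) worst case."""
    lo = hi = s[0]
    for v in s[1:]:
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return lo, hi
```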
2. We can let y = A[1] and x = A[n]. Since the array is already sorted, y will be the smallest element in the array and x will be the largest, thus maximizing the difference |x − y|. Accessing an array requires constant time, so the worst-case running time of this algorithm is O(1).
3. We can sort the array using MERGESORT, which runs in worst-case time O(n lg n). Then we can traverse the array, keeping track of the two consecutive array elements which have the smallest difference and are not equal. A single traversal of the array takes worst-case time O(n). The total worst-case time of the algorithm is O(n lg n).

4. Again we traverse the array, keeping track of the two consecutive array elements which have the smallest difference and are not equal. The worst-case running time of this algorithm is O(n).
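Parts 3 and 4 can be sketched in one Python function (hypothetical helper, not from the notes; the flag distinguishes the unsorted and sorted cases):

```python
def closest_pair(s, already_sorted=False):
    """Sort (if needed), then scan consecutive elements, keeping the pair of
    unequal neighbors with the smallest difference. Returns (x, y) or None."""
    if not already_sorted:
        s = sorted(s)                       # O(n log n) step for part 3
    best = None
    for a, b in zip(s, s[1:]):              # O(n) scan for part 4
        if a != b and (best is None or b - a < best[1] - best[0]):
            best = (a, b)
    return best
```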
Exercise 3.9 Given an array of real numbers S, find a pair of numbers x, y in S that minimizes |x + y|. Give your best algorithm for this problem, argue that it is correct, and then analyze it. For partial credit you can write an O(n²) algorithm.
Proof: A solution in O(n²) time would be to generate all pairs x, y, to calculate |x + y|, and to select those that give the minimum.

The O(n log n) solution based on sorting that I expected is the following:

MINSUM(S)
1  sort S by the absolute value of its elements
2  min ← ∞
3  for i ← 1 to length(S) − 1 do
4      if |S[i] + S[i + 1]| < min then { min ← |S[i] + S[i + 1]|; x ← S[i]; y ← S[i + 1] }
5  return x, y

The idea is to sort the numbers by their absolute value. This can be easily done by modifying any sorting algorithm to compare |S[i]| with |S[j]| instead of S[i] with S[j]; the running time remains the same.
This algorithm is correct because if two numbers x and y minimize the expression |x + y|, then one of the following cases can occur:

1. x, y are both positive; then x, y must be the smallest two positive numbers, so they occur at consecutive positions in the sorted array.

2. x, y are both negative; then x, y must be the largest two negative numbers, so they are consecutive in the sorted array.

3. One of x, y is positive and the other is negative; then |x + y| = ||x| − |y||; but the expression ||x| − |y|| is minimized by two consecutive elements in the sorted array.

Therefore, the expression |x + y| is minimized by two elements x, y which occur at consecutive positions in the array sorted by the absolute value of the numbers.
The running time of this algorithm is given by the running time of sorting in step 1, because the other steps take linear time. Hence, the running time is O(n log n).
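The MINSUM pseudocode translates almost directly to Python (a sketch, not part of the original notes; sorting with key=abs plays the role of step 1):

```python
def min_sum(s):
    """Minimize |x + y|: sort by absolute value, then check consecutive pairs."""
    s = sorted(s, key=abs)                  # step 1: sort by |.|
    best = None
    for a, b in zip(s, s[1:]):              # steps 3-4: scan neighbors
        if best is None or abs(a + b) < abs(best[0] + best[1]):
            best = (a, b)
    return best
```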
Exercise 3.10 (2.5, Skiena) Given two sets S1 and S2 (each of size n), and a number x, describe an O(n lg n) algorithm for finding whether there exists a pair of elements, one from S1 and one from S2, that add up to x.

Proof: (Solution 1) A Θ(n²) algorithm would entail examining each element y1 in S1 and determining if there is an element y2 in S2 such that y1 + y2 = x.

FINDSUM(S1, S2, x)
1  for yi ∈ S1 do
2      for yj ∈ S2 do
3          if yi + yj = x then return (yi, yj)

Proof: (Solution 2) A more efficient solution takes advantage of sorting as a subroutine. First, we sort the numbers in S2. Next, we loop through each element yi in S1 and do a binary search on the sorted S2 for x − yi.

FINDSUM(S1, S2, x)
1  MERGESORT(S2, 1, n)
2  for yi ∈ S1 do
3      yj ← BINARYSEARCH(S2, x − yi)
4      if yj ≠ not found then return (yi, yj)

The MERGESORT takes time Θ(n lg n). For each element in S1, the BINARYSEARCH on S2 takes O(lg n). Since this binary search is done n times, the for loop requires O(n lg n) worst-case time. The worst-case time for the entire algorithm is also O(n lg n).
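Solution 2 can be sketched in Python (an illustration, not from the notes; the standard library's sorted and bisect_left stand in for MERGESORT and BINARYSEARCH):

```python
from bisect import bisect_left

def find_sum(s1, s2, x):
    """Sort s2 once, then binary-search x - y for each y in s1.
    Total worst-case time O(n lg n)."""
    s2 = sorted(s2)                          # stands in for MERGESORT
    for y in s1:
        target = x - y
        i = bisect_left(s2, target)          # stands in for BINARYSEARCH
        if i < len(s2) and s2[i] == target:
            return (y, target)
    return None                              # no pair adds up to x
```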
Exercise 3.11 Give an efficient algorithm for sorting a list of n keys that may each be either 0 or 1. What is the order of the worst-case running time of your algorithm? Write the complete algorithm in pseudocode. Make sure your algorithm has the stable sorting property.

Proof: Use radix sort (with counting sort) for one-digit numbers (so k = 1). Then the running time is O(n · k), that is, O(n). Alternatively, you can create two arrays, one for zeros and another one for ones, scan the input updating the two arrays, and then append them. The pseudocode is easy.
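The two-array variant can be sketched as (an illustration, not from the notes; keys appear in the output in their input order within each group, so the sort is stable):

```python
def sort_binary_stable(a):
    """Stable O(n) sort for keys that are each 0 or 1: collect the zeros
    and the ones in input order, then append the two lists."""
    zeros = [x for x in a if x == 0]
    ones = [x for x in a if x == 1]
    return zeros + ones
```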
Exercise 3.12 (From Neapolitan and Naimipour, p. 84, Problem 13.) Write an algorithm that sorts a list of n items by dividing it into three sublists of about n/3 items, sorting each sublist recursively, and merging the three sorted sublists. Analyze your algorithm, and give the results using order notation.

Proof: We're doing a three-way merging version of Mergesort. You may write the pseudocode according to your preferred style, as long as it is clear and correct! I have marked comments with a "#" sign.
Inputs: positive integer n; array of keys S indexed from 1 to n.
Outputs: the array S containing the keys in nondecreasing order.

procedure MERGESORT3(n; var S);
# var will ensure that the changes we make to S will be retained.
const
    third = floor(n/3);
var
    U : array [1..third];
    V : array [1..third];
    W : array [1..(n − 2·third)];
    UV : array [1..(2·third)];
    # This array will consist of the keys in both U and V in sorted
    # (nondecreasing) order. This is a stepping stone to merging all three
    # sorted sublists.
begin
    if n = 2 then
        sort(S)
        # We assume you know how to write the code for this case
        # (when S has only two elements).
    elseif n > 1 then
        copy S[1] through S[third] to U;
        copy S[third + 1] through S[2·third] to V;
        copy S[2·third + 1] through S[n] to W;
        MERGESORT3(third; U);
        MERGESORT3(third; V);
        MERGESORT3(n − 2·third; W);
        MERGE(third; third; U; V; UV);
        # The inputs to MERGE are: h and m (the lengths of the two sorted
        # arrays to be merged), the arrays themselves, and the var'ed array
        # that will contain the keys in the two smaller arrays in sorted
        # (nondecreasing) order. MERGE is O(h + m − 1), which means that it
        # is O(n) in the larger context of MERGESORT3.
        MERGE(2·third; n − 2·third; UV; W; S);
    end
end;
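The same scheme can be sketched in Python (an illustration, not from the notes; as in the pseudocode, the three-way merge is performed as two two-way merges, UV first and then with W):

```python
def merge2(a, b):
    """Standard two-way merge of sorted lists, O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def merge_sort3(s):
    """Three-way merge sort: split into thirds, sort recursively, merge."""
    n = len(s)
    if n <= 2:
        return sorted(s)                     # base case, as in the pseudocode
    third = n // 3
    u = merge_sort3(s[:third])
    v = merge_sort3(s[third:2 * third])
    w = merge_sort3(s[2 * third:])
    return merge2(merge2(u, v), w)           # UV first, then merge with W
```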
The master theorem will aid us in our time complexity analysis. Let us consider W(n), the worst-case time complexity function. The recurrence relation is W(n) = 3W(n/3) + Θ(n) (don't worry about floors and ceilings here). Remember that MERGE is O(n), with its worst case W(n) = Θ(n); the copy operations are also O(n), if you would like to count those. So, we will use the master theorem with a = 3, b = 3, and f(n) = Θ(n). Then n^(log_b a) = n^(log_3 3) = n^1 = n. We are in case 2 of the master theorem, and thus W(n) = Θ(n lg n).

Exercise 3.13 (k-Way Merge Sort)

1. Give an O(n log k) algorithm to merge k sorted lists into one sorted list, where n is the total number of elements in all the input lists. Analyze the time complexity of your algorithm.

2. Analyze a k-way merge sort algorithm which first splits the input array into k arrays (instead of 2) of size n/k and then merges them. Is it better than merge sort?
Proof:

1. (Solution 1) An O(n log k) algorithm for the problem is to pair the lists and merge each pair, then recursively merge the k/2 resulting lists. The time is given by the recurrence T(n, k) = O(n) + T(n, k/2), since the time to perform the merge for each pair of lists is proportional to the sum of the two sizes, so a constant amount of work is done per element. Then unwinding the recurrence gives log k levels, each with O(n) comparisons, for a total time of O(n log k).
Proof: (Solution 2) Informally, another solution is the following. Given k sorted lists L1, ..., Lk as input, we want our algorithm to output a sorted array A containing all elements in the lists. As in the algorithm for merging two lists, our algorithm will put one element in the final array A at each step, starting with the smallest and ending with the biggest. The basic idea is to construct a heap having k elements, one from each list, and use it as a priority queue. More precisely, the ith element in the heap will be the smallest element of the list Li which has not yet been inserted in the final array A. Then at each step our algorithm will do the following: it will extract the minimum from the heap and insert it in array A; then, if the minimum was an element of list Lj, it will insert in the heap the next smallest element of Lj. Notice that to compute whether the minimum of the heap was in the ith list, we start our algorithm by rewriting each element Li[j] of the ith list as a pair (Li[j], i); that is, by labeling each element of the list with the number of the list to which it belongs.

This labeling phase takes n steps, since there are n elements to be labeled. Moreover, there are n basic insertion steps, since there are n elements to be inserted in A, and for each of these n steps there is an extraction of the minimum from the heap and an insertion of an element into the heap, which both take O(log k) time. The time to construct the heap in the first step is O(k log k). Then, since k ≤ n, the overall computation time is O(n log k). The algorithm is the following:
KMERGE(L1, ..., Lk)
1. for i = 1, ..., k, rewrite all elements Li[j] of the ith list as a pair (Li[j], i);
2. for i = 1, ..., k,
3.     HEAPINSERT(H, Li[1]);  (insert the first element of list Li in heap H)
4.     set indi = 2;
5. for i = 1, ..., n,
6.     A[i] ← HEAPEXTRACTMIN(H);
7.     if A[i] = (Lh[j], h) for some h then
8.         HEAPINSERT(H, Lh[indh]);  (insert the next element of list Lh in heap H)
9.         set indh ← indh + 1;
10. return A.
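A Python sketch of KMERGE using the standard library heap (an illustration, not from the notes; heapq's (value, list index) tuples play the role of the (Li[j], i) labeling):

```python
import heapq

def kmerge(lists):
    """k-way merge with a size-k heap of (value, list index) pairs.
    O(n log k) overall, where n is the total number of elements."""
    heap, ind = [], [0] * len(lists)
    for i, lst in enumerate(lists):
        if lst:                                   # seed with each list's head
            heapq.heappush(heap, (lst[0], i))
            ind[i] = 1
    a = []
    while heap:
        val, h = heapq.heappop(heap)              # extract-min
        a.append(val)
        if ind[h] < len(lists[h]):                # next element of L_h, if any
            heapq.heappush(heap, (lists[h][ind[h]], h))
            ind[h] += 1
    return a
```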
2. There will be k subproblems of size n/k and a k-way merging which takes O(n log k). Therefore, the recurrence is T(n) = k·T(n/k) + O(n log k). Using the tree method, for example, we get T(n) = h·Θ(n log k), where h is the height of the tree, that is, log_k n. Hence, T(n) = n·log_k n·log k = n·(log n / log k)·log k = n log n. The conclusion is that k-way merge sort is not better than the usual merge sort.
Exercise 3.14 (Convex Hull; page 35, Skiena) Given n points in two dimensions, find the convex polygon of smallest area that contains them all.

Proof: Let (x1, y1), ..., (xn, yn) be n points in two dimensions. Very often in geometrical problems it is a good idea to sort the points by one or both coordinates. In our problem, we sort the points (xi, yi) such that either xi < xi+1, or xi = xi+1 and yi ≤ yi+1. This sorting assures us that the points are visited from left to right and from bottom to top as the counter increases from 1 to n.
The strategy to solve this problem is to iteratively obtain the convex hull given by the points 1..i, where i increases from 2 to n. Notice that the ith point is always a vertex in the convex hull given by the points 1..i. We have to understand how the convex hull is modified when a new point is added. First, let us consider two arrays high[1..n] and low[1..n], where high[i] and low[i] are going to be the high and low neighbors of i in the convex hull given by 1..i, respectively. The key observation is that if there is a point k ∈ 1..i − 1 such that k is below the line determined by the points i and high[k], then k cannot be a vertex in the convex hull of 1..i; similarly, k cannot be a vertex in the convex hull of 1..i if k is above the line determined by i and low[k]. Summarizing all this, we get the following algorithm:
Step 1:
    Sort the pairs (xi, yi) using a standard comparison algorithm (for example MERGESORT),
    where the order relation is given by: (xi, yi) ≤ (xj, yj) iff xi < xj, or xi = xj and yi ≤ yj.

Step 2:
    high[1..n], low[1..n]; high[1] = 0, low[1] = 0
    for i = 2 to n
        k = i − 1
        while BELOW(k, i, high[k]) = true
            k = high[k]
        high[i] = k
        k = i − 1
        while ABOVE(k, i, low[k]) = true
            k = low[k]
        low[i] = k

where BELOW(k, i, j) is true iff k is below the line determined by the points i and j, and ABOVE(k, i, j) is true iff k is above the line determined by the points i and j.
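The same sort-then-scan idea is commonly implemented as Andrew's monotone chain algorithm. The sketch below (a close relative of the incremental high/low construction above, not the notes' exact algorithm) builds the lower and upper hulls with cross-product turn tests instead of BELOW/ABOVE:

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Monotone chain: sort points by (x, y), then build the lower and upper
    hulls, popping points that would make the chain non-convex."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]   # counterclockwise hull, no repeats
```

As in the notes, the O(n log n) sorting step dominates; the two scans are linear because each point is pushed and popped at most once.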