Introduction to Algorithms
Massachusetts Institute of Technology
Professors Erik D. Demaine and Charles E. Leiserson
November 4, 2005
6.046J/18.410J Handout 21
Problem Set 5 Solutions
Problem 5-1. Skip Lists and B-trees

Intuitively, it is easier to find an element that is near an element you've already seen. In a dynamic-set data structure, a finger search from x to y is the following query: given the node in the data structure that stores the element x, and given another element y, find the node in the data structure that stores y. Skip lists support fast finger searches in the following sense.
(a) Give an algorithm for finger searching from x to y in a skip list. Your algorithm should run in O(lg r) time with high probability, where r = rank(y) − rank(x) and rank(z) denotes the current rank of element z in the sorted order of the dynamic set. When we say "with high probability," we mean high probability with respect to r. That is, your algorithm should run in O(lg r) time with probability 1 − 1/r^α, for any constant α ≥ 1. Assume that the finger-search operation is given the node in the bottommost list of the skip list that stores the element x.

Solution:
For the purposes of this problem, we assume that each node x in the skip list has the following fields:

key[x]    the key associated with node x
next[x]   the next element in the linked list containing x
level[x]  the level of the linked list containing x
up[x]     the element in the linked list of level level[x] + 1 containing the same key as x
down[x]   the element in the linked list of level level[x] − 1 containing the same key as x

We present code for the case where key[x] < k. The case where k < key[x] is symmetric. We also assume that level[x] = 1, i.e., the search starts at the lowest level of the skip list. (If the search starts at level level[x], then the finger search may take Ω(level[x]) time.) The algorithm proceeds in two phases. In the first phase, we ascend levels as rapidly as possible, until we find a level that does not allow forward motion.
FINGER-SEARCH(x, k)   ▷ Search for k in the skip list containing node x.
 1  while next[x] ≠ NIL and key[next[x]] ≤ k
 2      do if up[x] ≠ NIL
 3            then x ← up[x]
 4            else x ← next[x]
 5  while level[x] ≥ 1
 6      do while next[x] ≠ NIL and key[next[x]] ≤ k
 7             do x ← next[x]
 8         if level[x] > 1
 9            then x ← down[x]
10            else return x
(Notice that the search climbs higher than it needs to. A better solution is to examine the next pointer at each level, and only continue up if the next key is smaller than the target key. As we prove, though, the simpler algorithm above succeeds as well.)

We first establish the highest level reached during the finger search. The proof is exactly the same as the proof that a skip list has at most O(lg n) levels.
Lemma 1  While executing FINGER-SEARCH(x, k), level[x] = O(lg r) with high probability, where r = rank(k) − rank(key[x]).

Proof. Notice that there are at most r + 1 elements of the skip list in the (closed) interval M = [key[x], k]. We calculate the probability that any of these elements exceeds height c lg r:

    Pr{any element in M is in more than c lg r levels} ≤ (r + 1) (1/2)^{c lg r} = (r + 1) r^{−c} = O(1/r^{c−1}).

Since none of the elements in the interval are promoted past level c lg r with high probability, and key[x] is always in the interval M, the result follows.

Notice that in this proof, the high-probability analysis is with respect to r, not n.

We now consider the cost of the two phases of the algorithm: first the cost of ascending, and then the cost of descending. Recall the lemma proved in class:
Lemma 2  The number of coin flips required to see c lg r heads is O(lg r) with high probability.

Using this lemma, we can see that the first phase, ascending levels, completes within O(lg r) steps with high probability, as there are only O(lg r) levels to ascend and each heads results in moving up a level. This lemma also shows that the second phase completes in O(lg r) steps: as in the analysis of SEARCH, we calculate the cost in reverse, ascending from the node containing k to the level reached during the first phase; since there are only O(lg r) levels to ascend, and each heads results in moving up a level, the cost of the second phase is also O(lg r). Hence the total running time is O(lg r) with high probability, as desired.
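As a concrete illustration, here is a small Python sketch of the two-phase finger search. The skip list is built deterministically here (each level keeps every other node) purely so the example is reproducible; the class and function names are our own, not part of the handout.

```python
class Node:
    """One cell of a skip list: a key plus next/up/down pointers."""
    def __init__(self, key):
        self.key = key
        self.next = None
        self.up = None
        self.down = None

def build_skip_list(keys):
    """Build a skip list over sorted keys, promoting every other node per level
    (a deterministic stand-in for coin flips). Returns {key: bottom-level node}."""
    level = [Node(k) for k in keys]
    for a, b in zip(level, level[1:]):
        a.next = b
    bottom = {n.key: n for n in level}
    while len(level) > 1:
        promoted = []
        for n in level[::2]:
            p = Node(n.key)
            p.down, n.up = n, p
            promoted.append(p)
        for a, b in zip(promoted, promoted[1:]):
            a.next = b
        level = promoted
    return bottom

def finger_search(x, k):
    """Finger search from bottom-level node x toward key k >= x.key."""
    # Phase 1: while forward motion is possible, climb when we can,
    # otherwise step forward (mirrors lines 1-4 of FINGER-SEARCH).
    while x.next is not None and x.next.key <= k:
        x = x.up if x.up is not None else x.next
    # Phase 2: walk forward at each level, dropping down a level whenever
    # the next key would overshoot; stop at the bottom level.
    while True:
        while x.next is not None and x.next.key <= k:
            x = x.next
        if x.down is None:
            return x
        x = x.down
```

For example, starting from the node storing 10 in a skip list over 0..99, `finger_search(bottom[10], 57)` returns the node storing 57 without ever revisiting the head of the list.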
To support fast finger searches in B-trees, we need two ideas: B+-trees and level linking. Throughout this problem, assume that the minimum degree t is O(1). A B+-tree is a B-tree in which all the keys are stored in the leaves, and internal nodes store copies of these keys. More precisely, an internal node x with children c_1, c_2, ..., c_n stores n keys: the maximum key in c_1's subtree, the maximum key in c_2's subtree, ..., the maximum key in c_n's subtree.

(b) Describe how to modify the B-tree SEARCH algorithm in order to find the leaf containing a given key in a B+-tree in O(lg n) time.

Solution: The only modification necessary is to always search to the leaves, rather than to return if the key is found in an internal node.

B+-TREE-SEARCH(x, k)
1 i ← 1
2 while i ≤ n[x] and k > key_i[x]
3     do i ← i + 1
4 if leaf[x]
5    then if i ≤ n[x] and k = key_i[x]
6            then return (x, i)
7            else return NIL
8    else return B+-TREE-SEARCH(c_i[x], k)
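A compact Python rendering of this always-to-the-leaves search. The node layout is hypothetical (dicts with "keys", "children", and a "leaf" flag, where an internal node's i-th key is the maximum key in its i-th child's subtree); it is a sketch of the idea, not the handout's exact code.

```python
def bplus_search(x, k):
    """Descend to the leaf that would contain k.
    Returns (leaf, index) if k is present, else None."""
    i = 0
    while i < len(x["keys"]) and k > x["keys"][i]:
        i += 1
    if x["leaf"]:
        if i < len(x["keys"]) and x["keys"][i] == k:
            return (x, i)
        return None
    if i == len(x["keys"]):      # k exceeds the maximum key in this subtree
        return None
    return bplus_search(x["children"][i], k)
```

Note that even when k matches a copy of a key stored in an internal node, the search keeps descending and reports the leaf occurrence.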
(c) Describe how to modify the B-tree INSERT and DELETE algorithms to work for B+-trees in O(lg n) time.

Solution: Normally, during an insert, B-TREE-SPLIT-CHILD cuts a node into three pieces: the elements less than the median, the median, and the elements greater than the median. It puts the median in the parent, and the other parts in new nodes. In B+-trees, the split is unchanged at internal nodes, but not at leaves. The median has to be kept in one of the two leaf nodes (the left one) as well as being inserted into the parent. The following pseudocode implements the new split routine.

B+-TREE-SPLIT-CHILD(x, i, y)
 1 if leaf[y]
 2    then z ← ALLOCATE-NODE()
 3         leaf[z] ← TRUE
 4         n[z] ← t − 1
 5         for j ← 1 to t − 1
 6             do key_j[z] ← key_{j+t}[y]
 7         n[y] ← t
 8         for j ← n[x] downto i + 1
 9             do c_{j+1}[x] ← c_j[x]
10         c_{i+1}[x] ← z
11         for j ← n[x] downto i
12             do key_{j+1}[x] ← key_j[x]
13         key_i[x] ← key_t[y]
14         n[x] ← n[x] + 1
15    else B-TREE-SPLIT-CHILD(x, i, y)
Unlike a B-tree, the old node keeps t keys. The remaining insert routines must be modified to use the new split routine.

A B+-tree delete consists of two parts: deleting the key from the leaf, which may entail fixing up the tree, and ensuring that a copy of the key no longer appears in an internal node. Each part takes O(lg n) time.

The delete begins by descending from the root to the leaf containing the key. If the key being deleted is discovered in an internal node, it is ignored. (Remember these nodes; we will come back to them later, in the second part of the delete.) As in the case of a B-tree, we want to ensure that each node along the descent path has at least t keys. Recall that a B-tree delete consisted of three cases. In particular, case (3) ensured that while descending the tree to perform a delete, a child contained at least t keys. Apply case (3) to ensure that each node along the path has at least t keys. The only modification is that when merging leaves, it is unnecessary to move a key down into the new merged node; the key can simply be removed from the parent. On finding the key in the leaf, remove it from the leaf.

In the second part of the delete, we want to ensure that a copy of the deleted key does not appear in an internal node. First, search for the predecessor of the deleted key. Recall the nodes discovered during the first part that contained the key being deleted. Wherever the deleted key appeared along the path, replace it with the predecessor. (Copies of the deleted key can only appear on the root-to-leaf path traversed during the deletion.)

A level-linked B+-tree is a B+-tree in which each node has an additional pointer to the node immediately to its left among nodes at the same depth, as well as an additional pointer to the node immediately to its right among nodes at the same depth.
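The leaf case of the split can be sketched in Python, using the same hypothetical dict layout as above (parent keys are subtree maxima; t is the B-tree minimum degree). This is an illustrative sketch, not the handout's code.

```python
def split_leaf_child(parent, i, t):
    """Split the full leaf parent["children"][i], which holds 2t - 1 keys.
    The median stays in the left leaf; a copy of it goes into the parent."""
    y = parent["children"][i]
    z = {"leaf": True, "keys": y["keys"][t:], "children": []}
    y["keys"] = y["keys"][:t]                 # left leaf keeps t keys (median last)
    parent["children"].insert(i + 1, z)
    # parent["keys"][i] already equals z's new maximum (the old maximum of y);
    # insert a copy of the median as the separator key for the shrunken y.
    parent["keys"].insert(i, y["keys"][-1])
```

After the split, the parent's key list is still the list of subtree maxima of its children, which is the invariant the search routine relies on.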
(d) Describe how your B+-tree INSERT and DELETE algorithms from part (c) can be modified to maintain level links in O(lg n) time per operation.

Solution: Each node y now contains a pointer link[y] pointing to its right sibling. The split routine is modified to include the following two lines:

1 link[z] ← link[y]
2 link[y] ← z

This copies the link of the node y being split to the new node z, and updates the link of y to point to z. Otherwise, the insert operation remains unchanged.

During a delete operation, when two nodes are merged, the link pointers must be appropriately updated: the link from the left node (being deleted) must be copied to the link field of the remaining merged node. Otherwise, the delete operation remains unchanged.
(e) Give an algorithm for finger searching from x to y in a level-linked B+-tree. Your algorithm should run in O(lg(rank(y) − rank(x))) time.

Solution: The algorithm is quite similar to the solution to part (a). The search proceeds up the tree until the next key is greater than the key being searched for. At this point, the search can continue down the tree in the usual fashion, using links when necessary.

As before, the code below assumes that initially key[x] < k, where x represents the pointer to a key in the tree. The opposite case is symmetric. In order to perform the finger search efficiently, we assume that each node x has a parent pointer p[x].

B+-TREE-FINGER-SEARCH(x, k)
 1  ▷ Phase 1: ascend while k lies beyond both x's subtree and its right neighbor's.
 2  while key_{n[x]}[x] < k and link[x] ≠ NIL and key_{n[link[x]]}[link[x]] < k
 3      do x ← p[x]
 4  if key_{n[x]}[x] < k
 5     then if link[x] = NIL
 6             then return NIL   ▷ k is larger than every key in the tree.
 7             else x ← link[x]  ▷ Begin downward search in the right neighbor.
 8  ▷ Phase 2: descend as in B+-TREE-SEARCH.
 9  while not leaf[x]
10      do i ← 1
11         while i ≤ n[x] and k > key_i[x]
12             do i ← i + 1
13         x ← c_i[x]
14  i ← 1
15  while i ≤ n[x] and k > key_i[x]
16      do i ← i + 1
17  if i ≤ n[x] and k = key_i[x]
18     then return (x, i)
19     else return NIL

The key to showing the search efficient is proving that the first phase ascends no more than O(lg r) levels. In order to prove this fact, notice that x is only set to p[x] when key_{n[link[x]]}[link[x]] < k. This implies that k is larger than every leaf in the subtree rooted at link[x]. Moreover, every leaf in the subtree rooted at link[x] must be larger than the initial key (since the search moves only in the direction of increasing keys). Hence, if x ascends from level i to level i + 1 (assuming that the leaves are at level 0), then the subtree rooted at link[x] contains at least t^i leaves that lie between the initial key and k, so r ≥ t^i, implying i ≤ log_t r. That is, the search never ascends more than O(lg r) levels. On the other hand, while searching down the tree, x traverses at most one lateral link. Therefore we can conclude that the total search cost is O(lg r).

These ideas suggest a connection between skip lists and level-linked 2-3-4 trees. In fact, a skip list is essentially a randomized version of a level-linked B+-tree.

(f) Describe how to implement a deterministic skip list. That is, your data structure should have the same general pointer structure as a skip list: a sequence of one or more linked lists with pointers between nodes in adjacent lists that store the same key. The SEARCH algorithm should be identical to that of a skip list. You will need to modify the INSERT operation to avoid the use of randomization to determine whether a key should be promoted. You may ignore DELETE for this problem part.

Solution:
The resulting deterministic skip list is exactly the level-linked B+-tree that was just developed in the prior parts. We repeat the same data structure here in a slightly different form, primarily to help see the connection between the two data structures.

The data structure consists of a set of linked lists, just as in a typical skip list. The lists all begin with −∞. The SEARCH algorithm remains exactly as in the regular skip-list data structure.

Each node in a linked list is augmented to include a count of consecutive non-promoted nodes that follow it in the list. For example, if the next node in the linked list is promoted, the count is 0; if the next node is not promoted, but the following node is promoted, the count is 1. The goal is to ensure that the count of a promoted node is at least t − 1 and no more than 2t − 1. (The slightly different values as compared to a B-tree result from the slightly different definition of the count.)

We now describe how the INSERT proceeds. The INSERT operation begins at the top level of the skip list at −∞, and proceeds down the skip list following the usual skip-list search. On reaching the lowest level, the new element is inserted into the linked list, and the counts are updated. (Notice that no more than O(t) counts must be updated.) If updating the counts causes the previous promoted node to have a count larger than 2t − 1, then the node with count t − 1 is promoted. This continues recursively up the tree, adding a new level at the highest level if needed. In this way, we ensure that the skip list is always close to a perfect skip list, without using any randomization.

Problem 5-2. Fun with Points in the Plane
It is 3 a.m. and you are attempting to watch 6.046 lectures on video, looking for hints for Problem Set 5. For some odd reason, possibly because you are fading in and out of consciousness, you start to notice a strange cloud of black dots on an otherwise white wall in your room. Thus, instead of watching the lecture, your subconscious mind starts trying to solve the following problem.

Let S be a set of n points in the plane. Each point p has coordinates (x_p, y_p) and a weight w_p (a real number representing the size of the dot). Let f be an arbitrary function mapping a point p with coordinates (x_p, y_p) and weight w_p to a real number, computable in O(1) time. For a subset Q of S, define the function F(Q) to be the sum of f over all points in Q, i.e.,

    F(Q) = Σ_{p ∈ Q} f(p).

Our goal is to compute the function F for certain subsets of the points. We call each subset Q a query, and for each query Q, we want to calculate F(Q). Because there may be a large number of queries, we want to design a data structure that will allow us to efficiently answer each query.

First we consider queries that restrict the x-coordinate. In particular, consider the set of points whose x-coordinates are at least some value x_min. Formally, let

    S_{x ≥ x_min} = {p ∈ S : x_p ≥ x_min}.
For example, if f(p) = w_p and x_min = −∞, then F(S_{x ≥ x_min}) is the sum of the weights of all points. This case is depicted in Figure 1.

(a) Show how to modify a balanced binary search tree to support such a query in O(lg n) time. More specifically, the computation of F(S_{x ≥ x_min}) can be performed using only a single walk down the tree. You do not need to support updates (insertions and deletions) for this problem part.
Solution: For this part of the problem, we present two different solutions using a binary tree. In the first version, we store keys only at the leaves of the binary tree, while in the second version we store keys at every node. The solutions are nearly identical.

Let F(v) be the sum of f(p) over all points p represented by the subtree rooted at v. If v is a leaf, then F(v) is simply the value of f for the point stored at v. For both solutions, we augment each node v in the tree to store F(v). The idea behind a query is to search for x_min in the binary tree, computing the desired sum along the way.

The first solution applies when the keys are stored only at the leaves. If we reach a node v in our search and x_min > key[v], then we recurse on the right subtree. Otherwise, we add the value F(right[v]) to the answer from the recursion on the left subtree.

We want to answer queries of the following form: given any value x_min as input, calculate F(S_{x ≥ x_min}). Figure 2 is an example of such a query: the points of interest are those with x-coordinate at least x_min.

Figure 1: An example of eight points; with x_min = −∞, the query sums f over all of them.

Figure 2: A query restricted to the points with x-coordinate at least x_min.
FWITHXMIN-V1(v, x_min)   ▷ Keys stored at leaves.
1 if v = NIL
2    then return 0
3 if leaf[v]
4    then if key[v] ≥ x_min
5            then return F(v)
6            else return 0
7 if x_min > key[v]
8    then return FWITHXMIN-V1(right[v], x_min)
9    else return F(right[v]) + FWITHXMIN-V1(left[v], x_min)

Note that in the last step, we could also avoid visiting v's right child by noticing that F(right[v]) = F(v) − F(left[v]).

When the keys can be stored at internal nodes in the tree, the only modification to the query is that we have to look at the value of f for the node we are currently visiting (in a slight abuse of notation, call this f(v)).

FWITHXMIN-V2(v, x_min)   ▷ Keys stored at every node.
1 if v = NIL
2    then return 0
3 if x_min > key[v]
4    then return FWITHXMIN-V2(right[v], x_min)
5    else return f(v) + F(right[v]) + FWITHXMIN-V2(left[v], x_min)

As before, in the last recursive call, we can avoid visiting v's right child by noticing that F(right[v]) = F(v) − F(left[v]) − f(v). Finally, one could prove the correctness of FWITHXMIN by induction on the height of the tree.

(b) Consider the static problem, where all points are known ahead of time. How long does it take to build your data structure from part (a)?

Solution: Constructing the data structure requires Θ(n lg n) time, because we are effectively sorting the points. One solution is to sort the points by x-coordinate, and then construct a balanced tree by picking a median element as the root and recursively constructing the left and right subtrees. This construction takes O(n lg n) time to sort the points and O(n) time to construct the tree.

(c) In total, given n points, how long does it take to build your data structure and answer m different queries? On the other hand, how long would it take to answer m different queries without using any data structure, using the naive algorithm of computing F from scratch for every query? For what values of m is it asymptotically more efficient to use the data structure?
Solution: Using parts (a) and (b), we require O(n lg n + m lg n) time to build the data structure and then answer the m queries. The naive algorithm to compute F(S_{x ≥ x_min}) from scratch is, for each query, to scan through all n points and sum f over those with x-coordinate at least x_min. This second algorithm requires Θ(mn) time. Intuitively, from these expressions, we can conclude that if m = Ω(lg n), then building the data structure will be more efficient than computing from scratch.

We were not looking for a formal proof for this part, but one way to give such an argument is as follows. Let T_1(n, m) and T_2(n, m) be the times required to answer m queries on n points, for the first and second algorithms, respectively. Then we want to give sufficient conditions on n and m such that T_1(n, m) = O(T_2(n, m)) asymptotically.

Lemma 3  If m ≥ lg n, then T_1(n, m) = O(T_2(n, m)).

Proof. Let c_1 be a constant such that T_1(n, m) ≤ c_1 (n lg n + m lg n) for all n and m, and let c_2 be a constant such that T_2(n, m) ≥ c_2 mn for all n and m. If m ≥ lg n, then

    T_1(n, m) ≤ c_1 (n lg n + m lg n) ≤ c_1 (nm + mn) = 2 c_1 mn.

Therefore, choosing the constant c = 2c_1 / c_2, we have T_1(n, m) ≤ c · T_2(n, m) whenever m ≥ lg n, i.e., T_1(n, m) = O(T_2(n, m)).

(d) We can make this data structure dynamic by using a red-black tree. Argue that the augmentation from your solution in part (a) can be efficiently supported in a red-black tree, i.e., that points can be inserted or deleted in O(lg n) time.

Solution: When keys are stored at every node, the augmentation of maintaining F(v) at every node v in a red-black tree is almost exactly the same as the augmentation used to maintain the sizes of subtrees in the dynamic order-statistic tree.
When we insert a point p into the tree, we increment F(v) by f(p) for the appropriate nodes v as we walk down the tree. For a delete, when a point p is spliced out of the tree, we walk up the path from p's old position to the root and decrement F(v) by f(p). In the case where we delete a node by replacing it with its predecessor or successor q, we additionally walk the path affected by moving q, adjusting F by f(q), and then recursively delete q from its old subtree.

Maintaining F during rotations is also similar. For example, after performing a left-rotation on a subtree that originally had x as the root and y as its right child (see Figure 14.2 on p. 306 of CLRS), we update F(y) ← F(x) and F(x) ← f(x) + F(left[x]) + F(right[x]).

Note that for the first type of solution, when keys are stored only at the leaves of a red-black tree, we also need to maintain at each internal node the maximum key of its left subtree in order to direct searches. We should also argue that this property can be maintained; similar arguments for normal tree inserts and deletes and for rotations work.
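The rotation fix-up can be checked in a few lines of Python. Nodes here are plain dicts with "f", "sum", "left", and "right" fields; this is a sketch of the augmentation repair only, not red-black rebalancing code.

```python
def subtree_sum(v):
    return v["sum"] if v is not None else 0.0

def left_rotate(x):
    """Left-rotate x with its right child y, repairing the F augmentation:
    y inherits x's old aggregate, and x is recomputed from its new children."""
    y = x["right"]
    x["right"] = y["left"]
    y["left"] = x
    y["sum"] = x["sum"]          # y now roots the same set of points x did
    x["sum"] = x["f"] + subtree_sum(x["left"]) + subtree_sum(x["right"])
    return y                     # new subtree root
```

Only the two nodes on the rotation axis change their aggregates, so each rotation costs O(1) extra work.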
Next we consider queries that take an interval [x_min, x_max] (with x_min ≤ x_max) as input instead of a single number x_min. Let S_{x in [x_min, x_max]} be the set of points whose x-coordinates fall in that interval, i.e.,

    S_{x in [x_min, x_max]} = {p ∈ S : x_min ≤ x_p ≤ x_max}.
See Figure 3 for an example of this sort of query.

(e) Show how to modify your algorithm from part (a) to compute F(S_{x in [x_min, x_max]}) in O(lg n) time. Hint: Find the shallowest node in the tree whose x-coordinate lies between x_min and x_max.

We claim that we can use the same dynamic data structure from part (d) to compute F(S_{x in [x_min, x_max]}).

Solution: First, we find a split node for x_min and x_max. The split node satisfies the property that it is the node of greatest depth whose subtree contains all the nodes whose keys are in the interval [x_min, x_max]. We find a split node by following the normal search algorithm for x_min and x_max until their paths diverge.
Figure 3: An example of an interval query: only the points with x-coordinate in [x_min, x_max] contribute to F.

Figure 4: An example of a two-dimensional (rectangle) query.

FINDSPLITNODE-V1(v, x_min, x_max)   ▷ Keys stored at leaves.
 1 if v = NIL
 2    then return NIL
 3 if leaf[v]
 4    then if x_min ≤ key[v] ≤ x_max
 5            then return v
 6            else return NIL
 7 if x_max < key[v]
 8    then return FINDSPLITNODE-V1(left[v], x_min, x_max)
 9 if x_min > key[v]
10    then return FINDSPLITNODE-V1(right[v], x_min, x_max)
11 return v

FINDSPLITNODE-V2(v, x_min, x_max)   ▷ Keys stored at every node.
1 if v = NIL
2    then return NIL
3 if x_max < key[v]
4    then return FINDSPLITNODE-V2(left[v], x_min, x_max)
5 if x_min > key[v]
6    then return FINDSPLITNODE-V2(right[v], x_min, x_max)
7 return v   ▷ If we get here, x_min ≤ key[v] ≤ x_max.

If FINDSPLITNODE returns NIL, then F(S_{x in [x_min, x_max]}) = 0, because there are no points in the interval. Otherwise, to compute F(S_{x in [x_min, x_max]}), we can use the function FWITHXMIN
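The split-node walk translates directly to Python. This sketch assumes the keys-at-every-node layout, with dict nodes holding "key", "left", and "right"; the function name is ours.

```python
def find_split_node(v, x_min, x_max):
    """Return the deepest node whose subtree contains every key in
    [x_min, x_max], or None if the interval is empty in this tree."""
    while v is not None:
        if x_max < v["key"]:
            v = v["left"]
        elif x_min > v["key"]:
            v = v["right"]
        else:
            return v   # x_min <= key <= x_max: the two search paths diverge here
    return None
```

The searches for x_min and x_max follow the same child pointers until this node, which is exactly why its subtree spans the whole interval.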
from part (a) and a corresponding function FWITHXMAX (this function computes F(S_{x ≤ x_max})).

FWITHXMAX-V1(v, x_max)   ▷ Keys stored at leaves.
1 if v = NIL
2    then return 0
3 if leaf[v]
4    then if key[v] ≤ x_max
5            then return F(v)
6            else return 0
7 if x_max > key[v]
8    then return F(left[v]) + FWITHXMAX-V1(right[v], x_max)
9    else return FWITHXMAX-V1(left[v], x_max)

FWITHXMAX-V2(v, x_max)   ▷ Keys stored at every node.
1 if v = NIL
2    then return 0
3 if x_max < key[v]
4    then return FWITHXMAX-V2(left[v], x_max)
5    else return f(v) + F(left[v]) + FWITHXMAX-V2(right[v], x_max)

If we have a split node z, we know all nodes in the left subtree of z have keys at most x_max, and all nodes in the right subtree of z have keys at least x_min. If all keys are stored at the leaves, we have

    F(S_{x in [x_min, x_max]}) = FWITHXMIN-V1(left[z], x_min) + FWITHXMAX-V1(right[z], x_max).

When keys are stored at both leaves and internal nodes, we have

    F(S_{x in [x_min, x_max]}) = f(z) + FWITHXMIN-V2(left[z], x_min) + FWITHXMAX-V2(right[z], x_max).

Putting these pieces together for the keys-at-every-node version:

FWITHX-V2(v, x_min, x_max)
1 if v = NIL
2    then return 0
3 z ← FINDSPLITNODE-V2(v, x_min, x_max)
4 if z = NIL
5    then return 0
6    else return f(z) + FWITHXMIN-V2(left[z], x_min) + FWITHXMAX-V2(right[z], x_max)
¦ ¤ " ¢ I G I ¡ G ¥ : ¡ ¥ © W ¦ ¤ " ¢ I ¡ G & ¡ ¤ " ¢ ) ¥ ¥ 14 See Figure 4 for an example of a twodimensional query. Handout 21: Problem Set 5 Solutions (f) Describe a data structure that efﬁciently supports a query to compute for time. Hint: Augment a arbitrary intervals and . A query should run in range tree. Solution: We use a 2d range tree with primary tree keyed on the coordinate. Each node in this tree contains a pointer to a 1d range tree, keyed on the coordinate. as in part (a). We augment each of the nodes in the trees with the same To perform a query, we ﬁrst search in the primary tree for all nodes/subtrees whose . Then, for each of these points have coordinates that fall in the interval nodes/subtrees, we query the tree to compute for all the points in that tree whose coordinate falls in the interval . We can use the exact same pseudo code from parts (a) and (e) if we replace every occurrence of with FW ITH Y ytree , where FW ITH Y is the query on a 1d range tree. The algorithm for queries is the same as in part (a), except replacing the constanttime function calls with the calls to query ytree . Since each query in a tree takes time, the total runtime for a query is . (g) How long does it take to build your data structure? How much space does it use? Solution: Since we are only adding a constant amount of information at every node in a tree . or tree, the space used is asymptotically the same as a d range tree, i.e., because every point appears in For a 2d range tree, the space used is trees. A simple algorithm for constructing a d range tree takes time. First, sort time as in part (c). the points by coordinate and construct the tree in Then, for every node in the tree, construct the corresponding tree, using the algorithm from (c) by sorting on the coordinates. The recurrence for the time to construct all trees is , or . It is possible to construct a 2d range tree in with a more clever solution. 
Notice that in the 2-d range tree, for a node v in the x-tree, the y-trees for the left and right subtrees of v are disjoint. This observation suggests the following algorithm. First, sort all the points by their x-coordinate and construct the primary x-tree as before. Then, sort the points by y-coordinate, and use that ordering to construct the largest y-tree, for the root node r. To get the y-trees for left[r] and right[r], perform a stable partition of the y-ordered array of points about xkey[r], the key of r in the x-tree. This partition allows us to construct the y-trees for the left and right subtrees of r without having to sort the points by y-coordinate again. Therefore, the recurrence for the runtime is T(n) = 2T(n/2) + O(n), or T(n) = O(n lg n).
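The stable-partition construction can be sketched in Python. Here a node's secondary structure is simply its subtree's points in y-order (a stand-in for the actual y-tree); the y-order is passed down by a stable partition on the x-key, so no re-sorting occurs. Field and function names are ours.

```python
def build_2d(by_x, by_y):
    """by_x: this subtree's points sorted by x; by_y: the same points sorted
    by y. Points are (x, y, f) triples with distinct x-coordinates."""
    if not by_x:
        return None
    m = len(by_x) // 2
    pivot = by_x[m]                                # point stored at this node
    # Stable partition of the y-ordered points about the pivot's x-key:
    left_y = [p for p in by_y if p[0] < pivot[0]]
    right_y = [p for p in by_y if p[0] > pivot[0]]
    return {
        "point": pivot,
        "ytree": by_y,                             # subtree's points in y-order
        "left": build_2d(by_x[:m], left_y),
        "right": build_2d(by_x[m + 1:], right_y),
    }
```

Each level of recursion does O(n) partition work in total, which is exactly the T(n) = 2T(n/2) + O(n) recurrence from the text.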
Unfortunately, there are problems with making this data structure dynamic.

(h) Explain whether your argument in part (d) can be generalized to the two-dimensional case. What is the worst-case time required to insert a new point into the data structure from part (f)?

Solution: We can't generalize the argument in part (d) to a 2-d range tree because we cannot easily update the y-trees. Performing a normal tree insert on a single x-tree or y-tree can be done in O(lg n) time. Performing a rotation on a node in the x-tree, however, requires rebuilding an entire y-tree. For example, consider the diagram of left-rotation in CLRS, Figure 13.2 on p. 278. The y-tree for the node x after the rotation contains exactly the same points as the y-tree for the node y before the rotation. To fix the y-tree rooted at y after the rotation, however, we must remove all the points in the subtree γ from y's original y-tree and add all the points in the subtree α. In the worst case, if the rotation is at the root, then the subtrees α and γ have Θ(n) nodes. Thus, performing a rotation might require Θ(n lg n) time. Since an update to a red-black tree requires only a constant number of rotations, the worst-case time is Θ(n lg n) for an update.

(i) Suppose that, once we construct the data structure with n initial points, we will perform at most k updates. How can we modify the data structure to support both queries and updates efficiently in this case?

Solution: If we do not have to worry about maintaining balance in any of the x-trees or y-trees, then we can use a normal (unbalanced) tree insert to add each new point into the x-tree and the O(lg n) corresponding y-trees; as long as the number of updates k is small, each update takes roughly O(lg² n) time and the trees stay nearly balanced.

As a side note, when we are performing a small number of updates, it is actually possible to perform queries in O(lg² n + k) time and handle updates in O(1) time! In this scheme, we augment the range tree with an extra buffer of size k. Insert and delete operations are lazy: they only get queued in the buffer. When a query happens, it searches the original 2-d range tree and finds an old answer in O(lg² n) time. It then updates its answer by scanning through each point in the buffer. Thus, we can actually support k updates without changing the asymptotic running time for queries.

Completely Optional Parts
The remainder of this problem presents an example of a function f that is useful in an actual application and that can be computed efficiently using the data structures you described in the previous parts. Parts (j) through (l) outline the derivation of the corresponding function f. The remainder of this problem is completely optional. Please do not turn these parts in!

As before, consider a set S of n points in the plane, with each point p having coordinates (x_p, y_p) and a weight w_p. We want to compute the axis that minimizes the moment of inertia of the points in the set. Formally, we want to compute a line ℓ in the plane that minimizes the quantity

    E(ℓ) = Σ_{p ∈ S} w_p d(p, ℓ)²,

where d(p, ℓ) is the distance from point p to the line ℓ.
(j) One parameterization of a line in the plane is to describe it using a pair (r, θ), where r is the distance from the origin to the line and θ is the angle that the line's normal makes with the x-axis. It can be shown that the distance between a point p and a line parameterized by (r, θ) is

    d(p, (r, θ)) = |x_p cos θ + y_p sin θ − r|.

We define the orientation of the set of points as the line (r, θ) that minimizes the function

    E(r, θ) = Σ_{p ∈ S} w_p (x_p cos θ + y_p sin θ − r)².

If w_p ≥ 0 for all p, we can think of the angle this line makes with the x-axis as the "orientation" of the set. Show that setting ∂E/∂r = 0 gives us the constraint

    r = x̄ cos θ + ȳ sin θ,  where  x̄ = (Σ_p w_p x_p) / (Σ_p w_p)  and  ȳ = (Σ_p w_p y_p) / (Σ_p w_p).

Solution: We compute

    ∂E/∂r = −2 Σ_p w_p (x_p cos θ + y_p sin θ − r).

Setting ∂E/∂r equal to 0 and solving for r, we have

    r Σ_p w_p = cos θ Σ_p w_p x_p + sin θ Σ_p w_p y_p,

that is, r = x̄ cos θ + ȳ sin θ. Geometrically, the optimal line passes through the weighted centroid (x̄, ȳ) of the point set.
(k) Show that setting ∂E/∂θ = 0 and using the constraint from part (j) leads to the equation

    tan 2θ = (2 Σ_p w_p x̃_p ỹ_p) / (Σ_p w_p (x̃_p² − ỹ_p²)),

where x̃_p = x_p − x̄ and ỹ_p = y_p − ȳ. To find the extrema where both partial derivatives are 0, we plug in the expression from the previous part to eliminate r.

Solution: This question is also doable, assuming the solution given above is actually correct. You may want to double-check the math for this solution. Substituting r = x̄ cos θ + ȳ sin θ lets us work in the centered coordinates x̃_p, ỹ_p:

    E(θ) = Σ_p w_p (x̃_p cos θ + ỹ_p sin θ)².

We compute ∂E/∂θ and set it equal to 0:

    ∂E/∂θ = Σ_p 2 w_p (x̃_p cos θ + ỹ_p sin θ)(−x̃_p sin θ + ỹ_p cos θ)
          = (cos² θ − sin² θ) Σ_p 2 w_p x̃_p ỹ_p + 2 sin θ cos θ Σ_p w_p (ỹ_p² − x̃_p²)
          = cos 2θ · 2 Σ_p w_p x̃_p ỹ_p − sin 2θ · Σ_p w_p (x̃_p² − ỹ_p²).

Setting this expression to 0 yields tan 2θ = (2 Σ_p w_p x̃_p ỹ_p) / (Σ_p w_p (x̃_p² − ỹ_p²)), as claimed.
(l) Give the function f that makes the orientation problem a special case of the problem we just solved. Hint: The function f is a vector-valued function.

Solution: Use

    f(p) = (w_p, w_p x_p, w_p y_p, w_p x_p², w_p y_p², w_p x_p y_p).

From these six component sums of F(Q) we can compute x̄, ȳ, Σ w_p x̃_p ỹ_p, and Σ w_p (x̃_p² − ỹ_p²), and hence the orientation (r, θ) of any query set Q.