18.415/6.854 Advanced Algorithms
Lecturer: Michel X. Goemans
September 1994

Randomized Algorithms

1 Introduction
We have already seen some uses of randomization in the design of on-line algorithms. In these notes, we shall describe other important illustrations of randomized algorithms in other areas of the theory of algorithms. For those interested in learning more about randomized algorithms, we strongly recommend the forthcoming book by Motwani and Raghavan [9].

First we shall describe some basic principles which typically underlie the construction of randomized algorithms. The description follows a set of lectures given by R.M. Karp [7].

1. Foiling the adversary. This does not need much explanation since this was the main use of randomization in the context of on-line algorithms. It applies to problems which can be viewed as a game between the algorithm designer and the adversary. The adversary has some payoff function (running time of the algorithm, cost of the solution produced by the algorithm, competitive ratio, ...) and the algorithm designer tries to minimize the adversary's payoff. In this framework randomization helps in confusing the adversary: the adversary cannot predict the algorithm's moves.

2. Abundance of witnesses. Often problems are of the type "does the input have property p?" (for example, "Is n composite?"). Typically the property of interest can be established by providing an object, called a "witness". In some cases, finding witnesses deterministically may be difficult. In such cases randomization may help. For example, if there exists a probability space where witnesses are abundant, then a randomized algorithm is likely to find one by repeated sampling. If repeated sampling yields a witness, we have a mathematical proof that the input has the property. Otherwise we have strong evidence that the input doesn't have the property, but no proof. A randomized algorithm which may return an incorrect answer is called a Monte Carlo algorithm. This is in contrast with a Las Vegas algorithm, which is guaranteed to return the correct answer.

3. Checking an identity. For example, given a function of several variables $f(x_1, \ldots, x_n)$, is $f(x_1, \ldots, x_n) \equiv 0$? One way to test this is to generate a random vector $(a_1, \ldots, a_n)$ and evaluate $f(a_1, \ldots, a_n)$. If its value is not 0, then clearly $f \not\equiv 0$. Suppose we can generate random vectors $(a_1, \ldots, a_n)$ under some probability distribution so that
$$P = \Pr[f(a_1, \ldots, a_n) = 0 \mid f \not\equiv 0] \le \frac{1}{2},$$
or any other constant bounded away from 1. Then we can determine whether or not $f \equiv 0$ with high probability. Notice that this is a special case of category 2, since in this probability space, vectors $a$ for which $f(a_1, \ldots, a_n) \ne 0$ constitute "witnesses".

4. Random ordering of input. The performance of an algorithm may depend upon the ordering of input data; using randomization this dependence is removed. The classic example is Quicksort, which takes $O(n^2)$ time in the worst case but when randomized takes $O(n \lg n)$ expected time, and the running time depends only on the coin tosses, not on the input. This can be viewed as a special case of category 1. Notice that randomized Quicksort is a Las Vegas algorithm; the output is always correctly sorted.

5. Fingerprinting. This is a technique for representing a large object by a small fingerprint. Under appropriate circumstances, if two objects have the same fingerprint, then there is strong evidence that they are identical. An example is the randomized algorithm for pattern matching by Karp and Rabin [8]. Suppose we are given a string of length $n$ such as

randomizearandomlyrandomrandomizedrandom

and a pattern of size $m$ such as random. The task is to find all the places the pattern appears in the long string.

Let us first describe our model of computation. We assume a simplified version of the unit-cost RAM model in which the standard operations $+, -, \times, /, <, =$ take one unit of time provided they are performed over a field whose size is polynomial in the input size. In our case, the input size is $O(n + m) = O(n)$ and thus operations on numbers with $O(\log n)$ bits take only one unit of time.

A naive approach is to try starting at each location and compare the pattern to the $m$ characters starting at that location; this takes $O(nm)$ time (in our model of computation we cannot compare two strings of $m$ characters in $O(1)$ time unless $m = O(\log n)$). The best deterministic algorithm takes $O(n + m)$ time, but it is complicated. There is, however, a fairly simple $O(n + m)$ time randomized algorithm.

Say that the pattern $X$ is a string of bits $x_1, \ldots, x_m$ and similarly $Y = y_1, \ldots, y_n$. We want to compare $X$, viewed as a number, to $Y_i = y_i, \ldots, y_{i+m-1}$. This would normally take $O(m)$ time, but it can be done much more quickly by computing fingerprints and comparing those. To compute fingerprints, choose a prime $p$. Then the fingerprint of $X$ is $h(X) = X \bmod p$ and similarly $h(Y_i) = Y_i \bmod p$. Clearly $h(X) \ne h(Y_i) \Rightarrow X \ne Y_i$. The converse is not necessarily true. Say that we have a false match if $h(X) = h(Y_i)$ but $X \ne Y_i$. A false match occurs iff $p$ divides $|X - Y_i|$.

We show that if $p$ is selected uniformly among all primes less than some threshold $Q$ then the probability of a false match is small. First, how many primes $p$ divide $|X - Y_i|$? Well, since every prime is at least 2 and $|X - Y_i| \le 2^m$, we must have at most $m$ primes dividing $|X - Y_i|$. As a result, if $p$ is chosen uniformly at random among $\{q : q \text{ prime and } q \le Q\}$ then
$$\Pr[h(X) = h(Y_i) \mid X \ne Y_i] \le \frac{m}{\pi(Q)},$$
where $\pi(n)$ denotes the number of primes less than or equal to $n$. Thus, the probability that there is a false match for some $i$ is upper bounded by $n$ times $\frac{m}{\pi(Q)}$. Since $\pi(n)$ is asymptotically equal to $n / \ln n$, we derive that this probability is $O\left(\frac{\ln n}{n}\right)$ if $Q = n^2 m$. This result can be refined by using the following lemma and the fact that there is a false match for some $i$ if $p$ divides $\prod_i |X - Y_i| \le 2^{nm}$.

Lemma 1 The number of primes dividing $a \le 2^n$ is at most $\pi(n) + O(1)$.
The refined version is the following:

Theorem 2 If $p$ is chosen uniformly at random among $\{q : q \text{ prime and } q \le n^2 m\}$, then the probability of a false match for some $i$ is upper bounded by $\frac{2 + O(1)}{n}$.

The fingerprint has only $\lg(n^2 m)$ bits, much smaller than $m$. Operations on the fingerprints can be done in $O(1)$ time in our computational model. The advantage of this approach is that it is easy to compute $h(Y_{i+1})$ from $h(Y_i)$ in $O(1)$ time:
$$Y_{i+1} = 2 Y_i + y_{i+m} - 2^m y_i,$$
$$h(Y_{i+1}) = \left(2 h(Y_i) + y_{i+m} - 2^m y_i\right) \bmod p.$$
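The rolling update above can be sketched in Python as follows. This is a minimal illustration rather than code from the notes: the function names (`is_prime`, `random_prime`, `karp_rabin`), the trial-division primality test, and the representation of bit strings as Python `str` values are all assumptions made for the example. The routine reports every position where the fingerprints agree, so false matches remain possible.

```python
import random

def is_prime(q):
    """Trial division; adequate for the small thresholds used here."""
    if q < 2:
        return False
    d = 2
    while d * d <= q:
        if q % d == 0:
            return False
        d += 1
    return True

def random_prime(Q, rng=random):
    """Pick a prime uniformly from {q : q prime, q <= Q} by rejection (Q >= 2)."""
    while True:
        q = rng.randint(2, Q)
        if is_prime(q):
            return q

def karp_rabin(pattern, text, p):
    """Return all 0-based i with h(X) == h(Y_i), fingerprints taken mod p.

    pattern and text are strings over {'0', '1'}. The result always contains
    the real matches and may also contain false matches (Monte Carlo).
    """
    m, n = len(pattern), len(text)
    if m > n:
        return []
    hx = int(pattern, 2) % p
    hy = int(text[:m], 2) % p
    top = pow(2, m, p)  # 2^m mod p, used to drop the leading bit y_i
    hits = []
    for i in range(n - m + 1):
        if hx == hy:
            hits.append(i)
        if i + m < n:
            # h(Y_{i+1}) = (2 h(Y_i) + y_{i+m} - 2^m y_i) mod p
            hy = (2 * hy + int(text[i + m]) - top * int(text[i])) % p
    return hits

text, pattern = "1101011011", "1011"
p = random_prime(len(text) ** 2 * len(pattern))  # threshold Q = n^2 m
print(p, karp_rabin(pattern, text, p))
```

Turning this into the Las Vegas variant would only require comparing `pattern` with `text[i:i + m]` directly before appending `i`.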
One then checks if the fingerprints are equal. If they are, the algorithm claims that a match has been found and continues. To reduce the probability of failure, one can repeat the algorithm with another prime (or several other primes) and output only those positions which were matches for all primes tried. This is thus a Monte Carlo algorithm whose running time is $O(n + m)$.

This algorithm can easily be transformed into a Las Vegas algorithm. Whenever there is a potential match (i.e. the fingerprints are equal), we compare $X$ and $Y_i$ directly at a cost of $O(m)$. The expected running time is now $O\left((n + m) + km + nm \cdot \frac{2}{n}\right) = O(km + n)$, where $k$ denotes the number of real matches.

6. Symmetry breaking. This is useful in distributed algorithms, but we won't have much to say about it in this class. In that context, it is often necessary for several processors to collectively decide on an action among several seemingly indistinguishable actions, and randomization helps in this case.

7. Rapidly mixing Markov chains. These are useful for counting problems, such as counting the number of cycles in a graph, or the number of trees, or matchings, or whatever. First, the counting problem is transformed into a sampling problem. Markov chains can be used to generate points of a given space at random, but we need them to converge rapidly; such Markov chains are called rapidly mixing. This area is covered in detail in these notes.

2 Randomized Algorithm for Bipartite Matching
We now look at a randomized algorithm by Mulmuley, Vazirani and Vazirani [10] for bipartite matching. This algorithm uses randomness to check an identity.

Call an undirected graph $G = (V, E)$ bipartite if (1) $V = A \cup B$ and $A \cap B = \emptyset$, and (2) for all $(u, v) \in E$, either $u \in A$ and $v \in B$, or $u \in B$ and $v \in A$. An example of a bipartite graph is shown in Figure 1.

Figure 1: Sample bipartite graph.

A matching on $G$ is a collection of vertex-disjoint edges. A perfect matching is a matching that covers every vertex. Notice that we must have $|A| = |B|$. We can now pose two problems:

1. Does $G$ have a perfect matching?
2. Find a perfect matching or argue that none exists.

Both of these problems can be solved in polynomial time. In this lecture we show how to solve the first problem in randomized polynomial time, and next lecture we'll cover the second problem. These algorithms are simpler than the deterministic ones, and lead to parallel algorithms which show the problems are in the class RNC. RNC is Randomized NC, and NC is the complexity class of problems that can be solved in polylogarithmic time on a number of processors polynomial in the size of the input. No NC algorithm for either of these problems is known.

The Mulmuley, Vazirani and Vazirani randomized algorithm works as follows. Consider the adjacency matrix $A$ of the graph $G = (V, E)$ whose entries $a_{ij}$ are defined as follows:
$$a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
where the indices $i$ and $j$ correspond to vertices of the vertex sets $A$ and $B$ respectively. There exists a perfect matching in the graph $G$ if and only if the adjacency matrix contains a set of $n$ 1's, no two of which are in the same column or row. In other words, if all other entries were 0 a permutation matrix would result.

Consider the function called the permanent of $A$, defined as follows:
$$\mathrm{perm}(A) = \sum_{\text{permutations } \sigma} \left( \prod_{i=1}^{n} a_{i\sigma(i)} \right). \qquad (2)$$
This gives the number of perfect matchings of the graph $A$ represents.
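The definition of the permanent can be checked directly on small graphs. The sketch below is a brute-force illustration that sums over all $n!$ permutations; the helper name `permanent` and the sample matrix are ours, not from the notes. It is only meant to show that the permanent of a 0/1 adjacency matrix counts perfect matchings, not to be an efficient method.

```python
from itertools import permutations

def permanent(a):
    """Permanent of a square matrix via the defining sum over permutations."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= a[i][sigma[i]]
        total += prod
    return total

# Rows index the vertices of A, columns the vertices of B (hypothetical graph).
adj = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
print(permanent(adj))  # prints 2
```

Here the third vertex of A can only be matched to the third vertex of B, and the remaining two vertices on each side can be paired in two ways, which gives the two perfect matchings.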
Unfortunately the best known algorithm for computing the permanent has running time $O(n 2^n)$. However, note the similarity of the formula for computing the permanent to that for the determinant of $A$:
$$\det(A) = \sum_{\text{permutations } \sigma} \mathrm{sign}(\sigma) \left( \prod_{i=1}^{n} a_{i\sigma(i)} \right).$$
The determinant can be computed in $O(n^3)$ time by using Gaussian elimination (and in $O(\log^2 n)$ time on $O(n^{3.5})$ processors). Note also that:
$$\det(A) \ne 0 \Rightarrow \mathrm{perm}(A) \ne 0 \Leftrightarrow \exists \text{ perfect matching}.$$
Unfortunately the converse is not true. To handle the converse, we replace each entry $a_{ij}$ of matrix $A$ with $a_{ij} x_{ij}$, where $x_{ij}$ is a variable. Now both $\det(A)$ and $\mathrm{perm}(A)$ are polynomials in the $x_{ij}$. It follows that
$$\det(A) \equiv 0 \Leftrightarrow \mathrm{perm}(A) \equiv 0 \Leftrightarrow \nexists \text{ perfect matching}.$$
A polynomial in 1 variable of degree $n$ will be identically equal to 0 if and only if it is equal to 0 at $n + 1$ points. So, if there was only one variable, we could compute this determinant for $n + 1$ values and check whether it is identically zero. Unfortunately, we are dealing here with polynomials in several variables. So to test whether $\det(A) \equiv 0$, we will generate values for the $x_{ij}$ and check if the resulting matrix has $\det(A) = 0$ using Gaussian elimination. If it is not 0, we know the determinant is not identically 0, so $G$ has a perfect matching.

Theorem 3 Let the values of the $x_{ij}$ be independently and uniformly distributed in $\{1, 2, \ldots, 2n\}$, where $n = |A| = |B|$. Let $A'$ be the resulting matrix. Then
$$\Pr[\det(A') = 0 \mid \det(A) \not\equiv 0] \le \frac{1}{2}.$$

It follows from the theorem that if $G$ has a perfect matching, we'll find a witness in $k$ trials with probability at least $1 - 1/2^k$. In fact, this theorem is just a statement about polynomials. We can restate it as follows:

Theorem 4 Let $f(x_1, \ldots, x_n)$ be a multivariate polynomial of degree $d$. Let the $x_i$ be independently and uniformly distributed in $\{1, 2, \ldots, 2d\}$. Then
$$\Pr[f(x_1, \ldots, x_n) \ne 0 \mid f \not\equiv 0] \ge \frac{1}{2}.$$

This theorem can be used for other problems as well. Instead of proving this theorem we'll prove a stronger version which can be used for the second problem, that of finding the perfect matching.

Consider assigning costs $c_{ij}$ ($c_{ij} \in \mathbb{N}$) to the edges $(i, j) \in E$. Define the cost of a matching $M$ as:
$$c(M) = \sum_{(i,j) \in M} c_{ij}.$$
Now, consider the matrix $A$ with entries $a_{ij} w_{ij}$, where $w_{ij} = 2^{c_{ij}}$. Then:
$$\mathrm{perm}(A) = \sum_{M} 2^{c(M)}$$
and
$$\det(A) = \sum_{M} \mathrm{sign}(M)\, 2^{c(M)}.$$
If a unique minimum cost matching with cost $c$ exists then $\det(A)$ will be nonzero and, in fact, it will be an odd multiple of $2^{c}$.

We will prove that if we select the costs according to a suitable probability distribution then, with high probability, there exists a unique minimum cost matching. Let the $c_{ij}$ be independent, identically distributed random variables, uniform on the integers $\{1, \ldots, 2m\}$, where $m = |E|$. The algorithm computes $\det(A)$ and claims that there is a perfect matching if and only if $\det(A)$ is nonzero. The only situation in which this algorithm can err is when there is a perfect matching, but $\det(A) = 0$. This is thus a Monte Carlo algorithm. The next theorem upper-bounds the probability of making a mistake.

Theorem 5 Assume that there exists a perfect matching in $G$. Then the probability that we will err with our algorithm is at most $\frac{1}{2}$.

If a higher reliability is desired then it can be attained by running multiple passes, and only concluding that there is no perfect matching if no pass can find one.

Proof: We need to compute $\Pr[\det(A) = 0]$. Though this quantity is difficult to compute, we can fairly easily find an upper bound for it. As we have seen previously,
$$\Pr[\det(A) = 0] = 1 - \Pr[\det(A) \ne 0] \le 1 - P$$
where $P = \Pr[\exists \text{ unique minimum cost matching}]$. Indeed, if there is a unique minimum cost matching of cost say $c$ then $\det(A)$ is an odd multiple of $2^{c}$ and, hence, non-zero. The following claim completes the proof.

Claim 6 $P \ge \frac{1}{2}$.
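For concreteness, the Monte Carlo test whose error probability is being bounded here can be sketched as follows. This is an illustrative sketch under our own assumptions: the graph is given as an $n \times n$ 0/1 matrix, the determinant is computed exactly over the rationals with Python's `Fraction` (sequentially, not by a parallel algorithm), and the names `det` and `has_perfect_matching` are hypothetical.

```python
import random
from fractions import Fraction

def det(mat):
    """Exact determinant by Gaussian elimination over the rationals."""
    n = len(mat)
    a = [[Fraction(x) for x in row] for row in mat]
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def has_perfect_matching(adj, trials=1, rng=random):
    """One-sided test: True is always correct, False may be wrong."""
    n = len(adj)
    m = sum(sum(row) for row in adj)  # m = |E|
    if m == 0:
        return False  # no edges (assumes n >= 1)
    for _ in range(trials):
        # entry a_ij * 2^{c_ij} with c_ij uniform in {1, ..., 2m}
        weighted = [[adj[i][j] * 2 ** rng.randint(1, 2 * m) for j in range(n)]
                    for i in range(n)]
        if det(weighted) != 0:
            return True
    return False

print(has_perfect_matching([[1, 1], [1, 1]], trials=20))
```

A nonzero determinant is a witness, so a `True` answer is never wrong; by Theorem 5 each trial misses an existing perfect matching with probability at most 1/2.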
Given a vector $c$, define $d_{ij}$ to be the maximum value for $c_{ij}$ (with all other costs held fixed) such that $(i, j)$ is part of some minimum cost matching. We can then draw the following inferences:
$$\begin{cases} c_{ij} > d_{ij} \Rightarrow (i, j) \text{ is not part of ANY minimum cost matching} \\ c_{ij} = d_{ij} \Rightarrow (i, j) \text{ is part of SOME minimum cost matching} \\ c_{ij} < d_{ij} \Rightarrow (i, j) \text{ is part of EVERY minimum cost matching} \end{cases}$$
Thus, if $c_{ij} \ne d_{ij}$ for all $(i, j)$ $\Rightarrow$ $\exists$ a unique minimum cost matching $M$. Moreover, this matching is given by $M = \{(i, j) \mid c_{ij} < d_{ij}\}$.
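The thresholds $d_{ij}$ can be computed by brute force on a small instance, which makes the three cases easy to check. The sketch below is only illustrative: the function names are ours, and the edge costs are our reading of the worked example in these notes (an assumption, since the figure itself is not reproduced here). It uses the fact that $(i, j)$ lies in some minimum cost matching exactly when $c_{ij}$ plus the cheapest completion of $(i, j)$ does not exceed the cheapest matching avoiding $(i, j)$.

```python
from itertools import permutations

def perfect_matchings(left, right, cost):
    """Yield every perfect matching as a list of edges present in `cost`."""
    for perm in permutations(right):
        edges = list(zip(left, perm))
        if all(e in cost for e in edges):
            yield edges

def threshold(edge, left, right, cost):
    """d_ij = (cheapest matching avoiding edge) - (cheapest completion of edge).

    Returns None when every perfect matching uses the edge or none does.
    """
    avoiding, completing = [], []
    for matching in perfect_matchings(left, right, cost):
        total = sum(cost[e] for e in matching)
        if edge in matching:
            completing.append(total - cost[edge])
        else:
            avoiding.append(total)
    if not avoiding or not completing:
        return None
    return min(avoiding) - min(completing)

# Edge costs as we read them from the example (assumed, not authoritative).
left, right = ["a", "b", "c"], ["d", "e", "f"]
cost = {("a", "d"): 5, ("a", "e"): 4, ("b", "d"): 3, ("b", "e"): 2,
        ("b", "f"): 6, ("c", "e"): 8, ("c", "f"): 2}

for edge in [("c", "f"), ("a", "d"), ("c", "e")]:
    print(edge, "c =", cost[edge], "d =", threshold(edge, left, right, cost))
```

With these assumed costs the thresholds come out as 12, 5 and -2, so the three edges fall into the EVERY, SOME and ANY cases respectively.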
Figure 2: Example graph for $d_{ij}$ computations.

Figure 2 shows an example of a bipartite graph with the values of $c_{ij}$ assigned. Notice that in this graph the minimum cost is $c = 9$. Consider the edge $(c, f)$. The cheapest perfect matching not containing $(c, f)$ has a cost of $6 + 8 + 5 = 19$. The other edges in the perfect matching containing $(c, f)$ have a total cost of 7, so $d_{cf} = 19 - 7 = 12$. Since $c_{cf} < d_{cf}$, the edge is in every minimum cost matching. Next, $d_{ad} = (4 + 3 + 2) - 2 - 2 = 5 = c_{ad}$. Thus it is in some minimum cost matching. Finally, $d_{ce} = 9 - 6 - 5 = -2 < c_{ce}$, so $(c, e)$ is not in any minimum cost matching.

Therefore,
$$\begin{aligned} \Pr[\text{unique minimum cost matching}] &\ge \Pr[c_{ij} \ne d_{ij} \text{ for all } (i, j)] \\ &= 1 - \Pr[c_{ij} = d_{ij} \text{ for some } (i, j)] \\ &\ge 1 - \sum_{(i,j) \in E} \Pr[c_{ij} = d_{ij}] \\ &\ge 1 - m \cdot \frac{1}{2m} \\ &= \frac{1}{2}. \end{aligned}$$
The next to last line is justified by our selection of $m = |E|$ and the fact that $d_{ij}$ is independent of $c_{ij}$, so that the probability of $c_{ij}$ being equal to the particular value $d_{ij}$ is either $\frac{1}{2m}$ if $d_{ij}$ is in the range $\{1, \ldots, 2m\}$ or 0 otherwise.

Notice that if we repeat the algorithm with new random $c_{ij}$'s, then the second trial will be independent of the first run. Thus, we can arbitrarily reduce the error probability of our algorithm, since the probability of error after $t$ iterations is at most $\left(\frac{1}{2}\right)^t$.

Also, note that, in the proof, we do not make use of the assumption that we are working with matchings. Thus, this proof technique is applicable to a wide class of problems.

2.1 Constructing a Perfect Matching

In order to construct a perfect matching, assume that there exists a unique minimum cost matching (which we have shown to be true with probability at least $\frac{1}{2}$) with cost $c$. The determinant of $A$ will then be an odd multiple of $2^{c}$. By expanding the determinant along row $i$, it can be computed by the following formula:
$$\sum_{j} \pm 2^{c_{ij}} a_{ij} \det(A_{ij}) \qquad (3)$$
where $A_{ij}$ is the matrix created by removing row $i$ and column $j$ from the matrix $A$ (see Figure 3), and the sign depends on the parity of $i + j$. The term in the summation above will be an odd multiple of $2^{c}$ if $(i, j) \in M$ and an even multiple otherwise. So, we can reconstruct a perfect matching $M$ by letting:
$$M = \left\{ (i, j) \mid 2^{c_{ij}} \det(A_{ij}) \text{ is an odd multiple of } 2^{c} \right\}. \qquad (4)$$

Figure 3: The matrix $A_{ij}$. The matrix $A_{ij}$ is formed by removing the $i$th row and the $j$th column from the matrix $A$.

Note that $c$ can be obtained since $2^{c}$ is the largest power of 2 dividing $\det(A)$. The algorithm we have presented can be seen to be in RNC since a determinant can be computed in RNC.

We can apply a similar algorithm for solving the following related matching problem:

Given: a bipartite graph $G$, a coloring of each edge in $G$ as either red or blue, and an integer $k$; find a perfect matching with exactly $k$ red edges.
However, in contrast to the problem of finding any perfect matching, it is not known whether this problem is in P or even NP-complete. For this problem, define the entries $a_{ij}$ of the matrix $A$ as follows:
$$a_{ij} = \begin{cases} 0 & \text{if } (i, j) \notin E \\ w_{ij} & \text{if } (i, j) \text{ is blue} \\ w_{ij} x & \text{if } (i, j) \text{ is red} \end{cases} \qquad (5)$$
where $x$ is a variable. Both the permanent of $A$ and the determinant of $A$ are now polynomials in one variable, $x$, and we wish to know the coefficient $c_k$ of $x^k$. If all $w_{ij}$ were 1, the coefficient $c_k$ in the permanent would represent the number of perfect matchings with exactly $k$ red edges. If there does exist a perfect matching with $k$ red edges, then $\Pr[c_k = 0] \le \frac{1}{2}$ by the same argument by which we derived the probability that $\det(A) = 0$ when a perfect matching exists, since we can always decompose the determinant into a sum of products of matrices with $x^k$. We can now compute all the $c_k$ by computing the determinant of $A$ at $n + 1$ different points and interpolating from that data to compute the coefficients.

3 Markov Chains
A lot of recent randomized algorithms are based on the idea of rapidly mixing Markov chains. A Markov chain is a stochastic process, i.e. a random process that evolves with time. It is defined by:

A set of states (that we shall assume to be finite) $1, \ldots, N$.

A transition matrix $P$ where the entry $p_{ij}$ represents the probability of moving to state $j$ when at state $i$, i.e. $p_{ij} = \Pr[X_{t+1} = j \mid X_t = i]$, where $X_t$ is a random variable denoting the state we are in at time $t$.
Figure 4: A Markov chain.

Figure 4 partially illustrates a set of states having the following transition matrix:
$$P = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0.4 & 0.3 & 0.1 & 0 & 0.2 \\ 0 & 0.5 & 0 & 0 & 0.5 \\ 0.2 & 0.8 & 0 & 0 & 0 \\ 0.1 & 0.1 & 0.1 & 0.1 & 0.6 \end{pmatrix} \qquad (6)$$
The transition matrix $P$ satisfies the following two conditions (and any such matrix is called "stochastic"):

$P \ge 0$,

$\sum_{j} p_{ij} = 1$ for all $i$.
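As a quick sanity check, the two conditions can be verified numerically for the transition matrix $P$ given above, and the chain can be simulated by repeatedly sampling the next state from the row of the current state. The sketch below uses only the Python standard library; the function names and the floating-point tolerance are our own choices.

```python
import random

P = [
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.4, 0.3, 0.1, 0.0, 0.2],
    [0.0, 0.5, 0.0, 0.0, 0.5],
    [0.2, 0.8, 0.0, 0.0, 0.0],
    [0.1, 0.1, 0.1, 0.1, 0.6],
]

def is_stochastic(matrix, tol=1e-9):
    """Check p_ij >= 0 and sum_j p_ij = 1 (up to tol) for every row i."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )

def simulate(matrix, start, steps, rng=random):
    """Return the visited states X_0, ..., X_steps (states are 1-based)."""
    path = [start]
    for _ in range(steps):
        row = matrix[path[-1] - 1]
        # sample X_{t+1} = j with probability p_ij, where i = X_t
        nxt = rng.choices(range(1, len(row) + 1), weights=row)[0]
        path.append(nxt)
    return path

print(is_stochastic(P))    # True
print(simulate(P, 1, 10))  # a random trajectory starting in state 1
```

Since the first row of $P$ puts all its mass on state 3, every trajectory started in state 1 moves to state 3 at the first step.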