Derandomizing Approximation Algorithms for Hard Counting Problems

Michael Luby
International Computer Science Institute, Berkeley, CA
and University of California at Berkeley

TR-95-069
December 1995

Research supported in part by National Science Foundation operating grants CCR-9304722 and NCR-9416101, and United States-Israel Binational Science Foundation grant No. 92-00226.

1 Introduction

This paper is a (biased) survey of some work in derandomization of randomized algorithms.
Perhaps the most famous open problem in Computer Science is whether or not NP is equal to P, i.e., are efficient nondeterministic algorithms more powerful than efficient deterministic algorithms? An analogous question is whether or not RP is equal to P, i.e., are efficient randomized algorithms more powerful than efficient deterministic algorithms?

These two questions are formally quite similar, as the only difference between the definitions of NP and RP is that for an NP language L there is only required to be one witness for each x ∈ L, whereas for RP a constant fraction of the strings are required to be witnesses. Nevertheless, it is my belief that NP is more powerful than P, and on the other hand that RP is equal to P. Paradoxically, there is some hope that an eventual proof that NP is not equal to P will be strong enough to also show that RP is equal to P.

Definition (function ensemble): Let f : {0,1}^t(n) → {0,1}^ℓ(n) denote a function ensemble, where t(n) and ℓ(n) are polynomial in n and where f with respect to n is a function mapping {0,1}^t(n) to {0,1}^ℓ(n).

Definition (P-time function ensemble): Call f : {0,1}^t(n) × {0,1}^ℓ(n) → {0,1}^m(n) a T(n)-time function ensemble if f is a function ensemble and there is a Turing machine such that, for all x ∈ {0,1}^t(n), for all y ∈ {0,1}^ℓ(n), f(x, y) is computable in time T(n). Call f a P-time function ensemble if T(n) = n^O(1).

Definition (language): Let L : {0,1}^n → {0,1} be a function ensemble. One can view L as a language, where, for all x ∈ {0,1}^n, x ∈ L if L(x) = 1 and x ∉ L if L(x) = 0.

In the following, definitions of a variety of complexity classes with respect to a language L as just defined are made.

Definition (P): Say that L ∈ P if L is a P-time function ensemble.

Definition (NP): Say that L ∈ NP if there is a P-time function ensemble f : {0,1}^n × {0,1}^ℓ(n) → {0,1} such that for all x ∈ {0,1}^n,

  x ∈ L implies Pr_{Y ∈_U {0,1}^ℓ(n)}[f(x, Y) = 1] > 0,
  x ∉ L implies Pr_{Y ∈_U {0,1}^ℓ(n)}[f(x, Y) = 1] = 0.

Definition (RP): Say that L ∈ RP if there is a constant ε > 0 and a P-time function ensemble f : {0,1}^n × {0,1}^ℓ(n) → {0,1} such that for all x ∈ {0,1}^n,

  x ∈ L implies Pr_{Y ∈_U {0,1}^ℓ(n)}[f(x, Y) = 1] ≥ ε,
  x ∉ L implies Pr_{Y ∈_U {0,1}^ℓ(n)}[f(x, Y) = 1] = 0.

There are a few philosophical reasons to want to show that RP = P, e.g., does randomization help in the general context of efficient computation? There are also some practical reasons: if one uses a Monte Carlo algorithm in practice, then there should be great concern about where the source of random bits comes from. What one typically does in practice is to take a very small number of random bits and stretch them into a large number using a simple pseudorandom generator, e.g., a linear congruential generator, and then use these bits as the source of randomness for the Monte Carlo simulation. The problem in practice is that there is no guarantee that these pseudorandom bits are random enough to make the Monte Carlo simulation behave as though the bits were truly random. Sometimes the answer turns out to be that this is good enough, sometimes it has turned out to be not good enough, and in many cases the answer is not known. One of the motivations behind the work on derandomization of randomized algorithms is to put this entire question on a much firmer scientific footing, i.e., to be able to say authoritatively which pseudorandom generators can be provably used for which Monte Carlo algorithms.

1.1 Randomness and Pseudorandomness

The notion of randomness tests for a string evolved over time: from set-theoretic tests
to enumerable [24, Kolmogorov], recursive, and finally limited-time tests. There were some preliminary works that helped motivate the concept of a pseudorandom generator, including [38, Shamir].

[9, Blum and Micali] introduce the fundamental concept of a pseudorandom generator that is useful for cryptographic (and other) applications, and gave it the significance it has today by providing the first provable construction of a pseudorandom generator based on the conjectured difficulty of a well-known and well-studied computational problem, the discrete log problem. [41, Yao] introduces the now standard definition of a pseudorandom generator, and shows an equivalence between this definition and the next bit test introduced in [9, Blum and Micali]. The standard definition of a pseudorandom generator introduced by [41, Yao] is based on the fundamental concept of computational indistinguishability introduced previously in [18, Goldwasser and Micali]. [41, Yao] also shows how to construct a pseudorandom generator from any one-way permutation.

Another important observation of [41, Yao] is that a pseudorandom generator can be used to reduce the number of random bits needed for any probabilistic polynomial time algorithm, and this shows how to perform a deterministic simulation of any polynomial time probabilistic algorithm in subexponential time based on a pseudorandom generator. The results on deterministic simulation were subsequently generalized in [10, Boppana and Hirschfeld].

The robust notion of a pseudorandom generator, due to [9, Blum and Micali] and [41, Yao], should be contrasted with the classical methods of generating random looking bits as described in, e.g., [23, Knuth]. In studies of classical methods, the output of the generator is considered good if it passes a particular set of standard statistical tests. The linear congruential generator is an example of a classical method for generating random looking bits that pass a variety of standard statistical tests. However, [11, Boyar] and [25, Krawczyk] show that there is a polynomial time statistical test which the output from this generator
does not pass.

2 Circuits

A convenient way to view computation is via boolean circuits.

Definition (C_{ℓ(n)}^{d(n),s(n)}): For all n ∈ N, let C_{ℓ(n)}^{d(n),s(n)} be the set of all circuits with ℓ(n) boolean input variables x = (x_1, ..., x_{ℓ(n)}), of depth d(n), with a total of at most s(n) gates. A circuit C ∈ C_{ℓ(n)}^{d(n),s(n)} consists of "and"-gates and "or"-gates, where each gate is allowed unbounded fan-in. C consists of d(n) levels of gates, where all gates at a given level are of the same type. All the gates at level 1 have as inputs any mixture of variables and their negations. For all i ∈ {2, ..., d(n)}, all gates at level i receive their inputs from the gates at level i − 1. The gates at level d(n) are considered to be the output of C.

For example, let f : {0,1}^n × {0,1}^ℓ(n) → {0,1} be the P-time function ensemble
associated with an RP language L. One can think of f as a family of boolean circuits as follows. For each n, there are 2^n circuits in the family with ℓ(n)-bit inputs, one circuit for each of the 2^n possible values of x ∈ {0,1}^n. The circuit C_x associated with the input x computes the value of f(x, y) on input y ∈ {0,1}^ℓ(n). The circuit C_x can be derived from the description of f and from x. Since f is a P-time function ensemble, without loss of generality one can say that C_x consists of at most a polynomial number p(n) of alternating levels of "and" and "or" gates, with a single output gate at the bottom level. The property that C_x has is that if x ∈ L then Pr_Y[C_x(Y) = 1] ≥ 1/2 and if x ∉ L then Pr_Y[C_x(Y) = 1] = 0, where Y ∈_R {0,1}^ℓ(n). One can view this circuit family for L as a subfamily of C_{ℓ(n)}^{p(n),p(n)}.

In terms of circuits, the approach to derandomization explored in this survey is to
construct a pseudorandom generator g that can be used as the source of randomness for all C ∈ C. This approach was initiated by [9, Blum and Micali] and [41, Yao].

Definition (ε-pseudorandom generator for a circuit family C): Let g : {0,1}^μ(n) → {0,1}^n be a P-time function ensemble, where μ(n) < n. Let C be an infinite family of circuits, and let C ∈ C be a circuit with n inputs. The distinguishing probability of C for g is

  δ = |Pr[C(Y) = 1] − Pr[C(g(Z)) = 1]|,

where Y ∈_R {0,1}^n and Z ∈_R {0,1}^μ(n), in which case g is said to ε-approximate C for any ε > δ. Say that g is an ε-pseudorandom generator for C if g ε-approximates C for all C ∈ C. Say that C has time-success ratio S(n) for g if the minimum, over all circuits C ∈ C with n inputs, of the ratio of the number of gates in C divided by the distinguishing probability of C for g is at least S(n).

Given g, the derandomization for the RP circuit family described above is straightforward. For any x ∈ {0,1}^n, one can approximate the fraction of the 2^ℓ(n) inputs on which C_x outputs 1 as follows. Let g : {0,1}^μ(n) → {0,1}^ℓ(n) be a 1/2-pseudorandom generator for C_{ℓ(n)}^{p(n),p(n)}. For all z ∈ {0,1}^μ(n), compute C_x(g(z)), and if any of the answers is 1 then conclude that x ∈ L, and otherwise conclude that x ∉ L. Note that by the properties of g this procedure is guaranteed to correctly classify x with respect to membership in L.
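The seed-enumeration procedure just described can be sketched concretely. Everything below is a toy illustration, not from the survey: f is a made-up RP-style predicate and g a made-up generator standing in for a provable pseudorandom generator; only the enumerate-and-accept logic mirrors the text.

```python
from itertools import product

ELL = 7  # l(n): number of random bits the predicate f expects
MU = 4   # mu(n): seed length of the generator; only 2**MU seeds are tried

def f(x: int, y: tuple) -> int:
    # Toy one-sided predicate: y is a "witness" for x iff the parity of y
    # matches the low bit of x.  (Any RP-style predicate would do here.)
    return 1 if sum(y) % 2 == x % 2 else 0

def g(z: tuple) -> tuple:
    # Toy generator stretching MU seed bits to ELL bits by cyclic
    # repetition; a provable construction would instead use, e.g., a
    # one-way permutation as in [9] and [41].
    return tuple(z[i % MU] for i in range(ELL))

def derandomized_decide(x: int) -> bool:
    # Enumerate all 2**MU seeds and accept iff some pseudorandom string is
    # a witness.  Because the error of f is one-sided, acceptance is always
    # correct; rejection is correct whenever g fools the circuit C_x.
    return any(f(x, g(z)) == 1 for z in product((0, 1), repeat=MU))
```

With a seed length of O(log n) the loop runs only polynomially many times, which is exactly the observation that follows.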
The smaller μ(n) is in relation to n, the stronger the pseudorandom generator g. For example, if μ(n) is O(log(n)) then the entire procedure runs in deterministic polynomial time, showing that RP = P.

One way to view the above approach is to find a small set S of ℓ(n)-bit strings such that the average value of C_x(y) over y ∈ S is close to the average value of C_x(y) over all y ∈ {0,1}^ℓ(n). In the above, S can be viewed as the set {g(z) : z ∈ {0,1}^μ(n)}. The other important ingredient to the derandomization is that the set S can be efficiently enumerated. This is true in the above approach because g is a P-time function ensemble, and thus S can be enumerated simply by sequencing through all z ∈ {0,1}^μ(n) and computing g(z).

Suppose one is only interested in the property that S be small, and drops the (crucial) requirement that S is easy to enumerate. Then it is not hard to see that there is such a small set S. To see this, suppose that one chooses randomly a set S of size n + 1. If x ∉ L then, independent of S, x is correctly classified. On the other hand, if x ∈ L, then x is incorrectly classified with probability at most 2^−(n+1). Since there are at most 2^n such x, the probability that there is an x ∈ {0,1}^n that is incorrectly classified is at most 1/2. Thus, there is a set S of size n + 1 that correctly classifies all x ∈ {0,1}^n with respect to membership in L. This is the approach that [1, Adleman] took to show that RP ⊆ P/poly, i.e., in a non-uniform sense randomization is no more powerful than deterministic computation. The problem with this approach is that there is no clue of how to efficiently generate the set S in deterministic polynomial time. Thus, the uniform version of the "RP = P?" question is far from resolved.

3 Derandomizing Particular Algorithms

In some applications, the goal is to completely derandomize the algorithm to produce a
deterministic solution. However, in other cases the basic goal is only to drastically reduce the number of random bits used by the randomized algorithm, e.g., reduce from O(n) bits to O(log(n)) bits. The philosophical and practical import of this is that it is hard in practice to produce high quality independent random bits at a high rate.

3.1 Pairwise Independence

One of the key ideas in derandomizing randomized algorithms is the observation that sometimes the randomized algorithm works just as well if there is not full independence between the random variables. One special but important case is when pairwise independence between the random variables suffices.

Consider a set of random variables indexed by a set U, i.e., {Z_i : i ∈ U}, where Z_i ∈ T. For finite t = |T|, a uniform distribution assigns Pr[Z_x = α] = 1/t, for all x ∈ U, α ∈ T. If this distribution were furthermore pairwise independent, one would have: for all x ≠ y ∈ U, for all α, β ∈ T,

  Pr[Z_x = α, Z_y = β] = Pr[Z_x = α] · Pr[Z_y = β] = 1/t².

This is not the same as complete independence, as evidenced by the following set of three pairwise-independent variables (U = {1, 2, 3}, T = {0, 1}, t = 2):

  s = (a, b)   h_s(1)   h_s(2)   h_s(3)
  (0, 0)         0        0        0
  (0, 1)         0        1        1
  (1, 0)         1        0        1
  (1, 1)         1        1        0

Each row s can be thought of as a function h_s : U → T. Let S be the index set for these functions, where in this case S = {0,1}². For all x ≠ y ∈ U, for all α, β ∈ T,

  Pr_{s ∈_R S}[h_s(x) = α ∧ h_s(y) = β] = 1/4 = 1/t².

(Notice in particular that Pr_{s ∈_R S}[h_s(x) = h_s(y)] = 1/2 = 1/t.) Any set of functions satisfying this condition is a 2-universal family of hash functions.

Definitions and explicit constructions of 2-universal hash functions were first given by
[40, Carter and Wegman]. One simple way to construct a general family of hash functions mapping {0,1}^n → {0,1}^n is to let S = {0,1}^n × {0,1}^n, and then for all s = (a, b) ∈ S, for all x ∈ {0,1}^n define h_s(x) = ax + b, where the arithmetic operations are with respect to the finite field GF(2^n). Thus, each h_s maps {0,1}^n → {0,1}^n and S is the index set of the hash functions. For each s = (a, b) ∈ S, one can write:

  [ h_s(x) ]   [ x 1 ] [ a ]
  [ h_s(y) ] = [ y 1 ] [ b ]

When x ≠ y, the matrix is nonsingular, so that for any x ≠ y ∈ {0,1}^n, the pair (h_s(x), h_s(y)) takes on all 2^{2n} possible values as s varies over all of S. Thus if s is chosen uniformly at random from S, then (h_s(x), h_s(y)) is uniformly distributed. One can view S as the set of points in a sample space on the set of random variables {Z_x : x ∈ {0,1}^n} where Z_x(s) = h_s(x) for all s ∈ S. With respect to the uniform distribution on S, these random variables are pairwise independent, i.e., for all x ≠ y ∈ {0,1}^n, for all α, β ∈ {0,1}^n,

  Pr_{s ∈_R S}[Z_x(s) = α ∧ Z_y(s) = β] = Pr_{s ∈_R S}[Z_x(s) = α] · Pr_{s ∈_R S}[Z_y(s) = β] = 1/2^{2n}.

To obtain a hash function that maps to k < n bits, we can still use S as the function
index family: the value of the hash function indexed by s on input x is obtained by computing h_s(x) and using the first k bits. The important properties of these hash functions are:

- Pairwise independence.

- Succinctness: each function can be described as a 2n-bit string. Therefore, randomly picking a function index requires only 2n random bits.

- The function h_s(x) can easily be computed (in LOGSPACE, for instance) given the function index s and the input x.

In the sequel, unless otherwise specified, references to pairwise independent hash functions refer to this construction, and S denotes the set of indices for the hash family.

3.2 Applications to Particular Algorithms

The original applications described in [40, Carter and Wegman] were applications to hashing. Subsequently 2-universal hashing has been applied in surprising ways to a rich variety of problems. For a survey of some of these applications, see [31, Luby and Wigderson].

Consider, for example, the MAXCUT problem: given a graph G = (V, E), find a two-coloring of the vertices χ : V → {0,1} so as to maximize c(χ) = |{(x, y) ∈ E : χ(x) ≠ χ(y)}|. The following is a description of a solution to this problem that is guaranteed to produce a cut where at least half the edges cross the cut.

If the vertices are colored randomly (0 or 1 with probability 1/2) by choosing χ uniformly from the set of all possible 2^|V| colorings, then:

  E[c(χ)] = Σ_{(x,y) ∈ E} Pr[χ(x) ≠ χ(y)] = |E|/2.

Thus, there must always be a cut of size at least |E|/2. Let S be the index set for the hash family mapping V → {0,1}. Since the summation above only requires the coloring of vertices to be pairwise independent, it follows that E[c(h_s)] = |E|/2 when s ∈_R S. Since |S| = |V|², one can deterministically try h_s for all s ∈ S in polynomial time (even in the parallel complexity class NC), and for at least one s ∈ S, h_s defines a partition of the nodes where at least |E|/2 edges cross the partition.

This derandomization approach was developed and discussed in general terms in the
series of papers [13, Chor and Goldreich], [27, Luby], [5, Alon, Babai and Itai]. There, the approach was applied to derandomize algorithms such as witness sampling, a fast parallel algorithm for finding a maximal independent set, and other graph algorithms. This work was extended to more efficient parallel approaches in [28, Luby], and generalized in [8, Berger and Rompel] and [32, Motwani, Naor and Naor] to apply to parallel solutions of other combinatorial problems.

Other examples include the use of five-wise independent random variables to choose the pivot elements for quicksort while still maintaining the O(n log(n)) expected running time [22, Karloff and Raghavan]. Extending the work of [22, Karloff and Raghavan], [33, Mulmuley] reanalyzes many classical computational geometry problems, including trapezoidal triangulations, convex hulls, Voronoi diagrams, and hidden surface removal. The new analysis shows that only limited independence between the random variables suffices for the randomized algorithm.

Other ideas that have been used to derandomize particular algorithms include using expander graphs, and combining expander graphs with pairwise independence techniques. An example of this approach is [21, Karger and Motwani].

4 Derandomizing Depth Two Circuits

In this section depth two unbounded fan-in boolean circuits are considered. The circuit
subfamily of C_n^{2,m(n)+1} with a first layer of m = m(n) "and"-gates all feeding into a single output "or"-gate can be viewed as a formula F in disjunctive normal form (DNF) on n variables with m clauses. Let t be the maximum length of a clause in the formula, and let Pr[F] denote the probability that a random, independent and uniformly chosen truth assignment to the variables satisfies F. The problem of computing Pr[F] exactly is known to be #P-complete [39, Valiant]. On the other hand, for many applications a good estimate of Pr[F] is all that is needed.

[29, Luby and Velickovic] introduces several ideas which can be combined with known
results to obtain approximation algorithms for the DNF problem. The first idea is that of sunflower reduction, in a way similar to the way that [37, Razborov] uses it to prove exponential lower bounds on the size of the smallest monotone circuit for finding the biggest clique in a graph. Given a DNF formula F, one looks for a large collection of clauses which form a sunflower, i.e., the intersection of any two distinct clauses in this collection is the same. Then, all the clauses in this collection are replaced by their common pairwise intersection, thus obtaining another formula which has probability of being satisfied close to that of F.

This procedure is repeated until no large sunflowers can be found, obtaining a new formula F′. A theorem of [14, Erdős and Rado] implies that at the end of this procedure there are not too many clauses in F′. Because F′ is so small, it turns out to be easier to approximate Pr[F′], and because of the properties of the reduction this approximation is also a good approximation of Pr[F]. This reduction can be combined with the algorithm from [36, Nisan and Wigderson] (using the improvement based on an observation from [30, Luby, Velickovic and Wigderson]) to produce a polynomial time ε-approximation for Pr[F] when t = O(log^{1.8}(nm)) and ε is not too small.

The second approach relies on special constructions of small probability distributions.
Given a distribution D, let Pr_D[F] denote the probability that F is satisfied by a truth assignment chosen according to D. The goal is to find a distribution D with a small sample space such that Pr_D[F] is close to Pr[F]. Then, Pr_D[F] can be calculated by exhaustive consideration of the points in the sample space. An easy counting argument shows that such a distribution exists. The results of [29] can be viewed as progress towards finding a polynomial time construction of...
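To make the exhaustive-evaluation idea concrete, here is a small illustration; the formula and the particular sample space are hypothetical choices for this sketch, not the constructions of [29]. Pr[F] is computed exactly by brute force, and Pr_D[F] is computed over a pairwise-independent sample space built from GF(2) inner products, in the spirit of the hash functions of Section 3.1.

```python
from itertools import product

N = 4
# Toy DNF F = (x0 AND x1) OR ((NOT x2) AND x3); each clause is a list of
# (variable index, required value) pairs.
CLAUSES = [[(0, 1), (1, 1)], [(2, 0), (3, 1)]]

def satisfies(x):
    # A DNF formula is satisfied iff some clause has all its literals set.
    return any(all(x[i] == v for i, v in clause) for clause in CLAUSES)

def pr_exact():
    # Pr[F] under a truly uniform assignment, by enumerating all 2**N
    # points (this exact computation is #P-hard in general).
    pts = list(product((0, 1), repeat=N))
    return sum(map(satisfies, pts)) / len(pts)

def pr_sample_space():
    # Pairwise-independent sample space: bit i of the assignment is the
    # GF(2) inner product of a random seed a with the (nonzero) binary
    # representation of i + 1, so the space has only 2**m points.
    m = N.bit_length()
    def bit(a, k):
        return sum(((k >> j) & 1) * a[j] for j in range(m)) % 2
    pts = [[bit(a, i + 1) for i in range(N)] for a in product((0, 1), repeat=m)]
    return sum(map(satisfies, pts)) / len(pts)
```

Here the sample space has 2³ = 8 points versus 2⁴ = 16 full assignments, and for large n the gap is exponential. Pairwise independence alone does not guarantee a good estimate for every DNF formula, which is why more specialized distributions are needed.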