Automaticity IV: Sequences, Sets, and Diversity
Jeffrey Shallit, Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
[email protected] June 9, 1996
Abstract

This paper studies the descriptional complexity of (i) sequences over a finite alphabet; and (ii) subsets of $\mathbb{N}$ (the natural numbers). If $(s(i))_{i \ge 0}$ is a sequence over a finite alphabet $\Delta$, then we define the $k$-automaticity of $s$, $A_s^k(n)$, to be the smallest possible number of states in any deterministic finite automaton that, for all $i$ with $0 \le i \le n$, takes $i$ expressed in base $k$ as input and computes $s(i)$. We give examples of sequences that have high automaticity in all bases $k$; for example, we show that the characteristic sequence of the primes has $k$-automaticity $A_s^k(n) = \Omega(n^{1/43})$ for all $k \ge 2$, thus making quantitative the classical theorem of Minsky and Papert that the set of primes expressed in base 2 is not regular. We give examples of sequences with low automaticity in all bases $k$, and low automaticity in some bases and high in others. We also obtain bounds on the automaticity of certain sequences that are fixed points of homomorphisms, such as the Fibonacci and Thue-Morse infinite words. Finally, we define a related concept called diversity and give examples of sequences with high diversity.

1 Introduction and Definitions
In this paper, I study the descriptional complexity of (i) sequences over a finite alphabet; and (ii) subsets of $\mathbb{N}$ (the natural numbers). In 1972, Cobham [5] introduced the notion of what is now called a $k$-automatic sequence. (In the literature, one can also find the terms $k$-recognizable sequence and uniform tag sequence.) Roughly speaking, a sequence $(s(i))_{i \ge 0}$ over a finite alphabet is $k$-automatic if and only if $s(i)$ is a finite-state function of the base-$k$ representation of $i$. However, most sequences are not $k$-automatic for any $k$. Instead of simply saying that a sequence is not $k$-automatic, we can measure quantitatively how "close" a sequence is to being $k$-automatic using the concept of automaticity studied in previous papers of the author and co-authors [26, 27, 20, 10]. In addition to its evident intrinsic interest, automaticity has proved useful in obtaining nontrivial lower bounds in computational complexity theory; see [7, 8, 16, 17].

Research supported in part by a grant from NSERC.

More formally, define a deterministic finite automaton with output (DFAO) $M$ to be a 6-tuple, $(Q, \Sigma, \delta, q_0, \Delta, \tau)$, where $Q$ is a finite set of states, $\Sigma$ is a finite input alphabet, $q_0$ is the start state, and $\Delta$ is a finite output alphabet. The map $\delta : Q \times \Sigma \to Q$ is called the transition function, and is extended in the obvious way to a map $\delta : Q \times \Sigma^* \to Q$. The map $\tau : Q \to \Delta$ is the output function. On input $w \in \Sigma^*$, the machine $M$ outputs the single symbol $\tau(\delta(q_0, w))$. For more on these concepts, see, for example, [15].

Let $k$ be an integer $\ge 2$ and define $\Sigma_k = \{0, 1, \ldots, k-1\}$. If $w \in \Sigma_k^*$, then by $[w]_k$ I mean $w$ evaluated as a base-$k$ integer, that is, if $w = w_1 w_2 \cdots w_r$, then $[w]_k = \sum_{1 \le i \le r} w_{r-i+1} k^{i-1}$. If $n \ge 0$ is an integer, then by $(n)_k$ I mean the default base-$k$ representation of $n$, that is, one not containing leading zeroes. Note that $(0)_k = \epsilon$, the empty string.

Suppose $(s(i))_{i \ge 0}$ is a sequence over the finite alphabet $\Delta$. If there exists a DFAO $M$ such that for all $i \ge 0$, we have $s(i) = \tau(\delta(q_0, w^R))$ for all $w \in \Sigma_k^*$ such that $[w]_k = i$, then the sequence $(s(i))_{i \ge 0}$ is said to be $k$-automatic. (Here $w^R$ is the reverse of the string $w$.) Note that the slightly awkward definition results from the problem of "leading zeroes" in the input, and our convention that the machine $M$ reads the input number starting with the least significant digit.

Here is one alternate definition of $k$-automatic sequences. Define the $k$-fiber of the sequence $(s(i))_{i \ge 0}$ at $a$ to be
$$F_k(s, a) = \{ (n)_k : s(n) = a \}.$$
Then $F_k(s, a)$ is a regular set for all $a \in \Delta$ if and only if the sequence $(s(i))_{i \ge 0}$ is $k$-automatic. Another alternate definition of $k$-automatic sequences can be given in terms of a set called the $k$-kernel.
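Before turning to the kernel, the DFAO definition above can be made concrete with a small simulation. The following Python sketch runs a two-state automaton on the base-2 digits of $i$, least significant digit first, and outputs the Thue-Morse symbol (the parity of the number of 1 bits of $i$). The class name `DFAO`, the helper `digits_lsd_first`, and the choice of the Thue-Morse sequence as the example are assumptions made for illustration only; they are not taken from the text.

```python
class DFAO:
    """Deterministic finite automaton with output: (Q, Sigma, delta, q0, Delta, tau).

    Q, Sigma and Delta are implicit in the keys and values of the two dicts.
    """

    def __init__(self, delta: dict, q0, tau: dict):
        self.delta = delta  # transition function: (state, symbol) -> state
        self.q0 = q0        # start state
        self.tau = tau      # output function: state -> output symbol

    def output(self, word):
        """Return tau(delta(q0, word)) for a word given as a list of symbols."""
        state = self.q0
        for symbol in word:
            state = self.delta[(state, symbol)]
        return self.tau[state]


def digits_lsd_first(i: int, k: int) -> list:
    """Base-k digits of i, least significant first; the empty list for i = 0."""
    digits = []
    while i > 0:
        digits.append(i % k)
        i //= k
    return digits


# Hypothetical example machine: the state is the parity of the 1 bits read so far.
THUE_MORSE = DFAO(
    delta={(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    q0=0,
    tau={0: 0, 1: 1},
)


def s(i: int, padding: int = 0) -> int:
    """Compute s(i) by feeding w^R to the machine.

    `padding` appends extra zeroes, which correspond to leading zeroes of w.
    """
    return THUE_MORSE.output(digits_lsd_first(i, 2) + [0] * padding)
```

Because reading a 0 never changes the state, padding the input with leading zeroes does not change the output, which is exactly the requirement that $s(i) = \tau(\delta(q_0, w^R))$ hold for every $w$ with $[w]_k = i$.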
Let $(s(n))_{n \ge 0}$ be a sequence over a finite alphabet. The $k$-kernel of $(s(n))_{n \ge 0}$, which we denote by $K_s^k$, is defined as follows:
$$K_s^k = \{ (s(k^i m + a))_{m \ge 0} : i \ge 0,\ 0 \le a < k^i \}. \qquad (1)$$
Eilenberg [9, Proposition 3.3, p. 107] proved that a sequence is $k$-automatic if and only if its $k$-kernel is finite.

Given a sequence $(s(i))_{i \ge 0}$, we can define its $k$-automaticity $A_s^k(n)$ as follows: $A_s^k(n)$ is the smallest possible number of states in any DFAO $M = (Q, \Sigma, \delta, q_0, \Delta, \tau)$ such that for all $i$ with $0 \le i \le n$, we have $s(i) = \tau(\delta(q_0, w^R))$ for all $w \in \Sigma_k^*$ with $[w]_k = i$. We emphasize that the automaton is fed with the digits of $i$, starting with the least significant digit. This convention is actually important to specify, since it is known that there are languages of low automaticity whose reversal has high automaticity; see [10].

There is another way to define $k$-automaticity. Suppose we define the $n$-truncated $k$-kernel of the sequence $s$ as follows:
$$K_s^k(n) = \{ (s(k^i m + a))_{0 \le m \le (n-a)/k^i} : i \ge 0,\ 0 \le a < k^i \}.$$
The $n$-truncated $k$-kernel consists of finite sequences. Call two such sequences $v, w \in K_s^k(n)$ $n$-dissimilar if there exists a position $j$ for which both $v(j)$ and $w(j)$ are defined and $v(j) \ne w(j)$. (Note that under this definition, if $v$ is a prefix of $w$, then $v$ and $w$ are similar.) Then $A_s^k(n)$ is defined to be the maximum number of pairwise $n$-dissimilar sequences in $K_s^k(n)$. It is not hard to see that this definition is identical to the previous one; see [27]. Note that the condition $m \le (n-a)/k^i$ is equivalent to $k^i m + a \le n$; in other words, the variable that is bounded by $n$ is not $m$ but the "true" variable $k^i m + a$.

The following basic results on automaticity are easy to prove [27]:

Proposition 1 Let $(s(i))_{i \ge 0}$ be a sequence over a finite alphabet $\Delta$. Then
(a) $A_s^k(n) \le A_s^k(n+1)$ for all $n \ge 0$;
(b) $A_s^k(n) = O(1)$ if and only if $s$ is $k$-automatic;
(c) There exists an absolute constant $c$ such that if $s$ is not $k$-automatic, then $A_s^k(n) \ge c \log_k n$ for infinitely many $n$;
(d) For any sequence $s$ we have $A_s^k(n) = O(n / \log_k n)$.
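The truncated-kernel characterization lends itself to small experiments. The Python sketch below builds the $n$-truncated $k$-kernel, tests $n$-dissimilarity, and greedily collects a pairwise $n$-dissimilar family. It rests on several assumptions: the function names are hypothetical, only exponents with $k^i \le n$ are enumerated (larger exponents contribute only sequences of length at most one), and the greedy count is merely a lower bound on $A_s^k(n)$, since finding the true maximum is a clique-type search that this sketch does not attempt.

```python
def truncated_kernel(s, k: int, n: int) -> list:
    """Distinct sequences (s(k^i*m + a)) with k^i*m + a <= n, 0 <= a < k^i, k^i <= max(n, 1)."""
    sequences, seen = [], set()
    power = 1  # current value of k^i
    while power <= max(n, 1):
        for a in range(min(power, n + 1)):
            seq = tuple(s(power * m + a) for m in range((n - a) // power + 1))
            if seq not in seen:
                seen.add(seq)
                sequences.append(seq)
        power *= k
    return sequences


def dissimilar(v, w) -> bool:
    """True if v and w differ at some position where both are defined."""
    return any(x != y for x, y in zip(v, w))


def greedy_lower_bound(s, k: int, n: int) -> int:
    """Size of a greedily built pairwise n-dissimilar family (a lower bound only)."""
    chosen = []
    for seq in truncated_kernel(s, k, n):
        if all(dissimilar(seq, other) for other in chosen):
            chosen.append(seq)
    return len(chosen)


def thue_morse(i: int) -> int:
    """Parity of the number of 1 bits of i (a 2-automatic sequence)."""
    return bin(i).count("1") % 2


def prime_indicator(i: int) -> int:
    """Characteristic sequence of the primes, by trial division."""
    if i < 2:
        return 0
    d = 2
    while d * d <= i:
        if i % d == 0:
            return 0
        d += 1
    return 1
```

For example, `greedy_lower_bound(thue_morse, 2, 50)` stays at 2, in line with part (b) of Proposition 1, whereas the same call with `prime_indicator` already finds three pairwise dissimilar sequences for modest $n$.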
As parts (b) and (c) of Proposition 1 show, if a sequence is not $k$-automatic, then its $k$-automaticity must be at least $c \log_k n$ infinitely often. This suggests studying sequences that are not $k$-automatic, but which are "as close as possible" to $k$-automatic. We say that a sequence $(s(i))_{i \ge 0}$ is $k$-quasiautomatic if $A_s^k(n) = O(\log n)$. We then have the following proposition, whose proof is easy and is omitted:

Proposition 2 A sequence $(s(i))_{i \ge 0}$ is $k$-quasiautomatic if and only if it is $k^e$-quasiautomatic for all $e \ge 1$.

So far we have discussed the $k$-automaticity of sequences, but the same terminology can be used for sets of non-negative integers. We say a set $S \subseteq \mathbb{N}$ is $k$-automatic if its characteristic sequence $(\chi_S(n))_{n \ge 0}$ is $k$-automatic. Similarly, if $S$ is a set, then by $A_S^k(n)$ we mean $A_{\chi_S}^k(n)$.

2 Classical sets with high automaticity in all bases
In this section, we examine two classical sets (the primes, the squarefree numbers) and show that their characteristic sequences have high $k$-automaticity (that is, $\Omega(n^{\epsilon})$ for some $\epsilon > 0$) in all bases $k \ge 2$. (By $f = \Omega(g)$ we mean there exist positive constants $c, n_0$ such that $f(n) \ge c\, g(n)$ for all $n \ge n_0$.) For the primes, our results can be viewed as making quantitative the classical result of Minsky and Papert [19] that the primes expressed in base 2 cannot be accepted by a finite automaton. Our method is based on the following useful lemma:

Lemma 3 Let $(s(i))_{i \ge 0}$ be a sequence over a finite alphabet $\Delta$, and suppose that there exists a constant $d$ such that for all $r, a, b$ with $r \ge 2$, $1 \le a, b < r$, $a \ne b$, and $\gcd(r, a) = \gcd(r, b) = 1$, there exists a non-negative integer $m = O(r^d)$ such that $s(rm + a) \ne s(rm + b)$. Then $A_s^k(n) = \Omega(n^{1/(d+1)} / (k \log\log n))$ for all $k \ge 2$, where the implied constant in the big-$\Omega$ does not depend on $k$.

Proof. Since $m = O(r^d)$, there exists a constant $c$ such that $m \le c r^d - 1$ for all $r \ge 2$. Let $i = \lfloor (\log_k n - \log_k c)/(d+1) \rfloor$. Then
$$\frac{1}{k} \left( \frac{n}{c} \right)^{1/(d+1)} < k^i \le \left( \frac{n}{c} \right)^{1/(d+1)}.$$
Put $r = k^i$. It follows that there exists $m \le c k^{id} - 1$ such that $s(k^i m + a) \ne s(k^i m + b)$. However, $k^i m + a < (c k^{id} - 1) k^i + k^i = c k^{i(d+1)} \le c\,(n/c) = n$, and the same bound holds for $k^i m + b$. It follows that the two subsequences $(s(k^i t + a))_{t \ge 0}$ and $(s(k^i t + b))_{t \ge 0}$ are $n$-dissimilar. Since $a, b$ were arbitrary integers relatively prime to $r$, we know that there are at least $\varphi(k^i)$ pairwise $n$-dissimilar sequences, where $\varphi$ is Euler's phi-function. By [21, Theorem 15], we know that $\varphi(n) \ge n/(5 \log\log n)$ for $n \ge 3$. Hence
$$\varphi(k^i) \ge \frac{k^i}{5 \log\log k^i} \ge \frac{(1/k)\,(n/c)^{1/(d+1)}}{5 \log\log\, (n/c)^{1/(d+1)}}.$$
Thus $A_s^k(n) = \Omega(n^{1/(d+1)} / (k \log\log n))$.

We first examine the automaticity of the characteristic sequence of the primes. We need the following lemmas.
Lemma 4 For all $x \ge 1$ we have $\prod_{x < p \le 2x} p > e^{x/3}$, where the product is over primes only.

Proof. Let $\vartheta(x) = \sum_{p \le x} \log p$, where the sum is over primes only. We know that $\vartheta(x) < 1.000081x$ for $x > 0$ [22, p. 360], and $\vartheta(x) \ge .84x$ for $x \ge 101$ [21, Theorem 10]. It follows that $\sum_{x < p \le 2x} \log p > 1.68x - 1.000081x > x/3$ for $x \ge 101/2$. Now it is easily verified by computer or hand calculation that $\sum_{x < p \le 2x} \log p > x/3$ for $1 \le x < 101/2$. It follows that $\prod_{x < p \le 2x} p > e^{x/3}$ for all $x \ge 1$.
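The finite verification mentioned in the proof can be sketched in Python as follows. This is one possible way to organize the check, not necessarily the computation the author carried out; the function names are hypothetical. It relies on the observation that for real $x$ in $[h/2, (h+1)/2)$, with $h$ an integer, the set of primes in $(x, 2x]$ is constant (it consists of the primes $p$ with $h < 2p$ and $p \le h$), while $x/3$ stays below $(h+1)/6$. It therefore suffices to compare the sum of logarithms against $(h+1)/6$ for $h = 2, 3, \ldots, 100$.

```python
import math


def primes_up_to(limit: int) -> list:
    """Sieve of Eratosthenes; returns all primes <= limit (limit >= 1)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [p for p, is_p in enumerate(sieve) if is_p]


def lemma4_small_range_holds() -> bool:
    """Check sum_{x < p <= 2x} log p > x/3 for all real x with 1 <= x < 101/2."""
    primes = primes_up_to(100)
    for h in range(2, 101):  # x ranges over the interval [h/2, (h+1)/2)
        log_product = sum(math.log(p) for p in primes if h < 2 * p and p <= h)
        # x/3 < (h+1)/6 on this interval, so this comparison covers every x in it.
        if log_product <= (h + 1) / 6:
            return False
    return True
```

The tightest case is the interval $[5, 5.5)$, where the only prime in $(x, 2x]$ is 7 and $\log 7 \approx 1.946$ must exceed $5.5/3 \approx 1.833$.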
Lemma 5 Given integers $k, l \ge 1$ with $\gcd(k, l) = 1$, there exists a prime $p = km + l$ with $m = O(\max(k, l)^{11/2})$. The constant in the big-$O$ is independent of $k$ and $l$.

Proof. Choose $x = \max(1, l/k, 3 \log l)$. Then from the previous lemma we have $\prod_{x < p \le 2x} p > l$, so there exists a prime $q \nmid l$ with $x < q \le 2x$. Now $q > l/k$, so $kq > l$, and $\gcd(kq, l) = 1$. Hence by Heath-Brown's version of Linnik's theorem [14], there exists a prime $p \equiv l \pmod{kq}$ with $p = O((kq)^{11/2})$. Since $q \le 2x = 2 \max(1, l/k, 3 \log l)$, we have $p = O(\max(l^{11/2}, (k \log l)^{11/2}, k^{11/2}))$. Hence $m = (p - l)/k = O(\max(l^{11/2} k^{-1}, k^{9/2} (\log l)^{11/2}, k^{9/2}))$, and the result follows.
Lemma 6 Given integers $r, a, b$ with $r \ge 2$, $\gcd(r, a) = \gcd(r, b) = 1$, $1 \le a, b < r$, and $a \ne b$, there exists $m = O(r^{165/4})$ such that $rm + a$ is prime and $rm + b$ is composite.

Proof. We use a trick suggested by papers of Hartmanis and Shank [12] and Allen [1]. By Heath-Brown's version of Linnik's theorem [14], there exists $m_0 = O(r^{9/2})$ such that $p = rm_0 + a$ is a prime. Define $q = rm_0 + b$. Then $q = O(r^{11/2})$. If $q$ is composite, we're done, and $m_0 = O(r^{9/2})$. Otherwise, assume $q$ is prime. Now, in Lemma 5, take $k = qr$ and $l = qr + p$. Then there exists $m_1 = O((qr + p)^{11/2}) = O(r^{143/4})$ such that $(qr) m_1 + (qr + p)$ is prime. However, $t = (qr) m_1 + (qr + q)$ is composite, since $q \mid t$ and $q < t$. Take $m = q m_1 + q + m_0$. Then $rm + a = (qr) m_1 + (qr + p)$ is prime, $rm + b = t$ is composite, and $m = O(r^{165/4})$.
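Lemma 6 only bounds the size of a suitable $m$; for small moduli the least such $m$ can be found by direct search. The Python sketch below is an illustrative brute-force search, not the construction used in the proof. The function names and the `limit` parameter are assumptions introduced for this example.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def least_separating_m(r: int, a: int, b: int, limit: int = 10**6):
    """Least m < limit with r*m + a prime and r*m + b composite, or None."""
    for m in range(limit):
        x, y = r * m + a, r * m + b
        # "Composite" means greater than 1 and not prime.
        if is_prime(x) and y > 1 and not is_prime(y):
            return m
    return None
```

For instance, with $r = 10$, $a = 1$, $b = 3$ the search passes over $m = 1$ (both 11 and 13 are prime) and stops at $m = 3$, where 31 is prime and 33 is composite.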
Theorem 7 The set $P$ of prime numbers has $k$-automaticity $A_P^k(n) = \Omega(n^{1/43})$ for all $k \ge 2$.

Proof. Combine Lemmas 3 and 6.

We note that the constant $1/43$ in Theorem 7 is not optimal. Indeed, the constant $11/2$ in Lemma 5 is almost certainly not optimal. Wagstaff [31] has provided a heuristic model that predicts that the least prime congruent to $l \pmod{k}$ is $O(\varphi(k)(\log k)(\log \varphi(k)))$. If this prediction were true, it would improve the constant $1/43$ in Theorem 7 to $1/(2 + \epsilon)$.

We now turn to providing a lower bound on the $k$-automaticity of the squarefree numbers. Recall that a number $n$ is said to be squarefree if $t^2 \nmid n$ for all integers $t > 1$.

Lemma 8 Let $(s_i)_{i \ge 0}$ be defined as follows: $s_i = 1$ if $i$ is squarefree, and $s_i = 0$ otherwise. Then for all $\epsilon > 0$, and $r, a, b$ such that $r \ge 2$, $1 \le a < r$, and $0 \le b < r$ with $\gcd(a, r)$ squarefree and $a \ne b$, there exists an $m = O(r^{13/9 + \epsilon})$ such that $rm + a$ is squarefree and $rm + b$ is not squarefree.

Proof. Let $q$ be the least prime not dividing $r|b - a|$. Since $r|b - a| < r^2$, by the prime number theorem we have $q = O(\log r^2) = O(\log r)$. Now $rk + b \equiv 0 \pmod{q^2}$ if and only if $k \equiv -b r^{-1} \pmod{q^2}$. Let $c$ be such that $0 \le c < q^2$ and $c \equiv -b r^{-1} \pmod{q^2}$. Consider the arithmetic progression
$$((rq^2) m + (rc + a))_{m \ge 0}.$$
We have $\gcd(rq^2, rc + a)$ is squarefree, because any prime divisor of $rq^2$ and $rc + a$ must be a divisor of $r$ or $q^2$. But $t \mid r$ and $t \mid rc + a$ implies $t \mid a$, and we know $\gcd(r, a)$ is squarefree by hypothesis. On the other hand, $rc + a \equiv 0 \pmod{q}$ implies that $rc \equiv -a \pmod{q}$. But $rc \equiv -b \pmod{q}$, so $a \equiv b \pmod{q}$, a contradiction since $q \nmid a - b$. Hence $q^2 \nmid \gcd(rq^2, rc + a)$.

Then, by a result of Heath-Brown [13], there exists an $m_0 = O(r^{13/9 + \epsilon})$ such that $(rq^2) m_0 + (rc + a)$ is squarefree. Take $m = q^2 m_0 + c$. Then $rm + a$ is squarefree, but $rm + b$ is divisible by $q^2$.

Theorem 9 The set $S$ of squarefree numbers has $k$-automaticity $A_S^k(n) = \Omega(n^{2/5})$ for all integers $k \ge 2$.

Proof. Apply Lemma 3 with $d = 13/9 + \epsilon$.

Again, the constant $2/5$ in Theorem 9 is not optimal.

3 A set with low automaticity in all bases
In this section I give an example of a sequence that is $k$-quasiautomatic for all $k \ge 2$.

Theorem 10 Define $a(1) = 1$, and
$$a(i+1) = a(i) + \prod_{2 \le b \le i+1} b^{\,1 + \lfloor \log_b a(i) \rfloor}$$
for $i \ge 1$. Then the set $A = \{ a(i) : i \ge 1 \}$ is not $k$-automatic, but is $k$-quasiautomatic for all $k \ge 2$.

The sequence $(a(i))$ begins
$$1,\ 3,\ 39,\ 331815,\ 114126085737676800331815,\ \ldots$$
For instance, $a(2) = 1 + 2 = 3$ and $a(3) = 3 + 2^2 \cdot 3^2 = 39$.

Proof. First, we note the following observation. Suppose there exists an infinite string $w = w_0 w_1 w_2 \cdots$ over $\Sigma_k = \{0, 1, \ldots, k-1\}$ such that all but finitely many members $s$ of a set $S$ have the "prefix property", that is, $(s)_k^R$ is a prefix of $w$. Then $A_S^k(n) = O(\log n)$.

To see this, note that in this case we can write $S = S_1 \cup S_2$, where $S_1$ is finite and $S_2$ has the prefix property. To build an automaton that accepts all the base-$k$ representations of elements of $S_2 \cap [0, n]$, we simply create a linear chain of nodes, with transitions between them labeled with the symbols of $w$. The accepting states correspond to the members of $S_2$, and of course we need a single dead state in addition to handle the other transitions. The resulting automaton has $\log n + O(1)$ states. Since $S_1$ is finite, we can accept it with a finite automaton. The result now follows because we can accept $S_1 \cup S_2$ using a direct product construction.

The construction of the sequence $(a(i))_{i \ge 1}$ should now be clear. For bases $k \ge 2$, the sequence has the property that $(a(i))_k^R$ is a prefix of $(a(i+1))_k^R$ provided $i \ge k - 1$: the product added to $a(i)$ is then divisible by $k^{1 + \lfloor \log_k a(i) \rfloor}$, so the low-order base-$k$ digits of $a(i)$ are unchanged. Hence the observation of the previous two paragraphs applies, and the automaticity of $A$ is $O(\log n)$ for all $k \ge 2$. Note, however, that the constant in the big-$O$ depends on $k$.

To show that $A$ is not $k$-automatic for any $k$, it suffices to show that $\lim_{i \to \infty} a(i)/k^i = \infty$. But this follows, since from the recurrence we have $a(i) \ge i!$.
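The recurrence and the prefix property are easy to check numerically. The Python sketch below assumes the product in the recurrence runs over $2 \le b \le i+1$, which is the range that reproduces the listed terms $1, 3, 39, 331815, \ldots$; the function names are hypothetical. Exact integer arithmetic is used throughout, so $1 + \lfloor \log_b a(i) \rfloor$ is computed as the number of base-$b$ digits of $a(i)$ rather than with floating-point logarithms.

```python
def num_digits(n: int, b: int) -> int:
    """Number of base-b digits of n >= 1, i.e. 1 + floor(log_b n)."""
    count = 0
    while n > 0:
        n //= b
        count += 1
    return count


def a_sequence(count: int) -> list:
    """First `count` terms a(1), ..., a(count); terms[j] holds a(j + 1)."""
    terms = [1]
    while len(terms) < count:
        i = len(terms)      # index of the last term computed, a(i)
        a_i = terms[-1]
        product = 1
        for b in range(2, i + 2):  # assumed range: 2 <= b <= i + 1
            product *= b ** num_digits(a_i, b)
        terms.append(a_i + product)
    return terms


def digits_lsd_first(n: int, k: int) -> list:
    """Base-k digits of n, least significant first."""
    digits = []
    while n > 0:
        digits.append(n % k)
        n //= k
    return digits


def has_prefix_property(terms: list, k: int) -> bool:
    """Check that (a(i))_k reversed is a prefix of (a(i+1))_k reversed for i >= k - 1."""
    for i in range(max(k - 1, 1), len(terms)):
        low = digits_lsd_first(terms[i - 1], k)   # digits of a(i)
        high = digits_lsd_first(terms[i], k)      # digits of a(i + 1)
        if high[:len(low)] != low:
            return False
    return True
```

The condition $i \ge k - 1$ matters: in base 3, $a(1) = 1$ has reversed expansion `1` while $a(2) = 3$ has reversed expansion `01`, so the prefix property fails at $i = 1$ and only holds from $i = 2$ onward.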
4 Automaticity of fixed points of homomorphisms
Let $\varphi$ be a homomorphism from $\Sigma^*$ to $\Sigma^*$. If there is a symbol $a \in \Sigma$ such that $\varphi(a) = ax$ for some $x \in \Sigma^*$, then
$$y = a\,x\,\varphi(x)\,\varphi^2(x)\,\varphi^3(x) \cdots = \lim_{j \to \infty} \varphi^j(a)$$
is a fixed point of $\varphi$; that is, $\varphi(y) = y$. If further $\varphi$ is nonerasing (i.e., $\varphi(b) \ne \epsilon$ for all $b \in \Sigma$), then $y$ is infinite. If $|\varphi(b)| = k$ for all $b \in \Sigma$, then $\varphi$ is said to be $k$-uniform. A 1-uniform homomorphism is called a coding. A well-known theorem of Cobham [5] states that $(s(i))_{i \ge 0}$ is the image (under a coding) of a fixed point of a $k$-uniform homomorphism if and only if $(s(i))_{i \ge 0}$ is $k$-automatic.

A natural problem is to determine the automaticity of fixed points of non-uniform homomorphisms. In particular, are there fixed points of homomorphisms which are quasiautomatic, but not automatic? This question was raised by the author in 1992 in the context of the fixed point $(t_n)_{n \ge 0}$ of the homomorphism $1 \to 121$, $2 \to 12221$. The sequence $(t_n)$ and its relationship to the classical Thue-Morse sequence was studied by Allouche et al. [2]. Computation strongly suggests that $(t_n)$ is 2-quasiautomatic. For example, $t_{16n+1} = t_{64n+1}$ for $0 \le n \le 1864134$, but not for $n = 1864135$. Although we are not yet able to prove the 2-quasiautomaticity of $(t_n)$, it is possible to prove that it is not 2-automatic [24]. (This last result was, according to J.-P. Allouche (personal communication), also proved by M. Mkaouar.)

We now give three examples. First, we exhibit a homomorphism whose fixed point is 2-quasiautomatic, but not 2-automatic. Next, we give a homomorphism whose fixed point is 2-automatic, but not $k$-quasiautomatic for any odd $k$. Finally, we use some simple theorems of Diophantine approximation to exhibit a homomorphism whose fixed point is not $k$-quasiautomatic for any $k \ge 2$.

Theorem 11 Let $\varphi(c) = cba$, $\varphi(a) = aa$, and $\varphi(b) = b$. Let $(s_i)_{i \ge 0}$ be the fixed point of $\varphi$ beginning with $c$. Let $X = \{ 2^j + j : j \ge 0 \}$. Then
(a) $s_0 = c$ and $s_i = b$ if and only if $i \in X$;
(b) $(s_i)_{i \ge 0}$ is not 2-automatic;
(c) $(s_i)_{i \ge 0}$ is 2-quasiautomatic.

Proof. Part (a) follows easily from the observation that
$$\varphi^r(c) = c\,b\,a\,b\,a^2\,b\,a^4\,b\,a^8 \cdots b\,a^{2^{r-1}}.$$

For part (b), it suffices to show that $L = F_2(s, b)$ is not a regular set. It is easy to see that
$$L = \{ 1\,0^{\,n - \lfloor \log_2 n \rfloor - 1}\,(n)_2 : n \ge 1 \} \cup \{ 1 \}.$$
Now a routine argument using the pumping lemma [15] completes the proof.

Finally, for part (c), it suffices to construct an automaton with output with $O(\log n)$ states that generates the terms of the sequence $(s_i)$ correctly for all $i \le n$. We sketch the construction of such an automaton, leaving the details to the reader.

Let $\Sigma^{\le n} = \bigcup_{i \le n} \Sigma^i$. If $L$ is a language, we say that $L'$ is an $n$th-order approximation to $L$ if $L \cap \Sigma^{\le n} = L' \cap \Sigma^{\le n}$. The basic idea of our construction is that it suffices to concentrate on $L^R = F_2(s, b)^R$ and create an automaton accepting a $(1 + \lfloor \log_2 n \rfloor)$th-order approximation to $L^R$. This is easy, since strings in $L^R$ begin with a short sequence of bits which are followed by many zeroes and then a 1.

The state set consists of four parts. The first part is $A = \{ q_w : w \in (0+1)^{\le \lfloor \log_2 \log_2 n \rfloor + 1} \}$. This part of the automaton forms a binary tree that can handle all possible strings of length $\lfloor \log_2 \log_2 n \rfloor + 1$. The transitions between states in the first part are given by $\delta(q_w, e) = q_{we}$ for $|w| \le \lfloor \log_2 \log_2 n \rfloor$ and $e \in \{0, 1\}$. The output function for the states in $A$ is given by $\tau(q_\epsilon) = c$, $\tau(q_w) = b$ if $[w^R]_2 \in X$, and $\tau(q_w) = a$ for all other $w$.

The second part of the automaton consists of a linear chain of states, $B = \{ p_i : 0 \le i < \lfloor \log_2 n \rfloor \}$. The transitions between states in the second part are given by $\delta(p_i, 0) = p_{i-1}$ for $2 \le i < \lfloor \log_2 n \rfloor$, $\delta(p_1, 1) = p_0$, and $\delta(p_0, 0) = p_0$. The output function for these states is $\tau(p_i) = a$ for $i \ne 0$, and $\tau(p_0) = b$.

The third part of the state set is $C = \{ p'_i : 0 \le i < \lfloor \log_2 n \rfloor \}$, a copy of $B$. The output function for these states is $\tau(p'_i) = b$ for all $i$.

The fourth and final part consists of a single dead state $d$. We set $\delta(d, e) = d$ for $e \in \{0, 1\}$, and $\tau(d) = a$.

The start state is $q_\epsilon$. We leave to the reader the task of specifying the connections between the different groups of states, observing that transitions $\delta(q_w, 0)$ for $|w| = \lfloor \log_2 \log_2 n \rfloor + 1$ that are not self-loops go to a state in $C$ if $[w^R]_2 \in X$, and otherwise go to a state in $B$. As an example, the machine in Figure 1 computes $(s_i)$ correctly for all $i < 2^8$.

The total number of states needed is $|A| + |B| + |C| + 1 \le 6 \log_2 n$.
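Part (a) of Theorem 11 can be checked empirically by iterating the homomorphism. The Python sketch below generates a prefix of the fixed point beginning with $c$ and compares the positions of the symbol $b$ with the set $X = \{2^j + j\}$. The helper names are hypothetical, and the check covers only a finite prefix, so it illustrates the statement rather than proving it.

```python
PHI = {"c": "cba", "a": "aa", "b": "b"}  # the homomorphism of Theorem 11


def fixed_point_prefix(morphism: dict, start: str, length: int) -> str:
    """Prefix of the fixed point obtained by iterating the morphism on `start`.

    Assumes morphism[start] begins with `start` and that iteration keeps growing.
    """
    word = start
    while len(word) < length:
        word = "".join(morphism[ch] for ch in word)
    return word[:length]


def positions_of_b(length: int) -> list:
    """Indices i < length with s_i = b in the fixed point beginning with c."""
    word = fixed_point_prefix(PHI, "c", length)
    return [i for i, ch in enumerate(word) if ch == "b"]


def x_set(limit: int) -> list:
    """Elements of X = {2^j + j : j >= 0} that are smaller than limit."""
    result, j = [], 0
    while 2 ** j + j < limit:
        result.append(2 ** j + j)
        j += 1
    return result
```

Since $|\varphi^r(c)| = 2^r + r$, each iteration appends one new $b$ followed by a block of $a$'s twice as long as the previous one, which is why the gaps between consecutive positions of $b$ roughly double.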
[Figure 1: state diagram not reproduced]

Figure 1: Automaton computing $s(i)$ for $0 \le i < 256$. The input is the base-2 expansion of $i$, starting with the least significant bit. The output is $s(i)$. The states are labeled with the name of the state, followed by a slash, followed by the output associated with that state. All unmarked transitions go to the dead state, labeled $d/a$.

Next, we exhibit a homomorphism whose fixed point is 2-automatic, but has high $k$-automaticity for all odd $k$. Define $\varphi(0) = 01$, $\varphi(1) = 00$, and consider the fixed point $(p(i))_{i \ge 0}$ starting with 0. It is easy to see that $p(i) = \nu_2(i+1) \bmod 2$, where $\nu_2(n)$ is the exponent of the highest power of 2 which divides $n$.

We first give two simple lemmas:

Lemma 12 Let $(s(i))_{i \ge 0}$ be a sequence over a finite alphabet $\Delta$, and suppose that there exists a constant $d$ such that for all $r, a, b$ with $r \ge 2$, $0 \le a, b < r$, and $a \ne b$, there exists a non-negative integer $m = O(r^d)$ such that $s(rm + a) \ne s(rm + b)$. Then $A_s^k(n) = \Omega(n^{1/(d+1)} / k)$ for all $k \ge 2$, where the implied constant in the big-$\Omega$ does not depend on $k$.

Pro...