Automaticity IV: Sequences, Sets, and Diversity
Jeffrey Shallit
Department of Computer Science
University of Waterloo
Waterloo, Ontario, Canada N2L 3G1
June 9, 1996
Abstract

This paper studies the descriptional complexity of (i) sequences over a finite alphabet; and (ii) subsets of $\mathbb{N}$ (the natural numbers). If $(s(i))_{i \geq 0}$ is a sequence over a finite alphabet $\Delta$, then we define the $k$-automaticity of $s$, $A_s^k(n)$, to be the smallest possible number of states in any deterministic finite automaton that, for all $i$ with $0 \leq i \leq n$, takes $i$ expressed in base $k$ as input and computes $s(i)$. We give examples of sequences that have high automaticity in all bases $k$; for example, we show that the characteristic sequence of the primes has $k$-automaticity $A_s^k(n) = \Omega(n^{1/43})$ for all $k \geq 2$, thus making quantitative the classical theorem of Minsky and Papert that the set of primes expressed in base 2 is not regular. We give examples of sequences with low automaticity in all bases $k$, and of sequences with low automaticity in some bases and high automaticity in others. We also obtain bounds on the automaticity of certain sequences that are fixed points of homomorphisms, such as the Fibonacci and Thue-Morse infinite words. Finally, we define a related concept called diversity and give examples of sequences with high diversity.

1 Introduction and Definitions
In this paper, I study the descriptional complexity of (i) sequences over a finite alphabet; and (ii) subsets of $\mathbb{N}$ (the natural numbers).

In 1972, Cobham [5] introduced the notion of what is now called a $k$-automatic sequence. (In the literature, one can also find the terms $k$-recognizable sequence and uniform tag sequence.) Roughly speaking, a sequence $(s(i))_{i \geq 0}$ over a finite alphabet $\Delta$ is $k$-automatic if and only if $s(i)$ is a finite-state function of the base-$k$ representation of $i$. However, most sequences are not $k$-automatic for any $k$. Instead of simply saying that a sequence is not $k$-automatic, we can measure quantitatively how "close" a sequence is to being $k$-automatic using the concept of automaticity studied in previous papers of the author and co-authors [26, 27, 20, 10]. In addition to its evident intrinsic interest, automaticity has proved useful in obtaining nontrivial lower bounds in computational complexity theory; see [7, 8, 16, 17].

(Footnote: Research supported in part by a grant from NSERC.)

More formally, define a deterministic finite automaton with output (DFAO) $M$ to be a 6-tuple $(Q, \Sigma, \delta, q_0, \Delta, \tau)$, where $Q$ is a finite set of states, $\Sigma$ is a finite input alphabet, $q_0$ is the start state, and $\Delta$ is a finite output alphabet. The map $\delta : Q \times \Sigma \to Q$ is called the transition function, and is extended in the obvious way to a map $\delta : Q \times \Sigma^* \to Q$. The map $\tau : Q \to \Delta$ is the output function. On input $w \in \Sigma^*$, the machine $M$ outputs the single symbol $\tau(\delta(q_0, w))$. For more on these concepts, see, for example, [15].

Let $k$ be an integer $\geq 2$ and define $\Sigma_k = \{0, 1, \ldots, k-1\}$. If $w \in \Sigma_k^*$, then by $[w]_k$ I mean $w$ evaluated as a base-$k$ integer; that is, if $w = w_1 w_2 \cdots w_r$, then $[w]_k = \sum_{1 \leq i \leq r} w_{r-i+1} k^{i-1}$. If $n \geq 0$ is an integer, then by $(n)_k$ I mean the default base-$k$ representation of $n$, that is, the one not containing leading zeroes. Note that $(0)_k = \epsilon$, the empty string.

Suppose $(s(i))_{i \geq 0}$ is a sequence over the finite alphabet $\Delta$. If there exists a DFAO $M$ such that for all $i \geq 0$ we have $s(i) = \tau(\delta(q_0, w^R))$ for all $w \in \Sigma_k^*$ such that $[w]_k = i$, then the sequence $(s(i))_{i \geq 0}$ is said to be $k$-automatic. (Here $w^R$ is the reverse of the string $w$.) Note that the slightly awkward definition results from the problem of "leading zeroes" in the input, and from our convention that the machine $M$ reads the input number starting with the least significant digit.

Here is one alternate definition of $k$-automatic sequences. Define the $k$-fiber of the sequence $(s(i))_{i \geq 0}$ at $a$ to be

$$F_k(s, a) = \{ (n)_k : s(n) = a \}.$$

Then $F_k(s, a)$ is a regular set for all $a \in \Delta$ if and only if the sequence $(s(i))_{i \geq 0}$ is $k$-automatic.

Another alternate definition of $k$-automatic sequences can be given in terms of a set called the $k$-kernel.
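The DFAO definition and the least-significant-digit-first reading convention can be illustrated with a minimal Python sketch. The class name `DFAO`, the helper `digits_lsd_first`, and the choice of the Thue-Morse sequence as the example are assumptions made here for illustration; they are not part of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFAO:
    """Illustrative DFAO (Q, Sigma, delta, q0, Delta, tau); Q, Sigma, Delta are implicit."""
    delta: dict  # maps (state, digit) -> state
    q0: str      # start state
    tau: dict    # maps state -> output symbol

    def run(self, digits):
        """Return tau(delta(q0, digits)) for an iterable of input digits."""
        q = self.q0
        for d in digits:
            q = self.delta[(q, d)]
        return self.tau[q]


def digits_lsd_first(i, k):
    """Base-k digits of i, least significant first; 0 gives the empty list."""
    out = []
    while i > 0:
        i, d = divmod(i, k)
        out.append(d)
    return out


# Hypothetical example machine: two states tracking the parity of the number
# of 1 bits read so far, which outputs the Thue-Morse sequence in base 2.
thue_morse = DFAO(
    delta={("even", 0): "even", ("even", 1): "odd",
           ("odd", 0): "odd", ("odd", 1): "even"},
    q0="even",
    tau={"even": 0, "odd": 1},
)


def s(i, k=2, machine=thue_morse):
    """s(i) as computed by the machine on the reversed base-k representation of i."""
    return machine.run(digits_lsd_first(i, k))


print([s(i) for i in range(16)])
```

Because the digits are fed least significant first, "leading zeroes" appear at the end of the input; the example machine returns the same output when extra zeroes are appended, as the definition requires.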
Let $(s(n))_{n \geq 0}$ be a sequence over a finite alphabet. The $k$-kernel of $(s(n))_{n \geq 0}$, which we denote by $K_s^k$, is defined as follows:

$$K_s^k = \{ (s(k^i m + a))_{m \geq 0} : i \geq 0, \ 0 \leq a < k^i \}. \qquad (1)$$

Eilenberg [9, Proposition 3.3, p. 107] proved that a sequence is $k$-automatic if and only if its $k$-kernel is finite.

Given a sequence $(s(i))_{i \geq 0}$, we can define its $k$-automaticity $A_s^k(n)$ as follows: $A_s^k(n)$ is the smallest possible number of states in any DFAO $M = (Q, \Sigma, \delta, q_0, \Delta, \tau)$ such that for all $i$ with $0 \leq i \leq n$, we have $s(i) = \tau(\delta(q_0, w^R))$ for all $w \in \Sigma_k^*$ with $[w]_k = i$. We emphasize that the automaton is fed with the digits of $i$, starting with the least significant digit. This convention is important to specify, since it is known that there are languages of low automaticity whose reversal has high automaticity; see [10].

There is another way to define $k$-automaticity. Define the $n$-truncated $k$-kernel of the sequence $s$ as follows:

$$K_s^k(n) = \{ (s(k^i m + a))_{0 \leq m \leq (n-a)/k^i} : i \geq 0, \ 0 \leq a < k^i \}.$$

The $n$-truncated $k$-kernel consists of finite sequences. Call two such sequences $v, w \in K_s^k(n)$ $n$-dissimilar if there exists a position $j$ for which both $v(j)$ and $w(j)$ are defined and $v(j) \neq w(j)$. (Note that under this definition, if $v$ is a prefix of $w$, then $v$ and $w$ are similar.) Then $A_s^k(n)$ is defined to be the maximum number of pairwise $n$-dissimilar sequences in $K_s^k(n)$. It is not hard to see that this definition is equivalent to the previous one; see [27]. Note that the condition $m \leq (n-a)/k^i$ is equivalent to $k^i m + a \leq n$; in other words, the variable that is bounded by $n$ is not $m$ but the "true" variable $k^i m + a$.

The following basic results on automaticity are easy to prove [27]:

Proposition 1 Let $(s(i))_{i \geq 0}$ be a sequence over a finite alphabet $\Delta$. Then
(a) $A_s^k(n) \leq A_s^k(n+1)$ for all $n \geq 0$;
(b) $A_s^k(n) = O(1)$ if and only if $s$ is $k$-automatic;
(c) There exists an absolute constant $c$ such that if $s$ is not $k$-automatic, then $A_s^k(n) \geq c \log_k n$ for infinitely many $n$.
(d) For any sequence $s$ we have $A_s^k(n) = O(n / \log_k n)$.
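The truncated-kernel definition can be explored numerically. The following Python sketch builds the $n$-truncated $k$-kernel of a sequence and greedily collects pairwise $n$-dissimilar members. The function names are hypothetical, and the greedy count is only a lower bound on the maximum in the definition, not the automaticity itself; the Thue-Morse sequence is again our own choice of test input.

```python
def truncated_kernel(s, k, n):
    """Finite sequences (s(k^i*m + a)) with k^i*m + a <= n, stored as tuples."""
    kernel = set()
    i = 0
    while True:
        step = k ** i
        # Residues a > n would give empty sequences, so they are skipped.
        for a in range(min(step, n + 1)):
            kernel.add(tuple(s(j) for j in range(a, n + 1, step)))
        if step > n:  # larger i would only repeat the length-1 sequences
            break
        i += 1
    return kernel


def dissimilar(v, w):
    """True if v and w differ at some position where both are defined."""
    return any(x != y for x, y in zip(v, w))


def greedy_dissimilar_count(seqs):
    """Size of a greedily built pairwise-dissimilar family (a lower bound only)."""
    chosen = []
    for v in sorted(seqs, key=len, reverse=True):
        if all(dissimilar(v, w) for w in chosen):
            chosen.append(v)
    return len(chosen)


def thue_morse(i):
    """Parity of the number of 1 bits of i."""
    return bin(i).count("1") % 2


print(greedy_dissimilar_count(truncated_kernel(thue_morse, 2, 63)))  # prints 2
```

For the Thue-Morse sequence every kernel member is a prefix of the sequence or of its complement, so the count stays at 2 for every $n \geq 1$, in line with part (b) of Proposition 1.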
As parts (b) and (c) of Proposition 1 show, if a sequence is not $k$-automatic, then its $k$-automaticity must be greater than $c \log_k n$ infinitely often. This suggests studying sequences that are not $k$-automatic, but which are "as close as possible" to $k$-automatic. We say that a sequence $(s(i))_{i \geq 0}$ is $k$-quasiautomatic if $A_s^k(n) = O(\log n)$. We then have the following result, whose proof is easy and is omitted:

Proposition 2 A sequence $(s(i))_{i \geq 0}$ is $k$-quasiautomatic if and only if it is $k^e$-quasiautomatic for all $e \geq 1$.

So far we have discussed the $k$-automaticity of sequences, but the same terminology can be used for sets of non-negative integers. We say a set $S \subseteq \mathbb{N}$ is $k$-automatic if its characteristic sequence $(\chi_S(n))_{n \geq 0}$ is $k$-automatic. Similarly, if $S$ is a set, then by $A_S^k(n)$ we mean $A_{\chi_S}^k(n)$.

2 Classical sets with high automaticity in all bases
In this section, we examine two classical sets (the primes and the squarefree numbers) and show that their characteristic sequences have high $k$-automaticity (that is, $\Omega(n^{\epsilon})$ for some $\epsilon > 0$) in all bases $k \geq 2$. (By $f = \Omega(g)$ we mean there exist positive constants $c, n_0$ such that $f(n) \geq c\,g(n)$ for all $n \geq n_0$.) For the primes, our results can be viewed as making quantitative the classical result of Minsky and Papert [19] that the primes expressed in base 2 cannot be accepted by a finite automaton.

Our method is based on the following useful lemma:

Lemma 3 Let $(s(i))_{i \geq 0}$ be a sequence over a finite alphabet $\Delta$, and suppose that there exists a constant $d$ such that for all $r, a, b$ with $r \geq 2$, $1 \leq a, b < r$, $a \neq b$, and $\gcd(r, a) = \gcd(r, b) = 1$, there exists a non-negative integer $m = O(r^d)$ such that $s(rm + a) \neq s(rm + b)$. Then $A_s^k(n) = \Omega(n^{1/(d+1)} / (k \log\log n))$ for all $k \geq 2$, where the implied constant in the big-$\Omega$ does not depend on $k$.

Proof. Since $m = O(r^d)$, there exists a constant $c$ such that $m \leq c r^d - 1$ for all $r \geq 2$. Let $i = \lfloor (\log_k n - \log_k c)/(d+1) \rfloor$. Then

$$\frac{1}{k} \left( \frac{n}{c} \right)^{1/(d+1)} < k^i \leq \left( \frac{n}{c} \right)^{1/(d+1)}.$$

Put $r = k^i$. It follows that there exists $m \leq c k^{id} - 1$ such that $s(k^i m + a) \neq s(k^i m + b)$. However, $k^i m + a < (c k^{id} - 1) k^i + k^i = c k^{i(d+1)} \leq c\,(n/c) = n$, and the same bound holds for $k^i m + b$. It follows that the two subsequences $(s(k^i t + a))_{t \geq 0}$ and $(s(k^i t + b))_{t \geq 0}$ are $n$-dissimilar. Since $a, b$ were arbitrary integers relatively prime to $r$, we know that there are at least $\varphi(k^i)$ pairwise $n$-dissimilar sequences, where $\varphi$ is Euler's phi-function. By [21, Theorem 15], we know that $\varphi(n) \geq n/(5 \log\log n)$ for $n \geq 3$. Hence

$$\varphi(k^i) \geq \frac{k^i}{5 \log\log k^i} \geq \frac{(1/k)(n/c)^{1/(d+1)}}{5 \log\log (n/c)^{1/(d+1)}}.$$

Thus $A_s^k(n) = \Omega(n^{1/(d+1)} / (k \log\log n))$.

We first examine the automaticity of the characteristic sequence of the primes. We need the following lemmas.
Lemma 4 For all $x \geq 1$ we have $\prod_{x < p \leq 2x} p > e^{x/3}$, where the product is over primes only.

Proof. Let $\vartheta(x) = \sum_{p \leq x} \log p$, where the sum is over primes only. We know that $\vartheta(x) < 1.000081x$ for $x > 0$ [22, p. 360], and $\vartheta(x) \geq .84x$ for $x \geq 101$ [21, Theorem 10]. It follows that $\sum_{x < p \leq 2x} \log p > 1.68x - 1.000081x > x/3$ for $x \geq 101/2$. Now it is easily verified by computer or hand calculation that $\sum_{x < p \leq 2x} \log p > x/3$ for $1 \leq x < 101/2$. It follows that $\prod_{x < p \leq 2x} p > e^{x/3}$ for all $x \geq 1$.
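The "computer or hand calculation" mentioned in the proof of Lemma 4 can be sketched as follows. This is only a numerical sanity check at integer values of $x$ (the lemma itself is stated for all real $x \geq 1$), and the helper names and the range bound `X_MAX` are assumptions made for this illustration.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Zero out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(limit + 1) if sieve[p]]


def log_product(x, primes):
    """Sum of log p over primes p with x < p <= 2x."""
    return sum(math.log(p) for p in primes if x < p <= 2 * x)


X_MAX = 200  # arbitrary illustrative range
PRIMES = primes_up_to(2 * X_MAX)

# Integer x for which the inequality of Lemma 4 would fail (expected: none).
violations = [x for x in range(1, X_MAX + 1)
              if log_product(x, PRIMES) <= x / 3]
print(violations)
```

The tightest small cases are just after short prime gaps, for example $x = 5$, where the only prime in $(5, 10]$ is 7 and $\log 7 \approx 1.95$ still exceeds $5/3$.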
Lemma 5 Given integers $k, l \geq 1$ with $\gcd(k, l) = 1$, there exists a prime $p = km + l$ with $m = O(\max(k, l)^{11/2})$. The constant in the big-$O$ is independent of $k$ and $l$.

Proof. Choose $x = \max(1, l/k, 3 \log l)$. Then from the previous lemma we have $\prod_{x < p \leq 2x} p > l$, so there exists a prime $q \nmid l$ with $x < q \leq 2x$. Now $q > l/k$, so $kq > l$, and $\gcd(kq, l) = 1$. Hence by Heath-Brown's version of Linnik's theorem [14], there exists a prime $p \equiv l \pmod{kq}$ with $p = O((kq)^{11/2})$. Since $q \leq 2x = 2 \max(1, l/k, 3 \log l)$, we have $p = O(\max(l^{11/2}, (k \log l)^{11/2}, k^{11/2}))$. Hence $m = (p - l)/k = O(\max(l^{11/2} k^{-1}, k^{9/2} (\log l)^{11/2}, k^{9/2}))$, and the result follows.
Lemma 6 Given integers $r, a, b$ with $r \geq 2$, $\gcd(r, a) = \gcd(r, b) = 1$, $1 \leq a, b < r$, and $a \neq b$, there exists $m = O(r^{165/4})$ such that $rm + a$ is prime and $rm + b$ is composite.

Proof. We use a trick suggested by papers of Hartmanis and Shank [12] and Allen [1]. By Heath-Brown's version of Linnik's theorem [14], there exists $m_0 = O(r^{9/2})$ such that $p = r m_0 + a$ is a prime. Define $q = r m_0 + b$. Then $q = O(r^{11/2})$. If $q$ is composite, we're done, and $m_0 = O(r^{9/2})$. Otherwise, assume $q$ is prime. Now, in Lemma 5, take $k = qr$ and $l = qr + p$. Then there exists $m_1 = O((qr + p)^{11/2}) = O(r^{143/4})$ such that $(qr) m_1 + (qr + p)$ is prime. However, $t = (qr) m_1 + (qr + q)$ is composite, since $q \mid t$ and $q < t$. Take $m = q m_1 + q + m_0$. Then $rm + a = (qr) m_1 + (qr + p)$ is prime, $rm + b = t$ is composite, and $m = O(r^{165/4})$.
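For small moduli, the $m$ promised by Lemma 6 can simply be searched for. The sketch below is a brute-force illustration and not the construction used in the proof; the function names are hypothetical, and termination of the search relies on the lemma itself.

```python
from math import gcd


def is_prime(n):
    """Trial-division primality test, adequate for small illustrative inputs."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def least_separating_m(r, a, b):
    """Least m >= 0 with r*m + a prime and r*m + b composite (brute force)."""
    assert r >= 2 and 1 <= a < r and 1 <= b < r and a != b
    assert gcd(r, a) == 1 and gcd(r, b) == 1
    m = 0
    while True:
        x, y = r * m + a, r * m + b
        # 1 is neither prime nor composite, hence the explicit y > 1 check.
        if is_prime(x) and y > 1 and not is_prime(y):
            return m
        m += 1


print(least_separating_m(10, 1, 3))  # prints 3, since 31 is prime and 33 is composite
```

In small cases like this one the least $m$ is tiny compared with the bound $O(r^{165/4})$, which is consistent with the later remark that the exponent is not optimal.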
Theorem 7 The set $P$ of prime numbers has $k$-automaticity $A_P^k(n) = \Omega(n^{1/43})$ for all integers $k \geq 2$.

Proof. Combine Lemmas 3 and 6.

We note that the constant $1/43$ in Theorem 7 is not optimal. Indeed, the constant $11/2$ in Lemma 5 is almost certainly not optimal. Wagstaff [31] has provided a heuristic model that predicts that the least prime congruent to $l \pmod{k}$ is $O(\varphi(k)(\log k)(\log \varphi(k)))$. If this prediction were true, it would improve the constant $1/43$ in Theorem 7 to $1/(2 + \epsilon)$.

We now turn to providing a lower bound on the $k$-automaticity of the squarefree numbers. Recall that a number $n$ is said to be squarefree if $t^2 \nmid n$ for all integers $t > 1$.

Lemma 8 Let $(s_i)_{i \geq 0}$ be defined as follows:

$$s_i = \begin{cases} 1, & \text{if } i \text{ is squarefree;} \\ 0, & \text{otherwise.} \end{cases}$$

Then for all $\epsilon > 0$, and $r, a, b$ such that $r \geq 2$, $1 \leq a < r$, and $0 \leq b < r$ with $\gcd(a, r)$ squarefree and $a \neq b$, there exists an $m = O(r^{13/9 + \epsilon})$ such that $rm + a$ is squarefree and $rm + b$ is not squarefree.

Proof. Let $q$ be the least prime not dividing $r|b - a|$. Since $r|b - a| < r^2$, by the prime number theorem we have $q = O(\log r^2) = O(\log r)$. Now $rk + b \equiv 0 \pmod{q^2}$ if and only if $k \equiv -b r^{-1} \pmod{q^2}$. Let $c$ be such that $0 \leq c < q^2$ and $c \equiv -b r^{-1} \pmod{q^2}$. Consider the arithmetic progression

$$((rq^2) m + (rc + a))_{m \geq 0}.$$

We have that $\gcd(rq^2, rc + a)$ is squarefree, because any prime divisor of $rq^2$ and $rc + a$ must be a divisor of $r$ or $q^2$. But $t \mid r$ and $t \mid rc + a$ implies $t \mid a$, and we know $\gcd(r, a)$ is squarefree by hypothesis. On the other hand, $rc + a \equiv 0 \pmod{q}$ implies that $rc \equiv -a \pmod{q}$. But $rc \equiv -b \pmod{q}$, so $a \equiv b \pmod{q}$, a contradiction since $q \nmid a - b$. Hence $q^2 \nmid \gcd(rq^2, rc + a)$.

Then, by a result of Heath-Brown [13], there exists an $m_0 = O(r^{13/9 + \epsilon})$ such that $(rq^2) m_0 + (rc + a)$ is squarefree. Take $m = q^2 m_0 + c$. Then $rm + a$ is squarefree, but $rm + b$ is divisible by $q^2$.

Theorem 9 The set $S$ of squarefree numbers has $k$-automaticity $A_S^k(n) = \Omega(n^{2/5})$ for all integers $k \geq 2$.

Proof. Apply Lemma 3 with $d = 13/9 + \epsilon$.

Again, the constant $2/5$ in Theorem 9 is not optimal.

3 A set with low automaticity in all bases
In this section I give an example of a sequence that is $k$-quasiautomatic for all $k \geq 2$.

Theorem 10 Define $a(1) = 1$, and

$$a(i+1) = a(i) + \prod_{2 \leq b \leq i+1} b^{\,1 + \lfloor \log_b a(i) \rfloor}$$

for $i \geq 1$. Then the set $A = \{a(i) : i \geq 1\}$ is not $k$-automatic, but is $k$-quasiautomatic for all $k \geq 2$.

The sequence $(a(i))$ begins $1, 3, 39, 331815, 114126085737676800331815, \ldots$

Proof. First, we note the following observation. Suppose there exists an infinite string $w = w_0 w_1 w_2 \cdots$ over $\Sigma_k = \{0, 1, \ldots, k-1\}$ such that all but finitely many members $s$ of a set $S$ have the "prefix property", that is, $(s)_k^R$ is a prefix of $w$. Then $A_S^k(n) = O(\log n)$.

To see this, note that in this case we can write $S = S_1 \cup S_2$, where $S_1$ is finite and $S_2$ has the prefix property. To build an automaton that accepts all the base-$k$ representations of elements of $S_2 \cap [0, n]$, we simply create a linear chain of nodes, with transitions between them labeled with the symbols of $w$. The accepting states correspond to the members of $S_2$, and of course we need a single dead state in addition to handle the other transitions. The resulting automaton has $\log_k n + O(1)$ states. Since $S_1$ is finite, we can accept it with a finite automaton. The result now follows because we can accept $S_1 \cup S_2$ using a direct product construction.

The construction of the sequence $(a(i))_{i \geq 1}$ should now be clear. For bases $k \geq 2$, the sequence has the property that $(a(i))_k^R$ is a prefix of $(a(i+1))_k^R$ provided $i \geq k - 1$: the product added to $a(i)$ is then divisible by $k^{1 + \lfloor \log_k a(i) \rfloor}$, so the low-order base-$k$ digits of $a(i)$ are preserved. Hence the observation of the previous two paragraphs applies, and the automaticity of $A$ is $O(\log n)$ for all $k \geq 2$. Note, however, that the constant in the big-$O$ depends on $k$.

To show that $A$ is not $k$-automatic for any $k$, it suffices to show that $\lim_{i \to \infty} a(i)/k^i = \infty$. But this follows, since from the recurrence we have $a(i) \geq i!$.
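The recurrence of Theorem 10 and the prefix property used in its proof can be checked directly. In the sketch below the product is taken over $2 \leq b \leq i + 1$, a reading of the recurrence that reproduces the listed terms and matches the condition $i \geq k - 1$; the function names are hypothetical, and the exponent $1 + \lfloor \log_b a(i) \rfloor$ is computed exactly as the number of base-$b$ digits of $a(i)$.

```python
def num_digits(x, b):
    """Number of base-b digits of x >= 1, which equals 1 + floor(log_b x)."""
    count = 0
    while x > 0:
        x //= b
        count += 1
    return count


def a_sequence(terms):
    """First `terms` values a(1), a(2), ... of the recurrence."""
    seq = [1]
    while len(seq) < terms:
        i = len(seq)  # seq[-1] is a(i)
        prod = 1
        for b in range(2, i + 2):  # 2 <= b <= i + 1 (assumed range, see above)
            prod *= b ** num_digits(seq[-1], b)
        seq.append(seq[-1] + prod)
    return seq


a = a_sequence(6)
print(a[:5])

# Prefix property: for i >= k - 1, the low-order base-k digits of a(i+1)
# coincide with the full base-k representation of a(i).
for k in range(2, 6):
    for i in range(k - 1, len(a)):  # i is the 1-based index of a(i)
        low = k ** num_digits(a[i - 1], k)
        assert a[i] % low == a[i - 1]
```

Python integers have arbitrary precision, so the rapid growth of $a(i)$ causes no overflow; the sixth term already has well over a hundred decimal digits.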
4 Automaticity of fixed points of homomorphisms

Let $\varphi$ be a homomorphism from $\Sigma^*$ to $\Sigma^*$. If there is a symbol $a \in \Sigma$ such that $\varphi(a) = ax$ for some $x \in \Sigma^*$, then

$$y = a\,x\,\varphi(x)\,\varphi^2(x)\,\varphi^3(x) \cdots = \lim_{j \to \infty} \varphi^j(a)$$

is a fixed point of $\varphi$; that is, $\varphi(y) = y$. If further $\varphi$ is nonerasing (i.e., $\varphi(b) \neq \epsilon$ for all $b \in \Sigma$), then $y$ is infinite. If $|\varphi(b)| = k$ for all $b \in \Sigma$, then $\varphi$ is said to be $k$-uniform. A 1-uniform homomorphism is called a coding.

A well-known theorem of Cobham [5] states that $(s(i))_{i \geq 0}$ is the image (under a coding) of a fixed point of a $k$-uniform homomorphism if and only if $(s(i))_{i \geq 0}$ is $k$-automatic.

A natural problem is to determine the automaticity of fixed points of non-uniform homomorphisms. In particular, are there fixed points of homomorphisms which are quasiautomatic, but not automatic? This question was raised by the author in 1992 in the context of the fixed point $(t_n)_{n \geq 0}$ of the homomorphism $1 \to 121$, $2 \to 12221$. The sequence $(t_n)$ and its relationship to the classical Thue-Morse sequence was studied by Allouche et al. [2]. Computation strongly suggests that $(t_n)$ is 2-quasiautomatic. For example, $t_{16n+1} = t_{64n+1}$ for $0 \leq n \leq 1864134$, but not for $n = 1864135$. Although we are not yet able to prove the 2-quasiautomaticity of $(t_n)$, it is possible to prove that it is not 2-automatic [24]. (This last result was, according to J.-P. Allouche (personal communication), also proved by M. Mkaouar.)

We now give three examples. First, we exhibit a homomorphism whose fixed point is 2-quasiautomatic, but not 2-automatic. Next, we give a homomorphism whose fixed point is 2-automatic, but not $k$-quasiautomatic for any odd $k$. Finally, we use some simple theorems of Diophantine approximation to exhibit a homomorphism whose fixed point is not $k$-quasiautomatic for any $k \geq 2$.

Theorem 11 Let $\varphi(c) = cba$, $\varphi(a) = aa$, and $\varphi(b) = b$. Let $(s_i)_{i \geq 0}$ be the fixed point of $\varphi$ beginning with $c$. Let $X = \{2^j + j : j \geq 0\}$. Then
(a) $s_0 = c$ and $s_i = b$ if and only if $i \in X$;
(b) $(s_i)_{i \geq 0}$ is not 2-automatic.
(c) $(s_i)_{i \geq 0}$ is 2-quasiautomatic.

Proof. Part (a) follows easily from the observation that

$$\varphi^r(c) = c\,b\,a\,b\,a^2\,b\,a^4\,b\,a^8 \cdots b\,a^{2^{r-1}}.$$

For part (b), it suffices to show that $L = F_2(s, b)$ is not a regular set. It is easy to see that

$$L = \{ 1\,0^{\,n - \lfloor \log_2 n \rfloor - 1}\,(n)_2 : n \geq 1 \} \cup \{1\}.$$

Now a routine argument using the pumping lemma [15] completes the proof.

Finally, for part (c), it suffices to construct an automaton with output with $O(\log n)$ states that generates the terms of the sequence $(s_i)$ correctly for all $i \leq n$. We sketch the construction of such an automaton, leaving the details to the reader.

Let $\Sigma^{\leq n} = \bigcup_{i \leq n} \Sigma^i$. If $L$ is a language, we say that $L'$ is an $n$th-order approximation to $L$ if $L \cap \Sigma^{\leq n} = L' \cap \Sigma^{\leq n}$. The basic idea of our construction is that it suffices to concentrate on $L^R = F_2(s, b)^R$ and create an automaton accepting a $(1 + \lfloor \log_2 n \rfloor)$th order approximation to $L^R$. This is easy, since strings in $L^R$ begin with a short sequence of bits which are followed by many zeroes and then a 1.

The state set consists of four parts. The first part is $A = \{ q_w : w \in (0+1)^{\leq \lfloor \log_2 \log_2 n \rfloor + 1} \}$. This part of the automaton forms a binary tree that can handle all possible strings of length at most $\lfloor \log_2 \log_2 n \rfloor + 1$. The transitions between states in the first part are given by $\delta(q_w, e) = q_{we}$ for $|w| \leq \lfloor \log_2 \log_2 n \rfloor$ and $e \in \{0, 1\}$. The output function for the states in $A$ is given by $\tau(q_\epsilon) = c$, $\tau(q_w) = b$ if $[w^R]_2 \in X$, and $\tau(q_w) = a$ for all other $w$.

The second part of the automaton consists of a linear chain of states, $B = \{ p_i : 0 \leq i < \lfloor \log_2 n \rfloor \}$. The transitions between states in the second part are given by $\delta(p_i, 0) = p_{i-1}$ for $2 \leq i < \lfloor \log_2 n \rfloor$, $\delta(p_1, 1) = p_0$, and $\delta(p_0, 0) = p_0$. The output function for these states is $\tau(p_i) = a$ for $i \neq 0$, and $\tau(p_0) = b$.

The third part of the state set is $C = \{ p'_i : 0 \leq i < \lfloor \log_2 n \rfloor \}$, a copy of $B$. The output function for these states is $\tau(p'_i) = b$ for all $i$.

The fourth and final part consists of a single dead state $d$. We set $\delta(d, e) = d$ for $e \in \{0, 1\}$, and $\tau(d) = a$.

The start state is $q_\epsilon$. We leave to the reader the task of specifying the connections between the different groups of states, observing that transitions $\delta(q_w, 0)$ for $|w| = \lfloor \log_2 \log_2 n \rfloor + 1$ that are not self-loops go to a state in $C$ if $[w^R]_2 \in X$, and otherwise go to a state in $B$. As an example, the machine in Figure 1 computes $(s_i)$ correctly for all $i < 2^8$.

The total number of states needed is $|A| + |B| + |C| + 1 \leq 6 \log_2 n$.
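Part (a) of Theorem 11 is easy to check experimentally by iterating the homomorphism. The following Python sketch does so for a finite prefix of the fixed point; the names `MORPHISM`, `iterate`, and the iteration count `R` are assumptions made for this illustration.

```python
# The homomorphism of Theorem 11: c -> cba, a -> aa, b -> b.
MORPHISM = {"c": "cba", "a": "aa", "b": "b"}


def iterate(word, times):
    """Apply the homomorphism to `word` the given number of times."""
    for _ in range(times):
        word = "".join(MORPHISM[ch] for ch in word)
    return word


R = 10  # arbitrary number of iterations for the illustration
prefix = iterate("c", R)  # phi^R(c), a prefix of the fixed point
b_positions = [i for i, ch in enumerate(prefix) if ch == "b"]
print(b_positions)
```

Since $\varphi^R(c)$ has length $2^R + R$, the prefix computed here contains exactly the positions $2^j + j$ with $0 \leq j < R$.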
[Figure 1: state diagram not reproduced here. Its states are $q_w$ for binary strings $w$ of length at most 3, $p_0, \ldots, p_4$, $p'_0, \ldots, p'_4$, and the dead state $d$.]

Figure 1: Automaton computing $s(i)$ for $0 \leq i < 256$. The input is the base-2 expansion of $i$, starting with the least significant bit. The output is $s(i)$. The states are labeled with the name of the state, followed by a slash, followed by the output associated with that state. All unmarked transitions go to the dead state, labeled $d/a$.

Next, we exhibit a homomorphism whose fixed point is 2-automatic, but has high $k$-automaticity for all odd $k$. Define $\varphi(0) = 01$, $\varphi(1) = 00$, and consider the fixed point $(p(i))_{i \geq 0}$ starting with 0. It is easy to see that $p(i) = \nu_2(i + 1) \bmod 2$, where $\nu_2(n)$ is the exponent of the highest power of 2 which divides $n$.

We first give two simple lemmas:

Lemma 12 Let $(s(i))_{i \geq 0}$ be a sequence over a finite alphabet $\Delta$, and suppose that there exists a constant $d$ such that for all $r, a, b$ with $r \geq 2$, $0 \leq a, b < r$, and $a \neq b$, there exists a non-negative integer $m = O(r^d)$ such that $s(rm + a) \neq s(rm + b)$. Then $A_s^k(n) = \Omega(n^{1/(d+1)} / k)$ for all $k \geq 2$, where the implied constant in the big-$\Omega$ does not depend on $k$.

Pro...