Automatic sequences. Theory, applications, generalizations. (English) Zbl 1086.11015

Cambridge: Cambridge University Press (ISBN 0-521-82332-3/pbk). xvi, 571 p. (2003).
Words (or sequences) are natural objects in several areas of Mathematics and Computer Science. Numbers, for instance, can be represented by sequences over a finite alphabet. For an integer \(k \geq 2\), the base-\(k\) representation of a non-negative integer \(n\) is the unique sequence \((n)_k = a_t a_{t-1} \ldots a_0\) satisfying \(n = \sum_{0 \leq i \leq t} a_i k^i\), \(a_t \not = 0\), and \(a_i \in \Sigma_k = \{0, 1, \ldots, k-1\}\). For \(w = b_1 b_2 \ldots b_r\), an inverse operation is defined as \([w]_k = \sum_{1 \leq i \leq r} b_i k^{r-i}\). Clearly, \([(n)_k]_k = n\).
Computer Science uses computational models to idealize computers. One of the simplest models, called the deterministic finite automaton, is a 5–tuple \(M = (Q, \Sigma, \delta, q_0, F)\), where \(Q\) is a finite set called the set of states, \(\Sigma\) is a finite set called the input alphabet, \(\delta: Q \times \Sigma \rightarrow Q\) is the transition function, \(q_0 \in Q\) is the initial state, and \(F \subseteq Q\) is the set of accepting states.
Informally, the automaton \(M\) is started in state \(q_0\) and is fed with an input \(w\) (a sequence of symbols or letters from \(\Sigma\)). Then \(M\) moves from state to state according to its transition function \(\delta\), while reading the symbols of \(w\). When the end of the sequence is reached, \(M\) halts in a state \(q\) accepting \(w\) if \(q \in F\) (and rejecting otherwise).
Another model of computation, called the deterministic finite automaton with output, is a 6–tuple \(M = (Q, \Sigma, \delta, q_0, \Delta, \tau)\), where \(Q, \Sigma, \delta, q_0\) are as above, \(\Delta\) is a finite set called the output alphabet, and \(\tau: Q \rightarrow \Delta\) is the output function.
Such a finite state machine defines a function \(f\) from \(\Sigma^*\), the set of all sequences over \(\Sigma\), to \(\Delta\), as \(f(w) = \tau(\delta(q_0,w))\) where \(\delta(q_0,w)\) is the state \(M\) ends up in when started in state \(q_0\) and fed with the input \(w\). At this point, \(M\) outputs the symbol \(\tau(\delta(q_0,w))\).
A sequence \(a_0 a_1 \ldots\) over a finite alphabet \(\Delta\) is k-automatic if there exists a deterministic finite automaton with output \(M = (Q, \Sigma_k, \delta, q_0, \Delta, \tau)\) such that \(a_n = \tau(\delta(q_0,w))\) for all \(n \geq 0\) and all \(w\) satisfying \([w]_k = n\). Several sequences of mathematical interest turn out to be \(k\)-automatic for some \(k\): the 2-automatic Thue-Morse sequence, the 2-automatic Rudin-Shapiro sequence, to name a few.
A theorem of Cobham gives a beautiful description of \(k\)-automatic sequences in terms of \(k\)-uniform morphisms. To be more precise, suppose that \(\Sigma = \Delta\). A morphism \(\varphi: \Delta^* \rightarrow \Delta^*\) is called \(k\)-uniform if the length of \(\varphi(a)\) is \(k\) for all \(a \in \Delta\). If there exists a letter \(a\) such that \(\varphi(a) = aw\) for some word \(w\) of length \(k-1\), then the sequence \(aw\varphi(w)\varphi^2 (w) \varphi^3 (w) \ldots\) is the unique infinite fixed point of \(\varphi\) starting with \(a\). Cobham’s result states that a sequence is \(k\)-automatic if and only if it is the image (under a coding) of a fixed point of a \(k\)-uniform morphism. Thus, automatic sequences are not only generated by finite automata, but are also generated by iterating uniform morphisms.
Allouche and Shallit’s book presents an introduction to the fascinating subject of automatic sequences. This is the first book that systematically develops the theory of these sequences. It brings together results in automata theory and number theory in a consistent and unified framework. In the literature, automatic sequences have been studied under the name uniform tag sequences, in analogy with Post’s process of tag, and also under the name recognizable sequences. The book does assume some mathematical maturity from the reader. The major contents of the book can be divided into several categories which are summarized as follows:
Background material Chapters 1–4 include background material used in the rest of the book. Chapter 1 discusses the basic concepts of finite and infinite words. Chapter 2 reviews knowledge from number theory and algebra. Chapter 3 emphasizes the base-\(k\) representation of integers by sequences over the alphabet \(\Sigma_k\). Chapter 4 introduces finite automata and other models of computation.
Theory Chapter 5 defines the fundamental objects of the book: \(k\)-automatic sequences, while later chapters explore properties of these sequences. Ultimately periodic sequences turn out to be \(k\)-automatic for all \(k \geq 2\).
Chapter 6 begins the study of sequences that are fixed points of morphisms focusing on the case of uniform morphisms. Chapter 7 studies the more general case where the morphisms need not be uniform. The famous Fibonacci sequence is the infinite fixed point of the morphism that maps 0 to 01 and 1 to 0. Chapter 8 shows how to prove that some sequences are not \(k\)-automatic for any \(k\).
Chapter 9 introduces a class of infinite sequences over \(\Sigma_2\), the so-called characteristic sequences, which are of great number-theoretic interest and which generalize the Fibonacci sequence. Sturmian sequences are also introduced in this chapter.
Chapter 10 focuses on the subword complexity of infinite words. An infinite word \(w\) may be partially understood by studying its finite subwords. A natural question that arises is: “How many distinct subwords of \(w\) of length \(n\) are there, and what is the growth rate of this number as \(n\) approaches infinity?” This question refers to a measure of complexity, called subword complexity. This measure is of particular interest since automatic sequences have relatively low subword complexity, while random sequences have high such complexity.
Chapter 11 proves a deep theorem due to Cobham which states that if a sequence is both \(k\)-automatic and \(\ell\)-automatic for some multiplicatively independent integers \(k\) and \(\ell\), then that sequence is ultimately periodic.
Generalizations Several generalizations of automatic sequences are discussed in the book. These include:
Morphic sequences Chapter 7 generalizes \(k\)-automatic sequences, which are generated by iterating uniform morphisms, to morphic sequences, generated by arbitrary morphisms. Such sequences have been studied in the literature under several names including D0L sequences.
Multidimensional automatic sequences Chapter 14 generalizes \(k\)-automatic sequences, which are one-dimensional sequences, to higher-dimensional automatic sequences. This chapter concentrates on the two-dimensional case.
k-regular sequences Chapter 16 generalizes \(k\)-automatic sequences, which are defined over finite alphabets, to \(k\)-regular sequences, defined over infinite alphabets. There, we can find some examples of \(k\)-regular sequences such as the sequence which counts the sum of digits in the base-\(k\) representation of a non-negative integer \(n\). The theory of \(k\)-regular sequences is closely related to the one of rational series.
Applications Applications of automatic sequences are discussed throughout the book. To cite some examples, applications are found in: Number theory (particularly formal power series and transcendence in finite characteristic (Chapters 9, 12 and 13)); Physics (particularly quasicrystals (Chapter 17)); Computer graphics (Chapter 14); and Music (Chapters 1, 7 and 16).
Allouche and Shallit’s book, mostly self-contained, is suitable for courses in automata theory and number theory at the graduate or advanced undergraduate level. Experts who want to learn more about automatic sequences and their generalizations will also find it useful. Exercises are provided at the end of each chapter, some of which offer the kind of drill undergraduate students need in order to test their understanding, while others are much more substantial problems. Hints, references, and solutions for selected exercises are given in the appendix. Chapters are supplemented by open problems, as well as citations to the literature.
This book, which incorporates results from both Mathematics and Computer Science, will be very valuable to a large audience.


11B85 Automata sequences
11-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to number theory
68R15 Combinatorics on words
68Q45 Formal languages and automata
Full Text: DOI

Online Encyclopedia of Integer Sequences:

Kolakoski sequence: a(n) is length of n-th run; a(1) = 1; sequence consists just of 1’s and 2’s.
Bell or exponential numbers: number of ways to partition a set of n labeled elements.
1’s-counting sequence: number of 1’s in binary expansion of n (or the binary weight of n).
Total number of 1’s in binary expansions of 0, ..., n.
Thue-Morse sequence: let A_k denote the first 2^k terms; then A_0 = 1 and for k >= 0, A_{k+1} = A_k B_k, where B_k is obtained from A_k by interchanging 1’s and 2’s.
Sierpiński’s triangle (Pascal’s triangle mod 2) converted to decimal.
The ruler function: exponent of the highest power of 2 dividing 2n. Equivalently, the 2-adic valuation of 2n.
Number of bifix-free (or primary, or unbordered) words of length n over a two-letter alphabet.
The infinite Fibonacci word: start with 1, repeatedly apply the morphism 1->12, 2->1, take limit; or, start with S(0)=2, S(1)=1, and for n>1 define S(n)=S(n-1)S(n-2), then the sequence is S(oo).
The infinite Fibonacci word (start with 0, apply 0->01, 1->0, take limit).
Primes written in base 2.
Running sum of every third term in the {+1,-1}-version of Thue-Morse sequence A010060.
The binary complement of the infinite Fibonacci word A003849. Start with 1, apply 0->1, 1->10, iterate, take limit.
a(2n) = a(n) + a(n+1), a(2n+1) = 2a(n+1), if n >= 2.
Number of tripling steps to reach 1 from n in ’3x+1’ problem, or -1 if 1 is never reached.
The almost-natural numbers: write n in base 10 and juxtapose digits.
Exponent of highest power of 2 dividing n, a.k.a. the binary carry sequence, the ruler sequence, or the 2-adic valuation of n.
Numbers that contain only 1’s and 2’s. Nonempty binary strings of length n in lexicographic order.
a(n) is the final nonzero digit of n!.
Characteristic function of primes: 1 if n is prime, else 0.
Characteristic function of squares: a(n) = 1 if n is a square, otherwise 0.
Thue-Morse sequence: let A_k denote the first 2^k terms; then A_0 = 0 and for k >= 0, A_{k+1} = A_k B_k, where B_k is obtained from A_k by interchanging 0’s and 1’s.
Aitken’s array: triangle of numbers {a(n,k), n >= 0, 0 <= k <= n} read by rows, defined by a(0,0)=1, a(n,0) = a(n-1,n-1), a(n,k) = a(n,k-1) + a(n-1,k-1).
The regular paper-folding sequence (or dragon curve sequence).
The regular paper-folding (or dragon curve) sequence.
Positive numbers k such that k = x^5 + y^5 has a solution in nonzero integers x, y.
The Rudin-Shapiro or Golay-Rudin-Shapiro sequence (coefficients of the Shapiro polynomials).
Zero-one version of Golay-Rudin-Shapiro sequence (or word).
a(n) = number of 1’s between n-th 2 and (n+1)st 2 in A026600.
Number of aperiodic binary strings of length n; also number of binary sequences with primitive period n.
Number of 1’s when n is written in base -2.
Jacobi (or Kronecker) symbol (-1/n).
a(n) = (-1)^n * Sum_{i=0..n} binomial(n+1,i+1)*Catalan(i).
Triangle T(n,k) = Sum_{d|gcd(n,k)} mu(d)*C(n/d,k/d) (n >= 1, 1 <= k <= n).
The parity of the number of zero digits when n is written in binary.
If A_k denotes the first 3^k terms, then A_0 = 0, A_{k+1} = A_k A_k B_k, where B_k is obtained from A_k by interchanging 0’s and 1’s.
(Number of 1’s in binary expansion of n) mod 3.
Decimal expansion of c = (7-sqrt(5))/2.
Tribonacci word: limit S(infinity), where S(0) = 0, S(1) = 0,1, S(2) = 0,1,0,2 and for n >= 0, S(n+3) = S(n+2) S(n+1) S(n).
Lexicographically earliest squarefree infinite ternary word.
Decimal expansion of Product_{n>=1} (2n/(2n+1))^((-1)^t(n)), where t(n) = A010060(n) is the Thue-Morse sequence.
Baum-Sweet sequence: a(n) = 1 if binary representation of n contains no block of consecutive zeros of odd length; otherwise a(n) = 0.
Number of adjacent identical digits in the binary representation of n.
The ternary tribonacci word; also a Rauzy fractal sequence: fixed point of the morphism 1 -> 12, 2 -> 13, 3 -> 1, starting from a(1) = 1.
Decimal expansion of log(3)/log(4).
Sum of all digits in ternary expansions of 0, ..., n.
Numbers k such that gcd(k, A000120(k)) = 1.
Parity of 1-fibits in Zeckendorf expansion A014417(n).
Fixed point of the morphism 0->01, 1->011.
Prime analog of Baum-Sweet sequence: a(n) = 1 if binary representation of n contains no block of consecutive zeros of exactly prime length; otherwise a(n) = 0.
Numbers n such that n in binary representation has a block of exactly a semiprime number of zeros.
Numbers n such that n in binary representation has a block of exactly a nontrivial triangular number number of zeros.
Numbers n such that n in binary representation has a block of exactly a nontrivial square number of zeros.
Numbers n such that n in ternary representation (A007089) has a block of exactly a prime number of consecutive zeros.
Numbers n such that n in binary representation has a block of exactly a nontrivial pentagonal number of zeros.
Define a sequence of binary words by w(1) = 01 and w(n+1) = w(n)w(n)Reverse[w(n)]. Sequence gives the limiting word w(infinity).
a(n) = floor(S2(n)/2) mod 2, where S2(n) is the binary weight of n.
Concatenation of n digits 1 and n digits 0.
Sturmian word of slope (3-sqrt(3))/2.
For a binary reflected Gray code, the (Hamming/Euclidean) distance between 2 subsequent points x and y is 1, say in coordinate k. If y has a 1 in coordinate k and x has a 0, than (x,y) is indicated by k, if it is the other way around, (x,y) is indicated by -k. The sequence has a fractal character such that G(d+1) = G(d) d+1 R(G(d)) where R(G(d)) alters d –> -d and leaves all other numbers invariant.
The Pell word: Fixed point of the morphism 0->001, 1->0.
Primes p such that p^3 = q//3 for a prime q, where ”//” denotes concatenation.
Ordered list in decimal notation of the subwords (with leading zeros omitted) appearing in the infinite Fibonacci word A005614 (0->1 & 1->10).
Number of 0’s (mod 3) in the binary expansion of n.
”Recurrence function” for Thue-Morse infinite word A010060.
(Number of 1’s in the binary expansion of n) mod 4.
[nr]-[kr]-[nr-kr], where r=sqrt(3), k=1, [ ]=floor.
Fixed point of the morphism 0->001, 1->010.
List of words over {1,2} with equal numbers of 1’s and 2’s.
Trajectory of 0 under the morphism 0 -> 01, 1 -> 202, 2 -> empty string.
a(n) = min{2 + c_n + 2(c_1 + c_2 + ... + c_(n-1)) | n = p + q, 1 <= p < q < n, gcd(p, q) = 1, and p/q has continued fraction expansion [0; c_1, c_2, ...., c_n]}.
a(n) = Sum_{i=0..n} digsum_4(i), where digsum_4(i) = A053737(i).
a(n) = Sum_{i=0..n} digsum_5(i), where digsum_5(i) = A053824(i).
a(n) = Sum_{i=0..n} digsum_6(i), where digsum_6(i) = A053827(i).
a(n) = Sum_{i=0..n} digsum_7(i), where digsum_7(i) = A053828(i).
a(n) = Sum_{i=0..n} digsum_8(i), where digsum_8(i) = A053829(i).
a(n) = Sum_{i=0..n} digsum_9(i), where digsum_9(i) = A053830(i).
Trajectory of 0 under the morphism 0 -> 001, 1 -> 0010.
First differences of the Beatty sequence A022841 for sqrt(7).
First differences of the Beatty sequence A004976 for 2 + sqrt(5).
If A_k denotes the first 2*3^k terms, then A_0 = 01, A_{k+1} = A_k A_k B_k, where B_k is obtained from A_k by interchanging 0’s and 1’s.
{00->null}-transform of the infinite Fibonacci word A003849.
Positions of 1 in A286907; complement of A286908.
Positions of 1 in A287769; complement of A276855.
Image of 3 under repeated applications of the morphism 1 -> 11, 2 -> 2, 3 -> 312.
Decimal expansion of real number whose continued fraction expansion is [0, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2, 1, ...] (the Thue-Morse sequence A001285).
The 3-adic valuation of the central Delannoy numbers (sequence A001850).
For any number n >= 0 with binary expansion (b_1, ..., b_w), a(n) is the least p > 0 such that b_i = b_{p+i} for i = 1..w-p.