Logarithmic combinatorial structures: A probabilistic approach.

*(English)*Zbl 1040.60001
EMS Monographs in Mathematics. Zürich: European Mathematical Society (EMS) (ISBN 3-03719-000-0/hbk). xi, 363 p. (2003).

This research monograph is concerned with precise approximations to the process of component counts in random decomposable combinatorial structures by simpler discrete, independent processes. The authors develop powerful methods for pushing most of the results (as well as the conditions required) to their limit. The arguments used are completely probabilistic (unlike some of the authors’ previous papers), and indeed the book may be roughly summarized as a deep study of component counts in decomposable combinatorial structures without generating function, without Fourier analysis, and with purely probabilistic reasoning.

A prototype of the so-called “logarithmic combinatorial structures” is cycles in permutations. Permutations can be viewed as a set of cycles, whose exponential generating function is of the form \(\log(1/(1-z))\) (since there are \((n-1)!\) different ways of arranging \(n\) distinct numbers in a cycle), thus the term “logarithmic”. The authors’ view of the cycle structures is through the fine, informative counting process \((C_1(n), C_2(n), \dots)\), where \(C_i(n)\) denotes the number of cycles of size \(i\) in a random permutation of \(n\) elements. The basic idea is that if we have good knowledge on such processes, then many other properties like the distribution of the total number of cycles and functional limit theorems can be derived. Two typical types of limit theorems are as follows. (i) \((C_1(n), C_2(n), \dots)\) converges in distribution to \((\text{Po}(1), \text{Po}(\frac 12), \dots)\) in \(\mathbb{Z}_+^\infty\), where \(\text{Po}(\frac1i)\) denotes a Poisson random variable with mean \(1/i\), and, (ii) for the size of the \(k\)th largest cycle \(L_k(n)\), \(n^{-1} (L_1(n), L_2(n), \dots)\) converges in distribution to \((L_1, L_2, \dots)\) in \([0,1]^\infty\), where the random vector \((L_1, L_2,\dots)\) has a Poisson-Dirichlet distribution with parameter \(1\).

These results roughly say that counts of small cycles are asymptotically independent, while those of large cycles are strongly dependent. In this book such results are (i) extended to cover a wider class of combinatorial structures, and (ii) refined by stronger types of approximation theorems with different degrees of precision on error rate.

For a general treatment of the component counts, the following two conditions are crucial: (i) the conditioning relation \[ \mathcal{L}(C_1^{(n)}, \dots,C_n^{(n)}) = \mathcal{L}(Z_1,\dots,Z_n \mid Z_1+2Z_2+\cdots + nZ_n =n), \] and (ii) the logarithmic condition \[ \mathbb{P}(Z_n=1) \sim \mathbb{E}(Z_n) \sim \theta/n \qquad (\theta>0). \] Here \(C_j^{(n)}\) denotes the number of components of size \(j\) and the \(Z_j\)’s are some discrete random variables, which are usually Poisson, binomial or negative binomial. From a combinatorial point of view, the conditioning relation corresponds roughly to decomposability of the underlying structure and the logarithmic condition reflects analytic properties needed on the generating functions.

For the types of approximation theorems, the authors study mostly the total variation distances \(d_{\text{TV}}(\mathcal{L}(C_1^{(n)}, \dots,C_b^{(n)}), \mathcal{L}(Z_1,\dots,Z_b))\) and \( d_{\text{TV}}(\mathcal{L}(C_{b+1}^{(n)}, \dots,C_n^{(n)}), \mathcal{L}(C_{b+1}^{*(n)},\dots,C_n^{*(n)}))\), where \(d_{\text{TV}}\) denotes the total variation distance and \(C_j^{*(n)}\) denotes the numbers of cycles of lengths \(j\), in a \(\theta\)-biased random permutation of \(n\) objects (whose joint distribution follows the Ewens sampling formula). Once good bounds are available for these distances, properties for the approximators can be transferred to those of the component counts. Thus two chapters (4 and 5) are devoted to derive all types of properties needed for the Ewens sampling formula, which include local and functional limit theorems and large deviations for the number of cycles, distributions of the lengths of small cycles, ordered cycle lengths, and largest cycles, and the Erdős-Turán law for log order of a random element. The diversity of the types of results implied by the authors’ strong approximation theorems shows the advantages of the authors’ consideration of the more general total variation distances.

A large amount of fine estimates are derived in order to achieve the generality of the condition and the precision for the total variation distances. The first half of the book (first six chapters) describes the background, the main types of results discussed in the book, properties needed for the Ewens sampling formula (which is used later on as a basic approximator), as well as the methods of proof used. Chapter 7 describes the main theorems of the book under general conditions whose consequences are discussed in Chapter 8; the proofs are then given in Chapters 9 to 14 (pp. 225 – 338).

The contents of each chapter are sketched as follows. Chapter 1 starts from a discussion of properties of cycles in permutations: joint distribution and moments of cycle counts (Cauchy’s formula), marginal distribution, total variation distance between the cycle counts and Poisson process, the Feller coupling, local and functional limit theorems for total number of cycles, distributions of the small and large cycles, age-ordered list of cycle lengths, and the Erdős-Turán law. Then the asymptotic analogy between cycles in permutations and prime factors in random integers are highlighted (theory in one side is generally inspiring for that in the other). Chapter 2 describes decomposable combinatorial structures, aiming mainly at assemblies (set construction for labelled structures), multisets and selections (set construction for unlabelled structures), and their variations by refining, coloring and tilting, the viewpoint being consistently probabilistic. Examples include integer partitions, set partitions, polynomials over finite fields, binary search trees, additive arithmetic semigroups, coagulation fragmentation processes, etc. Chapter 3 gives an overview of the main results in the book, beginning with the conditioning relation and logarithmic condition. A section on total variation and Wasserstein distances and another on Stein’s method are included. Chapters 4 and 5 give a detailed discussion of the Ewens sampling formula. The notion of size-biasing is first introduced. Chapter 4 then focusses on distributional properties of the random variables \(n^{-1} \sum_{b<j\leq n} jZ_j\), where \(\mathcal{L}(Z_j) =\mathcal{L}(\text{Po}(1/j))\), \(0\leq b<n\). The main reason of studying such partial sums is that the total variation distance between two random vectors are reduced to some weighted sum of two random variables (one-dimensional) of such a type. Chapter 5 then deals with the more general process approximations. Chapter 6 turns to general logarithmic combinatorial structures and follows the same pattern of treatment for Ewens sampling formula by considering first distributions of \(n^{-1}\sum_{b<j\leq n}jC_j^{(n)}\) and then the associated process approximations. It derives, in particular, asymptotics of the total variation distance between the process of component counts in assembly, multiset and selection constructions and the associated limiting process. From Chapter 7 on, the treatment is for more general (not only for assembly, multiset and selection) logarithmic structures but under stronger analytic assumptions in order to prove more effective bounds on the total variation distances. “Chapter 8 gives a number of consequences of the approximation theorems of the preceding chapter, illustrating the power inherent in discrete functional limit theorems and approximations. Each is based on earlier limiting results, improving upon them in two ways. First, the context is broadened from an often quite restrictive setting to that of a very general logarithmic combinatorial structure. Secondly, the limit theorems are supplied with error bounds.”

This book, whose title was already cited in the paper by R. Arratia and S. Tavaré [Adv. Math. 104, 90–154 (1994; Zbl 0802.60008)], represents the culmination of a decade-long research by the authors. It is very well-written and informative. Highly recommended.

A prototype of the so-called “logarithmic combinatorial structures” is cycles in permutations. Permutations can be viewed as a set of cycles, whose exponential generating function is of the form \(\log(1/(1-z))\) (since there are \((n-1)!\) different ways of arranging \(n\) distinct numbers in a cycle), thus the term “logarithmic”. The authors’ view of the cycle structures is through the fine, informative counting process \((C_1(n), C_2(n), \dots)\), where \(C_i(n)\) denotes the number of cycles of size \(i\) in a random permutation of \(n\) elements. The basic idea is that if we have good knowledge on such processes, then many other properties like the distribution of the total number of cycles and functional limit theorems can be derived. Two typical types of limit theorems are as follows. (i) \((C_1(n), C_2(n), \dots)\) converges in distribution to \((\text{Po}(1), \text{Po}(\frac 12), \dots)\) in \(\mathbb{Z}_+^\infty\), where \(\text{Po}(\frac1i)\) denotes a Poisson random variable with mean \(1/i\), and, (ii) for the size of the \(k\)th largest cycle \(L_k(n)\), \(n^{-1} (L_1(n), L_2(n), \dots)\) converges in distribution to \((L_1, L_2, \dots)\) in \([0,1]^\infty\), where the random vector \((L_1, L_2,\dots)\) has a Poisson-Dirichlet distribution with parameter \(1\).

These results roughly say that counts of small cycles are asymptotically independent, while those of large cycles are strongly dependent. In this book such results are (i) extended to cover a wider class of combinatorial structures, and (ii) refined by stronger types of approximation theorems with different degrees of precision on error rate.

For a general treatment of the component counts, the following two conditions are crucial: (i) the conditioning relation \[ \mathcal{L}(C_1^{(n)}, \dots,C_n^{(n)}) = \mathcal{L}(Z_1,\dots,Z_n \mid Z_1+2Z_2+\cdots + nZ_n =n), \] and (ii) the logarithmic condition \[ \mathbb{P}(Z_n=1) \sim \mathbb{E}(Z_n) \sim \theta/n \qquad (\theta>0). \] Here \(C_j^{(n)}\) denotes the number of components of size \(j\) and the \(Z_j\)’s are some discrete random variables, which are usually Poisson, binomial or negative binomial. From a combinatorial point of view, the conditioning relation corresponds roughly to decomposability of the underlying structure and the logarithmic condition reflects analytic properties needed on the generating functions.

For the types of approximation theorems, the authors study mostly the total variation distances \(d_{\text{TV}}(\mathcal{L}(C_1^{(n)}, \dots,C_b^{(n)}), \mathcal{L}(Z_1,\dots,Z_b))\) and \( d_{\text{TV}}(\mathcal{L}(C_{b+1}^{(n)}, \dots,C_n^{(n)}), \mathcal{L}(C_{b+1}^{*(n)},\dots,C_n^{*(n)}))\), where \(d_{\text{TV}}\) denotes the total variation distance and \(C_j^{*(n)}\) denotes the numbers of cycles of lengths \(j\), in a \(\theta\)-biased random permutation of \(n\) objects (whose joint distribution follows the Ewens sampling formula). Once good bounds are available for these distances, properties for the approximators can be transferred to those of the component counts. Thus two chapters (4 and 5) are devoted to derive all types of properties needed for the Ewens sampling formula, which include local and functional limit theorems and large deviations for the number of cycles, distributions of the lengths of small cycles, ordered cycle lengths, and largest cycles, and the Erdős-Turán law for log order of a random element. The diversity of the types of results implied by the authors’ strong approximation theorems shows the advantages of the authors’ consideration of the more general total variation distances.

A large amount of fine estimates are derived in order to achieve the generality of the condition and the precision for the total variation distances. The first half of the book (first six chapters) describes the background, the main types of results discussed in the book, properties needed for the Ewens sampling formula (which is used later on as a basic approximator), as well as the methods of proof used. Chapter 7 describes the main theorems of the book under general conditions whose consequences are discussed in Chapter 8; the proofs are then given in Chapters 9 to 14 (pp. 225 – 338).

The contents of each chapter are sketched as follows. Chapter 1 starts from a discussion of properties of cycles in permutations: joint distribution and moments of cycle counts (Cauchy’s formula), marginal distribution, total variation distance between the cycle counts and Poisson process, the Feller coupling, local and functional limit theorems for total number of cycles, distributions of the small and large cycles, age-ordered list of cycle lengths, and the Erdős-Turán law. Then the asymptotic analogy between cycles in permutations and prime factors in random integers are highlighted (theory in one side is generally inspiring for that in the other). Chapter 2 describes decomposable combinatorial structures, aiming mainly at assemblies (set construction for labelled structures), multisets and selections (set construction for unlabelled structures), and their variations by refining, coloring and tilting, the viewpoint being consistently probabilistic. Examples include integer partitions, set partitions, polynomials over finite fields, binary search trees, additive arithmetic semigroups, coagulation fragmentation processes, etc. Chapter 3 gives an overview of the main results in the book, beginning with the conditioning relation and logarithmic condition. A section on total variation and Wasserstein distances and another on Stein’s method are included. Chapters 4 and 5 give a detailed discussion of the Ewens sampling formula. The notion of size-biasing is first introduced. Chapter 4 then focusses on distributional properties of the random variables \(n^{-1} \sum_{b<j\leq n} jZ_j\), where \(\mathcal{L}(Z_j) =\mathcal{L}(\text{Po}(1/j))\), \(0\leq b<n\). The main reason of studying such partial sums is that the total variation distance between two random vectors are reduced to some weighted sum of two random variables (one-dimensional) of such a type. Chapter 5 then deals with the more general process approximations. Chapter 6 turns to general logarithmic combinatorial structures and follows the same pattern of treatment for Ewens sampling formula by considering first distributions of \(n^{-1}\sum_{b<j\leq n}jC_j^{(n)}\) and then the associated process approximations. It derives, in particular, asymptotics of the total variation distance between the process of component counts in assembly, multiset and selection constructions and the associated limiting process. From Chapter 7 on, the treatment is for more general (not only for assembly, multiset and selection) logarithmic structures but under stronger analytic assumptions in order to prove more effective bounds on the total variation distances. “Chapter 8 gives a number of consequences of the approximation theorems of the preceding chapter, illustrating the power inherent in discrete functional limit theorems and approximations. Each is based on earlier limiting results, improving upon them in two ways. First, the context is broadened from an often quite restrictive setting to that of a very general logarithmic combinatorial structure. Secondly, the limit theorems are supplied with error bounds.”

This book, whose title was already cited in the paper by R. Arratia and S. Tavaré [Adv. Math. 104, 90–154 (1994; Zbl 0802.60008)], represents the culmination of a decade-long research by the authors. It is very well-written and informative. Highly recommended.

Reviewer: Hsien-Kuei Hwang (Taipei)

##### MSC:

60-02 | Research exposition (monographs, survey articles) pertaining to probability theory |

05-02 | Research exposition (monographs, survey articles) pertaining to combinatorics |

05A05 | Permutations, words, matrices |

05A40 | Umbral calculus |

60D05 | Geometric probability and stochastic geometry |