A weak convergence approach to the theory of large deviations.

*(English)* Zbl 0904.60001
Wiley Series in Probability and Statistics. Chichester: John Wiley & Sons. xviii, 479 p. (1997).

This book presents a novel approach to the theory of large deviations by using ideas and results from control theory and weak convergence. The basic setup is a sequence \((X^n)\) of random variables with values in a Polish space \({\mathcal X}\). Writing \(I(A):= \inf_{x\in A} I(x)\) for subsets \(A\) of \({\mathcal X}\), this sequence is said to satisfy a large deviation principle (LDP) with rate function \(I\) if
\[
\begin{aligned} \liminf_{n\to\infty}{1\over n}\log P[X^n\in G]\geq -I(G)\quad &\text{for all open subsets \(G\) of }{\mathcal X},\\ \limsup_{n\to\infty}{1\over n}\log P[X^n\in F]\leq -I(F)\quad &\text{for all closed subsets \(F\) of }{\mathcal X}.\end{aligned}
\]
The sequence \((X^n)\) satisfies a Laplace principle (LP) with rate function \(I\) if
\[
\lim_{n\to\infty} W^n:= \lim_{n\to\infty} {1\over n}\log E[e^{-nh(X^n)}]=- \inf_{x\in{\mathcal X}} (h(x)+ I(x))
\]
for all bounded continuous functions \(h\) on \({\mathcal X}\). The first key result is that an LDP with rate function \(I\) is equivalent to an LP with rate function \(I\), so one can study the asymptotics of \((W^n)\) to obtain an LDP via an LP. The second ingredient is the following classical variational formula for any bounded measurable function \(k\) and any probability measure \(\vartheta\) on \({\mathcal X}\):
\[
-\log \int_{\mathcal X} e^{-k}\,d\vartheta= \inf\Biggl\{H(\gamma\mid\vartheta)+ \int_{\mathcal X} k\,d\gamma \Biggm| \gamma\text{ probability measure on }{\mathcal X}\Biggr\},
\]
where \(H(\gamma\mid\vartheta)\) denotes the relative entropy of \(\gamma\) with respect to \(\vartheta\).
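On a finite state space the variational formula can be checked numerically. The following is a minimal sketch, not taken from the book: the three-point space, the weights `theta`, the function values `k` and all function names are illustrative assumptions. It uses the standard fact that the infimum is attained at the measure \(\gamma^*\) with density proportional to \(e^{-k}\) with respect to \(\vartheta\).

```python
import math


def relative_entropy(gamma, theta):
    """H(gamma | theta) for probability vectors on a finite set."""
    total = 0.0
    for g, t in zip(gamma, theta):
        if g > 0.0:
            if t == 0.0:
                return math.inf  # gamma is not absolutely continuous w.r.t. theta
            total += g * math.log(g / t)
    return total


def log_laplace(k, theta):
    """Left-hand side: -log of the integral of exp(-k) with respect to theta."""
    return -math.log(sum(t * math.exp(-ki) for ki, t in zip(k, theta)))


def objective(gamma, k, theta):
    """Expression under the infimum: H(gamma | theta) + integral of k d(gamma)."""
    return relative_entropy(gamma, theta) + sum(g * ki for g, ki in zip(gamma, k))


def gibbs_minimizer(k, theta):
    """Measure with density proportional to exp(-k) with respect to theta."""
    weights = [t * math.exp(-ki) for ki, t in zip(k, theta)]
    z = sum(weights)
    return [w / z for w in weights]


# Illustrative data (assumptions, not from the book).
theta = [0.5, 0.3, 0.2]
k = [1.0, 0.0, 2.5]

lhs = log_laplace(k, theta)
gamma_star = gibbs_minimizer(k, theta)
rhs_at_minimizer = objective(gamma_star, k, theta)
```

Here `lhs` and `rhs_at_minimizer` agree up to rounding, while `objective` evaluated at any other probability vector gives a value at least as large as `lhs`.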

The approach presented in this book then proceeds as follows: 1) Use the variational formula to rewrite \(W^n\) and obtain an expression, say \((*)\), for \(W^n\). 2) Rewrite the expression under the infimum in \((*)\) so that \((*)\) takes the form of a dynamic programming equation, say \((**)\). This step is problem-dependent and is explained in more detail by examples in Chapter 4. 3) Read off from \((**)\) the corresponding control problem; \(W^n\) can then be viewed as the value function of this control problem. 4) To obtain information about the asymptotic behaviour of \((W^n)\), analyze the sequence of control problems from step 3) by using weak convergence techniques.
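As a sketch of step 1) in the notation of this review (the book's own form of \((*)\) may differ), write \(\vartheta^n\) for the distribution of \(X^n\) on \({\mathcal X}\); the symbol \(\vartheta^n\) is introduced here for illustration only. Applying the formula for \(-\log\int_{\mathcal X} e^{-k}\,d\vartheta\) with \(k= nh\) and \(\vartheta= \vartheta^n\), and dividing by \(-n\), gives
\[
W^n= {1\over n}\log \int_{\mathcal X} e^{-nh}\,d\vartheta^n= -\inf\Biggl\{{1\over n}H(\gamma\mid\vartheta^n)+ \int_{\mathcal X} h\,d\gamma \Biggm| \gamma\text{ probability measure on }{\mathcal X}\Biggr\}.
\]
Comparing with the Laplace limit \(-\inf_{x\in{\mathcal X}}(h(x)+ I(x))\) suggests that the scaled relative entropy \({1\over n}H(\gamma\mid\vartheta^n)\) plays the role of a running cost whose asymptotics produce the rate function \(I\).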

In Chapters 2 and 3, this program is illustrated in detail by considering two familiar examples from large deviation theory: Sanov’s theorem in Chapter 2 and Mogulskij’s theorem in Chapter 3. The goal here is not to provide new results, but to explain how the approach works in order to prepare for more general situations in later chapters.

The remainder of the book is devoted to two large classes of examples that generalize the above two cases. Class I is a random walk model of the form \(Y^n_{j+1}= Y^n_j+{1\over n}v_j(Y^n_j)\), where \(v_j(x)\) for \(j= 0,1,2,\dots\) and \(x\in\mathbb{R}^d\) are i.i.d. random \(\mathbb{R}^d\)-valued functions with distribution \(\mu(\cdot\mid x)\). \(X^n\) is then the piecewise linear interpolation on \([0,1]\) of the \(Y^n_j\). If \(\mu(\cdot\mid x)\) does not depend on \(x\), this is just the model underlying Mogulskij’s theorem. Chapters 5 to 7 prove a Laplace principle for the more general situation where \(\mu(\cdot\mid x)\) can depend on \(x\). Chapter 5 contains preparatory results and Chapter 6 considers the case where \(\mu(\cdot\mid x)\) is continuous in \(x\). Chapter 7 extends this to the case where \(\mu(\cdot\mid x)\) is only continuous on each of two half-spaces of \(\mathbb{R}^d\); this provides large deviation results for a model of random motion in a discontinuous medium. Finally, Chapter 10 gives an LP for a continuous-time version of this model.
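The Class I model can be simulated directly from its recursion. The following is a minimal sketch in dimension \(d= 1\); the function names and the Gaussian, mean-reverting choice of \(\mu(\cdot\mid x)\) are hypothetical and serve only to show a state-dependent increment distribution.

```python
import random


def simulate_walk(n, sample_increment, y0=0.0, rng=None):
    """Return [Y_0, ..., Y_n] with Y_{j+1} = Y_j + v_j(Y_j) / n.

    sample_increment(x, rng) draws one value from mu(. | x).
    """
    rng = rng or random.Random(0)
    ys = [y0]
    for _ in range(n):
        y = ys[-1]
        ys.append(y + sample_increment(y, rng) / n)
    return ys


def interpolate(ys, t):
    """Piecewise linear interpolation X^n(t) on [0, 1], with X^n(j/n) = Y_j."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    n = len(ys) - 1
    j = min(int(t * n), n - 1)  # index of the interval [j/n, (j+1)/n] containing t
    frac = t * n - j
    return ys[j] + frac * (ys[j + 1] - ys[j])


def mean_reverting_increment(x, rng):
    """Hypothetical mu(. | x): normal distribution with mean -x and variance 1."""
    return rng.gauss(-x, 1.0)


path = simulate_walk(1000, mean_reverting_increment, y0=1.0)
midpoint_value = interpolate(path, 0.5)
```

Replacing `mean_reverting_increment` by a sampler that ignores `x` gives the state-independent case underlying Mogulskij's theorem.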

Class II of examples starts from a Markov chain \((Y_j)\) with values in some Polish space \({\mathcal S}\) and stationary transition probabilities. In this case, \(X^n\) is given by the empirical measure \(X^n:={1\over n} \sum^{n- 1}_{j= 0}\delta_{Y_j}\). For the case where the \(Y_j\) are i.i.d., this is the situation of Sanov’s theorem. Chapters 8 and 9 prove a Laplace principle for the more general Markovian situation; Chapter 8 does this under the classical (Donsker/Varadhan) assumption of a Feller transition probability function and Chapter 9 contains two extensions (weakening of the Feller property; stronger topology on the set of probability measures on \({\mathcal S}\)).
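In the i.i.d. case on a two-point space, the Laplace principle for the empirical measure can be checked numerically, since the expectation is an exact binomial sum and the rate function in Sanov's theorem is the relative entropy with respect to the common distribution. The following is a minimal sketch under these assumptions: the state space \(\{0,1\}\), the value `p`, the test function `h` and all function names are illustrative and not taken from the book. A measure on \(\{0,1\}\) is identified with its mass `q` at the point 1.

```python
import math


def relative_entropy_two_point(q, p):
    """H(gamma | theta) for gamma = (1-q, q), theta = (1-p, p), with 0 < p < 1."""
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def laplace_functional(n, p, h):
    """W^n = (1/n) log E[exp(-n h(X^n))] as an exact sum over Binomial(n, p)."""
    log_terms = []
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        log_terms.append(log_pmf - n * h(k / n))
    m = max(log_terms)  # log-sum-exp to avoid underflow
    return (m + math.log(sum(math.exp(t - m) for t in log_terms))) / n


def laplace_limit(p, h, grid=10001):
    """-inf over q of (h(q) + H((1-q, q) | (1-p, p))), approximated on a grid."""
    values = []
    for i in range(grid):
        q = i / (grid - 1)
        values.append(h(q) + relative_entropy_two_point(q, p))
    return -min(values)


p = 0.3


def h(q):
    """Bounded continuous test function of the mass at 1 (illustrative choice)."""
    return (q - 0.8) ** 2


w_n = laplace_functional(2000, p, h)
limit = laplace_limit(p, h)
```

For large `n` the value `w_n` is close to `limit`; the remaining gap reflects the finite `n` and the grid approximation of the infimum.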

On the whole, this is an interesting and promising new approach to proving large deviation results. The book is rather technical (a total of five appendices covering almost 100 pages bear witness to this), but it is written and presented in a very careful manner, and comments between the results help the reader to understand the key ideas. It will be interesting to see to what extent this methodology makes it possible to extend the scope of existing large deviations theory.

Reviewer: M. Schweizer (Berlin)