Markov decision processes with applications to finance.

*(English)* Zbl 1236.90004
Universitext. Berlin: Springer (ISBN 978-3-642-18323-2/pbk; 978-3-642-18324-9/ebook). xvi, 388 p. (2011).

This book consists of four parts. In Part I, entitled “Finite horizon optimization problems and financial markets”, the authors introduce a Markov decision process (MDP) in discrete time on a general state space with possibly unbounded reward functions \(r_n\). These functions may depend on the stage \(n\). This setting also covers the discounted case, that is, \(r_n= \beta^n\cdot r\), where \(\beta\in (0,1)\) is a discount factor.

For the expected total reward criterion they show the existence of an optimal Markov policy, present a backward induction algorithm and prove the principle of dynamic programming. Chapter 3 is devoted to an introduction to financial markets and describes such prominent models as the binomial model and the Black-Scholes-Merton model. The last chapter of this part applies the theory of Markov decision processes to selected financial optimization problems. In particular, the authors consider various utility functions for certain specific models and derive optimal policies and value functions. An interesting alternative to expected utility maximization is to measure risk either by the variance or by the average-value-at-risk criterion. Such a procedure is desirable because it takes into account not only the expected return but also the risk borne by the decision maker in a multistage decision problem, and it yields a trade-off between these two performance criteria.
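As a rough illustration of the backward induction idea, the following sketch treats a tiny finite MDP. The finite state and action sets, the function name `backward_induction`, and all numbers are assumptions made for this example; the book itself works on general state spaces.

```python
def backward_induction(P, reward, terminal, horizon):
    """Finite-horizon backward induction on a finite MDP (illustrative sketch).

    P[a][s][t]      : probability of moving from state s to t under action a
    reward(n, s, a) : stage-n reward r_n(s, a)
    terminal[s]     : terminal reward collected after the last stage
    Returns the stage-0 value function and one decision rule per stage.
    """
    n_states, n_actions = len(terminal), len(P)
    V = list(terminal)
    policy = []
    for n in reversed(range(horizon)):
        new_V, rule = [], []
        for s in range(n_states):
            # Stage reward plus expected value of the next stage, per action.
            q = [reward(n, s, a) + sum(P[a][s][t] * V[t] for t in range(n_states))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=q.__getitem__)
            rule.append(best)
            new_V.append(q[best])
        V = new_V
        policy.insert(0, rule)  # rules end up ordered by stage 0, 1, ...
    return V, policy


# Hypothetical two-state, two-action example: action a moves to state a.
P = [
    [[1.0, 0.0], [1.0, 0.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]
r_base = [[1.0, 0.0], [0.0, 2.0]]  # r_base[s][a]
beta = 0.5

# Discounted case from the text: r_n = beta^n * r.
V0, policy = backward_induction(
    P, lambda n, s, a: beta ** n * r_base[s][a], terminal=[0.0, 0.0], horizon=2
)
print(V0, policy)
```

The resulting policy is Markov in the sense used above: each decision rule depends only on the current stage and state.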

Part II is concerned with partially observable Markov decision processes (POMDPs). Two classes of these processes are of special interest: hidden Markov models and adaptive Markov processes. The next chapter applies the theory of POMDPs to two problems from finance: the terminal wealth problem and the dynamic mean-variance problem.
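A common way to handle partial observation is to replace the unobserved state by its conditional distribution given the observations, updated by a filter. The sketch below shows one such update step for a hidden Markov model with finitely many states and observations. The function name `update_belief` and the matrices `P` and `Q` are hypothetical, and the finite setting is an assumption of this example rather than the book's framework.

```python
def update_belief(belief, P, Q, y):
    """One filter step for a finite hidden Markov model (illustrative sketch).

    belief[x] : current probability that the hidden state is x
    P[x][x2]  : transition probability of the hidden state from x to x2
    Q[x2][y]  : probability of observing y when the new hidden state is x2
    Returns the posterior distribution of the new hidden state given y.
    """
    n = len(belief)
    # Predict the next hidden state, then weight by the observation likelihood.
    unnormalized = [
        Q[x2][y] * sum(belief[x] * P[x][x2] for x in range(n))
        for x2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has probability zero under the model")
    return [u / total for u in unnormalized]


# Hypothetical numbers: two hidden regimes, two possible observations.
P = [[0.9, 0.1], [0.2, 0.8]]
Q = [[0.8, 0.2], [0.3, 0.7]]
posterior = update_belief([0.5, 0.5], P, Q, y=1)
print(posterior)
```

With the belief as the new state variable, the partially observable problem can be treated as a completely observable MDP.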

Part III deals with infinite time horizon optimization models. The authors start with a description of three classes of models: positive, negative and discounted. In each case, they provide conditions under which the Bellman equation has a solution and the decision maker possesses an optimal policy. Further, the authors focus on piecewise deterministic Markov decision processes, which evolve through random jumps at random time points, whilst the behaviour between jumps is governed by an ordinary differential equation. The control problem can be tackled either via the Hamilton-Jacobi-Bellman equation or by the methods used for discrete-time MDPs. Indeed, the latter approach is applicable, since the evolution between jumps is deterministic and such a model can be reduced to a discrete-time MDP.
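For the discounted case, the Bellman equation \(V(s) = \max_a \{ r(s,a) + \beta \sum_{t} P(t \mid s,a) V(t) \}\) can be solved numerically by successive approximation. The following is a minimal value iteration sketch on a finite model; the finite state and action sets, the function name `value_iteration`, and the data are assumptions chosen for illustration, with bounded rewards so that the iteration is a contraction.

```python
def value_iteration(P, r, beta, tol=1e-10, max_iter=10_000):
    """Value iteration for a finite discounted MDP (illustrative sketch).

    P[a][s][t] : transition probability from s to t under action a
    r[s][a]    : one-stage reward
    beta       : discount factor in (0, 1)
    """
    n_states, n_actions = len(r), len(P)
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # Apply the Bellman operator once to the current guess V.
        Q = [[r[s][a] + beta * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        new_V = [max(row) for row in Q]
        gap = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if gap < tol:
            break
    # Stationary policy that is greedy for the last Q-table computed.
    policy = [max(range(n_actions), key=Q[s].__getitem__) for s in range(n_states)]
    return V, policy


# Hypothetical two-state, two-action example: action a moves to state a.
P = [
    [[1.0, 0.0], [1.0, 0.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]
r = [[1.5, 0.0], [0.0, 2.0]]  # r[s][a]
V, policy = value_iteration(P, r, beta=0.5)
print(V, policy)
```

In this example the fixed point can be checked by hand: staying in state 1 gives \(V(1) = 2 + 0.5\,V(1)\), so \(V(1) = 4\), and staying in state 0 gives \(V(0) = 1.5 + 0.5\,V(0)\), so \(V(0) = 3\).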

This chapter is followed by the application of infinite time horizon models to finance and insurance optimization problems.

Part IV is devoted to the theory of optimal stopping problems in finite and infinite time horizons and its use in financial models.
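A standard finite-horizon stopping problem from finance is the pricing of an American put in the binomial model, where at each stage the holder compares the immediate payoff with the discounted expected continuation value. The sketch below is one possible illustration; the function name `american_put_binomial` and all parameter values are assumptions of this example, not taken from the book.

```python
def american_put_binomial(s0, strike, up, down, rate, steps):
    """American put in a binomial model via backward induction (sketch).

    s0, strike : initial stock price and strike
    up, down   : multiplicative up and down factors per period
    rate       : one-period interest rate
    steps      : number of periods until maturity
    """
    if not down < 1.0 + rate < up:
        raise ValueError("need down < 1 + rate < up for an arbitrage-free model")
    q = (1.0 + rate - down) / (up - down)  # risk-neutral up probability
    disc = 1.0 / (1.0 + rate)

    def payoff(n, j):
        # Exercise value at stage n after j up-moves and n - j down-moves.
        return max(strike - s0 * up ** j * down ** (n - j), 0.0)

    values = [payoff(steps, j) for j in range(steps + 1)]
    for n in reversed(range(steps)):
        # Stop (exercise) or continue, whichever is worth more.
        values = [
            max(payoff(n, j), disc * (q * values[j + 1] + (1.0 - q) * values[j]))
            for j in range(n + 1)
        ]
    return values[0]


# Hypothetical parameters.
at_the_money = american_put_binomial(100.0, 100.0, 1.1, 0.9, 0.0, 1)
deep_in_money = american_put_binomial(50.0, 100.0, 1.1, 0.9, 0.05, 1)
print(at_the_money, deep_in_money)
```

In the second call the put is so deep in the money that stopping immediately is optimal, so the value equals the exercise payoff.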

The book ends with an appendix that contains some facts from analysis, probability and mathematical finance.

This book presents Markov decision processes with general state and action spaces and includes various state-of-the-art applications that stem from finance and operations research.

I find it very helpful, not only for graduate students, but also for researchers working in the field of MDPs and finance. The authors do not focus only on discrete-time MDPs, but also describe different classes of Markov models, such as continuous-time MDPs, piecewise deterministic MDPs, partially observable MDPs and optimal stopping problems. Each chapter ends with remarks, where the reader may find further hints concerning references.

Reviewer: Anna Jaskiewicz (Wrocław)

##### MSC:

90-02 | Research exposition (monographs, survey articles) pertaining to operations research and mathematical programming |

91-02 | Research exposition (monographs, survey articles) pertaining to game theory, economics, and finance |

90C40 | Markov and semi-Markov decision processes |

91G80 | Financial applications of other theories |

91G70 | Statistical methods; risk measures |