zbMATH — the first resource for mathematics

Markov control process with the expected total cost criterion: Optimality, stability, and transient models. (English) Zbl 0964.93086
The authors study discrete-time Markov Control Processes (MCPs) on Borel spaces under the Expected Total Cost (ETC) criterion \[ V(\pi, x)= E^\pi_x\Biggl[ \sum^\infty_{t=0} c(x_t,a_t)\Biggr], \] where the cost-per-stage function \(c(x_t, a_t)\) is possibly unbounded [for the basic concepts and notation of MCPs, cf. O. Hernández-Lerma and J. B. Lasserre, Discrete-time Markov control processes: Basic optimality criteria, Springer-Verlag, New York (1995; Zbl 0840.93001)]. Several optimality questions are answered affirmatively here. Conditions are given for a control policy to be ETC-optimal and for the ETC value function to be a solution of the dynamic programming equation. It is also shown that finiteness of the ETC function may lead to two kinds of stability: Lagrange stability and stability with probability one. In addition, transient control models [cf. S. R. Pliska, Dynamic programming and its applications, Proc. Int. Conf., Vancouver 1977, 335-349 (1978; Zbl 0458.90082)] are fully analyzed. Together with the authors' new results, the paper thus provides a fairly complete, updated, survey-like presentation of the ETC criterion for MCPs.
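For orientation, the dynamic programming equation referred to above takes, in the standard form for ETC models on Borel spaces (notation as in Hernández-Lerma and Lasserre, with admissible action sets \(A(x)\) and transition kernel \(Q\); this is the textbook form, not a formula reproduced from the paper under review): \[ V^*(x)= \min_{a\in A(x)}\Biggl[ c(x,a)+ \int_X V^*(y)\, Q(dy\mid x,a)\Biggr], \] and the conditions in the paper concern when the ETC value function satisfies this equation and when a minimizing policy is ETC-optimal.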

93E20 Optimal stochastic control
90C40 Markov and semi-Markov decision processes
Full Text: DOI