zbMATH — the first resource for mathematics

Time-varying Markov decision processes with state-action-dependent discount factors and unbounded costs. (English) Zbl 1449.93262
Summary: In this paper we are concerned with a class of time-varying discounted Markov decision models \(\mathcal{M}_n\) with unbounded costs \(c_n\) and state-action-dependent discount factors. Specifically, we study controlled systems whose state process evolves according to the equation \(x_{n+1}=G_n(x_n,a_n,\xi_n)\), \(n=0,1,\ldots\), with state-action-dependent discount factors of the form \(\alpha_n(x_n,a_n)\), where \(a_n\) and \(\xi_n\) are the control and the random disturbance at time \(n\), respectively. Assuming that the sequences of functions \(\lbrace\alpha_n\rbrace\), \(\lbrace c_n\rbrace\), and \(\lbrace G_n\rbrace\) converge, in a certain sense, to \(\alpha_\infty\), \(c_\infty\), and \(G_\infty\), our objective is to introduce a suitable control model for this class of systems and then to show the existence of optimal policies for the limit system \(\mathcal{M}_\infty\) corresponding to \(\alpha_\infty\), \(c_\infty\), and \(G_\infty\). Finally, we illustrate our results and their applicability in a class of semi-Markov control models.
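The model described above can be made concrete with a minimal simulation sketch. The dynamics \(G_n\), discount factors \(\alpha_n\), costs \(c_n\), and the feedback policy below are hypothetical illustrative choices, not taken from the paper; the sketch only shows how a trajectory accumulates the discounted cost \(\sum_n \big(\prod_{k<n}\alpha_k(x_k,a_k)\big)\, c_n(x_n,a_n)\) under the state equation \(x_{n+1}=G_n(x_n,a_n,\xi_n)\).

```python
import numpy as np

rng = np.random.default_rng(0)

def G(n, x, a, xi):
    # Hypothetical time-varying dynamics; the n-dependent term vanishes
    # as n -> infinity, mimicking convergence G_n -> G_inf.
    return (0.5 + 0.1 / (n + 1)) * x + a + xi

def alpha(n, x, a):
    # Hypothetical state-action-dependent discount factor in (0, 1).
    return 0.9 / (1.0 + 0.01 * (x**2 + a**2))

def c(n, x, a):
    # Quadratic stage cost (unbounded in the state and action).
    return x**2 + a**2

def policy(x):
    # Simple stationary linear feedback, purely for illustration.
    return -0.4 * x

def discounted_cost(x0, horizon=200):
    """Accumulate sum_n (prod_{k<n} alpha_k(x_k, a_k)) * c_n(x_n, a_n)
    along one simulated trajectory of x_{n+1} = G_n(x_n, a_n, xi_n)."""
    x, total, disc = x0, 0.0, 1.0
    for n in range(horizon):
        a = policy(x)
        total += disc * c(n, x, a)
        disc *= alpha(n, x, a)       # multiplicative, path-dependent discounting
        xi = rng.normal(scale=0.1)   # random disturbance xi_n
        x = G(n, x, a, xi)
    return total

print(discounted_cost(x0=1.0))
```

Note that, unlike the constant-discount case, the effective discount after \(n\) steps is the running product \(\prod_{k<n}\alpha_k(x_k,a_k)\), which depends on the realized trajectory.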
MSC:
93E20 Optimal stochastic control
90C40 Markov and semi-Markov decision processes
References:
[1] Bastin, G.; Dochain, D., On-line Estimation and Adaptive Control of Bioreactors., Elsevier, Amsterdam 2014
[2] Bertsekas, D. P., Approximate policy iteration: a survey and some new methods., J. Control Theory Appl. 9 (2011), 310-335
[3] Dynkin, E. B.; Yushkevich, A. A., Controlled Markov Processes., Springer-Verlag, New York 1979
[4] González-Hernández, J.; López-Martínez, R. R.; Minjárez-Sosa, J. A., Approximation, estimation and control of stochastic systems under a randomized discounted cost criterion., Kybernetika 45 (2009), 737-754
[5] Gordienko, E. I.; Minjárez-Sosa, J. A., Adaptive control for discrete-time Markov processes with unbounded costs: discounted criterion., Kybernetika 34 (1998), 217-234
[6] Hernández-Lerma, O.; Lasserre, J. B., Discrete-Time Markov Control Processes: Basic Optimality Criteria., Springer, New York 1996
[7] Hernández-Lerma, O.; Lasserre, J. B., Further Topics on Discrete-time Markov Control Processes., Springer-Verlag, New York 1999
[8] Hernández-Lerma, O.; Hilgert, N., Limiting optimal discounted-cost control of a class of time-varying stochastic systems., Syst. Control Lett. 40 (2000), 1, 37-42
[9] Hilgert, N.; Minjárez-Sosa, J. A., Adaptive policies for time-varying stochastic systems under discounted criterion., Math. Meth. Oper. Res. 54 (2001), 3, 491-505
[10] Hilgert, N.; Minjárez-Sosa, J. A., Adaptive control of stochastic systems with unknown disturbance distribution: discounted criteria., Math. Meth. Oper. Res. 63 (2006), 443-460
[11] Hilgert, N.; Senoussi, R.; Vila, J. P., Nonparametric estimation of time-varying autoregressive nonlinear processes., C. R. Acad. Sci. Paris Série I 232 (1996), 1085-1090
[12] Lewis, M. E.; Paul, A., Uniform turnpike theorems for finite Markov decision processes., Math. Oper. Res.
[13] Luque-Vásquez, F.; Minjárez-Sosa, J. A., Semi-Markov control processes with unknown holding times distribution under a discounted criterion., Math. Meth. Oper. Res. 61 (2005), 455-468
[14] Luque-Vásquez, F.; Minjárez-Sosa, J. A.; Rosas-Rosas, L. C., Semi-Markov control processes with partially known holding times distribution: Discounted and average criteria., Acta Appl. Math. 114 (2011), 3, 135-156
[15] Luque-Vásquez, F.; Minjárez-Sosa, J. A.; Rosas-Rosas, L. C., Semi-Markov control processes with unknown holding times distribution under an average cost criterion., Appl. Math. Optim. 61 (2010), 3, 317-336
[16] Minjárez-Sosa, J. A., Markov control models with unknown random state-action-dependent discount factors., TOP 23 (2015), 743-772
[17] Minjárez-Sosa, J. A., Approximation and estimation in Markov control processes under discounted criterion., Kybernetika 40 (2004), 6, 681-690
[18] Powell, W. B., Approximate Dynamic Programming: Solving the Curses of Dimensionality., John Wiley and Sons, Inc., 2007
[19] Puterman, M. L., Markov Decision Processes. Discrete Stochastic Dynamic Programming., John Wiley and Sons 1994
[20] Rieder, U., Measurable selection theorems for optimization problems., Manuscripta Math. 24 (1978), 115-131
[21] Robles-Alcaráz, M. T.; Vega-Amaya, O.; Minjárez-Sosa, J. A., Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces., Risk Decision Analysis 6 (2017), 2, 79-95
[22] Royden, H. L., Real Analysis., Prentice Hall 1968
[23] Schäl, M., Conditions for optimality and for the limit of n-stage optimal policies to be optimal., Z. Wahrsch. Verw. Geb. 32 (1975), 179-196
[24] Shapiro, J. F., Turnpike planning horizons for a Markovian decision model., Management Sci. 14 (1968), 292-300
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.