
Continuous-time robust dynamic programming. (English) Zbl 1429.93415

Summary: This paper develops a new theory, termed robust dynamic programming, for a class of continuous-time dynamical systems. Unlike traditional dynamic programming (DP) methods, the new theory serves as a fundamental tool for analyzing the robustness of DP algorithms and, in particular, for developing novel adaptive optimal control and reinforcement learning methods. To demonstrate the potential of this framework, two illustrative applications in stochastic and decentralized optimal control are presented. Two numerical examples arising from the finance and engineering industries are also given, along with several possible extensions of the proposed framework.
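The paper's topic builds on classical continuous-time DP for the linear-quadratic regulator (LQR). As a point of reference, the sketch below implements Kleinman's policy iteration, a standard model-based scheme that alternates policy evaluation (a Lyapunov equation) with policy improvement; it is not the paper's robust algorithm, only an assumed illustration of the classical baseline, with function and variable names chosen here for exposition.

```python
# Minimal sketch (assumption: classical Kleinman policy iteration, not the
# paper's robust scheme) for the continuous-time LQR problem:
# minimize the integral of x'Qx + u'Ru subject to dx/dt = A x + B u.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def kleinman_policy_iteration(A, B, Q, R, K0, iters=10):
    """Alternate policy evaluation and improvement from a stabilizing gain K0."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K  # closed-loop matrix under the current policy
        # Policy evaluation: solve Ak' P + P Ak + Q + K' R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B' P
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Double-integrator example
A = np.array([[0., 1.], [0., 0.]])
B = np.array([[0.], [1.]])
Q = np.eye(2)
R = np.eye(1)
K0 = np.array([[1., 1.]])  # a stabilizing initial gain
P, K = kleinman_policy_iteration(A, B, Q, R, K0)

# The iterates converge to the stabilizing solution of the algebraic
# Riccati equation, computed directly here for comparison.
P_are = solve_continuous_are(A, B, Q, R)
print(np.allclose(P, P_are))
```

The iteration converges quadratically given a stabilizing initial gain, which is why it serves as the backbone of many adaptive and data-driven DP schemes.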

MSC:

93E20 Optimal stochastic control
93C40 Adaptive control/observation systems
93B35 Sensitivity (robustness)
90C39 Dynamic programming
