
Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning. (English) Zbl 1429.93195

Summary: This paper presents a data-based robust adaptive control methodology for a class of nonlinear constrained-input systems with completely unknown dynamics. By introducing a value function for the nominal system, the robust control problem is transformed into a constrained optimal control problem. Since the system dynamics are unavailable, a data-based integral reinforcement learning (RL) algorithm is developed to solve this constrained optimal control problem; the algorithm updates the value function and the control policy simultaneously using only system data. Convergence of the developed algorithm is proved via an established equivalence relationship. To implement the integral RL algorithm, an actor neural network (NN) and a critic NN are used to approximate the control policy and the value function, respectively, and the least-squares method is employed to estimate the unknown NN parameters. By Lyapunov's direct method, the obtained approximate optimal control is shown to keep the unknown nonlinear system stable in the sense of uniform ultimate boundedness. Two examples demonstrate the effectiveness and applicability of the theoretical results.
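For orientation, the following equations sketch a standard constrained-input optimal control formulation from the literature cited below (cf. [1], [33], [37]); the notation (nominal dynamics \(f\), \(g\), state penalty \(Q\), input weight \(R\), saturation bound \(\lambda\), reinforcement interval \(T\)) is assumed here, and the paper's exact cost and notation may differ. For the nominal input-affine system \(\dot{x} = f(x) + g(x)u\) with inputs bounded by \(\lambda\), the value function uses a nonquadratic input penalty,
\[ V(x(t)) = \int_t^{\infty} \big( Q(x(\tau)) + W(u(\tau)) \big)\, d\tau, \qquad W(u) = 2 \int_0^{u} \big(\lambda \tanh^{-1}(v/\lambda)\big)^{T} R\, dv, \]
so that the associated optimal control automatically respects the input constraint,
\[ u^{*}(x) = -\lambda \tanh\!\Big( \tfrac{1}{2\lambda} R^{-1} g^{T}(x) \nabla V^{*}(x) \Big). \]
The integral RL step replaces explicit knowledge of the drift dynamics by the data-based Bellman relation over an interval \(T > 0\),
\[ V(x(t)) = \int_{t}^{t+T} \big( Q(x(\tau)) + W(u(\tau)) \big)\, d\tau + V(x(t+T)), \]
which is solved iteratively with a critic NN approximating \(V\), an actor NN approximating \(u\), and least squares for the unknown weights, in line with the summary above.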

MSC:

93C40 Adaptive control/observation systems
93B35 Sensitivity (robustness)
93C10 Nonlinear systems in control theory
49N90 Applications of optimal control and differential games

References:

[1] Abu-Khalaf, M.; Lewis, F. L., Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica, 41, 779-791 (2005) · Zbl 1087.49022
[2] Adhyaru, D.; Kar, I.; Gopal, M., Bounded robust control of nonlinear systems using neural network-based HJB solution, Neural Comput. Appl., 20, 91-103 (2011)
[3] Basar, T.; Bernhard, P., \(H_∞\) Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach (1995), Birkhauser: Birkhauser Boston · Zbl 0835.93001
[4] Bhasin, S.; Kamalapurkar, R.; Johnson, M.; Vamvoudakis, K. G.; Lewis, F. L.; Dixon, W. E., A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems, Automatica, 49, 82-92 (2013) · Zbl 1257.93055
[5] Bu, X.; Wu, X.; Wei, D.; Huang, J., Neural-approximation-based robust adaptive control of flexible air-breathing hypersonic vehicles with parametric uncertainties and control input constraints, Inf. Sci., 346-347, 29-43 (2016) · Zbl 1398.93261
[6] Chen, M.; Wu, Q.; Jiang, C.; Jiang, B., Guaranteed transient performance based control with input saturation for near space vehicles, Sci. China Inf. Sci., 57, 1-12 (2014) · Zbl 1331.93045
[7] Cui, X.; Zhang, H.; Luo, Y.; Zu, P., Online finite-horizon optimal learning algorithm for nonzero-sum games with partially unknown dynamics and constrained inputs, Neurocomputing, 185, 37-44 (2016)
[8] Esfandiari, K.; Abdollahi, F.; Talebi, H. A., Adaptive control of uncertain nonaffine nonlinear systems with input saturation using neural networks, IEEE Trans. Neural Netw. Learn. Syst., 26, 2311-2322 (2015)
[9] Fang, X.; Zheng, D.; He, H.; Ni, Z., Data-driven heuristic dynamic programming with virtual reality, Neurocomputing, 166, 244-255 (2015)
[10] Heydari, A., Revisiting approximate dynamic programming and its convergence, IEEE Trans. Cybernet., 44, 2733-2743 (2014)
[11] Heydari, A.; Balakrishnan, S., Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics, IEEE Trans. Neural Netw. Learn. Syst., 24, 145-157 (2013)
[12] Hornik, K.; Stinchcombe, M.; White, H., Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Netw., 3, 551-560 (1990)
[13] Huang, Z.; Xu, X.; Zuo, L., Reinforcement learning with automatic basis construction based on isometric feature mapping, Inf. Sci., 286, 209-227 (2014)
[14] Jiang, Y.; Jiang, Z. P., Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics, Automatica, 48, 2699-2704 (2012) · Zbl 1271.93088
[15] Jiang, Y.; Jiang, Z. P., Robust adaptive dynamic programming and feedback stabilization of nonlinear systems, IEEE Trans. Neural Netw. Learn. Syst., 25, 882-893 (2014)
[16] Jiang, Y.; Jiang, Z. P., Global adaptive dynamic programming for continuous-time nonlinear systems, IEEE Trans. Autom. Control, 60, 2917-2929 (2015) · Zbl 1360.49017
[17] Khalil, H. K., Nonlinear Systems (2002), Prentice-Hall: Prentice-Hall New Jersey · Zbl 1003.34002
[18] Kiumarsi, B.; Lewis, F. L., Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems, IEEE Trans. Neural Netw. Learn. Syst., 26, 140-151 (2015)
[19] Kravaris, C.; Palanki, S., A Lyapunov approach for robust nonlinear state feedback synthesis, IEEE Trans. Autom. Control, 33, 1188-1191 (1988) · Zbl 0655.93034
[20] Lee, J.; Park, J. B.; Choi, Y. H., Integral reinforcement learning for continuous-time input-affine nonlinear systems with simultaneous invariant explorations, IEEE Trans. Neural Netw. Learn. Syst., 26, 916-932 (2015)
[21] Lewis, F.; Jagannathan, S.; Yesildirak, A., Neural Network Control of Robot Manipulators and Nonlinear Systems (1999), Taylor & Francis: Taylor & Francis London
[22] Lewis, F. L.; Liu, D., Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (2013), Wiley-IEEE Press: Wiley-IEEE Press New Jersey
[23] Lewis, F. L.; Vrabie, D.; Vamvoudakis, K. G., Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers, IEEE Control Syst., 32, 76-105 (2012) · Zbl 1395.93584
[24] Lin, F., An optimal control approach to robust control design, Int. J. Control, 73, 177-186 (2000) · Zbl 1004.93040
[25] Lin, F., Robust Control Design: An Optimal Control Approach (2007), John Wiley & Sons: John Wiley & Sons England
[26] Lin, F.; Brandt, R. D., An optimal control approach to robust control of robot manipulators, IEEE Trans. Robot. Autom., 14, 69-77 (1998)
[27] Littman, M. L., Reinforcement learning improves behaviour from evaluative feedback, Nature, 521, 445-451 (2015)
[28] Liu, D.; Li, H.; Wang, D., Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm, Neurocomputing, 110, 92-100 (2013)
[29] Liu, D.; Wang, D.; Yang, X., An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs, Inf. Sci., 220, 331-342 (2013) · Zbl 1291.49018
[30] Liu, Y. J.; Tang, L.; Tong, S.; Chen, C.; Li, D. J., Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems, IEEE Trans. Neural Netw. Learn. Syst., 26, 165-176 (2015)
[31] Luo, B.; Wu, H. N.; Huang, T., Off-policy reinforcement learning for \(H_∞\) control design, IEEE Trans. Cybernet., 45, 65-76 (2015)
[32] Luo, B.; Wu, H. N.; Huang, T.; Liu, D., Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design, Automatica, 50, 3281-3290 (2014) · Zbl 1309.93188
[33] Lyshevski, S. E., Optimal control of nonlinear continuous-time systems: Design of bounded controllers via generalized nonquadratic functionals, American Control Conference, Philadelphia, Pennsylvania, USA, 205-209 (1998)
[34] Marino, R.; Tomei, P., Nonlinear Control Design: Geometric, Adaptive and Robust (1995), Prentice-Hall: Prentice-Hall New York · Zbl 0833.93003
[35] Mehraeen, S.; Dierks, T.; Jagannathan, S.; Crow, M. L., Zero-sum two-player game theoretic formulation of affine nonlinear discrete-time systems using neural networks, IEEE Trans. Cybernet., 43, 1641-1655 (2013)
[36] Modares, H.; Lewis, F. L.; Naghibi-Sistani, M., Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks, IEEE Trans. Neural Netw. Learn. Syst., 24, 1513-1525 (2013)
[37] Modares, H.; Lewis, F. L.; Sistani, M. N., Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, 50, 193-202 (2014) · Zbl 1298.49042
[38] Mu, C.; Ni, Z.; Sun, C.; He, H., Data-driven tracking control with adaptive dynamic programming for a class of continuous-time nonlinear systems, IEEE Trans. Cybernet. (2016)
[39] Murray, J. J.; Cox, C. J.; Lendaris, G. G.; Saeks, R., Adaptive dynamic programming, IEEE Trans. Syst. Man Cybernet. Part C, 32, 140-153 (2002)
[40] Na, J.; Herrmann, G., Online adaptive approximate optimal tracking control with simplified dual approximation structure for continuous time unknown nonlinear systems, IEEE/CAA J. Autom. Sinica, 1, 412-422 (2014)
[41] Nascimento, J.; Powell, W. B., An optimal approximate dynamic programming algorithm for concave, scalar storage problems with vector-valued controls, IEEE Trans. Autom. Control, 58, 2995-3010 (2013) · Zbl 1369.49035
[42] Ni, Z.; He, H.; Zhao, D.; Xu, X.; Prokhorov, D. V., GrDHP: a general utility function representation for dual heuristic dynamic programming, IEEE Trans. Neural Netw. Learn. Syst., 26, 614-627 (2015)
[43] Padhi, R.; Unnikrishnan, N.; Wang, X.; Balakrishnan, S., A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems, Neural Netw., 19, 1648-1660 (2006) · Zbl 1120.90065
[44] Petersen, I. R.; Tempo, R., Robust control of uncertain systems: Classical results and recent developments, Automatica, 50, 1315-1335 (2014) · Zbl 1296.93048
[45] Rudin, W., Principles of Mathematical Analysis (1976), McGraw-Hill Publishing Co.: McGraw-Hill Publishing Co. New York · Zbl 0148.02903
[46] Safonov, M. G., Origins of robust control: Early history and future speculations, Ann. Rev. Control, 36, 173-181 (2012)
[47] Song, R.; Lewis, F. L.; Wei, Q.; Zhang, H., Off-policy actor-critic structure for optimal control of unknown systems with disturbances, IEEE Trans. Cybernet., 46, 1041-1050 (2016)
[48] Song, R.; Lewis, F. L.; Wei, Q.; Zhang, H.; Jiang, Z. P.; Levine, D., Multiple actor-critic structures for continuous-time optimal control using input-output data, IEEE Trans. Neural Netw. Learn. Syst., 26, 851-865 (2015)
[49] Song, R.; Xiao, W.; Sun, C., A new self-learning optimal control laws for a class of discrete-time nonlinear systems based on ESN architecture, Sci. China Inf. Sci., 57, 1-10 (2014) · Zbl 1337.93107
[50] Sutton, R. S.; Barto, A. G., Reinforcement Learning: An Introduction (1998), MIT Press: MIT Press Cambridge, MA
[51] Vamvoudakis, K. G.; Lewis, F. L., Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica, 46, 878-888 (2010) · Zbl 1191.49038
[52] Vamvoudakis, K. G.; Miranda, M. F.; Hespanha, J. P., Asymptotically stable adaptive optimal control algorithm with saturating actuators and relaxed persistence of excitation, IEEE Trans. Neural Netw. Learn. Syst. (2015)
[53] Wang, D.; Li, C.; Liu, D.; Mu, C., Data-based robust optimal control of continuous-time affine nonlinear systems with matched uncertainties, Inf. Sci., 366, 121-133 (2016) · Zbl 1430.49030
[54] Wang, D.; Liu, D.; Li, H.; Ma, H., Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming, Inf. Sci., 282, 167-179 (2014) · Zbl 1354.93045
[55] Wang, D.; Liu, D.; Zhang, Q.; Zhao, D., Data-based adaptive critic designs for nonlinear robust optimal control with uncertain dynamics, IEEE Trans. Syst. Man Cybernet. (2015)
[56] Wei, Q.; Liu, D., Numerical adaptive learning control scheme for discrete-time nonlinear systems, IET Control Theory Appl., 7, 1472-1486 (2013)
[57] Wei, Q.; Liu, D., Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification, IEEE Trans. Autom. Sci. Eng., 11, 1020-1036 (2014)
[58] Wei, Q.; Liu, D., A novel iterative \(θ\)-adaptive dynamic programming for discrete-time nonlinear systems, IEEE Trans. Autom. Sci. Eng., 11, 1176-1190 (2014)
[59] Wei, Q.; Liu, D., Data-driven neuro-optimal temperature control of water-gas shift reaction using stable iterative adaptive dynamic programming, IEEE Trans. Indus. Electron., 61, 6399-6408 (2014)
[60] Wei, Q.; Liu, D.; Lewis, F., Optimal distributed synchronization control for continuous-time heterogeneous multi-agent differential graphical games, Inf. Sci., 317, 96-113 (2015) · Zbl 1386.93023
[61] Wei, Q.; Wang, F.; Liu, D.; Yang, X., Finite-approximation-error-based discrete-time iterative adaptive dynamic programming, IEEE Trans. Cybernet., 44, 2820-2833 (2014)
[62] Werbos, P. J., Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences (1974), Harvard University, USA, Ph.D. thesis
[63] Yang, Q.; Jagannathan, S., Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators, IEEE Trans. Syst. Man Cybernet. Part B, 42, 377-390 (2012)
[64] Yang, X.; Liu, D.; Ma, H.; Xu, Y., Online approximate solution of HJI equation for unknown constrained-input nonlinear continuous-time systems, Inf. Sci., 328, 435-454 (2016) · Zbl 1391.49053
[65] Yang, X.; Liu, D.; Wang, D., Reinforcement learning for adaptive optimal control of unknown continuous-time nonlinear systems with input constraints, Int. J. Control, 87, 553-566 (2014) · Zbl 1317.93158
[66] Yang, X.; Liu, D.; Wang, D.; Wei, Q., Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning, Neural Netw., 55, 30-41 (2014) · Zbl 1308.93116
[67] Yang, X.; Liu, D.; Wei, Q., Online approximate optimal control for affine nonlinear systems with unknown internal dynamics using adaptive dynamic programming, IET Control Theory Appl., 8, 1676-1688 (2014)
[68] Yang, X.; Liu, D.; Wei, Q.; Wang, D., Guaranteed cost neural tracking control for a class of uncertain nonlinear systems using adaptive dynamic programming, Neurocomputing, 198, 80-90 (2016)
[69] Zhang, H.; Cui, L.; Zhang, X.; Luo, Y., Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Trans. Neural Netw., 22, 2226-2236 (2011)
[70] Zhang, H.; Liu, D.; Luo, Y.; Wang, D., Adaptive Dynamic Programming for Control: Algorithms and Stability (2013), Springer: Springer London · Zbl 1279.49017
[71] Zhang, H.; Qin, C.; Luo, Y., Neural-network-based constrained optimal control scheme for discrete-time switched nonlinear system using dual heuristic programming, IEEE Trans. Autom. Sci. Eng., 11, 839-849 (2014)
[72] Zhang, H.; Wei, Q.; Liu, D., An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games, Automatica, 47, 207-214 (2011) · Zbl 1231.91028
[73] Zhao, D.; Zhang, Q.; Wang, D.; Zhu, Y., Experience replay for optimal control of nonzero-sum game systems with unknown dynamics, IEEE Trans. Cybernet., 46, 854-865 (2016)
[74] Zhao, Q.; Xu, H.; Jagannathan, S., Near optimal output feedback control of nonlinear discrete-time systems based on reinforcement neural network learning, IEEE/CAA J. Autom. Sinica, 1, 372-384 (2014)
[75] Zhong, X.; He, H., An event-triggered ADP control approach for continuous-time system with unknown internal states, IEEE Trans. Cybernet. (2016)