## Stochastic optimal control of unknown linear networked control system in the presence of random delays and packet losses.
*(English)*
Zbl 1244.93177

Summary: The stochastic optimal control of a linear Networked Control System (NCS) with uncertain system dynamics is derived in the presence of network imperfections such as random delays and packet losses. The proposed method uses an Adaptive Estimator (AE) and ideas from Q-learning to solve the infinite-horizon optimal regulation problem for an unknown NCS with time-varying system matrices. Next, a stochastic suboptimal control scheme, also based on the AE and Q-learning and derived using the certainty equivalence property, is introduced for the regulation of an unknown linear time-invariant NCS. Update laws are derived for tuning the unknown AE parameters online to obtain the Q-function. Lyapunov theory is used to show that all signals are Asymptotically Stable (AS) and that the estimated control signals converge to the optimal or suboptimal control inputs. Simulation results are included to show the effectiveness of the proposed schemes. The result is an optimal control scheme that operates in a forward-in-time manner for unknown linear systems, in contrast with standard Riccati equation-based schemes, which function backward in time.
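The role of the Q-function in the summary can be illustrated with a minimal sketch. It is not the paper's scheme: it assumes a scalar, deterministic, time-invariant system x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2, no delays or packet losses, and it uses a and b directly to build the Q-function kernel, whereas the paper estimates that kernel online with the AE. All function and variable names are hypothetical. The sketch shows that once the quadratic kernel of Q(x, u) is available, the feedback gain is read off from its entries, and that repeating the update forward in time converges to the Riccati solution.

```python
def q_kernel(a, b, q, r, p):
    """Entries of the kernel in Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2,
    given the current value-function estimate V(x) = p*x^2."""
    h_xx = q + a * a * p
    h_xu = a * b * p
    h_uu = r + b * b * p
    return h_xx, h_xu, h_uu


def gain_and_value(h_xx, h_xu, h_uu):
    """Minimize Q(x, u) over u: the minimizer is u = -k*x and the
    minimum equals p*x^2."""
    k = h_xu / h_uu
    p = h_xx - h_xu * h_xu / h_uu
    return k, p


def forward_q_iteration(a, b, q, r, steps=200):
    """Iterate the kernel forward from p = 0 (value-iteration style),
    instead of solving a Riccati equation backward from a final time."""
    p = 0.0
    k = 0.0
    for _ in range(steps):
        k, p = gain_and_value(*q_kernel(a, b, q, r, p))
    return k, p


if __name__ == "__main__":
    # Assumed toy values: a = b = q = r = 1.
    k, p = forward_q_iteration(1.0, 1.0, 1.0, 1.0)
    print(k, p)  # approximately 0.618 and 1.618
```

For a = b = q = r = 1 the fixed point satisfies p^2 = p + 1, so p is the golden ratio and the closed-loop pole a - b*k is about 0.382, inside the unit circle.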

### MSC:

| Code | Classification |
|---|---|
| 93E20 | Optimal stochastic control |
| 93C05 | Linear systems in control theory |
| 93E10 | Estimation and detection in stochastic control theory |
| 93D20 | Asymptotic stability in control theory |

### Software:

Approxrl
