
Constraint learning for control tasks with limited duration barrier functions. (English) Zbl 1461.93144

Summary: When deploying autonomous agents in unstructured environments over sustained periods of time, adaptability and robustness often outweigh optimality as the primary consideration; safety and survivability constraints play a key role. In this paper, we present a novel constraint-learning framework for control tasks built on the idea of constraint-driven control. Since control policies that keep a dynamical agent within state constraints over infinite horizons are not always available, this work instead considers constraints that can be satisfied over some finite time horizon \(T > 0\), which we refer to as limited-duration safety. Consequently, value function learning can be used as a tool for finding limited-duration safe policies. We show that, in some applications, the existence of limited-duration safe policies is sufficient for long-duration autonomy. This idea is illustrated on a swarm of simulated robots that are tasked with covering a given area but must sporadically abandon this task to charge their batteries; the battery-charging behavior emerges naturally from the constraints. Additionally, using a cart-pole simulation environment, we show how a control policy can be efficiently transferred from the source task, balancing the pole, to the target task, moving the cart in one direction without letting the pole fall.
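The notion of limited-duration safety in the summary can be made concrete with a small sketch: a policy is limited-duration safe if the closed-loop trajectory stays in the safe set \(\mathcal{C}\) for the whole horizon \([0, T]\), without requiring forward invariance for all time. The check below forward-simulates a system under a feedback policy; the function names, dynamics, and sets are illustrative assumptions, not taken from the paper.

```python
# Hedged illustration (not the paper's algorithm): testing limited-duration
# safety of a given policy by forward-simulating the closed loop on [0, T].
import numpy as np

def limited_duration_safe(f, policy, safe, x0, T, dt=0.01):
    """Return True if the closed-loop trajectory from x0 stays in the
    safe set C over the horizon [0, T] (forward-Euler simulation).

    f(x, u)   -- dynamics, returns dx/dt
    policy(x) -- feedback control law
    safe(x)   -- membership test for the safe set C
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(int(T / dt)):
        if not safe(x):
            return False
        x = x + dt * np.asarray(f(x, policy(x)), dtype=float)
    return bool(safe(x))

# Toy scalar system dx/dt = x + u with safe set |x| <= 1.
f = lambda x, u: x + u
stabilizing = lambda x: -2.0 * x          # keeps the state inside C
safe_set = lambda x: abs(x[0]) <= 1.0
print(limited_duration_safe(f, stabilizing, safe_set, x0=[0.5], T=5.0))  # True
```

An open-loop policy `lambda x: 0.0 * x` would let the unstable dynamics escape the safe set before \(T = 5\), so the same check returns `False`, which is the distinction the summary draws between limited-duration and infinite-horizon safety.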

MSC:

93B47 Iterative learning control
93A16 Multi-agent systems

Software:

MuJoCo; PyTorch

References:

[1] Ames, A. D.; Xu, X.; Grizzle, J. W.; Tabuada, P., Control barrier function based quadratic programs for safety critical systems, IEEE Transactions on Automatic Control, 62, 8, 3861-3876 (2017) · Zbl 1373.90092
[2] Barto, A. G.; Sutton, R. S.; Anderson, C. W., Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 5, 834-846 (1983)
[3] Cortés, J.; Egerstedt, M., Coordinated control of multi-robot systems: A survey, SICE Journal of Control, Measurement, and System Integration, 10, 6, 495-503 (2017)
[4] Cortés, J.; Martínez, S.; Karatas, T.; Bullo, F., Coverage control for mobile sensing networks, IEEE Transactions on Robotics and Automation, 20, 2, 243-255 (2004)
[5] Egerstedt, M.; Pauli, J. N.; Notomista, G.; Hutchinson, S., Robot ecology: Constraint-based control design for long duration autonomy, Elsevier Annual Reviews in Control, 46, 1-7 (2018)
[6] Freeman, R. A.; Kokotovic, P. V., Inverse optimality in robust stabilization, SIAM Journal on Control and Optimization, 34, 4, 1365-1391 (1996) · Zbl 0863.93075
[7] Glotfelter, P.; Cortés, J.; Egerstedt, M., Nonsmooth barrier functions with applications to multi-robot systems, IEEE Control Systems Letters, 1, 2, 310-315 (2017)
[8] Kervadec, H.; Dolz, J.; Yuan, J.; Desrosiers, C.; Granger, E.; Ayed, I. B., Log-barrier constrained CNNs (2019), arXiv preprint arXiv:1904.04205
[9] Khalil, H. K., Nonlinear systems (3rd ed.) (2002), Prentice-Hall · Zbl 1003.34002
[10] Khansari-Zadeh, S. M.; Billard, A., Learning control Lyapunov function to ensure stability of dynamical system-based robot reaching motions, Robotics and Autonomous Systems, 62, 6, 752-765 (2014)
[11] Lakshmikantham, V.; Leela, S., Differential and integral inequalities: Theory and applications. Volume I: Ordinary differential equations (1969), Academic Press · Zbl 0177.12403
[12] Lewis, F. L.; Vrabie, D., Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine, 9, 3, 32-50 (2009)
[13] Liberzon, D., Calculus of variations and optimal control theory: A concise introduction (2011), Princeton University Press
[14] Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T., Continuous control with deep reinforcement learning (2015), arXiv preprint arXiv:1509.02971
[15] Lozano-Pérez, T., & Kaelbling, L. P. (2014). A constraint-based method for solving sequential manipulation planning problems. In Proc. IROS (pp. 3684-3691).
[16] Morris, B., Powell, M. J., & Ames, A. D. (2013). Sufficient conditions for the Lipschitz continuity of QP-based multi-objective control of humanoid robots. In Proc. CDC (pp. 2920-2926).
[17] Notomista, G.; Ruf, S. F.; Egerstedt, M., Persistification of robotic tasks using control barrier functions, IEEE Robotics and Automation Letters, 3, 2, 758-763 (2018)
[18] Ohnishi, M.; Wang, L.; Notomista, G.; Egerstedt, M., Barrier-certified adaptive reinforcement learning with applications to brushbot navigation, IEEE Transactions on Robotics, 35, 5, 1186-1205 (2019)
[19] Ohnishi, M., Yukawa, M., Johansson, M., & Sugiyama, M. (2018). Continuous-time value function approximation in reproducing kernel Hilbert spaces. In Proc. NeurIPS (pp. 2813-2824).
[20] Pan, S. J.; Yang, Q., A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, 22, 10, 1345-1359 (2010)
[21] Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z., Automatic differentiation in PyTorch (2017)
[22] Pickem, D., Glotfelter, P., Wang, L., Mote, M., Ames, A., & Feron, E., et al. (2017). The Robotarium: A remotely accessible swarm robotics research testbed. In Proc. ICRA (pp. 1699-1706).
[23] Ratschan, S., Converse theorems for safety and barrier certificates, IEEE Transactions on Automatic Control, 63, 8, 2628-2632 (2018) · Zbl 1423.93107
[24] Rimon, E.; Koditschek, D. E., Exact robot navigation using artificial potential functions, IEEE Transactions on Robotics and Automation, 8, 5, 501-518 (1992)
[25] Skinner, B. F., Science and human behavior (1953), Simon and Schuster
[26] Sontag, E. D., A universal construction of Artstein's theorem on nonlinear stabilization, Systems & Control Letters, 13, 2, 117-123 (1989) · Zbl 0684.93063
[27] Sutton, R. S.; Barto, A. G., Reinforcement learning: An introduction (1998), MIT Press
[28] Sutton, R. S., McAllester, D. A., Singh, S. P., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Proc. NeurIPS (pp. 1057-1063).
[29] Thomas, P. S., Dabney, W. C., Giguere, S., & Mahadevan, S. (2013). Projected natural actor-critic. In Proc. NeurIPS (pp. 2337-2345).
[30] Tassa, Y.; Doron, Y.; Muldal, A.; Erez, T.; Li, Y.; Casas, D. de L., DeepMind Control Suite (2018), arXiv preprint arXiv:1801.00690
[31] Thrun, S.; Mitchell, T. M., Lifelong robot learning, (The biology and technology of intelligent autonomous agents (1995), Springer), 165-196
[32] Todorov, E., Erez, T., & Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In IEEE/RSJ international conference on intelligent robots and systems (pp. 5026-5033).
[33] Wang, L., Han, D., & Egerstedt, M. (2018). Permissive barrier certificates for safe stabilization using sum-of-squares. In Proc. ACC (pp. 585-590).
[34] Wieland, P.; Allgöwer, F., Constructive safety using control barrier functions, IFAC Proceedings Volumes, 40, 12, 462-467 (2007)
[35] Wisniewski, R.; Sloth, C., Converse barrier certificate theorems, IEEE Transactions on Automatic Control, 61, 5, 1356-1361 (2016) · Zbl 1359.93130
[36] Xu, X.; Tabuada, P.; Grizzle, J. W.; Ames, A. D., Robustness of control barrier functions for safety critical control, IFAC-PapersOnLine, 48, 27, 54-61 (2015)