EAQR: a multiagent Q-learning algorithm for coordination of multiple agents. (English) Zbl 1407.68421

Summary: We propose a cooperative multiagent Q-learning algorithm called exploring actions according to Q-value ratios (EAQR). Our aim is to design a multiagent reinforcement learning algorithm for cooperative tasks in which multiple agents must coordinate their behavior to achieve the best system performance. In EAQR, the Q-value represents the probability of obtaining the maximal reward, while each action is selected according to the ratio of its Q-value to the sum of all actions’ Q-values and the exploration rate \(\varepsilon\). Seven cooperative repeated games are used as cases to study the dynamics of EAQR. Theoretical analyses show that in some cases the optimal joint strategies correspond to the stable critical points of EAQR. Moreover, comparison experiments on stochastic games with finite steps are conducted: one is the box-pushing problem, and the other is the distributed sensor network problem. Experimental results show that EAQR outperforms the other algorithms in the box-pushing problem and achieves the theoretical optimal performance in the distributed sensor network problem.
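The action-selection rule described above can be sketched in a few lines. This is a minimal, hypothetical reading of the summary, not the paper's exact procedure: with probability \(\varepsilon\) the agent explores uniformly, and otherwise it samples each action with probability proportional to its Q-value's share of the total (Q-values are assumed non-negative, since in EAQR a Q-value is interpreted as a probability of obtaining the maximal reward).

```python
import random

def eaqr_select_action(q_values, epsilon=0.1):
    """Sketch of EAQR-style action selection (assumed form).

    With probability epsilon, explore uniformly at random;
    otherwise pick action a with probability Q(a) / sum(Q).
    Q-values are assumed non-negative.
    """
    n = len(q_values)
    total = sum(q_values)
    # Fall back to uniform exploration when exploring or when all
    # Q-values are zero (the ratio is then undefined).
    if random.random() < epsilon or total == 0:
        return random.randrange(n)
    # Sample an action index proportionally to its Q-value ratio.
    r = random.random() * total
    cum = 0.0
    for i, q in enumerate(q_values):
        cum += q
        if r < cum:
            return i
    return n - 1  # guard against floating-point rounding
```

For example, with \(\varepsilon = 0\) and Q-values \((0, 0, 1)\), the third action is selected with certainty, since it holds the entire Q-value mass.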


68T05 Learning and adaptive systems in artificial intelligence
68T42 Agent technology and artificial intelligence
