
Multi-player bandits: the adversarial case. (English) Zbl 1499.68296

Summary: We consider a setting where multiple players sequentially choose among a common set of actions (arms). Motivated by an application to cognitive radio networks, we assume that players incur a loss upon colliding, and that communication between players is not possible. Existing approaches assume that the system is stationary, yet this assumption is often violated in practice, e.g., due to signal strength fluctuations. In this work, we design the first multi-player bandit algorithm that provably works in arbitrarily changing environments, where the losses of the arms may even be chosen by an adversary. This resolves an open problem posed by J. Rosenski et al. [“Multi-player bandits: a musical chairs approach”, in: Proceedings of the 33rd international conference on machine learning, ICML’16. New York, NY: Association for Computing Machinery (ACM). 155–163 (2016)].
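To make the setting concrete, below is a minimal Python sketch (assuming NumPy) of the interaction protocol the summary describes, not the paper's algorithm: N players repeatedly pick among K arms, players that choose the same arm collide and suffer the maximal loss, and each player observes only its own loss, with no communication. The sketch runs one independent EXP3 learner per player, which is deliberately naive; all names and parameter values are illustrative assumptions.

```python
# A minimal sketch of the protocol from the summary, NOT the paper's method.
# N players each run an independent EXP3 learner over K arms; players that
# pick the same arm collide and incur the maximal loss of 1. Parameter
# values and variable names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
K, N, T = 5, 2, 10_000                  # arms, players, rounds (illustrative)
eta = np.sqrt(2 * np.log(K) / (K * T))  # standard EXP3 learning rate

weights = np.ones((N, K))               # one weight vector per player

for t in range(T):
    losses = rng.random(K)  # stand-in for adversarially chosen losses in [0, 1]

    probs = weights / weights.sum(axis=1, keepdims=True)
    picks = np.array([rng.choice(K, p=probs[i]) for i in range(N)])

    # Collision model from the summary: every player on a contested arm
    # suffers the maximal loss.
    arms, counts = np.unique(picks, return_counts=True)
    collided = set(arms[counts > 1])

    for i in range(N):
        a = picks[i]
        loss = 1.0 if a in collided else losses[a]
        # Importance-weighted EXP3 update using only this player's feedback.
        weights[i, a] *= np.exp(-eta * loss / probs[i, a])
```

Independent learners of this kind keep colliding, since nothing steers the players toward distinct arms; that is precisely the difficulty the paper addresses, designing a communication-free algorithm with provable guarantees even when the losses are chosen by an adversary.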

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62C25 Compound decision problems in statistical decision theory
62L10 Sequential statistical analysis
68W27 Online algorithms; streaming algorithms
90B18 Communication networks in operations research
Full Text: arXiv Link

References:

[1] Animashree Anandkumar, Nithin Michael, Ao Kevin Tang, and Ananthram Swami. Distributed algorithms for learning and cognitive medium access with logarithmic regret. IEEE Journal on Selected Areas in Communications, 29(4):731-745, 2011.
[2] Jean-Yves Audibert, Sébastien Bubeck, and Gábor Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31-45, 2013. · Zbl 1341.68309
[3] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002. · Zbl 1029.68087
[4] Orly Avner and Shie Mannor. Concurrent bandits and cognitive radio networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 66-81. Springer, 2014.
[5] Orly Avner and Shie Mannor. Multi-user lax communications: a multi-armed bandit approach. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, pages 1-9. IEEE, 2016.
[6] Orly Avner and Shie Mannor. Multi-user communication networks: A coordinated multi-armed bandit approach. arXiv preprint arXiv:1808.04875, 2018.
[7] Baruch Awerbuch and Robert Kleinberg. Competitive collaborative learning. Journal of Computer and System Sciences, 74(8):1271-1288, 2008. · Zbl 1160.68488
[8] Ilai Bistritz and Amir Leshem. Distributed multi-player bandits - a game of thrones approach. In Advances in Neural Information Processing Systems, pages 7222-7232, 2018.
[9] Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1-122, 2012. · Zbl 1281.91051
[10] Sébastien Bubeck, Yuanzhi Li, Yuval Peres, and Mark Sellke. Non-stochastic multi-player multi-armed bandits: Optimal rate with collision information, sublinear without. arXiv preprint arXiv:1904.12233, 2019.
[11] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006. · Zbl 1114.91001
[12] Nicolò Cesa-Bianchi, Claudio Gentile, Yishay Mansour, and Alberto Minora. Delay and cooperation in nonstochastic bandits. Journal of Machine Learning Research, 49:605-622, 2016.
[13] Richard Combes, Mohammad Sadegh Talebi Mazraeh Shahi, Alexandre Proutiere, et al. Combinatorial bandits revisited. In Advances in Neural Information Processing Systems, pages 2116-2124, 2015.
[14] Ofer Dekel, Ambuj Tewari, and Raman Arora. Online bandit learning against an adaptive adversary: from regret to policy regret. In ICML, 2012.
[15] Alex Kulesza and Ben Taskar. Determinantal Point Processes for Machine Learning. Now Publishers Inc., Hanover, MA, USA, 2012. ISBN 1601986289, 9781601986283.
[16] Lifeng Lai, Hai Jiang, and H. Vincent Poor. Medium access in cognitive radio networks: A competitive multi-armed bandit framework. In 42nd Asilomar Conference on Signals, Systems and Computers, pages 98-102. IEEE, 2008.
[17] Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Preprint, 2018.
[18] Haoyang Liu, Keqin Liu, Qing Zhao, et al. Learning in a changing world: Restless multi-armed bandit with unknown dynamics. IEEE Transactions on Information Theory, 59(3):1902-1916, 2013. · Zbl 1364.91036
[19] Keqin Liu and Qing Zhao. Distributed learning in multi-armed bandit with multiple players. IEEE Transactions on Signal Processing, 58(11):5667-5681, 2010. · Zbl 1391.62006
[20] Brendan McMahan and Ofer Dekel. CSE599s: Online learning, 2014. URL https://courses.cs.washington.edu/courses/cse599s/14sp/scribes.html.
[21] Herbert Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5):527-535, 1952. · Zbl 0049.37009
[22] Jonathan Rosenski, Ohad Shamir, and Liran Szlak. Multi-player bandits: A musical chairs approach. In Proceedings of the 33rd International Conference on Machine Learning, ICML'16, 2016.
[23] Taishi Uchiya, Atsuyoshi Nakamura, and Mineichi Kudo. Algorithms for adversarial bandit problems with multiple plays. In International Conference on Algorithmic Learning Theory, pages 375-389. Springer, 2010. · Zbl 1306.68059