
Robust asynchronous stochastic gradient-push: asymptotically optimal and network-independent performance for strongly convex functions. (English) Zbl 07255089

Summary: We consider the standard model of distributed optimization of a sum of functions \(F(\mathbf{z}) = \sum_{i=1}^n f_i(\mathbf{z})\), where node \(i\) in a network holds the function \(f_i(\mathbf{z})\). We allow for a harsh network model characterized by asynchronous updates, message delays, unpredictable message losses, and directed communication among nodes. In this setting, we analyze a modification of the Gradient-Push method for distributed optimization, assuming that (i) node \(i\) can generate gradients of its function \(f_i(\mathbf{z})\) corrupted by zero-mean, bounded-support additive noise at each step, (ii) \(F(\mathbf{z})\) is strongly convex, and (iii) each \(f_i(\mathbf{z})\) has Lipschitz gradients. We show that the proposed method asymptotically matches the best known bounds for centralized gradient descent that, at each step, moves in the direction of the sum of the noisy gradients of all the functions \(f_1(\mathbf{z}), \ldots, f_n(\mathbf{z})\).
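The setting can be illustrated with a minimal synchronous sketch of stochastic gradient-push over a directed graph: each node mixes push-sum numerators and weights through a column-stochastic matrix while taking noisy local gradient steps, and the de-biased ratios converge to the minimizer of the sum. This is only a sketch of the classical synchronous algorithm; the paper analyzes a robust asynchronous variant with delays and message losses, which this sketch omits. The quadratic objectives \(f_i(z) = (z - a_i)^2/2\), the 3-node digraph, and the \(O(1/t)\) step size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative local objectives f_i(z) = (z - targets[i])^2 / 2,
# so F(z) = sum_i f_i(z) is strongly convex with minimizer targets.mean().
targets = np.array([1.0, 2.0, 6.0])
n = len(targets)

# Column-stochastic mixing matrix for a strongly connected 3-node digraph
# (each column sums to 1; rows need not -- this is the push-sum setting).
P = np.array([
    [1/3, 0.0, 0.5],
    [1/3, 0.5, 0.0],
    [1/3, 0.5, 0.5],
])

x = np.zeros(n)   # push-sum numerators
y = np.ones(n)    # push-sum weights (correct the non-doubly-stochastic mixing)

for t in range(1, 5001):
    z = x / y                                   # de-biased local estimates
    noise = rng.uniform(-0.01, 0.01, size=n)    # zero-mean, bounded-support noise
    grad = (z - targets) + noise                # noisy gradients of f_i at z_i
    step = 1.0 / t                              # diminishing step for strong convexity
    x = P @ (x - step * grad)                   # gradient step, then push
    y = P @ y

estimates = x / y                               # all entries near targets.mean() = 3.0
```

Because \(P\) is column-stochastic, the sum of the numerators is preserved by mixing, so the network-wide average tracks a centralized noisy gradient step on \(F\); the weights \(y\) remove the bias that the non-doubly-stochastic mixing would otherwise introduce.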

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

HOGWILD; ADD-OPT
