WONDER
swMATH ID:  35432 
Software Authors:  Dobriban, Edgar; Sheng, Yue 
Description:  WONDER: weighted one-shot distributed ridge regression in high dimensions. In many areas, practitioners need to analyze large data sets that challenge conventional single-machine computing. To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Here we study a fundamental and highly important problem in this area: How to do ridge regression in a distributed computing environment? Ridge regression is an extremely popular method for supervised learning, and has several optimality properties, thus it is important to study. We study one-shot methods that construct weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high-dimensional random-effects model where each predictor has a small effect, we discover several new phenomena. Infinite-worker limit: The distributed estimator works well for very large numbers of machines, a phenomenon we call “infinite-worker limit”. Optimal weights: The optimal weights for combining local estimators sum to more than unity, due to the downward bias of ridge. Thus, all averaging methods are suboptimal. We also propose a new Weighted ONe-shot DistributEd Ridge regression algorithm (WONDER). We test WONDER in simulation studies and using the Million Song Dataset as an example. There it can save at least 100x in computation time, while nearly preserving test accuracy. 
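The one-shot scheme in the description (fit ridge regression on each machine, then combine the local estimators with weights) can be illustrated with a minimal NumPy sketch. This is not the WONDER algorithm itself: the description does not give the optimal-weight formula, so the sketch substitutes weights fitted by least squares on a held-out set. All function names, the simulated data, and the parameter values (`k`, `lam`, the sample sizes) are assumptions made for illustration.

```python
import numpy as np

def local_ridge(X, y, lam):
    """Ridge estimator on one machine: (X'X + n*lam*I)^{-1} X'y."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

def local_estimators(X, y, k, lam):
    """Split the rows across k hypothetical machines and fit ridge on each."""
    X_parts = np.array_split(X, k)
    y_parts = np.array_split(y, k)
    return np.stack([local_ridge(Xi, yi, lam) for Xi, yi in zip(X_parts, y_parts)])

def fit_weights(betas, X_val, y_val):
    """Illustrative stand-in for optimal weights: least squares on held-out data.

    This is NOT the formula from the WONDER paper.
    """
    preds = X_val @ betas.T  # column i holds the predictions of machine i
    w, *_ = np.linalg.lstsq(preds, y_val, rcond=None)
    return w

def mse(beta_hat, X, y):
    """Mean squared prediction error of a coefficient vector."""
    return float(np.mean((X @ beta_hat - y) ** 2))

# Simulated random-effects data: every predictor has a small effect.
rng = np.random.default_rng(0)
n, n_val, p, k, lam = 2000, 500, 200, 4, 0.5
beta = rng.normal(scale=1.0 / np.sqrt(p), size=p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)
X_val = rng.normal(size=(n_val, p))
y_val = X_val @ beta + rng.normal(size=n_val)

betas = local_estimators(X, y, k, lam)    # shape (k, p)
w_avg = np.full(k, 1.0 / k)               # plain averaging, weights sum to 1
w_fit = fit_weights(betas, X_val, y_val)  # weights are free to sum to anything

beta_avg = w_avg @ betas
beta_fit = w_fit @ betas
print("sum of fitted weights:", w_fit.sum())
print("validation MSE, averaging:", mse(beta_avg, X_val, y_val))
print("validation MSE, fitted weights:", mse(beta_fit, X_val, y_val))
```

Because plain averaging is one particular choice of weights, the fitted weights can never do worse than averaging on the held-out set they were fitted on. Whether their sum exceeds one in a given run depends on the data and on `lam`; the description attributes that effect to the downward bias of ridge.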
Homepage:  https://jmlr.csail.mit.edu/papers/v21/19277.html 
Keywords:  distributed learning; ridge regression; high-dimensional statistics; random matrix theory 
Related Software:  MLbase; GPT-3 
Cited in:  4 Publications 
Standard Articles
1 Publication describing the Software, including 1 Publication in zbMATH

WONDER: weighted one-shot distributed ridge regression in high dimensions. Zbl 1498.68232 Dobriban, Edgar; Sheng, Yue (2020)

Cited by 7 Authors
2  Dobriban, Edgar 
1  Fan, Jianqing 
1  Lin, Licong 
1  Sheng, Yue 
1  Silin, Igor 
1  Sun, Hongwei 
1  Wu, Qiang 
Cited in 2 Serials
3  Journal of Machine Learning Research (JMLR) 
1  The Annals of Statistics 
Cited in 3 Fields
3  Computer science (68-XX) 
2  Statistics (62-XX) 
1  Probability theory and stochastic processes (60-XX) 