PLS for Big Data: a unified parallel algorithm for regularised group PLS. (English) Zbl 1431.62249

This article surveys partial least squares (PLS) methods for two blocks of data. A general framework covering both symmetric and asymmetric methods is built. Group structure is also explored. Variable selection techniques based on penalized singular value decomposition are employed in a new unified algorithm that can perform different PLS methods and their regularized versions. Further extensions to deal with massive data sets are presented. The optimization criteria and the algorithmic computation are detailed, and different approaches to decrease the computational time are explored. The performance of the algorithm and its scalability to large sample sizes are demonstrated on simulated data sets: the first simulation considers an asymmetric model on group-structured data, while the second presents an extension to discriminant analysis.
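The review does not reproduce the algorithm itself. The sketch below is only a rough illustration of two ingredients that sparse and group-sparse PLS methods commonly rely on. It is not the authors' unified algorithm, and it makes several assumptions. First, the cross-product matrix $M = X^\top Y$ is accumulated over row blocks. This is exact because $X^\top Y = \sum_b X_b^\top Y_b$, and it is one way such computations can be split across workers for large sample sizes. Second, the first pair of weight vectors $(u, v)$ is obtained by a penalized rank-one SVD of $M$ that alternates soft-thresholding steps. An optional group soft-thresholding step on $u$ handles group structure. The function names (`cross_product_in_chunks`, `sparse_rank_one`, etc.), the penalties `lam_u`/`lam_v`, the groups, and the simulated data are all hypothetical. Deflation, further components, and the distinction between the symmetric and asymmetric variants are omitted.

```python
import numpy as np

def cross_product_in_chunks(X, Y, chunk_size=1000):
    """Accumulate M = X^T Y over row blocks (hypothetical helper).

    X^T Y equals the sum of X_b^T Y_b over any partition of the rows, so each
    block could be processed independently and the partial sums added.
    """
    M = np.zeros((X.shape[1], Y.shape[1]))
    for start in range(0, X.shape[0], chunk_size):
        stop = start + chunk_size
        M += X[start:stop].T @ Y[start:stop]
    return M

def soft_threshold(z, lam):
    """Elementwise (lasso-type) soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def group_soft_threshold(z, lam, groups):
    """Group soft-thresholding: shrink each group's norm by lam, or zero it."""
    out = np.zeros_like(z)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(z[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * z[idx]
    return out

def sparse_rank_one(M, lam_u=0.0, lam_v=0.0, groups_u=None, n_iter=100, tol=1e-8):
    """Penalized rank-one SVD of M by alternating thresholded power steps.

    Assumed scheme: u <- h(M v) / ||.||, v <- h(M^T u) / ||.||, with h a
    soft-thresholding operator (group-wise on u when groups_u is given).
    """
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]  # start from the leading singular vectors
    for _ in range(n_iter):
        z_u = M @ v
        if groups_u is None:
            u_new = soft_threshold(z_u, lam_u)
        else:
            u_new = group_soft_threshold(z_u, lam_u, groups_u)
        nu = np.linalg.norm(u_new)
        if nu == 0.0:  # penalty removed every X-variable
            return np.zeros_like(u), np.zeros_like(v)
        u_new /= nu
        v_new = soft_threshold(M.T @ u_new, lam_v)
        nv = np.linalg.norm(v_new)
        if nv == 0.0:  # penalty removed every Y-variable
            return np.zeros_like(u), np.zeros_like(v)
        v_new /= nv
        done = np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    return u, v

# Hypothetical simulated data: only the first 4 of 20 X-variables drive Y.
rng = np.random.default_rng(0)
n, p, q = 5000, 20, 3
X = rng.standard_normal((n, p))
B = np.zeros((p, q))
B[:4] = 1.0
Y = X @ B + 0.1 * rng.standard_normal((n, q))

M = cross_product_in_chunks(X, Y, chunk_size=512)
groups = np.repeat(np.arange(5), 4)  # 5 hypothetical groups of 4 variables
u, v = sparse_rank_one(M, lam_u=5000.0, groups_u=groups)
print(np.flatnonzero(u))  # indices of the X-variables kept in the first weight vector
```

Computing $M$ once and then iterating only on the small $p \times q$ matrix means that the cost of the penalized SVD does not grow with the sample size. This is one possible route to scalability. The article's own parallel strategies may differ.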


62H20 Measures of association (correlation, canonical correlation, etc.)
62R07 Statistical aspects of big data and data science
62J07 Ridge regression; shrinkage estimators (Lasso)
Full Text: DOI arXiv Euclid


