A partially linear framework for massive heterogeneous data. (English) Zbl 1358.62050

The paper under review develops a partially linear framework for modeling massive heterogeneous data, with the objective of extracting common features across all subpopulations while exploring the heterogeneity of each. The authors propose an aggregation-type estimator for the commonality parameter that attains the same minimax optimal bound and asymptotic distribution as in the case without heterogeneity; this result holds when the number of subpopulations does not grow too fast. Next, a plug-in estimator for the heterogeneity parameter is provided, which has the same asymptotic distribution as in the case when the commonality information is available. The heterogeneity among a large number of subpopulations is then tested by employing approximation theory results of V. Chernozhukov et al. [Ann. Stat. 41, No. 6, 2786–2819 (2013; Zbl 1292.62030)]. Finally, the “divide-and-conquer” method based on the obtained results is applied to a subpopulation whose sample size is too large to be processed on a single computer.
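The aggregation step behind such divide-and-conquer schemes can be illustrated with a minimal sketch: each subpopulation fits its own partially linear model y = xβ + g_j(z) + ε, where the nuisance function g_j differs across subpopulations but the coefficient β is shared, and the subsample estimates of β are averaged. This is an illustrative sketch under stated assumptions, not the authors' implementation; the polynomial-basis profile least squares fit and all simulation settings are assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0 = 2.0            # common ("commonality") parameter shared by all subpopulations
m, n = 20, 500         # number of subpopulations, samples per subpopulation

def fit_subpopulation(x, z, y, deg=5):
    """Profile least squares on one subsample: regress y on
    [x, polynomial basis in z] and return the coefficient of x."""
    basis = np.vander(z, deg + 1)          # crude stand-in for the nonparametric part
    design = np.column_stack([x, basis])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]

betas = []
for j in range(m):
    x = rng.normal(size=n)
    z = rng.uniform(-1.0, 1.0, size=n)
    g = np.sin((1.0 + 0.3 * j) * z)       # heterogeneous nuisance function g_j
    y = beta0 * x + g + rng.normal(scale=0.5, size=n)
    betas.append(fit_subpopulation(x, z, y))

beta_bar = float(np.mean(betas))          # aggregation estimator for beta
print(f"aggregated estimate of beta: {beta_bar:.3f}")
```

Because x is independent of z in this toy setting, any misfit of the polynomial basis to g_j only inflates the noise rather than biasing β, so the averaged estimate concentrates around the true common value as the review describes.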


62G20 Asymptotic properties of nonparametric inference
62F25 Parametric tolerance and confidence regions
62F10 Point estimation
62F12 Asymptotic properties of parametric estimators
62J07 Ridge regression; shrinkage estimators (Lasso)


Zbl 1292.62030


SemiPar; gss
Full Text: DOI arXiv Euclid


[1] Aitkin, M. and Rubin, D. B. (1985). Estimation and hypothesis testing in finite mixture models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 47 67-75. · Zbl 0576.62038
[2] Bach, F. (2012). Sharp analysis of low-rank kernel matrix approximations. Preprint. Available at arXiv:1208.2015.
[3] Berlinet, A. and Thomas-Agnan, C. (2004). Reproducing Kernel Hilbert Spaces in Probability and Statistics . Kluwer Academic, Boston, MA. · Zbl 1145.62002
[4] Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Ann. Statist. 1 1071-1095. · Zbl 0275.62033 · doi:10.1214/aos/1176342558
[5] Birman, M. S. and Solomyak, M. Z. (1967). Piecewise-polynomial approximations of functions of the classes \(w_{p}^{\alpha}\). Mat. Sb. 115 331-355. · Zbl 0173.16001 · doi:10.1070/SM1967v002n03ABEH002343
[6] Chen, X. and Xie, M. (2012). A split-and-conquer approach for analysis of extraordinarily large data. Technical Report 2012-01, Dept. Statistics, Rutgers Univ., Piscataway, NJ. · Zbl 1480.62258
[7] Cheng, G. and Shang, Z. (2015). Joint asymptotics for semi-nonparametric regression models with partially linear structure. Ann. Statist. 43 1351-1390. · Zbl 1320.62087 · doi:10.1214/15-AOS1313
[8] Cheng, G., Zhang, H. H. and Shang, Z. (2015). Sparse and efficient estimation for partial spline models with increasing dimension. Ann. Inst. Statist. Math. 67 93-127. · Zbl 1331.65028 · doi:10.1007/s10463-013-0440-y
[9] Chernozhukov, V., Chetverikov, D. and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist. 41 2786-2819. · Zbl 1292.62030 · doi:10.1214/13-AOS1161
[10] Fan, J. and Zhang, W. (1999). Statistical estimation in varying coefficient models. Ann. Statist. 27 1491-1518. · Zbl 0977.62039 · doi:10.1214/aos/1017939139
[11] Figueiredo, M. A. and Jain, A. K. (2002). Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 24 381-396.
[12] Gu, C. (2013). Smoothing Spline ANOVA Models , 2nd ed. Springer, New York. · Zbl 1269.62040 · doi:10.1007/978-1-4614-5369-7
[13] Guo, W. (2002). Inference in smoothing spline analysis of variance. J. R. Stat. Soc. Ser. B. Stat. Methodol. 64 887-898. · Zbl 1067.62070 · doi:10.1111/1467-9868.00367
[14] Härdle, W., Liang, H. and Gao, J. (2000). Partially Linear Models . Physica, Heidelberg. · Zbl 0968.62006
[15] Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. J. Roy. Statist. Soc. Ser. B 55 757-796. · Zbl 0796.62060
[16] Huang, J. and Zhang, T. (2010). The benefit of group sparsity. Ann. Statist. 38 1978-2004. · Zbl 1202.62052 · doi:10.1214/09-AOS778
[17] Kleiner, A., Talwalkar, A., Sarkar, P. and Jordan, M. (2012). The big data bootstrap. Preprint. Available at arXiv:1206.6415.
[18] Kosorok, M. R. (2008). Introduction to Empirical Processes and Semiparametric Inference . Springer, New York. · Zbl 1180.62137
[19] Krasikov, I. (2004). New bounds on the Hermite polynomials. East J. Approx. 10 355-362. · Zbl 1113.33011
[20] Lafferty, J. and Lebanon, G. (2005). Diffusion kernels on statistical manifolds. J. Mach. Learn. Res. 6 129-163. · Zbl 1222.68240
[21] Li, R. and Liang, H. (2008). Variable selection in semiparametric regression modeling. Ann. Statist. 36 261-286. · Zbl 1132.62027 · doi:10.1214/009053607000000604
[22] Li, R., Lin, D. K. J. and Li, B. (2013). Statistical inference in massive data sets. Appl. Stoch. Models Bus. Ind. 29 399-409.
[23] Mammen, E. and van de Geer, S. (1997). Penalized quasi-likelihood estimation in partial linear models. Ann. Statist. 25 1014-1035. · Zbl 0906.62033 · doi:10.1214/aos/1069362736
[24] McDonald, R., Hall, K. and Mann, G. (2010). Distributed training strategies for the structured perceptron. In Human Language Technologies : The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics . Association for Computational Linguistics, Los Angeles, CA.
[25] McLachlan, G. and Peel, D. (2000). Finite Mixture Models . Wiley, New York. · Zbl 0963.62061
[26] Meinshausen, N. and Bühlmann, P. (2015). Maximin effects in inhomogeneous large-scale data. Ann. Statist. 43 1801-1830. · Zbl 1317.62059 · doi:10.1214/15-AOS1325
[27] Mendelson, S. (2002). Geometric parameters of kernel machines. In Computational Learning Theory (Sydney, 2002). Lecture Notes in Computer Science 2375 29-43. Springer, Berlin. · Zbl 1050.68070 · doi:10.1007/3-540-45435-7_3
[28] Nardi, Y. and Rinaldo, A. (2008). On the asymptotic properties of the group lasso estimator for linear models. Electron. J. Stat. 2 605-633. · Zbl 1320.62167 · doi:10.1214/08-EJS200
[29] Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2008). Union support recovery in high-dimensional multivariate regression. In 46th Annual Allerton Conference on Communication, Control, and Computing. IEEE, Allerton House, UIUC, IL. · Zbl 1373.62372
[30] Raskutti, G., Wainwright, M. J. and Yu, B. (2014). Early stopping and non-parametric regression: An optimal data-dependent stopping rule. J. Mach. Learn. Res. 15 335-366. · Zbl 1318.62136
[31] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression . Cambridge Univ. Press, Cambridge. · Zbl 1038.62042 · doi:10.1017/CBO9780511755453
[32] Saunders, C., Gammerman, A. and Vovk, V. (1998). Ridge regression learning algorithm in dual variables. In Proceedings of the 15th International Conference on Machine Learning (ICML 1998). Morgan Kaufmann, San Mateo, CA.
[33] Shang, Z. and Cheng, G. (2013). Local and global asymptotic inference in smoothing spline models. Ann. Statist. 41 2608-2638. · Zbl 1293.62107 · doi:10.1214/13-AOS1164
[34] Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis . Cambridge Univ. Press, Cambridge. · Zbl 0994.68074
[35] Sollich, P. and Williams, C. K. (2005). Understanding Gaussian process regression using the equivalent kernel. In Deterministic and Statistical Methods in Machine Learning 211-228. Springer, Berlin. · Zbl 1133.68410 · doi:10.1007/11559887_13
[36] Städler, N., Bühlmann, P. and van de Geer, S. (2010). \(\ell_{1}\)-penalization for mixture regression models. TEST 19 209-256. · Zbl 1203.62128 · doi:10.1007/s11749-010-0197-z
[37] Steinwart, I., Hush, D. R. and Scovel, C. (2009). Optimal rates for regularized least squares regression. In Conference on Learning Theory. Montreal, Canada. · Zbl 1127.68090
[38] Stone, C. J. (1985). Additive regression and other nonparametric models. Ann. Statist. 13 689-705. · Zbl 0605.62065 · doi:10.1214/aos/1176349548
[39] Wang, Y. (2011). Smoothing Splines : Methods and Applications . CRC Press, Boca Raton, FL. · Zbl 1223.65011 · doi:10.1201/b10954
[40] Wang, X. and Dunson, D. B. (2013). Parallel MCMC via Weierstrass sampler. Preprint. Available at arXiv:1312.4605.
[41] Yatchew, A. (2003). Semiparametric Regression for the Applied Econometrician . Cambridge Univ. Press, Cambridge. · Zbl 1067.62041 · doi:10.1017/CBO9780511615887
[42] Zhang, T. (2005). Learning bounds for kernel regression using effective data dimensionality. Neural Comput. 17 2077-2098. · Zbl 1080.68044 · doi:10.1162/0899766054323008
[43] Zhang, Y., Duchi, J. and Wainwright, M. (2013). Divide and conquer kernel ridge regression. In Conference on Learning Theory . Princeton, NJ. · Zbl 1351.62142
[44] Zhao, T., Cheng, G. and Liu, H. (2016). Supplement to “A partially linear framework for massive heterogeneous data.” · Zbl 1358.62050 · doi:10.1214/15-AOS1410
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.