A partially linear framework for massive heterogeneous data. (English) Zbl 1358.62050

The paper under review develops a partially linear framework for modeling massive heterogeneous data, with the objective of extracting common features across all subpopulations while exploring the heterogeneity of each. The authors propose an aggregation-type estimator for the commonality parameter that attains the same minimax optimal bound and asymptotic distribution as in the case when there is no heterogeneity. This result holds provided that the number of subpopulations does not grow too fast. Next, a plug-in estimator for the heterogeneity parameter is provided, which has the same asymptotic distribution as in the case when the commonality information is available. Heterogeneity among a large number of subpopulations is also tested by employing approximation theory results from V. Chernozhukov et al. [Ann. Stat. 41, No. 6, 2786–2819 (2013; Zbl 1292.62030)]. Finally, a “divide-and-conquer” method based on the obtained results is applied to a subpopulation whose sample size is too large to be processed on a single computer.
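The aggregation and plug-in steps described above can be illustrated with a small simulation. This is only a minimal sketch under assumptions not stated in the review: the commonality is taken to be a shared smooth function f(Z), the heterogeneity a subpopulation-specific scalar slope beta_j in Y = beta_j X + f(Z) + noise, and a ridge-penalized Fourier basis fit stands in for whatever smoother the paper actually uses. All function names, sample sizes and the penalty value are hypothetical.

```python
import numpy as np


def fourier_basis(z, n_freq=3):
    """Columns 1, cos(2*pi*j*z), sin(2*pi*j*z) for j = 1..n_freq."""
    cols = [np.ones_like(z)]
    for j in range(1, n_freq + 1):
        cols.append(np.cos(2 * np.pi * j * z))
        cols.append(np.sin(2 * np.pi * j * z))
    return np.column_stack(cols)


def fit_subpopulation(x, z, y, lam=1.0):
    """Penalized least squares for y = beta*x + basis(z) @ theta + noise.

    Only the basis coefficients theta are penalized (assumed stand-in smoother).
    """
    design = np.column_stack([x, fourier_basis(z)])
    penalty = lam * np.eye(design.shape[1])
    penalty[0, 0] = 0.0  # leave the linear coefficient unpenalized
    coef = np.linalg.solve(design.T @ design + penalty, design.T @ y)
    return coef[0], coef[1:]


def aggregate_and_plug_in(data, lam=1.0):
    """Average the per-subpopulation function estimates, then refit each slope."""
    # Aggregation step: average the basis coefficients across subpopulations.
    thetas = [fit_subpopulation(x, z, y, lam)[1] for x, z, y in data]
    theta_bar = np.mean(thetas, axis=0)
    # Plug-in step: regress y minus the averaged function on x, per subpopulation.
    betas = []
    for x, z, y in data:
        resid = y - fourier_basis(z) @ theta_bar
        betas.append(float(x @ resid / (x @ x)))
    return theta_bar, np.array(betas)


def simulate(n_sub=20, n_obs=400, seed=0):
    """Synthetic data: shared f(z) = sin(2*pi*z), one slope per subpopulation."""
    rng = np.random.default_rng(seed)
    true_betas = np.linspace(-1.0, 1.0, n_sub)
    data = []
    for beta in true_betas:
        x = rng.normal(size=n_obs)
        z = rng.uniform(size=n_obs)
        y = beta * x + np.sin(2 * np.pi * z) + 0.5 * rng.normal(size=n_obs)
        data.append((x, z, y))
    return true_betas, data


true_betas, data = simulate()
theta_bar, beta_hat = aggregate_and_plug_in(data)
```

Each subpopulation is fitted separately, so the per-subpopulation fits could in principle run on different machines, with only the short coefficient vectors being combined. Nothing in this toy example reproduces the paper's theoretical guarantees or its testing procedure.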


MSC:
62G20 Asymptotic properties of nonparametric inference
62F25 Parametric tolerance and confidence regions
62F10 Point estimation
62F12 Asymptotic properties of parametric estimators
62J07 Ridge regression; shrinkage estimators (Lasso)


Citations: Zbl 1292.62030


Software: SemiPar; gss

