
Semi-supervised learning with summary statistics. (English) Zbl 1430.68272

Summary: Nowadays, the extensive collection and analysis of data is fueling widespread privacy concerns, thereby increasing tensions between potential data sources and researchers. A privacy-friendly learning framework can help ease these tensions and free up more data for research. We propose a new algorithm, LESS (Learning with Empirical feature-based Summary statistics from Semi-supervised data), which uses only summary statistics, instead of raw data, for regression learning. The selection of empirical features serves as a trade-off between prediction accuracy and privacy protection. We show that LESS achieves the minimax optimal rate of convergence in terms of the size of the labeled sample. LESS extends naturally to applications where data are held separately by different sources. Compared with the existing literature on distributed learning, LESS removes the restriction of a minimum sample size on individual data sources.
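The abstract describes LESS only at a high level. The general idea it refers to, releasing summary statistics computed in an empirical feature space built from unlabeled data rather than the raw labeled sample, can be illustrated with a minimal sketch. This is not the authors' algorithm: the Gaussian kernel, the use of the top eigenvectors of the kernel matrix on an unlabeled pool as empirical features, and the ridge-regression solve are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch (not the paper's LESS algorithm): a data holder
# releases only low-dimensional summary statistics computed in an
# empirical feature space, and the learner fits a regressor from them.

def gaussian_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix between row-sets X and Y (assumed form)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X_unlab = rng.uniform(-1, 1, (200, 1))   # shared unlabeled pool
X_lab = rng.uniform(-1, 1, (60, 1))      # raw data held by one source
y_lab = np.sin(np.pi * X_lab[:, 0]) + 0.1 * rng.standard_normal(60)

# Empirical features: top-m eigenvectors of the kernel matrix on the
# unlabeled pool (eigh returns eigenvalues in ascending order).
K_uu = gaussian_kernel(X_unlab, X_unlab)
_, vecs = np.linalg.eigh(K_uu)
m = 10
V = vecs[:, -m:]                         # leading m eigenvectors

# The source shares only these m x m and m-dimensional summaries,
# never the raw pairs (X_lab, y_lab).
Phi = gaussian_kernel(X_lab, X_unlab) @ V   # labeled points in feature space
A = Phi.T @ Phi                             # summary statistic 1
b = Phi.T @ y_lab                           # summary statistic 2

# Learner solves a ridge problem from the summaries alone.
lam = 1e-3
w = np.linalg.solve(A + lam * np.eye(m), b)

# Prediction uses only public quantities (unlabeled pool, V) and w.
X_test = np.linspace(-1, 1, 50)[:, None]
y_pred = gaussian_kernel(X_test, X_unlab) @ V @ w
```

Because the summaries `A` and `b` are additive over data points, summaries from several sources can simply be summed before the solve, which mirrors why such a scheme extends to separately held data without a minimum per-source sample size.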

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference
68P27 Privacy of data

Software:

RSS

References:

[1] Bauer, F., Pereverzev, S. and Rosasco, L., On regularization algorithms in learning theory, J. Complexity 23(1) (2007) 52-72. · Zbl 1109.68088
[2] Bertozzi, A. L., Luo, X., Stuart, A. M. and Zygalakis, K. C., Uncertainty quantification in graph-based classification of high dimensional data, SIAM/ASA J. Uncertain. Quantif. 6(2) (2018) 568-595. · Zbl 1394.62083
[3] Bhatia, R., Matrix Analysis, Vol. 169 (Springer-Verlag, New York, 1997). · Zbl 0863.15001
[4] Bhatia, R. and Elsner, L., The Hoffman-Wielandt inequality in infinite dimensions, Proc. Indian Acad. Sci. Math. Sci. 104(3) (1994) 483-494. · Zbl 0805.47017
[5] Blanchard, G. and Krämer, N., Optimal learning rates for kernel conjugate gradient regression, in Advances in Neural Information Processing Systems, eds. Lafferty, J. D., Williams, C. K. I., Shawe-Taylor, J., Zemel, R. S. and Culotta, A., Vol. 23 (Curran Associates Inc., 2010), pp. 226-234.
[6] Blanchard, G. and Krämer, N., Convergence rates of kernel conjugate gradient for random design regression, Anal. Appl. 14(6) (2016) 763-794. · Zbl 1349.62125
[7] Caponnetto, A. and De Vito, E., Optimal rates for the regularized least-squares algorithm, Found. Comput. Math. 7(3) (2007) 331-368. · Zbl 1129.68058
[8] Chang, X., Lin, S.-B. and Zhou, D.-X., Distributed semi-supervised learning with kernel ridge regression, J. Mach. Learn. Res. 18 (2017) 1493-1514. · Zbl 1431.68106
[9] Chaudhuri, K., Monteleoni, C. and Sarwate, A. D., Differentially private empirical risk minimization, J. Mach. Learn. Res. 12 (2011) 1069-1109. · Zbl 1280.62073
[10] Coifman, R. R. and Lafon, S., Diffusion maps, Appl. Comput. Harmon. Anal. 21(1) (2006) 5-30. · Zbl 1095.68094
[11] Cucker, F. and Zhou, D.-X., Learning Theory: An Approximation Theory Viewpoint, Vol. 24 (Cambridge University Press, Cambridge, 2007). · Zbl 1274.41001
[12] Guo, X., Hu, T. and Wu, Q., Distributed minimum error entropy algorithms, preprint (2019). · Zbl 1517.68330
[13] Guo, X. and Zhou, D.-X., An empirical feature-based learning algorithm producing sparse approximations, Appl. Comput. Harmon. Anal. 32(3) (2012) 389-400. · Zbl 1319.62119
[14] Guo, Z.-C., Lin, S.-B. and Zhou, D.-X., Learning theory of distributed spectral algorithms, Inverse Problems 33(7) (2017) 074009. · Zbl 1372.65162
[15] Guo, Z.-C., Shi, L. and Wu, Q., Learning theory of distributed regression with bias corrected regularization kernel network, J. Mach. Learn. Res. 18(118) (2017) 1-25. · Zbl 1435.68260
[16] Hoffman, A. J. and Wielandt, H. W., The variation of the spectrum of a normal matrix, Duke Math. J. 20 (1953) 37-39. · Zbl 0051.00903
[17] Hu, T., Fan, J., Wu, Q. and Zhou, D.-X., Regularization schemes for minimum error entropy principle, Anal. Appl. (Singap.) 13(4) (2015) 437-455. · Zbl 1329.68216
[18] Kato, T., Variation of discrete spectra, Comm. Math. Phys. 111(3) (1987) 501-504. · Zbl 0632.47002
[19] Lin, S.-B., Guo, X. and Zhou, D.-X., Distributed learning with regularized least squares, J. Mach. Learn. Res. 18(92) (2017) 1-31. · Zbl 1435.68273
[20] Lin, S.-B. and Zhou, D.-X., Distributed kernel-based gradient descent algorithms, Constr. Approx. 47(2) (2018) 249-276. · Zbl 1390.68542
[21] Little, G. and Reade, J. B., Eigenvalues of analytic kernels, SIAM J. Math. Anal. 15(1) (1984) 133-136. · Zbl 0536.45004
[22] Liu, J., Yang, C., Jiao, Y. and Huang, J., ssLasso: A summary-statistic-based regression using Lasso, preprint (2017).
[23] Pinelis, I., Optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab. 22(4) (1994) 1679-1706. · Zbl 0836.60015
[24] Reade, J. B., Eigenvalues of positive definite kernels. II, SIAM J. Math. Anal. 15(1) (1984) 137-142. · Zbl 0555.45001
[25] Shi, L., Distributed learning with indefinite kernels, Anal. Appl. (2019) 1-29. · Zbl 1440.68238
[26] Smale, S. and Zhou, D.-X., Learning theory estimates via integral operators and their approximations, Constr. Approx. 26(2) (2007) 153-172. · Zbl 1127.68088
[27] Steinwart, I. and Christmann, A., Support Vector Machines (Springer, New York, 2008). · Zbl 1203.68171
[28] Steinwart, I., Hush, D. R. and Scovel, C., Optimal rates for regularized least squares regression, in COLT 2009 — The 22nd Conf. Learning Theory, Montreal, Quebec, Canada, June 18-21 (2009), pp. 79-93.
[29] Wahba, G., Spline Models for Observational Data (SIAM, Philadelphia, 1990). · Zbl 0813.62001
[30] Wang, C. and Hu, T., Online minimum error entropy algorithm with unbounded sampling, Anal. Appl. (Singap.) 17(2) (2019) 293-322. · Zbl 1410.68327
[31] Yao, Y., Rosasco, L. and Caponnetto, A., On early stopping in gradient descent learning, Constr. Approx. 26(2) (2007) 289-315. · Zbl 1125.62035
[32] Zhang, T., Effective dimension and generalization of kernel learning, in Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA, 2002), pp. 454-461.
[33] Zhao, Y., Fan, J. and Shi, L., Learning rates for regularized least squares ranking algorithm, Anal. Appl. (Singap.) 15(6) (2017) 815-836. · Zbl 1420.68182
[34] Zhu, X. and Stephens, M., Bayesian large-scale multiple regression with summary statistics from genome-wide association studies, Ann. Appl. Stat. 11(3) (2017) 1561-1592. · Zbl 1380.62263
[35] Zwald, L. and Blanchard, G., On the convergence of eigenspaces in kernel principal component analysis, in Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA, 2006), pp. 1649-1656.
[36] Zwald, L., Blanchard, G., Massart, P. and Vert, R., Kernel projection machine: A new tool for pattern recognition, in Advances in Neural Information Processing Systems (MIT Press, Cambridge, MA, 2005), pp. 1649-1656.