
Learning under \((1 + \epsilon)\)-moment conditions. (English) Zbl 1442.62150

Summary: We study the theoretical underpinnings of a robust empirical risk minimization (RERM) scheme that has found numerous successful applications across various data science fields owing to its robustness to outliers and heavy-tailed noise. RERM is distinguished by its nonconvexity and by the fact that it is induced by a loss function with an integrated scale parameter that trades off robustness against prediction accuracy. These two features also pose barriers to assessing its learning performance theoretically. In this paper, we make the following main contributions to the study of RERM. First, we establish a no-free-lunch result, showing that there is no hope of distribution-free learning of the truth without adjusting the scale parameter. Second, by imposing a \((1 + \epsilon)\)-th (with \(\epsilon > 0\)) order moment condition on the response variable, we establish a comparison theorem that characterizes the relation between the excess generalization error of RERM and its prediction error. Third, with a diverging scale parameter, we establish almost sure convergence rates for RERM under the \((1 + \epsilon)\)-moment condition. Notably, this condition allows the presence of noise with infinite variance. Last but not least, the learning theory analysis of RERM conducted in this study showcases, on the one hand, the merits of RERM with respect to robustness and the trade-off role played by the scale parameter, and, on the other hand, offers insights into robust machine learning.
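The flavor of the RERM scheme discussed above can be illustrated with a minimal sketch. The snippet below uses the Welsch (correntropy-induced) loss \(\ell_\sigma(t) = \sigma^2(1 - e^{-t^2/\sigma^2})\) as a representative bounded, nonconvex loss with a scale parameter \(\sigma\); the choice of loss, the linear model, and all function names here are illustrative assumptions, not details taken from the paper. A small \(\sigma\) downweights large residuals (robustness to outliers), while \(\sigma \to \infty\) recovers the least-squares risk, mirroring the robustness/accuracy trade-off the scale parameter controls.

```python
import numpy as np

# Illustrative sketch of robust empirical risk minimization (RERM) with
# the Welsch (correntropy-induced) loss:
#   l_sigma(t) = sigma^2 * (1 - exp(-t^2 / sigma^2)).
# All names and the gradient-descent setup are assumptions for
# illustration only, not the paper's algorithm.

def welsch_loss_grad(residual, sigma):
    """Derivative of the Welsch loss w.r.t. the residual.

    The influence function is redescending: it vanishes for large
    residuals, so outliers contribute almost nothing to the gradient.
    """
    return 2.0 * residual * np.exp(-residual**2 / sigma**2)

def rerm_linear(X, y, sigma, lr=0.01, n_iter=2000):
    """Fit a linear model by gradient descent on the Welsch empirical risk."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = X @ w - y
        w -= lr * (X.T @ welsch_loss_grad(r, sigma)) / len(y)
    return w

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w_true = np.array([1.0, 2.0])
# Student-t noise with 1.5 degrees of freedom: the response has a finite
# (1 + eps)-th moment for eps < 0.5, but infinite variance.
y = X @ w_true + rng.standard_t(df=1.5, size=n)

w_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares
w_rerm = rerm_linear(X, y, sigma=2.0)        # robust (Welsch-loss) fit
```

The redescending influence function is what buys robustness: a single gross outlier shifts the least-squares fit arbitrarily, while its gradient contribution to the Welsch risk is exponentially small. The price is nonconvexity, which is precisely why the theoretical analysis summarized above is nontrivial.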

MSC:

62J02 General nonlinear regression
62G35 Nonparametric robustness
68T05 Learning and adaptive systems in artificial intelligence

Software:

robustbase
