Convex and non-convex approaches for statistical inference with class-conditional noisy labels. (English) Zbl 07306869

Summary: We study the problem of estimation and testing in logistic regression with class-conditional noise in the observed labels, which has important implications for the Positive-Unlabeled (PU) learning setting. With the key observation that the label noise problem belongs to a special sub-class of generalized linear models (GLMs), we discuss convex and non-convex approaches that address this problem. A non-convex approach based on maximum likelihood estimation produces an estimator with several optimal properties, whereas a convex approach has an obvious advantage in optimization. We demonstrate that in the low-dimensional setting both estimators are consistent and asymptotically normal, and that the asymptotic variance of the non-convex estimator is smaller than that of its convex counterpart. We also quantify the efficiency gap, which provides insight into when the two methods are comparable. In the high-dimensional setting, we show that both estimation procedures achieve \(\ell_2\)-consistency at the minimax optimal rate \(\sqrt{s\log p/n}\) under mild conditions. Finally, we propose an inference procedure using a de-biasing approach. We validate our theoretical findings through simulations and a real-data example.
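
As a brief illustration of the GLM observation in the summary, the following sketch writes out the observed-label model in the standard class-conditional noise formulation. The notation is an assumption made here and is not taken from the paper: \(y\in\{0,1\}\) is the true label, \(z\in\{0,1\}\) the observed label, \(\sigma(t)=1/(1+e^{-t})\), and the flip rates \(\rho_0=P(z=1\mid y=0)\) and \(\rho_1=P(z=0\mid y=1)\) are assumed known, independent of \(x\) given \(y\), and to satisfy \(\rho_0+\rho_1<1\).

\[
P(y=1\mid x)=\sigma(x^\top\theta),
\qquad
P(z=1\mid x)=(1-\rho_1)\,\sigma(x^\top\theta)+\rho_0\,\bigl(1-\sigma(x^\top\theta)\bigr)
=\rho_0+(1-\rho_0-\rho_1)\,\sigma(x^\top\theta).
\]

Under these assumptions \(z\mid x\) is Bernoulli with a mean that depends on \(x\) only through the linear predictor \(x^\top\theta\), via a non-canonical link, so the model is a GLM; the corresponding negative log-likelihood is in general not convex in \(\theta\). Setting \(\rho_0=0\) gives the case in which no true negative is ever observed as positive, which is one common way of modeling PU data.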


68T05 Learning and adaptive systems in artificial intelligence

