
Regression with comparisons: escaping the curse of dimensionality with ordinal information. (English) Zbl 1525.68129

Summary: In supervised learning, we typically leverage a fully labeled dataset to design methods for function estimation or prediction. In many practical situations, we are able to obtain alternative feedback, possibly at a low cost. A broad goal is to understand the usefulness of, and to design algorithms to exploit, this alternative feedback. In this paper, we consider a semi-supervised regression setting in which we obtain additional ordinal (or comparison) information for the unlabeled samples. We consider ordinal feedback of varying quality, where we have either a perfect ordering of the samples, a noisy ordering of the samples, or noisy pairwise comparisons between the samples. We provide a precise quantification of the usefulness of these types of ordinal feedback in both nonparametric and linear regression, showing that in many cases it is possible to accurately estimate an underlying function with a very small labeled set, effectively escaping the curse of dimensionality. We also present lower bounds that establish fundamental limits for the task and show that our algorithms are optimal in a variety of settings. Finally, we present extensive experiments on new datasets that demonstrate the efficacy and practicality of our algorithms and investigate their robustness to various sources of noise and model misspecification.
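The mechanism behind this claim can be illustrated concretely. Below is a minimal, hypothetical Python sketch of the ranking-plus-isotonic-regression idea for the perfect-ordering case: rank the unlabeled pool by function value, label only a handful of points, impute labels for the whole pool by isotonic regression along the ranking, and predict at fresh points by nearest neighbor. The target function, sample sizes, noise level, and the 1-NN predictor are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal, hypothetical sketch: regression aided by a (noiseless) ordering of
# the unlabeled pool. Not the authors' exact algorithm; sizes, the target
# function f, and the 1-NN predictor are illustrative choices.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

d, m, n_labeled = 10, 2000, 30          # dimension, pool size, label budget
X = rng.uniform(size=(m, d))            # unlabeled pool

def f(x):                               # unknown regression function (example)
    return np.sin(2 * x[:, 0]) + x[:, 1:].sum(axis=1) / d

# Ordinal information: a ranking of the pool by f-value (here computed
# exactly, mimicking perfect comparison feedback).
X_sorted = X[np.argsort(f(X))]

# Label a small random subset; record each labeled point's rank position.
pos = np.sort(rng.choice(m, size=n_labeled, replace=False))
y_lab = f(X_sorted[pos]) + 0.1 * rng.standard_normal(n_labeled)

# f is monotone in rank position by construction, so isotonic regression on
# the labeled positions imputes a value for every point in the pool.
iso = IsotonicRegression(out_of_bounds="clip").fit(pos, y_lab)
y_imputed = iso.predict(np.arange(m))

# Predict at fresh points with nearest neighbor over the imputed pool.
knn = KNeighborsRegressor(n_neighbors=1).fit(X_sorted, y_imputed)
X_test = rng.uniform(size=(200, d))
print("test MSE:", np.mean((knn.predict(X_test) - f(X_test)) ** 2))
```

Note that the label budget in this sketch is governed by a one-dimensional isotonic regression along the ranking rather than by the ambient dimension d, which conveys the intuition behind the "escaping the curse of dimensionality" claim.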

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62-08 Computational methods for problems pertaining to statistics
62F07 Statistical ranking and selection procedures
62G08 Nonparametric regression and quantile regression
62J02 General nonlinear regression
62J15 Paired and multiple comparisons; multiple testing

Software:

FaceNet
Full Text: arXiv Link

References:

[1] Arpit Agarwal, Shivani Agarwal, Sepehr Assadi, and Sanjeev Khanna. Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. In Conference on Learning Theory, pages 39-75, 2017.
[2] Eirikur Agustsson, Radu Timofte, Sergio Escalera, Xavier Baro, Isabelle Guyon, and Rasmus Rothe. Apparent and real age estimation in still images with deep residual regressors on APPA-REAL database. In 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), 2017.
[3] Pranjal Awasthi, Maria Florina Balcan, and Philip M Long. The power of localization for efficiently learning linear separators with noise. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, 2014. · Zbl 1315.68162
[4] Pranjal Awasthi, Maria-Florina Balcan, Nika Haghtalab, and Hongyang Zhang. Learning and 1-bit compressed sensing under asymmetric noise. In Conference on Learning Theory, pages 152-192, 2016.
[5] R. E. Barlow. Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. J. Wiley, 1972. · Zbl 0246.62038
[6] Andrew R. Barron. Complexity Regularization with Application to Artificial Neural Networks, chapter 7, pages 561-576. Springer Netherlands, 1991. · Zbl 0739.62001
[7] Pierre C Bellec. Sharp oracle inequalities for least squares estimators in shape restricted regression. The Annals of Statistics, 46(2):745-780, 2018. · Zbl 1408.62066
[8] Pierre C Bellec and Alexandre B Tsybakov. Sharp oracle bounds for monotone and convex regression through aggregation. Journal of Machine Learning Research, 16:1879-1892, 2015. · Zbl 1351.62088
[9] Ralph Allan Bradley and Milton E Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324-345, 1952. · Zbl 0047.12903
[10] Mark Braverman and Elchanan Mossel. Sorting from noisy information. arXiv preprint arXiv:0910.1191, 2009. · Zbl 1192.94077
[11] Rui M Castro and Robert D Nowak. Minimax bounds for active learning. IEEE Transactions on Information Theory, 54(5):2339-2353, 2008. · Zbl 1330.68246
[12] Sabyasachi Chatterjee, Adityanand Guntuboyina, and Bodhisattva Sen. On risk bounds in isotonic and other shape restricted regression problems. The Annals of Statistics, 43(4):1774-1800, 2015. · Zbl 1317.62032
[13] Kamalika Chaudhuri and Sanjoy Dasgupta. Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems, pages 343-351, 2010.
[14] Kamalika Chaudhuri, Sham M Kakade, Praneeth Netrapalli, and Sujay Sanghavi. Convergence rates of active learning for maximum likelihood estimation. In Advances in Neural Information Processing Systems, 2015.
[15] Cecil C Craig. On the Tchebychef inequality of Bernstein. The Annals of Mathematical Statistics, 4(2):94-102, 1933. · Zbl 0007.02303
[16] Sanjoy Dasgupta and Michael Luby. Learning from partial correction. arXiv preprint arXiv:1705.08076, 2017.
[17] Persi Diaconis and Ronald L Graham. Spearman's footrule as a measure of disarray. Journal of the Royal Statistical Society, Series B (Methodological), 39(2):262-268, 1977. · Zbl 0375.62045
[18] Felix A. Faber, Alexander Lindmaa, O. Anatole von Lilienfeld, and Rickard Armiento. Machine learning energies of 2 million elpasolite (ABC2D6) crystals. Physical Review Letters, 117(13):135502, 2016.
[19] Edgar N Gilbert. A comparison of signalling alphabets. Bell System Technical Journal, 31(3):504-522, 1952.
[20] László Györfi, Michael Kohler, Adam Krzyzak, and Harro Walk. A distribution-free theory of nonparametric regression. Springer Science & Business Media, 2006.
[21] Qiyang Han, Tengyao Wang, Sabyasachi Chatterjee, and Richard J Samworth. Isotonic regression in general dimensions. arXiv preprint arXiv:1708.09468, 2017. · Zbl 1437.62124
[22] Steve Hanneke. Theoretical foundations of active learning. ProQuest, 2009. · Zbl 1327.68193
[23] Wolfgang Karl Härdle, Marlene Müller, Stefan Sperlich, and Axel Werwatz. Nonparametric and semiparametric models. Springer Science & Business Media, 2012.
[24] Thorsten Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133-142, 2002.
[25] Daniel M Kane, Shachar Lovett, Shay Moran, and Jiapeng Zhang. Active classification with comparison queries. arXiv preprint arXiv:1704.03564, 2017.
[26] László Lovász and Santosh Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures & Algorithms, 30(3):307-358, 2007. · Zbl 1122.65012
[27] R Duncan Luce. Individual choice behavior: A theoretical analysis. Courier Corporation, 2005. · Zbl 1191.91003
[28] Robin L Plackett. The analysis of permutations. Applied Statistics, 24(2):193-202, 1975.
[29] Stefanos Poulis and Sanjoy Dasgupta. Learning with feature feedback: from theory to practice. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017.
[30] Harish G Ramaswamy, Ambuj Tewari, Shivani Agarwal, et al. Consistent algorithms for multiclass classification with an abstain option. Electronic Journal of Statistics, 12(1):530-554, 2018. · Zbl 1473.62229
[31] Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815-823, 2015.
[32] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In Advances in Neural Information Processing Systems, pages 41-48, 2004.
[33] Nihar Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, and Martin Wainwright. Estimation from pairwise comparisons: Sharp minimax bounds with topology dependence. Journal of Machine Learning Research, 17(58):1-47, 2016. · Zbl 1360.62409
[34] Nihar Shah, Sivaraman Balakrishnan, Aditya Guntuboyina, and Martin Wainwright. Stochastically transitive models for pairwise comparisons: Statistical and computational issues. IEEE Transactions on Information Theory, 2017. · Zbl 1364.94253
[35] Nihar B. Shah, Sivaraman Balakrishnan, and Martin J. Wainwright. A permutation-based model for crowd labeling: Optimal estimation and robustness. arXiv preprint, 2016.
[36] Juan Luis Suárez, Salvador García, and Francisco Herrera. A tutorial on distance metric learning: Mathematical foundations, algorithms and software. arXiv preprint arXiv:1812.05944, 2018.
[37] Louis L Thurstone. A law of comparative judgment. Psychological Review, 34(4):273, 1927.
[38] Kristi Tsukida and Maya R Gupta. How to analyze paired comparison data. Technical report, University of Washington, Department of Electrical Engineering, 2011.
[39] Alexander B Tsybakov. Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1):135-166, 2004. · Zbl 1105.62353
[40] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Publishing Company, Incorporated, 1st edition, 2008. ISBN 0387790519, 9780387790510. · Zbl 1176.62032
[41] RR Varshamov. Estimate of the number of signals in error correcting codes. Dokl. Akad. Nauk SSSR, 117:739-741, 1957.
[42] Rebecca Willett, Robert Nowak, and Rui M Castro. Faster rates in regression via active learning. In Advances in Neural Information Processing Systems, pages 179-186, 2006.
[43] Eric P. Xing, Michael I. Jordan, Stuart J Russell, and Andrew Y. Ng. Distance metric learning with application to clustering with side-information. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 521-528. MIT Press, 2003.
[44] Yichong Xu, Hongyang Zhang, Kyle Miller, Aarti Singh, and Artur Dubrawski. Noise-tolerant interactive learning from pairwise comparisons with near-minimal label complexity. arXiv preprint arXiv:1704.05820, 2017.
[45] Dezhen Xue, Prasanna V. Balachandran, John Hogden, James Theiler, Deqing Xue, and Turab Lookman. Accelerated search for materials with targeted properties by adaptive design. Nature Communications, 7, 2016.
[46] Songbai Yan and Chicheng Zhang. Revisiting perceptron: Efficient and label-optimal learning of halfspaces. In Advances in Neural Information Processing Systems, pages 1056-1066, 2017.
[47] Cun-Hui Zhang. Risk bounds in isotonic regression. The Annals of Statistics, 30(2):528-555, 2002. · Zbl 1012.62045