
Metric learning via cross-validation. (English) Zbl 1524.62173

Summary: In this paper, we propose a cross-validation metric-learning approach that learns a distance metric for dimension reduction in the multiple-index model. We minimize a leave-one-out cross-validation-type loss function in which the unknown link function is approximated by a metric-based kernel-smoothing estimator. To the best of our knowledge, this is the first work to reduce the dimensionality of multiple-index models within a metric-learning framework. The resulting metric contains crucial information on both the central mean subspace and the optimal kernel-smoothing bandwidth. Under weak assumptions on the design of the predictors, we establish asymptotic theory for the consistency and convergence rate of the estimated directions, as well as the optimal rate of the bandwidth. Furthermore, we develop a novel procedure for determining the structural dimension of the central mean subspace. The proposed approach is straightforward to implement numerically via fast gradient-based algorithms. Various empirical studies illustrate its advantages over existing methods.
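
The paper's own objective and optimizer details are not reproduced here; the following is a minimal sketch of the idea, assuming a Gaussian kernel, a factor parameterization $A = LL^\top$ that keeps the metric positive semi-definite, and a plain gradient-descent loop. All names (`loocv_loss`, `fit_metric`, `estimate_dimension`) are illustrative, and JAX is used only for automatic differentiation. The leave-one-out loss is $\mathrm{CV}(A) = n^{-1}\sum_i \{y_i - \hat g_{-i}(x_i; A)\}^2$, where $\hat g_{-i}$ is the Nadaraya-Watson estimator computed without the $i$-th observation; the overall scale of $A$ plays the role of the bandwidth.

```python
import jax
import jax.numpy as jnp

def loocv_loss(L, X, y):
    """Leave-one-out CV loss under the learned metric A = L @ L.T."""
    A = L @ L.T
    diff = X[:, None, :] - X[None, :, :]               # pairwise differences, shape (n, n, p)
    d2 = jnp.einsum('ijk,kl,ijl->ij', diff, A, diff)   # squared metric distances
    K = jnp.exp(-0.5 * d2) * (1.0 - jnp.eye(len(y)))   # Gaussian kernel, self-weights zeroed
    ghat = (K @ y) / (K.sum(axis=1) + 1e-12)           # leave-one-out Nadaraya-Watson fit
    return jnp.mean((y - ghat) ** 2)

def fit_metric(X, y, steps=500, lr=0.5):
    """Plain gradient descent on the factor L, starting from the identity metric."""
    L = jnp.eye(X.shape[1])
    grad_fn = jax.jit(jax.grad(loocv_loss))
    for _ in range(steps):
        L = L - lr * grad_fn(L, X, y)
    return L @ L.T                                     # estimated metric A

def estimate_dimension(A, d_max=None):
    """Illustrative heuristic only (not the paper's procedure): pick the
    structural dimension at the largest eigenvalue-ratio gap of A."""
    eig = jnp.sort(jnp.linalg.eigvalsh(A))[::-1]       # eigenvalues, descending
    d_max = d_max if d_max is not None else eig.shape[0] - 1
    ratios = eig[:d_max] / (eig[1:d_max + 1] + 1e-12)
    return int(jnp.argmax(ratios)) + 1
```

Under this sketch, the leading eigenvectors of the fitted $A$ estimate a basis of the central mean subspace, and its eigenvalue decay indicates the structural dimension; the eigenvalue-ratio rule above is a generic stand-in for the paper's dimension-determination procedure, not the procedure itself.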

MSC:

62G08 Nonparametric regression and quantile regression
62G07 Density estimation
62H12 Estimation in multivariate analysis
62G20 Asymptotic properties of nonparametric inference
