Supervised dimensionality reduction via distance correlation maximization. (English) Zbl 1390.62105

Summary: We propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called statistical distance correlation [G. J. Székely et al., Ann. Stat. 35, No. 6, 2769–2794 (2007; Zbl 1129.62059)]. The objective is free of distributional assumptions on the regression variables and of regression model assumptions. The formulation learns a low-dimensional feature representation \(\mathbf{z}\) that maximizes the squared sum of distance correlations between the low-dimensional features \(\mathbf{z}\) and the response \(y\), and also between the features \(\mathbf{z}\) and the covariates \(\mathbf{x}\). We optimize this objective with a novel algorithm based on the generalized majorization-minimization method of [S. N. Parizi et al., “Generalized majorization-minimization”, Preprint, arXiv:1506.07613]. Empirical results on multiple datasets show that the proposed approach outperforms several relevant state-of-the-art supervised dimensionality reduction methods.
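
To make the criterion in the summary concrete, the following is a minimal sketch of the squared sample distance correlation of Székely et al. (2007), computed from double-centered pairwise Euclidean distance matrices. The function names (`centered_distance_matrix`, `distance_correlation_sq`, `objective`) are hypothetical, and `objective` assumes one reading of the summary, namely the sum of the squared distance correlations between \(\mathbf{z}\) and \(y\) and between \(\mathbf{z}\) and \(\mathbf{x}\). The sketch only evaluates that quantity for a given \(\mathbf{z}\); it does not implement the paper's optimization algorithm.

```python
import numpy as np


def centered_distance_matrix(a):
    """Double-centered matrix of pairwise Euclidean distances between rows of a."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a vector as n samples of a single feature
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation_sq(a, b):
    """Squared sample distance correlation between paired samples a and b."""
    A = centered_distance_matrix(a)
    B = centered_distance_matrix(b)
    dcov_sq = (A * B).mean()   # squared sample distance covariance
    dvar_a = (A * A).mean()    # squared sample distance variance of a
    dvar_b = (B * B).mean()    # squared sample distance variance of b
    denom = np.sqrt(dvar_a * dvar_b)
    return 0.0 if denom == 0 else float(dcov_sq / denom)


def objective(z, x, y):
    """Assumed reading of the summary: dCor^2(z, y) + dCor^2(z, x)."""
    return distance_correlation_sq(z, y) + distance_correlation_sq(z, x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 5))                      # covariates
    y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=100)   # nonlinear response
    z = x[:, :2]                                       # a candidate 2-D representation
    print(objective(z, x, y))
```

Because the statistic depends on the data only through pairwise distances, \(\mathbf{z}\), \(\mathbf{x}\) and \(y\) may have different dimensions as long as they share the same number of samples. Each distance matrix is n-by-n, so this direct evaluation needs memory quadratic in the sample size.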

MSC:

62H20 Measures of association (correlation, canonical correlation, etc.)
62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference

Citations:

Zbl 1129.62059

Software:

UCI-ml

References:

[1] Amari, S.-I. (1998). Natural gradient works efficiently in learning. Neural Computation, 10(2):251-276.
[2] Berrendero, J. R., Cuevas, A., and Torrecilla, J. L. (2014). Variable selection in functional data classification: a maxima-hunting proposal. Statistica Sinica. · Zbl 1356.62079
[3] Borg, I. and Groenen, P. J. (2005). Modern multidimensional scaling: Theory and applications. Springer Science & Business Media. · Zbl 1085.62079
[4] Buza, K. (2014). Feedback prediction for blogs. In Data analysis, machine learning and knowledge discovery, pages 145-152. Springer.
[5] Chechik, G., Globerson, A., Tishby, N., and Weiss, Y. (2005). Information bottleneck for Gaussian variables. Journal of Machine Learning Research. · Zbl 1222.68166
[6] Chung, F. (1997). Lecture notes on spectral graph theory. Providence, RI: AMS Publications.
[7] Cook, R. and Forzani, L. (2009). Likelihood based sufficient dimension reduction. Journal of the American Statistical Association, 104:197-208. · Zbl 1388.62041
[8] Cook, R. D. (1996). Graphics for regressions with a binary response. Journal of the American Statistical Association, 91(435):983-992. · Zbl 0882.62060
[9] Dinkelbach, W. (1967). On nonlinear fractional programming. Management Science. Journal of the Institute of Management Science. Application and Theory Series, 13(7):492-498. · Zbl 0152.18402
[10] Fukumizu, K. and Leng, C. (2014). Gradient-based kernel dimension reduction for regression. Journal of the American Statistical Association, 109(505):359-370. · Zbl 1367.62118
[11] Graf, F., Kriegel, H.-P., Schubert, M., Pölsterl, S., and Cavallaro, A. (2011). 2D image registration in CT images using radial image descriptors. In Medical Image Computing and Computer-Assisted Intervention-MICCAI 2011, pages 607-614. Springer.
[12] Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1):81-102. · Zbl 0375.90023
[13] Kiefer, J. (1953). Sequential minimax search for a maximum. Proceedings of the American Mathematical Society, 4(3):502-506. · Zbl 0050.35702
[14] Kong, J., Wang, S., and Wahba, G. (2015). Using distance covariance for improved variable selection with application to learning genetic risk models. Statistics in Medicine, 34(10):1708-1720.
[15] Lange, K. (2013). The MM algorithm. In Optimization, volume 95 of Springer Texts in Statistics, pages 185-219. Springer New York.
[16] Lange, K., Hunter, D. R., and Yang, I. (2000). Optimization transfer using surrogate objective functions. Journal of Computational and Graphical Statistics, 9(1):1.
[17] Li, K.-C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316-327. · Zbl 0742.62044
[18] Li, R., Zhong, W., and Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107(499):1129-1139. · Zbl 1443.62184
[19] Lichman, M. (2013). UCI machine learning repository.
[20] Lue, H. H. (2009). Sliced inverse regression for multivariate response regression. Journal of Statistical Planning and Inference, 139:2656-2664. · Zbl 1162.62366
[21] Nishimori, Y. and Akaho, S. (2005). Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing, 67:106-135.
[22] Slonim, N. (2002). The information bottleneck: Theory and applications. Ph.D. thesis, Hebrew University of Jerusalem.
[23] Parizi, S. N., He, K., Sclaroff, S., and Felzenszwalb, P. (2015). Generalized majorization-minimization. arXiv preprint arXiv:1506.07613.
[24] Shwartz-Ziv, R. and Tishby, N. (2017). Opening the black box of deep neural networks via information. https://arxiv.org/abs/1703.00810.
[25] Schaible, S. (1976). Minimization of ratios. Journal of Optimization Theory and Applications, 19(2):347-352. · Zbl 0308.90041
[26] Shao, Y., Cook, R., and Weisberg, S. (2007). Marginal tests with sliced average variance estimation. Biometrika, 94:285-296. · Zbl 1133.62032
[27] Shao, Y., Cook, R., and Weisberg, S. (2009). Partial central subspace and sliced average variance estimation. Journal of Statistical Planning and Inference, 139:952-961. · Zbl 1156.62032
[28] Sheng, W. and Yin, X. (2016). Sufficient dimension reduction via distance covariance. Journal of Computational and Graphical Statistics, 25(1):91-104.
[29] Sugiyama, M., Suzuki, T., and Kanamori, T. (2012). Density ratio estimation in machine learning. Cambridge University Press, New York, NY, USA, 1st edition. · Zbl 1274.62037
[30] Suzuki, T. and Sugiyama, M. (2013). Sufficient dimension reduction via squared-loss mutual information estimation. Neural Computation, 25(3):725-758. · Zbl 1269.62054
[31] Székely, G. J. and Rizzo, M. L. (2012). On the uniqueness of distance covariance. Statistics & Probability Letters, 82(12):2278-2282. · Zbl 1471.62342
[32] Székely, G. J. and Rizzo, M. L. (2013). The distance correlation t-test of independence in high dimension. Journal of Multivariate Analysis, 117:193-213. · Zbl 1277.62128
[33] Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6):2769-2794. · Zbl 1129.62059
[34] Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. Annals of Applied Statistics, 3(4):1236-1265. · Zbl 1196.62077
[35] Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35:2769-2794. · Zbl 1129.62059
[36] Szretter, M. E. and Yohai, V. J. (2009). The sliced inverse regression algorithm as a maximum likelihood procedure. Journal of Statistical Planning and Inference, 139:3570-3578. · Zbl 1167.62402
[37] Tishby, N., Pereira, F. C., and Bialek, W. (1999). The information bottleneck method. The 37th annual Allerton Conference on Communication, Control, and Computing.
[38] Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17(4):401-419. · Zbl 0049.37603
[39] Torres-Sospedra, J., Montoliu, R., Martínez-Usó, A., Avariento, J. P., Arnau, T. J., Benedito-Bordonau, M., and Huerta, J. (2014). UJIIndoorLoc: A new multi-building and multi-floor database for WLAN fingerprint-based indoor localization problems. In Proceedings of the fifth conference on indoor positioning and indoor navigation.
[40] Vapnik, V., Braga, I., and Izmailov, R. (2015). Constructive setting for problems of density ratio estimation. Statistical Analysis and Data Mining: The ASA Data Science Journal, 8(3):137-146. · Zbl 07260430
[41] Chen, X., Cook, R. D., and Zou, C. (2015). Diagnostic studies in sufficient dimension reduction. Biometrika, 102(3):545-558. · Zbl 1452.62567
[42] Yamada, M., Niu, G., Takagi, J., and Sugiyama, M. (2011). Sufficient component analysis for supervised dimension reduction. arXiv preprint arXiv:1103.4998.
[43] Zhang, A. (2008). Quadratic fractional programming problems with quadratic constraints. PhD thesis, Kyoto University.
[44] Zhang, Y., Tapia, R., and Velazquez, L. (2000). On convergence of minimization methods: attraction, repulsion, and selection. Journal of Optimization Theory and Applications, 107(3):529-546. · Zbl 0969.90084
[45] Zhou, F., Claire, Q., and King, R. D. (2014). Predicting the geographical origin of music.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.