Principal support vector machines for linear and nonlinear sufficient dimension reduction. (English) Zbl 1246.62153

Summary: We introduce a principal support vector machine (PSVM) approach that can be used for both linear and nonlinear sufficient dimension reduction. The basic idea is to divide the response into slices and use a modified form of support vector machines to find the optimal hyperplanes that separate them. These optimal hyperplanes are then aligned by the principal components of their normal vectors. It is proved that the aligned normal vectors provide an unbiased, \(\sqrt{n}\)-consistent, and asymptotically normal estimator of the sufficient dimension reduction space. The method is then generalized to nonlinear sufficient dimension reduction using a reproducing kernel Hilbert space. In that context, the aligned normal vectors become functions, and it is proved that they are unbiased in the sense that they are functions of the true nonlinear sufficient predictors. We compare PSVM with other sufficient dimension reduction methods by simulation and in real data analysis; both comparisons establish its practical advantages.
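The linear PSVM recipe described above (slice the response, fit a support vector machine to each slice dichotomy, then extract principal components of the resulting normal vectors) can be illustrated with a rough numpy-only sketch. This is not the authors' exact modified SVM objective: the plain hinge-loss subgradient solver, the slicing quantiles, and the toy single-index model below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-index model: y depends on X only through beta'X.
n, p = 500, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = np.sin(X @ beta) + 0.1 * rng.standard_normal(n)

def hinge_svm(Z, t, lam=0.1, lr=0.05, iters=2000):
    """L2-regularized linear SVM fitted by subgradient descent (sketch only)."""
    n, p = Z.shape
    w, b = np.zeros(p), 0.0
    for _ in range(iters):
        mask = t * (Z @ w + b) < 1.0        # margin violators
        if mask.any():
            gw = lam * w - (t[mask, None] * Z[mask]).sum(0) / n
            gb = -t[mask].sum() / n
        else:
            gw, gb = lam * w, 0.0
        w, b = w - lr * gw, b - lr * gb
    return w

# Slice the response at several quantiles; each slicing gives a dichotomy,
# and each dichotomy yields one separating-hyperplane normal vector.
normals = [hinge_svm(X, np.where(y > np.quantile(y, q), 1.0, -1.0))
           for q in (0.25, 0.5, 0.75)]

# Align the normals by principal components: eigendecompose the sum of
# their outer products; leading eigenvectors estimate the SDR space.
M = sum(np.outer(w, w) for w in normals)
vals, vecs = np.linalg.eigh(M)
b_hat = vecs[:, -1]   # structural dimension 1 here, so keep one direction
```

In this toy run the leading eigenvector `b_hat` should align closely (up to sign) with the true direction `beta`, which is the unbiasedness property the summary refers to in the linear case.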


62H25 Factor analysis and principal components; correspondence analysis
68T05 Learning and adaptive systems in artificial intelligence
46N30 Applications of functional analysis in probability theory and statistics
62A09 Graphical methods in statistics
62G08 Nonparametric regression and quantile regression
62H12 Estimation in multivariate analysis


Full Text: DOI arXiv Euclid


[1] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337-404. · Zbl 0037.20701
[2] Artemiou, A. A. (2010). Topics on supervised and unsupervised dimension reduction. Ph.D. thesis, Pennsylvania State Univ., University Park, PA.
[3] Bickel, P., Klaassen, C. A. J., Ritov, Y. and Wellner, J. (1993). Efficient and Adaptive Inference in Semi-Parametric Models . Johns Hopkins Univ. Press, Baltimore. · Zbl 0786.62001
[4] Bura, E. and Pfeiffer, R. (2008). On the distribution of the left singular vectors of a random matrix and its applications. Statist. Probab. Lett. 78 2275-2280. · Zbl 1146.62011
[5] Conway, J. B. (1990). A Course in Functional Analysis , 2nd ed. Graduate Texts in Mathematics 96 . Springer, New York. · Zbl 0706.46003
[6] Cook, R. D. (1994). Using dimension-reduction subspaces to identify important inputs in models of physical systems. In Proc. Section on Physical and Engineering Sciences 18-25. Amer. Statist. Assoc., Alexandria, VA.
[7] Cook, R. D. (1996). Graphics for regressions with a binary response. J. Amer. Statist. Assoc. 91 983-992. · Zbl 0882.62060
[8] Cook, R. D. (1998). Regression Graphics : Ideas for Studying Regressions Through Graphics . Wiley, New York. · Zbl 0903.62001
[9] Cook, R. D. (2007). Fisher lecture: Dimension reduction in regression. Statist. Sci. 22 1-26. · Zbl 1246.62148
[10] Cook, R. D. and Forzani, L. (2008). Principal fitted components for dimension reduction in regression. Statist. Sci. 23 485-501. · Zbl 1329.62274
[11] Cook, R. D. and Li, B. (2002). Dimension reduction for conditional mean in regression. Ann. Statist. 30 455-474. · Zbl 1012.62035
[12] Cook, R. D. and Ni, L. (2005). Sufficient dimension reduction via inverse regression: A minimum discrepancy approach. J. Amer. Statist. Assoc. 100 410-428. · Zbl 1117.62312
[13] Cook, R. D. and Weisberg, S. (1991). Discussion of “Sliced inverse regression for dimension reduction,” by K.-C. Li. J. Amer. Statist. Assoc. 86 316-342. · Zbl 1353.62037
[14] Eaton, M. L. (1986). A characterization of spherical distributions. J. Multivariate Anal. 20 272-276. · Zbl 0596.62057
[15] Fukumizu, K., Bach, F. R. and Jordan, M. I. (2004). Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. J. Mach. Learn. Res. 5 73-99. · Zbl 1222.62069
[16] Fukumizu, K., Bach, F. R. and Jordan, M. I. (2009). Kernel dimension reduction in regression. Ann. Statist. 37 1871-1905. · Zbl 1168.62049
[17] Fung, W. K., He, X., Liu, L. and Shi, P. (2002). Dimension reduction based on canonical correlation. Statist. Sinica 12 1093-1113. · Zbl 1004.62058
[18] Gretton, A., Bousquet, O., Smola, A. and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. In 16th International Conference on Algorithmic Learning Theory (S. Jain, H. U. Simon and E. Tomita, eds.). Lecture Notes in Computer Science 3734 63-77. Springer, Berlin. · Zbl 1168.62354
[19] Hall, P. and Li, K.-C. (1993). On almost linearity of low-dimensional projections from high-dimensional data. Ann. Statist. 21 867-889. · Zbl 0782.62065
[20] Hsing, T. and Ren, H. (2009). An RKHS formulation of the inverse regression dimension-reduction problem. Ann. Statist. 37 726-755. · Zbl 1162.62053
[21] Jiang, B., Zhang, X. and Cai, T. (2008). Estimating the confidence interval for prediction errors of support vector machine classifiers. J. Mach. Learn. Res. 9 521-540. · Zbl 1225.68189
[22] Karatzoglou, A. and Meyer, D. (2006). Support vector machines in R. J. Stat. Softw. 15 9.
[23] Karatzoglou, A., Smola, A., Hornik, K. and Zeileis, A. (2004). kernlab: An S4 package for kernel methods in R. J. Stat. Softw. 11 9.
[24] Kurdila, A. J. and Zabarankin, M. (2005). Convex Functional Analysis . Birkhäuser, Basel. · Zbl 1077.46002
[25] Kutner, M. H., Nachtsheim, C. J. and Neter, J. (2004). Applied Linear Regression Models , 4th ed. McGraw-Hill/Irwin, Boston.
[26] Li, K.-C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Amer. Statist. Assoc. 86 316-342. · Zbl 0742.62044
[27] Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. J. Amer. Statist. Assoc. 87 1025-1039. · Zbl 0765.62003
[28] Li, B. (2000). Nonparametric estimating equations based on a penalized information criterion. Canad. J. Statist. 28 621-639. · Zbl 1072.62554
[29] Li, B. (2001). On quasi likelihood equations with non-parametric weights. Scand. J. Stat. 28 577-602. · Zbl 1010.62027
[30] Li, B. and Dong, Y. (2009). Dimension reduction for nonelliptically distributed predictors. Ann. Statist. 37 1272-1298. · Zbl 1160.62050
[31] Li, K.-C. and Duan, N. (1989). Regression analysis under link violation. Ann. Statist. 17 1009-1052. · Zbl 0753.62041
[32] Li, B. and Wang, S. (2007). On directional regression for dimension reduction. J. Amer. Statist. Assoc. 102 997-1008. · Zbl 1469.62300
[33] Li, B., Zha, H. and Chiaromonte, F. (2005). Contour regression: A general approach to dimension reduction. Ann. Statist. 33 1580-1616. · Zbl 1078.62033
[34] Li, Y. and Zhu, L.-X. (2007). Asymptotics for sliced average variance estimation. Ann. Statist. 35 41-69. · Zbl 1114.62053
[35] Loh, W.-Y. (2002). Regression trees with unbiased variable selection and interaction detection. Statist. Sinica 12 361-386. · Zbl 0998.62042
[36] Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: Some properties and applications. Ann. Statist. 7 381-394. · Zbl 0414.62040
[37] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461-464. · Zbl 0379.62005
[38] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3 . Cambridge Univ. Press, Cambridge. · Zbl 0910.62001
[39] Vapnik, V. N. (1998). Statistical Learning Theory . Wiley, New York. · Zbl 0935.62007
[40] Wang, Y. (2008). Nonlinear dimension reduction in feature space. Ph.D. thesis, Pennsylvania State Univ., University Park, PA.
[41] Wang, Q. and Yin, X. (2008). A nonlinear multi-dimensional variable selection method for high dimensional data: Sparse MAVE. Comput. Statist. Data Anal. 52 4512-4520. · Zbl 1452.62136
[42] Weidmann, J. (1980). Linear Operators in Hilbert Spaces. Graduate Texts in Mathematics 68 . Springer, New York. · Zbl 0434.47001
[43] Wu, H.-M. (2008). Kernel sliced inverse regression with applications to classification. J. Comput. Graph. Statist. 17 590-610.
[44] Wu, Q., Liang, F. and Mukherjee, S. (2008). Regularized sliced inverse regression for kernel models. Technical report, Duke Univ., Durham, NC.
[45] Xia, Y., Tong, H., Li, W. K. and Zhu, L.-X. (2002). An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 363-410. · Zbl 1091.62028
[46] Yeh, Y.-R., Huang, S.-Y. and Lee, Y.-Y. (2009). Nonlinear dimension reduction with kernel sliced inverse regression. IEEE Transactions on Knowledge and Data Engineering 21 1590-1603.
[47] Yin, X. and Cook, R. D. (2002). Dimension reduction for the conditional kth moment in regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 159-175. · Zbl 1067.62042
[48] Yin, X., Li, B. and Cook, R. D. (2008). Successive direction extraction for estimating the central subspace in a multiple-index regression. J. Multivariate Anal. 99 1733-1757. · Zbl 1144.62030
[49] Zhu, L., Miao, B. and Peng, H. (2006). On sliced inverse regression with high-dimensional covariates. J. Amer. Statist. Assoc. 101 630-643. · Zbl 1119.62331
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.