
Primal and dual model representations in kernel-based learning. (English) Zbl 06162223

Summary: This paper discusses the role of primal and (Lagrange) dual model representations in problems of supervised and unsupervised learning. The estimation problem is specified at the primal level as a constrained optimization problem, where the constraints relate to the model expressed in terms of the feature map. From the conditions for optimality one jointly obtains the optimal model representation and the model estimate. At the dual level the model is expressed in terms of a positive definite kernel function, which is characteristic of support vector machine methodology. It is discussed how least squares support vector machines play a central role as core models across problems of regression, classification, principal component analysis, spectral clustering, canonical correlation analysis, dimensionality reduction and data visualization.
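As a concrete instance of this primal-dual pair, consider the standard LS-SVM classifier of [74], [77], sketched here for illustration. At the primal level one solves
\[
\min_{w,b,e}\ \frac{1}{2}w^\top w + \frac{\gamma}{2}\sum_{i=1}^{N} e_i^2
\quad\text{subject to}\quad y_i\big(w^\top\varphi(x_i)+b\big) = 1 - e_i,\quad i=1,\dots,N,
\]
with feature map \(\varphi\). Eliminating \(w\) and \(e\) through the conditions for optimality leaves the dual linear system
\[
\begin{bmatrix} 0 & y^\top \\ y & \Omega + \gamma^{-1}I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1_N \end{bmatrix},
\qquad
\Omega_{ij} = y_i y_j\, K(x_i,x_j),
\]
where the positive definite kernel \(K(x,z)=\varphi(x)^\top\varphi(z)\) replaces the feature map and the classifier takes the dual form \(\hat{y}(x)=\mathrm{sign}\big(\sum_{i=1}^{N}\alpha_i y_i K(x,x_i)+b\big)\).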

MSC:

62-XX Statistics

Software:

ElemStatLearn

References:

[1] Alzate C. and Suykens J.A.K. (2008). “A Regularized Kernel CCA Contrast Function for ICA”, Neural Networks , 21 (2-3), 170-181. · Zbl 1254.68195
[2] Alzate C. and Suykens J.A.K. (2008). “Kernel Component Analysis using an Epsilon Insensitive Robust Loss Function”, IEEE Transactions on Neural Networks , 19 (9), 1583-1598.
[3] Alzate C. and Suykens J.A.K. (2008). “Sparse Kernel Models for Spectral Clustering using the Incomplete Cholesky Decomposition”, IEEE World Congress on Computational Intelligence (WCCI-IJCNN 2008), Hong Kong, pp. 3555-3562.
[4] Alzate C. and Suykens J.A.K. (2010). “Multiway Spectral Clustering with Out-of-Sample Extensions through Weighted Kernel PCA”, IEEE Transactions on Pattern Analysis and Machine Intelligence , 32 (2): 335-347.
[5] Alzate C. and Suykens J.A.K. (2009). “A Regularized Formulation for Spectral Clustering with Pairwise Constraints”, International Joint Conference on Neural Networks (IJCNN 2009), Atlanta, 141-148.
[6] Alzate C. and Suykens J.A.K. (2010). “Highly Sparse Kernel Spectral Clustering with Predictive Out-of-Sample Extensions”, Proc. of the 18th European Symposium on Artificial Neural Networks (ESANN 2010), Bruges, Belgium, pp. 235-240.
[7] Aronszajn N. (1950). “Theory of reproducing kernels”, Trans. American Mathematical Soc. , 68 , 337-404. · Zbl 0037.20701
[8] Bach F.R. and Jordan M.I. (2002). “Kernel independent component analysis”, Journal of Machine Learning Research , 3 , 1-48. · Zbl 1088.68689
[9] Belkin M. and Niyogi P. (2003). “Laplacian Eigenmaps for Dimensionality Reduction and Data Representation”, Neural Computation , 15 (6): 1373-1396. · Zbl 1085.68119
[10] Belkin M., Niyogi P. and Sindhwani V. (2006). “Manifold Regularization: a Geometric Framework for Learning from Labeled and Unlabeled Examples”, Journal of Machine Learning Research , 7 : 2399-2434. · Zbl 1222.68144
[11] Bengio Y., Paiement J.-F., Vincent P., Delalleau O., Le Roux N. and Ouimet M. (2004). “Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering”, Advances in Neural Information Processing Systems , 16.
[12] Bissacco A., Chiuso A. and Soatto S. (2007). “Classification and Recognition of Dynamical Models: The Role of Phase, Independent Components, Kernels and Optimal Transport”, IEEE Transactions on Pattern Analysis and Machine Intelligence , 29 (11): 1958-1972.
[13] Bousquet O. and Elisseeff A. (2002). “Stability and Generalization”, Journal of Machine Learning Research , 2 , 499-526. · Zbl 1007.68083
[14] Boyd S. and Vandenberghe L. (2004)., Convex Optimization , Cambridge University Press. · Zbl 1058.90049
[15] Bradley P.S. and Mangasarian O.L. (1998). “Feature Selection via Concave Minimization and Support Vector Machines”, Machine Learning Proceedings of the Fifteenth International Conference (ICML98) , (Ed. J. Shavlik), Morgan Kaufmann, San Francisco, California, 82-90.
[16] Chapelle O., Schölkopf B. and Zien A. (Eds.) (2006)., Semi-Supervised Learning , MIT Press.
[17] Chung F.R.K. (1997)., Spectral Graph Theory , CBMS Regional Conference Series in Mathematics, 92. · Zbl 0867.05046
[18] Coifman R.R. and Lafon S. (2006). “Diffusion maps”, Applied and Computational Harmonic Analysis , 21 (1), 5-30. · Zbl 1095.68094
[19] Cortes C. and Vapnik V. (1995). “Support-vector networks”, Machine Learning , 20 , 273-297. · Zbl 0831.68098
[20] Cressie N. (1993)., Statistics for Spatial Data , John Wiley & Sons, New York. · Zbl 0799.62002
[21] Cristianini N. and Shawe-Taylor J. (2000)., An Introduction to Support Vector Machines , Cambridge University Press. · Zbl 0994.68074
[22] Cucker F. and Smale S. (2002). “On the mathematical foundations of learning theory”, Bulletin of the AMS , 39 , 1-49. · Zbl 0983.68162
[23] Cucker F. and Zhou D.-X. (2007)., Learning Theory: an Approximation Theory Viewpoint , Cambridge University Press. · Zbl 1274.41001
[24] Daubechies I., DeVore R., Fornasier M. and Gunturk S. (2010). “Iteratively re-weighted least squares minimization for sparse recovery”, Communications on Pure and Applied Mathematics , 63 (1): 1-38. · Zbl 1202.65046
[25] De Brabanter K., Pelckmans K., De Brabanter J., Debruyne M., Suykens J.A.K., Hubert M. and De Moor B. (2009) “Robustness of Kernel Based Regression: a Comparison of Iterative Weighting Schemes”, Proc. of the 19th International Conference on Artificial Neural Networks (ICANN 2009) , Limassol, Cyprus, pp. 100-110.
[26] De Brabanter K., De Brabanter J., Suykens J.A.K. and De Moor B. (2010), “Optimized Fixed-Size Kernel Models for Large Data Sets”, Computational Statistics & Data Analysis , 54 (6): 1484-1504. · Zbl 1284.62369
[27] Debruyne M., Hubert M. and Suykens J.A.K. (2008). “Model selection for kernel regression using the influence function”, Journal of Machine Learning Research , 9 , 2377-2400. · Zbl 1225.62051
[28] Debruyne M., Christmann A., Hubert M. and Suykens J.A.K. (2010). “Robustness of reweighted least squares kernel based regression”, Journal of Multivariate Analysis , 101 (2): 447-463. · Zbl 1178.62035
[29] Donoho D.L. and Grimes C. (2003). “Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data”, Proc. Natl. Acad. Sci. U.S.A. , 100 (10): 5591-5596. · Zbl 1130.62337
[30] Espinoza M., Suykens J.A.K. and De Moor B. (2006). “LS-SVM Regression with Autocorrelated Errors”, Proc. of the 14th IFAC Symposium on System Identification (SYSID 2006), Newcastle, Australia, pp. 582-587.
[31] Espinoza M., Suykens J.A.K., Belmans R. and De Moor B. (2007). “Electric Load Forecasting using kernel based modeling for nonlinear system identification”, IEEE Control Systems Magazine , 27 (5), 43-57. · Zbl 1395.93417
[32] Falck T., Pelckmans K., Suykens J.A.K. and De Moor B. (2009). “Identification of Wiener-Hammerstein Systems using LS-SVMs”, Proceedings of the 15th IFAC Symposium on System Identification (SYSID 2009) , Saint-Malo, France, pp. 820-825.
[33] Fung G. and Mangasarian O.L. (2001). “Proximal support vector machine classifiers”, Proceedings KDD-2001 , 77-86. · Zbl 1101.68758
[34] Fung G. and Mangasarian O.L. (2005). “Multicategory proximal support vector machine classifiers”, Machine Learning , 59 : 77-97. · Zbl 1101.68758
[35] Girolami M. (2002). “Orthogonal series density estimation and the kernel eigenvalue problem”, Neural Computation , 14 (3), 669-688. · Zbl 0986.62021
[36] Golub G.H., Heath M. and Wahba G. (1979). “Generalized cross-validation as a method for choosing a good ridge regression parameter”, Technometrics , 21 , 215-223. · Zbl 0461.62059
[37] Goethals I., Pelckmans K., Suykens J.A.K. and De Moor B. (2005). “Identification of MIMO Hammerstein Models using Least Squares Support Vector Machines”, Automatica , 41 (7), 1263-1272. · Zbl 1086.93064
[38] Goethals I., Pelckmans K., Suykens J.A.K. and De Moor B. (2005). “Subspace Identification of Hammerstein Systems using Least Squares Support Vector Machines”, IEEE Transactions on Automatic Control , 50 (10), 1509-1519. · Zbl 1086.93064
[39] Gretton A., Herbrich R., Smola A., Bousquet O. and Schölkopf B. (2005). “Kernel Methods for Measuring Independence”, Journal of Machine Learning Research , 6 , 2075-2129. · Zbl 1222.68208
[40] Hastie T. and Tibshirani R. (1990)., Generalized Additive Models , Chapman & Hall/CRC. · Zbl 0747.62061
[41] Hastie T., Tibshirani R. and Friedman J. (2001)., The Elements of Statistical Learning: Data Mining, Inference, and Prediction , Springer series in statistics. · Zbl 0973.62007
[42] Huber P.J. (1981)., Robust Statistics , Wiley, New York. · Zbl 0536.62025
[43] Jetter K., Smale S. and Zhou D.-X. (Eds.) (2008)., Learning Theory and Approximation , Oberwolfach Reports, 5 (3): 1655-1706. · Zbl 1177.68114
[44] Jolliffe I.T. (1986)., Principal Component Analysis , Springer Series in Statistics, Springer-Verlag. · Zbl 0584.62009
[45] Kimeldorf G.S. and Wahba G. (1970). “A correspondence between Bayesian estimation on stochastic processes and smoothing by splines”, Ann. Math. Statist. , 41 (2), 495-502. · Zbl 0193.45201
[46] Kohonen T. (1990). “The self-organizing map”, Proceedings of the IEEE , 78 (9): 1464-1480.
[47] Luenberger D.G. (1969)., Optimization by Vector Space Methods , Wiley, New York. · Zbl 0176.12701
[48] Luts J., Suykens J.A.K. and Van Huffel S. (2007). “Semi-supervised learning: avoiding zero label assumptions in kernel based classifiers”, Internal Report 07-122, ESAT-SISTA, K.U.Leuven (Leuven, Belgium).
[49] Luts J., Ojeda F., Van de Plas R., De Moor B., Van Huffel S. and Suykens J.A.K. (2010). “A tutorial on support vector machine-based methods for classification problems in chemometrics”, Analytica Chimica Acta , 665 (2): 129-145.
[50] MacKay D.J.C. (1998). “Introduction to Gaussian processes” in, Neural networks and machine learning (Ed. C.M. Bishop), Springer NATO-ASI Series F: Computer and Systems Sciences, Vol.168, 133-165.
[51] Mangasarian O.L. and Wild E.W. (2008). “Nonlinear Knowledge-Based Classification”, IEEE Transactions on Neural Networks , 19 , 1826-1832.
[52] Meila M. and Shi J. (2001). “A random walks view of spectral segmentation”, Artificial Intelligence and Statistics (AISTATS 2001).
[53] Mercer J. (1909). “Functions of positive and negative type and their connection with the theory of integral equations”, Philos. Trans. Roy. Soc. London , 209 , 415-446. · JFM 40.0408.02
[54] Ng A.Y., Jordan M.I. and Weiss Y. (2002). “On spectral clustering: Analysis and an algorithm”, in T. Dietterich, S. Becker and Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems (NIPS), 14.
[55] Ojeda F., Suykens J.A.K. and De Moor B. (2008). “Low rank updated LS-SVM classifiers for fast variable selection”, Neural Networks , 21 (2-3), 437-449. · Zbl 1254.68218
[56] Parzen E. (1970). “Statistical inference on time series by RKHS methods”, Dep. Statist., Stanford Univ., Tech. Rep. 14, Jan. · Zbl 0253.60053
[57] Pelckmans K., Espinoza M., De Brabanter J., Suykens J.A.K. and De Moor B. (2005). “Primal-Dual Monotone Kernel Regression”, Neural Processing Letters , 22 (2), 171-182.
[58] Pelckmans K., Goethals I., De Brabanter J., Suykens J.A.K. and De Moor B. (2005). “Componentwise Least Squares Support Vector Machines”, Chapter in, Support Vector Machines: Theory and Applications , (Wang L., ed.), Springer, pp. 77-98. · Zbl 1086.93064
[59] Pelckmans K., Suykens J.A.K. and De Moor B. (2005). “Building Sparse Representations and Structure Determination on LS-SVM Substrates”, Neurocomputing , 64 , 137-159.
[60] Pelckmans K., Suykens J.A.K. and De Moor B. (2006). “Additive Regularization Trade-off: Fusion of Training and Validation levels in Kernel Methods”, Machine Learning , 62 (3), 217-252.
[61] Perez-Cruz F., Bousono-Calzon C. and Artes-Rodriguez A. (2005). “Convergence of the IRWLS Procedure to the Support Vector Machine Solution”, Neural Computation , 17 (1), 7-18. · Zbl 1092.68649
[62] Poggio T. and Girosi F. (1990). “Networks for approximation and learning”, Proceedings of the IEEE , 78 (9), 1481-1497. · Zbl 1226.92005
[63] Poggio T., Rifkin R., Mukherjee S. and Niyogi P. (2004). “General conditions for predictivity in learning theory”, Nature , 428 (6981), 419-422.
[64] Rasmussen C.E. and Williams C.K.I. (2006)., Gaussian Processes for Machine Learning , MIT Press. · Zbl 1177.68165
[65] Rousseeuw P.J. and Leroy A. (1997)., Robust Regression and Outlier Detection , John Wiley & Sons, New York. · Zbl 0711.62030
[66] Roweis S. and Saul L. (2000). “Nonlinear dimensionality reduction by locally linear embedding”, Science , 290 (5500), 2323-2326.
[67] Saunders C., Gammerman A. and Vovk V. (1998). “Ridge regression learning algorithm in dual variables”, Proc. of the 15th Int. Conf. on Machine Learning , Madison, Wisconsin, 515-521.
[68] Schölkopf B., Smola A. and Müller K.-R. (1998). “Nonlinear component analysis as a kernel eigenvalue problem”, Neural Computation , 10 , 1299-1319.
[69] Schölkopf B. and Smola A. (2002)., Learning with Kernels , MIT Press, Cambridge, MA. · Zbl 1019.68094
[70] Schölkopf B., Tsuda K. and Vert J.P. (Eds.) (2004)., Kernel Methods in Computational Biology , MIT Press.
[71] Shawe-Taylor J. and Cristianini N. (2004)., Kernel Methods for Pattern Analysis , Cambridge University Press. · Zbl 0994.68074
[72] Shi J. and Malik J. (2000). “Normalized Cuts and Image Segmentation”, IEEE Transactions on Pattern Analysis and Machine Intelligence , 22 (8), 888-905.
[73] Smale S. (1997). “Complexity theory and numerical analysis”, Acta Numerica , 6 , 523-551. · Zbl 0883.65125
[74] Suykens J.A.K. and Vandewalle J. (1999). “Least squares support vector machine classifiers”, Neural Processing Letters , 9 (3), 293-300. · Zbl 0958.93042
[75] Suykens J.A.K. and Vandewalle J. (1999). “Multiclass Least Squares Support Vector Machines”, Proc. of the International Joint Conference on Neural Networks (IJCNN’99) , Washington DC, USA. · Zbl 0958.93042
[76] Suykens J.A.K., De Brabanter J., Lukas L. and Vandewalle J. (2002). “Weighted least squares support vector machines: robustness and sparse approximation”, Neurocomputing , 48 (1-4), 85-105. · Zbl 1006.68799
[77] Suykens J.A.K., Van Gestel T., De Brabanter J., De Moor B. and Vandewalle J. (2002)., Least Squares Support Vector Machines , World Scientific, Singapore. · Zbl 1017.93004
[78] Suykens J.A.K., Horvath G., Basu S., Micchelli C. and Vandewalle J. (Eds.) (2003)., Advances in Learning Theory: Methods, Models and Applications , vol. 190, NATO-ASI Series III: Computer and Systems Sciences, IOS Press.
[79] Suykens J.A.K., Van Gestel T., Vandewalle J. and De Moor B. (2003). “A support vector machine formulation to PCA analysis and its kernel version”, IEEE Transactions on Neural Networks , 14(2): 447-450.
[80] Suykens J.A.K. (2008). “Data Visualization and Dimensionality Reduction using Kernel Maps with a Reference Point”, IEEE Transactions on Neural Networks , 19 (9), 1501-1517.
[81] Tibshirani R. (1996). “Regression shrinkage and selection via the lasso”, J. Royal Statist. Soc. B , 58 (1), 267-288. · Zbl 0850.62538
[82] Tsuda K., Shin H.J. and Schölkopf B. (2005). “Fast protein classification with multiple networks”, Bioinformatics (ECCB’05), 21 (Suppl. 2): ii59-ii65.
[83] Van Belle V., Pelckmans K., Suykens J.A.K. and Van Huffel S. (2010). “Additive survival least squares support vector machines”, Statistics in Medicine , 29 (2): 296-308.
[84] Van Gestel T., Suykens J.A.K., Lanckriet G., Lambrechts A., De Moor B. and Vandewalle J. (2002). “Multiclass LS-SVMs: Moderated outputs and coding-decoding schemes”, Neural Processing Letters , 15 (1): 45-48. · Zbl 1008.68739
[85] Vapnik V. (1998)., Statistical Learning Theory, Wiley, New York. · Zbl 0935.62007
[86] Wahba G. (1990)., Spline Models for Observational Data , CBMS-NSF Regional Conference Series in Applied Mathematics, 59, SIAM, Philadelphia. · Zbl 0813.62001
[87] Weinberger K.Q. and Saul L.K. (2004). “Unsupervised learning of image manifolds by semidefinite programming”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-04), Washington D.C.
[88] Williams C.K.I. and Seeger M. (2001). “Using the Nyström method to speed up kernel machines”, In T.K. Leen, T.G. Dietterich, and V. Tresp (Eds.), Advances in neural information processing systems , 13 , 682-688.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.