zbMATH — the first resource for mathematics

Selecting the length of a principal curve within a Gaussian model. (English) Zbl 1337.62074
Summary: Principal curves are parameterized curves passing “through the middle” of a data cloud. They generalize the notion of the first principal component in Principal Component Analysis. Several definitions of principal curves have been proposed, one of which can be expressed as a least-squares minimization problem. In the present paper, adopting this definition, we study a Gaussian model selection method for choosing the length of the principal curve, in order to avoid interpolation, and obtain a related oracle-type inequality. The proposed method is implemented in practice and illustrated on cartography problems.
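Penalized model selection of this kind typically scores a candidate curve f by the least-squares criterion, the mean squared distance from the data points X_1, ..., X_n to f, and then picks the length L minimizing crit(L) = (1/n) Σ_i dist(X_i, f_L)² + pen(L). The Python sketch below illustrates only that trade-off, under assumptions that are not taken from the paper: curves are polygonal lines in the plane, the candidate curves are written by hand rather than fitted under a length constraint, and the penalty pen(L) = kappa * L with a hand-picked constant kappa is a hypothetical stand-in for the penalty derived in the Gaussian model selection framework. All function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Polyline = Sequence[Point]  # vertices of a polygonal curve (at least two)


def sq_dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Squared Euclidean distance from p to the segment [a, b]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_sq = dx * dx + dy * dy
    t = 0.0
    if seg_sq > 0.0:
        # Projection parameter, clamped so the foot of the projection stays on the segment.
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_sq
        t = max(0.0, min(1.0, t))
    qx, qy = a[0] + t * dx, a[1] + t * dy
    return (p[0] - qx) ** 2 + (p[1] - qy) ** 2


def curve_length(curve: Polyline) -> float:
    """Total length of the polygonal line."""
    return sum(math.dist(curve[i], curve[i + 1]) for i in range(len(curve) - 1))


def empirical_risk(points: Sequence[Point], curve: Polyline) -> float:
    """Least-squares criterion: mean squared distance from the data to the curve."""
    total = 0.0
    for p in points:
        total += min(sq_dist_to_segment(p, curve[i], curve[i + 1])
                     for i in range(len(curve) - 1))
    return total / len(points)


def select_curve(points: Sequence[Point], candidates: Sequence[Polyline],
                 kappa: float) -> int:
    """Index of the candidate minimizing risk + kappa * length.

    The linear penalty kappa * length is a hypothetical choice for illustration,
    not the penalty from the paper.
    """
    def crit(curve: Polyline) -> float:
        return empirical_risk(points, curve) + kappa * curve_length(curve)
    return min(range(len(candidates)), key=lambda k: crit(candidates[k]))


# Toy data: a noisy "V" (y = |x| with alternating +/-0.1 perturbations).
xs = [-1.0 + 0.25 * i for i in range(9)]
data = [(x, abs(x) + (0.1 if i % 2 == 0 else -0.1)) for i, x in enumerate(xs)]

candidates = [
    [(-1.0, 0.5), (1.0, 0.5)],              # short: a horizontal segment
    [(-1.0, 1.0), (0.0, 0.0), (1.0, 1.0)],  # moderate: follows the V
    list(data),                             # long: interpolates every data point
]

for kappa in (0.0, 0.05, 1.0):
    print(kappa, select_curve(data, candidates, kappa))
```

With kappa = 0 the interpolating polyline (index 2) has zero risk and wins, with kappa = 1.0 the short horizontal segment (index 0) wins, and with kappa = 0.05 the V-shaped curve (index 1) is selected, so the loop prints `0.0 2`, `0.05 1` and `1.0 0`. This mirrors the motivation stated in the summary: without a length penalty, the least-squares criterion favors interpolation.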

MSC:
62G08 Nonparametric regression and quantile regression
62G05 Nonparametric estimation
62H25 Factor analysis and principal components; correspondence analysis
Software: CAPUSHE