Nonlinear kernel density principal component analysis with application to climate data. (English) Zbl 1342.62102

Summary: Principal component analysis (PCA) is a well-established tool for identifying the main sources of variation in multivariate data. However, as a linear method it cannot describe complex nonlinear structures. To overcome this limitation, a novel nonlinear generalization of PCA is developed in this paper. The method obtains the nonlinear principal components from ridges of the underlying density of the data. The density is estimated using Gaussian kernels. Projection onto a ridge of such a density estimate is formulated as the solution of a differential equation, and a predictor-corrector method is developed for this purpose. The method is further extended to time series data by applying it to the phase space representation of the time series. This extension can be viewed as a nonlinear generalization of singular spectrum analysis (SSA). The ability of the nonlinear PCA to capture complex nonlinear shapes, and of its SSA-based extension to identify periodic patterns in time series, is demonstrated on climate data.
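
To make the idea of projecting a point onto a ridge of a Gaussian kernel density estimate concrete, the following is a minimal sketch only. It does not reproduce the predictor-corrector method of the paper; it swaps in the simpler subspace-constrained mean shift iteration known from the ridge estimation literature (cf. references [10] and [27]). The function name `project_to_ridge`, the isotropic bandwidth `h`, and the toy data set are assumptions made for illustration.

```python
import numpy as np

def project_to_ridge(x0, data, h, ridge_dim=1, tol=1e-10, max_iter=500):
    """Move x0 onto a ridge of an isotropic Gaussian KDE (illustrative sketch).

    Uses subspace-constrained mean shift, NOT the paper's predictor-corrector
    scheme. `h` is an assumed scalar bandwidth.
    """
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    for _ in range(max_iter):
        diff = data - x                                  # rows are x_i - x
        w = np.exp(-0.5 * np.sum(diff**2, axis=1) / h**2)  # kernel weights
        # Hessian of the density estimate, up to a positive constant factor.
        hess = (diff.T * w) @ diff / h**4 - w.sum() * np.eye(d) / h**2
        # eigh returns eigenvalues in ascending order; keep the eigenvectors
        # of the d - ridge_dim smallest eigenvalues (directions across the ridge).
        _, vecs = np.linalg.eigh(hess)
        V = vecs[:, : d - ridge_dim]
        # Mean shift vector, which is proportional to the density gradient.
        shift = w @ data / w.sum() - x
        # Constrain the step to the subspace spanned by V.
        step = V @ (V.T @ shift)
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x

# Toy data: two parallel rows of points, symmetric about the line y = 0,
# so the one-dimensional ridge of the density estimate lies on y = 0.
t = np.linspace(-3.0, 3.0, 61)
data = np.concatenate([
    np.column_stack([t, np.full_like(t, 0.1)]),
    np.column_stack([t, np.full_like(t, -0.1)]),
])
ridge_point = project_to_ridge([0.2, 0.3], data, h=0.5)
print(ridge_point)
```

In this toy setting the starting point is pulled onto the symmetry axis of the data while its position along the axis is essentially unchanged. The bandwidth must be large enough relative to the starting distance for the Hessian to have a negative eigenvalue across the ridge.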

MSC:

62H25 Factor analysis and principal components; correspondence analysis
62H11 Directional data; spatial statistics
62G07 Density estimation
62P12 Applications of statistics to environmental and related topics
62H35 Image analysis in multivariate analysis
86A32 Geostatistics

References:

[1] Berkooz, G., Holmes, P., Lumley, J.L.: The proper orthogonal decomposition in the analysis of turbulent flows. Annu. Rev. Fluid Mech. 25, 539-575 (1993) · doi:10.1146/annurev.fl.25.010193.002543
[2] Chacón, J.E., Duong, T., Wand, M.P.: Asymptotics for general multivariate kernel density derivative estimators. Stat. Sin. 21, 807-840 (2011) · Zbl 1214.62039 · doi:10.5705/ss.2011.036a
[3] Christiansen, B.: The shortcomings of nonlinear principal component analysis in identifying circulation regimes. J. Clim. 18(22), 4814-4823 (2005) · doi:10.1175/JCLI3569.1
[4] Damon, J.: Generic structure of two-dimensional images under Gaussian blurring. SIAM J. Appl. Math. 59(1), 97-138 (1998) · Zbl 0914.68206 · doi:10.1137/S0036139997318032
[5] Delworth, T.L., Broccoli, A.J., Rosati, A., Stouffer, R.J., Balaji, V.: GFDL’s CM2 global coupled climate models. Part I: formulation and simulation characteristics. J. Clim. 19(5), 643-674 (2006) · doi:10.1175/JCLI3629.1
[6] Donoho, D.L., Grimes, C.: Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc. Natl. Acad. Sci. 100(10), 5591-5596 (2003) · Zbl 1130.62337 · doi:10.1073/pnas.1031596100
[7] Duong, T.: ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R. J. Stat. Softw. 21(7), 1-16 (2007) · doi:10.18637/jss.v021.i07
[8] Einbeck, J., Tutz, G., Evers, L.: Local principal curves. Stat. Comput. 15(4), 301-313 (2005) · doi:10.1007/s11222-005-4073-8
[9] Einbeck, J., Evers, L., Bailer-Jones, C.: Representing complex data using localized principal components with application to astronomical data. In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (eds.) Principal Manifolds for Data Visualization and Dimension Reduction, volume 58 of Lecture Notes in Computational Science and Engineering, pp. 178-201. Springer, Berlin, Heidelberg (2008)
[10] Genovese, C.R., Perone-Pacifico, M., Verdinelli, I., Wasserman, L.: Nonparametric ridge estimation. Ann. Stat. 42(4), 1511-1545 (2014) · Zbl 1310.62045 · doi:10.1214/14-AOS1218
[11] Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins University Press, Baltimore (1996) · Zbl 0865.65009
[12] Golyandina, N., Nekrutkin, V., Zhigljavsky, A.A.: Analysis of Time Series Structure: SSA and Related Techniques. Chapman and Hall/CRC Press, London (2001) · Zbl 0978.62073 · doi:10.1201/9781420035841
[13] Greengard, L., Strain, J.: The fast Gauss transform. SIAM J. Sci. Stat. Comput. 12(1), 79-94 (1991) · Zbl 0721.65089 · doi:10.1137/0912004
[14] Higham, N.J.: Functions of Matrices: Theory and Computation. SIAM, Philadelphia (2008) · Zbl 1167.15001 · doi:10.1137/1.9780898717778
[15] Hsieh, W.W.: Nonlinear multivariate and time series analysis by neural network methods. Rev. Geophys. 42(1), 1-25 (2004) · doi:10.1029/2002RG000112
[16] Hsieh, W.W., Hamilton, K.: Nonlinear singular spectrum analysis of the tropical stratospheric wind. Quart. J. R. Meteorol. Soc. 129(592), 2367-2382 (2003) · doi:10.1256/qj.01.158
[17] Jaromczyk, J.W., Toussaint, G.T.: Relative neighborhood graphs and their relatives. Proc. IEEE 80(9), 1502-1517 (1992) · doi:10.1109/5.163414
[18] Jolliffe, I.T.: Principal Component Analysis. Springer-Verlag, Berlin (1986) · Zbl 1011.62064 · doi:10.1007/978-1-4757-1904-8
[19] Kambhatla, N., Leen, K.T.: Dimension reduction by local principal component analysis. Neural Comput. 9(7), 1493-1516 (1997) · doi:10.1162/neco.1997.9.7.1493
[20] Kramer, M.A.: Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 37(2), 233-243 (1991) · doi:10.1002/aic.690370209
[21] Loève, M.: Probability Theory: Foundations, Random Sequences. van Nostrand, Princeton (1955) · Zbl 0066.10903
[22] Magnus, J.R.: On differentiating eigenvalues and eigenvectors. Econ. Theory 1(2), 179-191 (1985) · doi:10.1017/S0266466600011129
[23] Miller, J.: Relative critical sets in \(\mathbb{R}^n\) and applications to image analysis. PhD thesis, University of North Carolina (1998)
[24] Monahan, A.H.: Nonlinear principal component analysis: tropical Indo-Pacific sea surface temperature and sea level pressure. J. Clim. 14(2), 219-233 (2001) · doi:10.1175/1520-0442(2001)013<0219:NPCATI>2.0.CO;2
[25] Newbigging, S.C., Mysak, L.A., Hsieh, W.W.: Improvements to the non-linear principal component analysis method, with applications to ENSO and QBO. Atmos.-Ocean 41(4), 291-299 (2003) · doi:10.3137/ao.410403
[26] Ortega, J.M.: Numerical Analysis: A Second Course. SIAM, Philadelphia (1990) · Zbl 0701.65002 · doi:10.1137/1.9781611971323
[27] Ozertem, U., Erdogmus, D.: Locally defined principal curves and surfaces. J. Mach. Learn. Res. 12, 1249-1286 (2011) · Zbl 1280.62071
[28] Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos. Mag. Ser. 6 2(11), 559-572 (1901) · JFM 32.0710.04 · doi:10.1080/14786440109462720
[29] Pulkkinen, S.: Ridge-based method for finding curvilinear structures from noisy data. Comput. Stat. Data Anal. 82, 89-109 (2015) · Zbl 1507.62147 · doi:10.1016/j.csda.2014.08.007
[30] Pulkkinen, S., Mäkelä, M.M., Karmitsa, N.: A generative model and a generalized trust region Newton method for noise reduction. Comput. Optim. Appl. 57(1), 129-165 (2014) · Zbl 1306.62134 · doi:10.1007/s10589-013-9581-4
[31] Rangayyan, R.M.: Biomedical Signal Analysis: A Case-Study Approach. IEEE Press, New York (2002)
[32] Renardy, M., Rogers, R.C.: An Introduction to Partial Differential Equations. In: Marsden, J.E., Sirovich, L., Antman, S.S. (eds.) Texts in Applied Mathematics, vol. 13. Springer, New York (2004) · Zbl 1072.35001
[33] Ross, I.: Nonlinear dimensionality reduction methods in climate data analysis. PhD thesis, University of Bristol, United Kingdom (2008)
[34] Ross, I., Valdes, P.J., Wiggins, S.: ENSO dynamics in current climate models: an investigation using nonlinear dimensionality reduction. Nonlinear Process. Geophys. 15, 339-363 (2008) · doi:10.5194/npg-15-339-2008
[35] Schölkopf, B., Smola, A., Müller, K.-R.: Kernel principal component analysis. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, J.-D. (eds.) Artificial Neural Networks-ICANN’97, volume 1327 of Lecture Notes in Computer Science, pp. 583-588. Springer, Berlin (1997)
[36] Scholz, M., Kaplan, F., Guy, C.L., Kopka, J., Selbig, J.: Non-linear PCA: a missing data approach. Bioinformatics 21(20), 3887-3895 (2005) · Zbl 1426.76561 · doi:10.1093/bioinformatics/bti634
[37] Scholz, M., Fraunholz, M., Selbig, J.: Nonlinear principal component analysis: Neural network models and applications. In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (eds.) Principal Manifolds for Data Visualization and Dimension Reduction, volume 58 of Lecture Notes in Computational Science and Engineering, pp. 44-67. Springer, Berlin (2008)
[38] Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319-2323 (2000) · Zbl 0955.37025 · doi:10.1126/science.290.5500.2319
[39] Tipping, M.E., Bishop, C.M.: Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B 61(3), 611-622 (1999) · Zbl 0924.62068 · doi:10.1111/1467-9868.00196
[40] Vautard, R., Yiou, P., Ghil, M.: Singular-spectrum analysis: a toolkit for short, noisy chaotic signals. Physica D 58(1-4), 95-126 (1992) · doi:10.1016/0167-2789(92)90103-T
[41] Weare, B.C., Navato, A.R., Newell, E.R.: Empirical orthogonal analysis of Pacific sea surface temperatures. J. Phys. Oceanogr. 6(5), 671-678 (1976) · doi:10.1175/1520-0485(1976)006<0671:EOAOPS>2.0.CO;2
[42] Weinberger, K., Saul, L.: Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vis. 70(1), 77-90 (2006) · doi:10.1007/s11263-005-4939-z
[43] Whittlesey, E.F.: Fixed points and antipodal points. Am. Math. Mon. 70(8), 807-821 (1963) · Zbl 0122.41701
[44] Yang, C., Duraiswami, R., Gumerov, N.A., Davis, L.: Improved fast Gauss transform and efficient kernel density estimation. In: Ninth IEEE International Conference on Computer Vision, volume 1, pp. 664-671. Nice, France (2003)
[45] Zhang, Z., Zha, H.: Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput. 26(1), 313-338 (2004) · Zbl 1077.65042 · doi:10.1137/S1064827502419154
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.