
On Mahalanobis distance in functional settings. (English) Zbl 1497.62372

Summary: Mahalanobis distance is a classical tool in multivariate analysis. We suggest here an extension of this concept to the case of functional data. More precisely, the proposed definition concerns those statistical problems where the sample data are real functions defined on a compact interval of the real line. The obvious difficulty for such a functional extension is the non-invertibility of the covariance operator in infinite-dimensional cases. Unlike other recent proposals, our definition is suggested and motivated in terms of the Reproducing Kernel Hilbert Space (RKHS) associated with the stochastic process that generates the data. The proposed distance is a true metric; it depends on a single real smoothing parameter which is fully motivated in RKHS terms. Moreover, it shares some properties of its finite-dimensional counterpart: it is invariant under isometries, it can be consistently estimated from the data, and its sampling distribution is known under Gaussian models. An empirical study for two statistical applications, outlier detection and binary classification, is included. The results are quite competitive when compared to other recent proposals in the literature.
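The non-invertibility issue mentioned in the summary can be illustrated numerically. The sketch below is not taken from the paper: the function names, the simulated Brownian-type curves, and in particular the regularized form $\sum_j \lambda_j (\lambda_j + \alpha)^{-2} \langle x - m, e_j \rangle^2$ are assumptions, chosen only as one smoothing-parameter construction that stays finite when the covariance operator cannot be inverted.

```python
import numpy as np

def mahalanobis(x, m, cov):
    """Classical Mahalanobis distance sqrt((x - m)^T cov^{-1} (x - m))."""
    d = x - m
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def smoothed_functional_distance(x, m, cov, alpha, dt):
    """Hypothetical regularized distance for curves sampled on a uniform grid.

    Assumed form (not quoted from the paper):
        sqrt( sum_j lam_j / (lam_j + alpha)^2 * <x - m, e_j>^2 ),
    where (lam_j, e_j) are eigenpairs of the discretized covariance operator.
    """
    # Discretize the integral operator: (K f)(s) ~ sum_t cov[s, t] f(t) dt
    lam, vec = np.linalg.eigh(cov * dt)
    lam = np.clip(lam, 0.0, None)  # remove tiny negative round-off eigenvalues
    # L2 inner products with the eigenfunctions e_j = vec[:, j] / sqrt(dt)
    coef = (vec.T @ (x - m)) * np.sqrt(dt)
    return float(np.sqrt(np.sum(lam / (lam + alpha) ** 2 * coef ** 2)))

# Simulated data: n curves observed at p > n grid points (illustrative only)
rng = np.random.default_rng(0)
p, n = 100, 50
t = np.linspace(0.01, 1.0, p)
dt = t[1] - t[0]
curves = np.cumsum(rng.normal(scale=np.sqrt(dt), size=(n, p)), axis=1)

mean = curves.mean(axis=0)
cov = np.cov(curves, rowvar=False)

# The sample covariance has rank at most n - 1 < p, so it cannot be inverted
rank = np.linalg.matrix_rank(cov)

# The regularized distance is still well defined for any alpha > 0
d = smoothed_functional_distance(curves[0], mean, cov, alpha=0.1, dt=dt)
print(f"rank of sample covariance: {rank} (grid size {p})")
print(f"regularized distance of first curve to the mean: {d:.4f}")
```

With `alpha > 0` every term in the sum stays bounded even for zero eigenvalues, which is why the sketch runs where the classical formula would fail; the value `alpha=0.1` is arbitrary and would need to be tuned in practice.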

MSC:

62R10 Functional data analysis
62H12 Estimation in multivariate analysis
68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Ana Arribas-Gil and Juan Romo. Shape outlier detection and visualization for functional data: the outliergram. Biostatistics, 15(4):603-619, 2014.
[2] Robert B. Ash and Melvin F. Gardner. Topics in Stochastic Processes. Academic Press, 1975. · Zbl 0317.60014
[3] Amparo Baíllo, Antonio Cuevas, and Juan Antonio Cuesta-Albertos. Supervised classification for a family of Gaussian functional models. Scandinavian Journal of Statistics, 38(3):480-498, 2011. · Zbl 1246.62155
[4] Alain Berlinet and Christine Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic, 2004. · Zbl 1145.62002
[5] José R. Berrendero, Antonio Cuevas, and José L. Torrecilla. On the use of reproducing kernel Hilbert spaces in functional classification. Journal of the American Statistical Association, 113(3):1210-1218, 2018. · Zbl 1402.68152
[6] John B. Conway. A course in functional analysis. Springer, 1990. · Zbl 0706.46003
[7] Felipe Cucker and Steve Smale. On the mathematical foundations of learning. Bulletin of the American Mathematical Society, 39(1):1-49, 2001. · Zbl 0983.68162
[8] Felipe Cucker and Ding Xuan Zhou. Learning Theory: an Approximation Theory Viewpoint. Cambridge University Press, 2007. · Zbl 1274.41001
[9] Antonio Cuevas. A partial overview of the theory of statistics with functional data. Journal of Statistical Planning and Inference, 147:1-23, 2014. · Zbl 1278.62012
[10] Xiongtao Dai, Hans-Georg Müller, and Fang Yao. Optimal Bayes classifiers for functional data and density ratios. Biometrika, 104(3):545-560, 2017. · Zbl 07072227
[11] Lokenath Debnath and Piotr Mikusiński. Introduction to Hilbert Spaces with Applications (3rd Ed.). Elsevier, 2005. · Zbl 0715.46009
[12] Pedro Galeano, Esdras Joseph, and Rosa E. Lillo. The Mahalanobis distance for functional data with applications to classification. Technometrics, 57:281-291, 2015.
[13] Andrea Ghiglietti and Anna Maria Paganoni. Exact tests for the means of Gaussian stochastic processes. Statistics & Probability Letters, 131:102-107, 2017. · Zbl 1391.62072
[14] Andrea Ghiglietti, Francesca Ieva, and Anna Maria Paganoni. Statistical inference for stochastic processes: two-sample hypothesis tests. Journal of Statistical Planning and Inference, 180:49-68, 2017. · Zbl 1349.62219
[15] Israel Gohberg, Seymour Goldberg, and Marinus Kaashoek. Basic Classes of Linear Operators. Birkhäuser, 2003. · Zbl 1065.47001
[16] Tailen Hsing and Randall Eubank. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. John Wiley & Sons, 2015. · Zbl 1338.62009
[17] Alan J. Izenman. Modern Multivariate Statistical Techniques. Springer, 2008. · Zbl 1155.62040
[18] Svante Janson. Gaussian Hilbert Spaces. Cambridge University Press, 1997.
[19] Tosio Kato. Perturbation Theory for Linear Operators. Springer, 2013. · Zbl 0148.12601
[20] Rainer Kress. Linear Integral Equations. Springer, 1989. · Zbl 0671.45001
[21] Radha G. Laha and Vijay K. Rohatgi. Probability Theory. John Wiley & Sons, 1979.
[22] Milan N. Lukić and Jay H. Beder. Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Transactions of the American Mathematical Society, 353:3945-3969, 2001. · Zbl 0973.60036
[23] Prasanta C. Mahalanobis. On the generalized distance in statistics. Proceedings of the National Institute of Sciences (Calcutta), 2:49-55, 1936. · Zbl 0015.03302
[24] Kanti V. Mardia. Assessment of multinormality and the robustness of Hotelling’s $T^2$-test. Applied Statistics, pages 163-171, 1975.
[25] Emanuel Parzen. An approach to time series analysis. Journal of the American Statistical Association, 32:951-989, 1961. · Zbl 0107.13801
[26] Gert K. Pedersen. Some operator monotone functions. Proceedings of the American Mathematical Society, 36:309-310, 1972. · Zbl 0256.47019
[27] Kay I. Penny. Appropriate critical values when testing for a single multivariate outlier by using the Mahalanobis distance. Journal of the Royal Statistical Society. Series C (Applied Statistics), 45:73-81, 1996. · Zbl 1076.62528
[28] Alvin C. Rencher. Methods of Multivariate Analysis, 3rd ed. John Wiley & Sons, 2012. · Zbl 1275.62011
[29] Peter J. Rousseeuw and Katrien van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41:212-223, 1999.
[30] Peter J. Rousseeuw and Bert C. van Zomeren. Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85(411):633-639, 1990.
[31] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.