Clustering Chlorophyll-a satellite data using quantiles. (English) Zbl 1400.62304

Summary: The use of water quality indicators is of crucial importance for identifying risks to the environment, society and human health. In particular, Chlorophyll type A (Chl-a) is a shared indicator of trophic status, and in monitoring activities it can help to detect dangerous local behaviours (for example, anoxic events). In this paper we consider a comprehensive data set, covering the whole Adriatic Sea, derived from Ocean Colour satellite data for the period 2002–2012, with the aim of identifying homogeneous areas. Such zonation is becoming extremely relevant for the implementation of European policies, such as the Marine Strategy Framework Directive. As an alternative to clustering based on an “average” value over the whole period, we propose a new clustering procedure for the time series. The procedure shares some similarities with functional data clustering and combines nonparametric quantile regression with an agglomerative clustering algorithm. This approach makes it possible to account for features of the time series such as nonstationarity in the marginal distribution and the presence of missing data. A small simulation study is also presented to illustrate the relative merits of the procedure.
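The general idea of clustering time series by their quantile curves can be illustrated with a minimal Python sketch. It is not the authors' procedure: empirical quantiles within time bins stand in for the nonparametric quantile regression, the distance between series is a plain L1 distance between stacked quantile curves, and the function names (`quantile_curves`, `cluster_series`), the quantile levels, the number of bins and the simulated data are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def quantile_curves(y, n_bins=12, taus=(0.25, 0.5, 0.75)):
    """Crude quantile curves: empirical quantiles within consecutive
    time bins, skipping missing values (NaN). This is a stand-in for a
    smooth nonparametric quantile regression, not the paper's estimator."""
    bins = np.array_split(np.arange(y.size), n_bins)
    return np.array([[np.nanquantile(y[idx], tau) for idx in bins]
                     for tau in taus])


def cluster_series(Y, n_clusters, **kwargs):
    """Agglomerative (average-linkage) clustering of the rows of Y,
    using the L1 distance between their stacked quantile curves."""
    feats = np.array([quantile_curves(y, **kwargs).ravel() for y in Y])
    tree = linkage(pdist(feats, metric="cityblock"), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Simulated example (hypothetical data): two groups of monthly series
# with different level and spread, and about 20% of values missing.
rng = np.random.default_rng(0)
t = np.arange(240)
season = np.sin(2 * np.pi * t / 12)
low = 1.0 + 0.3 * season + rng.normal(0.0, 0.2, (5, t.size))
high = 3.0 + 1.0 * season + rng.normal(0.0, 0.6, (5, t.size))
Y = np.vstack([low, high])
Y[rng.random(Y.shape) < 0.2] = np.nan

labels = cluster_series(Y, n_clusters=2)
print(labels)
```

Because each series is summarised by several quantile curves rather than a single average, two series with the same mean but different spread can still end up in different clusters, and missing observations are simply skipped when the quantiles are computed.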


MSC:
62P12 Applications of statistics to environmental and related topics
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression


Software: R; fda (R); clusfind
Full Text: DOI Euclid

