Biological applications of time series frequency domain clustering. (English) Zbl 1281.62232

Summary: Clustering methods are used routinely to form groups of objects with similar characteristics. Collections of time series arise in several biological applications, some of which require grouping the observed series into homogeneous clusters. We review methods for frequency domain based clustering of time series, with emphasis on applications. Our point of view is that an appropriate notion of clustering for time series data can be developed by means of the spectral density function and its sample counterpart, the periodogram. Developing frequency domain based clustering algorithms requires suitable similarity (or dissimilarity) measures. We review several such measures and discuss various clustering algorithms in this context. Biological applications of time series frequency domain clustering are studied, along with complementary approaches.
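
As a rough illustration of the idea in the summary, the sketch below computes periodograms, compares them with a dissimilarity measure, and groups the series hierarchically. It is not taken from the paper. The Euclidean distance between normalized periodograms and the average-linkage rule are simple choices assumed here for illustration, and the function names (`periodogram`, `normalized_periodogram`, `cluster`, `noisy_sine`) and the simulated data are hypothetical.

```python
import cmath
import math
import random


def periodogram(x):
    """Periodogram of x at the Fourier frequencies 2*pi*j/n, j = 1..n//2."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    values = []
    for j in range(1, n // 2 + 1):
        omega = 2 * math.pi * j / n
        dft = sum(c * cmath.exp(-1j * omega * t) for t, c in enumerate(centered))
        values.append(abs(dft) ** 2 / n)
    return values


def normalized_periodogram(x):
    """Periodogram rescaled to sum to one, so only its shape matters."""
    p = periodogram(x)
    total = sum(p)
    return [v / total for v in p]


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def cluster(series, k):
    """Average-linkage agglomerative clustering on periodogram distances."""
    n = len(series)
    feats = [normalized_periodogram(s) for s in series]
    dist = [[euclidean(feats[i], feats[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Simulated data (an assumption for illustration): two slow and two fast
# sinusoids with different phases, each observed with Gaussian noise.
rng = random.Random(0)


def noisy_sine(period, phase, n=64, sd=0.2):
    return [math.sin(2 * math.pi * t / period + phase) + rng.gauss(0, sd)
            for t in range(n)]


series = [noisy_sine(16, 0.0), noisy_sine(16, 1.0),
          noisy_sine(4, 0.0), noisy_sine(4, 2.0)]
clusters = cluster(series, k=2)
print(clusters)
```

Because the periodogram discards phase, the two slow series should land in one cluster and the two fast series in the other, even though their phases differ. Other dissimilarity measures of the kind the summary refers to could be substituted for `euclidean` without changing the rest of the sketch.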

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-02 Research exposition (monographs, survey articles) pertaining to statistics
