Principal trend analysis for time-course data with applications in genomic medicine. (English) Zbl 1283.62126

Summary: Time-course high-throughput gene expression data are emerging in genomic and translational medicine. Extracting interesting time-course patterns from a patient cohort can provide biological insights for further clinical research and patient treatment. We propose principal trend analysis (PTA) to extract principal trends of time-course gene expression data from a group of patients and to identify the genes that make dominant contributions to those trends. Through simulations, we demonstrate the utility of PTA for dimension reduction, time-course signal recovery and feature selection with high-dimensional data. PTA also yields new insights in real biological and clinical research.
We demonstrate the usefulness of PTA by applying it to longitudinal gene expression data from a circadian regulation system and from burn patients. These applications show that PTA can extract interesting, biologically significant time-course trends, which aids understanding of the biological mechanisms of circadian regulation systems and of the recovery of burn patients. Overall, the proposed PTA approach will benefit genomic medicine research. Our method is implemented in an R package: PTA (Principal Trend Analysis).


62H25 Factor analysis and principal components; correspondence analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
62-04 Software, source code, etc. for problems pertaining to statistics
92C40 Biochemistry, molecular biology
65C60 Computational problems in statistics (MSC2010)


Software: PMA; ElemStatLearn; R; PTA

