Discussion of: Treelets – an adaptive multi-scale basis for sparse unordered data. (English) Zbl 1400.62011

Summary: This is a discussion of the paper “Treelets-An adaptive multi-scale basis for sparse unordered data” by Ann B. Lee, Boaz Nadler and Larry Wasserman [A. B. Lee et al., Ann. Appl. Stat. 2, No. 2, 435–471 (2008; Zbl 1400.62274)]. In that paper the authors define a new type of dimension reduction algorithm, the treelet algorithm. The treelet method has the merit of being completely data driven, and its decomposition is easier to interpret than that of PCR. It is suitable in certain situations, but it also has its limitations. I will discuss both the strengths and the weaknesses of this method when applied to microarray data analysis.


62-07 Data analysis (statistics) (MSC2010)
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)


Zbl 1400.62274


[1] St. Jude Children’s Research Hospital (SJCRH) database on childhood leukemia.
[2] Akey, J. M., Zhang, G., Zhang, K., Jin, L. and Shriver, M. D. (2002). Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 12 1805-1814.
[3] Barbujani, G., Magagni, A., Minch, E. and Cavalli-Sforza, L. L. (1997). An apportionment of human DNA diversity. Proc. Natl. Acad. Sci. USA 94 4516-4519.
[4] Eisen, M., Spellman, P., Brown, P. and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95 14863-14868.
[5] Jolliffe, I. T. (2002). Principal Component Analysis. Springer, New York. · Zbl 1011.62064
[6] Klebanov, L., Jordan, C. and Yakovlev, A. (2006). A new type of stochastic dependence revealed in gene expression data. Stat. Appl. Genet. Mol. Biol. 5 Article 7. · Zbl 1166.92309
[7] Klebanov, L. and Yakovlev, A. (2007). How high is the level of technical noise in microarray data? Biol. Direct 2 9.
[8] Lee, A. B., Nadler, B. and Wasserman, L. (2008). Treelets-An adaptive multi-scale basis for sparse unordered data. Ann. Appl. Statist. · Zbl 1400.62274 · doi:10.1214/07-AOAS137
[9] Qiu, X., Brooks, A. I., Klebanov, L. and Yakovlev, A. (2005a). The effects of normalization on the correlation structure of microarray data. BMC Bioinformatics 6 120.
[10] Qiu, X., Klebanov, L. and Yakovlev, A. (2005b). Correlation between gene expression levels and limitations of the empirical Bayes methodology in microarray data. Stat. Appl. Genet. Mol. Biol. 4 Article 3. · doi:10.2202/1544-6115.1157
[11] Qiu, X. and Yakovlev, A. (2006). Some comments on instability of false discovery rate estimation. J. Bioinformatics Computational Biology 4 1057-1068.
[12] Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd, K. K., Zhivotovsky, L. A. and Feldman, M. W. (2002). Genetic structure of human populations. Science 298 2381-2385.
[13] Storey, J. D. and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100 9440-9445. · Zbl 1130.62385 · doi:10.1073/pnas.1530509100
[14] Storey, J. D., Madeoy, J., Strout, J. L., Wurfel, M., Ronald, J. and Akey, J. M. (2007). Gene-expression variation within and among human populations. Am. J. Hum. Genet. 80 502-509.
[15] Tibshirani, R., Hastie, T., Eisen, M., Ross, D., Botstein, D. and Brown, P. (1999). Clustering methods for the analysis of DNA microarray data. Technical report, Dept. Statistics, Stanford Univ.
[16] Yang, Y. H., Dudoit, S., Luu, P., Lin, D. M., Peng, V., Ngai, J. and Speed, T. P. (2002). Normalization for cDNA microarray data: A robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30 e15.