Treelets – an adaptive multi-scale basis for sparse unordered data. (English) Zbl 1400.62274

Summary: In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered, with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this paper we present treelets – a novel construction of multi-scale bases that extends wavelets to nonsmooth signals. The method is fully adaptive, as it returns a hierarchical tree and an orthonormal basis which both reflect the internal structure of the data. Treelets are especially well-suited as a dimensionality reduction and feature selection tool prior to regression and classification, in situations where sample sizes are small and the data are sparse with unknown groupings of correlated or collinear variables. The method is also simple to implement and analyze theoretically. Here we describe a variety of situations where treelets perform better than principal component analysis, as well as some common variable selection and cluster averaging schemes. We illustrate treelets on a blocked covariance model and on several data sets (hyperspectral image data, DNA microarray data, and internet advertisements) with highly complex dependencies between variables.

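The summary describes the construction only at a high level: at each level of the tree, the two most correlated "sum" variables are merged by a local PCA (a 2x2 Jacobi rotation) that decorrelates the pair, keeping the higher-variance coordinate as a new sum variable and retiring the other as a detail variable. The sketch below illustrates that idea under those assumptions; it is not the authors' reference code, and the names `treelet` and `jacobi_angle` are ours.

```python
import numpy as np

def jacobi_angle(C, i, j):
    # Rotation angle that zeroes C[i, j]: tan(2*theta) = 2*C_ij / (C_ii - C_jj).
    return 0.5 * np.arctan2(2.0 * C[i, j], C[i, i] - C[j, j])

def treelet(X, n_levels=None):
    """Illustrative treelet sketch on an (n x p) data matrix X.

    Returns the accumulated orthonormal basis B and a list of
    (sum_index, detail_index, angle) merges, one per tree level.
    """
    n, p = X.shape
    C = np.cov(X, rowvar=False)   # sample covariance of the p variables
    B = np.eye(p)                 # accumulated orthonormal basis
    active = list(range(p))       # "sum" variables still eligible for merging
    tree = []
    for _ in range(n_levels if n_levels is not None else p - 1):
        if len(active) < 2:
            break
        # Find the active pair with the largest absolute correlation.
        d = np.sqrt(np.diag(C))
        best, pair = -1.0, None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                i, j = active[a], active[b]
                rho = abs(C[i, j]) / (d[i] * d[j] + 1e-12)
                if rho > best:
                    best, pair = rho, (i, j)
        i, j = pair
        theta = jacobi_angle(C, i, j)
        # Local PCA: a Givens rotation in the (i, j) plane.
        J = np.eye(p)
        c, s = np.cos(theta), np.sin(theta)
        J[i, i], J[j, j] = c, c
        J[i, j], J[j, i] = -s, s
        C = J.T @ C @ J           # pair is now decorrelated
        B = B @ J
        # Keep the higher-variance coordinate as the new sum variable;
        # the other becomes a "difference" (detail) variable and is retired.
        if C[i, i] < C[j, j]:
            i, j = j, i
        active.remove(j)
        tree.append((i, j, theta))
    return B, tree
```

Because each level applies a single orthogonal rotation, B remains orthonormal throughout, and the merge order itself yields the hierarchical tree described in the summary.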

62P10 Applications of statistics to biology and medical sciences; meta analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)


Full Text: DOI arXiv


[1] Ahn, J. and Marron, J. S. (2008). Maximal data piling in discrimination., Biometrika . · Zbl 1182.62134
[2] Angeletti, C., Harvey, N. R., Khomitch, V., Fischer, A. H., Levenson, R. M. and Rimm, D. L. (2005). Detection of malignancy in cytology specimens using spectral-spatial analysis., Laboratory Investigation 85 1555-1564.
[3] Asimov, D. (1985). The Grand Tour: A tool for viewing multidimensional data., SIAM J. Sci. Statist. Comput. 6 128-143. · Zbl 0552.62052
[4] Bair, E., Hastie, T., Paul, D. and Tibshirani, R. (2006). Prediction by supervised principal components., J. Amer. Statist. Assoc. 101 119-137. · Zbl 1118.62326
[5] Belkin, M. and Niyogi, P. (2005). Semi-supervised learning on Riemannian manifolds., Machine Learning 56 209-239. · Zbl 1089.68086
[6] Beran, R. and Srivastava, M. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix., Ann. Statist. 13 95-115. · Zbl 0607.62048
[7] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices., Ann. Statist. 36 199-227. · Zbl 1132.62040
[8] Buckheit, J. and Donoho, D. (1995). Improved linear discrimination using time frequency dictionaries. In, Proc. SPIE 2569 540-551.
[9] Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion)., Ann. Statist. 35 2313-2404. · Zbl 1139.62019
[10] Coifman, R., Lafon, S., Lee, A., Maggioni, M., Nadler, B., Warner, F. and Zucker, S. (2005). Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps., Proc. Natl. Acad. Sci. 102 7426-7431.
[11] Coifman, R. and Saito, N. (1996). The local Karhunen-Loève basis. In, Proc. IEEE International Symposium on Time-Frequency and Time-Scale Analysis 129-132. IEEE Signal Processing Society.
[12] Coifman, R. and Wickerhauser, M. (1992). Entropy-based algorithms for best basis selection., IEEE Trans. Inform. Theory 32 712-718. · Zbl 0849.94005
[13] Dettling, M. and Bühlmann, P. (2004). Finding predictive gene groups from microarray data., J. Multivariate Anal. 90 106-131. · Zbl 1047.62103
[14] Donoho, D. and Elad, M. (2003). Maximal sparsity representation via l1 minimization., Proc. Natl. Acad. Sci. USA 100 2197-2202. · Zbl 1064.94011
[15] Donoho, D. and Johnstone, I. (1995). Adapting to unknown smoothness via wavelet shrinkage., J. Amer. Statist. Assoc. 90 1200-1224. JSTOR: · Zbl 0869.62024
[16] Eisen, M., Spellman, P., Brown, P. and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns., Proc. Natl. Acad. Sci. USA 95 14863-14868.
[17] Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation., J. Amer. Statist. Assoc. 97 611-631. JSTOR: · Zbl 1073.62545
[18] Golub, G. and van Loan, C. F. (1996)., Matrix Computations , 3rd ed. Johns Hopkins Univ. Press. · Zbl 0865.65009
[19] Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M., Bloomfield, C. and Lander, E. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring., Science 286 531-537. · Zbl 1047.65504
[20] Gruber, M. (1998)., Improving Efficiency by Shrinkage : The James-Stein and Ridge Regression Estimators . Dekker, New York. · Zbl 0920.62085
[21] Guyon, I., Weston, J., Barnhill, S. and Vapnik, V. (2002). Gene selection for cancer classification using support vector machines., Machine Learning 46 389-422. · Zbl 0998.68111
[22] Hall, P., Marron, J. S. and Neeman, A. (2005). Geometric representation of high dimension, low sample size data., J. R. Stat. Soc. Ser. B Stat. Methodol. 67 427-444. JSTOR: · Zbl 1069.62097
[23] Hastie, T., Tibshirani, R., Botstein, D. and Brown, P. (2001). Supervised harvesting of expression trees., Genome Biology 2 research0003.1-0003.12.
[24] Hastie, T., Tibshirani, R. and Friedman, J. (2001)., The Elements of Statistical Learning . Springer, New York. · Zbl 0973.62007
[25] Jain, A. K., Murty, M. N. and Flynn, P. J. (1999). Data clustering: A review., ACM Computing Surveys 31 264-323.
[26] Johnstone, I. and Lu, A. (2008). Sparse principal component analysis., J. Amer. Statist. Assoc.
[27] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal component analysis., Ann. Statist. 29 295-327. · Zbl 1016.62078
[28] Jolliffe, I. T. (2002)., Principal Component Analysis , 2nd ed. Springer, New York. · Zbl 1011.62064
[29] Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the pc-algorithm., J. Machine Learning Research 8 613-636. · Zbl 1222.68229
[30] Kushmerick, N. (1999). Learning to remove internet advertisements. In, Proceedings of the Third Annual Conference on Autonomous Agents 175-181.
[31] Lee, A. and Nadler, B. (2007). Treelets – a tool for dimensionality reduction and multi-scale analysis of unstructured data. In, Proc. of the Eleventh International Conference on Artificial Intelligence and Statistics (M. Meila and Y. Shen, eds.).
[32] Levina, E. and Zhu, J. (2007). Sparse estimation of large covariance matrices via a hierarchical lasso penalty., Submitted.
[33] Lorber, A., Faber, K. and Kowalski, B. R. (1997). Net analyte signal calculation in multivariate calibration., Anal. Chem. 69 1620-1626.
[34] Mallat, S. (1998)., A Wavelet Tour of Signal Processing . Academic Press, San Diego, CA. · Zbl 0937.94001
[35] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso., Ann. Statist. 34 1436-1462. · Zbl 1113.62082
[36] Murtagh, F. (2004). On ultrametricity, data coding, and computation., J. Classification 21 167-184. · Zbl 1084.62052
[37] Murtagh, F. (2007). The Haar wavelet transform of a dendrogram., J. Classification 24 3-32. · Zbl 1141.65094
[38] Murtagh, F., Starck, J.-L. and Berry, M. W. (2000). Overcoming the curse of dimensionality in clustering by means of the wavelet transform., Computer J. 43 107-120.
[39] Nadler, B. (2007). Finite sample approximation results for principal component analysis: A matrix perturbation approach., Submitted. · Zbl 1168.62058
[40] Nadler, B. and Coifman, R. (2005a). Partial least squares, Beer’s law and the net analyte signal: Statistical modeling and analysis., J. Chemometrics 19 45-54.
[41] Nadler, B. and Coifman, R. (2005b). The prediction error in CLS and PLS: The importance of feature selection prior to multivariate calibration., J. Chemometrics 19 107-118.
[42] Ogden, R. T. (1997)., Essential Wavelets for Statistical Applications and Data Analysis . Birkhäuser, Boston. · Zbl 0868.62033
[43] Saito, N. and Coifman, R. (1995). On local orthonormal bases for classification and regression. In, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing 1529-1532. IEEE Signal Processing Society.
[44] Saito, N., Coifman, R., Geshwind, F. B. and Warner, F. (2002). Discriminant feature extraction using empirical probability density estimation and a local basis library., Pattern Recognition 35 2841-2852. · Zbl 1010.68157
[45] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso., J. Roy. Statist. Soc. Ser. B 58 267-288. JSTOR: · Zbl 0850.62538
[46] Tibshirani, R., Hastie, T., Eisen, M., Ross, D., Botstein, D. and Brown, P. (1999). Clustering methods for the analysis of DNA microarray data. Technical report, Dept. Statistics, Stanford Univ.
[47] Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression., Proc. Natl. Acad. Sci. 99 6567-6572.
[48] Whittaker, J. (2001)., Graphical Models in Multivariate Statistics . Wiley, New York. · Zbl 1151.62053
[49] Xu, R. and Wunsch, D. (2005). Survey of clustering algorithms., IEEE Trans. Neural Networks 16 645-678.
[50] Zhao, Z. and Liu, H. (2007). Searching for interacting features. In, Proceedings of the 20th International Joint Conference on AI ( IJCAI-07 ).
[51] Zhu, J. and Hastie, T. (2004). Classification of gene microarrays by penalized logistic regression., Biostatistics 5 427-444. · Zbl 1154.62406
[52] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net., J. Roy. Statist. Soc. Ser. B 67 301-320. JSTOR: · Zbl 1069.62054
[53] Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis., J. Comput. Graph. Statist. 15 265-286.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.