×

An optimal hierarchical clustering algorithm for gene expression data. (English) Zbl 1173.68773

Summary: Microarrays are used for measuring expression levels of thousands of genes simultaneously. Clustering algorithms are used on gene expression data to find co-regulated genes. An often used clustering strategy is the Pearson correlation coefficient based hierarchical clustering algorithm presented by M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein [Proc. Nat. Acad. Sci. 95, 14863–14868 (1998)], which takes \(O(N^{3})\) time. We note that this run time can be reduced to \(O(N^{2})\) by applying known hierarchical clustering algorithms to this problem. In this paper, we present an algorithm which runs in \(O(N\log N)\) time using a geometrical reduction and show that it is optimal.

MSC:

68U05 Computer graphics; computational geometry (digital and algorithmic aspects)
68W05 Nonnumerical algorithms
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Alizadeh, A. A.; Eisen, M. B.; Davis, R. E.; Ma, C., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403, 503-511 (2000)
[2] Bespamyatnikh, S. N., An optimal algorithm for closest pair maintenance, Discrete Comput. Geom., 19, 175-195 (1998) · Zbl 0892.68105
[3] Callahan, P. B.; Kosaraju, S. R., A decomposition of multi-dimensional point-sets with applications to \(k\)-nearest-neighbors and \(n\)-body potential fields, (Proc. 24th Annual ACM Symposium on Theory of Computing (1992)), 546-556
[4] Eisen, M. B.; Spellman, P. T.; Brown, P. O.; Botstein, D., Cluster analysis and display of genome-wide expression patterns, Proc. Nat. Acad. Sci., 95, 25, 14863-14868 (1998)
[5] Eppstein, D., Fast hierarchical clustering and other applications of dynamic closest pairs, (Proc. 9th Annual ACM-SIAM Symposium on Discrete Algorithms (1998)), 619-628 · Zbl 0930.68041
[6] Kawasaki, S.; Borchert, C.; Deyholos, M.; Wong, H., Gene expression profiles during the initial phase of salt stress in rice, Plant Cell, 13, 889-906 (2001)
[7] Khodursky, A. B.; Peter, B. J.; Botstein, D.; Brown, P. O., DNA microarray analysis of physiological conditions and genetic changes that affect tryptophan metabolism in escherichia coli, Proc. Nat. Acad. Sci., 97, 12170-12175 (2000)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.