
Spectral analysis of phylogenetic data. (English) Zbl 0772.92014

Summary: The spectral analysis of sequence and distance data is a new approach to phylogenetic analysis. For two-state character sequences, the character values at a given site split the set of taxa into two subsets, that is, a bipartition of the taxa set. The vector that records the relative frequency of each of these bipartitions over all sites is called the sequence spectrum. A transformation called Hadamard conjugation maps the sequence spectrum to the conjugate spectrum. This conjugation corrects for unobserved changes in the data, independently of the choice of phylogenetic tree. For any given phylogenetic tree with edge weights (probabilities of state change), we define a corresponding tree spectrum. A weighted phylogenetic tree is selected from the given sequence data by matching the conjugate spectrum with a tree spectrum.
We develop an optimality selection procedure that uses a least squares best fit to find the phylogenetic tree whose tree spectrum most closely matches the conjugate spectrum. An inferred sequence spectrum can be derived from the selected tree spectrum by the inverse Hadamard conjugation, which allows a comparison with the original sequence spectrum.
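
The summary gives no formulas, so the following Python sketch is only an illustration of the pipeline it describes (sequence spectrum, Hadamard conjugation, inverse conjugation). It assumes the commonly stated form of the conjugation, gamma = H^{-1} ln(H s), with H a Sylvester-ordered Hadamard matrix, and it assumes that bipartitions are indexed by the subset of taxa that differ from the last taxon. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def sequence_spectrum(seqs):
    """Relative frequency of each bipartition over all sites.

    seqs: equal-length two-state strings, one per taxon. Assumption: the
    last taxon is the reference, and a site is indexed by the bitmask of
    the other taxa whose state differs from the reference state.
    """
    n_sites = len(seqs[0])
    ref = seqs[-1]
    counts = [0] * (1 << (len(seqs) - 1))
    for site in range(n_sites):
        mask = 0
        for i, seq in enumerate(seqs[:-1]):
            if seq[site] != ref[site]:
                mask |= 1 << i
        counts[mask] += 1
    return [c / n_sites for c in counts]


def hadamard(v):
    """Unnormalised fast Walsh-Hadamard transform (Sylvester ordering)."""
    out = list(v)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    return out


def conjugate(s):
    """Assumed form of the Hadamard conjugation: gamma = H^{-1} ln(H s)."""
    rho = hadamard(s)
    if min(rho) <= 0:
        raise ValueError("logarithm undefined: data too divergent for this sketch")
    # H^{-1} = H / len(s) for a Sylvester Hadamard matrix.
    return [x / len(s) for x in hadamard([math.log(r) for r in rho])]


def inverse_conjugate(gamma):
    """Inverse of conjugate(): s = H^{-1} exp(H gamma)."""
    rho = [math.exp(x) for x in hadamard(gamma)]
    return [x / len(gamma) for x in hadamard(rho)]


# Toy data (hypothetical): three taxa, ten two-state sites.
seqs = ["0000000001", "0000000010", "0000000000"]
s = sequence_spectrum(seqs)      # entry 0 is the share of constant sites
gamma = conjugate(s)             # conjugate spectrum
s_back = inverse_conjugate(gamma)
```

In this sketch the entries of `gamma` sum to zero because the entries of `s` sum to one, and `inverse_conjugate(gamma)` recovers `s` up to rounding. The least squares tree selection described in the summary is not shown.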

MSC:

92D20 Protein sequences, DNA sequences
92D15 Problems related to evolution
92-08 Computational methods for problems pertaining to biology
92B10 Taxonomy, cladistics, statistics in mathematical biology
