
zbMATH — the first resource for mathematics

\(F_{ST}\) and the triangle inequality for biallelic markers. (English) Zbl 07208408
Summary: The population differentiation statistic \(F_{ST}\), introduced by Sewall Wright, is often treated as a pairwise distance measure between populations. As was known to Wright, however, \(F_{ST}\) is not a true metric because allele frequencies exist for which it does not satisfy the triangle inequality. We prove that a stronger result holds: for biallelic markers whose allele frequencies differ across three populations, \(F_{ST}\) never satisfies the triangle inequality. We study the deviation from the triangle inequality as a function of the allele frequencies of three populations, identifying the frequency vector at which the deviation is maximal. We also examine the implications of the failure of the triangle inequality for four-point conditions for placement of groups of four populations on evolutionary trees. Next, we study the extent to which \(F_{ST}\) fails to satisfy the triangle inequality in human genomic data, finding that some loci produce deviations near the maximum. We provide results describing the consequences of the theory for various types of data analysis, including multidimensional scaling and inference of neighbor-joining trees from pairwise \(F_{ST}\) matrices.
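The claim in the summary can be checked numerically with a short sketch. The summary itself does not state a formula, so the code below assumes the common heterozygosity-based definition \(F_{ST} = (H_T - H_S)/H_T\) for a single biallelic locus and two equally weighted populations; the function names `fst_biallelic` and `triangle_deviation` are illustrative and not taken from the paper or from any package.

```python
from itertools import combinations


def fst_biallelic(p, q):
    """Pairwise F_ST at one biallelic locus for allele frequencies p and q.

    Assumption: F_ST = (H_T - H_S) / H_T with two equally weighted
    populations. This is a sketch, not necessarily the paper's estimator.
    """
    if p == q:
        # Identical frequencies: no differentiation (also avoids 0/0
        # when both populations are fixed for the same allele).
        return 0.0
    # Mean within-population heterozygosity: (2p(1-p) + 2q(1-q)) / 2.
    h_s = p * (1 - p) + q * (1 - q)
    # Heterozygosity of the pooled population.
    mean = (p + q) / 2
    h_t = 2 * mean * (1 - mean)
    return (h_t - h_s) / h_t


def triangle_deviation(p1, p2, p3):
    """Largest pairwise F_ST minus the sum of the other two.

    A positive value means the triangle inequality is violated.
    """
    dists = sorted(fst_biallelic(a, b) for a, b in combinations((p1, p2, p3), 2))
    return dists[2] - (dists[0] + dists[1])


if __name__ == "__main__":
    # Scan a grid of distinct frequency triples and count violations.
    grid = [i / 20 for i in range(1, 20)]
    triples = list(combinations(grid, 3))
    violations = sum(triangle_deviation(*t) > 0 for t in triples)
    print(f"{violations} of {len(triples)} distinct triples violate the inequality")
```

Under this assumed definition, every triple of distinct frequencies on the grid gives a positive deviation, which is consistent with the stated result. A grid scan is only a spot check and does not replace the proof.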
MSC:
92 Biology and other natural sciences
Software:
sedaR
References:
[1] Atteson, K., The performance of neighbor-joining methods for phylogenetic reconstruction, Algorithmica, 25, 251-278 (1999) · Zbl 0938.68747
[2] Bosch, E.; Calafell, F.; Pérez-Lezaun, A.; Clarimón, J.; Comas, D.; Mateu, E.; Martínez-Arias, R.; Morera, B.; Brakez, Z.; Akhayat, O.; Sefiani, A.; Hariti, G.; Cambon-Thomsen, A.; Bertranpetit, J., Genetic structure of north-west Africa revealed by STR analysis, Eur. J. Hum. Genet., 8, 360-366 (2000)
[3] Buneman, P., A note on the metric properties of trees, J. Combin. Theory B, 17, 48-50 (1974) · Zbl 0286.05102
[4] Cailliez, F., The analytical solution of the additive constant problem, Psychometrika, 48, 305-308 (1983) · Zbl 0534.62079
[5] Cox, T. F.; Cox, M. A. A., Multidimensional Scaling (2001), Chapman & Hall/CRC, Boca Raton · Zbl 1004.91067
[6] Holsinger, K. E.; Weir, B. S., Genetics in geographically structured populations: defining, estimating, and interpreting \(F_{ST}\), Nature Rev. Genet., 10, 639-650 (2009)
[7] International HapMap 3 Consortium, Integrating common and rare genetic variation in diverse human populations, Nature, 467, 52-58 (2010)
[8] Jombart, T.; Pontier, D.; Dufour, A.-B., Genetic markers in the playground of multivariate analysis, Heredity, 102, 330-341 (2009)
[9] Jorde, L. B., Human genetic distance studies: present status and future prospects, Ann. Rev. Anthropol., 14, 343-373 (1985)
[10] Kang, J. T.L.; Goldberg, A.; Edge, M. D.; Behar, D. M.; Rosenberg, N. A., Consanguinity rates predict long runs of homozygosity in Jewish populations, Human Heredity, 82, 87-102 (2016)
[11] Legendre, P.; Legendre, L., Numerical Ecology (1998), Elsevier, Amsterdam · Zbl 1033.92036
[12] Li, J. Z.; Absher, D. M.; Tang, H.; Southwick, A. M.; Casto, A. M.; Ramachandran, S.; Cann, H. M.; Barsh, G. S.; Feldman, M.; Cavalli-Sforza, L. L.; Myers, R. M., Worldwide human relationships inferred from genome-wide patterns of variation, Science, 319, 1100-1104 (2008)
[13] Mardia, K. V.; Kent, J. T.; Bibby, J. M., Multivariate Analysis (1979), Academic Press, Amsterdam · Zbl 0432.62029
[14] Mihaescu, R.; Levy, D.; Pachter, L., Why neighbor-joining works, Algorithmica, 54, 1-24 (2009) · Zbl 1187.68683
[15] Nei, M., Analysis of gene diversity in subdivided populations, Proc. Natl. Acad. Sci. USA, 70, 3321-3323 (1973) · Zbl 0272.92013
[16] Pérez-Lezaun, A.; Calafell, F.; Mateu, E.; Comas, D.; Ruiz-Pacheco, R.; Bertranpetit, J., Microsatellite variation and the differentiation of modern humans, Hum. Genet., 99, 1-7 (1997)
[17] Rosenberg, N. A.; Mahajan, S.; Ramachandran, S.; Zhao, C.; Pritchard, J. K.; Feldman, M. W., Clines, clusters, and the effect of study design on the inference of human population structure, PLoS Genet., 1, 660-671 (2005)
[18] Saitou, N.; Nei, M., The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol. Biol. Evol., 4, 406-425 (1987)
[19] Steel, M., Phylogeny: Discrete and Random Processes in Evolution (2016), Society for Industrial and Applied Mathematics, Philadelphia · Zbl 1361.92001
[20] Studier, J. A.; Keppler, K. J., A note on the neighbor-joining algorithm of Saitou and Nei, Mol. Biol. Evol., 5, 729-731 (1988)
[21] Takezaki, N.; Nei, M., Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA, Genetics, 144, 389-399 (1996)
[22] Verdu, P.; Pemberton, T. J.; Laurent, R.; Kemp, B. M.; Gonzalez-Oliver, A.; Gorodezky, C.; Hughes, C. E.; Shattuck, M. R.; Petzelt, B.; Mitchell, J.; Harry, H.; William, T.; Worl, R.; Cybulski, J. S.; Rosenberg, N. A.; Malhi, R. S., Patterns of admixture and population structure in native populations of northwest North America, PLoS Genet., 10, Article e1004530 (2014)
[23] Wang, C.; Szpiech, Z. A.; Degnan, J. H.; Jakobsson, M.; Pemberton, T. J.; Hardy, J. A.; Singleton, A. B.; Rosenberg, N. A., Comparing spatial maps of human population-genetic variation using Procrustes analysis, Stat. Appl. Genet. Mol. Biol., 9, 13 (2010)
[24] Weir, B. S., Genetic Data Analysis II (1996), Sinauer, Sunderland, MA
[25] Weir, B. S.; Cockerham, C. C., Estimating F-statistics for the analysis of population structure, Evolution, 38, 1358-1370 (1984)
[26] Wright, S., The genetical structure of populations, Ann. Eugen., 15, 323-354 (1951)
[27] Wright, S., Evolution and the Genetics of Populations, vol. 4 (1978), University of Chicago Press, Chicago
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.