
Best linear unbiased allele-frequency estimation in complex pedigrees. (English) Zbl 1115.62360

Summary: Many types of genetic analyses depend on estimates of allele frequencies. We consider the problem of allele-frequency estimation based on data from related individuals. The motivation for this work is data collected on the Hutterites, an isolated founder population, so we focus particularly on the case in which the relationships among the sampled individuals are specified by a large, complex pedigree for which maximum likelihood estimation is impractical. For this case, we propose to use the best linear unbiased estimator (BLUE) of allele frequency. We derive this estimator, which is equivalent to the quasi-likelihood estimator for this problem, and we describe an efficient algorithm for computing the estimate and its variance. We show that our estimator shares certain desirable small-sample properties with the maximum likelihood estimator (MLE) for this problem. We treat both the case in which the parental origin of each allele is known and the case in which it is unknown. The results are extended to prediction of allele frequency in a set of individuals S based on genotype data collected on a set of individuals R. We compare the mean-squared error of the BLUE, the commonly used naive estimator (the sample frequency), and the MLE (when the latter is feasible to calculate). The results indicate that although the MLE performs best of the three, the BLUE is close to the MLE in performance and is substantially easier to calculate, which makes it particularly useful for large complex pedigrees in which MLE calculation is impractical or infeasible. We apply our method to allele-frequency estimation in a Hutterite data set.
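The summary does not spell out the estimator, so what follows is only a minimal sketch of a BLUE of allele frequency under assumptions not stated in this review: parental origin is unknown; each sampled individual i contributes Y_i = (copies of the allele)/2 with E[Y_i] = p; and Cov(Y) = p(1 - p)Φ, where Φ is a known, invertible kinship matrix with Φ_ii = (1 + h_i)/2 (h_i the inbreeding coefficient) and Φ_ij = φ_ij (the kinship coefficient). Under these assumptions generalized least squares gives p̂ = 1ᵀΦ⁻¹Y / 1ᵀΦ⁻¹1 with Var(p̂) = p(1 - p) / 1ᵀΦ⁻¹1. Because the factor p(1 - p) cancels, the weights do not depend on p. The function names are hypothetical, and the plain Gaussian-elimination solver stands in for whatever efficient algorithm the authors use on large pedigrees.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and nonsingular (e.g. a kinship matrix without
    identical twins, whose rows would coincide).
    """
    n = len(b)
    # Augmented copy so the caller's matrix is left unchanged.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest pivot into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def blue_allele_frequency(counts: List[int], kinship: Matrix) -> Tuple[float, float]:
    """Hypothetical helper: GLS/BLUE estimate of allele frequency.

    counts[i]  -- copies (0, 1 or 2) of the allele carried by individual i
    kinship    -- assumed matrix Phi: Phi[i][i] = (1 + h_i) / 2,
                  Phi[i][j] = kinship coefficient phi_ij
    Returns (p_hat, plug-in estimate of Var(p_hat)).
    """
    y = [c / 2 for c in counts]           # per-individual frequency Y_i
    w = solve(kinship, [1.0] * len(y))    # Phi^{-1} 1
    denom = sum(w)                        # 1' Phi^{-1} 1
    p_hat = sum(wi * yi for wi, yi in zip(w, y)) / denom
    # Var(p_hat) = p(1 - p) / (1' Phi^{-1} 1), with p_hat plugged in for p.
    var_hat = p_hat * (1 - p_hat) / denom
    return p_hat, var_hat


def naive_allele_frequency(counts: List[int]) -> float:
    """Sample frequency: allele copies divided by 2n, ignoring relatedness."""
    return sum(counts) / (2 * len(counts))


if __name__ == "__main__":
    # Parent-offspring pair (phi = 1/4) plus one unrelated individual,
    # none inbred, so every diagonal entry is 1/2.
    phi = [[0.5, 0.25, 0.0],
           [0.25, 0.5, 0.0],
           [0.0, 0.0, 0.5]]
    counts = [2, 2, 0]
    p_hat, var_hat = blue_allele_frequency(counts, phi)
    print(f"BLUE  p_hat = {p_hat:.4f}, var = {var_hat:.4f}")
    print(f"naive p_hat = {naive_allele_frequency(counts):.4f}")
```

In this toy example the naive sample frequency is 4/6 ≈ 0.667, while the sketch gives p̂ = 4/7 ≈ 0.571, because the correlated genotypes of the related pair are down-weighted (weights 2/7, 2/7, 3/7). With only unrelated, non-inbred individuals, Φ = I/2, and the sketch reduces to the sample frequency with variance p(1 - p)/(2n).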

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92D15 Problems related to evolution
62G05 Nonparametric estimation
