Improving the efficiency of genomic selection. (English) Zbl 1311.92126

Summary: We investigate two approaches to increase the efficiency of phenotypic prediction from genome-wide markers, which is a key step for genomic selection (GS) in plant and animal breeding. The first approach is feature selection based on Markov blankets, which provide a theoretically sound framework for identifying non-informative markers. Fitting GS models using only the informative markers results in simpler models, which may allow cost savings from reduced genotyping. We show that this is accompanied by no loss, and possibly a small gain, in predictive power for four GS models: partial least squares (PLS), ridge regression, LASSO and elastic net. The second approach is the choice of kinship coefficients for genomic best linear unbiased prediction (GBLUP). We compare kinships based on different combinations of centring and scaling of marker genotypes, and a newly proposed kinship measure that adjusts for linkage disequilibrium (LD). We illustrate the use of both approaches and examine their performance using three real-world data sets with continuous phenotypic traits from plant and animal genetics. We find that elastic net with feature selection and GBLUP using LD-adjusted kinships performed similarly well and were the best-performing methods in our study.
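
The centring and scaling choices mentioned in the summary can be illustrated with a minimal sketch. It is not the paper's definition of any kinship measure and it does not include the LD adjustment. The function name `kinship`, the toy genotype matrix, and the choice to scale each marker by its sample standard deviation are assumptions made for illustration only.

```python
from math import sqrt


def kinship(genotypes, centre=True, scale=True):
    """Illustrative kinship matrix from allele counts (hypothetical helper).

    genotypes: list of n rows (individuals), each a list of m allele
    counts coded 0, 1 or 2. Returns an n x n list of lists.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    columns = []
    for j in range(m):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        var = sum((g - mean) ** 2 for g in col) / n
        if var == 0:
            continue  # monomorphic marker: carries no information, skip it
        if centre:
            col = [g - mean for g in col]  # subtract the marker mean
        if scale:
            sd = sqrt(var)
            col = [g / sd for g in col]  # assumed scaling: sample std. dev.
        columns.append(col)
    if not columns:
        raise ValueError("no polymorphic markers")
    used = len(columns)
    # Average cross-product over the markers that were kept.
    return [
        [sum(c[i] * c[k] for c in columns) / used for k in range(n)]
        for i in range(n)
    ]


# Toy data: 3 individuals, 3 markers (the third marker is monomorphic).
toy = [[0, 2, 1], [2, 0, 1], [1, 1, 1]]
K_std = kinship(toy, centre=True, scale=True)
K_raw = kinship(toy, centre=False, scale=False)
```

Switching `centre` and `scale` on or off gives the four combinations of centring and scaling; in a GBLUP-style analysis the resulting matrix would play the role of the relationship matrix between individuals.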

MSC:

92D10 Genetics and epigenetics
