Selectiongain: an R package for optimizing multi-stage selection. (English) Zbl 1342.65050

Summary: Multi-stage selection is practised in numerous fields of the life sciences, particularly in breeding. Its defining characteristic is that candidates are evaluated in successive stages with increasing intensity and effort, and only the superior fraction of the candidates is selected and promoted to the next stage. For the optimum design of such selection programs, the selection gain \(\varDelta G(y)\) plays a central role. It can be calculated by integration of a truncated multivariate normal distribution. While mathematical formulas for calculating \(\varDelta G(y)\) and \(\psi (y)\), the variance among the selected candidates, were developed a long time ago, solutions and software for numerical calculations were not available. We developed the R package selectiongain for efficient and precise calculation of \(\varDelta G(y)\) and \(\psi (y)\) for (i) a given matrix \(\boldsymbol{\Sigma}^*\) of correlations among the unobservable target character and the selection criteria and (ii) given coordinates \(\mathbf Q\) of the truncation point or the selected fractions \(\boldsymbol{\alpha}\) in each stage. In addition, our software can be used for optimizing multi-stage selection programs under a given total budget and different costs of evaluating the candidates in each stage. In addition to a detailed description of the software's functions, two examples illustrate the package.
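
To make the central quantity concrete, the following is a minimal Python sketch of the selection gain as the mean of a truncated normal distribution. It is not the algorithm or the API of the selectiongain package. It assumes all variables are standardized (mean 0, variance 1), so the gain is expressed in standard deviations of the target character. The function names `one_stage_gain`, `cholesky`, and `simulated_gain` are hypothetical and chosen for illustration only. For a single stage with correlation \(\rho\) between target and selection criterion, the gain has the closed form \(\rho\,\phi(q)/\alpha\), where \(q\) is the truncation point for the selected fraction \(\alpha\). For several stages the sketch replaces exact integration with a plain Monte Carlo estimate of \(E[y \mid x_1 > q_1, \dots, x_k > q_k]\).

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def one_stage_gain(alpha, rho):
    """Closed-form one-stage gain rho * phi(q) / alpha (standardized variables)."""
    q = STD_NORMAL.inv_cdf(1.0 - alpha)  # truncation point for selected fraction alpha
    return rho * STD_NORMAL.pdf(q) / alpha


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a small positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulated_gain(corr, cutoffs, n_draws=200_000, seed=1):
    """Monte Carlo estimate of E[y | x_1 > q_1, ..., x_k > q_k].

    corr    -- correlation matrix of (y, x_1, ..., x_k); must be positive definite
    cutoffs -- truncation points q_1, ..., q_k, one per selection stage
    """
    rng = random.Random(seed)
    lower = cholesky(corr)
    dim = len(corr)
    total, kept = 0.0, 0
    for _ in range(n_draws):
        # Correlated standard normals: v = L z with z independent N(0, 1).
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        # A candidate counts as selected only if it passes every stage.
        if all(v[s + 1] > cutoffs[s] for s in range(len(cutoffs))):
            total += v[0]
            kept += 1
    return total / kept
```

With a single stage, `simulated_gain([[1.0, rho], [rho, 1.0]], [q])` should agree with `one_stage_gain(alpha, rho)` up to sampling error when `q` is the truncation point belonging to `alpha`. Simulation is used here only because it is short and easy to check. Its error shrinks slowly with the number of draws, which is one reason dedicated numerical integration is preferable for precise results.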


65C60 Computational problems in statistics (MSC2010)
62-04 Software, source code, etc. for problems pertaining to statistics
62L10 Sequential statistical analysis
62F07 Statistical ranking and selection procedures

