×

Approximately optimal subset selection for statistical design and modelling. (English) Zbl 07497078

Summary: We study the problem of optimal subset selection from a set of correlated random variables. In particular, we consider the associated combinatorial optimization problem of maximizing the determinant of a symmetric positive definite matrix that characterizes the chosen subset. This problem arises in many domains, such as experimental designs, regression modelling, and environmental statistics. However, finding the exact solution to this optimization problem is shown to be computationally intractable for large datasets. In this paper, we establish an efficient polynomial-time algorithm using the determinantal point process to approximate the optimal solution to the problem. We demonstrate the advantages of our methods by presenting examples from D-optimal designs of experiemnts for regression modelling and maximum entropy based designs for spatial monitoring networks.

MSC:

62-XX Statistics

Software:

EnviroStat; R
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Zidek, JV; Zimmerman, DL.; Gelfand, AE; Fuentes, M.; Hoeting, JA, Handbook of environmental and ecological statistics, Monitoring network design, 499-522 (2019), Boca Raton, Florida: Chapman and Hall/CRC, Boca Raton, Florida · Zbl 1409.62013
[2] Caselton, WF; Zidek, JV., Optimal monitoring network designs, Stat Probab Lett, 2, 4, 223-227 (1984) · Zbl 0547.94002
[3] Ko, C-W; Lee, J.; Queyranne, M., An exact algorithm for maximum entropy sampling, Oper Res, 43, 4, 684-691 (1995) · Zbl 0857.90069
[4] Le, ND; Zidek, JV., Statistical analysis of environmental space-time processes (2006), New York City, New Y: Springer, New York City, New Y · Zbl 1102.62126
[5] Le, ND, Zidek, JV, White, R, et al.. Envirostat: statistical analysis of environmental space-time processes. Vienna, Austria: R foundation for statistical computing. 2015.
[6] Anstreicher, KM; Fampa, M.; Lee, J.; Cunningham, WH; McCormick, ST; Queyranne, M., International conference on integer programming and combinatorial optimization, Continuous relaxations for constrained maximum-entropy sampling, 234-248 (1996), Berlin, Heidelberg: Springer, Berlin, Heidelberg · Zbl 1415.90057
[7] Anstreicher, KM; Fampa, M.; Lee, J., Using continuous nonlinear relaxations to solve constrained maximum-entropy sampling problems, Math Program, 85, 2, 221-240 (1999) · Zbl 0954.90048
[8] Hoffman, A.; Lee, J.; Williams, J.; Atkinson, AC; Hackl, P.; Muller, WG, mODa 6 advances in model-oriented design and analysis, New upper bounds for maximum-entropy sampling, 143-153 (2001), Berlin, Heidelberg: Springer, Berlin, Heidelberg
[9] Fedorov, V.; Lee, J.; Wolkowicz, H.; Saigal, R.; Vandenberghe, L., Handbook of semidefinite programming. International series in operations research and management science, 27, Design of experiemnts in statistics, 511-532 (2000), Boston, MA: Springer, Boston, MA · Zbl 0962.90001
[10] Lee, J.; Williams, J., A linear integer programming bound for maximum-entropy sampling, Math Program, 94, 2-3, 247-256 (2003) · Zbl 1030.90063
[11] El-Shaarawi, AH; Piegorsch, WW., Encyclopedia of environmetrics, Vol. 1 (2001), Hoboken, New Jersey: Wiley, Hoboken, New Jersey · Zbl 1009.62104
[12] Anstreicher, KM., Efficient solution of maximum-entropy sampling problems, Oper Res, 68, 6, 1826-1835 (2020) · Zbl 1457.90107
[13] Mitchell, TJ., An algorithm for the construction of d-optimal experimental designs, Technometrics, 42, 1, 48-54 (2000)
[14] Mitchell, TJ., Computer construction of d-optimal first-order designs, Technometrics, 16, 2, 211-220 (1974) · Zbl 0299.62040
[15] Guttorp, P.; Le, ND; Sampson, PD.; Patil, GP; Rao, CR; Ross, NP, Multivariate environmental statistics, Using entropy in the redesign of an environmental monitoring network, 175-202 (1993), New York: North Holland/Elsevier Science, New York · Zbl 0828.62100
[16] Ruiz-Cárdenas, R.; Ferreira, MAR; Schmidt, AM., Stochastic search algorithms for optimal design of monitoring networks, Environ Off J Int Environ Soc, 21, 1, 102-112 (2010)
[17] Holland, JH., Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence (1992), Cambridge, Massachusetts: MIT Press, Cambridge, Massachusetts
[18] Goldberg, DE; Holland, JH., Machine Learning, 3, 2, 95-99 (1988)
[19] Whitley, D., A genetic algorithm tutorial, Stat Comput, 4, 2, 65-85 (1994)
[20] Goldberg, DE; Deb, K.; Rawlins, GJE, Foundations of genetic algorithms, 1, A Comparative Analysis of Selection Schemes Used in Genetic Algorithms, 69-93 (1991), Amsterdam, The Netherlands: Elsevier, Amsterdam, The Netherlands
[21] Macchi, O., The coincidence approach to stochastic point processes, Adv Appl Probab, 7, 1, 83-122 (1975) · Zbl 0366.60081
[22] Borodin, A.; Olshanski, G., Distributions on partitions, point processes, and the hypergeometric kernel, Commun Math Phys, 211, 2, 335-358 (2000) · Zbl 0966.60049
[23] Hough, JB; Krishnapur, M.; Peres, Y., Determinantal processes and independence, Probab Surv, 3, 206-229 (2006) · Zbl 1189.60101
[24] Kulesza, A, Taskar, B. Determinantal point processes for machine learning. Preprint arXiv:1207.6083; 2012. · Zbl 1278.68240
[25] Borodin, A.; Rains, EM., Eynard-Mehta theorem, schur process, and their pfaffian analogs, J Stat Phys, 121, 3-4, 291-317 (2005) · Zbl 1127.82017
[26] Atkinson, A.; Donev, A.; Tobias, R., Optimum experimental designs, with SAS, Vol. 34 (2007), Oxford, England: Oxford University Press, Oxford, England · Zbl 1183.62129
[27] Fedorov, VV., Theory of optimal experiments (2013), Amsterdam, The Netherlands: Elsevier, Amsterdam, The Netherlands
[28] Casquilho-Resende, CM; Le, ND; Zidek, JV, Design of monitoring networks using k-determinantal point processes, Environmetrics, 29, 1, e2483 (2018)
[29] R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R foundation for statistical computing. 2020.
[30] Li, C.; Jegelka, S.; Sra, S.; Robert, CC; Gretton, A., Proceedings of the 19th International Conference on Artificial intelligence and statistics, Efficient sampling for k-determinantal point processes, 1328-1337 (2016), Cadiz, Spain: PMLR, Cadiz, Spain
[31] Affandi, RH; Kulesza, A.; Fox, E.; Carvalho, CM; Ravikumar, P., Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, Nystrom approximation for large-scale determinantal processes, 85-98 (2013), Scottsdale, Arizona: PMLR, Scottsdale, Arizona
[32] Kulesza, A.; Taskar, B.; Lafferty, J.; Williams, C.; Shawe-Taylor, J., Advances in neural information processing systems, Structured determinantal point processes, 1171-1179 (2010), Red Hook, New York: Curran Associates Inc, Red Hook, New York
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.