Gramacy, Robert B.; Taddy, Matt; Wild, Stefan M.
Variable selection and sensitivity analysis using dynamic trees, with an application to computer code performance tuning. (English)
Zbl 1454.62239
Ann. Appl. Stat. 7, No. 1, 51-80 (2013).

Summary: We investigate an application in the automatic tuning of computer codes, an area of research that has come to prominence alongside the recent rise of distributed scientific processing and heterogeneity in high-performance computing environments. Here, the response function is nonlinear and noisy and may not be smooth or stationary. Clearly needed are variable selection, decomposition of influence, and analysis of main and secondary effects for both real-valued and binary inputs and outputs. Our contribution is a novel set of tools for variable selection and sensitivity analysis based on the recently proposed dynamic tree model. We argue that this approach is uniquely well suited to the demands of our motivating example. In illustrations on benchmark data sets, we show that the new techniques are faster and offer richer feature sets than do similar approaches in the static tree and computer experiment literature. We apply the methods in code-tuning optimization, examination of a cold-cache effect, and detection of transformation errors.

Cited in 4 Documents

MSC:
62K20 Response surface designs
62F15 Bayesian inference
62G08 Nonparametric regression and quantile regression

Keywords: sensitivity analysis; variable selection; Bayesian methods; Bayesian regression trees; CART; exploratory data analysis; particle filtering; computer experiments

Software: dynaTree; ElemStatLearn; GUI-HDMR; reglogit; BartPy; tgp; gss; BLAS; UCI-ml; EGO; BayesTree; glmnet

References:
[1] Asuncion, A. and Newman, D. J. (2007). UCI machine learning repository.
[2] Balaprakash, P., Wild, S. M. and Hovland, P. D. (2011). Can search algorithms save large-scale automatic performance tuning? Procedia Computer Science 4 2136-2145.
[3] Balaprakash, P., Wild, S. M. and Norris, B. (2012). SPAPT: Search problems in automatic performance tuning. Procedia Computer Science 9 1959-1968.
[4] Bastos, L. S. and O’Hagan, A. (2009). Diagnostics for Gaussian process emulators. Technometrics 51 425-438.
[5] Bayarri, M. J., Berger, J. O., Kennedy, M. C., Kottas, A., Paulo, R., Sacks, J., Cafeo, J. A., Lin, C.-H. and Tu, J. (2009). Predicting vehicle crashworthiness: Validation of computer models for functional and hierarchical data. J. Amer. Statist. Assoc. 104 929-943. · Zbl 1388.62382
[6] Blackford, L. S., Demmel, J., Dongarra, J., Duff, I., Hammarling, S., Henry, G., Heroux, M., Kaufman, L., Lumsdaine, A., Petitet, A., Pozo, R., Remington, K. and Whaley, R. C. (2002). An updated set of basic linear algebra subprograms (BLAS). ACM Trans. Math. Software 28 135-151. · Zbl 1070.65520
[7] Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37 373-384. · Zbl 0862.62059
[8] Breiman, L., Friedman, J. H., Olshen, R. and Stone, C. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. · Zbl 0541.62042
[9] Cantoni, E., Flemming, J. M. and Ronchetti, E. (2011). Variable selection in additive models by non-negative garrote. Stat. Model. 11 237-252. · Zbl 05933702
[10] Carvalho, C. M., Johannes, M. S., Lopes, H. F. and Polson, N. G. (2010). Particle learning and smoothing. Statist. Sci. 25 88-106. · Zbl 1328.62541
[11] Chipman, H. A., George, E. I. and McCulloch, R. E. (1998). Bayesian CART model search (with discussion). J. Amer. Statist. Assoc. 93 935-960. · Zbl 1072.62650
[12] Chipman, H. A., George, E. I. and McCulloch, R. E. (2002). Bayesian treed models. Machine Learning 48 303-324. · Zbl 0820.68098
[13] Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). BART: Bayesian additive regression trees. Ann. Appl. Stat. 4 266-298. · Zbl 1189.62066
[14] Farah, M. and Kottas, A. (2011). Bayesian inference for sensitivity analysis of computer simulators, with an application to radiative transfer models. Technical Report UCSC-SOE-10-15, Univ. California, Santa Cruz.
[15] Friedman, J. H. (1991). Multivariate adaptive regression splines. Ann. Statist. 19 1-141. · Zbl 0765.62064
[16] Friedman, J. H., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 1-22.
[17] George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881-889.
[18] Gramacy, R. B. and Polson, N. G. (2011). Particle learning of Gaussian process models for sequential design and optimization. J. Comput. Graph. Statist. 20 102-118.
[19] Gramacy, R. B. and Polson, N. G. (2012). Simulation-based regularized logistic regression. Bayesian Anal. 7 1-24. · Zbl 1330.62301
[20] Gramacy, R. B. and Taddy, M. A. (2010). Categorical inputs, sensitivity analysis, optimization and importance tempering with tgp version 2, an R package for treed Gaussian process models. Journal of Statistical Software 33 1-48.
[21] Gramacy, R. B. and Taddy, M. A. (2011). dynaTree: Dynamic trees for learning and design. R package version 2.0. · Zbl 1396.62158
[22] Gu, C. (2002). Smoothing Spline ANOVA Models. Springer, New York. · Zbl 1051.62034
[23] Haaland, B. and Qian, P. Z. G. (2011). Accurate emulators for large-scale computer experiments. Ann. Statist. 39 2974-3002. · Zbl 1246.65172
[24] Hartono, A., Norris, B. and Sadayappan, P. (2009). Annotation-based empirical performance tuning using Orio. In Proceedings of the IEEE International Symposium on Parallel Distributed Processing, 2009 (IPDPS 2009) 1-11. IEEE, New York.
[25] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York. · Zbl 1273.62005
[26] Huang, J., Horowitz, J. L. and Wei, F. (2010). Variable selection in nonparametric additive models. Ann. Statist. 38 2282-2313. · Zbl 1202.62051
[27] Jones, D. R., Schonlau, M. and Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. J. Global Optim. 13 455-492. · Zbl 0917.90270
[28] Krishnapuram, B., Carin, L., Figueiredo, M. and Hartemink, A. (2005). Sparse multinomial logistic regression: Fast algorithms and generalization bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 957-969.
[29] Lee, H. K. H., Sansó, B., Zhou, W. and Higdon, D. M. (2008). Inference for a proton accelerator using convolution models. J. Amer. Statist. Assoc. 103 604-613. · Zbl 1469.62415
[30] Linkletter, C., Bingham, D., Hengartner, N., Higdon, D. and Ye, K. Q. (2006). Variable selection for Gaussian process models in computer experiments. Technometrics 48 478-490.
[31] Maity, A. and Lin, X. (2011). Powerful tests for detecting a gene effect in the presence of possible gene-gene interactions using garrote kernel machines. Biometrics 67 1271-1284. · Zbl 1274.62834
[32] Marrel, A., Iooss, B., Laurent, B. and Roustant, O. (2009). Calculations of Sobol indices for the Gaussian process metamodel. Reliability Engineering and System Safety 94 742-751.
[33] Morgan, J. N. and Sonquist, J. A. (1963). Problems in the analysis of survey data, and a proposal. J. Amer. Statist. Assoc. 58 415-434. · Zbl 0114.10103
[34] Morris, R. D., Kottas, A., Taddy, M., Furfaro, R. and Ganapol, B. (2008). A statistical framework for the sensitivity analysis of radiative transfer models. IEEE Transactions on Geoscience and Remote Sensing 12 4062-4074.
[35] Oakley, J. E. and O’Hagan, A. (2004). Probabilistic sensitivity analysis of complex models: A Bayesian approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 751-769. · Zbl 1046.62027
[36] Patterson, D. A. and Hennessy, J. L. (2007). Computer Organization and Design: The Hardware/Software Interface, 3rd ed. Morgan Kaufmann, Boston. · Zbl 0833.68020
[37] Reich, B. J., Storlie, C. B. and Bondell, H. D. (2009). Variable selection in Bayesian smoothing spline ANOVA models: Application to deterministic computer codes. Technometrics 51 110-120.
[38] Saltelli, A. (2002). Making best use of model evaluations to compute sensitivity indices. Comput. Phys. Comm. 145 280-297. · Zbl 0998.65065
[39] Saltelli, A., Chan, K. and Scott, E. M., eds. (2000). Sensitivity Analysis. Wiley, Chichester. · Zbl 0961.62091
[40] Saltelli, A. and Tarantola, S. (2002). On the relative importance of input factors in mathematical models: Safety assessment for nuclear waste disposal. J. Amer. Statist. Assoc. 97 702-709. · Zbl 1073.62602
[41] Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M. and Tarantola, S. (2008). Global Sensitivity Analysis. The Primer. Wiley, Chichester. · Zbl 1161.00304
[42] Sang, H. and Huang, J. Z. (2012). A full scale approximation of covariance functions for large spatial data sets. J. R. Stat. Soc. Ser. B Stat. Methodol. 74 111-132.
[43] Santner, T. J., Williams, B. J. and Notz, W. I. (2003). The Design and Analysis of Computer Experiments. Springer, New York. · Zbl 1041.62068
[44] Storlie, C. B., Swiler, L. P., Helton, J. C. and Sallaberry, C. J. (2009). Implementation and evaluation of nonparametric regression procedures for sensitivity analysis of computationally demanding models. Reliability Engineering & System Safety 94 1735-1763.
[45] Taddy, M. A., Gramacy, R. B. and Polson, N. G. (2011). Dynamic trees for learning and design. J. Amer. Statist. Assoc. 106 109-123. · Zbl 1396.62158
[46] Taddy, M. A., Lee, H. K. H., Gray, G. A. and Griffin, J. D. (2009). Bayesian guided pattern search for robust local optimization. Technometrics 51 389-401.
[47] Yi, G., Shi, J. Q. and Choi, T. (2011). Penalized Gaussian process regression and classification for high-dimensional nonlinear data. Biometrics 67 1285-1294. · Zbl 1274.62912
[48] Ziehn, T. and Tomlin, A. S. (2009). GUI-HDMR: A software tool for global sensitivity analysis of complex models. Environmental Modelling and Software 24 775-785.

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases those data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible, without claiming completeness or a perfect matching.
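The sensitivity-analysis quantities behind the summary and several references (e.g. [32], [38], [40]) are Sobol indices, which apportion output variance among inputs. The paper itself estimates such quantities via dynamic trees; the sketch below instead illustrates only the underlying index on a hypothetical additive test function, using the generic "pick-freeze" Monte Carlo scheme associated with [38] (Jansen's estimator). The function `f` and all parameter values are illustrative assumptions, not anything from the paper.

```python
import numpy as np

# Hypothetical additive test function on [0,1]^3: input 1 dominates,
# input 2 is weak, input 3 is inert. (Illustration only -- not the
# performance-tuning response surface studied in the paper.)
def f(X):
    return 4.0 * X[:, 0] + X[:, 1]

def first_order_sobol(f, d, n, rng):
    """Pick-freeze Monte Carlo estimate of the first-order Sobol
    indices S_j = Var(E[Y | X_j]) / Var(Y), via Jansen's estimator."""
    A = rng.random((n, d))           # two independent uniform samples
    B = rng.random((n, d))
    yA, yB = f(A), f(B)
    var = np.concatenate([yA, yB]).var()
    S = np.empty(d)
    for j in range(d):
        ABj = A.copy()
        ABj[:, j] = B[:, j]          # "freeze" coordinate j at B's value
        yABj = f(ABj)
        # yB and yABj agree only in coordinate j, so their mean squared
        # difference measures the variance NOT explained by X_j alone.
        S[j] = 1.0 - 0.5 * np.mean((yB - yABj) ** 2) / var
    return S

rng = np.random.default_rng(0)
S = first_order_sobol(f, d=3, n=200_000, rng=rng)
# Exact values for this f: S = (16/17, 1/17, 0), i.e. about (0.94, 0.06, 0).
```

A tree-based emulator, as in the paper, replaces the raw Monte Carlo evaluations of an expensive code with cheap predictions and yields full posterior uncertainty for the indices; the variance decomposition being estimated is the same.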