Bayesian joint modeling of chemical structure and dose response curves. (English) Zbl 1478.62337

Summary: Today there are approximately 85,000 chemicals regulated under the Toxic Substances Control Act, with around 2,000 new chemicals introduced each year. It is impossible to screen all of these chemicals for potential toxic effects, either via full organism in vivo studies or in vitro high-throughput screening (HTS) programs. Toxicologists face the challenge of choosing which chemicals to screen, and predicting the toxicity of as yet unscreened chemicals. Our goal is to describe how variation in chemical structure relates to variation in toxicological response to enable in silico toxicity characterization designed to meet both of these challenges. With our Bayesian partially Supervised Sparse and Smooth Factor Analysis \((\text{BS}^3 \text{FA})\) model, we learn a distance between chemicals targeted to toxicity, rather than one based on molecular structure alone. Our model also enables the prediction of chemical dose-response profiles based on chemical structure (i.e., without in vivo or in vitro testing) by taking advantage of a large database of chemicals that have already been tested for toxicity in HTS programs. We show superior simulation performance in distance learning and modest to large gains in predictive ability compared to existing methods. Results from the high-throughput screening data application elucidate the relationship between chemical structure and a toxicity-relevant high-throughput assay. An R package for \(\text{BS}^3\text{FA}\) is available online at https://github.com/kelrenmor/bs3fa.

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62M20 Inference from stochastic processes and prediction
62H25 Factor analysis and principal components; correspondence analysis
62R10 Functional data analysis

Software:

R; GitHub; JIVE; bs3fa
