Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification. (English) Zbl 1407.62091
Summary: We are interested in the development of surrogate models for uncertainty quantification and propagation in problems governed by stochastic PDEs, using a deep convolutional encoder-decoder network in a fashion similar to approaches considered in deep learning for image-to-image regression tasks. Since standard deterministic neural networks are data-intensive and cannot provide predictive uncertainty, we propose a Bayesian approach to convolutional neural networks. A recently introduced variational gradient descent algorithm based on Stein’s method is scaled to deep convolutional networks to perform approximate Bayesian inference on millions of uncertain network parameters. This approach achieves state-of-the-art performance in terms of predictive accuracy and uncertainty quantification in comparison to other Bayesian neural network approaches, as well as to techniques such as Gaussian processes and ensemble methods, even when the training data set is relatively small. To evaluate the performance of this approach, we consider standard uncertainty quantification tasks for flow in heterogeneous media, using limited training data consisting of permeability realizations and the corresponding velocity and pressure fields. The performance of the developed surrogate model is very good even though there is no underlying structure shared between the input (permeability) and output (flow/pressure) fields, as is often the case in the image-to-image regression models used in computer vision problems. Studies are performed with an underlying stochastic input dimensionality of up to 4225, where most other uncertainty quantification methods fail. Uncertainty propagation tasks are also considered, and the predictive output Bayesian statistics are compared to those obtained with Monte Carlo estimates.
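The inferential core of the approach summarized above is Stein variational gradient descent (SVGD) [30]: a set of particles (here, copies of the network weights) is iteratively transported toward the posterior by a kernel-weighted average of log-posterior gradients plus a repulsive kernel-gradient term that keeps the particles spread out. Below is a minimal NumPy sketch of the basic SVGD update, run on a toy 2-D Gaussian target rather than the millions of CNN parameters treated in the paper; the function names and the toy target are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(theta):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / h) over particles, with the
    median-heuristic bandwidth of Liu & Wang (2016), plus the closed-form
    repulsive term: row i of grad_K is sum_j grad_{theta_j} k(theta_j, theta_i)."""
    sq_dists = np.sum((theta[:, None, :] - theta[None, :, :]) ** 2, axis=-1)
    h = np.median(sq_dists) / np.log(theta.shape[0] + 1.0) + 1e-8
    K = np.exp(-sq_dists / h)
    grad_K = (2.0 / h) * (K.sum(axis=1, keepdims=True) * theta - K @ theta)
    return K, grad_K

def svgd_step(theta, grad_log_p, step_size=0.1):
    """One SVGD update: phi(theta_i) = (1/n) sum_j [k(theta_j, theta_i)
    grad log p(theta_j) + grad_{theta_j} k(theta_j, theta_i)].
    theta has shape (n_particles, dim)."""
    K, grad_K = rbf_kernel(theta)
    phi = (K @ grad_log_p(theta) + grad_K) / theta.shape[0]
    return theta + step_size * phi

# Toy target (stand-in for the posterior over network weights):
# log p(theta) = log N(mu, I), whose gradient is -(theta - mu).
mu = np.array([1.0, -1.0])
grad_log_p = lambda t: -(t - mu)

rng = np.random.default_rng(0)
theta = 3.0 * rng.normal(size=(50, 2))  # 50 particles, deliberately poor init
for _ in range(500):
    theta = svgd_step(theta, grad_log_p)

print(theta.mean(axis=0))  # approaches mu, roughly [1, -1]
print(theta.std(axis=0))   # approaches the posterior std, roughly [1, 1]
```

Once such a particle set approximates the posterior over the surrogate's weights, the predictive output statistics discussed in the summary (mean, variance) follow as Monte Carlo averages of the network's predictions over the particles.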

MSC:
62F15 Bayesian inference
35Q35 PDEs in connection with fluid mechanics
76S05 Flows in porous media; filtration; seepage
68T05 Learning and adaptive systems in artificial intelligence
References:
[1] Bilionis, I.; Zabaras, N., Bayesian uncertainty propagation using Gaussian processes, (Handbook of Uncertainty Quantification, (2016)), 1-45
[2] Bilionis, I.; Zabaras, N.; Konomi, B. A.; Lin, G., Multi-output separable Gaussian process: towards an efficient, fully Bayesian paradigm for uncertainty quantification, J. Comput. Phys., 241, 212-239, (2013) · Zbl 1349.76760
[3] Bilionis, I.; Zabaras, N., Multi-output local Gaussian process regression: applications to uncertainty quantification, J. Comput. Phys., 231, 17, 5718-5746, (2012) · Zbl 1277.60066
[4] Xiu, D.; Karniadakis, G. E., The Wiener-Askey polynomial chaos for stochastic differential equations, SIAM J. Sci. Comput., 24, 2, 619-644, (2002) · Zbl 1014.65004
[5] Torquato, S., Random heterogeneous materials: microstructure and macroscopic properties, vol. 16, (2013), Springer Science & Business Media
[6] van der Maaten, L. J. P.; Postma, E. O.; van den Herik, H. J., Dimensionality reduction: a comparative review, (2009), Tilburg University, Tech. rep.
[7] van der Maaten, L.; Hinton, G., Visualizing data using t-SNE, J. Mach. Learn. Res., 9, 2579-2605, (2008)
[8] Hinton, G. E.; Salakhutdinov, R. R., Reducing the dimensionality of data with neural networks, Science, 313, 5786, 504-507, (2006) · Zbl 1226.68083
[9] Kingma, D. P.; Welling, M., Auto-encoding variational Bayes, CoRR
[10] Lawrence, N. D., Gaussian process latent variable models for visualisation of high dimensional data, (Advances in Neural Information Processing Systems, (2004)), 329-336
[11] Grigo, C.; Koutsourelakis, P.-S., Bayesian model and dimension reduction for uncertainty propagation: applications in random media, arXiv preprint · Zbl 1422.62377
[12] Bengio, Y., Learning deep architectures for AI, Found. Trends Mach. Learn., 2, 1, 1-127, (2009) · Zbl 1192.68503
[13] Krizhevsky, A.; Sutskever, I.; Hinton, G. E., ImageNet classification with deep convolutional neural networks, (Advances in Neural Information Processing Systems, (2012)), 1097-1105
[14] Zeiler, M. D.; Fergus, R., Visualizing and understanding convolutional networks, (European Conference on Computer Vision, (2014), Springer), 818-833
[15] LeCun, Y.; Bengio, Y.; Hinton, G., Deep learning, Nature, 521, 7553, 436-444, (2015)
[16] Hinton, G. E.; Osindero, S.; Teh, Y.-W., A fast learning algorithm for deep belief nets, Neural Comput., 18, 7, 1527-1554, (2006) · Zbl 1106.68094
[17] Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O., Understanding deep learning requires rethinking generalization, CoRR
[18] Dziugaite, G. K.; Roy, D. M., Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data, arXiv preprint
[19] Arora, S.; Ge, R.; Neyshabur, B.; Zhang, Y., Stronger generalization bounds for deep nets via a compression approach, arXiv preprint
[20] Raghu, M.; Poole, B.; Kleinberg, J.; Ganguli, S.; Sohl-Dickstein, J., On the expressive power of deep neural networks, arXiv preprint
[21] Kutz, J. N., Deep learning in fluid dynamics, J. Fluid Mech., 814, 1-4, (2017) · Zbl 1383.76380
[22] Marçais, J.; de Dreuzy, J.-R., Prospective interest of deep learning for hydrological inference, Groundwater, 55, 5, 688-692, (2017)
[23] Chan, S.; Elsheikh, A. H., A machine learning approach for efficient uncertainty quantification using multiscale methods, J. Comput. Phys., 354, Supplement C, 493-511, (2018) · Zbl 1380.65331
[24] Min, S.; Lee, B.; Yoon, S., Deep learning in bioinformatics, Brief. Bioinform., 18, 5, 851-869, (2017)
[25] Baldi, P.; Sadowski, P.; Whiteson, D., Searching for exotic particles in high-energy physics with deep learning, Nat. Commun., 5, 4308, (2014)
[26] MacKay, D. J., A practical Bayesian framework for backpropagation networks, Neural Comput., 4, 3, 448-472, (1992)
[27] Neal, R. M., Bayesian learning for neural networks, vol. 118, (2012), Springer Science & Business Media
[28] Gal, Y.; Ghahramani, Z., Dropout as a Bayesian approximation: representing model uncertainty in deep learning, (International Conference on Machine Learning, (2016)), 1050-1059
[29] Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D., Weight uncertainty in neural networks, arXiv preprint
[30] Liu, Q.; Wang, D., Stein variational gradient descent: a general purpose Bayesian inference algorithm, (Advances In Neural Information Processing Systems, (2016)), 2378-2386
[31] Kingma, D. P.; Salimans, T.; Welling, M., Variational dropout and the local reparameterization trick, (Advances in Neural Information Processing Systems, (2015)), 2575-2583
[32] Hernández-Lobato, J. M.; Adams, R., Probabilistic backpropagation for scalable learning of Bayesian neural networks, (International Conference on Machine Learning, (2015)), 1861-1869
[33] Louizos, C.; Welling, M., Multiplicative normalizing flows for variational Bayesian neural networks, arXiv preprint
[34] Louizos, C.; Ullrich, K.; Welling, M., Bayesian compression for deep learning, arXiv preprint
[35] Huang, G.; Liu, Z.; Weinberger, K. Q.; van der Maaten, L., Densely connected convolutional networks, arXiv preprint
[36] Simonyan, K.; Zisserman, A., Very deep convolutional networks for large-scale image recognition, arXiv preprint
[37] Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z., Rethinking the inception architecture for computer vision, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016)), 2818-2826
[38] He, K.; Zhang, X.; Ren, S.; Sun, J., Deep residual learning for image recognition, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016)), 770-778
[39] Ronneberger, O.; Fischer, P.; Brox, T., U-net: convolutional networks for biomedical image segmentation, (International Conference on Medical Image Computing and Computer-Assisted Intervention, (2015), Springer), 234-241
[40] Eigen, D.; Puhrsch, C.; Fergus, R., Depth map prediction from a single image using a multi-scale deep network, (Advances in Neural Information Processing Systems, (2014)), 2366-2374
[41] Isola, P.; Zhu, J.; Zhou, T.; Efros, A. A., Image-to-image translation with conditional adversarial networks, CoRR
[42] Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y., Generative adversarial nets, (Advances in Neural Information Processing Systems, (2014)), 2672-2680
[43] van den Oord, A.; Kalchbrenner, N.; Espeholt, L.; Vinyals, O.; Graves, A., Conditional image generation with PixelCNN decoders, (Advances in Neural Information Processing Systems, (2016)), 4790-4798
[44] van den Oord, A.; Kalchbrenner, N.; Kavukcuoglu, K., Pixel recurrent neural networks, arXiv preprint
[45] Srivastava, R. K.; Greff, K.; Schmidhuber, J., Training very deep networks, (Advances in Neural Information Processing Systems, (2015)), 2377-2385
[46] Ioffe, S.; Szegedy, C., Batch normalization: accelerating deep network training by reducing internal covariate shift, (International Conference on Machine Learning, (2015)), 448-456
[47] Glorot, X.; Bordes, A.; Bengio, Y., Deep sparse rectifier neural networks, (Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, (2011)), 315-323
[48] Theano Development Team, Theano: a Python framework for fast computation of mathematical expressions, arXiv preprint
[49] Long, J.; Shelhamer, E.; Darrell, T., Fully convolutional networks for semantic segmentation, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015)), 3431-3440
[50] Badrinarayanan, V.; Kendall, A.; Cipolla, R., SegNet: a deep convolutional encoder-decoder architecture for image segmentation, arXiv preprint
[51] Jégou, S.; Drozdzal, M.; Vazquez, D.; Romero, A.; Bengio, Y., The one hundred layers tiramisu: fully convolutional DenseNets for semantic segmentation, (2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW, (2017), IEEE), 1175-1183
[52] Han, S.; Mao, H.; Dally, W. J., Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding, arXiv preprint
[53] Lee, A.; Caron, F.; Doucet, A.; Holmes, C., A hierarchical Bayesian framework for constructing sparsity-inducing priors, arXiv preprint
[54] Gramacy, R. B.; Lee, H. K., Cases for the nugget in modeling computer experiments, Stat. Comput., 22, 3, 713-722, (2012) · Zbl 1252.62098
[55] Nix, D. A.; Weigend, A. S., Estimating the mean and variance of the target probability distribution, (1994 IEEE International Conference on Neural Networks, IEEE World Congress on Computational Intelligence, vol. 1, (1994), IEEE), 55-60
[56] Le, Q. V.; Smola, A. J.; Canu, S., Heteroscedastic Gaussian process regression, (Proceedings of the 22nd International Conference on Machine Learning, (2005), ACM), 489-496
[57] Kendall, A.; Gal, Y., What uncertainties do we need in Bayesian deep learning for computer vision?, arXiv preprint
[58] Blei, D. M.; Kucukelbir, A.; McAuliffe, J. D., Variational inference: a review for statisticians, J. Am. Stat. Assoc., 112, 518, 859-877, (2017)
[59] Liu, Q., Stein variational gradient descent as gradient flow, (Advances in Neural Information Processing Systems, (2017)), 3117-3125
[60] Liu, Q.; Lee, J.; Jordan, M., A kernelized Stein discrepancy for goodness-of-fit tests, (International Conference on Machine Learning, (2016)), 276-284
[61] Rasmussen, C. E.; Williams, C. K., Gaussian processes for machine learning, vol. 1, (2006), MIT Press, Cambridge
[62] Alnæs, M. S.; Blechta, J.; Hake, J.; Johansson, A.; Kehlet, B.; Logg, A.; Richardson, C.; Ring, J.; Rognes, M. E.; Wells, G. N., The FEniCS project version 1.5, Arch. Numer. Softw., 3, 100, (2015)
[63] Lakshminarayanan, B.; Pritzel, A.; Blundell, C., Simple and scalable predictive uncertainty estimation using deep ensembles, arXiv preprint
[64] Li, L.; Jamieson, K.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A., Hyperband: a novel bandit-based approach to hyperparameter optimization, arXiv preprint · Zbl 06982941
[65] Kingma, D.; Ba, J., Adam: a method for stochastic optimization, arXiv preprint
[66] Goyal, P.; Dollár, P.; Girshick, R. B.; Noordhuis, P.; Wesolowski, L.; Kyrola, A.; Tulloch, A.; Jia, Y.; He, K., Accurate, large minibatch SGD: training ImageNet in 1 hour, CoRR
[67] Quiñonero-Candela, J.; Rasmussen, C.; Sinz, F.; Bousquet, O.; Schölkopf, B., Evaluating predictive uncertainty challenge, (Machine Learning Challenges: Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, Max-Planck-Gesellschaft, (2006), Springer, Berlin, Germany), 1-27
[68] Poggio, T.; Kawaguchi, K.; Liao, Q.; Miranda, B.; Rosasco, L.; Boix, X.; Hidary, J.; Mhaskar, H., Theory of deep learning III: explaining the non-overfitting puzzle
[69] de Oliveira, L.; Paganini, M.; Nachman, B., Learning particle physics by example: location-aware generative adversarial networks for physics synthesis, arXiv preprint
[70] Luo, W.; Li, Y.; Urtasun, R.; Zemel, R., Understanding the effective receptive field in deep convolutional neural networks, (Advances in Neural Information Processing Systems, (2016)), 4898-4906
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.