
Symbolic DNN-tuner. (English) Zbl 07510323

Summary: Hyper-Parameter Optimization (HPO) plays a fundamental role in Deep Learning systems due to the number of hyper-parameters (HPs) to be set. The state-of-the-art HPO methods are Grid Search, Random Search and Bayesian Optimization. The first two try all possible combinations and random combinations of the HP values, respectively; this is done blindly, without using any information to choose the next set of HP values. Bayesian Optimization (BO), instead, keeps track of past results and uses them to build a probabilistic model mapping HPs to a probability density of the objective function: it builds a surrogate probabilistic model of the objective, finds the HP values that perform best on the surrogate, and updates the surrogate with the new results. In this paper, we improve BO applied to Deep Neural Networks (DNNs) by adding an analysis of the network's results on the training and validation sets. This analysis is performed with rule-based programming, in particular Probabilistic Logic Programming. The resulting system, called Symbolic DNN-Tuner, logically evaluates the results obtained from the training and validation phases and, by applying symbolic tuning rules, fixes the network architecture and its HPs, thereby improving performance. We also show the effectiveness of the proposed approach through an experimental evaluation on literature and real-life datasets.
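To make the two ingredients of the summary concrete, the following is a minimal Python sketch (not the authors' implementation), assuming scikit-learn, SciPy and NumPy are available: (i) one Bayesian Optimization step that fits a Gaussian-Process surrogate of the objective and picks the next HP configuration by Expected Improvement, and (ii) a plain rule-based stand-in for the symbolic tuning rules that inspects training/validation results and suggests fixes. The function names, thresholds and suggested actions are illustrative assumptions, not the paper's actual rules.

    # Minimal sketch of a BO step plus rule-based diagnosis (illustrative only).
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def propose_next_hp(observed_hps, observed_scores, candidate_hps):
        """One BO step: fit a GP surrogate on past (HP configuration, score) pairs
        and return the candidate configuration with the highest Expected Improvement."""
        gp = GaussianProcessRegressor(normalize_y=True).fit(observed_hps, observed_scores)
        mu, sigma = gp.predict(candidate_hps, return_std=True)
        best = np.max(observed_scores)              # best validation score found so far
        sigma = np.maximum(sigma, 1e-9)             # guard against zero predictive std
        z = (mu - best) / sigma
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # Expected Improvement
        return candidate_hps[int(np.argmax(ei))]

    def symbolic_tuning_rules(train_acc, val_acc, train_loss_history):
        """Rule-based analysis of training/validation results; each rule maps a
        symptom to a tuning action (a crude stand-in for the probabilistic-logic rules)."""
        actions = []
        if train_acc - val_acc > 0.10:              # large generalisation gap
            actions.append("overfitting: add regularization, dropout or data augmentation")
        if train_acc < 0.60:                        # the network underfits the data
            actions.append("underfitting: add layers/neurons or train longer")
        if len(train_loss_history) > 2 and train_loss_history[-1] > train_loss_history[0]:
            actions.append("non-decreasing loss: lower the learning rate")
        return actions

In this sketch the Expected Improvement acquisition balances exploitation of HP regions the surrogate already predicts to be good against exploration of uncertain regions, while the rules mirror common practitioner heuristics; in the actual system such rules are expressed and weighted in Probabilistic Logic Programming rather than hard-coded Python conditions.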

MSC:

68T05 Learning and adaptive systems in artificial intelligence
