
zbMATH — the first resource for mathematics

Propositionalization and embeddings: two sides of the same coin. (English) Zbl 07255754
Summary: Data preprocessing is an important component of machine learning pipelines, requiring ample time and resources. An integral part of preprocessing is transforming the data into the format required by a given learning algorithm. This paper outlines some of the modern data processing techniques used in relational learning that enable data fusion from different input data types and formats into a single-table data representation, focusing on the propositionalization and embedding data transformation approaches. While both approaches aim at transforming data into a tabular format, they use different terminology and task definitions, are perceived to address different goals, and are used in different contexts. This paper contributes a unifying framework that allows for an improved understanding of these two data transformation techniques by presenting their unified definitions, and by explaining the similarities and differences between the two approaches as variants of a unified complex data transformation task. In addition to the unifying framework, the novelty of this paper is a unifying methodology that combines propositionalization and embeddings, benefiting from the advantages of both in solving complex data transformation and learning tasks. We present two efficient implementations of the unifying methodology: an instance-based PropDRM approach and a feature-based PropStar approach to data transformation and learning, together with their empirical evaluation on several relational problems. The results show that the new algorithms can outperform existing relational learners and can solve much larger problems.
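To make the single-table transformation concrete, here is a minimal sketch of propositionalization. It is an illustration under assumed toy data (customers and their orders), not the paper's PropDRM or PropStar algorithms: each feature is an existential condition over an entity's related records, and evaluating all features yields one fixed-width boolean row per entity.

```python
# Toy relational data: a main table (customers) and a related table (orders).
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"customer_id": 1, "item": "book", "price": 12.0},
    {"customer_id": 1, "item": "pen", "price": 1.5},
    {"customer_id": 3, "item": "laptop", "price": 900.0},
]

# Each feature is an existential query over a customer's orders, in the
# spirit of the relational features used by propositionalization methods.
features = {
    "has_order": lambda os: len(os) > 0,
    "has_expensive_order": lambda os: any(o["price"] > 100 for o in os),
    "ordered_book": lambda os: any(o["item"] == "book" for o in os),
}

def propositionalize(customers, orders, features):
    """Join the related table to each main-table entity and evaluate every
    feature, producing a single tabular (one row per entity) representation."""
    table = []
    for c in customers:
        own = [o for o in orders if o["customer_id"] == c["id"]]
        row = {"id": c["id"]}
        row.update({name: f(own) for name, f in features.items()})
        table.append(row)
    return table

rows = propositionalize(customers, orders, features)
```

The resulting `rows` form the flat boolean table that a standard propositional learner (decision tree, SVM, neural network) can consume directly; an embedding approach would instead map the same relational structure into dense numeric vectors.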
MSC:
68T05 Learning and adaptive systems in artificial intelligence