Set-to-sequence methods in machine learning: a review. (English) Zbl 07406484

Summary: Machine learning on sets with sequential output is an important and ubiquitous task, with applications ranging from language modelling and meta-learning to multi-agent strategy games and power grid optimization. Combining elements of representation learning and structured prediction, it poses two primary challenges: obtaining a meaningful, permutation-invariant set representation, and then using this representation to output a complex target permutation. This paper provides a comprehensive introduction to the field and an overview of important machine learning methods that tackle both of these key challenges, together with a detailed qualitative comparison of selected model architectures.
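The two challenges named in the summary can be illustrated with a minimal sketch. The pure-Python code below pairs a Deep Sets-style sum-pooling encoder, which yields a permutation-invariant set embedding, with a toy greedy decoder that, loosely in the spirit of pointer networks, scores the remaining input elements at each step and masks out the ones already chosen, so that its output is a permutation of the input indices. The tanh encoder `phi`, the hand-picked weights and the context-update rule are all illustrative assumptions, not the architecture of any model covered by the review, and nothing here is trained.

```python
import math
from typing import List, Sequence

Vector = List[float]


def phi(x: Vector, w: List[Vector]) -> Vector:
    """Per-element encoder (hypothetical): one linear layer followed by tanh."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def encode_set(xs: Sequence[Vector], w: List[Vector]) -> Vector:
    """Deep Sets-style set embedding: sum-pool phi(x) over all elements.

    Summation is commutative, so reordering xs leaves the result unchanged
    (up to floating-point rounding).
    """
    pooled = [0.0] * len(w)
    for x in xs:
        for j, v in enumerate(phi(x, w)):
            pooled[j] += v
    return pooled


def decode_permutation(xs: Sequence[Vector], w: List[Vector]) -> List[int]:
    """Toy greedy pointer-style decoder that emits a permutation of input indices.

    At each step every not-yet-chosen element is scored by a dot product with a
    context vector; the best one is emitted and masked out, so each index
    appears exactly once.
    """
    embs = [phi(x, w) for x in xs]
    context = encode_set(xs, w)  # start from the permutation-invariant embedding
    remaining = set(range(len(xs)))
    order: List[int] = []
    while remaining:
        scores = {i: sum(a * b for a, b in zip(embs[i], context)) for i in remaining}
        best = max(remaining, key=lambda i: (scores[i], -i))  # ties broken by index
        order.append(best)
        remaining.remove(best)
        # Hypothetical state update: remove the chosen element's contribution.
        context = [c - e for c, e in zip(context, embs[best])]
    return order


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    xs = [[0.1, 0.2], [0.5, -0.3], [0.9, 0.8]]
    print(decode_permutation(xs, w))
```

Because the decoder only ever selects among unchosen indices, its output is always a valid permutation. Since the scores depend on element content rather than position, reversing the input produces the same sequence of elements, barring exact score ties.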


68Txx Artificial intelligence
Full Text: DOI arXiv