RCV1

swMATH ID: 7279
Software Authors: Lewis, David D.; Yang, Yiming; Rose, Tony G.; Li, Fan
Description: RCV1: A New Benchmark Collection for Text Categorization Research. Reuters Corpus Volume I (RCV1) is an archive of over 800,000 manually categorized newswire stories recently made available by Reuters, Ltd. for research purposes. Use of this data for research on text categorization requires a detailed understanding of the real-world constraints under which the data was produced. Drawing on interviews with Reuters personnel and access to Reuters documentation, we describe the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data. We refer to the original data as RCV1-v1, and the corrected data as RCV1-v2. We benchmark several widely used supervised learning methods on RCV1-v2, illustrating the collection's properties, suggesting new directions for research, and providing baseline results for future studies. We make available detailed, per-category experimental results, as well as corrected versions of the category assignments and taxonomy structures, via online appendices.
Homepage: http://dl.acm.org/citation.cfm?id=1005345
Related Software: LIBSVM; UCI-ml; BoosTexter; L-BFGS; SGD-QN; LIBLINEAR; AdaGrad; Pegasos; OHSUMED; word2vec; ImageNet; t-SNE; HOGWILD; GloVe; Adam; ML-KNN; MULAN; SVMlight; Saga; ElemStatLearn
Cited in: 108 Documents

Cited by 316 Authors

5 Lin, Chih-Jen
3 Bottou, Léon
3 Langford, John
3 Lin, Qihang
3 Yuan, Xiaotong
3 Zhang, Tong
2 Chang, Kai-Wei
2 Crammer, Koby
2 Drineas, Petros
2 Fürnkranz, Johannes
2 Hsieh, Cho-Jui
2 Huang, Yakui
2 Kuang, Da
2 Lebanon, Guy
2 Li, Lihong
2 Li, Ping
2 Liu, Hongwei
2 Nedić, Angelia
2 Park, Haesun
2 Schuster, Assaf
2 Shanbhag, Uday V.
2 Sharfman, Izchak
2 Song, Yangqiu
2 Wang, Xiao
2 Xiao, Lin
2 Ye, Jieping
2 Yin, Wotao
2 Yousefian, Farzad
2 Yun, Sangwoon
1 Abe, Shigeo
1 Agarwal, Alekh
1 Arbabifard, Kamyar
1 Bach, Francis R.
1 Bahamonde, Antonio
1 Balakrishnan, Suhrid
1 Bashar, Md Abul
1 Basu, Sugato
1 Bayoudh, Ines
1 Bechet, Nicolas
1 Benites, Fernando
1 Berry, Michael W.
1 Bianchi, Pascal
1 Bontcheva, Kalina
1 Bordes, Antoine
1 Brinker, Klaus
1 Browne, Murray
1 Brucker, Florian
1 Buntine, Wray L.
1 Burkhardt, Sophie
1 Busygin, Stanislav
1 Cai, Hongmin
1 Cai, Linkun
1 Cen, Shicong
1 Cerri, Ricardo
1 Chambers, America
1 Chawla, Nitesh V.
1 Chen, Jianhui
1 Chen, Jiazhou
1 Chen, Yu
1 Cheng, Hong
1 Chow, Tommy W. S.
1 Cristianini, Nello
1 Cristofari, Andrea
1 Cunningham, Hamish
1 Curtis, Frank E.
1 Cuturi, Marco
1 Damerau, Fred J.
1 Daumé, Hal III
1 Davidson, Ian
1 De Santis, Marianna
1 De Tré, Guy
1 del Coz, Juan José
1 Deligiannakis, Antonios
1 Deng, Sucheng
1 Díez, Jorge
1 Diggavi, Suhas N.
1 Dillon, Joshua V.
1 Dimakis, Alexandros G.
1 Domeniconi, Carlotta
1 Drake, Barry L.
1 Dredze, Mark
1 Du, Lan
1 Du, Rundong
1 Duchi, John C.
1 Dudík, Miroslav
1 Duivesteijn, Wouter
1 Dvurechensky, Pavel E.
1 Dy, Jennifer G.
1 Elenberg, Ethan R.
1 Erhan, Dumitru
1 Fan, Rong-En
1 Fan, Yiwei
1 Fazel, Maryam
1 Fercoq, Olivier
1 Finley, Thomas
1 Flaounas, Ilias
1 Forman, George
1 Fountoulakis, Kimon
1 Gabrilovich, Evgeniy
1 Gallinari, Patrick
...and 216 more Authors

Cited in 39 Serials

18 Journal of Machine Learning Research (JMLR)
13 Machine Learning
9 SIAM Journal on Optimization
8 Data Mining and Knowledge Discovery
5 Pattern Recognition
4 Information Sciences
3 Artificial Intelligence
2 Neural Networks
2 Journal of Global Optimization
2 Mathematical Programming. Series A. Series B
2 Computational Optimization and Applications
2 The Journal of Artificial Intelligence Research (JAIR)
1 ACM Transactions on Database Systems
1 The Annals of Statistics
1 Automatica
1 Fuzzy Sets and Systems
1 Journal of the American Statistical Association
1 Journal of Applied Probability
1 Mathematics of Operations Research
1 Applied Numerical Mathematics
1 Statistical Science
1 Computers & Operations Research
1 Applied Mathematics Letters
1 SIAM Journal on Matrix Analysis and Applications
1 Journal of Scientific Computing
1 Journal of Parallel and Distributed Computing
1 Neural Computation
1 Computational Statistics
1 SIAM Review
1 Computational Statistics and Data Analysis
1 SIAM Journal on Scientific Computing
1 Numerical Linear Algebra with Applications
1 Computer Networks
1 Mathematical Biosciences and Engineering
1 Electronic Journal of Statistics
1 Foundations and Trends in Machine Learning
1 EURO Journal on Computational Optimization
1 Journal of the Operations Research Society of China
1 Information Geometry

Cited in 13 Fields

76 Computer science (68-XX)
40 Statistics (62-XX)
31 Operations research, mathematical programming (90-XX)
21 Numerical analysis (65-XX)
5 Linear and multilinear algebra; matrix theory (15-XX)
5 Calculus of variations and optimal control; optimization (49-XX)
2 Combinatorics (05-XX)
2 Probability theory and stochastic processes (60-XX)
2 Game theory, economics, finance, and other social and behavioral sciences (91-XX)
2 Biology and other natural sciences (92-XX)
1 Partial differential equations (35-XX)
1 Operator theory (47-XX)
1 Systems theory; control (93-XX)