
Robust Plackett-Luce model for \(k\)-ary crowdsourced preferences. (English) Zbl 06889042
Summary: The aggregation of \(k\)-ary preferences is an emerging ranking problem that plays an important role in several aspects of daily life, such as ordinal peer grading and online product recommendation. At the same time, crowdsourcing has become a popular way to collect large numbers of \(k\)-ary preferences for this ranking problem, owing to convenient platforms and low costs. However, \(k\)-ary preferences from crowdsourced workers are often noisy, which inevitably degrades the performance of traditional aggregation models. To address this challenge, we present a RObust PlAckett-Luce (ROPAL) model. Specifically, to ensure robustness, ROPAL integrates the Plackett-Luce model with a denoising vector. Based on the Kendall-tau distance, this vector corrects \(k\)-ary crowdsourced preferences with a certain probability. In addition, we propose an online Bayesian inference scheme to make ROPAL scalable to large-scale preferences. Comprehensive experiments on “massive synthetic” and “real-world” datasets show that ROPAL with online Bayesian inference achieves substantial improvements in robustness and noisy worker detection over current approaches.
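As background for the two ingredients named in the summary, the following is a minimal sketch of the standard Plackett-Luce ranking probability and the Kendall-tau distance between two rankings. It is not the ROPAL model itself: the denoising vector and the online Bayesian inference are not reproduced here, and the function names `plackett_luce_log_prob` and `kendall_tau_distance`, as well as the item scores in the usage note, are illustrative assumptions.

```python
import math
from itertools import combinations


def plackett_luce_log_prob(ranking, scores):
    """Log-probability of `ranking` (best item first) under Plackett-Luce.

    `scores` maps each item to a positive score (hypothetical values).
    At each stage the next item is chosen with probability proportional
    to its score among the items that have not been placed yet.
    """
    remaining = sum(scores[item] for item in ranking)
    log_prob = 0.0
    for item in ranking:
        log_prob += math.log(scores[item]) - math.log(remaining)
        remaining -= scores[item]
    return log_prob


def kendall_tau_distance(ranking_a, ranking_b):
    """Number of item pairs that the two rankings order differently.

    Both rankings are assumed to contain the same k items.
    """
    position_b = {item: idx for idx, item in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)  # x precedes y in ranking_a
        if position_b[x] > position_b[y]        # but follows it in ranking_b
    )
```

For example, with assumed scores `{"a": 3.0, "b": 2.0, "c": 1.0}`, the ranking `["a", "b", "c"]` has probability (3/6)(2/3)(1/1) = 1/3, and a fully reversed 3-ary preference lies at Kendall-tau distance 3 from the original.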

MSC: 68T05 Learning and adaptive systems in artificial intelligence
Software: Visual Genome