Stagewise learning for noisy \(k\)-ary preferences.

*(English)*Zbl 06990185Summary: The aggregation of \(k\)-ary preferences is a novel ranking problem that plays an important role in several aspects of daily life, such as ordinal peer grading, online image-rating, meta-search and online product recommendation. Meanwhile, crowdsourcing is increasingly emerging as a way to provide a plethora of \(k\)-ary preferences for these types of ranking problems, due to the convenience of the platforms and the lower costs. However, preferences from crowd workers are often noisy, which inevitably degenerates the reliability of conventional aggregation models. In addition, traditional inferences usually lead to massive computational costs, which limits the scalability of aggregation models. To address both of these challenges, we propose a reliable CrowdsOUrced Plackett-LucE (COUPLE) model combined with an efficient Bayesian learning technique. To ensure reliability, we introduce an uncertainty vector for each crowd worker in COUPLE, which recovers the ground truth of the noisy preferences with a certain probability. Furthermore, we propose an Online Generalized Bayesian Moment Matching (OnlineGBMM) algorithm, which ensures that COUPLE is scalable to large-scale datasets. Comprehensive experiments on four large-scale synthetic datasets and three real-world datasets show that, COUPLE with OnlineGBMM achieves substantial improvements in reliability and noisy worker detection over other well-known approaches.

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

##### Keywords:

noisy preferences; rank aggregation; reliable Plackett-Luce model; online ranking; Bayesian moment matching
PDF
BibTeX
XML
Cite

\textit{Y. Pan} et al., Mach. Learn. 107, No. 8--10, 1333--1361 (2018; Zbl 06990185)

Full Text:
DOI

##### References:

[1] | Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer. · Zbl 1107.68072 |

[2] | Bradley, RA; Terry, M, Rank analysis of incomplete block designs: the method of paired comparisons, Biometrika, 39, 324-345, (1952) · Zbl 0047.12903 |

[3] | Chen, X., Bennett, P., Collins-Thompson, K., & Horvitz, E. (2013). Pairwise ranking aggregation in a crowdsourced setting. In WWW. |

[4] | De Alfaro, L., & Shavlovsky, M. (2014). Crowdgrader: A tool for crowdsourcing the evaluation of homework assignments. In ACM technical symposium on computer science education. |

[5] | Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In CVPR (pp. 248-255). IEEE. |

[6] | Desarkar, MS; Sarkar, S; Mitra, P, Preference relations based unsupervised rank aggregation for metasearch, Expert Systems with Applications, 49, 86-98, (2016) |

[7] | Dwork, C., Kumar, R., Naor, M., & Sivakumar, D. (2001). Rank aggregation methods for the web. In WWW (pp. 613-622). ACM. |

[8] | Elo, A. E. (1978). The rating of chessplayers, past and present. Nagoya: Arco Pub. |

[9] | Fligner, MA; Verducci, JS, Distance based ranking models, Journal of the Royal Statistical Society Series B (Methodological), 48, 359-369, (1986) · Zbl 0658.62031 |

[10] | Glickman, ME, Parameter estimation in large dynamic paired comparison experiments, Journal of the Royal Statistical Society: Series C (Applied Statistics), 48, 377-394, (1999) · Zbl 0939.62071 |

[11] | Guiver, J., & Snelson, E. (2009). Bayesian inference for Plackett-Luce ranking models. In ICML. ACM. |

[12] | Herbrich, R., Minka, T., & Graepel, T. (2007). Trueskill\(^{{\rm TM}}\): A Bayesian skill rating system. In NIPS (pp. 569-576). |

[13] | Jaini, P., Chen, Z., Carbajal, P., Law, E., Middleton, L., Regan, K., Schaekermann, M., Trimponias, G., Tung, J., & Poupart, P. (2016). Online Bayesian transfer learning for sequential data modeling. In ICLR |

[14] | Kazai, G., Kamps, J., Koolen, M., & Milic-Frayling, N. (2011). Crowdsourcing for book search evaluation: Impact of hit design on comparative system ranking. In SIGIR. |

[15] | Khare, R; Good, BM; Leaman, R; Su, AI; Lu, Z, Crowdsourcing in biomedicine: challenges and opportunities, Briefings in Bioinformatics, 17, 23-32, (2015) |

[16] | Khetan, A; Oh, S, Data-driven rank breaking for efficient rank aggregation, Journal of Machine Learning Research, 17, 1-54, (2016) · Zbl 1394.91090 |

[17] | Knight, H; Keith, O, Ranking facial attractiveness, The European Journal of Orthodontics, 27, 340-348, (2005) |

[18] | Kulkarni, C; Wei, KP; Le, H; Chia, D; Papadopoulos, K; Cheng, J; Koller, D; Klemmer, SR, Peer and self assessment in massive online classes, ACM Transactions on Computer-Human Interaction, 20, 1-31, (2013) |

[19] | Lijphart, A. (1994). Electoral systems and party systems: A study of twenty-seven democracies, 1945-1990. Oxford: Oxford University Press. |

[20] | Liu, Q., Peng, J., & Ihler, A. T. (2012). Variational inference for crowdsourcing. In NIPS (pp. 692-700). |

[21] | Liu, TY, Learning to rank for information retrieval, Foundations and Trends® in Information Retrieval, 3, 225-331, (2009) |

[22] | Luaces, O; Díez, J; Alonso-Betanzos, A; Troncoso, A; Bahamonde, A, A factorization approach to evaluate open-response assignments in moocs using preference learning on peer assessments, Knowledge-Based Systems, 85, 322-328, (2015) |

[23] | Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley. · Zbl 0093.31708 |

[24] | Mallows, C, Non-null ranking models, Biometrika, 44, 114-130, (1957) · Zbl 0087.34001 |

[25] | Maydeu-Olivares, A, Thurstonian modeling of ranking data via mean and covariance structure analysis, Psychometrika, 64, 325-340, (1999) · Zbl 1291.62230 |

[26] | Mollica, C; Tardella, L, Bayesian plackett-luce mixture models for partially ranked data, Psychometrika, 82, 442-458, (2016) · Zbl 1402.62045 |

[27] | Negahban, S; Oh, S; Shah, D, Rank centrality: ranking from pairwise comparisons, Operations Research, 65, 266-287, (2016) · Zbl 1414.91133 |

[28] | Ok, J., Oh, S., Shin, J., & Yi, Y. (2016). Optimality of belief propagation for crowdsourced classification. In ICML (pp. 535-544). |

[29] | Plackett, R, The analysis of permutations, Applied Statistics, 24, 193-202, (1975) |

[30] | Raman, K., & Joachims, T. (2014). Methods for ordinal peer grading. In KDD. |

[31] | Rashwan, A., Zhao, H., & Poupart, P. (2016). Online and distributed Bayesian moment matching for parameter learning in sum-product networks. In AISTATS (pp. 1469-1477). |

[32] | Richard, B. (2013). Cheap solutions: Managing a co-producing crowd of strangers to solve your problems. Contemporary perspectives on technical innovation, management and policy (pp. 261-287). |

[33] | Saari, DG, Explaining all three-alternative voting outcomes, Journal of Economic Theory, 87, 313-355, (1999) · Zbl 1016.91029 |

[34] | Shah, N., Balakrishnan, S., Bradley, J., Parekh, A., Ramchandran, K., & Wainwright, M. (2015). Estimation from pairwise comparisons: Sharp minimax bounds with topology dependence. In AISTATS. · Zbl 1360.62409 |

[35] | Shah, N., Bradley, J., Parekh, A., Wainwright, M., & Ramchandran, K. (2013). A case for ordinal peer-evaluation in moocs. In NIPS Workshop on Data Driven Education. |

[36] | Soufiani, H. A., Chen, W., Parkes, D. C., & Xia, L. (2013). Generalized method-of-moments for rank aggregation. In NIPS (pp. 2706-2714). |

[37] | Soufiani, H. A., Parkes, D. C., & Xia, L. (2014). Computing parametric ranking models via rank-breaking. In ICML (pp. 360-368). |

[38] | Thurstone, LL, The method of paired comparisons for social values, Journal of Abnormal and Social Psychology, 21, 384, (1927) |

[39] | Tsiporkova, E; Boeva, V, Multi-step ranking of alternatives in a multi-criteria and multi-expert decision making environment, Information Sciences, 176, 2673-2697, (2006) · Zbl 1102.68655 |

[40] | Turner, TL; Miller, PM, Investigating natural variation in drosophila courtship song by the evolve and resequence approach, Genetics, 191, 633-642, (2012) |

[41] | Vitelli, V., Sørensen, Ø., Frigessi, A., & Arjas, E. (2014). Probabilistic preference learning with the mallows rank model. arXiv:1405.7945. · Zbl 06982914 |

[42] | Volkovs, M., & Zemel, R. (2012). A flexible generative model for preference aggregation. In WWW. |

[43] | Vuurens, J., de Vries, A. P., & Eickhoff, C. (2011). How much spam can you take? An analysis of crowdsourcing results to increase accuracy. In SIGIR Workshop on CIR. |

[44] | Wainwright, MJ; Jordan, MI, Graphical models, exponential families, and variational inference, Foundations and Trends® in Machine Learning, 1, 1-305, (2008) · Zbl 1193.62107 |

[45] | Weng, R; Lin, CJ, A Bayesian approximation method for online ranking, Journal of Machine Learning Research, 12, 267-300, (2011) · Zbl 1280.68215 |

[46] | Woodroofe, M; etal., Very weak expansions for sequentially designed experiments: linear models, The Annals of Statistics, 17, 1087-1102, (1989) · Zbl 0683.62039 |

[47] | Yan, L., Dodier, R. H., Mozer, M., & Wolniewicz, R. H. (2003). Optimizing classifier performance via an approximation to the Wilcoxon-Mann-Whitney statistic. In ICML. |

[48] | Zagel, C; Piazza, A; Petrov, Y; Bodendorf, F; Freund, LE (ed.); Cellary, W (ed.), Sciencomat: A gamified research platform for evaluating visual attractiveness, 50-60, (2018), Berlin |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.