PLMIX: an R package for modelling and clustering partially ranked data. (English) Zbl 07194321

Summary: The PLMIX package offers a comprehensive framework that brings to the R statistical environment some recent methodological advances in modelling and clustering partially ranked data. The usefulness of the PLMIX package can be motivated from several perspectives: (i) it helps fill the gap concerning Bayesian estimation of ranking models in R, focusing on the Plackett-Luce model and its finite mixture extension as the generative sampling distribution; (ii) it addresses computational complexity by combining the flexibility of R routines with the speed of compiled C++ code, with optional parallel execution; (iii) it covers the fundamental phases of ranking data analysis, allowing for a more careful and critical application of ranking models in real contexts; (iv) it provides effective tools for clustering heterogeneous partially ranked data. Specific S3 classes and methods are also supplied to enhance usability and foster exchange with other packages. The functionality of the new package is illustrated with several applications to simulated and real datasets.
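The sampling model named in point (i) can be illustrated with a minimal Python sketch. Under the Plackett-Luce model with positive support parameters p_1, …, p_K, a top-n ordering is built by choosing items one position at a time, each with probability proportional to its support among the items not yet ranked; a finite mixture combines G such components with weights ω_g. The function names below are hypothetical and are not part of the PLMIX API, and the sketch leaves out the Bayesian estimation that the package performs: it only evaluates likelihoods and, for fixed parameters, the component membership probabilities that underlie soft clustering.

```python
import math
from typing import List, Sequence


def pl_log_prob(ordering: Sequence[int], support: Sequence[float]) -> float:
    """Log-probability of a top-n ordering under the Plackett-Luce model.

    `ordering` lists item indices from most to least preferred and may stop
    after n < K positions (a partial top-n ordering). `support` holds the
    positive support parameters p_1..p_K of all K items.
    """
    remaining = float(sum(support))  # total support of items not yet ranked
    log_p = 0.0
    for item in ordering:
        # Pick `item` with probability proportional to its support
        # among the items still available at this stage.
        log_p += math.log(support[item]) - math.log(remaining)
        remaining -= support[item]
    return log_p


def _component_log_terms(ordering, weights, supports) -> List[float]:
    # log(omega_g) + log P_PL(ordering | p_g) for each mixture component g
    return [math.log(w) + pl_log_prob(ordering, p) for w, p in zip(weights, supports)]


def _log_sum_exp(values: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(values)))
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_log_lik(orderings, weights, supports) -> float:
    """Observed-data log-likelihood of a finite mixture of Plackett-Luce models."""
    return sum(_log_sum_exp(_component_log_terms(o, weights, supports)) for o in orderings)


def membership_probs(ordering, weights, supports) -> List[float]:
    """Probability that `ordering` was generated by each component,
    given fixed weights and supports (a soft cluster assignment)."""
    terms = _component_log_terms(ordering, weights, supports)
    norm = _log_sum_exp(terms)
    return [math.exp(t - norm) for t in terms]
```

For example, with equal supports `[1, 1, 1]` every complete ordering of three items has probability 1/6, while the partial top-1 ordering `[2]` under supports `[1, 2, 3]` has probability 3/6 = 0.5. Working on the log scale with a log-sum-exp keeps the mixture computations stable when many items are ranked.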

MSC:

62F07 Statistical ranking and selection procedures
62F15 Bayesian inference