
Regression trees for detecting preference patterns from rank data. (English) Zbl 1459.62232

Summary: A regression tree method for analyzing rank data is proposed. A key ingredient of the methodology is the conversion of ranks into scores by paired comparison. The GUIDE tree method is then applied to the score vectors to identify the preference patterns in the data. The method is free of selection bias, and simulation results show that it selects split variables well and, in some cases, has better prediction accuracy than the two other methods investigated. Furthermore, it is applicable to complex data that may contain incomplete ranks and missing covariate values. Its usefulness is demonstrated in two real data studies.
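
The rank-to-score conversion mentioned in the summary can be illustrated with a minimal sketch. The summary does not state the exact scoring rule, so the rule below is an assumption: each item gains 1 for every item it is ranked above and loses 1 for every item it is ranked below, and any pair involving an unranked item contributes 0. The function name `ranks_to_scores` is hypothetical and is not taken from the paper or from the GUIDE software.

```python
from typing import List, Optional, Sequence


def ranks_to_scores(ranks: Sequence[Optional[int]]) -> List[float]:
    """Turn one ranking into a score vector by paired comparison.

    ranks[i] is the rank of item i (1 = most preferred); None marks an
    item the respondent left unranked (an incomplete rank).
    Assumed rule (not confirmed by the source): +1 per item beaten,
    -1 per item lost to, 0 for pairs involving an unranked item.
    """
    n = len(ranks)
    scores = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j or ranks[i] is None or ranks[j] is None:
                continue  # no information from this pair under the assumed rule
            if ranks[i] < ranks[j]:
                scores[i] += 1.0  # item i is preferred to item j
            elif ranks[i] > ranks[j]:
                scores[i] -= 1.0  # item j is preferred to item i
    return scores


if __name__ == "__main__":
    print(ranks_to_scores([1, 2, 3]))     # complete ranking of three items
    print(ranks_to_scores([2, None, 1]))  # second item left unranked
```

Under this assumed rule each score vector sums to zero, and the resulting vectors could serve as multivariate responses for a regression tree; how the paper actually handles incomplete ranks may differ.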

MSC:

62R07 Statistical aspects of big data and data science
62G08 Nonparametric regression and quantile regression
62J15 Paired and multiple comparisons; multiple testing
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62-08 Computational methods for problems pertaining to statistics
