Generalised joint regression for count data: a penalty extension for competitive settings. (English) Zbl 1452.62488

Summary: We propose a versatile joint regression framework for count responses. The method is implemented in the R add-on package GJRM and allows for modelling linear and non-linear dependence through the use of several copulae. Moreover, the parameters of the marginal distributions of the count responses and of the copula can be specified as flexible functions of covariates. Motivated by competitive settings, we also discuss an extension which forces the regression coefficients of the marginal (linear) predictors to be equal via a suitable penalisation. Model fitting is based on a trust region algorithm which simultaneously estimates all the parameters of the joint model. We investigate the proposal’s empirical performance in two simulation studies, the first designed for arbitrary count data and the second reflecting competitive settings. Finally, the method is applied to football data, showing its benefits over the standard approach with regard to predictive performance.
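
To illustrate how a copula can couple two count margins, the following is a minimal sketch and not the GJRM implementation. It assumes Poisson margins with fixed means and a Frank copula with a fixed dependence parameter, both chosen here purely for illustration; the summary does not say which margins or copulae the paper uses, and no covariates or penalty are included. The function names (`poisson_cdf`, `frank_copula`, `joint_pmf`) are hypothetical. The joint probability of a pair of counts is obtained from the standard rectangle formula, i.e. differences of the copula evaluated at the marginal CDFs.

```python
import math


def poisson_cdf(y, mu):
    """P(Y <= y) for Y ~ Poisson(mu); returns 0 for y < 0."""
    if y < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for k in range(1, y + 1):
        term *= mu / k  # P(Y = k) from P(Y = k - 1)
        total += term
    return min(total, 1.0)


def frank_copula(u, v, theta):
    """Frank copula C(u, v) for theta != 0 (assumed copula family)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def joint_pmf(y1, y2, mu1, mu2, theta):
    """P(Y1 = y1, Y2 = y2) via the rectangle formula on the copula."""
    f1, f1m = poisson_cdf(y1, mu1), poisson_cdf(y1 - 1, mu1)
    f2, f2m = poisson_cdf(y2, mu2), poisson_cdf(y2 - 1, mu2)
    return (
        frank_copula(f1, f2, theta)
        - frank_copula(f1m, f2, theta)
        - frank_copula(f1, f2m, theta)
        + frank_copula(f1m, f2m, theta)
    )


if __name__ == "__main__":
    # Hypothetical means for two competing sides and a negative dependence.
    mu_a, mu_b, theta = 1.5, 1.2, -2.0
    print(joint_pmf(1, 1, mu_a, mu_b, theta))
```

In a regression setting, the means and the copula parameter would be replaced by functions of covariates, and the log of `joint_pmf` summed over observations would give a log-likelihood to maximise; that step is omitted here.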


62J02 General nonlinear regression
62H05 Characterization and structure theory for multivariate probability distributions; copulas
62P99 Applications of statistics

