How often does the best team win? A unified approach to understanding randomness in North American sport. (English) Zbl 1412.62216

Summary: Statistical applications in sports have long centered on how to best separate signal (e.g., team talent) from random noise. However, most of this work has concentrated on a single sport, and the development of meaningful cross-sport comparisons has been impeded by the difficulty of translating luck from one sport to another. In this manuscript we develop Bayesian state-space models using betting market data that can be uniformly applied across sporting organizations to better understand the role of randomness in game outcomes. These models can be used to extract estimates of team strength, the between-season, within-season and game-to-game variability of team strengths, as well as each team’s home advantage. We implement our approach across a decade of play in each of the National Football League (NFL), National Hockey League (NHL), National Basketball Association (NBA) and Major League Baseball (MLB), finding that the NBA demonstrates both the largest dispersion in talent and the largest home advantage, while the NHL and MLB stand out for their relative randomness in game outcomes. We conclude by proposing new metrics for judging competitiveness across sports leagues, both within the regular season and using traditional postseason tournament formats. Although we focus on sports, we discuss a number of other situations in which our generalizable models might be usefully applied.
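
Illustrative sketch (not taken from the paper): the summary does not give the model equations, so the following Python snippet only gestures at the general idea of a state-space model for team strength. It assumes, purely for illustration, that strengths live on the log-odds scale, evolve week to week as a first-order autoregressive process, and combine with a single shared home advantage to give a win probability. All function names, parameters, and numeric values are hypothetical, and the simulation replaces the Bayesian fitting to betting market data described in the summary.

```python
import math
import random


def logit(p):
    """Map a probability (e.g., one implied by betting odds) to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_strengths(n_teams, n_weeks, gamma_week, sigma_week, sigma_init, rng):
    """Simulate team strengths week by week (assumed AR(1) on the log-odds scale)."""
    theta = [rng.gauss(0.0, sigma_init) for _ in range(n_teams)]
    path = [theta]
    for _ in range(n_weeks - 1):
        # Each strength shrinks toward zero by gamma_week and receives a random shock.
        theta = [gamma_week * t + rng.gauss(0.0, sigma_week) for t in theta]
        path.append(theta)
    return path


def home_win_probability(theta_home, theta_away, home_adv):
    """Assumed link: strength difference plus home advantage, on the log-odds scale."""
    return inv_logit(theta_home - theta_away + home_adv)


def better_team_win_rate(path, home_adv, rng, n_games=20000):
    """Estimate by simulation how often the stronger team wins a random matchup."""
    wins = 0
    for _ in range(n_games):
        week = rng.randrange(len(path))
        home, away = rng.sample(range(len(path[week])), 2)
        p_home = home_win_probability(path[week][home], path[week][away], home_adv)
        home_won = rng.random() < p_home
        home_is_better = path[week][home] > path[week][away]
        wins += (home_won == home_is_better)
    return wins / n_games


if __name__ == "__main__":
    rng = random.Random(2018)
    # Hypothetical leagues: one with widely dispersed talent, one with little dispersion.
    wide = simulate_strengths(30, 20, 0.99, 0.05, 1.0, rng)
    narrow = simulate_strengths(30, 20, 0.99, 0.05, 0.1, rng)
    print("wide talent dispersion:  ", better_team_win_rate(wide, 0.0, rng))
    print("narrow talent dispersion:", better_team_win_rate(narrow, 0.0, rng))
```

Under these assumptions, a league with greater dispersion in strength sees the stronger team win more often, which is the kind of cross-league contrast the summary describes.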

MSC:

62P99 Applications of statistics
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)

Software:

BayesDA; R
