Outline analyses of the called strike zone in Major League Baseball. (English) Zbl 1435.62445

Summary: We extend statistical shape analytic methods known as outline analysis for application to the strike zone, a central feature of the game of baseball. Although the strike zone is rigorously defined by Major League Baseball’s official rules, umpires make mistakes in calling pitches as strikes (and balls) and may even adhere to a strike zone somewhat different from that prescribed by the rule book. Our methods yield inference on geometric attributes (centroid, dimensions, orientation and shape) of this “called strike zone” (CSZ) and on the effects that years, umpires, player attributes, game situation factors and their interactions have on those attributes. The methodology consists of first using kernel discriminant analysis to determine a noisy outline representing the CSZ corresponding to each factor combination, then fitting existing elliptic Fourier and new generalized superelliptic models for closed curves to that outline, and finally analyzing the fitted model coefficients using standard methods of regression analysis, factorial analysis of variance and variance component estimation. We apply these methods to PITCHf/x data comprising more than three million called pitches from the 2008–2016 Major League Baseball seasons to address numerous questions about the CSZ. We find that all geometric attributes of the CSZ, except its size, became significantly more like those of the rule-book strike zone from 2008 to 2016 and that several player attribute/game situation factors had statistically and practically significant effects on many of them. We also establish that the variation in the horizontal center, width and area of an individual umpire’s CSZ from pitch to pitch is smaller than their variation among CSZs from different umpires.
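
To make the closed-curve modelling step more concrete, here is a minimal Python sketch of the ordinary superellipse (Lamé curve) |x/a|^r + |y/b|^r = 1, the family described in the Gardiner reference as lying between the ellipse and the rectangle. This is not the authors' generalized superelliptic model, whose form is not given in this summary. The function names, the half-axes a and b, and the exponent values are hypothetical choices made only for illustration.

```python
import math


def superellipse_outline(a, b, r, n=2000):
    """Return n points on |x/a|**r + |y/b|**r = 1, ordered counterclockwise.

    a, b are the half-width and half-height; r = 2 gives an ellipse and
    large r approaches a rectangle. All values here are illustrative.
    """
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        c, s = math.cos(t), math.sin(t)
        # Signed power keeps each point in the correct quadrant.
        x = a * math.copysign(abs(c) ** (2.0 / r), c)
        y = b * math.copysign(abs(s) ** (2.0 / r), s)
        pts.append((x, y))
    return pts


def polygon_area(pts):
    """Area of the closed polygon through pts (shoelace formula)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def superellipse_area(a, b, r):
    """Closed-form area: 4ab * Gamma(1 + 1/r)**2 / Gamma(1 + 2/r)."""
    return 4.0 * a * b * math.gamma(1.0 + 1.0 / r) ** 2 / math.gamma(1.0 + 2.0 / r)


if __name__ == "__main__":
    a, b = 0.8, 1.0  # hypothetical half-axes, not taken from the paper
    for r in (2, 4, 10):
        numeric = polygon_area(superellipse_outline(a, b, r))
        exact = superellipse_area(a, b, r)
        print(f"r={r}: polygon area {numeric:.4f}, closed form {exact:.4f}")
```

In this parameterization the centroid is the origin, the dimensions are 2a by 2b, and the exponent r alone controls shape, which loosely mirrors the geometric attributes (centroid, dimensions, shape) listed in the summary. Orientation would need an added rotation, which this sketch omits.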


MSC:
62P99 Applications of statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62H25 Factor analysis and principal components; correspondence analysis


Software: Momocs; XML2R; pitchRx; R

