## MONEYBaRL: exploiting pitcher decision-making using reinforcement learning.
*(English)*
Zbl 1454.62547

Summary: This manuscript uses machine learning techniques to exploit baseball pitchers’ decision making, so-called “Baseball IQ”, by modeling the at-bat information, pitch selection and counts, as a Markov Decision Process (MDP). Each state of the MDP models the pitcher’s current pitch selection in a Markovian fashion, conditional on the information immediately prior to making the current pitch. This includes the count prior to the previous pitch, his ensuing pitch selection, the batter’s ensuing action and the result of the pitch.

The necessary Markovian probabilities can be estimated by the relevant observed conditional proportions in MLB pitch-by-pitch game data. These probabilities could be pitcher-specific, using only the data from one pitcher, or general, using the data from a collection of pitchers.
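As a minimal sketch of this estimation step, the conditional proportions can be computed by counting. The record layout (a `(context, pitch)` pair, with the context holding the count and the previous pitch type), the function name `estimate_pitch_distribution`, and the toy data are all assumptions made for illustration; they are not the data format or state definition used in the paper.

```python
from collections import Counter, defaultdict


def estimate_pitch_distribution(records):
    """Estimate P(pitch | context) by observed conditional proportions.

    `records` is an iterable of (context, pitch) pairs. The shape of
    `context` is hypothetical: here it is (count, previous pitch type).
    """
    counts = defaultdict(Counter)
    for context, pitch in records:
        counts[context][pitch] += 1
    return {
        context: {pitch: n / sum(c.values()) for pitch, n in c.items()}
        for context, c in counts.items()
    }


# Invented toy records, not real MLB data.
toy_records = [
    (("0-0", None), "fastball"),
    (("0-0", None), "fastball"),
    (("0-0", None), "curveball"),
    (("0-1", "fastball"), "curveball"),
]
dist = estimate_pitch_distribution(toy_records)
```

Passing the records of a single pitcher would give a pitcher-specific estimate, while pooling the records of many pitchers would give the general one.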

Optimal batting strategies against these estimated conditional distributions of pitch selection can be ascertained by Value Iteration. Optimal batting strategies against a pitcher-specific conditional distribution can be contrasted to those calculated from the general conditional distributions associated with a collection of pitchers.
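The following is a hedged sketch of tabular Value Iteration on a toy at-bat. The table layout `transitions[state][action] = [(prob, next_state, reward), ...]`, the three-count state space, the reward of 1 for reaching base, and every probability are invented for illustration; the paper's states and rewards are richer than this.

```python
def action_value(outcomes, V, gamma):
    """Expected return of one action given the current value estimates."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=1.0, tol=1e-10, max_iter=10_000):
    """Return (values, greedy policy) for a small episodic MDP."""
    V = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: value stays 0
                continue
            best = max(action_value(o, V, gamma) for o in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: action_value(actions[a], V, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return V, policy


# Toy model (hypothetical numbers): the batter swings or takes at each count.
toy = {
    "0-0": {
        "swing": [(0.25, "on_base", 1.0), (0.35, "out", 0.0), (0.40, "0-1", 0.0)],
        "take": [(0.60, "1-0", 0.0), (0.40, "0-1", 0.0)],
    },
    "1-0": {
        "swing": [(0.45, "on_base", 1.0), (0.55, "out", 0.0)],
        "take": [(0.30, "on_base", 1.0), (0.70, "out", 0.0)],
    },
    "0-1": {
        "swing": [(0.25, "on_base", 1.0), (0.75, "out", 0.0)],
        "take": [(0.10, "on_base", 1.0), (0.90, "out", 0.0)],
    },
    "on_base": {},
    "out": {},
}
values, policy = value_iteration(toy)
```

In this toy model the greedy policy takes the first pitch and swings afterwards, because taking at 0-0 has expected value 0.6 × 0.45 + 0.4 × 0.25 = 0.37 against 0.25 + 0.4 × 0.25 = 0.35 for swinging.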

In this manuscript, a single season of MLB data is used to calculate the conditional distributions to find optimal pitcher-specific and general (against a collection of pitchers) batting strategies. These strategies are subsequently evaluated by conditional distributions calculated from a different season for the same pitchers. Thus, the batting strategies are conceptually tested via a collection of simulated games, a “mock season”, governed by distributions not used to create the strategies. (Simulation is not needed, as exact calculations are available.)
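One way to read the remark that simulation is unnecessary: once a batting strategy is fixed, its expected performance under the held-out season's distributions can be computed exactly by policy evaluation. The sketch below assumes the same hypothetical table layout, `transitions[state][action] = [(prob, next_state, reward), ...]`, and uses invented numbers and policies; it is not the paper's evaluation code.

```python
def evaluate_policy(transitions, policy, gamma=1.0, tol=1e-12, max_iter=10_000):
    """Expected return of a fixed policy, computed without simulation."""
    V = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        delta = 0.0
        for s, action in policy.items():
            new = sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[s][action])
            delta = max(delta, abs(new - V[s]))
            V[s] = new
        if delta < tol:
            break
    return V


# Hypothetical held-out ("test season") dynamics for one pitcher.
test_season = {
    "0-0": {
        "swing": [(0.3, "on_base", 1.0), (0.7, "out", 0.0)],
        "take": [(1.0, "0-1", 0.0)],
    },
    "0-1": {
        "swing": [(0.4, "on_base", 1.0), (0.6, "out", 0.0)],
        "take": [(0.1, "on_base", 1.0), (0.9, "out", 0.0)],
    },
    "on_base": {},
    "out": {},
}

# Policies assumed to have been learned earlier from a different season.
specific_policy = {"0-0": "take", "0-1": "swing"}
general_policy = {"0-0": "swing", "0-1": "swing"}

v_specific = evaluate_policy(test_season, specific_policy)["0-0"]
v_general = evaluate_policy(test_season, general_policy)["0-0"]
```

Here the pitcher-specific policy is worth 0.4 at the start of the at-bat and the general policy 0.3, so this toy pitcher would count as an instance where the specific strategy does better.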

Instances where the pitcher-specific batting strategy outperforms the general batting strategy suggest that the pitcher is exploitable: knowledge of the conditional distributions of the pitcher's decision process in one season yielded a strategy that worked better in a different season than a general batting strategy built on a population of pitchers. A permutation-based test of exploitability of the collection of pitchers is given and evaluated under two sets of assumptions.
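The summary does not spell out the permutation test, so the following is only a generic illustration of the idea, not the authors' procedure. It assumes one number per pitcher, the held-out performance of the pitcher-specific strategy minus that of the general strategy, and applies an exact one-sided sign-flip test. The function name and the differences are hypothetical.

```python
from itertools import product


def sign_flip_p_value(differences):
    """Exact one-sided sign-flip permutation p-value for sum(differences).

    Under the null of no exploitability, each difference is assumed to be
    equally likely to be positive or negative, so all sign patterns are
    enumerated (feasible only for a small number of pitchers).
    """
    observed = sum(differences)
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(differences)):
        stat = sum(s * d for s, d in zip(signs, differences))
        n_total += 1
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            n_extreme += 1
    return n_extreme / n_total


# Invented per-pitcher differences (specific minus general strategy).
toy_differences = [0.02, 0.01, 0.03, 0.015, 0.005]
p_value = sign_flip_p_value(toy_differences)
```

With five pitchers and all differences positive, only the unflipped pattern reaches the observed sum, so the p-value is 1/32. For a larger collection of pitchers, random sampling of sign patterns would replace the full enumeration.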

To show the practical utility of the approach, we introduce a spatial component that classifies each pitcher’s pitch-types using a batter-parameterized spatial trajectory for each pitch. We found that heuristically labeled “nonelite” batters benefit from using the exploited pitchers’ pitcher-specific strategies, whereas (also heuristically labeled) “elite” players do not.

### MSC:

62P99 | Applications of statistics |

### Software:

ElemStatLearn

*G. Sidhu* and *B. Caffo*, Ann. Appl. Stat. 8, No. 2, 926–955 (2014; Zbl 1454.62547)
