Predictions based on the clustering of heterogeneous functions via shape and subject-specific covariates. (English) Zbl 1336.62251

Summary: We consider a study of players employed by National Basketball Association teams, where the units of observation are functional curves: realizations of production measurements taken over the course of each player's career. The observed functional output displays a large amount of between-player heterogeneity, in the sense that some individuals produce curves that are fairly smooth while others are (much) more erratic. We argue that this variability in curve shape is a feature that can be exploited to guide decision making, learn about the processes under study, and improve prediction. In this paper we develop a methodology that takes advantage of this feature when clustering functional curves. Individual curves are flexibly modeled using Bayesian penalized B-splines, while a hierarchical structure allows the clustering to be guided by the smoothness of individual curves. In a sense, the hierarchical structure balances the desire to fit individual curves well against the need to produce meaningful clusters that are used to guide prediction. We incorporate available covariate information to guide the clustering of curves nonparametrically through a product partition model prior for a random partition of individuals. Clustering based on curve smoothness and subject-specific covariate information is particularly important for the two types of prediction of interest: completing a partially observed curve for an active player, and predicting the entire career curve of a player yet to play in the National Basketball Association.
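
For orientation only, the ingredients named in the summary can be written in a generic form. The display below is a sketch of a standard Bayesian P-spline model with a covariate-dependent product partition prior; it is not taken from the paper and should not be read as the authors' exact specification. All symbols ($y_i(t)$, $B_m$, $\beta_{im}$, $\sigma_i^2$, $\tau_i^2$, $\rho$, $S_h$, $c$, $g$, $x_h^\star$) are notation assumed here for illustration.

```latex
\begin{align*}
  y_i(t) &= \sum_{m=1}^{M} \beta_{im}\, B_m(t) + \varepsilon_i(t),
    & \varepsilon_i(t) &\sim N(0, \sigma_i^2), \\
  \beta_{im} - 2\beta_{i,m-1} + \beta_{i,m-2} &\sim N(0, \tau_i^2),
    & m &= 3, \dots, M, \\
  \Pr\bigl(\rho = \{S_1, \dots, S_k\} \mid x\bigr)
    &\propto \prod_{h=1}^{k} c(S_h)\, g(x_h^\star),
    & x_h^\star &= \{x_i : i \in S_h\}.
\end{align*}
```

In this generic form, $B_1, \dots, B_M$ are B-spline basis functions, a small penalty variance $\tau_i^2$ pulls the curve of player $i$ toward a smooth shape while a large one permits erratic curves, $c(\cdot)$ is a cohesion function, and $g(\cdot)$ is a similarity function that gives higher prior weight to clusters whose members have similar covariates $x_i$.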


MSC:
62P99 Applications of statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62N01 Censored data models
62G05 Nonparametric estimation


Software:
BayesDA; BayesX; fda (R)

