McDonald, Daniel J.; McBride, Michael; Gu, Yupeng; Raphael, Christopher
Markov-switching state space models for uncovering musical interpretation. (English) Zbl 1478.62380
Ann. Appl. Stat. 15, No. 3, 1147-1170 (2021).
Summary: For concertgoers, musical interpretation is the most important factor in determining whether or not we enjoy a classical performance. Every performance includes mistakes – intonation issues, a lost note, an unpleasant sound – but these are all easily forgotten (or unnoticed) when a performer engages her audience, imbuing a piece with novel emotional content beyond the vague instructions inscribed on the printed page. In this research we use data from the CHARM Mazurka Project – 46 professional recordings of Chopin’s Mazurka Op. 68 No. 3 by consummate artists – with the goal of elucidating musically interpretable performance decisions. We focus specifically on each performer’s use of tempo by examining the interonset intervals of the note attacks in the recording. To explain these tempo decisions, we develop a switching state space model and estimate it by maximum likelihood, combined with prior information gained from music theory and performance practice. We use the estimated parameters to quantitatively describe individual performance decisions and compare recordings. These comparisons suggest methods for informing music instruction, discovering listening preferences and analyzing performances.
MSC:
62P99 Applications of statistics
62M02 Markov processes: hypothesis testing
62M05 Markov processes: estimation; hidden Markov models
62M20 Inference from stochastic processes and prediction
62H30 Classification and discrimination; cluster analysis (statistical aspects)
Keywords: classification and clustering; Kalman filter; hidden Markov model
Software: tidyverse; Magenta.js; Rcpp; batchtools; R
References:
[1] Anderson, B. D. O. and Moore, J. B. (1979). Optimal Filtering.
Prentice-Hall, Englewood Cliffs, NJ. · Zbl 0688.93058 [2] Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 269-342. · Zbl 1411.65020 · doi:10.1111/j.1467-9868.2009.00736.x [3] Arcos, J. and Mantaras, R. L. (2001). An interactive cbr approach for generating expressive music. J. Appl. Intell. 21 115-129. · Zbl 0972.68549 [4] Ariza, C. (2005). Navigating the landscape of computer aided algorithmic composition systems: A definition, seven descriptors, and a lexicon of systems and research. In Proceedings of International Computer Music Conference. [5] Arzt, A. and Widmer, G. (2015). Real-time music tracking using multiple performances as a reference. In International Society for Music Information Retrieval (ISMIR) 357-363. [6] Bernstein, L. (2005). Young People’s Concerts. Amadeus Press, Pompton Plains, NJ. [7] Bisiani, R. (1992). Beam search. In Encyclopedia of Artificial Intelligence, 2nd ed. (S. Shapiro, ed.) Wiley, New York. [8] Block, B. A., Jonsen, I. D., Jorgensen, S. J., Winship, A. J., Shaffer, S. A., Bograd, S. J., Hazen, E. L., Foley, D. G., Breed, G. et al. (2011). Tracking apex marine predator movements in a dynamic ocean. Nature 475 86. [9] Boulanger-Lewandowski, N., Bengio, Y. and Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the 29th International Conference on Machine Learning. [10] Bresin, R., Friberg, A. and Sundberg, J. (2002). Director musices: The KTH performance rules system. In Proceedings of SIGMUS-46. [11] Burkholder, J. P., Grout, D. J. and Palisca, C. V. (2014). A History of Western Music, 9th ed. Norton, New York. [12] CHARM (2009). Centre for the History and Analysis of Recorded Music. Online; accessed 12 March 2019. [13] Collins, N. (2016). A funny thing happened on the way to the formula: Algorithmic composition for musical theater. 
Comput. Music J. 40 41-57. [14] Cont, A. (2010). A coupled duration-focused architecture for real-time music-to-score alignment. IEEE Trans. Pattern Anal. Mach. Intell. 32 974-987. [15] Cont, A., Schwarz, D., Schnell, N. and Raphael, C. (2007). Evaluation of real-time audio-to-score alignment. In International Symposium on Music Information Retrieval (ISMIR). [16] Cook, N. (2013). Beyond the Score: Music as Performance. Oxford Univ. Press, Oxford. [17] Craven, P. and Wahba, G. (1978). Smoothing noisy data with spline functions. Numer. Math. 31 377-403. · Zbl 0377.65007 [18] Dannenberg, R. (1985). An on-line algorithm for real-time accompaniment. In Proceedings of the 1984 International Computer Music Conference 193-198. International Computer Music Association. [19] Dannenberg, R. B. and Raphael, C. (2006). Music score alignment and computer accompaniment. Commun. ACM 49 38-43. [20] Dror, G., Koenigstein, N., Koren, Y. and Weimer, M. (2012). The yahoo! Music dataset and KDD-Cup’11. In KDD Cup 8-18. [21] Dudoit, S. and Fridlyand, J. (2002). A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology 3 research0036.1. [22] Durbin, J. and Koopman, S. J. (1997). Monte Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika 84 669-684. · Zbl 0888.62086 · doi:10.1093/biomet/84.3.669 [23] Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford Statistical Science Series 24. Oxford Univ. Press, Oxford. · Zbl 0995.62504 [24] Earis, A. (2007). An algorithm to extract expressive timing and dynamics from piano recordings. Music. Sci. 11 155-182. [25] Earis, A. (2009). Mazurka in F Major, Op. 68, No. 3. Accessed 12 March 2019. [26] Eddelbuettel, D. (2013). Seamless R and C++ Integration with Rcpp. Springer, New York. · Zbl 1283.62001 [27] Fearnhead, P. and Clifford, P. (2003). On-line inference for hidden Markov models via particle filters. J. R. Stat. Soc. Ser. B. 
Stat. Methodol. 65 887-899. · Zbl 1059.62098 · doi:10.1111/1467-9868.00421 [28] Flossman, S., Grachten, M. and Widmer, G. (2012). Expressive performance rendering with probabilistic models. In Guide to Computing for Expressive Music Performance (A. Kirke and E. Miranda, eds.) Springer, Berlin. [29] Flossmann, S., Grachten, M. and Widmer, G. (2013). Expressive performance rendering with probabilistic models. In Guide to Computing for Expressive Music Performance 75-98. Springer, Berlin. [30] Forsén, S., Gray, H. B., Lindgren, L. K. O. and Gray, S. B. (2013). Was something wrong with Beethoven’s metronome? Notices Amer. Math. Soc. 60 1146-1153. · Zbl 1322.00026 · doi:10.1090/noti1044 [31] Fox, E. B., Sudderth, E. B., Jordan, M. I. and Willsky, A. S. (2011). A sticky HDP-HMM with application to speaker diarization. Ann. Appl. Stat. 5 1020-1056. · Zbl 1232.62077 · doi:10.1214/10-AOAS395 [32] Fuh, C.-D. (2006). Efficient likelihood estimation in state space models. Ann. Statist. 34 2026-2068. · Zbl 1246.62185 · doi:10.1214/009053606000000614 [33] Ghahramani, Z. and Hinton, G. E. (2000). Variational learning for switching state-space models. Neural Comput. 12 831-864. [34] Golub, G. H., Heath, M. and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21 215-223. · Zbl 0461.62059 · doi:10.2307/1268518 [35] Grindlay, G. and Helmbold, D. (2006). Modeling, analyzing, and synthesizing expressive piano performance with graphical models. Mach. Learn. 65 361-387. [36] Gu, Y. and Raphael, C. (2012). Modeling piano interpretation using switching Kalman filter. In International Society for Music Information Retrieval (ISMIR) 145-150. [37] Hadjeres, G., Pachet, F. and Nielsen, F. (2017). DeepBach: A steerable model for bach chorales generation. In Proceedings of the 34th International Conference on Machine Learning (D. Precup and Y. W. Teh, eds.). Proceedings of Machine Learning Research 70 1362-1371. 
PMLR, Sydney, Australia. [38] Hamilton, J. D. (2011). Calling recessions in real time. Int. J. Forecast. 27 1006-1026. [39] Harvey, A. C. (1990). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge Univ. Press, Cambridge. · Zbl 0725.62083 [40] Kallberg, J. (1996). Chopin at the Boundaries: Sex, History, and Musical Genre. Harvard Univ. Press, Harvard. [41] Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. J. Basic Eng. 82 35-45. [42] Kim, C.-J. (1994). Dynamic linear models with Markov-switching. J. Econometrics 60 1-22. · Zbl 0795.62104 · doi:10.1016/0304-4076(94)90036-1 [43] Kim, C. J. and Nelson, C. R. (1998). Business cycle turning points, a new coincident index, and tests of duration dependence based on a dynamic factor model with regime switching. Rev. Econ. Stat. 80 188-201. [44] Kim, S.-J., Koh, K., Boyd, S. and Gorinevsky, D. (2009). \(\ell_1\) trend filtering. SIAM Rev. 51 339-360. · Zbl 1171.37033 · doi:10.1137/070690274 [45] Kitagawa, G. (1987). Non-Gaussian state-space modeling of nonstationary time series. J. Amer. Statist. Assoc. 82 1032-1063. · Zbl 0644.62088 [46] Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Statist. 5 1-25. · doi:10.2307/1390750 [47] Koyama, S., Castellanos Pérez-Bolde, L., Shalizi, C. R. and Kass, R. E. (2010). Approximate methods for state-space models. J. Amer. Statist. Assoc. 105 170-180. · Zbl 1397.62112 · doi:10.1198/jasa.2009.tm08326 [48] Lang, M., Bischl, B. and Surmann, D. (2017). batchtools: Tools for R to work on batch systems. J. Open Sour. Softw. 2 135. [49] Lang, D. and Freitas, N. D. (2005). Beat tracking the graphical model way. In Advances in Neural Information Processing Systems 745-752. MIT Press, Cambridge, MA. [50] Maezawa, A. (2019). Deep linear autoregressive model of interpretable prediction of expressive tempo. In Proceedings of the 16th Sound and Music Computing Conference. 
[51] McDonald, D. J., McBride, M., Gu, Y. and Raphael, C. (2021). Supplement to “Markov-Switching State Space Models for Uncovering Musical Interpretation.” https://doi.org/10.1214/21-AOAS1457SUPPA, https://doi.org/10.1214/21-AOAS1457SUPPB, https://doi.org/10.1214/21-AOAS1457SUPPC [52] McFee, B. and Lanckriet, G. (2011). Learning multi-modal similarity. J. Mach. Learn. Res. 12 491-523. · Zbl 1280.68181 [53] Mead, A. (2007). On tempo relations. Perspect. New Music 45 64-108. [54] Patterson, T. A., Thomas, L., Wilcox, C., Ovaskainen, O. and Matthiopoulos, J. (2008). State-space models of individual animal movement. Trends Ecol. Evol. 23 87-94. [55] R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. [56] Raphael, C. (2002). A hybrid graphical model for rhythmic parsing. Artificial Intelligence 137 217-238. · Zbl 0995.68029 [57] Raphael, C. (2010). Music plus one and machine learning. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (J. Fürnkranz and T. Joachims, eds.) 21-28. [58] Rauch, H. E., Tung, F. and Striebel, C. T. (1965). Maximum likelihood estimates of linear dynamic systems. AIAA J. 3 1445-1450. · doi:10.2514/3.3166 [59] Ren, L., Dunson, D., Lindroth, S. and Carin, L. (2010). Dynamic nonparametric Bayesian models for analysis of music. J. Amer. Statist. Assoc. 105 458-472. · Zbl 1392.62356 · doi:10.1198/jasa.2009.ap08497 [60] Roberts, A., Hawthorne, C. and Simon, I. (2018). Magenta.js: A JavaScript API for augmenting creativity with deep learning. In Joint Workshop on Machine Learning for Music (ICML). [61] Roberts, A., Engel, J., Raffel, C., Hawthorne, C. and Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In Proceedings of the 35th International Conference on Machine Learning (J. Dy and A. Krause, eds.). Proceedings of Machine Learning Research 80 4364-4373. 
PMLR, Stockholmsmässan, Stockholm, Sweden. [62] Schedl, M., Gómez, E., Urbano, J. et al. (2014). Music information retrieval: Recent developments and applications. Found. Trends Inf. Retr. 8 127-261. [63] Stowell, D. and Chew, E. (2012). Bayesian MAP estimation of piecewise arcs in tempo time series. In Proceedings of Computer Music Multidisciplinary Research. [64] Sturm, B. L., Ben-Tal, O., Monaghan, Ú., Collins, N., Herremans, D., Chew, E., Hadjeres, G., Deruty, E. and Pachet, F. (2019). Machine learning research that matters for music creation: A case study. J. New Music Res. 48 36-55. [65] Thickstun, J., Harchaoui, Z. and Kakade, S. M. (2017). Learning features of music from scratch. In International Conference on Learning Representations (ICLR). [66] Tibshirani, R. J. (2014). Adaptive piecewise polynomial estimation via trend filtering. Ann. Statist. 42 285-323. · Zbl 1307.62118 · doi:10.1214/13-AOS1189 [67] Tibshirani, R., Walther, G. and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B. Stat. Methodol. 63 411-423. · Zbl 0979.62046 · doi:10.1111/1467-9868.00293 [68] van den Oord, A., Dieleman, S. and Schrauwen, B. (2013). Deep content-based music recommendation. In Advances in Neural Information Processing Systems 26 (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K. Q. Weinberger, eds.) 2643-2651. Curran Associates, Red Hook. [69] Vercoe, B. (1984). The synthetic performer in the context of live performance. In Proceedings of the 1984 International Computer Music Conference 199-200. International Computer Music Association. [70] Wahba, G. (1990). Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics 59. SIAM, Philadelphia, PA. · Zbl 0813.62001 · doi:10.1137/1.9781611970128 [71] Whiteley, N., Andrieu, C. and Doucet, A. (2010). 
Efficient Bayesian Inference for Switching State-Space Models using Discrete Particle Markov Chain Monte Carlo Methods Technical Report No. 10:04 Bristol Univ. [72] Wickham, H. (2016). Ggplot2: Elegant Graphics for Data Analysis, 2nd ed. Springer, Berlin. · Zbl 1397.62006 [73] Wickham, H. (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. [74] Widmer, G., Flossmann, S. and Grachten, M. (2009). YQX plays chopin. AI Mag. 30 35