
An evaluation of linear and non-linear models of expressive dynamics in classical piano and symphonic music. (English) Zbl 1457.00051

Summary: Expressive interpretation is an important but complex aspect of music, particularly of Western classical music. Modeling the relation between musical expression and structural aspects of the score being performed is an ongoing line of research. Prior work has shown that some simple numerical descriptors of the score (capturing dynamics annotations and pitch) are effective for predicting expressive dynamics in classical piano performances. However, these features have so far been tested only in a simple linear regression model. In this work, we explore the potential of non-linear and temporal modeling of expressive dynamics. Using a set of descriptors that capture different types of structure in the musical score, we compare linear and several non-linear models in a large-scale evaluation on three corpora comprising both piano and orchestral music. To the best of our knowledge, this is the first study in which models of musical expression are evaluated on both types of music. We show that, in addition to being more accurate, non-linear models capture interactions between the numerical descriptors that linear models do not.
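The summary's closing claim, that non-linear models capture interactions between score descriptors which a purely additive linear model cannot represent, can be illustrated with a minimal sketch. The following Python snippet is not the authors' code: the descriptor names (normalized pitch, position within a crescendo), the synthetic interaction target, and the choice of scikit-learn models are assumptions made only to make the linear/non-linear contrast concrete.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 2000

# Hypothetical score descriptors (assumed for illustration):
# normalized pitch and position within a crescendo, both in [0, 1].
pitch = rng.uniform(0.0, 1.0, n)
crescendo = rng.uniform(0.0, 1.0, n)
X = np.column_stack([pitch, crescendo])

# Synthetic "performed loudness" containing a pitch-by-crescendo
# interaction term that an additive linear model cannot fit exactly.
y = 0.4 * pitch + 0.3 * crescendo + 0.8 * pitch * crescendo
y = y + rng.normal(0.0, 0.05, n)

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

linear = LinearRegression().fit(X_train, y_train)
nonlinear = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                         random_state=0).fit(X_train, y_train)

print("linear R^2:    ", r2_score(y_test, linear.predict(X_test)))
print("non-linear R^2:", r2_score(y_test, nonlinear.predict(X_test)))

On held-out data the non-linear model typically attains a higher R^2 here, because the multiplicative pitch-by-crescendo term lies outside the linear model's hypothesis space; this is a toy analogue of the interaction effects discussed in the summary.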

MSC:

00A65 Mathematics and music

Software:

PRMLT; LSMR
