Opportunities and challenges applying functional data analysis to the study of open source software evolution. (English) Zbl 1426.68033

Summary: This paper explores the application of functional data analysis (FDA) as a means to study the dynamics of software evolution in the open source context. Several challenges in analyzing the data from software projects are discussed, an approach to overcoming those challenges is described, and preliminary results from the analysis of a sample of open source software (OSS) projects are provided. The results demonstrate the utility of FDA for uncovering and categorizing multiple distinct patterns of evolution in the complexity of OSS projects. These results are promising in that they demonstrate some patterns in which the complexity of software decreased as the software grew in size, a particularly novel result. The paper reports preliminary explorations of factors that may be associated with decreasing complexity patterns in these projects. The paper concludes by describing several next steps for this research project as well as some questions for which more sophisticated analytical techniques may be needed.
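As a rough illustration of the kind of analysis the summary describes, the sketch below resamples irregularly observed complexity measurements onto a common normalized time grid and then groups the resulting curves. It is a minimal sketch under stated assumptions, not the authors' procedure: the summary does not say how curves were smoothed or clustered, so linear interpolation and a simple k-medoids grouping are stand-ins chosen for brevity. The function names (`resample`, `k_medoids`) and the four toy projects are hypothetical and made up for this example.

```python
from bisect import bisect_right


def resample(times, values, n_points=11):
    """Linearly interpolate one project's series onto a common grid on [0, 1].

    Assumes `times` is strictly increasing with at least two observations.
    Rescaling each project's lifetime to [0, 1] is an illustrative choice.
    """
    t0, span = times[0], times[-1] - times[0]
    scaled = [(t - t0) / span for t in times]
    curve = []
    for i in range(n_points):
        g = i / (n_points - 1)
        # Index of the observation interval that contains grid point g.
        j = min(max(bisect_right(scaled, g) - 1, 0), len(scaled) - 2)
        w = (g - scaled[j]) / (scaled[j + 1] - scaled[j])
        curve.append(values[j] + w * (values[j + 1] - values[j]))
    return curve


def distance(a, b):
    """Euclidean distance between two curves sampled on the same grid."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def assign(d, medoids):
    """Label each curve with the position of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def k_medoids(curves, k, max_iter=50):
    """Alternating k-medoids with a deterministic farthest-first start."""
    n = len(curves)
    d = [[distance(a, b) for b in curves] for a in curves]
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(d[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(d, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda i: sum(d[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(d, medoids), medoids


# Toy data (fabricated for illustration): observation times and a complexity
# measure for four hypothetical projects, sampled at different moments.
projects = {
    "a": ([0, 10, 30, 40], [1.0, 2.0, 3.0, 4.0]),   # complexity rises
    "b": ([0, 5, 20], [1.2, 2.0, 4.1]),             # complexity rises
    "c": ([0, 100, 200], [4.0, 2.5, 1.0]),          # complexity falls
    "d": ([3, 6, 9, 12], [3.8, 3.0, 2.0, 1.1]),     # complexity falls
}
curves = [resample(t, v) for t, v in projects.values()]
labels, medoids = k_medoids(curves, k=2)
print(dict(zip(projects, labels)))
```

Putting every project on the same grid is what makes curves with different lifetimes and observation schedules comparable. A spline or basis-function smoother, such as those provided by the R package fda listed in this record, could replace the linear interpolation step.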


68N01 General topics in the theory of software
62R10 Functional data analysis
68N99 Theory of software


Software: R; fda (R)

