trackr: a framework for enhancing discoverability and reproducibility of data visualizations and other artifacts in R. (English) Zbl 07499083

Summary: Research is an incremental, iterative process, with new results relying and building upon previous ones. Scientists need to find, retrieve, understand, and verify results to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language. Supplementary materials for this article are available online.

MSC: 62-XX Statistics
