Hands-on tutorial for parallel computing with R. (English) Zbl 1304.65030

Summary: With powerful hardware resources increasingly available, parallel computing is becoming important because it can deliver a noticeable speedup. The statistical programming language R supports parallel computing on computer clusters as well as multicore systems through several packages. This tutorial gives a short, practical overview of four packages for parallel computing in R that the authors consider important: multicore, snow, snowfall and nws. First, the general principle of parallelizing simple tasks is briefly illustrated using a statistical cross-validation example. The usage of each package is then demonstrated on this example. The tutorial also addresses some specific features of the packages and provides guidance on selecting an adequate package for the computing environment at hand.

MSC:

62-08 Computational methods for problems pertaining to statistics

References:

[1] Bjornson R, Carriero N, Weston S (2007) Python NetWorkSpaces and parallel programs. Dr Dobb’s Journal, pp 1–7. http://www.ddj.com/web-development/200001971
[2] Dolkart V, Pronina L (2007) Change in computer hardware and software paradigms. Russian Electr Eng 78(10): 548–553
[3] Dongarra J, Foster I, Fox G, Gropp W, Kennedy K, Torczon L, White A (eds) (2003) Sourcebook of parallel computing. Morgan Kaufmann Publishers Inc., San Francisco
[4] Eddelbuettel D (2010a) CRAN task view: high-performance and parallel computing. http://cran.r-project.org/web/views/HighPerformanceComputing.htm
[5] Eddelbuettel D (2010b) R SIG on high-performance computing. http://www.r-project.org/mail.html
[6] Grama A, Karypis G, Kumar V, Gupta A (2003) Introduction to parallel computing. 2nd edn. Addison Wesley, Reading · Zbl 0861.68040
[7] Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. 2nd edn. Springer, New York · Zbl 1273.62005
[8] Knaus J (2010) Snowfall: easier cluster computing based on snow. http://CRAN.R-project.org/package=snowfall , R package version 1.83
[9] Knaus J, Porzelius C, Binder H, Schwarzer G (2009) Easier parallel computing in R with snowfall and sfCluster. R J 1: 54–59
[10] R Development Core Team (2009) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org , ISBN 3-900051-07-0
[11] REvolution Computing (2008) nws: R functions for NetWorkSpaces and Sleigh. REvolution Computing with support and contributions from Pfizer, Inc. http://nws-r.sourceforge.net/ , R package version 1.7.0.0
[12] Rossini A, Tierney L, Li NM (2007) Simple parallel statistical computing in R. J Comput Graph Stat 16(2): 399–420
[13] Schmidberger M, Morgan M, Eddelbuettel D, Yu H, Tierney L, Mansmann U (2009) State of the art in parallel computing with R. J Stat Softw 31(1). http://www.jstatsoft.org/v31/i01/
[14] Simon R, Radmacher MD, Dobbin K, McShane LM (2003) Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J National Cancer Inst 95(1): 14–18
[15] Sloan J (2004) High performance linux clusters with OSCAR, Rocks, OpenMosix, and MPI (Nutshell Handbooks). O’Reilly Media, Inc. http://www.oreilly.de/catalog/9780596005702/
[16] Stevens WR (1992) Advanced programming in the UNIX environment. 1st edn. Addison-Wesley, Reading · Zbl 0883.68034
[17] Tierney L, Rossini AJ, Li MN, Sevcikova H (2008) Snow: simple network of workstations. http://CRAN.R-project.org/package=snow , R package version 0.3-3
[18] Urbanek S (2009) Multicore: parallel processing of R code on machines with multiple cores or CPUs. http://www.RForge.net/multicore/ , R package version 0.1-3
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible without claiming completeness or a perfect matching.