zbMATH — the first resource for mathematics

Efficient data race detection for async-finish parallelism. (English) Zbl 1284.68144
Summary: A major productivity hurdle for parallel programming is the presence of data races. Data races can lead to all kinds of harmful program behaviors, including determinism violations and corrupted memory. However, runtime overheads of current dynamic data race detectors are still prohibitively large (often incurring slowdowns of $$10\times$$ or more) for use in mainstream software development.
In this paper, we present an efficient dynamic race detection algorithm that handles both the async-finish task-parallel programming model used in languages such as X10 and Habanero Java (HJ) and the spawn-sync constructs used in Cilk.
We have implemented our algorithm in a tool called TaskChecker and evaluated it on a suite of 12 benchmarks. To reduce overhead of the dynamic analysis, we have also implemented various static optimizations in the tool. Our experimental results indicate that our approach performs well in practice, incurring an average slowdown of $$3.05\times$$ compared to a serial execution in the optimized case.
MSC:
 68N19 Other programming paradigms (object-oriented, sequential, concurrent, automatic, etc.) 68M20 Performance evaluation, queueing, and scheduling in the context of computer systems 68W10 Parallel algorithms in computer science 68N15 Theory of programming languages
Software:
 [1] Agarwal S, Barik R, Bonachea D, Sarkar V, Shyamasundar RK, Yelick K (2007) Deadlock-free scheduling of X10 computations with bounded resources. In: SPAA ’07: Proceedings of the 19th symposium on parallel algorithms and architectures. ACM, New York, pp 229–240 [2] Agarwal S, Barik R, Sarkar V, Shyamasundar RK (2007) May-happen-in-parallel analysis of $$\times$$10 programs. In: PPoPP ’07: Proceedings of the 12th symposium on principles and practice of parallel programming. ACM, New York, pp 183–193 [3] Barik R, Budimlic Z, Cave V, Chatterjee S, Guo Y, Peixotto D, Raman R, Shirako J, Tasirlar S, Yan Y, Zhao Y, Sarkar V (2009) The habanero multicore software research project. In: OOPSLA ’09: Proceeding of the 24th ACM SIGPLAN conference companion on object oriented programming systems languages and applications, New York, NY, USA. ACM, New York, pp 735–736 [4] Blumofe RD, Joerg CF, Kuszmaul BC, Leiserson CE, Randall KH, Zhou Y (1995) Cilk: an efficient multithreaded runtime system. In: Proceedings of the fifth ACM SIGPLAN symposium on principles and practice of parallel programming, PPoPP, Oct 1995, pp 207–216 [5] Blumofe RD, Leiserson CE (1999) Scheduling multithreaded computations by work stealing. J ACM 46(5):720–748 · Zbl 1065.68504 · doi:10.1145/324133.324234 [6] Bocchino R, Adve V, Adve S, Snir M (2009) Parallel programming must be deterministic by default. In: First USENIX workship on hot topics in parallelism (HOTPAR 2009) [7] Bodden E, Lam P, Hendren L (2010) Clara: a framework for statically evaluating finite-state runtime monitors. In: 1st international conference on runtime verification (RV), Nov 2010. LNCS, vol 6418. Springer, Berlin, pp 74–88 [8] Charles P, Grothoff C, Saraswat VA, Donawa C, Kielstra A, Ebcioglu K, von Praun C, Sarkar V (2005) X10: an object-oriented approach to non-uniform cluster computing. In: Proceedings of the twentieth annual ACM SIGPLAN conference on object-oriented programming, systems, languages, and applications, OOPSLA, Oct, pp 519–538 [9] Cheng G-I, Feng M, Leiserson CE, Randall KH, Stark AF (1998) Detecting data races in Cilk programs that use locks. In: Proceedings of the tenth annual ACM symposium on parallel algorithms and architectures (SPAA ’98), Puerto Vallarta, Mexico, June 28–July 2 1998, pp 298–309 [10] Dijkstra EW Cooperating sequential processes. 65–138 [11] Feng M, Leiserson CE (1997) Efficient detection of determinacy races in Cilk programs. In: SPAA ’97: proceedings of the ninth annual ACM symposium on parallel algorithms and architectures. ACM, New York, pp 1–11 [12] Flanagan C, Freund SN (2009) Fasttrack: efficient and precise dynamic race detection. In: PLDI ’09: proceedings of the 2009 ACM SIGPLAN conference on programming language design and implementation. ACM, New York, pp 121–133 [13] Frigo M, Leiserson CE, Randall KH (1998) The implementation of the Cilk-5 multithreaded language. In: PLDI’98, NY, USA, 1998. ACM, New York, pp 212–223 [14] Guo Y, Barik R, Raman R, Sarkar V (2009) Work-first and help-first scheduling policies for async-finish task parallelism. In: IPDPS ’09: proceedings of the international symposium on parallel&distributed processing. IEEE Computer Society, Washington, pp 1–12 [15] Habanero Java http://habanero.rice.edu/hj [16] Larus JR, Rajwar R (2006) Transactional memory. Morgan and Claypool, San Francisco [17] Lea D (2000) A java fork/join framework. In: JAVA ’00: proceedings of the ACM 2000 conference on Java Grande. ACM, New York, pp 36–43 [18] Lee EA (2006) The problem with threads. Computer 39(5):33–42 · Zbl 05088711 · doi:10.1109/MC.2006.180 [19] Lee JK, Palsberg J (2010) Featherweight $$\times$$10: a core calculus for async-finish parallelism. In: PPoPP ’10: proceedings of the 15th ACM SIGPLAN symposium on principles and practice of parallel computing. ACM, New York, pp 25–36 [20] Leijen D, Schulte W, Burckhardt S (2009) The design of a task parallel library. In: OOPSLA ’09: proceeding of the 24th ACM SIGPLAN conference on object oriented programming systems languages and applications. ACM, New York, pp 227–242 [21] Mellor-Crummey J (1993) Compile-time support for efficient data race detection in shared-memory parallel programs. In: PADD ’93: proceedings of the 1993 ACM/ONR workshop on parallel and distributed debugging, New York, NY, USA, 1993. ACM, New York, pp 129–139 [22] Purandare R, Dwyer MB, Elbaum S (2010) Monitor optimization via stutter-equivalent loop transformation. In: Proceedings of the ACM international conference on object oriented programming systems languages and applications, New York, NY, USA, 2010, OOPSLA ’10. ACM, New York, pp 270–285 [23] Sadowski C, Freund SN, Flanagan C (2009) SingleTrack: A dynamic determinism checker for multithreaded programs. In: Programming languages and systems. Lecture notes in computer science, vol 5502. Springer, Berlin, pp 394–409 · Zbl 1234.68066 [24] Tarjan RE (1975) Efficiency of a good but not linear set union algorithm. J ACM 22:215–225 · Zbl 0307.68029 · doi:10.1145/321879.321884 [25] Tarjan RE (1983) Data structures and network algorithms. Society for Industrial and Applied Mathematics, Philadelphia · Zbl 0584.68077 [26] Vallée-Rai R et al. (1999) Soot–a Java optimization framework. In: Proceedings of CASCON 1999, pp 125–135 [27] Zhao J, Sarkar V (2011) Intermediate language extensions for parallelism. In: VMIL’11, pp 333–334