×

Task-based parallel programming model supporting fault tolerance. (Chinese. English summary) Zbl 1363.68036

Summary: Task-based parallel programming model has become the mainstream parallel programming model to improve the performance of parallel computer systems by exploiting task parallelism. This paper presents a novel task-based parallel programming model which supports hardware fault tolerance. This model incorporates fault tolerance mechanisms into the task-based parallel programming model and aim to improve system performance and reliability. It uses task as the basic unit of scheduling, execution, fault detection and recovery, and supports fault tolerance in the application level. A buffer-commit computation model is used for transient fault tolerance and application-level diskless check-pointing technique is employed for permanent fault tolerance. A work-stealing scheduling scheme supporting fault tolerance is adopted to achieve dynamic load balancing. Experimental results show that the proposed model provides hardware fault tolerance with low performance overhead.

MSC:

68N19 Other programming paradigms (object-oriented, sequential, concurrent, automatic, etc.)
68M15 Reliability, testing and fault tolerance of networks and computer systems
68M20 Performance evaluation, queueing, and scheduling in the context of computer systems
PDFBibTeX XMLCite
Full Text: DOI