Summary: This paper presents the fault-tolerance features of TOPAS, a parallel programming environment for distributed systems. TOPAS automatically analyzes data dependences among tasks and synchronizes data, which reduces the time needed to develop parallel programs. TOPAS also supports scheduling, load balancing and fault tolerance. The main topic of this paper is a solution for transparently recovering asynchronous distributed computations on clusters of workstations, without spare hardware, when a fault occurs on a node. Experiments demonstrate the simplicity and efficiency of parallel programming in the TOPAS environment with integrated fault tolerance, which provides graceful performance degradation and short reconfiguration times for application recovery.
