×

Reliability-aware scheduling strategy for heterogeneous distributed computing systems. (English) Zbl 1233.68091

Summary: Heterogeneous computing systems are promising computing platforms, since single parallel architecture based systems may not be sufficient to exploit the available parallelism with the running applications. In some cases, heterogeneous distributed computing (HDC) systems can achieve higher performance with lower cost than single-machine supersystems. However, in HDC systems, processors and networks are not failure free and any kind of failure may be critical to the running applications. One way of dealing with such failures is to employ a reliable scheduling algorithm. Unfortunately, most existing scheduling algorithms for precedence constrained tasks in HDC systems do not adequately consider reliability requirements of inter-dependent tasks. In this paper, we design a reliability-driven scheduling architecture that can effectively measure system reliability, based on an optimal reliability communication path search algorithm, and then we introduce reliability priority rank (RRank) to estimate the task’s priority by considering reliability overheads. Furthermore, based on directed acyclic graph (DAG) we propose a reliability-aware scheduling algorithm for precedence constrained tasks, which can achieve high quality of reliability for applications. The comparison studies, based on both randomly generated graphs and the graphs of some real applications, show that our scheduling algorithm outperforms the existing scheduling algorithms in terms of makespan, scheduling length ratio, and reliability. At the same time, the improvement gained by our algorithm increases as the data communication among tasks increases.

MSC:

68M14 Distributed systems
68M20 Performance evaluation, queueing, and scheduling in the context of computer systems
PDF BibTeX XML Cite
Full Text: DOI

References:

[1] Adam, T. L.; Chandy, K. M.; Dickson, J. R.: A comparison of list schedules for parallel processing systems, Comm. ACM 17, No. 12, 685-690 (1974) · Zbl 0293.68047
[2] Ahmad, I.; Kwok, Y. -K.: On exploiting task duplication in parallel program scheduling, IEEE trans. Parallel distrib. Syst. 9, No. 9, 872-892 (1998)
[3] Bansal, S.; Kumar, P.; Singh, K.: An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems, IEEE trans. Parallel distrib. Syst. 14, No. 6, 533-544 (2003)
[4] Bansal, S.; Kumar, P.; Singh, K.: Dealing with heterogeneity through limited duplication for scheduling precedence constrained task graphs, J. parallel distrib. Comput. 65, No. 4, 479-491 (2005) · Zbl 1101.68405
[5] Benoit, Anne; Hakem, Mourad; Robert, Yves: Contention awareness and fault-tolerant scheduling for precedence constrained tasks in heterogeneous systems, Parallel comput. 35, No. 2, 83-108 (2009)
[6] Casanova, H.: Network modeling issues for grid application scheduling, Internat. J. Found. comput. Sci. 16, No. 2, 145-162 (2005)
[7] Chu, W. W.; Holloway, L. J.; Lan, M. T.; Efe, K.: Task allocation in distributed data processing, Computer 13, No. 11, 57-69 (1980)
[8] Y. Chung, S. Ranka, Application and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors, in: Proc. Super Computing, 1992, pp. 512–521.
[9] Daoud, Mohammad I.; Kharma, Nawwaf: A high performance algorithm for static task scheduling in heterogeneous distributed computing systems, J. parallel distrib. Comput. 68, No. 4, 399-409 (2008) · Zbl 1243.68103
[10] Darbha, S.; Agrawal, D. P.: Optimal scheduling algorithm for distributed-memory machines, IEEE trans. Parallel distrib. Syst. 9, No. 1, 87-95 (1998)
[11] Dhodhi, M. K.; Ahmad, I.; Yatama, A.: An integrated technique for task matching and scheduling onto distributed heterogeneous computing system, J. parallel distrib. Comput. 62, No. 9, 1338-1361 (2002) · Zbl 1063.68021
[12] A. Dogan, F. Özgüner, Optimal and suboptimal reliable scheduling of precedence-constrained tasks in heterogeneous computing, in: Proc. 2000 Int’l Conf. Parallel Processing Workshop Network Based Computing, 2000, pp. 429–436.
[13] Dogan, A.; Özgüner, F.: Matching and scheduling algorithms for minimizing execution time and failure probability of applications in heterogeneous computing, IEEE trans. Parallel distrib. Syst. 13, No. 3, 308-323 (2002)
[14] El-Rewini, H.; Lewis, T. G.: Scheduling parallel program tasks onto arbitrary target machines, J. parallel distrib. Comput. 9, No. 2, 138-153 (1990) · Zbl 0861.90072
[15] Gary, M. R.; Johnson, D. S.: Computers and intractability: A guide to the theory of NP-completeness, (1979) · Zbl 0411.68039
[16] A. Girault, C. Lavarenne, M. Sighireanu, Y. Sorel, Generation of fault-tolerant static scheduling for real-time embedded systems with multi-point links, in: IEEE Workshop on Fault-Tolerant Parallel and Systems, San Francisco, USA, 2001.
[17] Hagras, T.; Janeček, J.: A high performance, low complexity algorithm for compile-time task scheduling in heterogeneous systems, Parallel comput. 31, No. 7, 653-670 (2005)
[18] M. Iverson, F. Ozuner, G. Follen, Parallelizing existing applications in a distributed heterogeneous environment, in: Proceedings of Heterogeneous Computing Workshop, 1995, pp. 93–100.
[19] Kartik, S.; Murthy, C. S. R.: Task allocation algorithms for maximizing reliability of distributed computing systems, IEEE trans. Comput. 46, No. 6, 719-724 (1997)
[20] Kim, D.; Yi, B. G.: A two-pass scheduling algorithm for parallel programs, Parallel comput. 20, No. 6, 869-885 (1994) · Zbl 0811.68061
[21] Kwok, Y. -K.; Ahmad, I.: Dynamic critical-path scheduling: an effective technique for allocating task graphs onto multiprocessors, IEEE trans. Parallel distrib. Syst. 7, No. 5, 506-521 (1996)
[22] Liu, G. Q.; Poh, K. L.; Xie, M.: Iterative list scheduling for heterogeneous computing, J. parallel distrib. Comput. 65, No. 5, 654-665 (2005) · Zbl 1101.68420
[23] Ma, P. Y. R.; Lee, E. Y. S.; Tsuchiya, M.: A task allocation model for distributed computing systems, IEEE trans. Comput. 31, No. 1, 41-47 (1982)
[24] Manimaran, G.; Murthy, C. Siva Ram: A fault-tolerant dynamic scheduling algorithm for multiprocessor real-time systems and its analysis, IEEE trans. Parallel distrib. Syst. 9, No. 11, 1137-1152 (1998)
[25] Papadimitriou, C. H.; Yannakakis, M.: Towards an architecture-independent analysis of parallel algorithms, SIAM J. Comput. 19, No. 2, 322-328 (1990) · Zbl 0692.68032
[26] Park, H. J.; Kim, B. K.: An optimal scheduling algorithm for minimizing the computing period of cyclic synchronous tasks on multiprocessors, J. syst. Softw. 56, No. 3, 213-229 (2001)
[27] J.S. Plank, W.R. Elwasif, Experimental assessment of workstation failures and their impact on checkpointing system, in: Int’l Symp. Fault-Tolerant Computing, 1998, pp. 48–57.
[28] Qin, Xiao; Jiang, Hong: A dynamic and reliability-driven scheduling algorithm for parallel real-time jobs executing on heterogeneous clusters, J. parallel distrib. Comput. 65, No. 8, 885-900 (2005) · Zbl 1082.68523
[29] Qin, Xiao; Jiang, Hong: A novel fault-tolerant scheduling algorithm for precedence constrained tasks in real-time heterogeneous systems, Parallel comput. 32, No. 5, 331-356 (2006)
[30] Radulescu, A.; Van Gemund, A. J. C.: Low-cost task scheduling for distributed-memory machines, IEEE trans. Parallel distrib. Syst. 13, No. 6, 648-658 (2002)
[31] Shatz, S. M.; Wang, J. P.; Goto, M.: Task allocation for maximizing reliability of distributed computer systems, IEEE trans. Comput. 41, No. 9, 1156-1168 (1992)
[32] Sih, G. C.; Lee, E. A.: A compile-time scheduling heuristic for interconnection-constrained heterogeneous machine architectures, IEEE trans. Parallel distrib. Syst. 4, No. 2, 175-187 (1993)
[33] Sinnen, Oliver; Sousa, Leonel Augusto; Sandnes, Frode Eika: Toward a realistic task scheduling model, IEEE trans. Parallel distrib. Syst. 17, No. 3, 263-275 (2006)
[34] Stone, H. S.: Multiprocessor scheduling with the aid of network flow algorithms, IEEE trans. Softw. eng. 3, No. 1, 85-93 (1977) · Zbl 0355.68042
[35] Tang, Xiaoyong; Kenli, Li; Padua, D.: Communication contention in APN list scheduling algorithm, Sci. China ser. F 52, No. 1, 59-69 (2009) · Zbl 1181.90127
[36] Tang, Xiaoyong; Li, Kenli; Liao, Guiping; Li, Renfa: List scheduling with duplication for heterogeneous computing systems, J. parallel distrib. Comput. 70, No. 4, 323-329 (2010) · Zbl 1233.68092
[37] Topcuoglu, H.; Hariri, S.; Wu, M. -Y.: Performance-effective and low complexity task scheduling for heterogeneous computing, IEEE trans. Parallel distrib. Syst. 13, No. 3, 260-274 (2002)
[38] Woodside, C. M.; Monforton, G. G.: Fast allocation of processes in distributed and parallel systems, IEEE trans. Parallel distrib. Syst. 4, No. 2, 164-174 (1993)
[39] Wu, M.; Dajski, D.: Hypertool: a programming aid for message passing systems, IEEE trans. Parallel distrib. Syst. 1, No. 3, 330-343 (1990)
[40] Zheng, Qin; Veeravalli, Bharadwaj: On the design of communication-aware fault-tolerant scheduling algorithms for precedence constrained tasks in grid computing systems with dedicated communication devices, J. parallel distrib. Comput. 69, No. 3, 282-294 (2009)
[41] Zhou, Xiaobo; Xu, Cheng-Zhong: Harmonic proportional bandwidth allocation and scheduling for service differentiation on streaming servers, IEEE trans. Parallel distrib. Syst. 15, No. 9, 835-848 (2004)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.