×

Preserving stabilization while practically bounding state space using incorruptible partially synchronized clocks. (English) Zbl 1460.68018

Summary: Stabilization is a key dependability property for dealing with unanticipated transient faults, as it guarantees that even in the presence of such faults, the system will recover to states where it satisfies its specification. One of the desirable attributes of stabilization is the use of bounded space for each variable. In this paper, we present an algorithm that transforms a stabilizing program that uses variables with unbounded domain into a stabilizing program that uses bounded variables by using partially synchronized physical time. Specifically, our algorithm relies on bounded clock drift \(\epsilon\) among processes and message delivery that either delivers the message in time \(\delta\) or loses it. If we let \(\epsilon\) to be as much as 100s and \(\delta\) to be as much as 1h, this property is satisfied by any practical system. While non-stabilizing programs (that do not handle transient faults) can deal with unbounded variables by assigning large enough but bounded space, stabilizing programs – that need to deal with arbitrary transient faults – cannot do the same since a transient fault may corrupt the variable to its maximum value. We show that our transformation algorithm is applicable to several problems including logical clocks, vector clocks, mutual exclusion, diffusing computations, and so on. Moreover, our approach can also be used to bound counters used in an earlier work by Katz and Perry for adding stabilization to a non-stabilizing program. By combining our algorithm with that work by Katz and Perry and by assuming incorruptible partially synchronized clocks, it would be possible to provide stabilization for a rich class of problems, by assigning large enough but bounded space for variables.

MSC:

68M14 Distributed systems
68M15 Reliability, testing and fault tolerance of networks and computer systems

Software:

UNITY
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Alon, N.; Attiya, H.; Dolev, S.; Dubois, S.; Potop-Butucaru, M.; Tixeuil, S., Practically stabilizing SWMR atomic memory in message-passing systems, J. Comput. Syst. Sci., 81, 4, 692-701 (2015) · Zbl 1320.68042 · doi:10.1016/j.jcss.2014.11.014
[2] Arora, A.; Gouda, MG, Distributed reset, IEEE Trans. Comput., 43, 9, 1026-1038 (1994) · Zbl 1068.68572 · doi:10.1109/12.312126
[3] Arora, A., Kulkarni, S., Demirbas, M.: Resettable vector clocks. In: Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, pp. 269-278. ACM, NY (2000). 10.1145/343477.343628 · Zbl 1314.68055
[4] Arora, A.; Kulkarni, SS, Designing masking fault-tolerance via nonmasking fault-tolerance, IEEE Trans. Softw. Eng., 24, 6, 435-450 (1998) · doi:10.1109/32.689401
[5] Awerbuch, B., Patt-Shamir, B., Varghese, G.: Bounding the unbounded. In: Proceedings IEEE INFOCOM ’94, The Conference on Computer Communications, Thirteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Networking for Global Communications, Toronto, Ontario, Canada, June 12-16, 1994, pp. 776-783 (1994). 10.1109/INFCOM.1994.337661
[6] Blanchard, P.; Dolev, S.; Beauquier, J.; Delaët, S., Practically Self-stabilizing Paxos Replicated State-Machine, 99-121 (2014), Cham: Springer, Cham
[7] Chandy, KM; Lamport, L., Distributed snapshots: determining global states of distributed systems, ACM Trans. Comput. Syst., 3, 1, 63-75 (1985) · doi:10.1145/214451.214456
[8] Chandy, KM; Misra, J., Parallel Program Design—A Foundation (1989), Reading: Addison-Wesley, Reading
[9] Dasgupta, A.; Ghosh, S.; Xiao, X.; Masuzawa, T.; Tixeuil, S., Probabilistic fault-containment, Stabilization, Safety, and Security of Distributed Systems, 9th International Symposium, 2007, Paris, France, November 14-16, 2007, Proceedings, Lecture Notes in Computer Science, 189-203 (2007), New York: Springer, New York
[10] Dijkstra, EW, Self-stabilizing systems in spite of distributed control, Commun. ACM, 17, 11, 643-644 (1974) · Zbl 0305.68048 · doi:10.1145/361179.361202
[11] Dijkstra, EW; Scholten, CS, Termination detection for diffusing computations, Inf. Process. Lett., 11, 1, 1-4 (1980) · Zbl 0439.68039 · doi:10.1016/0020-0190(80)90021-6
[12] Dolev, S., Georgiou, C., Marcoullis, I., Schiller, E.M.: Self-stabilizing virtual synchrony. In: Proceedings of the Stabilization, Safety, and Security of Distributed Systems—17th International Symposium, SSS 2015, Edmonton, AB, Canada, August 18-21, 2015, pp. 248-264 (2015). 10.1007/978-3-319-21741-3_17 · Zbl 1428.68063
[13] Fidge, C.J.: Timestamps in message-passing systems that preserve the partial ordering. In: Proceedings of the 11th Australian Computer Science Conference, vol. 10, no. 1, pp. 56-66 (1988)
[14] Fischer, MJ; Lynch, NA; Paterson, M., Impossibility of distributed consensus with one faulty process, J. ACM, 32, 2, 374-382 (1985) · Zbl 0629.68027 · doi:10.1145/3149.214121
[15] Garcia-Luna-Aceves, JJ, Loop-free routing using diffusing computations, IEEE/ACM Trans. Netw., 1, 1, 130-141 (1993) · doi:10.1109/90.222913
[16] Ghosh, S., Distributed Systems: An Algorithmic Approach (2014), London: Chapman & Hall, London
[17] Ghosh, S.; Gupta, A.; Herman, T.; Pemmaraju, SV, Fault-containing self-stabilizing distributed protocols, Distrib. Comput., 20, 1, 53-73 (2007) · Zbl 1266.68064 · doi:10.1007/s00446-007-0032-2
[18] Katz, S.; Perry, KJ, Self-stabilizing extensions for meassage-passing systems, Distrib. Comput., 7, 1, 17-26 (1993) · Zbl 1282.68077 · doi:10.1007/BF02278852
[19] Kulkarni, S.S., Arora, A.: Multitolerance in distributed reset. Chicago J. Theor. Comput. Sci. (1998). http://cjtcs.cs.uchicago.edu/articles/1998/4/contents.html · Zbl 0924.68006
[20] Lamport, L., Time, clocks, and the ordering of events in a distributed system, Commun. ACM, 21, 7, 558-565 (1978) · Zbl 0378.68027 · doi:10.1145/359545.359563
[21] Lamport, L.; Lynch, NA, Distributed Computing: Models and Methods (1990), Cambridge: MIT Press, Cambridge
[22] Lee, S.; Muhammad, RM; Kim, C., A Leader Election Algorithm Within Candidates on Ad Hoc Mobile Networks, 728-738 (2007), Berlin: Springer, Berlin
[23] Mattern, F.: Virtual time and global states of distributed systems. In: Cosnard M (ed.) Parallel and Distributed Algorithms, pp. 215-226. North-Holland (1989)
[24] Valapil, V.T., Kulkarni, S.S.: Preserving stabilization while practically bounding state space. In: 13th European Dependable Computing Conference, EDCC 2017, Geneva, Switzerland, September 4-8, 2017, pp. 26-33 (2017). 10.1109/EDCC.2017.13
[25] Vasudevan, S., Kurose, J.F., Towsley, D.F.: Design and analysis of a leader election algorithm for mobile ad hoc networks. In: 12th IEEE International Conference on Network Protocols, Berlin, Germany, pp. 350-360. IEEE Computer Society (2004). 10.1109/ICNP.2004.1348124
[26] Yingchareonthawornchai, S., Kulkarni, S.S., Demirbas, M.: Analysis of bounds on hybrid vector clocks. In: OPODIS 2015, December 14-17, 2015, Rennes, France, pp. 34:1-34:17 (2015). 10.4230/LIPIcs.OPODIS.2015.34 · Zbl 1380.68070
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.