Distributed Systems: Logical Clocks
By Nathan B. Smith
In many computer systems which experience an exponential growth in the number of users, scalability becomes the most influential characteristic for sustainability. Logical clocks play an enabling technology in the implementation of concurreny across distributed systems. Distributed systems depend on parallelism and concurrency, which is largely based on the concepts of software threading and tasking. To leverage the the theory of threading on a much larger scale in distributed systems, logical clocks offer an important solution for keeping distributed nodes synchronized while sharing the computing power to solve a given calculation (Gorton, 2022).
The theory of logical clocks in distributed systems is complex. Leslie Lamport (2015) won the 2013 ACM Turing award based on his early work on logical clocks and the ordering of system events. It is interesting to note that most of Lamport’s work was done near 35 years before he won the Turing Award. That is, it took more than three decades to fully understand Lamport’s contributions and their practical applications (Sandén, 2021).
In an effort to fully understand this topic, it may be helpful to summarize a lecture (or “chat”) presented by Sandén (2021) as part of a course in Concurrency and Distributed Systems. Coulouris, et al. (2011) describes a distributed system is characterized by having software components located at networked computers (or nodes), which coordinate the actions py passing synchronization messages among one another. Due to the sheer magnitude of distributed systems and the amount of coordination that must occure, there is no global clock that keeps track of all messages sent and received. Software, consisting of complex algorithms, operating on distributed nodes are generally concurrent. Provisions are made in the design of these systems to deal with nodes that mail fail independently, and not as en entire system. Lamport (2015) stated that the problem of implementing a distributed system can be viewed as that of maintaining a global invariant despite different processes (or nodes) may have incompatible views of what the curent state of the system at any given time.
Complex algorithms allow the nodes of a distributed system to cooperate without a global clock. Again, studying the necessary mechanisms that allow a distributed system to operate is analougous to threading (or tasking) in the context of concurrent software applications. Logical clocks allow operation of the network in the absense of a global, “real” clock. However, distances in time are not captured. Group communications, consisting of coordination or sychnronized messages sent and received amongts distributed nodes, is enabled by logical clocks.
Lamport (1978) proposed that it is sufficient that all nodes in a distributed system agrees on a partial order of event occurences. In desribing the relationship of event occurences, it is helpful to use an arrow indicated that one event occurs before another. For example, a -> b indicates that event a occurred before event b. It must be assumed that clocked even occurences at any given node ar completely ordered: a -> b -> c -> d. Coordination or synchronization messages are sent (s) and received (r). A logical clock only describes the order of messages sent and received. In effect, what is described is the relative order of messages that are sent and received. The clock can be likened to a music score, and does not refer to something laike your grandfather clock (Sandén, 2021). In any case, the term clock is useful to describe the sequence of event occurences.
In describing the flow of event occurrences, the before relation is transitive: (a -> ) and (b->c) implies a -> c. The before relation describes a partial order of events. However not all occurrences are necessarily concurrent if: neither a -> b nor b -> a is true. A relation that is concurrent can be annotated as a || b. A set of occurences where either a -> b or b -> a is true is annotated as a ≠ b. All event occurrences at a given node comprise a single event sequence. A locking order of resources, for example, forms a partial order.
The basic or simplified implementation of Lamport’s (1987) clock is based on each node maintaining an integer incremented counter in a safe object. It follows two rules. First, if a -> b within the same node, (subscript i), then Ci(a) < Ci(b). Second, if s is the occurrence where node Pi sends a message, and r is where node Pj receives the message, then s r
and Ci (s) < Cj (r).
There is a limitation of the basic implementation described above: Ci(a) < Cj(b) does not imply a b (unless i = j). This can be overcome using a so-called vector logical clock, in which each node assigns a serial number to each (clocked) event occurrence at that node. So, a vector-clock value represents a vector with a serial number for each node. The sender of a message timestamps the message with its whole vector (its own serial number that is incremented. The receiver then increments is own serial number (Mattern, 1988).
An interesting investigation conducted by Dr. Bradley Hodgins (Colorado Technical University DCS, 2015)) proposes the use of a vector clock in a military war game. In this study, a WEPTAC (weapons tactics) simulation ran on a single computer, which could be replayed after the event based on one global clock. Distributing the simulation over a network of nodes could benefit from more processing power. In this case, the playback could be ordered by the vector clock values. (I am attempting to locate possible documentation for further research and development. It does not appear that his dissertation addressed this topic).
In conclusion, the concept of logical clocks were discussed in terms of enabling a network of distributed computing nodes. Next, we addressed to possible solutions or software implementations. The implementation of a basic solution, which is simple, but still relevant and useful. The second solution is based on vector-clocks, which is exact but requires a very long vector, which must be sent around the network. This solution is interesting but largely impractical for most purposes. As a current student of Amazon Web Services, I am interested to learn more about distributed systems theory and will research some of the vast collateral resources of respective graduate courses on the topic, particularly videos hosted on YouTube
References
Baquero, C., & Preguiça, N. (2016). Why logical clocks are easy. Communications of the ACM, 59(4), 43-47. https://doi.org/10.1145/2890782
Coulouris, G., Dollimore, J., Kindberg, T., & Blair, G. (2011). Distributed Systems: Concepts and design. Upper Saddle River, NJ: Pearson.
Gorton, I. (2022). Concurrency and scalability for distributed systems (Early Release ed.). Sebastopol, CA: O'Reilly Media.
Hoffmann, L. (2020). Reinventing virtual machines. Communications of the ACM, 63(4), 128-ff. https://doi.org/10.1145/3381947
Lamport, L. (1978). Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7), 558-565. https://doi.org/10.1145/359545.359563
Lamport, L. (2015). The computer science of concurrency: the early years. In D. Malki, & L. Lamport, Concurrency: The works of Leslie Lamport (pp. 13-26). New York, NY: Association of Computer Machinery. https://doi.org/10.1145/3335772.3335775
Lamport, L. (2015). Turing lecture: The computer science of concurrency: The early years. Communications of the ACM, 58(6), 71-76. https://doi.org/10.1145/2771951
Mattern, F. (1988). Virtual time and global states of distributed systems. Department of Computer Science. Kaiserslautern, DE: University of Kaiserslautem. http://courses.csail.mit.edu/6.852/05/papers/VirtTimeGlobStatesFull.pdf
Sandén, B. (2021, June 21). Study Guide for CS844 Concurrent and Distributed Systems. (25th Anniversary Edition). Colorado Springs, CA, USA.
Sandén, B. I. (2011). Design of multithreaded software: The entity-life modeling approach. Hoboken, NJ: John Wiley & Sons.
Sandén, B. I. (2021, August 10). CS844 Unit 4 Live Chat: Dead Lock Prevention and Logical Clocks. Colorado Springs, CO, USA: Colorado Technical University.
Simms, G. (2018). Dining Philosophers Problem with Solution. Gary Explains: https://www.youtube.com/watch?v=NbwbQQB7xNQ
Comments
Post a Comment