SWIM Protocol
From Wikipedia, the free encyclopedia

The Scalable Weakly Consistent Infection-style Process Group Membership (SWIM) Protocol is a group membership protocol for distributed systems based on "outsourced heartbeats",[1] first introduced by Abhinandan Das, Indranil Gupta and Ashish Motivala in 2002.[2][3] It is a hybrid algorithm that combines failure detection with group membership dissemination.
The protocol has two components, the Failure Detector Component and the Dissemination Component.
The Failure Detector Component functions as follows:
- Every T' time units, each node ($N_1$) sends a ping to another node ($N_2$) chosen at random from its membership list.
- If $N_1$ receives a response from $N_2$, $N_2$ is decided to be healthy and $N_1$ updates its "last heard from" timestamp for $N_2$ to the current time.
- If $N_1$ does not receive a response, $N_1$ contacts k other nodes on its list ($N_3, \ldots, N_{k+2}$) and requests that they ping $N_2$.
- If no successful response is received within T' time units, $N_1$ marks $N_2$ as failed.
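The four failure-detection steps above can be sketched as a single function that runs one protocol period for one node. This is a minimal illustration rather than a reference implementation: the name `protocol_period`, the `ping` callback and the `last_heard` dictionary are hypothetical, and a synchronous `ping` that returns `True` or `False` stands in for waiting up to T' time units for a reply over a real network.

```python
import random
import time
from typing import Callable, Dict, List, Optional

# Hypothetical transport: returns True if `dst` answered a ping sent by `src`.
Ping = Callable[[str, str], bool]


def protocol_period(self_id: str,
                    members: List[str],
                    last_heard: Dict[str, float],
                    ping: Ping,
                    k: int = 3,
                    rng: Optional[random.Random] = None) -> Optional[str]:
    """Run one failure-detection round for `self_id`.

    Returns the id of the member marked as failed, or None if nobody was.
    """
    if rng is None:
        rng = random.Random()
    others = [m for m in members if m != self_id]
    if not others:
        return None

    # Step 1: ping one randomly chosen member of the list.
    target = rng.choice(others)

    # Step 2: a direct response means the target is healthy.
    if ping(self_id, target):
        last_heard[target] = time.time()
        return None

    # Step 3: ask up to k other members to ping the target on our behalf.
    helpers = [m for m in others if m != target]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        # The request only helps if we can reach the helper
        # and the helper can reach the target.
        if ping(self_id, helper) and ping(helper, target):
            # Assumption: an indirect response also refreshes the timestamp.
            last_heard[target] = time.time()
            return None

    # Step 4: no successful response in this period, so mark the target failed.
    return target
```

In this sketch, a node that is unreachable only from the caller is still treated as healthy, because one of the k helpers can reach it. This is the effect the indirect pings are meant to have.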
The Dissemination Component functions as follows:
- Upon detecting a failed node $N_2$, $N_1$ sends a multicast message to the rest of the nodes in its membership list, with information about the failed node.
- Voluntary requests for a node to enter/leave the group are also sent via multicast.
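The dissemination step can be illustrated in the same way. The sketch below assumes a hypothetical `MembershipUpdate` message with three kinds ("failed", "join", "leave") and a caller-supplied `send` callback in place of a real multicast primitive. None of these names come from the protocol description itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class MembershipUpdate:
    kind: str  # one of "failed", "join", "leave" (hypothetical encoding)
    node: str  # the node the update is about


def apply_update(members: Set[str], update: MembershipUpdate) -> None:
    """Apply a received update to a local membership list."""
    if update.kind == "join":
        members.add(update.node)
    elif update.kind in ("failed", "leave"):
        members.discard(update.node)
    else:
        raise ValueError(f"unknown update kind: {update.kind!r}")


def multicast(sender: str,
              members: Set[str],
              update: MembershipUpdate,
              send: Callable[[str, MembershipUpdate], None]) -> List[str]:
    """Send `update` to every member except the sender and the subject node.

    Returns the list of recipients, in sorted order.
    """
    recipients = sorted(m for m in members if m not in (sender, update.node))
    for recipient in recipients:
        send(recipient, update)
    return recipients
```

A detecting node would first apply the update to its own list and then call `multicast`, so that every recipient removes the failed node from its own view.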
Properties
The protocol provides the following guarantees:
- Strong Completeness: Full completeness is guaranteed (i.e. the crash-failure of any node in the group is eventually detected by all live nodes).
- Detection Time: The expected value of detection time (from node failure to detection) is $T' \cdot \frac{1}{1 - e^{-q_f}}$, where T' is the length of the protocol period, and $q_f$ is the fraction of non-faulty nodes in the group.[3]
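As a worked example of the detection-time property, the short function below evaluates the expression T' · 1 / (1 − e^(−q_f)) given in the SWIM paper,[3] where T' is the protocol period and q_f is the fraction of non-faulty nodes. The function name and argument names are illustrative only.

```python
import math


def expected_detection_time(period: float, q_f: float) -> float:
    """Expected time from a node's failure to its detection.

    period: length of the protocol period T' (any time unit).
    q_f:    fraction of non-faulty nodes in the group, in (0, 1].
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if not 0.0 < q_f <= 1.0:
        raise ValueError("q_f must be in (0, 1]")
    return period / (1.0 - math.exp(-q_f))
```

With every node healthy (q_f = 1), the expression evaluates to about 1.58 protocol periods. The value grows as q_f decreases, since fewer live nodes are left to pick the failed node as a ping target.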