Zookeeper Study Summary

By | February 26, 2016
Share the joy
  •  
  •  
  •  
  •  
  •  
  •  

In terms of CAP theorem, Zookeeper is a CP system. It emphasizes consistency in distributed system. It is a Leader – Follower based system. Zookeeper assumes that at least n/2+1 nodes are available. This guarantees brain-split won’t happen to zookeeper.

How to handle partial failure?
For example, there are 5 nodes. 2 of them are down. The rest of 3 keeps working. Because more than half is running. If another one is down, only 2 are working. When client connects to one of the 2 working nodes. Even though 2 of them are not down. But because 2 nodes know that less than 3 nodes are available among them, the 2 nodes will become unavailable to client. We should know that 3 out of 5 nodes are down is unacceptable to zookeeper.

How to write?
When a client submits a write request. This request will be forwarded to leader. Then leader will forward this write request to all followers. Only more than n/2+1 nodes acknowledged, then it means a write is successful. Write to data will be firstly persistent to disk, then it will be available in memory.

How to read?
Client reach a node, node directly returns the data values in-memory. Because they maybe network delay, the data in a node still may not be the up-to-date. So in order to keep a more strict consistency, we can call sync() first before read.

How to recover?
Zookeeper keeps a snapshot of each state called fuzzy snapshot. Because there may be network delay, it is possible that a state at a node doesn’t correspond to any state in leader. So we called it fuzzy snapshot. When a node is crashed and tried to recover, it will recover based on the fuzz snapshot it has and log from other nodes.

Study materials:
ZooKeeper: Wait-freecoordinationforInternet-scalesystems
http://zookeeper.apache.org/doc/r3.1.2/zookeeperInternals.html