Consensus Solutions with ZooKeeper

Broad Vision of the Topic


Coordination between distributed processes in ZooKeeper is based on a shared hierarchical name space of data registers (znodes). The main applications of the service are consensus, group management, and presence protocols.
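
The znode name space behaves like a small file system: every znode has a path, may hold a small payload, and may have children. A toy in-memory sketch (illustrative only — real applications talk to a ZooKeeper ensemble through the Java client or a binding such as kazoo; `ZNodeTree` is a made-up name):

```python
# Minimal sketch of ZooKeeper's hierarchical name space of znodes.
# Not the real client API -- just models the path/data/children structure.

class ZNodeTree:
    def __init__(self):
        self.nodes = {"/": b""}  # path -> data (znodes hold small payloads)

    def create(self, path, data=b""):
        # Like the file system model, a znode's parent must already exist.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("parent znode %s does not exist" % parent)
        self.nodes[path] = data

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"timeout=5s")
tree.create("/app/workers")
print(tree.get_children("/app"))  # -> ['config', 'workers']
```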

Key features:

  • High throughput
  • Low latency
  • High availability
  • Strictly ordered access to the znodes

The Tao of ZooKeeper

  • Order. There are schedules to be kept and procedures that need to be followed. All changes are totally ordered.
  • Reliability.
  • Efficiency.
  • Time.
  • Avoid Contention.
  • Wait-Free.

The service itself is replicated over a set of machines. ZooKeeper achieves high throughput and low latency by keeping its data in memory, which is why the amount of data stored in each znode is kept small.

Each client connects to a single ZooKeeper server, and all communication with that server goes over a TCP connection. If the client needs to switch servers (for example, after a failure), the connection is simply reestablished to another server in the ensemble.

Read requests are processed locally by the server a client is connected to, while write requests are forwarded to the other ZooKeeper servers and go through consensus before a response is returned.
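This read/write split can be modeled with a toy ensemble: reads are served from whichever replica the client is attached to, while a write is committed only when a majority of servers acknowledge it. (A deliberately simplified sketch — `Ensemble` and the explicit `acks` count are made up for illustration; the real protocol is the leader-driven atomic broadcast described below.)

```python
# Toy model of ZooKeeper's request handling:
# reads are local, writes need a majority quorum before they commit.

class Ensemble:
    def __init__(self, n):
        self.n = n
        self.replicas = [dict() for _ in range(n)]

    def read(self, server, key):
        # Served from the local replica; may lag the latest committed write.
        return self.replicas[server].get(key)

    def write(self, key, value, acks):
        # Commit only with a majority of acknowledgements.
        if acks <= self.n // 2:
            raise RuntimeError("no quorum: write rejected")
        for r in self.replicas:
            r[key] = value

ens = Ensemble(5)
ens.write("/cfg", "v1", acks=3)   # 3 of 5 servers acked -> quorum, committed
print(ens.read(0, "/cfg"))        # -> v1
```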

Finally, ordering is central to ZooKeeper: all updates are totally ordered. Each update is stamped with a unique zxid (ZooKeeper transaction id), and reads are ordered with respect to updates.
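A zxid is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a counter, so plain integer comparison gives the total order and every new leader's updates sort after the previous epoch's. A small sketch of that encoding:

```python
# A zxid is a 64-bit id: high 32 bits = leader epoch, low 32 bits = counter.
# Comparing zxids as integers yields ZooKeeper's total order of updates.

def make_zxid(epoch, counter):
    return (epoch << 32) | counter

def epoch_of(zxid):
    return zxid >> 32

def counter_of(zxid):
    return zxid & 0xFFFFFFFF

a = make_zxid(1, 7)   # 8th update of epoch 1
b = make_zxid(2, 0)   # first update of a new leader's epoch
assert a < b          # a new epoch always orders after the old one
print(epoch_of(b), counter_of(b))  # -> 2 0
```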

What can you ACTUALLY do with ZK?

Out of the box:

  • Leader election
  • Group membership
  • Configuration maintenance

You can implement these services using ZK (see ZK Recipes):

  • Barriers
  • Queues
  • Locks
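
The leader-election recipe, for example, has each candidate create an ephemeral sequential znode under an election path; the candidate holding the lowest sequence number is the leader, and when its session dies its znode vanishes and the next-lowest takes over. A toy local simulation of that recipe (the `Election` class is invented for illustration; a real implementation uses ephemeral sequential znodes and watches via the ZK client):

```python
import itertools

# Toy model of the ZK leader-election recipe: each candidate creates an
# ephemeral *sequential* znode under /election; the lowest sequence wins.

class Election:
    def __init__(self):
        self._seq = itertools.count()
        self.candidates = {}  # znode name -> client id

    def volunteer(self, client_id):
        # ZooKeeper appends a monotonically increasing 10-digit counter
        # to sequential znode names.
        name = "n_%010d" % next(self._seq)
        self.candidates[name] = client_id
        return name

    def leader(self):
        return self.candidates[min(self.candidates)]

    def crash(self, name):
        # An ephemeral znode disappears when its client's session dies.
        del self.candidates[name]

e = Election()
a = e.volunteer("alice")
e.volunteer("bob")
print(e.leader())  # -> alice (lowest sequence number)
e.crash(a)
print(e.leader())  # -> bob takes over
```

Locks and queues follow the same pattern: sequential znodes give an ordering, and ephemeral znodes give automatic cleanup on failure.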






Implementation and Experiments

Applications with ZooKeeper

Stress Tests

Results and Conclusions

ZooKeeper is:

  • Coordination Service
  • Highly Available
  • Low Latency
  • Wait-Free

Developed applications:

  • Dynamic Configuration Management for LogBack
  • Reliable Multicast using group membership and leader election

Stress test results:

  • Prefer more servers for high availability and for workloads with a high read/write ratio
  • Latency stays low even as servers are added



ZooKeeper - A Reliable, Scalable Distributed Coordination System

Ideas for the article: ZooKeeper lets distributed processes coordinate with each other through a shared hierarchical name space that is modeled after a file system. Data is kept in memory and backed up to a log for reliability. A memory-based system also means you are limited to the amount of data that fits in memory, so it is not useful as a general data store. Replication is used for scalability and reliability, which means ZooKeeper favors applications that are heavily read-based.

It uses Zab, an atomic broadcast protocol in the spirit of the famous Paxos algorithm, to keep replicas consistent in the face of failures.

  • Watches are ordered with respect to other events, other watches, and asynchronous replies. The ZooKeeper client library ensures that everything is dispatched in order.
  • A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode.
  • The order of watch events from ZooKeeper corresponds to the order of the updates as seen by the ZooKeeper service.

ZooKeeper: Wait-free coordination for Internet-scale systems

ZooKeeper Presentations

zookeeper_research.txt · Last modified: 2012/05/31 12:15 by julia
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported