Get Started. It's Free
or sign up with your email address
ZooKeeper by Mind Map: ZooKeeper

1. challenges

1.1. difficulties of writing concurrent code within a single application

1.1.1. e.g. processes & threads

1.2. Unreliable network connections

1.2.1. latency can vary widely, bandwidth can change over time

1.2.2. connection can be lost

1.2.3. How to handle when connection between two groups of system is lost enter a degraded state quorum algorithm odd number of servers in total the group with odd number of servers remain functional the group with even number of servers becomes read-only or has degraded functionality

1.3. Clock synchronization

1.3.1. the times on the machines are different

1.3.2. most circumstances, servers are synchronized using the network time protocol (NTP)

1.3.3. might be wrong at international size

2. history solutions

2.1. two categories

2.1.1. message passing

2.1.2. shared storage

2.2. paxos algorithm

2.2.1. basic Paxos algorithm procedure a process, called proposer, that wants to mutate state first submits a proposal to the cluster transmitted to a quorum of other processes, called acceptors after proposer process receives enough positive responses then sends the actual change to the acceptors acceptors propagate the information to other nodes, called learners, in the cluster the proposal whose proposal is accepted will be elected as leader

2.2.2. paxos algorithm notoriously to implement in practice

2.2.3. multi Paxos algorithm are usually used in practice.

2.3. similar systems

2.3.1. consul

2.3.2. etcd

2.3.3. eureka

3. Use case

3.1. master election

3.1.1. def: the process of deciding who is the master

3.1.2. steps have all the process go and create sequential ephemeral node znode with smallest sequence number is the leader setup watch for changes

3.2. crash detection

3.2.1. def: the master need to detect when workers crash

3.2.2. steps have slaves create ephemeral znodes setup watcher on the znodes

3.2.3. overview heartbeat, implement through ephemeral znode Untitled /app1/c1 connects to /app1 /app1/c2 connects to /app1 port

3.3. metadata management

3.3.1. def: all the nodes must be able to reliably store/retrieve status

3.3.2. store metadata in persistent znode

3.4. group membership

3.4.1. def: the master must learn who is available for tasks

4. Def

4.1. a high-available, high-performance coordination service

4.1.1. inspired by google chubby

4.1.2. developed in Yahoo using Java

4.1.3. distributed system that provides strong consistency ordering durability guarantees the ability to implement typical synchronization primitives a simpler way of dealing with concurrency

4.1.4. a cluster what does leader do type of messages heartbeat and accept request what does follower do receive proposal message from leader check if the leader is the correct leader check if the transaction is in correct order send acknowledgement back to leader receive commit message from leader apply change to the data tree forward write request from client respond to client

4.2. a stripped-down file system that exposes a few simple operations and some extra abstractions such as ordering and notifications

4.2.1. no need to provide open, close, or seek operations because files are small, read or written entirely

4.2.2. does not have files and directories, but a unified concept of node, called a container of data like a file and a container of other znodes Untitled

5. architecture

5.1. Coordinate with shared storage

5.1.1. Zookeeper use this model to implement coordination

5.1.2. more straightforward

5.1.3. still relies on network to process data

5.2. Data tree

5.2.1. overview organized in hierarchical structure similar to file system Untitled

5.2.2. basic unit for a data tree Znode data model types supported APIs

5.2.3. monitor znode watch use case characteristics

5.2.4. API Untitled

5.3. ACLs

5.3.1. defines who can perform certain operations on it

5.3.2. consist of authentication scheme an identity of that scheme a set of permissions

5.3.3. Untitled

5.4. internal architecture

5.4.1. Client-server interaction connection TCP connection client only connects to server with newer/equal state configurable session timeout read read can be read from any of the servers write write requests will be forwarded to the leader of zookeeper cluster

5.4.2. zxid a 64-bit integer id first 32-bit, who is the leader last 32-bit, incrementing update status node status if zxid is higher, the state is newer generated when there is a new update of znode

5.4.3. leader election within zookeeper Zookeeper cluster AKA ensemble server has the following mode leader follower observer server has the following state looking leading following vote vote for others in the beginning vote for the node with latest zxid Untitled

5.4.4. state replication within zookeeper follows a consensus protocol called zab Zab algorithm Implementation Zab vs Paxos Consistency

6. Operation modes

6.1. standalone mode

6.2. replicated mode

6.2.1. core concepts guarantee that every modification to the tree of nodes is replicated to a majority of the ensemble mental model clients connected to ZooKeeper servers that are following the leader

6.2.2. high available: runs on a collection of machines called an ensemble as long as a majority of machines are up can provide a service quorum size generally speaking, no need to have a ZooKeeper cluster larger than five nodes usually choose 3 to 5 nodes

6.2.3. Sessions kept alive by the client sending ping requests timeout low session timeouts leads to faster detection of machine failure long session timeouts are applicable for more complex ephemeral state general rule: