Get Started. It's Free
or sign up with your email address
ZooKeeper by Mind Map: ZooKeeper

1. challenges

1.1. difficulties of writing concurrent code within a single application

1.1.1. e.g. processes & threads

1.2. Unreliable network connections

1.2.1. latency can vary widely, bandwidth can change over time

1.2.2. connection can be lost

1.2.3. How to handle when connection between two groups of system is lost

1.2.3.1. enter a degraded state

1.2.3.1.1. quorum algorithm

1.2.3.1.2. odd number of servers in total

1.2.3.1.3. the group with odd number of servers remain functional

1.2.3.1.4. the group with even number of servers becomes read-only or has degraded functionality

1.3. Clock synchronization

1.3.1. the times on the machines are different

1.3.2. most circumstances, servers are synchronized using the network time protocol (NTP)

1.3.3. might be wrong at international size

2. history solutions

2.1. two categories

2.1.1. message passing

2.1.2. shared storage

2.2. paxos algorithm

2.2.1. basic Paxos algorithm

2.2.1.1. procedure

2.2.1.1.1. a process, called proposer, that wants to mutate state first submits a proposal to the cluster

2.2.1.1.2. transmitted to a quorum of other processes, called acceptors

2.2.1.1.3. after proposer process receives enough positive responses

2.2.1.1.4. then sends the actual change to the acceptors

2.2.1.1.5. acceptors propagate the information to other nodes, called learners, in the cluster

2.2.1.2. the proposal whose proposal is accepted will be elected as leader

2.2.2. paxos algorithm notoriously to implement in practice

2.2.3. multi Paxos algorithm are usually used in practice.

2.3. similar systems

2.3.1. consul

2.3.2. etcd

2.3.3. eureka

3. Use case

3.1. master election

3.1.1. def: the process of deciding who is the master

3.1.2. steps

3.1.2.1. have all the process go and create sequential ephemeral node

3.1.2.2. znode with smallest sequence number is the leader

3.1.2.3. setup watch for changes

3.2. crash detection

3.2.1. def: the master need to detect when workers crash

3.2.2. steps

3.2.2.1. have slaves create ephemeral znodes

3.2.2.2. setup watcher on the znodes

3.2.3. overview

3.2.3.1. heartbeat, implement through ephemeral znode

3.2.3.2. Untitled

3.2.3.2.1. /app1/c1 connects to /app1

3.2.3.2.2. /app1/c2 connects to /app1

3.2.3.2.3. port

3.3. metadata management

3.3.1. def: all the nodes must be able to reliably store/retrieve status

3.3.2. store metadata in persistent znode

3.4. group membership

3.4.1. def: the master must learn who is available for tasks

4. Def

4.1. a high-available, high-performance coordination service

4.1.1. inspired by google chubby

4.1.2. developed in Yahoo using Java

4.1.3. distributed system that provides

4.1.3.1. strong consistency

4.1.3.2. ordering

4.1.3.3. durability guarantees

4.1.3.4. the ability to implement typical synchronization primitives

4.1.3.5. a simpler way of dealing with concurrency

4.1.4. a cluster

4.1.4.1. what does leader do

4.1.4.1.1. type of messages

4.1.4.1.2. heartbeat and accept request

4.1.4.2. what does follower do

4.1.4.2.1. receive proposal message from leader

4.1.4.2.2. check if the leader is the correct leader

4.1.4.2.3. check if the transaction is in correct order

4.1.4.2.4. send acknowledgement back to leader

4.1.4.2.5. receive commit message from leader

4.1.4.2.6. apply change to the data tree

4.1.4.2.7. forward write request from client

4.1.4.2.8. respond to client

4.2. a stripped-down file system that exposes a few simple operations and some extra abstractions such as ordering and notifications

4.2.1. no need to provide open, close, or seek operations because files are small, read or written entirely

4.2.2. does not have files and directories, but a unified concept of node, called

4.2.2.1. a container of data like a file and a container of other znodes

4.2.2.2. Untitled

5. architecture

5.1. Coordinate with shared storage

5.1.1. Zookeeper use this model to implement coordination

5.1.2. more straightforward

5.1.3. still relies on network to process data

5.2. Data tree

5.2.1. overview

5.2.1.1. organized in hierarchical structure

5.2.1.2. similar to file system

5.2.1.3. Untitled

5.2.2. basic unit for a data tree

5.2.2.1. Znode

5.2.2.1.1. data model

5.2.2.1.2. types

5.2.2.1.3. supported APIs

5.2.3. monitor znode

5.2.3.1. watch

5.2.3.1.1. use case

5.2.3.1.2. characteristics

5.2.4. API

5.2.4.1. Untitled

5.3. ACLs

5.3.1. defines who can perform certain operations on it

5.3.2. consist of

5.3.2.1. authentication scheme

5.3.2.2. an identity of that scheme

5.3.2.3. a set of permissions

5.3.3. Untitled

5.4. internal architecture

5.4.1. Client-server interaction

5.4.1.1. connection

5.4.1.1.1. TCP connection

5.4.1.1.2. client only connects to server with newer/equal state

5.4.1.1.3. configurable session timeout

5.4.1.2. read

5.4.1.2.1. read can be read from any of the servers

5.4.1.3. write

5.4.1.3.1. write requests will be forwarded to the leader of zookeeper cluster

5.4.2. zxid

5.4.2.1. a 64-bit integer id

5.4.2.1.1. first 32-bit, who is the leader

5.4.2.1.2. last 32-bit, incrementing update status

5.4.2.2. node status

5.4.2.2.1. if zxid is higher, the state is newer

5.4.2.2.2. generated when there is a new update of znode

5.4.3. leader election within zookeeper

5.4.3.1. Zookeeper cluster AKA ensemble

5.4.3.2. server has the following mode

5.4.3.2.1. leader

5.4.3.2.2. follower

5.4.3.2.3. observer

5.4.3.3. server has the following state

5.4.3.3.1. looking

5.4.3.3.2. leading

5.4.3.3.3. following

5.4.3.4. vote

5.4.3.4.1. vote for others in the beginning

5.4.3.4.2. vote for the node with latest zxid

5.4.3.4.3. Untitled

5.4.4. state replication within zookeeper

5.4.4.1. follows a consensus protocol called zab

5.4.4.2. Zab algorithm

5.4.4.2.1. Implementation

5.4.4.2.2. Zab vs Paxos

5.4.4.2.3. Consistency

6. Operation modes

6.1. standalone mode

6.2. replicated mode

6.2.1. core concepts

6.2.1.1. guarantee that every modification to the tree of nodes is replicated to a majority of the ensemble

6.2.1.2. mental model

6.2.1.2.1. clients connected to ZooKeeper servers that are following the leader

6.2.2. high available: runs on a collection of machines called an ensemble

6.2.2.1. as long as a majority of machines are up

6.2.2.1.1. can provide a service

6.2.2.2. quorum size

6.2.2.2.1. generally speaking, no need to have a ZooKeeper cluster larger than five nodes

6.2.2.2.2. usually choose 3 to 5 nodes

6.2.3. Sessions

6.2.3.1. kept alive by the client sending ping requests

6.2.3.2. timeout

6.2.3.2.1. low session timeouts leads to faster detection of machine failure

6.2.3.2.2. long session timeouts are applicable for more complex ephemeral state

6.2.3.2.3. general rule: