
Consistent Hash Algorithm and Highly Available Cluster Proxy

Assume N is the number of backend service nodes. When the frontend issues a request carrying a key, we usually hash the key and take the result modulo N (hash(key) % N) to distribute the request across the nodes.

When the frontend does not care which stateless backend node serves the request, and the request keys are reasonably random, this scheme achieves good load balancing even as nodes are dynamically added or removed.

But for distributed caches, distributed databases, and similar scenarios, this approach is not appropriate: adding or removing a backend node remaps almost all keys. In a distributed cache this produces a flood of cache misses; in a distributed database, requests are routed to nodes that do not hold the data. The impact is catastrophic.

The goal of consistent hashing is that, given K request keys, adding or removing a backend node remaps only about K/N of the keys. In other words, while the set of backend nodes is stable, the same key is always mapped to the same node; when nodes are added or removed, the algorithm keeps as many of the K keys as possible mapped to their previous nodes.
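To make the contrast concrete, here is a small standalone sketch (not part of the implementation in this article) that hashes 10,000 synthetic keys with CRC32, maps them by hash(key) % N, and counts how many keys change nodes when N grows from 3 to 4. Typically around three quarters of them move, far more than the K/N that consistent hashing aims for.

package main

import (
    "fmt"
    "hash/crc32"
)

func main() {
    const K = 10000
    moved := 0
    for i := 0; i < K; i++ {
        key := fmt.Sprintf("key-%d", i)
        h := crc32.ChecksumIEEE([]byte(key))
        // Node index before and after one backend node is added.
        before := h % 3
        after := h % 4
        if before != after {
            moved++
        }
    }
    // With modulo mapping roughly three quarters of the keys are remapped,
    // far more than the K/N ≈ 1/4 that consistent hashing targets.
    fmt.Printf("remapped %d of %d keys (%.1f%%)\n", moved, K, 100*float64(moved)/K)
}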

1)Consistent Hash Algorithm Principle

A consistent hash algorithm maps all nodes onto the same circle. Each node's key is hashed to a set of integers; sorting these integers and joining the two ends produces a ring. As shown in the figure below, the node keys are spread over different arcs of the circle. A request key is hashed onto the circle in the same way (the triangle in the figure), and the nearest node in the clockwise direction becomes its service node (Node 2 in the figure). Each node therefore covers the arc from its preceding node to itself.

If a node fails, the requests that previously fell into its arc move clockwise to the adjacent node (in the figure, when Node2 fails, requests on the arc from Node3 to Node2 are taken over by Node1). Requests that do not fall into the failed node's arc are unaffected (requests on the arc from Node2 to Node3 are untouched when Node2 fails). Adding a node is similar: the new node takes over a new interval, so requests that fall on the arc from the preceding node to the new node are now served by the new node.
[Figure: node keys and a request key (triangle) on the hash ring; the request is served by the nearest node clockwise]
For a fixed set of nodes, the uniformity and dispersion of their distribution on the circle can be improved by giving each node several replicas (virtual nodes). The figure below uses replicas = 2: the arcs carried by each node become finer and are spread more evenly over the circle. Properly tuning the replica parameter therefore improves the balance of the algorithm.
[Figure: the hash ring with replicas set to 2; each node's arcs are finer and more evenly distributed]
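As a quick illustration of the lookup just described, the sketch below builds a tiny ring with two replicas per node and finds a request key's service node with a clockwise search. It uses plain CRC32 for brevity (the implementation in the next section additionally applies MD5 first) and is only meant to mirror the idea; the full code with locking and node management follows in section 2.

package main

import (
    "fmt"
    "hash/crc32"
    "sort"
)

func main() {
    // Two replicas per node: node "0" occupies the ring positions
    // hash("0_0") and hash("0_1"), and so on for nodes "1" and "2".
    ring := map[int]string{}
    for _, id := range []string{"0", "1", "2"} {
        for i := 0; i < 2; i++ {
            pos := int(crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s_%d", id, i))))
            ring[pos] = id
        }
    }
    positions := make([]int, 0, len(ring))
    for p := range ring {
        positions = append(positions, p)
    }
    sort.Ints(positions)

    // A request key is served by the first ring position clockwise of (>=) its
    // hash, wrapping around to the smallest position.
    key := int(crc32.ChecksumIEEE([]byte("10.10.10.10")))
    idx := sort.SearchInts(positions, key)
    if idx == len(positions) {
        idx = 0
    }
    fmt.Println("request served by node", ring[positions[idx]])
}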

2)Golang Consistent Hash Algorithm Implementation Code

The hash function used in this article first computes the MD5 digest of the key, then takes the CRC32 checksum of the digest to obtain a non-negative integer.

// Package consistent_hashing implements a simple consistent hash ring
// with configurable virtual-node replicas.
package consistent_hashing

import (
    "crypto/md5"
    "hash/crc32"
    "sort"
    "strconv"
    "sync"
)

// Node is a backend service node identified by Id and reachable at Address.
type Node struct {
    Id      string
    Address string
}

// ConsistentHashing maps hashed keys on a ring to nodes; each node is placed
// on the ring `replicas` times to improve the uniformity of the distribution.
type ConsistentHashing struct {
    mutex    sync.RWMutex
    nodes    map[int]Node // ring position -> node
    replicas int
}
// NewConsistentHashing builds a ring containing the given nodes, each with
// the configured number of replicas (virtual nodes).
func NewConsistentHashing(nodes []Node, replicas int) *ConsistentHashing {
    ch := &ConsistentHashing{nodes: make(map[int]Node), replicas: replicas}
    for _, node := range nodes {
        ch.AddNode(node)
    }
    return ch
}

// AddNode places the node's virtual points on the ring.
func (ch *ConsistentHashing) AddNode(node Node) {
    ch.mutex.Lock()
    defer ch.mutex.Unlock()
    for i := 0; i < ch.replicas; i++ {
        k := hash(node.Id + "_" + strconv.Itoa(i))
        ch.nodes[k] = node
    }
}

// RemoveNode deletes all of the node's virtual points from the ring.
func (ch *ConsistentHashing) RemoveNode(node Node) {
    ch.mutex.Lock()
    defer ch.mutex.Unlock()
    for i := 0; i < ch.replicas; i++ {
        k := hash(node.Id + "_" + strconv.Itoa(i))
        delete(ch.nodes, k)
    }
}
// GetNode hashes the external key and returns the node whose ring position
// is nearest to it in the clockwise direction.
func (ch *ConsistentHashing) GetNode(outerKey string) Node {
    key := hash(outerKey)
    ch.mutex.RLock()
    defer ch.mutex.RUnlock()
    nodeKey := ch.findNearestNodeKeyClockwise(key)
    return ch.nodes[nodeKey]
}

// findNearestNodeKeyClockwise returns the first ring position >= key,
// wrapping around to the smallest position. The caller must hold the lock.
func (ch *ConsistentHashing) findNearestNodeKeyClockwise(key int) int {
    sortedKeys := sortKeys(ch.nodes)
    for _, k := range sortedKeys {
        if key <= k {
            return k
        }
    }
    return sortedKeys[0]
}
// sortKeys returns the ring positions in ascending order.
func sortKeys(m map[int]Node) []int {
    sortedKeys := make([]int, 0, len(m))
    for k := range m {
        sortedKeys = append(sortedKeys, k)
    }
    sort.Ints(sortedKeys)
    return sortedKeys
}

// hash maps a key to a non-negative ring position: MD5 the key, then take
// the CRC32 checksum of the digest.
func hash(key string) int {
    md5Sum := md5.Sum([]byte(key))
    return int(crc32.ChecksumIEEE(md5Sum[:]))
}
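A minimal usage sketch of the package above (the node addresses are placeholders): build a ring, look a key up, remove a node, and look the same key up again. Keys that did not fall on the removed node's arcs keep their previous mapping.

package main

import (
    "fmt"

    "consistent_hashing"
)

func main() {
    nodes := []consistent_hashing.Node{
        {Id: "0", Address: "http://10.10.1.10/"},
        {Id: "1", Address: "http://10.10.1.11/"},
        {Id: "2", Address: "http://10.10.1.12/"},
    }
    ch := consistent_hashing.NewConsistentHashing(nodes, 100)

    // While the ring is stable, the same key always maps to the same node.
    fmt.Println("10.10.10.10 ->", ch.GetNode("10.10.10.10").Id)

    // Removing a node only remaps the keys that fell on its arcs;
    // other keys keep their previous node.
    ch.RemoveNode(nodes[1])
    fmt.Println("10.10.10.10 ->", ch.GetNode("10.10.10.10").Id)
}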

3)Uniformity Analysis

To simulate the distribution of node keys on the circle, three service nodes are built with ids 0, 1 and 2 as the initial keys and replicas set to 100. Each hash value is mapped proportionally to a position on the circle, and the colored dots mark the corresponding nodes (red, green and blue correspond to ids 0, 1 and 2).

The triangular points represent three external request keys (10.10.10.10, 10.10.20.11, 10.10.30.12) and the service nodes the algorithm selects for them after hashing.

The plot below is drawn with the Python matplotlib library.

As the figure shows, although there are only three nodes, enlarging the replica count gives the node keys a reasonably uniform and dispersed distribution, and the external request keys land fairly evenly across the service nodes.

[Figure: consistent-hashing-circle — node replica keys (colored dots) and request keys (triangles) plotted on the ring]
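The same balance check can be done without plotting. The sketch below (an illustrative addition using the package from section 2 and synthetically generated keys) counts how many of 30,000 request keys land on each of the three nodes; with replicas set to 100 the counts should come out roughly equal.

package main

import (
    "fmt"

    "consistent_hashing"
)

func main() {
    nodes := []consistent_hashing.Node{
        {Id: "0", Address: "http://10.10.1.10/"},
        {Id: "1", Address: "http://10.10.1.11/"},
        {Id: "2", Address: "http://10.10.1.12/"},
    }
    ch := consistent_hashing.NewConsistentHashing(nodes, 100)

    // Count how many synthetic request keys land on each node.
    counts := map[string]int{}
    for i := 0; i < 30000; i++ {
        key := fmt.Sprintf("10.10.%d.%d", i%256, i/256)
        counts[ch.GetNode(key).Id]++
    }
    // With replicas=100 each node should serve roughly a third of the keys.
    fmt.Println(counts)
}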

4)Golang High Availability Cluster Proxy Code

Consistent hashing has a wide range of use cases, such as request splitting and load balancing, distributed caching, and distributed storage. The following code combines Go's reverse proxy library (net/http/httputil) with the consistent hash algorithm above to dispatch requests by a request-header tag; with just a few lines of code you get a small, highly available proxy server.

package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"

    "consistent_hashing"
)

func main() {
    // Backend nodes; each request is routed to one of them by consistent hashing.
    nodes := []consistent_hashing.Node{
        {Id: "0", Address: "http://10.10.1.10/"},
        {Id: "1", Address: "http://10.10.1.11/"},
        {Id: "2", Address: "http://10.10.1.12/"},
    }
    ch := consistent_hashing.NewConsistentHashing(nodes, 100)
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Route by the "sign" request header.
        sign := r.Header.Get("sign")
        node := ch.GetNode(sign)
        uri, err := url.Parse(node.Address)
        if err != nil {
            http.Error(w, "bad backend address", http.StatusInternalServerError)
            return
        }
        // Forward the request to the selected backend.
        httputil.NewSingleHostReverseProxy(uri).ServeHTTP(w, r)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}
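A small client sketch against the proxy (the header value is arbitrary): requests carrying the same sign header are always forwarded to the same backend node.

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    req, err := http.NewRequest("GET", "http://127.0.0.1:8080/", nil)
    if err != nil {
        panic(err)
    }
    // The proxy routes on this header; the same value always reaches the same backend.
    req.Header.Set("sign", "user-42")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, len(body), "bytes")
}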

