Consistent-Hasher


A high-performance, thread-safe Consistent Hashing library for Python 3.13+.

Consistent hashing is a strategy for distributing keys across a dynamic set of nodes (servers). This implementation uses a "hash ring" with virtual nodes to minimize data migration when nodes are added to or removed from the cluster.

Features

  • Thread-Safe: Uses threading.Lock to ensure integrity during concurrent add/remove operations.
  • Efficient Lookups: Utilizes the bisect module for successor searches on the ring.
  • Weighted Nodes: Support for heterogeneous clusters by assigning different replica counts (weights) per node.
  • Modern Python: Built for Python 3.13+ using uv for lightning-fast dependency management.

Installation

Since this is a local package, you can install it into your project using uv:

uv add ./path/to/consistent-hasher

Quick Start

from hashring import HashRing, EmptyRingError

# Initialize with 100 virtual nodes (replicas) per physical node
ring = HashRing(nodes=["server-1", "server-2", "server-3"], default_replicas=100)

# Add a high-capacity node with more weight
ring.add_node("server-4-large", weight=500)

try:
    # Map a key (e.g., a filename or user ID) to a node
    target_node = ring.get_node("user_profile_pic_123.jpg")
    print(f"File stored on: {target_node}")
except EmptyRingError:
    print("The ring is empty!")

# Remove a node (e.g., server maintenance)
ring.remove_node("server-1")

How It Works

The Hash Ring

The library maps every physical node to multiple points on a circle (the "ring") using MD5 hashes. To find the node responsible for a specific key (sketched in code after these steps):

  1. The key is hashed to a position on the same circle.
  2. The library searches clockwise to find the first virtual node hash that is greater than or equal to the key's hash.
  3. If the key's hash is greater than the highest hash in the ring, it "wraps around" to the first node.
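
A minimal sketch of this clockwise lookup, assuming a sorted list of virtual-node hashes and a hash-to-node map; it illustrates the technique, not the library's internal code:

import hashlib
from bisect import bisect_left

def _hash(value: str) -> int:
    # Position on the ring: the MD5 digest interpreted as an integer
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def lookup(key: str, sorted_hashes: list[int], hash_to_node: dict[int, str]) -> str:
    # Steps 1-2: hash the key, then find the first virtual-node hash >= it
    idx = bisect_left(sorted_hashes, _hash(key))
    # Step 3: past the highest hash, wrap around to the start of the ring
    if idx == len(sorted_hashes):
        idx = 0
    return hash_to_node[sorted_hashes[idx]]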

Weighted Replicas

By default, the library creates 100 "virtual nodes" for every physical node, so if a node fails, its load is spread evenly across the remaining nodes rather than overwhelming a single neighbor. Assigning a node a higher weight places proportionally more virtual nodes on the ring, so it receives a proportionally larger share of the keys.
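
A minimal sketch of how weighted virtual nodes can be placed on the ring; the "node#i" label scheme is an assumption for illustration, not necessarily this library's internal format:

import hashlib
from bisect import insort

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def add_node(node: str, weight: int, sorted_hashes: list[int], hash_to_node: dict[int, str]) -> None:
    # One ring point per unit of weight, so heavier nodes attract more keys
    for i in range(weight):
        point = _hash(f"{node}#{i}")  # hypothetical "node#i" virtual-node label
        hash_to_node[point] = node
        insort(sorted_hashes, point)  # keep the ring sorted for bisect lookups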

Development

Running Tests

This project uses pytest for quality assurance.

uv run pytest

Logging

The library uses Python's standard logging module. To see the internal state changes, configure your logger:

import logging
logging.basicConfig(level=logging.INFO)

Performance & Distribution

Key Distribution Balance

With 100 virtual nodes per physical node, keys distribute evenly (the sketch below reproduces this measurement):

  • Coefficient of Variation: ~15-18% (lower is better)
  • Standard Deviation: ~350-400 keys for 10,000 total keys across 5 nodes
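
A quick way to reproduce these numbers, using only the HashRing API shown in the Quick Start; exact figures vary with the key set:

import statistics
from hashring import HashRing

ring = HashRing(nodes=[f"server-{i}" for i in range(1, 6)], default_replicas=100)

# Count how many of 10,000 synthetic keys land on each of the 5 nodes
counts: dict[str, int] = {}
for i in range(10_000):
    node = ring.get_node(f"key-{i}")
    counts[node] = counts.get(node, 0) + 1

mean = statistics.mean(counts.values())
stdev = statistics.pstdev(counts.values())
print(f"std dev ~ {stdev:.0f} keys, CV ~ {stdev / mean:.1%}")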

Minimal Key Movement

Consistent hashing minimizes redistribution when nodes change; the sketch after the lists below measures this directly.

Adding a node (5 → 6 nodes):

  • Keys moved: ~20% (close to theoretical minimum of 16.67%)
  • Only keys from neighboring virtual nodes redistribute
  • Compare to naive modulo hashing (hash(key) % N): 80-100% of keys would move!

Removing a node (6 → 5 nodes):

  • Keys moved: ~13-20% (depending on which node is removed)
  • Removed node's keys redistribute to neighbors
  • Again, much better than naive hashing
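
A sketch measuring key movement when a node is added, again assuming only the Quick Start API; going from 5 to 6 nodes, roughly one key in six should change owners:

from hashring import HashRing

ring = HashRing(nodes=[f"server-{i}" for i in range(1, 6)], default_replicas=100)

keys = [f"key-{i}" for i in range(10_000)]
before = {k: ring.get_node(k) for k in keys}  # owner of each key with 5 nodes

ring.add_node("server-6")  # grow the cluster from 5 to 6 nodes
moved = sum(1 for k in keys if ring.get_node(k) != before[k])
print(f"keys moved: {moved / len(keys):.1%}")  # theoretical minimum: 1/6 ~ 16.7%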

Visualization

Run the included visualization script to see the distribution:

uv run python visual.py --nodes 5 --replicas 100 --keys 10000

This generates charts showing:

  • Initial key distribution across nodes
  • Distribution after adding a node
  • Distribution after removing a node


Use Cases

  • CDNs - Distribute cached content across edge servers, with minimal cache invalidation when servers are added or removed
  • Distributed Databases - Partition data across shards (Cassandra and DynamoDB use this approach)
  • Load Balancers - Route requests to backend servers with session affinity
  • Distributed Caching - Memcached client libraries commonly use consistent hashing (Redis Cluster uses a related hash-slot scheme)
