A high-performance, thread-safe Consistent Hashing library for Python 3.13+.
Consistent hashing is a strategy used to distribute keys across a dynamic set of nodes (servers). This implementation minimizes data migration when nodes are added or removed from the cluster by using a "Hash Ring" and virtual nodes.
- Thread-Safe: Uses `threading.Lock` to ensure integrity during concurrent add/remove operations.
- Efficient Lookups: Utilizes the `bisect` module for successor searches on the ring.
- Weighted Nodes: Supports heterogeneous clusters by assigning different replica counts (weights) per node.
- Modern Python: Built for Python 3.13+ using `uv` for lightning-fast dependency management.
Since this is a local package, you can install it into your project using uv:
```shell
uv add ./path/to/consistent-hasher
```
```python
from hashring import HashRing, EmptyRingError

# Initialize with 100 virtual nodes (replicas) per physical node
ring = HashRing(nodes=["server-1", "server-2", "server-3"], default_replicas=100)

# Add a high-capacity node with more weight
ring.add_node("server-4-large", weight=500)

try:
    # Map a key (e.g., a filename or user ID) to a node
    target_node = ring.get_node("user_profile_pic_123.jpg")
    print(f"File stored on: {target_node}")
except EmptyRingError:
    print("The ring is empty!")

# Remove a node (e.g., server maintenance)
ring.remove_node("server-1")
```

The library maps every physical node to multiple points on a circle (the "Ring") using MD5 hashes. When searching for the node for a specific key:
- The key is hashed to a position on the same circle.
- The library searches clockwise to find the first virtual node hash that is greater than or equal to the key's hash.
- If the key's hash is greater than the highest hash in the ring, it "wraps around" to the first node.
By default, the library creates 100 "virtual nodes" for every physical node. This ensures that if a node fails, its load is distributed evenly among the remaining neighbors rather than overwhelming just one.
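The mechanism described above can be sketched in a few lines of standalone Python. Note that `build_ring`, `lookup`, and the `node#i` naming scheme for virtual points are illustrative assumptions, not the library's actual internals:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # A position on the ring: the MD5 digest read as an integer
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def build_ring(nodes: list[str], replicas: int = 100):
    # Map each physical node to `replicas` virtual points on the circle
    ring = {_hash(f"{node}#{i}"): node for node in nodes for i in range(replicas)}
    return sorted(ring), ring

def lookup(key: str, sorted_hashes: list[int], ring: dict[int, str]) -> str:
    # Clockwise search: the first virtual-node hash >= the key's hash
    idx = bisect.bisect_left(sorted_hashes, _hash(key))
    if idx == len(sorted_hashes):
        idx = 0  # wrap around past the highest hash to the first point
    return ring[sorted_hashes[idx]]

hashes, ring = build_ring(["server-1", "server-2", "server-3"])
print(lookup("user_profile_pic_123.jpg", hashes, ring))
```

Keeping the virtual-node hashes in a sorted list is what makes `bisect` applicable: each lookup is a single O(log n) binary search rather than a scan of the ring.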
This project uses pytest for quality assurance.
```shell
uv run pytest
```
The library uses Python's standard logging module. To see the internal state changes, configure your logger:
```python
import logging

logging.basicConfig(level=logging.INFO)
```

With 100 virtual nodes per physical node, keys distribute evenly:
- Coefficient of Variation: ~15-18% (lower is better)
- Standard Deviation: ~350-400 keys for 10,000 total keys across 5 nodes
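You can reproduce a measurement of this kind with a self-contained simulation. The `assign` helper below is a hypothetical stand-in for the library, using the same MD5-ring scheme described earlier:

```python
import hashlib
import statistics
from bisect import bisect_left
from collections import Counter

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def assign(keys, nodes, replicas=100):
    # Build the ring, then count how many keys land on each node
    ring = {_hash(f"{n}#{i}"): n for n in nodes for i in range(replicas)}
    points = sorted(ring)
    counts = Counter()
    for key in keys:
        idx = bisect_left(points, _hash(key))
        counts[ring[points[idx % len(points)]]] += 1  # % handles wrap-around
    return counts

counts = assign((f"key-{i}" for i in range(10_000)),
                [f"server-{i}" for i in range(1, 6)])
mean = statistics.mean(counts.values())
stdev = statistics.stdev(counts.values())
print(f"stdev: {stdev:.0f} keys, CV: {stdev / mean:.1%}")
```

Exact figures depend on the key set and virtual-node placement, but with 100 replicas per node the spread should be in the ballpark of the numbers above.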
Consistent hashing minimizes redistribution when nodes change:
Adding a node (5 → 6 nodes):
- Keys moved: ~20% (close to theoretical minimum of 16.67%)
- Only keys from neighboring virtual nodes redistribute
- Compare to naive modulo hashing (`hash(key) % N`): 80-100% of keys would move!
Removing a node (6 → 5 nodes):
- Keys moved: ~13-20% (depending on which node is removed)
- Removed node's keys redistribute to neighbors
- Again, much better than naive hashing
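The migration figures can be checked with the same kind of simulation (again using an illustrative MD5 ring, not the library's API). A useful property falls out of the construction: when a node is added, every key that moves must move *to* the new node, since a key only changes owner if one of the new node's virtual points lands between the key and its previous owner:

```python
import hashlib
from bisect import bisect_left

def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes: list[str], replicas: int = 100):
    ring = {_hash(f"{n}#{i}"): n for n in nodes for i in range(replicas)}
    return sorted(ring), ring

def owner(key: str, points: list[int], ring: dict[int, str]) -> str:
    idx = bisect_left(points, _hash(key))
    return ring[points[idx % len(points)]]  # % handles wrap-around

keys = [f"key-{i}" for i in range(10_000)]
nodes = [f"server-{i}" for i in range(1, 6)]

# Ownership with 5 nodes, then again after adding a 6th
points5, ring5 = build_ring(nodes)
before = {k: owner(k, points5, ring5) for k in keys}
points6, ring6 = build_ring(nodes + ["server-6"])
moved = [k for k in keys if owner(k, points6, ring6) != before[k]]

print(f"Moved: {len(moved) / len(keys):.1%}")
# Every key that moved landed on the new node
assert all(owner(k, points6, ring6) == "server-6" for k in moved)
```

The measured fraction hovers around the 1/6 theoretical minimum, with some variance from where the new node's virtual points happen to fall.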
Run the included visualization script to see distribution:
```shell
uv run python visual.py --nodes 5 --replicas 100 --keys 10000
```

This generates charts showing:
- Initial key distribution across nodes
- Distribution after adding a node
- Distribution after removing a node
- CDNs - Distribute cached content across servers, with minimal cache invalidation when servers are added or removed
- Distributed Databases - Partition data across shards (Cassandra, DynamoDB use this)
- Load Balancers - Route requests to backend servers with session affinity
- Distributed Caching - Memcached, Redis Cluster use consistent hashing
