I’m working on adding cluster support to my Redis client and I’m trying to figure out how to structure it. The connection pool I’m using is
DB::Pool, which I’m already using for connection pooling for a standalone Redis server.
For a bit of background, Redis clusters come in a few different flavors:
- 1 writable, 1 read-only replica
- 1 writable, 2+ read replicas
- 2+ writable shards, 1+ read replica per shard
Flavors 1 and 2 seem like the same thing, but they tend to be different in practice. For example, with a single replica, that replica is frequently long-lived and replacing it is a notable event. But when you have _many_ replicas, you likely treat them as ephemeral so they can be replaced at any time (especially with things like fault injection/chaos engineering or routine host replacement by a managed Kubernetes provider), and that replacement is therefore uneventful. It's the pets-vs-cattle distinction, and applications tend to codify it.
I’d like this implementation to respond to changes in server topology while the application is running, so that if a server is replaced the client won’t try to read from it. The hard part is that
DB::Pool doesn’t provide a way to iterate over connections sitting idle in the pool awaiting checkout and delete them in response to those changes, so my options seem to come down to these two:
- Use a connection pool for each server and throw away that pool when the server is removed from the cluster
- This could lead to an explosion of socket usage across all clients
- Use a single connection pool for all servers of each type (one for read/write, one for read-only) and, when checking a connection out of the pool, close and discard it if the cluster no longer recognizes its server as active
- This may require holding onto connections long after the topology has changed.
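To make the second option concrete, here's a minimal language-agnostic sketch (in Python, since I don't have DB::Pool's internals in front of me — `TopologyAwarePool`, `FakeConn`, and `active_hosts` are all hypothetical names, not DB::Pool's API). The idea is that checkout validates each idle connection against the current set of active servers and closes stale ones instead of handing them out:

```python
from collections import deque

class FakeConn:
    """Stand-in for a real client connection (hypothetical)."""
    def __init__(self, host):
        self.host = host
        self.closed = False

    def close(self):
        self.closed = True

class TopologyAwarePool:
    """Option 2 sketch: one pool, validated at checkout time."""
    def __init__(self, conns, active_hosts):
        self.idle = deque(conns)
        # Mutable set, updated whenever the cluster topology changes.
        self.active_hosts = set(active_hosts)

    def checkout(self):
        while self.idle:
            conn = self.idle.popleft()
            if conn.host in self.active_hosts:
                return conn
            # Server left the cluster: close the stale connection
            # rather than returning it to the caller.
            conn.close()
        # A real pool would dial a new connection here instead.
        raise RuntimeError("pool exhausted")

    def release(self, conn):
        self.idle.append(conn)
```

The downside noted above shows up here: a stale connection only gets closed when it happens to be pulled during checkout, so it can linger in the idle queue long after the topology changed.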
Has anyone implemented something like this before that could help a friend out? Do these downsides matter all that much?
The way I handled this in my Neo4j shard was option 2, throwing away both read and write pools on any topology change, but Neo4j queries tend to be more resilient to that — every query happens inside a transaction, which can be automatically retried on failure (similar to how some service meshes automatically retry HTTP requests), whereas Redis queries are meant to be as lightweight as possible.