DB::Pool for a cluster implementation

I’m working on adding cluster support to my Redis client, and I’m trying to figure out how to structure it. I’m building on DB::Pool, which I already use to pool connections to a standalone Redis server.

For a bit of background, Redis clusters come in a few different flavors:

  1. 1 writable, 1 read-only replica
  2. 1 writable, 2+ read replicas
  3. 2+ writable shards, 1+ read replica per shard

**Distinction between 1 and 2:** 1 and 2 seem like the same thing, but they tend to be different in practice. For example, with a single replica, that replica is frequently long-lived and replacing it is a notable event. But when you have _many_ replicas, you likely treat them as ephemeral so they can be replaced at any time (especially with things like fault injection/chaos engineering or routine host replacement by a cloud Kubernetes provider), and that replacement is therefore uneventful. It’s often a pets-vs-cattle distinction, and applications tend to codify that distinction.

I’d like this implementation to respond to changes in server topology while the application is running, so that if a server is replaced the client won’t try to read from it. The hard part is that DB::Pool doesn’t provide a way to iterate over connections sitting idle in the pool awaiting checkout and delete them in response to those changes, so my options seem to come down to these two:

  1. Use a connection pool for each server and throw away that pool when the server is removed from the cluster
    • This could lead to an explosion of socket usage across all clients
  2. Use a single connection pool for all servers of each type (one for read/write, one for read-only), and when checking a connection out of the pool, close and release it if the cluster no longer recognizes its server as active (there’s a sketch of this after the list)
    • This may require holding onto connections long after the topology has changed
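
For what it’s worth, here’s a minimal sketch of what option 2 could look like, assuming DB::Pool’s factory-block constructor and checkout/release API from crystal-db (the exact constructor options vary between crystal-db versions). The `RedisConnection` type, `active_hosts` set, and `checkout_active` helper are hypothetical names for illustration:

```crystal
require "db"
require "set"

# Hypothetical pooled resource. crystal-db's pool expects resources to
# respond to close (and, in some versions, checkout/release hooks).
class RedisConnection
  getter host : String

  def initialize(@host : String)
    # a real implementation would open a socket to @host here
  end

  def close
  end

  def before_checkout
  end

  def after_release
  end
end

# Hosts the cluster currently reports as live read-only replicas;
# a topology watcher would mutate this set as servers come and go.
active_hosts = Set{"10.0.0.1:6379", "10.0.0.2:6379"}

# One pool shared by all replicas (option 2): the factory connects to
# whichever replica the cluster reports right now.
pool = DB::Pool(RedisConnection).new { RedisConnection.new(active_hosts.sample) }

# On checkout, discard connections to hosts that have left the cluster
# instead of handing them to the caller.
def checkout_active(pool, active_hosts)
  loop do
    conn = pool.checkout
    return conn if active_hosts.includes?(conn.host)
    conn.close # stale topology: close it and let the factory replace it
  end
end

conn = checkout_active(pool, active_hosts)
# ... run commands ...
pool.release(conn)
```

One gap this sketch leaves open is the pool’s bookkeeping: the closed connection is never released, so depending on how the pool counts checked-out resources it may still count against the pool’s capacity, which is essentially the downside noted under option 2.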

Has anyone implemented something like this before that could help a friend out? Do these downsides matter all that much?

The way I did this in my Neo4j shard was option 2, throwing away both read and write connections on any topology change. Neo4j queries tend to be more resilient to that, though: every query happens inside a transaction that can be automatically retried on failure (similar to how some service meshes automatically retry HTTP requests), whereas Redis queries are meant to be as lightweight as possible.
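
To make that concrete, the Neo4j-style resilience is roughly an automatic retry loop around a transactional block. This is a sketch of the general pattern, not the Neo4j shard’s actual API; `RetryableError` and the backoff policy are made up for illustration:

```crystal
# Rerun the whole transactional block when a transient failure
# surfaces, up to a fixed number of attempts.
class RetryableError < Exception
end

def with_retries(attempts = 3)
  attempts.times do |i|
    begin
      return yield
    rescue ex : RetryableError
      raise ex if i == attempts - 1
      sleep (100 * (i + 1)).milliseconds # linear backoff between attempts
    end
  end
end

with_retries do
  # run the query inside a transaction; safe to rerun from the top
end
```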

Turns out, clustering Redis is even more complicated than it is for most other databases. Redis handles the 3 cluster scenarios I mentioned in my previous post differently all the way down to the wire protocol.

For example, if you’re operating on a key that hashes to shard 2, my client library has to send it to shard 2; there is no server-side routing like many other databases have. So at the very least, for cluster scenario 3, every single writable shard needs its own connection pool, and each shard’s set of replicas needs its own pool as well.
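
Concretely, Redis Cluster routes keys by checksum: a key’s hash slot is CRC16(key) mod 16384 (using the CRC-16/XMODEM variant), a non-empty `{...}` hash tag restricts hashing to the tag’s contents, and each shard owns a range of slots. A sketch of that computation:

```crystal
# CRC-16/XMODEM (polynomial 0x1021, zero initial value), the checksum
# Redis Cluster specifies for key hashing.
def crc16(data : Bytes) : UInt16
  crc = 0_u16
  data.each do |byte|
    crc ^= byte.to_u16 << 8
    8.times do
      crc = (crc & 0x8000_u16) != 0 ? ((crc << 1) ^ 0x1021_u16) : (crc << 1)
    end
  end
  crc
end

# A key's slot is CRC16 mod 16384. A non-empty hash tag like
# "{user:42}" means only the tag's contents are hashed, so related
# keys land on the same shard.
def hash_slot(key : String) : UInt16
  if (open = key.index('{')) && (close = key.index('}', open + 1)) && close > open + 1
    key = key[open + 1...close]
  end
  crc16(key.to_slice) % 16_384_u16
end

hash_slot("foo")         # => some slot in 0...16384
hash_slot("{user:42}:a") # same slot as any other "{user:42}:*" key
```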

So if I’ve got 3 writable shards, each with 2 replicas, for a grand total of 9 Redis nodes, I need 6 connection pools at an absolute minimum. I suppose that answers that question. :joy:
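
Concretely, here’s roughly the shape that implies, reusing the hypothetical `RedisConnection` and `hash_slot` definitions from the sketches above; the `Shard` record and `pool_for` helper are names I made up for illustration, not the real client:

```crystal
# One entry per shard: the slot range it owns, a pool for its writable
# node, and a pool shared by its replicas. 3 shards x 2 pools each = 6.
record Shard,
  slots : Range(UInt16, UInt16),
  writer : DB::Pool(RedisConnection),
  replicas : DB::Pool(RedisConnection)

# Routing: hash the key to a slot, find the shard that owns it, then
# pick the writer pool or the replica pool depending on the command.
def pool_for(shards : Array(Shard), key : String, readonly : Bool)
  slot = hash_slot(key)
  shard = shards.find { |s| s.slots.includes?(slot) } ||
          raise "no shard owns slot #{slot}"
  readonly ? shard.replicas : shard.writer
end
```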