Backpressure and queueing

When every candidate node for a (database, user, role) is at its max_connections, halephant queues the checkout instead of returning an immediate error. The client waits until capacity frees up or checkout_timeout expires.

Checkout lifecycle

Pop an idle connection from a candidate node's pool if available.
Open a new connection if the node is below max_connections.
Enqueue on the shared (database, user, role) wait queue if every candidate is full.

Candidates are tried in least-connections order. Replica queues are shared across every replica node for the same (database, user) — a waiter that enqueued while replica A was full is served by replica B if B frees up first.

Each checkout has a single timeout budget (checkout_timeout). Spurious wakeups do not reset the clock. On shutdown, all waiters are woken immediately so drain time stays bounded.

Configuration

halephant.toml

toml

[server]
checkout_timeout = "30s"    # server-wide default

[cluster.main.pool.api]
checkout_timeout = "2s"     # per-pool override: fail fast for OLTP

[cluster.warehouse.pool.analytics]
checkout_timeout = "5m"     # per-pool override: patient analytics clients

Diagnosing contention

Queue state: GET /admin/queues returns non-empty queues with depth and oldest wait duration.

Waiting clients: GET /admin/clients — look for "state": "waiting" and the waiting_for field.

Metrics:

Metric	What it means
`halephant.client.connections{state="waiting"}`	Clients currently blocked. Sustained `> 0` means regular queueing.
`halephant.client.queue_depth`	Per-queue depth by `(database, user, role)`.
`halephant.client.wait_duration`	Time clients spent blocked (seconds). Only recorded for checkouts that waited.
`db.client.connection.errors{error.type="checkout_timeout"}`	Checkout failures from exhausted budget.

Backpressure and queueing ​

Checkout lifecycle ​

Configuration ​

Diagnosing contention ​

Backpressure and queueing

Checkout lifecycle

Configuration

Diagnosing contention