
Commit 1f4cf9f

Berik Ashimov authored and committed
docs(bulkhead): align design doc with shipped HSET+HLEN model + race-note in guide
1 parent 02be119 commit 1f4cf9f

2 files changed

Lines changed: 8 additions & 6 deletions


docs/guide/bulkhead.md

Lines changed: 1 addition & 0 deletions
@@ -103,5 +103,6 @@ unless at least one `Bulkhead(metrics=True)` is constructed.
 - Fairness is not guaranteed — waiters are not served strictly FIFO.
 - Nested same-name acquires in the same task work, but can deadlock if
   `limit` is too small; avoid them.
+- Under heavy contention the Redis backend may briefly reject acquires even when capacity exists (several clients race on `HSET` + `HLEN`, observe the counter over the limit, and all roll back). Set `max_wait > 0` so waiters retry through the burst.
 - The Redis backend does not provide Redlock-strength guarantees — if that
   matters, wrap a strict-mode lock around the call yourself.
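
Illustration (not part of the commit): the added guide note recommends `max_wait > 0` so that a rejected acquire is retried. A minimal sketch of such a polling loop follows. The names `acquire_with_retry`, `try_acquire`, and `make_flaky` are hypothetical, and the loop is an assumption about the general shape of the behaviour, not HawkAPI's actual implementation.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Tuple


async def acquire_with_retry(
    try_acquire: Callable[[], Awaitable[bool]],  # hypothetical single-attempt acquire
    max_wait: float,
    poll_interval: float = 0.01,
) -> bool:
    """Poll try_acquire until it succeeds or max_wait seconds have elapsed."""
    deadline = time.monotonic() + max_wait
    while True:
        if await try_acquire():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # A real backend would raise its "bulkhead full" error here.
            return False
        # Never sleep past the deadline.
        await asyncio.sleep(min(poll_interval, remaining))


def make_flaky(failures: int) -> Tuple[Callable[[], Awaitable[bool]], Dict[str, int]]:
    """Build a fake acquire whose first `failures` calls are rejected,
    imitating a short burst of racing clients that all roll back."""
    attempts = {"count": 0}

    async def try_acquire() -> bool:
        attempts["count"] += 1
        return attempts["count"] > failures

    return try_acquire, attempts
```

With `max_wait=0` the loop makes exactly one attempt, so a transient rejection surfaces to the caller immediately; any positive `max_wait` gives the burst time to clear.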

docs/plans/2026-04-18-tier2-bulkhead-design.md

Lines changed: 7 additions & 6 deletions
@@ -52,15 +52,16 @@ In context-manager form there is no HTTP mapping — the caller catches `Bulkhea
 
 ### Redis lease-TTL model
 
-Each `acquire` on the Redis backend:
+Each `acquire` on the Redis backend uses a hash per bulkhead name. Every occupied slot is one field in the hash; the field name is a random lease ID, and the field value is the acquisition timestamp (seconds, float).
 
-1. Atomic `INCR hawkapi:bulkhead:{name}` + `SET hawkapi:bulkhead:{name}:lease:{uuid}` with TTL (default 30 s, configurable).
-2. If post-INCR value > `limit`: `DECR` + `DEL` the lease + `BulkheadFullError` (or retry loop if `max_wait > 0`, backing off by `min(max_wait, 10 ms)`).
-3. On `release`: `DECR hawkapi:bulkhead:{name}` + `DEL` the lease key.
+1. `HSET hawkapi:bulkhead:{name} {lease_id} {timestamp}` — insert a lease field.
+2. `HLEN hawkapi:bulkhead:{name}` — count occupied slots (done in the same pipeline as step 1).
+3. If `HLEN` > `limit`: `HDEL` the just-set field (roll back) + raise `BulkheadFullError` (or retry with `asyncio.sleep(poll_interval)` if `max_wait > 0`).
+4. On `release`: `HDEL hawkapi:bulkhead:{name} {lease_id}`.
 
-If a worker crashes mid-hold, the lease key expires by TTL. A reaper (`Bulkhead.reap_expired_leases(name)` method, plus a future `hawkapi bulkhead reap` CLI subcommand) reconciles the counter with still-living lease keys.
+If a worker crashes mid-hold, the lease field remains in the hash. A reaper (`RedisBulkheadBackend.reap_expired_leases(name)`) scans `HGETALL`, parses each value as a float timestamp, and deletes fields older than `lease_ttl`.
 
-This is the standard "sloppy distributed semaphore" pattern. Correct enough for capacity control; bounded over-capacity window is `≤ TTL` during a crash. Redlock-level correctness is explicitly not a goal.
+Under heavy contention several clients may simultaneously `HSET` + observe `HLEN > limit` + roll back — producing false-negative rejections (acquire fails even though a slot is momentarily free). This is deliberate "sloppy semaphore" behaviour; `max_wait > 0` amortises it via polling.
 
 ### Error class
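
Illustration (not part of the commit): the HSET + HLEN model described in the new design text can be sketched against an in-memory stand-in for a Redis hash. `FakeHashStore` and the free functions below are hypothetical names chosen for this sketch; only the key layout `hawkapi:bulkhead:{name}`, the four hash commands, and the `lease_ttl` semantics come from the diff. A real backend would presumably issue steps 1 and 2 through a Redis pipeline, as the design doc states.

```python
import time
import uuid
from typing import Dict, Optional


class FakeHashStore:
    """In-memory stand-in for the four Redis hash commands the design uses."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = {}

    def hset(self, key: str, field: str, value: str) -> None:
        self._hashes.setdefault(key, {})[field] = value

    def hlen(self, key: str) -> int:
        return len(self._hashes.get(key, {}))

    def hdel(self, key: str, *fields: str) -> int:
        bucket = self._hashes.get(key, {})
        return sum(1 for f in fields if bucket.pop(f, None) is not None)

    def hgetall(self, key: str) -> Dict[str, str]:
        return dict(self._hashes.get(key, {}))


def _key(name: str) -> str:
    return f"hawkapi:bulkhead:{name}"


def try_acquire(store: FakeHashStore, name: str, limit: int,
                now: Optional[float] = None) -> Optional[str]:
    """One acquire attempt: returns a lease ID, or None when the bulkhead is full."""
    lease_id = uuid.uuid4().hex
    timestamp = time.time() if now is None else now
    store.hset(_key(name), lease_id, repr(timestamp))  # step 1: insert lease field
    if store.hlen(_key(name)) > limit:                 # step 2: count occupied slots
        store.hdel(_key(name), lease_id)               # step 3: roll back
        return None
    return lease_id


def release(store: FakeHashStore, name: str, lease_id: str) -> None:
    store.hdel(_key(name), lease_id)                   # step 4: free the slot


def reap_expired_leases(store: FakeHashStore, name: str, lease_ttl: float,
                        now: Optional[float] = None) -> int:
    """Delete lease fields older than lease_ttl; returns how many were removed."""
    current = time.time() if now is None else now
    stale = [field for field, value in store.hgetall(_key(name)).items()
             if current - float(value) > lease_ttl]
    return store.hdel(_key(name), *stale) if stale else 0
```

Because the insert and the count are separate commands, two clients that both insert before either counts can each see `HLEN > limit` and both roll back. That is the false-negative rejection the diff describes, and this single-process sketch does not attempt to prevent it.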
