Skip to content

feat(raft): server-side 307 redirect on ErrNotLeader#590

Merged
osvaldoandrade merged 1 commit into
mainfrom
feat/raft-307-forwarding
May 18, 2026
Merged

feat(raft): server-side 307 redirect on ErrNotLeader#590
osvaldoandrade merged 1 commit into
mainfrom
feat/raft-307-forwarding

Conversation

@osvaldoandrade

Copy link
Copy Markdown
Owner

Summary

The M2 multi-raft bench (#588) showed throughput parity-at-best vs single-shard, dominated by HTTP-level retry on `ErrNotLeader`. This PR closes that gap for the realistic case: writes that hit a follower now respond with HTTP 307 + `Location: <leader URL + original path>`. Go's `http.Client` follows 307 transparently (RFC 7231 preserves method + body) so naïve clients land on the leader in one extra round-trip instead of bouncing back with an error.

What landed

Layered to avoid circular deps:

  • `pkg/domain/errors.go` (new): `LeaderHint` interface — `error + LeaderHTTPAddr() string`. Shared vocabulary between the storage layer (where the error is constructed) and the HTTP layer (where it's interpreted).
  • `internal/repository/pebble.NotLeaderError`: typed error carrying `LeaderURL`. Satisfies `domain.LeaderHint`. `errors.Is(err, ErrNotLeader)` still matches via a `.Is` method on the type — existing callers don't change.
  • `pebble.Replicator` interface gains `LeaderHTTPAddr() string` — `raft.DB` already exposes it (PR feat(raft): leader HTTP URL in /raft/status #589). `pebble.Set/Delete/CommitBatch` build `NotLeaderError{LeaderURL: d.repl.LeaderHTTPAddr()}` when the local node isn't the leader.
  • `internal/controllers/respond.go`: `respondWriteError` + `maybeRedirectLeader` helpers. `errors.As` against `domain.LeaderHint`, write 307 + `Location` from leader URL + the original request URI.
  • Updated controllers: `create_task`, `claim_task`, `submit_result`, `heartbeat`, `nack`, `abandon`. Batch endpoints kept on the per-item error path (mixed-shard batches can't redirect cleanly — a follow-up could split by shard).

Behavior on the wire

```
POST /v1/codeq/tasks → 307 Temporary Redirect
Location: http://leader-host:8080/v1/codeq/tasks
Body: {"error":"not leader","leader":"http://leader-host:8080\"}
```

A standard Go `http.Client` follows automatically and the second POST lands on the leader returning 202.

Test plan

  • `TestRaft_307RedirectsToLeader` — 3-node single-shard cluster. Identifies leader + follower via probe writes; sends to follower with redirect-following disabled, asserts 307 + valid `Location`. Then re-sends with standard `http.Client` and asserts the 202. ~210 ms.
  • All existing raft + pebble + app tests still pass
  • Manual against the new compose template once `codeq-service:cluster` image is built — verify a curl loop with `--location` succeeds against any of the 3 nodes

🤖 Generated with Claude Code

Writes that hit a follower now respond with HTTP 307 Temporary
Redirect + Location header pointing at the leader's HTTP URL. Go's
http.Client follows 307 transparently (RFC 7231 preserves method +
body), so naïve clients land on the leader in one extra round-trip
instead of bouncing back with an error.

Wiring (layered to avoid circular deps):
- pkg/domain/errors.go (new): LeaderHint interface — just
  error + LeaderHTTPAddr() string. Shared vocabulary between the
  storage layer that constructs the error and the HTTP layer that
  interprets it.
- internal/repository/pebble.NotLeaderError: typed error with
  LeaderURL field. Satisfies domain.LeaderHint. errors.Is(err,
  ErrNotLeader) still matches because NotLeaderError.Is matches the
  sentinel — existing callers don't change.
- pebble.Replicator gains LeaderHTTPAddr() — the raft.DB already
  exposes it (PR #589). pebble's Set/Delete/CommitBatch build
  NotLeaderError{LeaderURL: d.repl.LeaderHTTPAddr()} when the local
  node isn't the leader.
- internal/controllers/respond.go: respondWriteError + maybeRedirectLeader
  helpers. errors.As against domain.LeaderHint, write 307 + Location
  built from leader URL + the original request URI.
- Controllers updated to call the helpers on service errors:
  create_task, claim_task, submit_result, heartbeat, nack,
  abandon. Batch endpoints kept on the per-item error path
  (mixed-shard batches can't redirect cleanly).

Test:
- TestRaft_307RedirectsToLeader: 3-node single-shard cluster.
  Identifies leader + follower via probe writes; sends to follower
  with redirect-following disabled, asserts 307 + Location starts
  with leader URL + ends with /v1/codeq/tasks. Then re-sends with
  standard http.Client (follows redirects automatically), asserts
  202. ~210ms run.

All existing tests still pass.
@osvaldoandrade osvaldoandrade merged commit 40c0151 into main May 18, 2026
2 checks passed
@osvaldoandrade osvaldoandrade deleted the feat/raft-307-forwarding branch May 18, 2026 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant