test(raft): smart-routing bench shows 6× speedup over manual rotation #594

Merged: osvaldoandrade merged 1 commit into `main` from `feat/raft-smart-routing-bench` on May 18, 2026

@osvaldoandrade

Owner

Summary

The previous `TestRaftBench_MultiShardScale` bench (#588) reported ~596 cycles/s on a 3-node raft cluster and concluded that multi-shard wasn't faster than single-shard. That bench never exercised the 307 redirect or the redirect-following client, so manual node rotation was the dominant cost. With smart routing engaged, the same workload runs at ~3,949 cycles/s, a 6.6× speedup.

What landed

`pkg/app/raft_smart_routing_bench_test.go` — reproduces the SAME workload (create → claim → submit, 32 goroutines, 5 s window) with the full smart-routing stack:

  • `RAFT_MUX_ENABLED=true` on every node (single TCP listener per node)
  • `PeerHTTPAddrs` wired pre-bootstrap via the `httptest.NewUnstartedServer` + handler-swap pattern (the only way to know HTTP URLs before `NewApplication` runs)
  • `http.Client` with default `CheckRedirect` — follows 307 with POST body preserved per RFC 7231
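The last two bullets can be sketched in isolation. This is a minimal illustration, not the bench's actual code: the two-node layout, the `/tasks` path, and the helper names `newCluster` and `postViaFollower` are hypothetical, and the real wiring of `PeerHTTPAddrs` into `NewApplication` is not shown in this PR description. It relies on `httptest.NewUnstartedServer` binding its listener up front, so each node's URL is known before any handler is installed.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newCluster builds a hypothetical two-node setup: a follower that answers
// every request with a 307 to the leader, and a leader that echoes the
// method and body it received.
func newCluster() (followerURL string, shutdown func()) {
	// NewUnstartedServer binds a listener immediately, so the address is
	// known even though nothing is being served yet.
	leader := httptest.NewUnstartedServer(nil)
	follower := httptest.NewUnstartedServer(nil)
	leaderURL := "http://" + leader.Listener.Addr().String()
	followerURL = "http://" + follower.Listener.Addr().String()

	// Handler swap: install the real handlers now that peer URLs are known.
	leader.Config.Handler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		fmt.Fprintf(w, "%s %s", r.Method, body)
	})
	follower.Config.Handler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, leaderURL+r.URL.Path, http.StatusTemporaryRedirect)
	})

	leader.Start()
	follower.Start()
	return followerURL, func() { follower.Close(); leader.Close() }
}

// postViaFollower POSTs to the follower with a client whose CheckRedirect is
// nil, so the default policy follows the 307 to the leader.
func postViaFollower(followerURL, payload string) (string, error) {
	client := &http.Client{}
	resp, err := client.Post(followerURL+"/tasks", "text/plain", strings.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	return string(out), err
}

func main() {
	url, shutdown := newCluster()
	defer shutdown()
	out, err := postViaFollower(url, "claim-42")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "POST claim-42"
}
```

One caveat worth keeping in mind: Go's client replays the body on a 307 only when `Request.GetBody` is set. `http.NewRequest` sets it automatically for `*bytes.Buffer`, `*bytes.Reader`, and `*strings.Reader` bodies; a request built around any other `io.Reader` needs `GetBody` supplied by hand or the redirect is not followed.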

Numbers (dev box, WSL2, load ~3.5)

| Topology | Cycles/s | Speedup vs manual rotation |
|---|---|---|
| 3-node × 1-shard, manual rotation (old bench) | 596 | baseline |
| 3-node × 1-shard, smart routing | 3,949 | 6.6× |
| 3-node × 4-shard, smart routing | 3,883 | 6.5× |

Single-shard vs multi-shard ratio stays at ~1.0× — multi-raft doesn't multiply throughput at this workload because the mux acceptor serializes connection accepts across shards. Per-shard commit pipelines have plenty of headroom; the wire layer is the bottleneck. Documented as a follow-up.
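To make the accept-path argument concrete, here is a rough sketch of how a single listener shared by several shards can serialize admission. It is purely illustrative and not the project's mux: the one-byte shard header, `serveMux`, and `routeOnce` are all hypothetical names and framing chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// serveMux is a hypothetical single-listener acceptor: one loop accepts every
// connection, reads a one-byte shard ID, and hands the connection to that
// shard's queue. Because the header read happens inline, connections for all
// shards are admitted strictly one at a time.
func serveMux(ln net.Listener, shards []chan net.Conn) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		var hdr [1]byte
		if _, err := io.ReadFull(conn, hdr[:]); err != nil || int(hdr[0]) >= len(shards) {
			conn.Close()
			continue
		}
		shards[hdr[0]] <- conn
	}
}

// routeOnce dials the shared listener as a peer of shardID and reports which
// shard queue received the connection.
func routeOnce(shardID byte, numShards int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return -1, err
	}
	defer ln.Close()

	shards := make([]chan net.Conn, numShards)
	for i := range shards {
		shards[i] = make(chan net.Conn, 1)
	}
	go serveMux(ln, shards)

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return -1, err
	}
	defer c.Close()
	if _, err := c.Write([]byte{shardID}); err != nil {
		return -1, err
	}

	// Poll every queue so a misrouted connection would be detected too.
	deadline := time.After(2 * time.Second)
	for {
		for i, ch := range shards {
			select {
			case sc := <-ch:
				sc.Close()
				return i, nil
			default:
			}
		}
		select {
		case <-deadline:
			return -1, errors.New("no shard received the connection")
		case <-time.After(5 * time.Millisecond):
		}
	}
}

func main() {
	got, err := routeOnce(2, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println("connection routed to shard", got)
}
```

In this shape, adding shards adds queues but not accept capacity, which is one way a flat single-shard vs multi-shard ratio could arise. Whether the real acceptor behaves this way would need profiling to confirm.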

Context

  • Single-node Pebble (no raft) on same hardware: 76,639 tasks/s.
  • Raft 3-node smart-routed: ~3,900 cycles/s × 3 ops/cycle ≈ 12k ops/s.
  • That is a ≈ 6× throughput cost (76,639 ÷ ~12k) for consensus on 3 writes/cycle across 3 replicas.
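The ratios quoted in this description can be rechecked from the raw figures. The snippet below only restates numbers already given above; treating one Pebble "task" as comparable to one raft op is an assumption made for the comparison, and the constant names are illustrative.

```go
package main

import "fmt"

// Figures copied from the PR description.
const (
	manualRotation = 596.0   // cycles/s, 3-node × 1-shard, old bench
	smartOneShard  = 3949.0  // cycles/s, 3-node × 1-shard, smart routing
	smartFourShard = 3883.0  // cycles/s, 3-node × 4-shard, smart routing
	pebbleSingle   = 76639.0 // tasks/s, single-node Pebble, no raft
	opsPerCycle    = 3.0     // create, claim, submit
)

// ratio reports how many times larger a is than b.
func ratio(a, b float64) float64 { return a / b }

func main() {
	raftOps := smartOneShard * opsPerCycle // raft ops/s, assuming 3 ops per cycle

	fmt.Printf("smart routing vs manual rotation (1 shard): %.2fx\n", ratio(smartOneShard, manualRotation))
	fmt.Printf("smart routing vs manual rotation (4 shard): %.2fx\n", ratio(smartFourShard, manualRotation))
	fmt.Printf("multi-shard vs single-shard:                %.2fx\n", ratio(smartFourShard, smartOneShard))
	fmt.Printf("raft ops/s (1 shard):                       %.0f\n", raftOps)
	fmt.Printf("single-node Pebble vs raft ops/s:           %.2fx\n", ratio(pebbleSingle, raftOps))
}
```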

Test plan

  • Smart-routing bench passes on dev box (~13 s)
  • No code changes outside the new test file
  • Manual: run on a 12-core bare-metal box to compare against the original 83k tasks/s single-node baseline

🤖 Generated with Claude Code

The previous TestRaftBench_MultiShardScale bench rotated nodes
manually on every retry — it never exercised the 307 redirect or the
http.Client redirect-following path, so the headline number (596
cycles/s on a 3-node raft cluster) understated reality by ~6×.

This bench reproduces the SAME workload (create → claim → submit, 32
goroutines, 5 s window) with the full smart-routing stack engaged:

- RAFT_MUX_ENABLED on every node → 1 TCP listener per node for raft
- PeerHTTPAddrs wired pre-bootstrap via httptest.NewUnstartedServer +
  handler swap (the only way to wire HTTP URLs before NewApplication
  needs them)
- http.Client with default CheckRedirect (follows 307 with POST body
  preserved per RFC 7231)

Numbers on the dev box (WSL2, load ~3.5):

  3-node × 1-shard (raft + smart routing): ~3,949 cycles/s
  3-node × 4-shard (raft + smart routing): ~3,883 cycles/s
  multi/single ratio:                       0.98x

The 6× speedup over manual rotation is the value of server-side 307
redirects. The remaining gap to single-node Pebble (76k tasks/s) is
the cost of consensus on 3 writes per cycle through 3 nodes.

Multi-shard doesn't speed up further over single-shard at this load
because the mux acceptor (one TCP listener per node) serializes
connection accepts across shards — the per-shard commit pipelines
have plenty of headroom, but the wire layer becomes the bottleneck.
Future work: per-shard listeners with mux + a connection pool, OR a
gRPC stream-multiplexed transport that doesn't serialize on accept.

Test runtime: ~13 s end-to-end (2 subtests × ~6.5 s each).
@osvaldoandrade osvaldoandrade merged commit 6f193d5 into main May 18, 2026
@osvaldoandrade osvaldoandrade deleted the feat/raft-smart-routing-bench branch May 18, 2026 15:53