feat(raft): MuxAcceptor — single port per node (M2.T3)#591
Merged
…art 1)
The first piece of M2.T3: a multiplexed transport so every Pebble
shard's raft group can share one TCP port instead of binding
shardIdx-offset listeners. Reduces operational complexity (one
firewall rule, one network policy entry per node) without changing
the hashicorp/raft protocol — the wire shape is just a 4-byte BE
group ID prefix followed by raft's existing NetworkTransport bytes.
Layered to plug into hraft.NewNetworkTransport via its StreamLayer
abstraction:
    acceptor := NewMuxAcceptor(":7000", logOut)
    shard0 := acceptor.RegisterGroup(0) // hraft.StreamLayer
    shard1 := acceptor.RegisterGroup(1)
    transport0 := hraft.NewNetworkTransport(shard0, ...)
    transport1 := hraft.NewNetworkTransport(shard1, ...)
What the acceptor does:
- Opens one TCP listener on bindAddr.
- Reads the first 4 bytes of each accepted connection as a BE uint32
  group ID under a 1-second deadline, then hands the connection off
  raw to the accept queue of the matching registered StreamLayer.
- Closes connections with an unknown group ID silently, so a
  malformed peer cannot block the route goroutine.
- Close() shuts down the listener and all registered StreamLayer
  queues; pending Accept() calls return immediately.
Each StreamLayer:
- Accept() pops connections routed to its group; blocks until one
  arrives or the acceptor closes.
- Dial(addr, timeout) opens a TCP connection to addr, writes the
  group ID prefix, then returns the raw net.Conn so hashicorp/raft's
  NetworkTransport runs its handshake on top.
Wire format: 4-byte BE uint32 group ID, then raft.NetworkTransport
bytes. Backward-incompatible with hashicorp/raft's stock
NewTCPTransport (which doesn't write the prefix), so M2.T3-part2
(wiring) must flip all shards atomically — not a rolling upgrade.
Tests (5 passing):
- TwoGroupsRouteIndependently: dial + accept round-trip per group
with no crossover.
- DuplicateRegistrationErrors: same groupID twice → error.
- UnknownGroupClosesConn: connection with an unregistered groupID
is dropped cleanly.
- AcceptUnblocksOnClose: pending Accept returns on acceptor.Close.
- ConcurrentTraffic: 10 simultaneous dials per group × 2 groups
all route correctly under contention.
Next: wire MuxAcceptor into application_pebble.go's raft startup so
all shards share one port. That's M2.T3 part 2.
The big win behind M2.T3: every Pebble shard's raft group now shares
one TCP listener per node when cfg.Raft.MuxEnabled=true. The non-mux
path keeps the M1/M2 per-shard +offset behavior so existing
deployments don't break.
Wiring:
- internal/raft.Config gains StreamLayer (optional
  hraft.StreamLayer). openInternal uses hraft.NewNetworkTransport on
  top when set, otherwise falls back to hraft.NewTCPTransport with
  cfg.BindAddr.
- pkg/config.RaftConfig gains MuxEnabled bool (default false) +
  RAFT_MUX_ENABLED env override.
- pkg/app/application_pebble.go: when cfg.Raft.MuxEnabled, opens one
  MuxAcceptor at cfg.Raft.BindAddr, registers a group per shardIdx,
  passes the StreamLayer through to raftpkg.OpenWithPebble. Every
  shard binds the same port (the acceptor's); peers come through
  cfg.Raft.Peers unchanged (no per-shard offset). Non-mux path
  unchanged.
- Shutdown order: raft.Close → muxAcceptor.Close → pebble.Close.
  cleanupStartupFailure and TracingShutdown both honor the order.
Wire format: 4-byte BE uint32 group ID prefix, then raft's
NetworkTransport bytes. Incompatible with hraft's stock TCPTransport
(which doesn't write the prefix), so flipping MuxEnabled on a live
cluster requires re-bootstrap. M1/M2 deployments stay on the legacy
path until they opt in.
Tests:
- TestRaft_Mux_3Node_4Shard: 3 nodes × 4 shards = 12 raft groups
  across just 3 listeners. Same failover semantics as
  TestRaft_MultiShard_3Node (kill node, re-elect, 60 tasks consistent
  on survivors) but using mux throughout. Runs in ~2.8 s.
- Pre-existing raft tests (non-mux path) all still pass; the legacy
  flag-off route is unchanged.
This closes M2.T3 as a feature: future deployments use mux for the
cleaner single-port story, legacy ones get there at next
re-bootstrap.
Summary
Closes M2.T3. Every Pebble shard's raft group can now share one TCP listener per node, opt-in via `cfg.Raft.MuxEnabled` (env: `RAFT_MUX_ENABLED=true`). A 4-shard, 3-node deployment goes from 12 listeners to 3.
The non-mux path (M1/M2 per-shard +offset) stays intact so existing deployments don't break. Flipping the flag on a live cluster requires a re-bootstrap because the wire format gains a 4-byte BE group ID prefix.
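To make the listener arithmetic concrete, here is a rough sketch of how bind addresses differ between the two paths. It assumes the legacy "+offset" scheme means base port plus shard index, which this description implies but does not spell out; `shardBindAddrs` and `distinctListeners` are hypothetical helpers, not functions from this PR.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// shardBindAddrs returns the address each shard's raft group binds on one
// node. With mux enabled every shard shares the base address; otherwise the
// port is offset by the shard index (assumed legacy scheme).
func shardBindAddrs(base string, shards int, mux bool) ([]string, error) {
	host, portStr, err := net.SplitHostPort(base)
	if err != nil {
		return nil, fmt.Errorf("parse bind addr %q: %w", base, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return nil, fmt.Errorf("parse port %q: %w", portStr, err)
	}
	addrs := make([]string, shards)
	for i := range addrs {
		p := port
		if !mux {
			p = port + i
		}
		addrs[i] = net.JoinHostPort(host, strconv.Itoa(p))
	}
	return addrs, nil
}

// distinctListeners counts the unique addresses, i.e. the number of TCP
// listeners a node has to open.
func distinctListeners(addrs []string) int {
	seen := make(map[string]struct{}, len(addrs))
	for _, a := range addrs {
		seen[a] = struct{}{}
	}
	return len(seen)
}
```

Under these assumptions, 4 shards on base address `:7000` need 4 listeners per node on the legacy path and 1 with mux, which across 3 nodes gives the 12 versus 3 quoted above.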
Architecture
```
node-1
└── TCP listener :7000 (MuxAcceptor)
├── conn[groupID=0] → shard 0 raft group
├── conn[groupID=1] → shard 1 raft group
├── conn[groupID=2] → shard 2 raft group
└── conn[groupID=3] → shard 3 raft group
```
Wire format on each accept: 4-byte BE `uint32` group ID, then raft's existing `NetworkTransport` protocol takes over. No new RPCs, no protobuf — the demux is at the bottom of the stack.
What landed
Part 1 — `MuxAcceptor` foundation (commit 1 in the branch)
Part 2 — wireup (commit 2)
Test plan
🤖 Generated with Claude Code