From 45c2ce0c88614e0462d0c9be0e3f672f5c3afc3c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Sun, 15 Mar 2026 21:23:29 +0100
Subject: [PATCH 01/15] specs: IPv6 and separate responses design

---
 .../specs/2026-03-15-ipv6-design.md           | 105 +++++++++++
 .../2026-03-15-separate-responses-design.md   | 174 ++++++++++++++++++
 2 files changed, 279 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-03-15-ipv6-design.md
 create mode 100644 docs/superpowers/specs/2026-03-15-separate-responses-design.md

diff --git a/docs/superpowers/specs/2026-03-15-ipv6-design.md b/docs/superpowers/specs/2026-03-15-ipv6-design.md
new file mode 100644
index 0000000..5d62b2f
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-15-ipv6-design.md
@@ -0,0 +1,105 @@
+# IPv6 Support Design (RFC 7252 §1, Roadmap 1.4)
+
+## Goal
+
+Remove all IPv4-only hardcoding. Support IPv6 and dual-stack operation. Address family is auto-detected from the bind/host address string — no new config fields.
+
+## Architecture
+
+The bind address string determines the socket family. `"0.0.0.0"` creates AF_INET, `"::"` creates AF_INET6 with `IPV6_V6ONLY=0` (dual-stack, accepts both v4 and v6 clients). The hot path gains one well-predicted branch in `decode_recv()` to interpret the sockaddr.
+
+## Changes by File
+
+### Io.zig
+
+**`setup()`:**
+- Remove the `AF.INET` guard and `UnsupportedAddressFamily` error.
+- Create socket with `address.any.family` instead of hardcoded `AF.INET`.
+- For `AF.INET6`: set `IPV6_V6ONLY` to `0` (dual-stack) via `setsockopt`.
+- `bind()` already uses `address.getOsSockLen()` — no change needed.
+
+**`decode_recv()`:**
+- Read the family from the first 2 bytes of the name area in the recvmsg buffer.
+- Branch on family:
+  - `AF.INET`: cast to `sockaddr.in`, construct `Address` via `.{ .in = sa.* }`.
+  - `AF.INET6`: cast to `sockaddr.in6`, construct `Address` via `.{ .in6 = sa.* }`.
+  - Other: return `error.UnsupportedAddressFamily`.
+- The branch is well-predicted since all packets on a given socket share the same family.
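+
+A minimal sketch of the family branch, assuming a Linux target (2-byte `sa_family`)
+and that the name area is available as a byte slice. `addressFromName` is a
+hypothetical helper name, and the `.{ .sa = ... }` wrapping assumes a standard
+library in which `std.net.Address.in` and `.in6` wrap the raw sockaddr.
+
+```zig
+const std = @import("std");
+const posix = std.posix;
+
+/// Hypothetical helper: builds an Address from the raw name bytes of a recvmsg.
+fn addressFromName(name: []const u8) error{ UnsupportedAddressFamily, Truncated }!std.net.Address {
+    if (name.len < 2) return error.Truncated;
+    // sa_family occupies the first 2 bytes of every sockaddr variant (Linux).
+    const family = std.mem.bytesToValue(u16, name[0..2]);
+    switch (family) {
+        posix.AF.INET => {
+            if (name.len < @sizeOf(posix.sockaddr.in)) return error.Truncated;
+            // Copy by value so the source buffer needs no particular alignment.
+            const sa = std.mem.bytesToValue(posix.sockaddr.in, name[0..@sizeOf(posix.sockaddr.in)]);
+            return std.net.Address{ .in = .{ .sa = sa } };
+        },
+        posix.AF.INET6 => {
+            if (name.len < @sizeOf(posix.sockaddr.in6)) return error.Truncated;
+            const sa = std.mem.bytesToValue(posix.sockaddr.in6, name[0..@sizeOf(posix.sockaddr.in6)]);
+            return std.net.Address{ .in6 = .{ .sa = sa } };
+        },
+        else => return error.UnsupportedAddressFamily,
+    }
+}
+```
+
+The length checks are an addition of this sketch; they guard against a name area
+that is shorter than the sockaddr variant the family field claims.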
+
+### Server.zig
+
+**`addrs_response` storage:**
+- Change type from `[]linux.sockaddr` (16 bytes) to a type large enough for IPv6.
+  Use `[]linux.sockaddr.storage` or `[]std.net.Address`.
+  `std.net.Address` is the natural choice since `decode_recv` already returns it.
+
+**`send_raw()`:**
+- Store full `peer_address` (not just `.any` which truncates IPv6).
+- Use `peer_address.getOsSockLen()` for `namelen` instead of `@sizeOf(linux.sockaddr)`.
+
+**Rate limiting call site (line ~805):**
+- Change `recv.peer_address.in.sa.addr` to use the new `AddrKey.fromAddress()`.
+
+**Config:**
+- `bind_address` default stays `"0.0.0.0"` for backward compatibility.
+- Doc comment updated to mention `"::"` for dual-stack IPv6.
+
+### Client.zig
+
+**`init()`:**
+- Remove `AF.INET` family check and `UnsupportedAddressFamily` error.
+- Create socket with `dest.any.family` instead of hardcoded `AF.INET`.
+
+**Config:**
+- `host` default stays `"127.0.0.1"` for backward compatibility.
+- Doc comment updated to mention IPv6 (`"::1"`).
+
+### rate_limiter.zig
+
+**New `AddrKey` type:**
+
+```
+AddrKey = struct {
+    family: u16,
+    addr: [16]u8,   // 4 bytes used for IPv4, 16 for IPv6
+    // zero-initialized remainder for IPv4
+
+    fn fromAddress(address: std.net.Address) AddrKey
+    fn eql(a: AddrKey, b: AddrKey) bool
+    fn hash(self: AddrKey) u32   // FNV-1a over family + addr
+}
+```
+
+**Slot change:** `ip_addr: u32` → `addr_key: AddrKey`
+
+**API change:** `allow(ip_addr: u32, now_ns: i64)` → `allow(addr_key: AddrKey, now_ns: i64)`
+
+Table indexing uses `addr_key.hash()`. Equality uses `addr_key.eql()` — no collisions.
+
+### exchange.zig
+
+**`addr_hash()` and `peer_key()`:**
+- Replace `@bitCast(address.any)` → `[16]u8` with hashing over
+  `std.mem.asBytes(&address)[0..address.getOsSockLen()]`.
+- This captures the full address regardless of family.
+
+### Tests
+
+- Existing tests use `"127.0.0.1"` / `parseIp("127.0.0.1")` — these continue to work unchanged (IPv4).
+- New tests bind to `"::1"` (IPv6 loopback) and verify round-trip.
+- Rate limiter tests: add IPv6 address test cases.
+- Exchange tests: add IPv6 address hashing test.
+- handler.zig tests use `initIp4` — no change needed (tests don't touch the network).
+
+## Performance
+
+- **Hot path (decode_recv):** One branch on family (well-predicted). Same cost as before for IPv4-only deployments.
+- **Rate limiter:** `AddrKey` comparison is 18 bytes instead of 4. Still < 1 cache line. Negligible.
+- **Exchange hashing:** Hashes up to 28 bytes instead of 16 for IPv6. One extra cache line read. Negligible.
+- **addrs_response:** Larger per-slot storage (Address is ~128 bytes vs sockaddr's 16). With default batch of 256, adds ~28 KB. Acceptable.
+
+## Non-goals
+
+- IPv6 multicast (roadmap item 5.1, separate feature).
+- Configurable `IPV6_V6ONLY` — always dual-stack when binding `::`. Users who want v6-only can bind a specific IPv6 address.
+- DNS resolution (roadmap item 5.4, separate feature).
diff --git a/docs/superpowers/specs/2026-03-15-separate-responses-design.md b/docs/superpowers/specs/2026-03-15-separate-responses-design.md
new file mode 100644
index 0000000..d626c7b
--- /dev/null
+++ b/docs/superpowers/specs/2026-03-15-separate-responses-design.md
@@ -0,0 +1,174 @@
+# Separate (Delayed) Responses Design (RFC 7252 §5.2.2, Roadmap 1.2)
+
+## Goal
+
+Allow handlers to defer responses for slow operations (I/O, inter-service calls). The server sends an empty ACK immediately, then delivers the actual response later as a new CON message with retransmission.
+
+## RFC 7252 §5.2.2 Flow
+
+```
+Client                  Server
+  |  CON [0xAB01]  GET  -->  |   (1) Client sends CON request
+  |  <--  ACK [0xAB01]      |   (2) Server sends empty ACK immediately
+  |       ... time ...       |   (3) Handler does slow work
+  |  <--  CON [0xAB02] 2.05 |   (4) Server sends response as NEW CON
+  |  ACK [0xAB02]  -->      |   (5) Client ACKs the separate response
+```
+
+Key: step 4 uses a **new msg_id** but the **same token** as the original request.
+
+## Architecture
+
+No handler signature change. The existing `null` return for CON (= empty ACK) is the entry point. A new `Server.sendResponse()` method delivers the late response. An MPSC queue makes it thread-safe. A pre-allocated retransmission pool drives CON reliability for outgoing separate responses.
+
+### Zero-cost for synchronous handlers
+
+- The MPSC queue drain in `tick()` is a single atomic load (queue empty = no work).
+- The retransmission scan is skipped when the pool is empty (one comparison).
+- No new branches in the normal piggybacked response path.
+
+## API
+
+### Handler side (no change to signature)
+
+```zig
+fn handler(ctx: *AppState, req: coap.Request) ?coap.Response {
+    if (is_fast(req)) return coap.Response.ok("quick");
+
+    // Slow: copy token + peer, enqueue background work.
+    var tok: [8]u8 = undefined;
+    @memcpy(tok[0..req.packet.token.len], req.packet.token);
+    ctx.enqueue_work(.{
+        .token = tok,
+        .token_len = @intCast(req.packet.token.len),
+        .peer = req.peer_address,
+    });
+    return null; // server sends empty ACK
+}
+```
+
+### Delivering the late response
+
+```zig
+// From any thread (background worker, I/O callback, etc.):
+try server.sendResponse(.{
+    .peer = item.peer,
+    .token = item.token[0..item.token_len],
+    .code = .content,
+    .payload = result_data,
+});
+```
+
+`sendResponse()` encodes the response as a CON packet with a server-generated
+msg_id, enqueues the wire bytes into the MPSC queue, and returns. The tick
+loop sends it and handles retransmission.
+
+Returns `error.SeparatePoolFull` if the retransmission pool is exhausted.
+
+## Internal Components
+
+### SeparateResponse struct (new: `src/separate.zig`)
+
+```
+Config = struct {
+    /// Max concurrent separate responses pending ACK.
+    count: u16 = 16,
+    /// Max encoded response size.
+    response_size: u16 = 1280,
+};
+
+Slot = struct {
+    state: enum { free, pending },
+    msg_id: u16,
+    retransmit_count: u4,
+    next_retransmit_ns: i128,
+    timeout_ns: u64,
+    wire_len: u16,
+    next_free: u16,
+};
+```
+
+Pre-allocated pool with:
+- `slots: []Slot` — retransmission state per pending response.
+- `wire_buffer: []u8` — encoded CON packets, `count * response_size` bytes.
+- `table: []u16` — hash table keyed on msg_id for O(1) ACK matching.
+- `free_head: u16` — intrusive free list.
+
+Methods:
+- `insert(msg_id, wire_data, now_ns) ?u16` — allocate slot, copy wire, start timer.
+- `find(msg_id) ?u16` — look up by msg_id (for ACK matching).
+- `remove(slot_idx)` — free slot, return to free list.
+- `cached_wire(slot_idx) []const u8` — get wire data for retransmission.
+
+### MPSC Queue (new: `src/mpsc.zig`)
+
+Bounded lock-free ring buffer for cross-thread submission.
+
+```
+Entry = struct {
+    peer: std.net.Address,
+    wire_len: u16,
+    wire: [response_size]u8, // pre-encoded CON packet
+};
+
+Queue = struct {
+    buffer: []Entry,
+    mask: u32,
+    head: std.atomic.Value(u32), // producers (atomic CAS)
+    tail: u32,                    // consumer (tick loop only)
+
+    fn push(entry: Entry) error{Full}!void   // any thread
+    fn pop() ?*Entry                          // tick loop only
+};
+```
+
+The caller (`sendResponse`) encodes the packet and pushes the wire bytes.
+The tick loop pops entries, allocates retransmission slots, and sends.
+
+### Server.zig integration
+
+**New config field:**
+```zig
+/// Max concurrent separate (deferred) responses. 0 = disabled.
+separate_response_count: u16 = 16,
+```
+
+**New state:**
+- `separate_pool: SeparateResponse` — retransmission tracking.
+- `separate_queue: mpsc.Queue` — cross-thread submission queue.
+- `next_separate_msg_id: std.atomic.Value(u16)` — atomic msg_id generator for `sendResponse()`.
+
+**`sendResponse()` method (thread-safe):**
+1. Generate msg_id via atomic increment.
+2. Encode response as CON packet into a stack buffer.
+3. Push wire bytes + peer into MPSC queue.
+4. Return error if queue is full.
+
+**`tick()` additions (after existing CQE processing):**
+1. **Drain MPSC queue:** Pop entries, allocate retransmission slots, send wire data.
+2. **Retransmission scan:** For each pending separate slot, check timeout. Retransmit
+   or mark as timed out (free the slot after `max_retransmit` attempts).
+
+**`handle_recv()` addition (ACK handling):**
+Currently the server ignores incoming ACK messages. Add: if `packet.kind == .acknowledgement`,
+look up `packet.msg_id` in the separate pool. If found, remove the slot (response delivered).
+
+## Thread Safety
+
+- `sendResponse()` is safe to call from any thread — only touches the MPSC queue (lock-free) and an atomic msg_id counter.
+- The MPSC queue uses atomic CAS on `head` for producers, plain load on `tail` for the consumer.
+- The separate pool and retransmission state are only accessed from the tick loop (single consumer).
+- For `thread_count > 1`: each server thread has its own io_uring and pools. The application must call `sendResponse()` on the correct server instance (the one that received the original request).
+
+## Performance
+
+- **Synchronous handler path:** One atomic load to check if queue has entries (empty = skip). One comparison to check if separate pool has pending slots (empty = skip). Both are cache-hot. Negligible overhead.
+- **Separate response path:** One MPSC push (atomic CAS) + one send + one slot allocation. Same order as a normal response.
+- **Memory:** `16 * 1280 = 20 KB` wire buffer + `16 * ~64B` slots + queue overhead. ~25 KB total at defaults.
+
+## Edge Cases
+
+- **Application never responds:** Client times out on its end. Server's retransmission pool entry stays until `max_retransmit` (4 retransmits × exponential backoff ≈ 45s), then is freed automatically.
+- **Client sends RST for the separate CON:** Server should match RST msg_id to the separate pool and free the slot. (Reuse existing RST handling path.)
+- **Queue full:** `sendResponse()` returns `error.SeparatePoolFull`. Application can retry or drop.
+- **Duplicate ACK:** `find()` returns null (already removed). Harmless.

From 85fe552e0d9a41cac3af8bcb4318ce0828f72583 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Sun, 15 Mar 2026 21:27:31 +0100
Subject: [PATCH 02/15] spec: address review feedback for separate responses

---
 .../2026-03-15-separate-responses-design.md   | 190 +++++++++++++-----
 1 file changed, 143 insertions(+), 47 deletions(-)

diff --git a/docs/superpowers/specs/2026-03-15-separate-responses-design.md b/docs/superpowers/specs/2026-03-15-separate-responses-design.md
index d626c7b..9eb32e8 100644
--- a/docs/superpowers/specs/2026-03-15-separate-responses-design.md
+++ b/docs/superpowers/specs/2026-03-15-separate-responses-design.md
@@ -19,7 +19,10 @@ Key: step 4 uses a **new msg_id** but the **same token** as the original request
 
 ## Architecture
 
-No handler signature change. The existing `null` return for CON (= empty ACK) is the entry point. A new `Server.sendResponse()` method delivers the late response. An MPSC queue makes it thread-safe. A pre-allocated retransmission pool drives CON reliability for outgoing separate responses.
+No handler signature change. The existing `null` return for CON (= empty ACK) is the
+entry point. A new `Server.sendResponse()` method delivers the late response. An MPSC
+queue makes it thread-safe. A pre-allocated retransmission pool drives CON reliability
+for outgoing separate responses.
 
 ### Zero-cost for synchronous handlers
 
@@ -29,45 +32,55 @@ No handler signature change. The existing `null` return for CON (= empty ACK) is
 
 ## API
 
-### Handler side (no change to signature)
+### Deferring a response
+
+The handler calls `server.deferResponse(req)` to capture request context, then
+returns `null` (triggering the automatic empty ACK). The returned `DeferredCtx`
+is a small value type (no heap allocation) that holds everything needed to deliver
+the response later.
 
 ```zig
 fn handler(ctx: *AppState, req: coap.Request) ?coap.Response {
     if (is_fast(req)) return coap.Response.ok("quick");
 
-    // Slow: copy token + peer, enqueue background work.
-    var tok: [8]u8 = undefined;
-    @memcpy(tok[0..req.packet.token.len], req.packet.token);
-    ctx.enqueue_work(.{
-        .token = tok,
-        .token_len = @intCast(req.packet.token.len),
-        .peer = req.peer_address,
-    });
+    // Slow: capture context, enqueue background work.
+    const deferred = ctx.server.deferResponse(req) orelse
+        return coap.Response.withCode(.internal_server_error);
+    ctx.enqueue_work(deferred);
     return null; // server sends empty ACK
 }
 ```
 
+`DeferredCtx` is a plain struct — safe to copy, send across threads, store in queues:
+
+```zig
+const DeferredCtx = struct {
+    peer: std.net.Address,
+    token: [8]u8,
+    token_len: u4, // CoAP token length is 0..8 (RFC 7252 §3); u3 tops out at 7
+    is_dtls: bool,
+    session_idx: u16, // DTLS session table index (only valid when is_dtls)
+};
+```
+
 ### Delivering the late response
 
 ```zig
 // From any thread (background worker, I/O callback, etc.):
-try server.sendResponse(.{
-    .peer = item.peer,
-    .token = item.token[0..item.token_len],
+try server.sendResponse(deferred, .{
     .code = .content,
     .payload = result_data,
 });
 ```
 
-`sendResponse()` encodes the response as a CON packet with a server-generated
-msg_id, enqueues the wire bytes into the MPSC queue, and returns. The tick
-loop sends it and handles retransmission.
+`sendResponse()` encodes the response as a CON packet, enqueues the wire bytes
+into the MPSC queue, and returns. The tick loop sends it and handles retransmission.
 
 Returns `error.SeparatePoolFull` if the retransmission pool is exhausted.
 
 ## Internal Components
 
-### SeparateResponse struct (new: `src/separate.zig`)
+### SeparateResponse pool (new: `src/separate.zig`)
 
 ```
 Config = struct {
@@ -80,25 +93,34 @@ Config = struct {
 Slot = struct {
     state: enum { free, pending },
     msg_id: u16,
+    peer: std.net.Address,
     retransmit_count: u4,
     next_retransmit_ns: i128,
     timeout_ns: u64,
     wire_len: u16,
+    is_dtls: bool,
+    session_idx: u16,
     next_free: u16,
 };
 ```
 
 Pre-allocated pool with:
 - `slots: []Slot` — retransmission state per pending response.
-- `wire_buffer: []u8` — encoded CON packets, `count * response_size` bytes.
+- `wire_buffer: []u8` — plaintext CoAP packets, `count * response_size` bytes.
 - `table: []u16` — hash table keyed on msg_id for O(1) ACK matching.
 - `free_head: u16` — intrusive free list.
 
 Methods:
-- `insert(msg_id, wire_data, now_ns) ?u16` — allocate slot, copy wire, start timer.
+- `insert(msg_id, peer, wire_data, is_dtls, session_idx, now_ns) ?u16`
 - `find(msg_id) ?u16` — look up by msg_id (for ACK matching).
 - `remove(slot_idx)` — free slot, return to free list.
-- `cached_wire(slot_idx) []const u8` — get wire data for retransmission.
+- `cached_wire(slot_idx) []const u8` — get plaintext CoAP data.
+
+**Important:** The pool stores **plaintext CoAP** bytes, not encrypted wire bytes.
+For DTLS, each retransmission must re-encrypt with a fresh DTLS record sequence
+number (RFC 6347 requires unique sequence numbers; replaying the same encrypted
+record would be rejected by the client's replay window). For plain UDP, the
+plaintext IS the wire format, so retransmission sends the stored bytes directly.
 
 ### MPSC Queue (new: `src/mpsc.zig`)
 
@@ -108,22 +130,29 @@ Bounded lock-free ring buffer for cross-thread submission.
 Entry = struct {
     peer: std.net.Address,
     wire_len: u16,
-    wire: [response_size]u8, // pre-encoded CON packet
+    is_dtls: bool,
+    session_idx: u16,
+    wire: [response_size]u8,  // plaintext CoAP packet
+    ready: std.atomic.Value(bool), // publication flag
 };
 
 Queue = struct {
     buffer: []Entry,
     mask: u32,
-    head: std.atomic.Value(u32), // producers (atomic CAS)
+    head: std.atomic.Value(u32), // producers reserve via CAS
     tail: u32,                    // consumer (tick loop only)
 
-    fn push(entry: Entry) error{Full}!void   // any thread
-    fn pop() ?*Entry                          // tick loop only
+    fn push(entry) error{Full}!void  // any thread
+    fn pop() ?*Entry                 // tick loop only
 };
 ```
 
-The caller (`sendResponse`) encodes the packet and pushes the wire bytes.
-The tick loop pops entries, allocates retransmission slots, and sends.
+**Publication protocol (prevents reading partial writes):**
+1. Producer reserves slot via atomic CAS on `head`.
+2. Producer writes entry data into `buffer[slot]`.
+3. Producer sets `buffer[slot].ready.store(true, .release)`.
+4. Consumer checks `buffer[tail].ready.load(.acquire)` before reading.
+5. Consumer clears `ready` after processing.
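+
+The five steps above can be sketched as a small generic queue. This is an
+illustration under stated assumptions, not the planned `src/mpsc.zig`: the
+`Queue(T, capacity)` shape and the `Cell` name are hypothetical, and `tail` is made
+atomic here (the struct above keeps it a plain `u32`) because producers read it
+for the full check.
+
+```zig
+const std = @import("std");
+
+/// Sketch of a bounded MPSC ring; `capacity` must be a power of two.
+fn Queue(comptime T: type, comptime capacity: u32) type {
+    comptime std.debug.assert(std.math.isPowerOfTwo(capacity));
+    return struct {
+        const Self = @This();
+        const Cell = struct {
+            ready: std.atomic.Value(bool) = std.atomic.Value(bool).init(false),
+            value: T = undefined,
+        };
+
+        cells: [capacity]Cell = [_]Cell{.{}} ** capacity,
+        head: std.atomic.Value(u32) = std.atomic.Value(u32).init(0),
+        // Assumption: atomic so that producers can read it without a data race.
+        tail: std.atomic.Value(u32) = std.atomic.Value(u32).init(0),
+
+        /// Any thread. Steps 1-3: reserve via CAS, write, publish.
+        pub fn push(self: *Self, value: T) error{Full}!void {
+            var h = self.head.load(.monotonic);
+            while (true) {
+                if (h -% self.tail.load(.acquire) >= capacity) return error.Full;
+                // cmpxchgWeak returns null on success, else the current value.
+                h = self.head.cmpxchgWeak(h, h +% 1, .monotonic, .monotonic) orelse break;
+            }
+            const cell = &self.cells[h & (capacity - 1)];
+            cell.value = value;
+            cell.ready.store(true, .release);
+        }
+
+        /// Consumer thread only. Steps 4-5: check `ready`, read, clear, advance.
+        pub fn pop(self: *Self) ?T {
+            const t = self.tail.load(.monotonic);
+            const cell = &self.cells[t & (capacity - 1)];
+            if (!cell.ready.load(.acquire)) return null;
+            const value = cell.value;
+            cell.ready.store(false, .monotonic);
+            // Release so a producer that sees the new tail also sees the cleared flag.
+            self.tail.store(t +% 1, .release);
+            return value;
+        }
+    };
+}
+```
+
+Unlike the `pop() ?*Entry` signature above, this sketch returns the entry by value,
+which keeps the slot's lifetime simple at the cost of one copy.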
 
 ### Server.zig integration
 
@@ -136,39 +165,106 @@ separate_response_count: u16 = 16,
 **New state:**
 - `separate_pool: SeparateResponse` — retransmission tracking.
 - `separate_queue: mpsc.Queue` — cross-thread submission queue.
-- `next_separate_msg_id: std.atomic.Value(u16)` — atomic msg_id generator for `sendResponse()`.
 
-**`sendResponse()` method (thread-safe):**
-1. Generate msg_id via atomic increment.
-2. Encode response as CON packet into a stack buffer.
-3. Push wire bytes + peer into MPSC queue.
+**Message ID generation:**
+The existing `next_msg_id: u16` is converted to `std.atomic.Value(u16)` and used
+for both piggybacked responses (tick loop) and separate responses (`sendResponse()`
+from any thread). Sharing one counter keeps the two paths from issuing the same msg_id
+concurrently; it still wraps after 65,536 IDs. The tick loop uses `fetchAdd(1, .monotonic)` instead of the current `id +% 1`.
+
+**`deferResponse(req: Request) ?DeferredCtx`:**
+Captures token, peer address, and DTLS session index. Returns null if
+`separate_response_count == 0` (feature disabled). No allocation — just copies
+fields into a stack struct.
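+
+A possible shape for the capture step, as a sketch only. `capture` is a hypothetical
+free function standing in for `deferResponse()`, its parameters replace the real
+`Request` type, and `token_len` is a `u4` here because a CoAP token may be 8 bytes long.
+
+```zig
+const std = @import("std");
+
+const DeferredCtx = struct {
+    peer: std.net.Address,
+    token: [8]u8,
+    token_len: u4, // 0..8
+    is_dtls: bool,
+    session_idx: u16, // only meaningful when is_dtls is true
+};
+
+/// Hypothetical helper: copies everything needed to answer later. No allocation.
+/// `session_idx` is null for plain UDP peers.
+fn capture(peer: std.net.Address, token: []const u8, session_idx: ?u16) ?DeferredCtx {
+    if (token.len > 8) return null; // not a valid CoAP token
+    var ctx = DeferredCtx{
+        .peer = peer,
+        .token = [_]u8{0} ** 8,
+        .token_len = @intCast(token.len),
+        .is_dtls = session_idx != null,
+        .session_idx = session_idx orelse 0,
+    };
+    @memcpy(ctx.token[0..token.len], token);
+    return ctx;
+}
+```
+
+Because the result is a plain value, the handler can hand it to a worker queue
+without keeping the request buffer alive.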
+
+**`sendResponse(ctx: DeferredCtx, response: Response) !void` (thread-safe):**
+1. Generate msg_id via `next_msg_id.fetchAdd(1, .monotonic)`.
+2. Encode response as CON packet (with ctx.token, new msg_id) into stack buffer.
+3. Push plaintext wire bytes + peer + DTLS info into MPSC queue.
 4. Return error if queue is full.
 
 **`tick()` additions (after existing CQE processing):**
-1. **Drain MPSC queue:** Pop entries, allocate retransmission slots, send wire data.
-2. **Retransmission scan:** For each pending separate slot, check timeout. Retransmit
-   or mark as timed out (free the slot after `max_retransmit` attempts).
 
-**`handle_recv()` addition (ACK handling):**
-Currently the server ignores incoming ACK messages. Add: if `packet.kind == .acknowledgement`,
-look up `packet.msg_id` in the separate pool. If found, remove the slot (response delivered).
+1. **Drain MPSC queue:** Pop entries, allocate retransmission slots, send.
+   - For plain UDP: send stored plaintext directly.
+   - For DTLS: encrypt plaintext via `send_dtls_packet()` using session_idx.
+2. **Retransmission scan:** For each pending separate slot:
+   - If `now >= next_retransmit_ns` and `retransmit_count < constants.max_retransmit`:
+     retransmit (re-encrypt for DTLS), double timeout, increment count.
+   - If `now >= next_retransmit_ns` and `retransmit_count >= constants.max_retransmit`: free the slot (timed out).
+   - Initial timeout: `randomizedTimeout(constants.ack_timeout_ms)` (2-3s per RFC 7252 §4.2).
+   - Timeout doubles on each retransmit (exponential backoff).
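+
+The backoff arithmetic can be checked with a short sketch. The constants are the
+RFC 7252 §4.8 defaults (ACK_TIMEOUT = 2 s, ACK_RANDOM_FACTOR = 1.5,
+MAX_RETRANSMIT = 4) and are assumed to match `constants`; `backoffSumMs` is a
+hypothetical helper name.
+
+```zig
+const std = @import("std");
+
+const max_retransmit: u32 = 4; // assumed equal to constants.max_retransmit
+
+/// Sum of `rounds` successive timeouts, starting at `initial_ms` and doubling.
+fn backoffSumMs(initial_ms: u64, rounds: u32) u64 {
+    var total: u64 = 0;
+    var timeout = initial_ms;
+    var i: u32 = 0;
+    while (i < rounds) : (i += 1) {
+        total += timeout;
+        timeout *= 2;
+    }
+    return total;
+}
+
+/// First send to last retransmission (RFC 7252 MAX_TRANSMIT_SPAN at 3000 ms).
+fn transmitSpanMs(initial_ms: u64) u64 {
+    return backoffSumMs(initial_ms, max_retransmit);
+}
+
+/// First send to giving up (RFC 7252 MAX_TRANSMIT_WAIT at 3000 ms).
+fn transmitWaitMs(initial_ms: u64) u64 {
+    return backoffSumMs(initial_ms, max_retransmit + 1);
+}
+```
+
+With the randomized initial timeout between 2000 and 3000 ms, the last retransmission
+goes out 30 to 45 s after the first send. A slot that also waits out the final
+timeout before being freed would be held for 62 to 93 s.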
+
+**`handle_recv()` addition (ACK/RST matching):**
+
+Currently the server processes RST (cancels exchange) and ignores ACK. Add:
+
+- **ACK:** If `packet.kind == .acknowledgement`, look up `packet.msg_id` in the
+  separate pool. If found, remove the slot (response delivered successfully).
+- **RST:** Also check the separate pool (in addition to the existing exchange pool
+  check). If found, remove the slot (client rejected the separate response).
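+
+ACK and RST matching both reduce to "find by msg_id, then remove". A minimal sketch
+of that pool behaviour, assuming a linear scan in place of the planned hash table
+and a slot reduced to the fields that matter here. `Pool` and `none` are
+hypothetical names.
+
+```zig
+const std = @import("std");
+
+const none: u16 = std.math.maxInt(u16); // end-of-list marker
+
+fn Pool(comptime count: u16) type {
+    return struct {
+        const Self = @This();
+        const Slot = struct { pending: bool = false, msg_id: u16 = 0, next_free: u16 = none };
+
+        slots: [count]Slot = undefined,
+        free_head: u16 = none,
+
+        fn init() Self {
+            var self = Self{};
+            // Thread every slot onto the intrusive free list, last slot first.
+            var i: u16 = count;
+            while (i > 0) {
+                i -= 1;
+                self.slots[i] = .{ .next_free = self.free_head };
+                self.free_head = i;
+            }
+            return self;
+        }
+
+        /// Returns null when no slot is free.
+        fn insert(self: *Self, msg_id: u16) ?u16 {
+            const idx = self.free_head;
+            if (idx == none) return null;
+            self.free_head = self.slots[idx].next_free;
+            self.slots[idx] = .{ .pending = true, .msg_id = msg_id };
+            return idx;
+        }
+
+        /// Linear scan for brevity; the design above uses a hash table for O(1).
+        fn find(self: *const Self, msg_id: u16) ?u16 {
+            var i: u16 = 0;
+            while (i < count) : (i += 1) {
+                if (self.slots[i].pending and self.slots[i].msg_id == msg_id) return i;
+            }
+            return null;
+        }
+
+        fn remove(self: *Self, idx: u16) void {
+            self.slots[idx] = .{ .next_free = self.free_head };
+            self.free_head = idx;
+        }
+    };
+}
+```
+
+On an incoming ACK or RST the receive path would call `find(packet.msg_id)` and, on a
+hit, `remove()`. A duplicate ACK finds nothing and is ignored.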
+
+## Interactions with Existing Mechanisms
+
+### Client retransmits original CON after empty ACK
+
+The empty ACK for the original request is cached in the exchange pool (existing
+behavior at `Server.zig:1011-1031`). If the client retransmits its CON (same
+msg_id) before receiving the ACK, the server's duplicate detection retransmits
+the cached empty ACK. This is correct per RFC 7252.
+
+### Exchange pool eviction
+
+The exchange pool entry (cached empty ACK) has a lifetime of `exchange_lifetime_ms`
+(~247s). If the handler takes longer than this to call `sendResponse()`, the entry
+is evicted and a client retransmission would trigger the handler again (duplicate
+invocation). Applications should complete separate responses well within this window.
+The retransmission window for the separate response itself is ~45s (`max_retransmit=4`
+with exponential backoff), so the practical deadline is driven by application logic,
+not the protocol.
+
+### Separate response is NOT cached in the exchange pool
+
+The exchange pool entry for the original msg_id holds the empty ACK. The separate
+response has its own msg_id and its own retransmission tracking in the separate pool.
+These are independent — no interaction.
+
+### Future: server-side Observe (roadmap 2.1)
+
+Observe notifications are also server-initiated CON messages with retransmission.
+The separate pool's retransmission mechanism can be reused or shared. This design
+keeps the pool generic (msg_id + wire data + retransmit state) to enable reuse.
 
 ## Thread Safety
 
-- `sendResponse()` is safe to call from any thread — only touches the MPSC queue (lock-free) and an atomic msg_id counter.
-- The MPSC queue uses atomic CAS on `head` for producers, plain load on `tail` for the consumer.
-- The separate pool and retransmission state are only accessed from the tick loop (single consumer).
-- For `thread_count > 1`: each server thread has its own io_uring and pools. The application must call `sendResponse()` on the correct server instance (the one that received the original request).
+- `deferResponse()` is called from the handler (tick loop thread) — no synchronization needed.
+- `sendResponse()` is safe to call from any thread — only touches the MPSC queue
+  (lock-free CAS) and the atomic msg_id counter.
+- The separate pool and retransmission state are only accessed from the tick loop.
+- For `thread_count > 1`: each server thread has its own pools. `DeferredCtx` as defined
+  above has no server field, so the application must route `sendResponse()` to the instance that received the request.
 
 ## Performance
 
-- **Synchronous handler path:** One atomic load to check if queue has entries (empty = skip). One comparison to check if separate pool has pending slots (empty = skip). Both are cache-hot. Negligible overhead.
-- **Separate response path:** One MPSC push (atomic CAS) + one send + one slot allocation. Same order as a normal response.
-- **Memory:** `16 * 1280 = 20 KB` wire buffer + `16 * ~64B` slots + queue overhead. ~25 KB total at defaults.
+- **Synchronous handler path:** One atomic load to check queue (empty = skip). One
+  comparison to check separate pool (empty = skip). Both cache-hot. Negligible.
+- **Separate response path:** One atomic CAS (queue push) + one send + one slot
+  allocation. Same order as a normal response.
+- **Memory:** `16 * 1280 = 20 KB` wire buffer + `16 * ~96B` slots + queue. ~25 KB total.
+- **DTLS retransmission:** Re-encryption cost per retransmit (~1us for AES-128-CCM-8).
+  Acceptable — retransmits are rare.
 
 ## Edge Cases
 
-- **Application never responds:** Client times out on its end. Server's retransmission pool entry stays until `max_retransmit` (4 retransmits × exponential backoff ≈ 45s), then is freed automatically.
-- **Client sends RST for the separate CON:** Server should match RST msg_id to the separate pool and free the slot. (Reuse existing RST handling path.)
-- **Queue full:** `sendResponse()` returns `error.SeparatePoolFull`. Application can retry or drop.
+- **Application never responds:** Server's retransmission pool entry stays until
+  `max_retransmit` (4 retransmits with exponential backoff, ~45s total), then freed.
+  Client times out independently.
+- **Client sends RST for separate CON:** Server matches RST msg_id to separate pool,
+  frees the slot. Checked in addition to exchange pool (both are scanned for RST).
+- **Queue full:** `sendResponse()` returns `error.SeparatePoolFull`. Application retries or drops.
 - **Duplicate ACK:** `find()` returns null (already removed). Harmless.
+- **Stale/wrong token in sendResponse:** Server sends a CON the client doesn't recognize.
+  Client RSTs it. Server frees the slot. Harmless.
+- **Token reuse by client:** RFC 7252 §5.3.1 requires unique tokens per endpoint pair.
+  If the client violates this, the separate response may match the wrong request. This
+  is a client bug, not a server concern.

From c06028fde054133f65dd59a3fe6cca6b023a54cf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Sun, 15 Mar 2026 21:28:19 +0100
Subject: [PATCH 03/15] specs: address review feedback for IPv6 and separate
 responses

---
 .../specs/2026-03-15-ipv6-design.md           | 70 ++++++++++++++++---
 1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/docs/superpowers/specs/2026-03-15-ipv6-design.md b/docs/superpowers/specs/2026-03-15-ipv6-design.md
index 5d62b2f..f911db9 100644
--- a/docs/superpowers/specs/2026-03-15-ipv6-design.md
+++ b/docs/superpowers/specs/2026-03-15-ipv6-design.md
@@ -8,6 +8,8 @@ Remove all IPv4-only hardcoding. Support IPv6 and dual-stack operation. Address
 
 The bind address string determines the socket family. `"0.0.0.0"` creates AF_INET, `"::"` creates AF_INET6 with `IPV6_V6ONLY=0` (dual-stack, accepts both v4 and v6 clients). The hot path gains one well-predicted branch in `decode_recv()` to interpret the sockaddr.
 
+**Dual-stack note:** When an AF_INET6 socket receives a datagram from an IPv4 client, the kernel presents the peer as an IPv4-mapped IPv6 address (`::ffff:a.b.c.d`, family AF_INET6), so all rate-limiting and exchange-hashing keys on that socket are in IPv6 format. The same IPv4 client therefore produces different keys depending on whether the server is bound to `0.0.0.0` or `::`. This is expected and correct.
+
 ## Changes by File
 
 ### Io.zig
@@ -18,6 +20,16 @@ The bind address string determines the socket family. `"0.0.0.0"` creates AF_INE
 - For `AF.INET6`: set `IPV6_V6ONLY` to `0` (dual-stack) via `setsockopt`.
 - `bind()` already uses `address.getOsSockLen()` — no change needed.
 
+**`addr_recv` field:**
+- Change from `linux.sockaddr` (16 bytes) to `linux.sockaddr.in6` (28 bytes) or
+  use a buffer large enough for both families. The kernel writes the peer address
+  here during recvmsg; if the buffer is too small, the address is truncated.
+
+**`msg_recv.namelen`:**
+- Change from hardcoded `@sizeOf(linux.sockaddr)` to `@sizeOf(linux.sockaddr.in6)`.
+  This is safe for both families — the kernel writes the actual address length
+  and the unused bytes are ignored.
+
 **`decode_recv()`:**
 - Read the family from the first 2 bytes of the name area in the recvmsg buffer.
 - Branch on family:
@@ -29,13 +41,14 @@ The bind address string determines the socket family. `"0.0.0.0"` creates AF_INE
 ### Server.zig
 
 **`addrs_response` storage:**
-- Change type from `[]linux.sockaddr` (16 bytes) to a type large enough for IPv6.
-  Use `[]linux.sockaddr.storage` or `[]std.net.Address`.
+- Change type from `[]linux.sockaddr` (16 bytes) to `[]std.net.Address`.
   `std.net.Address` is the natural choice since `decode_recv` already returns it.
 
 **`send_raw()`:**
 - Store full `peer_address` (not just `.any` which truncates IPv6).
 - Use `peer_address.getOsSockLen()` for `namelen` instead of `@sizeOf(linux.sockaddr)`.
+- The `.name` pointer must reference the start of the stored `Address`. Every sockaddr variant
+  in the union overlays the same starting offset, so one pointer serves both families once `namelen` is set per family.
 
 **Rate limiting call site (line ~805):**
 - Change `recv.peer_address.in.sa.addr` to use the new `AddrKey.fromAddress()`.
@@ -75,28 +88,69 @@ AddrKey = struct {
 **API change:** `allow(ip_addr: u32, now_ns: i64)` → `allow(addr_key: AddrKey, now_ns: i64)`
 
 Table indexing uses `addr_key.hash()`. Equality uses `addr_key.eql()` — no collisions.
+Port is intentionally excluded — multiple connections from the same IP share a
+token bucket, which is correct for abuse prevention.
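+
+A sketch of how `AddrKey` could look in Zig, following the pseudocode above. The
+method bodies are assumptions: the spec fixes only the fields, the three method
+names, and FNV-1a as the hash.
+
+```zig
+const std = @import("std");
+
+const AddrKey = struct {
+    family: u16,
+    addr: [16]u8, // first 4 bytes used for IPv4, remainder zero
+
+    /// The port is deliberately not copied, so all sockets on one IP share a key.
+    fn fromAddress(address: std.net.Address) AddrKey {
+        var key = AddrKey{ .family = address.any.family, .addr = [_]u8{0} ** 16 };
+        switch (address.any.family) {
+            std.posix.AF.INET => @memcpy(key.addr[0..4], std.mem.asBytes(&address.in.sa.addr)),
+            std.posix.AF.INET6 => key.addr = address.in6.sa.addr,
+            else => {}, // unknown family: key carries the family only
+        }
+        return key;
+    }
+
+    fn eql(a: AddrKey, b: AddrKey) bool {
+        return a.family == b.family and std.mem.eql(u8, &a.addr, &b.addr);
+    }
+
+    /// FNV-1a over family + addr.
+    fn hash(self: AddrKey) u32 {
+        var h = std.hash.Fnv1a_32.init();
+        h.update(std.mem.asBytes(&self.family));
+        h.update(&self.addr);
+        return h.final();
+    }
+};
+```
+
+Two distinct keys can still hash to the same table index; `eql()` is what tells them
+apart during lookup.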
 
 ### exchange.zig
 
 **`addr_hash()` and `peer_key()`:**
-- Replace `@bitCast(address.any)` → `[16]u8` with hashing over
-  `std.mem.asBytes(&address)[0..address.getOsSockLen()]`.
-- This captures the full address regardless of family.
+
+Replace `@bitCast(address.any)` → `[16]u8` with a family-aware approach:
+
+```
+fn addr_hash(address: std.net.Address) u32 {
+    const bytes: []const u8 = switch (address.any.family) {
+        AF.INET => std.mem.asBytes(&address.in),
+        AF.INET6 => std.mem.asBytes(&address.in6),
+        else => std.mem.asBytes(&address.in), // unknown family: hash the generic 16 bytes
+    };
+    return std.hash.Fnv1a_32.hash(bytes); // FNV-1a over the selected variant
+}
+```
+
+Branch on family and hash the correct union variant's bytes. This avoids
+tagged-union layout ambiguity and captures the full address for both families.
+Same pattern for `peer_key()`.
+
+### dtls/Cookie.zig
+
+**`generate()` and `verify()` (Security-critical):**
+
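+Line 9: `const addr_bytes: [16]u8 = @bitCast(client_addr.any)` keeps only the first
+16 bytes of a 28-byte `sockaddr.in6` (family, port, flowinfo, and the upper 64 bits of
+the address). IPv6 clients that share a /64 prefix and source port get the same cookie, weakening spoofing protection.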
+
+Fix: use the same family-aware byte extraction as exchange.zig. Hash the full
+`sockaddr.in` or `sockaddr.in6` depending on family.
+
+### dtls/Session.zig
+
+**`addrHash()` and `addrEqual()` (Security-critical):**
+
+Lines 350-361: Same `@bitCast(addr.any)` truncation issue. Two different IPv6
+addresses could hash/compare as equal, causing session table collisions and
+potentially routing decrypted data to the wrong session.
+
+Fix: `addrHash` uses family-aware full-address hashing. `addrEqual` compares
+the correct union variant based on family.
 
 ### Tests
 
-- Existing tests use `"127.0.0.1"` / `parseIp("127.0.0.1")` — these continue to work unchanged (IPv4).
+- Existing tests use `"127.0.0.1"` / `parseIp("127.0.0.1")` — continue to work (IPv4).
 - New tests bind to `"::1"` (IPv6 loopback) and verify round-trip.
 - Rate limiter tests: add IPv6 address test cases.
 - Exchange tests: add IPv6 address hashing test.
+- DTLS Cookie/Session tests: add IPv6 address cases.
 - handler.zig tests use `initIp4` — no change needed (tests don't touch the network).
+- Server/Client test helpers: update `test_client` to support IPv6 loopback.
 
 ## Performance
 
 - **Hot path (decode_recv):** One branch on family (well-predicted). Same cost as before for IPv4-only deployments.
 - **Rate limiter:** `AddrKey` comparison is 18 bytes instead of 4. Still < 1 cache line. Negligible.
-- **Exchange hashing:** Hashes up to 28 bytes instead of 16 for IPv6. One extra cache line read. Negligible.
-- **addrs_response:** Larger per-slot storage (Address is ~128 bytes vs sockaddr's 16). With default batch of 256, adds ~28 KB. Acceptable.
+- **Exchange hashing:** Hashes up to 28 bytes instead of 16 for IPv6. Negligible.
+- **addrs_response:** `std.net.Address` (~128 bytes) vs `sockaddr` (16 bytes). With batch of 256, adds ~28 KB. Acceptable.
+- **DTLS hashing:** Same as exchange — family branch + full address hash. Negligible.
 
 ## Non-goals
 

From 824309fa19a50c7b230294eb80a69aebf6fc91b6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Sun, 15 Mar 2026 21:33:01 +0100
Subject: [PATCH 04/15] plans: IPv6 and separate responses implementation

---
 docs/superpowers/plans/2026-03-15-ipv6.md     | 519 ++++++++++++++++++
 .../plans/2026-03-15-separate-responses.md    | 479 ++++++++++++++++
 2 files changed, 998 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-03-15-ipv6.md
 create mode 100644 docs/superpowers/plans/2026-03-15-separate-responses.md

diff --git a/docs/superpowers/plans/2026-03-15-ipv6.md b/docs/superpowers/plans/2026-03-15-ipv6.md
new file mode 100644
index 0000000..8d39a89
--- /dev/null
+++ b/docs/superpowers/plans/2026-03-15-ipv6.md
@@ -0,0 +1,519 @@
+# IPv6 Support Implementation Plan
+
+> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Remove all IPv4-only hardcoding; support IPv6 and dual-stack via auto-detect from bind/host address string.
+
+**Architecture:** Address family is derived from the bind address (`"0.0.0.0"` → AF_INET, `"::"` → AF_INET6 dual-stack). A family-aware address hashing helper is shared by exchange, rate_limiter, DTLS cookie, and DTLS session. The hot path (`decode_recv`) gains one well-predicted branch.
+
+**Tech Stack:** Zig 0.15, Linux io_uring, IPv6 `sockaddr_in6`.
+
+**Spec:** `docs/superpowers/specs/2026-03-15-ipv6-design.md`
+
+---
+
+## File Map
+
+- **Modify:** `src/exchange.zig` — family-aware `addr_hash()` and `peer_key()`
+- **Modify:** `src/rate_limiter.zig` — `AddrKey` type, replace `ip_addr: u32` API
+- **Modify:** `src/Io.zig` — remove AF_INET restriction, IPv6 socket, decode_recv
+- **Modify:** `src/Server.zig` — `addr_recv` sizing, `addrs_response` type, `send_raw` namelen, rate limit call site, `msg_recv.namelen`
+- **Modify:** `src/Client.zig` — remove AF_INET restriction, create socket from parsed family
+- **Modify:** `src/dtls/Cookie.zig` — family-aware address bytes in HMAC
+- **Modify:** `src/dtls/Session.zig` — family-aware `addrHash()` and `addrEqual()`
+- **Modify:** `README.md` — update config docs, roadmap
+- **Modify:** `docs/ROADMAP.md` — mark 1.4 done
+
+---
+
+## Chunk 1: Address Hashing and Rate Limiter
+
+### Task 1: Family-aware address hashing in exchange.zig
+
+**Files:**
+- Modify: `src/exchange.zig:113-136` (addr_hash, peer_key)
+
+- [ ] **Step 1: Write test for IPv6 address hashing**
+
+Add test in `src/exchange.zig` tests section:
+
+```zig
+test "addr_hash differentiates IPv4 and IPv6" {
+    const v4 = try std.net.Address.parseIp("127.0.0.1", 5683);
+    const v6 = try std.net.Address.parseIp("::1", 5683);
+    try testing.expect(Exchange.addr_hash(v4) != Exchange.addr_hash(v6));
+}
+
+test "peer_key differentiates IPv4 and IPv6" {
+    const v4 = try std.net.Address.parseIp("127.0.0.1", 5683);
+    const v6 = try std.net.Address.parseIp("::1", 5683);
+    const k4 = Exchange.peer_key(v4, 0x1234);
+    const k6 = Exchange.peer_key(v6, 0x1234);
+    try testing.expect(k4 != k6);
+}
+
+test "addr_hash different IPv6 addresses produce different hashes" {
+    const a = try std.net.Address.parseIp("::1", 5683);
+    const b = try std.net.Address.parseIp("fe80::1", 5683);
+    try testing.expect(Exchange.addr_hash(a) != Exchange.addr_hash(b));
+}
+```
+
+- [ ] **Step 2: Run tests — expect failures only where the current 16-byte hashing truncates IPv6**
+
+Run: `zig build test`
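+
+The 16-byte prefix still covers the family, port, flowinfo, and the first 8
+address bytes, so pairs such as `::1` vs `fe80::1` can hash differently even
+before the fix. A pair that differs only in the low 64 bits exercises the
+truncation directly. A hedged sketch, where `fnv1a32` is a hypothetical
+stand-in for the hashing loop rather than the project's function:
+
+```zig
+const std = @import("std");
+
+// FNV-1a 32-bit over an arbitrary byte slice (illustrative helper).
+fn fnv1a32(bytes: []const u8) u32 {
+    var hash: u32 = 0x811c9dc5;
+    for (bytes) |b| {
+        hash ^= b;
+        hash *%= 0x01000193;
+    }
+    return hash;
+}
+
+test "16-byte prefix collides for IPv6 addresses differing in the low 64 bits" {
+    const a = try std.net.Address.parseIp("fe80::1", 5683);
+    const b = try std.net.Address.parseIp("fe80::2", 5683);
+
+    // Old behaviour: only the first 16 bytes of the sockaddr are hashed.
+    const a16: []const u8 = std.mem.asBytes(&a.in6)[0..16];
+    const b16: []const u8 = std.mem.asBytes(&b.in6)[0..16];
+    try std.testing.expectEqual(fnv1a32(a16), fnv1a32(b16));
+
+    // Family-aware behaviour: all bytes of the in6 variant are hashed.
+    const a_full: []const u8 = std.mem.asBytes(&a.in6);
+    const b_full: []const u8 = std.mem.asBytes(&b.in6);
+    try std.testing.expect(fnv1a32(a_full) != fnv1a32(b_full));
+}
+```
+
+Adding a case like this to the Step 1 tests would make the pre-fix run fail for
+the reason the step describes.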
+
+- [ ] **Step 3: Implement family-aware hashing**
+
+Replace `addr_hash` and `peer_key` to branch on family:
+
+```zig
+pub fn addr_hash(address: std.net.Address) u32 {
+    var hash: u32 = 0x811c9dc5; // FNV-1a 32-bit offset basis
+    const bytes = addrBytes(&address);
+    for (bytes) |b| {
+        hash ^= b;
+        hash *%= 0x01000193; // FNV-1a 32-bit prime
+    }
+    return hash;
+}
+
+pub fn peer_key(address: std.net.Address, message_id: u16) u64 {
+    var hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
+    const bytes = addrBytes(&address);
+    for (bytes) |b| {
+        hash ^= b;
+        hash *%= 0x100000001b3; // FNV-1a 64-bit prime
+    }
+    hash ^= @as(u64, message_id);
+    hash *%= 0x100000001b3;
+    return hash;
+}
+
+// Takes a pointer so the returned slice refers to the caller's address
+// rather than to a by-value parameter that is gone after return.
+fn addrBytes(address: *const std.net.Address) []const u8 {
+    return switch (address.any.family) {
+        posix.AF.INET => std.mem.asBytes(&address.in),
+        posix.AF.INET6 => std.mem.asBytes(&address.in6),
+        else => std.mem.asBytes(&address.in),
+    };
+}
+```
+
+Note: import `posix` at the top: `const posix = std.posix;`
+
+- [ ] **Step 4: Run tests — all pass**
+
+Run: `zig build test`
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/exchange.zig
+git commit -m "exchange: family-aware address hashing for IPv6"
+```
+
+---
+
+### Task 2: AddrKey type in rate_limiter.zig
+
+**Files:**
+- Modify: `src/rate_limiter.zig` (full file)
+
+- [ ] **Step 1: Write test for IPv6 rate limiting**
+
+Add test:
+
+```zig
+test "IPv6 addresses independent from IPv4" {
+    var rl = try RateLimiter.init(testing.allocator, .{
+        .ip_count = 8,
+        .tokens_per_sec = 10,
+        .burst = 2,
+    });
+    defer rl.deinit(testing.allocator);
+
+    const v4 = AddrKey.fromAddress(try std.net.Address.parseIp("127.0.0.1", 5683));
+    const v6 = AddrKey.fromAddress(try std.net.Address.parseIp("::1", 5683));
+    const now: i64 = 1_000_000_000;
+
+    try testing.expect(rl.allow(v4, now));
+    try testing.expect(rl.allow(v4, now));
+    try testing.expect(!rl.allow(v4, now)); // exhausted
+
+    // IPv6 has its own bucket
+    try testing.expect(rl.allow(v6, now));
+    try testing.expect(rl.allow(v6, now));
+    try testing.expect(!rl.allow(v6, now));
+}
+```
+
+- [ ] **Step 2: Define `AddrKey` and update `Slot`**
+
+Add `AddrKey` struct:
+
+```zig
+pub const AddrKey = struct {
+    family: u16,
+    addr: [16]u8,
+
+    pub const zero: AddrKey = .{ .family = 0, .addr = .{0} ** 16 };
+
+    pub fn fromAddress(address: std.net.Address) AddrKey {
+        return switch (address.any.family) {
+            posix.AF.INET => .{
+                .family = posix.AF.INET,
+                .addr = blk: {
+                    var a: [16]u8 = .{0} ** 16;
+                    const src: [4]u8 = @bitCast(address.in.sa.addr);
+                    @memcpy(a[0..4], &src);
+                    break :blk a;
+                },
+            },
+            posix.AF.INET6 => .{
+                .family = posix.AF.INET6,
+                .addr = address.in6.sa.addr,
+            },
+            else => zero,
+        };
+    }
+
+    pub fn eql(a: AddrKey, b: AddrKey) bool {
+        return a.family == b.family and std.mem.eql(u8, &a.addr, &b.addr);
+    }
+
+    pub fn hash(self: AddrKey) u64 {
+        var h: u64 = 0xcbf29ce484222325;
+        const fam_bytes: [2]u8 = @bitCast(self.family);
+        for (fam_bytes) |b| {
+            h ^= b;
+            h *%= 0x100000001b3;
+        }
+        for (self.addr) |b| {
+            h ^= b;
+            h *%= 0x100000001b3;
+        }
+        return h;
+    }
+};
+```
+
+Add `const posix = std.posix;` import.
+
+- [ ] **Step 3: Replace `ip_addr: u32` with `addr_key: AddrKey` throughout**
+
+Change `Slot.ip_addr: u32` → `Slot.addr_key: AddrKey`.
+
+Change `allow(self, ip_addr: u32, now_ns)` → `allow(self, addr_key: AddrKey, now_ns)`.
+
+Change `find_slot(self, key, ip_addr)` → `find_slot(self, key, addr_key)` with `self.slots[slot_idx].addr_key.eql(addr_key)`.
+
+Change `allocate_slot(self, key, ip_addr, now_ns)` → `allocate_slot(self, key, addr_key, now_ns)`.
+
+Change `hash_ip(ip: u32)` → use `addr_key.hash()` directly.
+
+Change `remove_slot` to use `self.slots[si].addr_key.hash()` for rehashing.
+
+Update `reset()` to use `AddrKey.zero`.
+
+- [ ] **Step 4: Update existing tests to use AddrKey**
+
+Replace all `const ip: u32 = 0x7F000001` with `const ip = AddrKey.fromAddress(try std.net.Address.parseIp("127.0.0.1", 5683))` and raw IP literals with AddrKey equivalents.
+
+- [ ] **Step 5: Run tests**
+
+Run: `zig build test`
+Expected: All pass.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/rate_limiter.zig
+git commit -m "rate_limiter: AddrKey for IPv4/IPv6 address-agnostic rate limiting"
+```
+
+---
+
+## Chunk 2: Transport Layer
+
+### Task 3: Io.zig — IPv6 socket and decode_recv
+
+**Files:**
+- Modify: `src/Io.zig:74-128` (setup) and `src/Io.zig:189-219` (decode_recv)
+
+- [ ] **Step 1: Update `setup()` — remove AF_INET restriction, create socket from family**
+
+Replace lines 78-86:
+
+```zig
+const family = address.any.family;
+const fd = try posix.socket(family, posix.SOCK.DGRAM, 0);
+io.fd_socket = fd;
+
+// Enable dual-stack for IPv6 sockets.
+if (family == posix.AF.INET6) {
+    const v6only = std.mem.toBytes(@as(c_int, 0));
+    posix.setsockopt(fd, posix.IPPROTO.IPV6, linux.IPV6.V6ONLY, &v6only) catch |err| {
+        log.debug("IPV6_V6ONLY: {}", .{err});
+    };
+}
+```
+
+Remove the `if (address.any.family != posix.AF.INET) return error.UnsupportedAddressFamily;` line and the comment above it.
+
+- [ ] **Step 2: Update `decode_recv()` — handle both address families**
+
+Replace lines 205-212:
+
+```zig
+const peer_family: u16 = @bitCast(io.buffers[name_offset..][0..2].*);
+const net_address: std.net.Address = if (peer_family == posix.AF.INET) blk: {
+    const sa: *const linux.sockaddr.in = @ptrCast(@alignCast(io.buffers.ptr + name_offset));
+    break :blk .{ .in = .{ .sa = sa.* } }; // Address.in wraps the sockaddr in `.sa`
+} else if (peer_family == posix.AF.INET6) blk: {
+    const sa: *const linux.sockaddr.in6 = @ptrCast(@alignCast(io.buffers.ptr + name_offset));
+    break :blk .{ .in6 = .{ .sa = sa.* } };
+} else return error.UnsupportedAddressFamily;
+```
+
+- [ ] **Step 3: Run tests**
+
+Run: `zig build test`
+Expected: All pass (existing tests use IPv4, still work).
+
+- [ ] **Step 4: Add IPv6 SO_REUSEPORT test**
+
+```zig
+test "SO_REUSEPORT allows two IPv6 instances on same port" {
+    const port: u16 = 19692;
+    const allocator = testing.allocator;
+
+    var io1 = try Io.init(allocator, 4, 256);
+    defer io1.deinit(allocator);
+    io1.setup(port, "::") catch |err| switch (err) {
+        error.AddressFamilyNotSupported, error.AddressNotAvailable => return, // CI may not support IPv6
+        else => return err,
+    };
+
+    var io2 = try Io.init(allocator, 4, 256);
+    defer io2.deinit(allocator);
+    try io2.setup(port, "::");
+}
+```
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/Io.zig
+git commit -m "io: IPv6 socket support, dual-stack, family-aware decode_recv"
+```
+
+---
+
+### Task 4: Server.zig — address storage, send path, recv buffer
+
+**Files:**
+- Modify: `src/Server.zig` — struct fields, init, listen, send_raw, rate limiting
+
+- [ ] **Step 1: Change `addr_recv` type**
+
+Line 125: Change `addr_recv: linux.sockaddr` → `addr_recv: linux.sockaddr.storage`
+
+(`sockaddr.storage` is large enough for any address family.)
+
+If `linux.sockaddr.storage` is not available in Zig 0.15's linux definitions, use a raw byte buffer: `addr_recv: [128]u8 align(8)` and adjust casts.
+
+- [ ] **Step 2: Update `msg_recv.namelen` in `listen()`**
+
+Line 391: Change `@sizeOf(linux.sockaddr)` → `@sizeOf(@TypeOf(server.addr_recv))`
+
+- [ ] **Step 3: Change `addrs_response` type**
+
+Line 115: Change `addrs_response: []linux.sockaddr` → `addrs_response: []std.net.Address`
+
+Update the allocation in `init()` (around line 270):
+```zig
+const addrs_response = try allocator.alloc(std.net.Address, batch);
+```
+
+Update `deinit()` to free `std.net.Address` slice.
+
+- [ ] **Step 4: Update `send_raw()`**
+
+Line 1635: Change `server.addrs_response[index] = peer_address.any` → `server.addrs_response[index] = peer_address`
+
+Line 1643: `.name = @ptrCast(&server.addrs_response[index])` stays textually the same; it now points at a `std.net.Address` slot.
+(The pointer cast should still work since Address is an extern union whose first member overlays the sockaddr.)
+
+Line 1644: Change `.namelen = @sizeOf(linux.sockaddr)` → `.namelen = peer_address.getOsSockLen()`
+
+**Important:** We need to store the namelen per response too, since different peers may have different families on a dual-stack socket. Add a `namelen_response: []u32` field, or compute from the stored Address at send time.
+
+Simpler approach: compute from the stored address:
+```zig
+.namelen = server.addrs_response[index].getOsSockLen(),
+```
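+
+Why the length has to come from the stored address can be seen from
+`getOsSockLen()` itself. A small sketch, assuming `std.net.Address` as shipped
+with Zig 0.15 (the test name is illustrative):
+
+```zig
+const std = @import("std");
+
+test "getOsSockLen depends on the address family" {
+    const v4 = try std.net.Address.parseIp("127.0.0.1", 5683);
+    const v6 = try std.net.Address.parseIp("::1", 5683);
+
+    // IPv4 peers need the sockaddr.in length, IPv6 peers the sockaddr.in6 length.
+    try std.testing.expectEqual(
+        @as(std.posix.socklen_t, @sizeOf(std.posix.sockaddr.in)),
+        v4.getOsSockLen(),
+    );
+    try std.testing.expectEqual(
+        @as(std.posix.socklen_t, @sizeOf(std.posix.sockaddr.in6)),
+        v6.getOsSockLen(),
+    );
+}
+```
+
+On a dual-stack socket both lengths can occur within one batch, so a single
+constant `namelen` would be wrong for one of the two families.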
+
+- [ ] **Step 5: Update rate limiting call site**
+
+Line ~805: Change `recv.peer_address.in.sa.addr` → `RateLimiter.AddrKey.fromAddress(recv.peer_address)`.
+
+The `allow()` call becomes `rl.allow(RateLimiter.AddrKey.fromAddress(recv.peer_address), ...)`.
+
+- [ ] **Step 6: Run tests**
+
+Run: `zig build test`
+Expected: All pass.
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add src/Server.zig
+git commit -m "server: IPv6-capable address storage, send path, recv buffer"
+```
+
+---
+
+### Task 5: Client.zig — remove AF_INET restriction
+
+**Files:**
+- Modify: `src/Client.zig:380-388`
+
+- [ ] **Step 1: Remove AF_INET check and hardcoded socket family**
+
+Line 381: Remove `if (dest.any.family != posix.AF.INET) return error.UnsupportedAddressFamily;`
+
+Line 383: Change `posix.AF.INET` → `dest.any.family`
+
+- [ ] **Step 2: Update Config doc comment**
+
+Line 37: Change `/// Server IPv4 address.` → `/// Server address (IPv4 or IPv6).`
+
+- [ ] **Step 3: Run tests**
+
+Run: `zig build test`
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add src/Client.zig
+git commit -m "client: support IPv6 host addresses"
+```
+
+---
+
+## Chunk 3: DTLS and Docs
+
+### Task 6: DTLS Cookie.zig and Session.zig
+
+**Files:**
+- Modify: `src/dtls/Cookie.zig:8-9`
+- Modify: `src/dtls/Session.zig:349-363`
+
+- [ ] **Step 1: Fix Cookie.zig address hashing**
+
+Replace line 9 (`@bitCast(client_addr.any)`) with family-aware hashing. Use the `addrBytes` helper pattern from exchange.zig, or import it:
+
+```zig
+const addr_bytes = switch (client_addr.any.family) {
+    posix.AF.INET => std.mem.asBytes(&client_addr.in),
+    posix.AF.INET6 => std.mem.asBytes(&client_addr.in6),
+    else => std.mem.asBytes(&client_addr.in),
+};
+// ...
+h.update(addr_bytes);
+```
+
+- [ ] **Step 2: Fix Session.zig addrHash and addrEqual**
+
+```zig
+fn addrHash(addr: std.net.Address) u64 {
+    const bytes = switch (addr.any.family) {
+        posix.AF.INET => std.mem.asBytes(&addr.in),
+        posix.AF.INET6 => std.mem.asBytes(&addr.in6),
+        else => std.mem.asBytes(&addr.in),
+    };
+    var hash: u64 = 0xcbf29ce484222325;
+    for (bytes) |b| {
+        hash ^= b;
+        hash *%= 0x100000001b3;
+    }
+    return hash;
+}
+
+fn addrEqual(a: std.net.Address, b: std.net.Address) bool {
+    if (a.any.family != b.any.family) return false;
+    return switch (a.any.family) {
+        posix.AF.INET => std.mem.eql(u8, std.mem.asBytes(&a.in), std.mem.asBytes(&b.in)),
+        posix.AF.INET6 => std.mem.eql(u8, std.mem.asBytes(&a.in6), std.mem.asBytes(&b.in6)),
+        else => false,
+    };
+}
+```
+
+- [ ] **Step 3: Add IPv6 tests for Cookie and Session**
+
+- [ ] **Step 4: Run tests**
+
+Run: `zig build test`
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/dtls/Cookie.zig src/dtls/Session.zig
+git commit -m "dtls: fix address truncation for IPv6 in Cookie and Session"
+```
+
+---
+
+### Task 7: Integration test, README, roadmap
+
+**Files:**
+- Modify: `src/Server.zig` tests — add IPv6 round-trip test
+- Modify: `README.md`
+- Modify: `docs/ROADMAP.md`
+
+- [ ] **Step 1: Add IPv6 loopback round-trip test**
+
+```zig
+test "round-trip: NON echo via IPv6 loopback" {
+    const port: u16 = 19715;
+
+    var server = Server.init(testing.allocator, .{
+        .port = port,
+        .bind_address = "::1",
+        .buffer_count = 8,
+        .buffer_size = 1280,
+    }, echo_handler) catch |err| switch (err) {
+        error.AddressNotAvailable => return, // skip if no IPv6
+        else => return err,
+    };
+    defer server.deinit();
+    try setup_for_test(&server);
+
+    // ... send NON via IPv6, verify response
+}
+```
+
+Note: the `test_client` helper needs updating to support IPv6 (use `parseIp` + family-based socket).
+
+- [ ] **Step 2: Update README roadmap checklist**
+
+Mark `[x] IPv6` in the roadmap section.
+
+- [ ] **Step 3: Update `docs/ROADMAP.md`**
+
+Change 1.4 status to `[x]` done.
+
+- [ ] **Step 4: Run full test suite and benchmarks**
+
+```bash
+zig build test
+zig build bench -Doptimize=ReleaseFast
+```
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/Server.zig README.md docs/ROADMAP.md
+git commit -m "ipv6: integration test, update README and roadmap"
+```
diff --git a/docs/superpowers/plans/2026-03-15-separate-responses.md b/docs/superpowers/plans/2026-03-15-separate-responses.md
new file mode 100644
index 0000000..448946f
--- /dev/null
+++ b/docs/superpowers/plans/2026-03-15-separate-responses.md
@@ -0,0 +1,479 @@
+# Separate (Delayed) Responses Implementation Plan
+
+> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Allow handlers to defer responses for slow operations. Server sends empty ACK immediately, then delivers the response later as a CON with retransmission.
+
+**Architecture:** Handler calls `deferResponse(req)` to capture context, returns `null` (existing empty ACK behavior). `sendResponse()` enqueues via lock-free MPSC queue. Tick loop drains queue, sends CON responses, drives retransmission. ACK/RST from client frees slots.
+
+**Tech Stack:** Zig 0.15, Linux io_uring, lock-free MPSC ring buffer, pre-allocated retransmission pool.
+
+**Spec:** `docs/superpowers/specs/2026-03-15-separate-responses-design.md`
+
+**Depends on:** IPv6 plan should be completed first (address type changes).
+
+---
+
+## File Map
+
+- **Create:** `src/mpsc.zig` — bounded lock-free MPSC ring buffer
+- **Create:** `src/separate.zig` — separate response retransmission pool
+- **Modify:** `src/Server.zig` — integration: deferResponse, sendResponse, tick additions, ACK/RST matching, atomic msg_id
+- **Modify:** `src/handler.zig` — DeferredCtx type
+- **Modify:** `src/root.zig` — re-export DeferredCtx, add separate/mpsc to test runner
+- **Modify:** `README.md` — document separate responses
+- **Modify:** `docs/ROADMAP.md` — mark 1.2 done
+
+---
+
+## Chunk 1: Data Structures
+
+### Task 1: MPSC ring buffer (`src/mpsc.zig`)
+
+**Files:**
+- Create: `src/mpsc.zig`
+
+A bounded, lock-free, multi-producer single-consumer ring buffer. Uses atomic CAS on head for producers and a publication flag per slot to prevent reading partial writes.
+
+- [ ] **Step 1: Define the Queue struct and Entry**
+
+```zig
+const std = @import("std");
+
+pub fn MpscQueue(comptime T: type) type {
+    return struct {
+        const Self = @This();
+
+        const Slot = struct {
+            data: T,
+            ready: std.atomic.Value(bool),
+        };
+
+        buffer: []Slot,
+        mask: u32,
+        head: std.atomic.Value(u32),  // producers reserve via CAS
+        tail: u32,                     // consumer only
+
+        pub fn init(allocator: std.mem.Allocator, capacity: u16) !Self {
+            // Round up to power of two.
+            var size: u32 = 1;
+            while (size < capacity) size <<= 1;
+            const buffer = try allocator.alloc(Slot, size);
+            for (buffer) |*slot| {
+                slot.ready = std.atomic.Value(bool).init(false);
+            }
+            return .{
+                .buffer = buffer,
+                .mask = size - 1,
+                .head = std.atomic.Value(u32).init(0),
+                .tail = 0,
+            };
+        }
+
+        pub fn deinit(self: *Self, allocator: std.mem.Allocator) void {
+            allocator.free(self.buffer);
+        }
+
+        /// Push an item (any thread). Returns error.Full if queue is at capacity.
+        pub fn push(self: *Self, item: T) error{Full}!void {
+            while (true) {
+                const head = self.head.load(.acquire);
+                const tail = @atomicLoad(u32, &self.tail, .acquire);
+                if (head -% tail >= self.mask + 1) return error.Full;
+                if (self.head.cmpxchgWeak(head, head +% 1, .acq_rel, .monotonic)) |_| {
+                    continue; // CAS failed, retry
+                }
+                const slot = &self.buffer[head & self.mask];
+                slot.data = item;
+                slot.ready.store(true, .release);
+                return;
+            }
+        }
+
+        /// Pop an item by value (consumer thread only). Returns null if empty.
+        pub fn pop(self: *Self) ?T {
+            const slot = &self.buffer[self.tail & self.mask];
+            if (!slot.ready.load(.acquire)) return null;
+            const item = slot.data; // copy out before a producer can reuse the slot
+            slot.ready.store(false, .release);
+            // Producers read `tail` with @atomicLoad, so publish it atomically.
+            @atomicStore(u32, &self.tail, self.tail +% 1, .release);
+            return item;
+        }
+
+        pub fn isEmpty(self: *const Self) bool {
+            return self.head.load(.acquire) == self.tail;
+        }
+    };
+}
+```
+
+- [ ] **Step 2: Write tests**
+
+```zig
+const testing = std.testing;
+
+test "single producer single consumer" {
+    var q = try MpscQueue(u32).init(testing.allocator, 4);
+    defer q.deinit(testing.allocator);
+
+    try q.push(42);
+    try q.push(99);
+    try testing.expectEqual(@as(u32, 42), q.pop().?);
+    try testing.expectEqual(@as(u32, 99), q.pop().?);
+    try testing.expect(q.pop() == null);
+}
+
+test "full queue returns error" {
+    var q = try MpscQueue(u32).init(testing.allocator, 2);
+    defer q.deinit(testing.allocator);
+
+    try q.push(1);
+    try q.push(2);
+    try testing.expectError(error.Full, q.push(3));
+}
+
+test "wrap around" {
+    var q = try MpscQueue(u32).init(testing.allocator, 2);
+    defer q.deinit(testing.allocator);
+
+    try q.push(1);
+    _ = q.pop();
+    try q.push(2);
+    try q.push(3);
+    try testing.expectEqual(@as(u32, 2), q.pop().?);
+    try testing.expectEqual(@as(u32, 3), q.pop().?);
+}
+```
+
+- [ ] **Step 3: Run tests**
+
+Run: `zig build test`
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add src/mpsc.zig
+git commit -m "mpsc: lock-free bounded MPSC ring buffer"
+```
+
+---
+
+### Task 2: Separate response pool (`src/separate.zig`)
+
+**Files:**
+- Create: `src/separate.zig`
+
+Pre-allocated pool for tracking outgoing CON responses pending ACK. Pattern follows exchange.zig: open-addressing hash table, intrusive free list.
+
+- [ ] **Step 1: Define the pool struct**
+
+```zig
+const std = @import("std");
+const constants = @import("constants.zig");
+
+const SeparatePool = @This();
+
+pub const Config = struct {
+    count: u16 = 16,
+    response_size: u16 = 1280,
+};
+
+pub const State = enum(u8) { free, pending };
+
+pub const Slot = struct {
+    state: State,
+    msg_id: u16,
+    peer: std.net.Address,
+    retransmit_count: u4,
+    next_retransmit_ns: i128,
+    timeout_ns: u64,
+    wire_len: u16,
+    is_dtls: bool,
+    session_idx: u16,
+    next_free: u16,
+};
+
+slots: []Slot,
+wire_buffer: []u8,
+table: []u16,
+table_mask: u16,
+free_head: u16,
+count_active: u16,
+config: Config,
+
+const empty_sentinel: u16 = 0xFFFF;
+```
+
+- [ ] **Step 2: Implement init, deinit, insert, find, remove, cached_wire**
+
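+Follow exchange.zig patterns exactly. The key difference: keyed on msg_id only (not peer+msg_id), since server-generated msg_ids are unique among in-flight responses (the 16-bit counter wraps, so this holds only while fewer than 65,536 IDs are issued within a slot's lifetime).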
+
+- [ ] **Step 3: Write tests**
+
+```zig
+test "init and deinit" { ... }
+test "insert and find by msg_id" { ... }
+test "remove frees slot" { ... }
+test "pool exhaustion returns null" { ... }
+```
+
+- [ ] **Step 4: Run tests**
+
+Run: `zig build test`
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/separate.zig
+git commit -m "separate: pre-allocated retransmission pool for deferred responses"
+```
+
+---
+
+## Chunk 2: Server Integration
+
+### Task 3: DeferredCtx type and Server integration
+
+**Files:**
+- Modify: `src/handler.zig` — add DeferredCtx struct
+- Modify: `src/Server.zig` — add state, deferResponse, sendResponse, tick additions
+- Modify: `src/root.zig` — re-export
+
+- [ ] **Step 1: Add DeferredCtx to handler.zig**
+
+```zig
+/// Context for a deferred (separate) response. Returned by
+/// `Server.deferResponse()`. Pass to `Server.sendResponse()` to
+/// deliver the late response from any thread.
+pub const DeferredCtx = struct {
+    peer: std.net.Address,
+    token: [8]u8,
+    token_len: u4, // CoAP tokens are 0-8 bytes; u3 cannot represent 8
+    is_dtls: bool,
+    session_idx: u16,
+};
+```
+
+- [ ] **Step 2: Convert `next_msg_id` to atomic in Server.zig**
+
+Change line 133: `next_msg_id: u16` → `next_msg_id: std.atomic.Value(u16)`
+
+Update `init()` to use `std.atomic.Value(u16).init(...)`.
+
+Update `nextMsgId()` to use `fetchAdd(1, .monotonic)`.
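+
+The wrap-around behaviour of the atomic counter can be sketched in isolation.
+This is an illustration under assumptions: the free function `nextMsgId` below
+is a hypothetical stand-in, not the method in `Server.zig`.
+
+```zig
+const std = @import("std");
+
+// Hypothetical stand-in for the server's message-ID counter.
+fn nextMsgId(counter: *std.atomic.Value(u16)) u16 {
+    // fetchAdd returns the previous value and wraps on overflow.
+    return counter.fetchAdd(1, .monotonic);
+}
+
+test "message IDs wrap at 0xFFFF" {
+    var counter = std.atomic.Value(u16).init(0xFFFF);
+    try std.testing.expectEqual(@as(u16, 0xFFFF), nextMsgId(&counter));
+    try std.testing.expectEqual(@as(u16, 0), nextMsgId(&counter));
+}
+```
+
+`.monotonic` ordering is enough here because the counter only has to hand out
+distinct values; it does not publish any other data.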
+
+- [ ] **Step 3: Add separate pool and MPSC queue state to Server**
+
+Add fields after existing state:
+
+```zig
+separate_pool: ?SeparatePool,
+separate_queue: ?mpsc.MpscQueue(SeparateEntry),
+```
+
+Where `SeparateEntry` holds the pre-encoded wire data + metadata:
+
+```zig
+const SeparateEntry = struct {
+    peer: std.net.Address,
+    wire_len: u16,
+    is_dtls: bool,
+    session_idx: u16,
+    wire: [1280]u8,
+};
+```
+
+Initialize in `init()` when `config.separate_response_count > 0`.
+
+- [ ] **Step 4: Implement `deferResponse()`**
+
+```zig
+pub fn deferResponse(server: *const Server, request: handler.Request) ?handler.DeferredCtx {
+    if (server.config.separate_response_count == 0) return null;
+    var ctx: handler.DeferredCtx = .{
+        .peer = request.peer_address,
+        .token = undefined,
+        .token_len = @intCast(request.packet.token.len),
+        .is_dtls = request.is_secure,
+        .session_idx = 0, // TODO: pass session index through Request
+    };
+    @memcpy(ctx.token[0..request.packet.token.len], request.packet.token);
+    return ctx;
+}
+```
+
+- [ ] **Step 5: Implement `sendResponse()` (thread-safe)**
+
+```zig
+pub fn sendResponse(
+    server: *Server,
+    ctx: handler.DeferredCtx,
+    response: handler.Response,
+) !void {
+    const queue = if (server.separate_queue) |*q| q else return error.SeparateResponsesDisabled;
+    const msg_id = server.next_msg_id.fetchAdd(1, .monotonic);
+
+    const pkt = coapz.Packet{
+        .kind = .confirmable,
+        .code = response.code,
+        .msg_id = msg_id,
+        .token = ctx.token[0..ctx.token_len],
+        .options = response.options,
+        .payload = response.payload,
+        .data_buf = &.{},
+    };
+
+    var entry: SeparateEntry = .{
+        .peer = ctx.peer,
+        .wire_len = 0,
+        .is_dtls = ctx.is_dtls,
+        .session_idx = ctx.session_idx,
+        .wire = undefined,
+    };
+
+    const wire = pkt.writeBuf(&entry.wire) catch return error.BufferTooSmall;
+    entry.wire_len = @intCast(wire.len);
+
+    queue.push(entry) catch return error.SeparatePoolFull;
+}
+```
+
+- [ ] **Step 6: Add tick loop additions**
+
+In `tick()`, after existing CQE processing and before load level update:
+
+```zig
+// Drain separate response queue.
+if (server.separate_queue) |*queue| {
+    while (queue.pop()) |entry| {
+        server.sendSeparateResponse(entry);
+    }
+}
+
+// Retransmit pending separate responses.
+if (server.separate_pool) |*pool| {
+    if (pool.count_active > 0) {
+        server.retransmitSeparateResponses();
+    }
+}
+```
+
+Implement `sendSeparateResponse()`: allocate pool slot, send wire data (encrypt for DTLS), start retransmit timer.
+
+Implement `retransmitSeparateResponses()`: scan pending slots, retransmit on timeout (re-encrypt for DTLS), free after max_retransmit.
+
+- [ ] **Step 7: Add ACK/RST matching for separate responses**
+
+In `handle_recv()`, after the existing RST handling (line ~832):
+
+```zig
+// ACK matches a pending separate response.
+if (packet.kind == .acknowledgement) {
+    if (server.separate_pool) |*pool| {
+        if (pool.find(packet.msg_id)) |slot_idx| {
+            pool.remove(slot_idx);
+        }
+    }
+    return;
+}
+```
+
+For RST (extend existing block): also check separate pool in addition to exchange pool.
+
+Same changes needed in `process_dtls_coap()`.
+
+- [ ] **Step 8: Run tests**
+
+Run: `zig build test`
+
+- [ ] **Step 9: Commit**
+
+```bash
+git add src/Server.zig src/handler.zig src/root.zig
+git commit -m "server: separate response support with MPSC queue and retransmission"
+```
+
+---
+
+### Task 4: Integration tests
+
+**Files:**
+- Modify: `src/Server.zig` tests
+
+- [ ] **Step 1: Test — deferred CON gets empty ACK, then separate CON response**
+
+```zig
+test "separate response: deferred CON gets empty ACK then CON response" {
+    // 1. Create server with a handler that defers
+    // 2. Send CON request
+    // 3. Verify empty ACK received (code=0.00)
+    // 4. Call server.sendResponse() with the deferred context
+    // 5. Tick the server
+    // 6. Verify CON response received (new msg_id, same token, code=2.05)
+    // 7. Send ACK for the CON response
+    // 8. Tick — verify separate pool slot freed
+}
+```
+
+- [ ] **Step 2: Test — separate response retransmission**
+
+```zig
+test "separate response: retransmits CON if no ACK" {
+    // 1. Send deferred response
+    // 2. Don't ACK
+    // 3. Tick multiple times past retransmit timeout
+    // 4. Verify retransmission received (same msg_id, same payload)
+}
+```
+
+- [ ] **Step 3: Test — RST cancels separate response**
+
+```zig
+test "separate response: RST cancels retransmission" {
+    // 1. Send deferred response
+    // 2. Client sends RST with matching msg_id
+    // 3. Verify pool slot freed, no more retransmissions
+}
+```
+
+- [ ] **Step 4: Run all tests and benchmarks**
+
+```bash
+zig build test
+zig build bench -Doptimize=ReleaseFast
+```
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/Server.zig
+git commit -m "separate: integration tests for deferred responses"
+```
+
+---
+
+### Task 5: Documentation and roadmap
+
+**Files:**
+- Modify: `README.md`
+- Modify: `docs/ROADMAP.md`
+
+- [ ] **Step 1: Add separate responses section to README**
+
+In the Server features list, add:
+```
+- Separate (delayed) responses for async handlers (RFC 7252 §5.2.2)
+```
+
+Add a "Separate Responses" section in the handler docs with usage example.
+
+Document `separate_response_count` config field.
+
+- [ ] **Step 2: Update roadmap**
+
+Mark 1.2 as `[x]` done.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add README.md docs/ROADMAP.md
+git commit -m "docs: separate responses in README and roadmap"
+```

From be570b552fb0c1ea51afdc01e266a768360f6f30 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 14:15:55 +0100
Subject: [PATCH 05/15] exchange: family-aware address hashing for IPv6

---
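Note on why the hash has to be family-aware: `address.any` is a 16-byte
`sockaddr`, while `sockaddr_in6` is 28 bytes. The first 16 bytes of a
`sockaddr_in6` hold the family, port, flowinfo, and only the upper 64 bits of
the address, so two IPv6 peers that differ in the interface identifier would
hash identically. A minimal sketch of that effect follows. `familyBytes` is a
hypothetical stand-in, not the helper added by this patch, and the sizes assume
the usual Linux sockaddr layout.

```zig
const std = @import("std");
const posix = std.posix;

/// Hypothetical helper: the sockaddr bytes that are meaningful for the
/// address family. It takes a pointer so the returned slice refers to the
/// caller's storage rather than to a by-value copy.
fn familyBytes(address: *const std.net.Address) []const u8 {
    return switch (address.any.family) {
        posix.AF.INET6 => std.mem.asBytes(&address.in6.sa),
        else => std.mem.asBytes(&address.in.sa),
    };
}

test "first 16 bytes do not distinguish IPv6 peers in the same /64" {
    const a = try std.net.Address.parseIp("2001:db8::1", 5683);
    const b = try std.net.Address.parseIp("2001:db8::2", 5683);

    // A 16-byte view sees the same bytes for both peers.
    try std.testing.expect(std.mem.eql(u8, familyBytes(&a)[0..16], familyBytes(&b)[0..16]));
    // The full sockaddr_in6 tells them apart.
    try std.testing.expect(!std.mem.eql(u8, familyBytes(&a), familyBytes(&b)));
}
```

The pointer parameter is a deliberate choice in this sketch: returning a slice
into a by-value parameter would leave the caller with a slice into storage that
is no longer valid.
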
 src/exchange.zig | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/src/exchange.zig b/src/exchange.zig
index 0e01283..ffc5d52 100644
--- a/src/exchange.zig
+++ b/src/exchange.zig
@@ -5,6 +5,7 @@
 /// table keyed on (peer_address, message_id) for O(1) lookups and an
 /// intrusive free list for O(1) slot allocation.
 const std = @import("std");
+const posix = std.posix;
 const constants = @import("constants.zig");
 
 const Exchange = @This();
@@ -112,9 +113,8 @@ pub fn deinit(exchange: *Exchange, allocator: std.mem.Allocator) void {
 
 /// Hash peer address only (no message ID) for peer-based eviction.
 pub fn addr_hash(address: std.net.Address) u32 {
-    const addr_bytes: [16]u8 = @bitCast(address.any);
     var hash: u32 = 0x811c9dc5; // FNV-1a 32-bit offset basis
-    for (addr_bytes) |b| {
+    for (addrBytes(address)) |b| {
         hash ^= b;
         hash *%= 0x01000193; // FNV-1a 32-bit prime
     }
@@ -123,10 +123,8 @@ pub fn addr_hash(address: std.net.Address) u32 {
 
 /// Compute a hash key from peer address and message ID.
 pub fn peer_key(address: std.net.Address, message_id: u16) u64 {
-    // Use the raw sockaddr bytes for hashing.
-    const addr_bytes: [16]u8 = @bitCast(address.any);
     var hash: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
-    for (addr_bytes) |b| {
+    for (addrBytes(address)) |b| {
         hash ^= b;
         hash *%= 0x100000001b3; // FNV-1a prime
     }
@@ -135,6 +133,15 @@ pub fn peer_key(address: std.net.Address, message_id: u16) u64 {
     return hash;
 }
 
+/// Extract the relevant address bytes for hashing (family-aware).
+fn addrBytes(address: std.net.Address) []const u8 {
+    return switch (address.any.family) {
+        posix.AF.INET => std.mem.asBytes(&address.in),
+        posix.AF.INET6 => std.mem.asBytes(&address.in6),
+        else => std.mem.asBytes(&address.in),
+    };
+}
+
 /// Look up an exchange by peer address and message ID.
 /// Returns the slot index if found, null otherwise.
 pub fn find(exchange: *const Exchange, key: u64) ?u16 {
@@ -546,3 +553,21 @@ test "evict_peer removes all exchanges for a peer" {
     try testing.expectEqual(@as(u16, 1), pool.count_active);
     try testing.expect(pool.find(Exchange.peer_key(addr_b, 3)) != null);
 }
+
+test "addr_hash differentiates IPv4 and IPv6" {
+    const v4 = try std.net.Address.parseIp("127.0.0.1", 5683);
+    const v6 = try std.net.Address.parseIp("::1", 5683);
+    try testing.expect(Exchange.addr_hash(v4) != Exchange.addr_hash(v6));
+}
+
+test "peer_key differentiates IPv4 and IPv6" {
+    const v4 = try std.net.Address.parseIp("127.0.0.1", 5683);
+    const v6 = try std.net.Address.parseIp("::1", 5683);
+    try testing.expect(Exchange.peer_key(v4, 0x1234) != Exchange.peer_key(v6, 0x1234));
+}
+
+test "addr_hash different IPv6 addresses produce different hashes" {
+    const a = try std.net.Address.parseIp("::1", 5683);
+    const b = try std.net.Address.parseIp("fe80::1", 5683);
+    try testing.expect(Exchange.addr_hash(a) != Exchange.addr_hash(b));
+}

From 8112144a4c0d42462ab8959a0b90e23ca3b32097 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 14:17:41 +0100
Subject: [PATCH 06/15] rate_limiter: AddrKey for IPv4/IPv6-agnostic rate
 limiting

---
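Note on the key shape: the key holds the family plus 16 address bytes, with
IPv4 addresses zero-padded, and it leaves the port out so every socket on one
host shares a bucket. The sketch below shows that property in isolation. `Key`
is a reduced, hypothetical stand-in for `AddrKey` written for this note, not a
copy of the code in the patch.

```zig
const std = @import("std");
const posix = std.posix;

/// Hypothetical, reduced stand-in for AddrKey: family plus 16 address bytes.
const Key = struct {
    family: u16,
    addr: [16]u8,

    fn fromAddress(address: std.net.Address) Key {
        var key: Key = .{ .family = address.any.family, .addr = .{0} ** 16 };
        switch (address.any.family) {
            // IPv4: the 4 address bytes go first, the rest stays zero.
            posix.AF.INET => @memcpy(key.addr[0..4], std.mem.asBytes(&address.in.sa.addr)),
            // IPv6: all 16 address bytes are used as-is.
            posix.AF.INET6 => key.addr = address.in6.sa.addr,
            // Unknown family: collapse to an all-zero key.
            else => key.family = 0,
        }
        return key;
    }

    fn eql(a: Key, b: Key) bool {
        return a.family == b.family and std.mem.eql(u8, &a.addr, &b.addr);
    }
};

test "port is not part of the key" {
    const a = Key.fromAddress(try std.net.Address.parseIp("192.0.2.1", 5683));
    const b = Key.fromAddress(try std.net.Address.parseIp("192.0.2.1", 40000));
    const c = Key.fromAddress(try std.net.Address.parseIp("2001:db8::1", 5683));

    try std.testing.expect(a.eql(b)); // same host, different port
    try std.testing.expect(!a.eql(c)); // different family and address
}
```

One consequence worth keeping in mind: on a dual-stack socket, IPv4 clients
normally arrive as IPv4-mapped IPv6 addresses, so they would be keyed under
`AF.INET6` rather than `AF.INET`.
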
 src/Server.zig       |   4 +-
 src/rate_limiter.zig | 137 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 105 insertions(+), 36 deletions(-)

diff --git a/src/Server.zig b/src/Server.zig
index 79c4d05..d761829 100644
--- a/src/Server.zig
+++ b/src/Server.zig
@@ -802,8 +802,8 @@ fn handle_recv(
     // Rate limiting in throttled mode.
     if (server.load_level == .throttled) {
         if (server.rate_limiter) |*rl| {
-            const ip = recv.peer_address.in.sa.addr;
-            if (!rl.allow(ip, server.tick_now_ns)) {
+            const addr_key = RateLimiter.AddrKey.fromAddress(recv.peer_address);
+            if (!rl.allow(addr_key, server.tick_now_ns)) {
                 const is_con_raw = recv.payload.len >= 1 and ((recv.payload[0] >> 4) & 0x03) == 0;
                 release_buffer_robust(&server.io, recv.buffer_id);
                 server.buffers_outstanding -|= 1;
diff --git a/src/rate_limiter.zig b/src/rate_limiter.zig
index 5746fad..5acbda6 100644
--- a/src/rate_limiter.zig
+++ b/src/rate_limiter.zig
@@ -3,6 +3,7 @@
 /// Pattern follows exchange.zig: linear probing, backward-shift deletion,
 /// intrusive free list. Clock-hand eviction when the table is full.
 const std = @import("std");
+const posix = std.posix;
 
 const RateLimiter = @This();
 
@@ -15,10 +16,56 @@ pub const Config = struct {
     burst: u16 = 200,
 };
 
+/// Family-aware address key for IPv4/IPv6-agnostic rate limiting.
+/// Port is excluded — multiple connections from the same IP share a bucket.
+pub const AddrKey = struct {
+    family: u16,
+    addr: [16]u8,
+
+    pub const zero: AddrKey = .{ .family = 0, .addr = .{0} ** 16 };
+
+    pub fn fromAddress(address: std.net.Address) AddrKey {
+        return switch (address.any.family) {
+            posix.AF.INET => .{
+                .family = posix.AF.INET,
+                .addr = blk: {
+                    var a: [16]u8 = .{0} ** 16;
+                    const src: [4]u8 = @bitCast(address.in.sa.addr);
+                    @memcpy(a[0..4], &src);
+                    break :blk a;
+                },
+            },
+            posix.AF.INET6 => .{
+                .family = posix.AF.INET6,
+                .addr = address.in6.sa.addr,
+            },
+            else => zero,
+        };
+    }
+
+    pub fn eql(a: AddrKey, b: AddrKey) bool {
+        return a.family == b.family and std.mem.eql(u8, &a.addr, &b.addr);
+    }
+
+    pub fn hash(self: AddrKey) u64 {
+        var h: u64 = 0xcbf29ce484222325;
+        const fam_bytes: [2]u8 = @bitCast(self.family);
+        for (fam_bytes) |b| {
+            h ^= b;
+            h *%= 0x100000001b3;
+        }
+        for (self.addr) |b| {
+            h ^= b;
+            h *%= 0x100000001b3;
+        }
+        return h;
+    }
+};
+
 const State = enum(u8) { free, active };
 
 const Slot = struct {
-    ip_addr: u32,
+    addr_key: AddrKey,
     tokens: u16,
     last_refill_ns: i64,
     state: State,
@@ -56,7 +103,7 @@ pub fn init(allocator: std.mem.Allocator, config: Config) !RateLimiter {
 
     for (slots, 0..) |*slot, i| {
         slot.* = .{
-            .ip_addr = 0,
+            .addr_key = AddrKey.zero,
             .tokens = 0,
             .last_refill_ns = 0,
             .state = .free,
@@ -84,13 +131,13 @@ pub fn deinit(self: *RateLimiter, allocator: std.mem.Allocator) void {
     allocator.free(self.table);
 }
 
-/// Check if a packet from `ip_addr` is allowed. Deducts one token.
+/// Check if a packet from `addr_key` is allowed. Deducts one token.
 /// Returns true if allowed, false if rate-limited.
-pub fn allow(self: *RateLimiter, ip_addr: u32, now_ns: i64) bool {
-    const key = hash_ip(ip_addr);
+pub fn allow(self: *RateLimiter, addr_key: AddrKey, now_ns: i64) bool {
+    const key = addr_key.hash();
 
     // Look up existing entry.
-    if (self.find_slot(key, ip_addr)) |slot_idx| {
+    if (self.find_slot(key, addr_key)) |slot_idx| {
         const slot = &self.slots[slot_idx];
         self.refill(slot, now_ns);
         if (slot.tokens == 0) return false;
@@ -99,7 +146,7 @@ pub fn allow(self: *RateLimiter, ip_addr: u32, now_ns: i64) bool {
     }
 
     // New IP — allocate a slot.
-    const slot_idx = self.allocate_slot(key, ip_addr, now_ns) orelse {
+    const slot_idx = self.allocate_slot(key, addr_key, now_ns) orelse {
         // Table completely full with no eviction candidate.
         // Fail open: allow the packet rather than DoS everyone.
         return true;
@@ -113,7 +160,7 @@ pub fn allow(self: *RateLimiter, ip_addr: u32, now_ns: i64) bool {
 pub fn reset(self: *RateLimiter) void {
     for (self.slots) |*slot| {
         slot.state = .free;
-        slot.ip_addr = 0;
+        slot.addr_key = AddrKey.zero;
         slot.tokens = 0;
         slot.last_refill_ns = 0;
     }
@@ -131,24 +178,13 @@ pub fn reset(self: *RateLimiter) void {
 
 // ── Internal ──
 
-fn hash_ip(ip: u32) u64 {
-    // FNV-1a on 4 bytes.
-    var h: u64 = 0xcbf29ce484222325;
-    const bytes: [4]u8 = @bitCast(ip);
-    for (bytes) |b| {
-        h ^= b;
-        h *%= 0x100000001b3;
-    }
-    return h;
-}
-
-fn find_slot(self: *const RateLimiter, key: u64, ip_addr: u32) ?u16 {
+fn find_slot(self: *const RateLimiter, key: u64, addr_key: AddrKey) ?u16 {
     var idx: u16 = @intCast(@as(u32, @truncate(key)) & self.table_mask);
     var probes: u16 = 0;
     while (probes <= self.table_mask) : (probes += 1) {
         const slot_idx = self.table[idx];
         if (slot_idx == empty_sentinel) return null;
-        if (self.slots[slot_idx].ip_addr == ip_addr and self.slots[slot_idx].state == .active) {
+        if (self.slots[slot_idx].addr_key.eql(addr_key) and self.slots[slot_idx].state == .active) {
             return slot_idx;
         }
         idx = (idx + 1) & self.table_mask;
@@ -170,7 +206,7 @@ fn refill(self: *const RateLimiter, slot: *Slot, now_ns: i64) void {
     }
 }
 
-fn allocate_slot(self: *RateLimiter, key: u64, ip_addr: u32, now_ns: i64) ?u16 {
+fn allocate_slot(self: *RateLimiter, key: u64, addr_key: AddrKey, now_ns: i64) ?u16 {
     var slot_idx: u16 = undefined;
 
     if (self.free_head != empty_sentinel) {
@@ -184,7 +220,7 @@ fn allocate_slot(self: *RateLimiter, key: u64, ip_addr: u32, now_ns: i64) ?u16 {
 
     const slot = &self.slots[slot_idx];
     slot.* = .{
-        .ip_addr = ip_addr,
+        .addr_key = addr_key,
         .tokens = 0,
         .last_refill_ns = now_ns,
         .state = .active,
@@ -237,7 +273,7 @@ fn evict(self: *RateLimiter) ?u16 {
 
 fn remove_slot(self: *RateLimiter, slot_idx: u16) void {
     const slot = &self.slots[slot_idx];
-    const key = hash_ip(slot.ip_addr);
+    const key = slot.addr_key.hash();
     slot.state = .free;
 
     // Remove from hash table with backward-shift deletion.
@@ -258,7 +294,7 @@ fn rehash_after_remove(self: *RateLimiter, removed_idx: u16) void {
     while (self.table[idx] != empty_sentinel) {
         const si = self.table[idx];
         const desired: u16 = @intCast(
-            @as(u32, @truncate(hash_ip(self.slots[si].ip_addr))) & self.table_mask,
+            @as(u32, @truncate(self.slots[si].addr_key.hash())) & self.table_mask,
         );
         if (wrapping_distance(desired, idx, self.table_mask) >=
             wrapping_distance(desired, gap, self.table_mask))
@@ -279,6 +315,17 @@ fn wrapping_distance(from: u16, to: u16, mask: u16) u16 {
 
 const testing = std.testing;
 
+fn ipv4Key(comptime ip: u32) AddrKey {
+    return .{
+        .family = posix.AF.INET,
+        .addr = blk: {
+            var a: [16]u8 = .{0} ** 16;
+            a[0..4].* = @bitCast(ip);
+            break :blk a;
+        },
+    };
+}
+
 test "init and deinit" {
     var rl = try RateLimiter.init(testing.allocator, .{
         .ip_count = 8,
@@ -296,8 +343,8 @@ test "allow basic" {
     });
     defer rl.deinit(testing.allocator);
 
-    const ip: u32 = 0x7F000001; // 127.0.0.1
-    const now: i64 = 1_000_000_000; // 1 second
+    const ip = ipv4Key(0x7F000001);
+    const now: i64 = 1_000_000_000;
 
     // First 5 requests should succeed (burst=5).
     for (0..5) |_| {
@@ -315,7 +362,7 @@ test "token refill over time" {
     });
     defer rl.deinit(testing.allocator);
 
-    const ip: u32 = 0x7F000001;
+    const ip = ipv4Key(0x7F000001);
     var now: i64 = 1_000_000_000;
 
     // Exhaust tokens.
@@ -340,8 +387,8 @@ test "multiple IPs independent" {
     });
     defer rl.deinit(testing.allocator);
 
-    const ip1: u32 = 0x01020304;
-    const ip2: u32 = 0x05060708;
+    const ip1 = ipv4Key(0x01020304);
+    const ip2 = ipv4Key(0x05060708);
     const now: i64 = 1_000_000_000;
 
     // Each IP gets its own bucket.
@@ -365,11 +412,11 @@ test "eviction on table full" {
     const now: i64 = 1_000_000_000;
 
     // Fill the table with 2 IPs.
-    try testing.expect(rl.allow(0x01020301, now));
-    try testing.expect(rl.allow(0x01020302, now));
+    try testing.expect(rl.allow(ipv4Key(0x01020301), now));
+    try testing.expect(rl.allow(ipv4Key(0x01020302), now));
 
     // Third IP should trigger eviction and still succeed.
-    try testing.expect(rl.allow(0x01020303, now));
+    try testing.expect(rl.allow(ipv4Key(0x01020303), now));
 }
 
 test "reset clears all state" {
@@ -380,7 +427,7 @@ test "reset clears all state" {
     });
     defer rl.deinit(testing.allocator);
 
-    const ip: u32 = 0x7F000001;
+    const ip = ipv4Key(0x7F000001);
     const now: i64 = 1_000_000_000;
 
     try testing.expect(rl.allow(ip, now));
@@ -401,3 +448,25 @@ test "disabled config returns error" {
         .{ .ip_count = 0, .tokens_per_sec = 10, .burst = 20 },
     ));
 }
+
+test "IPv6 addresses independent from IPv4" {
+    var rl = try RateLimiter.init(testing.allocator, .{
+        .ip_count = 8,
+        .tokens_per_sec = 10,
+        .burst = 2,
+    });
+    defer rl.deinit(testing.allocator);
+
+    const v4 = AddrKey.fromAddress(try std.net.Address.parseIp("127.0.0.1", 5683));
+    const v6 = AddrKey.fromAddress(try std.net.Address.parseIp("::1", 5683));
+    const now: i64 = 1_000_000_000;
+
+    try testing.expect(rl.allow(v4, now));
+    try testing.expect(rl.allow(v4, now));
+    try testing.expect(!rl.allow(v4, now)); // exhausted
+
+    // IPv6 has its own bucket
+    try testing.expect(rl.allow(v6, now));
+    try testing.expect(rl.allow(v6, now));
+    try testing.expect(!rl.allow(v6, now));
+}

From 1f5559d7d03c5bc8fe70ff4b4510d25d4c1d5fac Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 14:20:20 +0100
Subject: [PATCH 07/15] io: IPv6 socket support, dual-stack, family-aware
 decode_recv

---
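Note on decoding the peer address: the family sits in the first two bytes of
the name area, in host byte order on Linux. The sketch below rebuilds a
`std.net.Address` from raw name bytes by copying instead of pointer-casting,
which avoids any alignment assumption about the buffer. `peerFromName` is a
hypothetical helper for illustration, it is Linux-only, and it is not the code
in this patch.

```zig
const std = @import("std");
const builtin = @import("builtin");
const linux = std.os.linux;

/// Hypothetical helper: rebuild an Address from raw recvmsg name bytes.
fn peerFromName(name: []const u8) !std.net.Address {
    if (name.len < 2) return error.UnsupportedAddressFamily;
    // sa_family is stored in host byte order, so read it with native endianness.
    const family = std.mem.readInt(u16, name[0..2], builtin.cpu.arch.endian());
    switch (family) {
        linux.AF.INET => {
            if (name.len < @sizeOf(linux.sockaddr.in)) return error.UnsupportedAddressFamily;
            var sa: linux.sockaddr.in = undefined;
            @memcpy(std.mem.asBytes(&sa), name[0..@sizeOf(linux.sockaddr.in)]);
            return .{ .in = .{ .sa = sa } };
        },
        linux.AF.INET6 => {
            if (name.len < @sizeOf(linux.sockaddr.in6)) return error.UnsupportedAddressFamily;
            var sa: linux.sockaddr.in6 = undefined;
            @memcpy(std.mem.asBytes(&sa), name[0..@sizeOf(linux.sockaddr.in6)]);
            return .{ .in6 = .{ .sa = sa } };
        },
        else => return error.UnsupportedAddressFamily,
    }
}

test "round-trips an IPv6 sockaddr" {
    const v6 = try std.net.Address.parseIp("::1", 5683);
    const got = try peerFromName(std.mem.asBytes(&v6.in6.sa));
    try std.testing.expect(got.eql(v6));
    try std.testing.expectEqual(@as(u16, 5683), got.getPort());
}
```

The length checks are part of the sketch's own design: they reject a name area
that is too short for the family it claims before any bytes are copied.
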
 src/Io.zig | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/src/Io.zig b/src/Io.zig
index bab4ca5..dd8762e 100644
--- a/src/Io.zig
+++ b/src/Io.zig
@@ -74,21 +74,21 @@ pub fn deinit(io: *Io, allocator: std.mem.Allocator) void {
 /// Bind a UDP socket and provide buffers to the kernel pool.
 pub fn setup(io: *Io, port: u16, bind_address: []const u8) !void {
     const address = try std.net.Address.parseIp(bind_address, port);
+    const family = address.any.family;
 
-    // Only IPv4 is supported; IPv6 requires larger sockaddr buffers
-    // throughout the recv/send paths.
-    if (address.any.family != posix.AF.INET) return error.UnsupportedAddressFamily;
-
-    const fd = try posix.socket(
-        posix.AF.INET,
-        posix.SOCK.DGRAM,
-        0,
-    );
+    const fd = try posix.socket(family, posix.SOCK.DGRAM, 0);
     io.fd_socket = fd;
 
     try posix.setsockopt(fd, posix.SOL.SOCKET, posix.SO.REUSEADDR, &std.mem.toBytes(@as(c_int, 1)));
     try posix.setsockopt(fd, posix.SOL.SOCKET, posix.SO.REUSEPORT, &std.mem.toBytes(@as(c_int, 1)));
 
+    // Enable dual-stack for IPv6 sockets (accept both v4 and v6 clients).
+    if (family == posix.AF.INET6) {
+        posix.setsockopt(fd, linux.IPPROTO.IPV6, linux.IPV6.V6ONLY, &std.mem.toBytes(@as(c_int, 0))) catch |err| {
+            log.debug("IPV6_V6ONLY: {}", .{err});
+        };
+    }
+
     // Increase socket buffers for throughput.
     const buf_size = std.mem.toBytes(@as(c_int, 4 * 1024 * 1024));
     posix.setsockopt(fd, posix.SOL.SOCKET, posix.SO.SNDBUF, &buf_size) catch |err| {
@@ -202,14 +202,15 @@ pub fn decode_recv(io: *const Io, cqe: *const Cqe) !RecvResult {
         return error.PayloadOutOfBounds;
     }
 
-    const peer_addr: *linux.sockaddr.in =
-        @ptrCast(@alignCast(io.buffers.ptr + name_offset));
+    // Determine address family from the first 2 bytes of the name area.
+    const peer_family = std.mem.readInt(u16, io.buffers[name_offset..][0..2], .little);
 
-    // Port from sockaddr is in network byte order; initIp4 expects host order.
-    const net_address = std.net.Address.initIp4(
-        @bitCast(peer_addr.addr),
-        std.mem.bigToNative(u16, peer_addr.port),
-    );
+    const net_address: std.net.Address = if (peer_family == posix.AF.INET)
+        .{ .in = .{ .sa = @as(*const linux.sockaddr.in, @ptrCast(@alignCast(io.buffers.ptr + name_offset))).* } }
+    else if (peer_family == posix.AF.INET6)
+        .{ .in6 = .{ .sa = @as(*const linux.sockaddr.in6, @ptrCast(@alignCast(io.buffers.ptr + name_offset))).* } }
+    else
+        return error.UnsupportedAddressFamily;
 
     return .{
         .peer_address = net_address,

From 1bb1e6aec427f874cef48a841ca362f54e82be7e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 14:40:23 +0100
Subject: [PATCH 08/15] server: IPv6-capable address storage, send path, recv
 buffer

---
 src/Server.zig | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/Server.zig b/src/Server.zig
index d761829..d71e17c 100644
--- a/src/Server.zig
+++ b/src/Server.zig
@@ -112,7 +112,7 @@ exchange_lifetime_ms: u32,
 running: std.atomic.Value(bool),
 
 // Pre-allocated per-CQE response state.
-addrs_response: []linux.sockaddr,
+addrs_response: []std.net.Address,
 msgs_response: []linux.msghdr_const,
 iovs_response: []posix.iovec,
 buffer_response: []u8,
@@ -122,7 +122,7 @@ buffer_response: []u8,
 emergency_ack: []u8,
 
 // Recv state.
-addr_recv: linux.sockaddr,
+addr_recv: linux.sockaddr.in6,
 msg_recv: linux.msghdr,
 
 // Eviction timer.
@@ -268,7 +268,7 @@ fn init_raw(
     );
 
     const addrs_response = try allocator.alloc(
-        linux.sockaddr,
+        std.net.Address,
         batch,
     );
     errdefer allocator.free(addrs_response);
@@ -338,7 +338,7 @@ fn init_raw(
         .iovs_response = iovs_response,
         .buffer_response = buffer_response,
         .emergency_ack = emergency_ack,
-        .addr_recv = std.mem.zeroes(linux.sockaddr),
+        .addr_recv = std.mem.zeroes(linux.sockaddr.in6),
         .msg_recv = std.mem.zeroes(linux.msghdr),
         .last_eviction_ns = 0,
         .tick_count = 0,
@@ -387,8 +387,10 @@ pub fn deinit(server: *Server) void {
 pub fn listen(server: *Server) !void {
     try server.io.setup(server.config.port, server.config.bind_address);
 
-    server.msg_recv.name = &server.addr_recv;
-    server.msg_recv.namelen = @sizeOf(linux.sockaddr);
+    server.msg_recv.name = @ptrCast(&server.addr_recv);
+    // Set name buffer size based on actual socket family.
+    const bind_addr = try std.net.Address.parseIp(server.config.bind_address, server.config.port);
+    server.msg_recv.namelen = bind_addr.getOsSockLen();
     server.msg_recv.controllen = 0;
 
     try server.io.recv_multishot(&server.msg_recv);
@@ -1632,7 +1634,7 @@ fn send_raw(
     peer_address: std.net.Address,
     index: usize,
 ) !void {
-    server.addrs_response[index] = peer_address.any;
+    server.addrs_response[index] = peer_address;
 
     server.iovs_response[index] = .{
         .base = @ptrCast(@constCast(data.ptr)),
@@ -1641,7 +1643,7 @@ fn send_raw(
 
     server.msgs_response[index] = .{
         .name = @ptrCast(&server.addrs_response[index]),
-        .namelen = @sizeOf(linux.sockaddr),
+        .namelen = peer_address.getOsSockLen(),
         .iov = @ptrCast(&server.iovs_response[index]),
         .iovlen = 1,
         .control = null,

From b68eb62a6adb7ac6c977bac16fe62376ee5737cc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 14:41:43 +0100
Subject: [PATCH 09/15] client: support IPv6 host addresses

---
 src/Client.zig | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/Client.zig b/src/Client.zig
index 427587d..20fb28b 100644
--- a/src/Client.zig
+++ b/src/Client.zig
@@ -34,7 +34,7 @@ const Client = @This();
 
 /// Client configuration. All fields have sensible defaults.
 pub const Config = struct {
-    /// Server IPv4 address. Default: `"127.0.0.1"`.
+    /// Server address (IPv4 or IPv6). Default: `"127.0.0.1"`.
     host: []const u8 = "127.0.0.1",
     /// Server UDP port. Default: 5683 (CoAP standard).
     port: u16 = constants.port_default,
@@ -369,7 +369,7 @@ const empty_sentinel: u16 = 0xFFFF;
 /// by the client and freed in `deinit()`.
 ///
 /// Returns `error.InvalidConfig` if `max_in_flight` is 0 or `buffer_size` < 64.
-/// Returns `error.UnsupportedAddressFamily` for non-IPv4 addresses.
+/// Supports both IPv4 and IPv6 addresses.
 pub fn init(allocator: std.mem.Allocator, config: Config) !Client {
     if (config.max_in_flight == 0) return error.InvalidConfig;
     if (config.buffer_size < 64) return error.InvalidConfig;
@@ -383,9 +383,8 @@ pub fn init(allocator: std.mem.Allocator, config: Config) !Client {
     }
 
     const dest = try std.net.Address.parseIp(effective_config.host, effective_config.port);
-    if (dest.any.family != posix.AF.INET) return error.UnsupportedAddressFamily;
 
-    const fd = try posix.socket(posix.AF.INET, posix.SOCK.DGRAM | posix.SOCK.NONBLOCK, 0);
+    const fd = try posix.socket(dest.any.family, posix.SOCK.DGRAM | posix.SOCK.NONBLOCK, 0);
     errdefer posix.close(fd);
     try posix.connect(fd, &dest.any, dest.getOsSockLen());
 

From 12e0eb0d96d8d608a73b2ac9ec51ed1fe2e04d58 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 14:45:13 +0100
Subject: [PATCH 10/15] dtls: family-aware address hashing in Cookie and
 Session for IPv6

---
 src/dtls/Cookie.zig  | 10 ++++++++--
 src/dtls/Session.zig | 18 +++++++++++++-----
 src/exchange.zig     |  6 +++---
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/src/dtls/Cookie.zig b/src/dtls/Cookie.zig
index e1bd306..027e0b1 100644
--- a/src/dtls/Cookie.zig
+++ b/src/dtls/Cookie.zig
@@ -2,14 +2,20 @@
 ///
 /// Cookie = HMAC-SHA256(server_secret, client_addr_bytes || client_random)
 const std = @import("std");
+const posix = std.posix;
 const HmacSha256 = std.crypto.auth.hmac.sha2.HmacSha256;
 
 /// Generate a stateless DTLS cookie.
 pub fn generate(server_secret: [32]u8, client_addr: std.net.Address, client_random: [32]u8) [32]u8 {
-    const addr_bytes: [16]u8 = @bitCast(client_addr.any);
     var mac: [32]u8 = undefined;
     var h = HmacSha256.init(&server_secret);
-    h.update(&addr_bytes);
+    // Hash the full sockaddr for the active family. Use pointer to
+    // client_addr directly (not a copy) to avoid dangling slice.
+    switch (client_addr.any.family) {
+        posix.AF.INET => h.update(std.mem.asBytes(&client_addr.in)),
+        posix.AF.INET6 => h.update(std.mem.asBytes(&client_addr.in6)),
+        else => h.update(std.mem.asBytes(&client_addr.in)),
+    }
     h.update(&client_random);
     h.final(&mac);
     return mac;
diff --git a/src/dtls/Session.zig b/src/dtls/Session.zig
index e85bdd2..67c0f0c 100644
--- a/src/dtls/Session.zig
+++ b/src/dtls/Session.zig
@@ -3,6 +3,7 @@
 /// Pre-allocated open-addressed hash table with intrusive LRU doubly-linked
 /// list and free list. Provides O(1) lookup, allocation, and eviction.
 const std = @import("std");
+const posix = std.posix;
 const Sha256 = std.crypto.hash.sha2.Sha256;
 
 pub const State = enum(u8) {
@@ -347,9 +348,13 @@ pub const SessionTable = struct {
 };
 
 fn addrHash(addr: std.net.Address) u64 {
-    const addr_bytes: [16]u8 = @bitCast(addr.any);
+    const bytes = switch (addr.any.family) {
+        posix.AF.INET => std.mem.asBytes(&addr.in),
+        posix.AF.INET6 => std.mem.asBytes(&addr.in6),
+        else => std.mem.asBytes(&addr.in),
+    };
     var hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
-    for (addr_bytes) |b| {
+    for (bytes) |b| {
         hash ^= b;
         hash *%= 0x100000001b3; // FNV-1a 64-bit prime
     }
@@ -357,9 +362,12 @@ fn addrHash(addr: std.net.Address) u64 {
 }
 
 fn addrEqual(a: std.net.Address, b: std.net.Address) bool {
-    const ab: [16]u8 = @bitCast(a.any);
-    const bb: [16]u8 = @bitCast(b.any);
-    return std.mem.eql(u8, &ab, &bb);
+    if (a.any.family != b.any.family) return false;
+    return switch (a.any.family) {
+        posix.AF.INET => std.mem.eql(u8, std.mem.asBytes(&a.in), std.mem.asBytes(&b.in)),
+        posix.AF.INET6 => std.mem.eql(u8, std.mem.asBytes(&a.in6), std.mem.asBytes(&b.in6)),
+        else => false,
+    };
 }
 
 fn wrappingDistance(from: u32, to: u32, mask: u32) u32 {
diff --git a/src/exchange.zig b/src/exchange.zig
index ffc5d52..2dfe163 100644
--- a/src/exchange.zig
+++ b/src/exchange.zig
@@ -114,7 +114,7 @@ pub fn deinit(exchange: *Exchange, allocator: std.mem.Allocator) void {
 /// Hash peer address only (no message ID) for peer-based eviction.
 pub fn addr_hash(address: std.net.Address) u32 {
     var hash: u32 = 0x811c9dc5; // FNV-1a 32-bit offset basis
-    for (addrBytes(address)) |b| {
+    for (addrBytes(&address)) |b| {
         hash ^= b;
         hash *%= 0x01000193; // FNV-1a 32-bit prime
     }
@@ -124,7 +124,7 @@ pub fn addr_hash(address: std.net.Address) u32 {
 /// Compute a hash key from peer address and message ID.
 pub fn peer_key(address: std.net.Address, message_id: u16) u64 {
     var hash: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
-    for (addrBytes(address)) |b| {
+    for (addrBytes(&address)) |b| {
         hash ^= b;
         hash *%= 0x100000001b3; // FNV-1a prime
     }
@@ -134,7 +134,7 @@ pub fn peer_key(address: std.net.Address, message_id: u16) u64 {
 }
 
 /// Extract the relevant address bytes for hashing (family-aware).
-fn addrBytes(address: std.net.Address) []const u8 {
+fn addrBytes(address: *const std.net.Address) []const u8 {
     return switch (address.any.family) {
         posix.AF.INET => std.mem.asBytes(&address.in),
         posix.AF.INET6 => std.mem.asBytes(&address.in6),

From 3f75ad5697ebc1ac2330b36b0d3c52f5e98ff8fa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 14:47:37 +0100
Subject: [PATCH 11/15] ipv6: integration test for IPv6 loopback round-trip

---
 src/Server.zig | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/src/Server.zig b/src/Server.zig
index d71e17c..38f22f6 100644
--- a/src/Server.zig
+++ b/src/Server.zig
@@ -2395,3 +2395,51 @@ test "recognized_options allows custom critical options" {
     try testing.expectEqual(.content, response.code);
     try testing.expectEqualSlices(u8, "ok", response.payload);
 }
+
+fn test_client_ip(host: []const u8, port: u16) !posix.socket_t {
+    const dest = try std.net.Address.parseIp(host, port);
+    const fd = try posix.socket(dest.any.family, posix.SOCK.DGRAM, 0);
+    errdefer posix.close(fd);
+    const timeout = posix.timeval{ .sec = 1, .usec = 0 };
+    try posix.setsockopt(fd, posix.SOL.SOCKET, posix.SO.RCVTIMEO, std.mem.asBytes(&timeout));
+    try posix.connect(fd, &dest.any, dest.getOsSockLen());
+    return fd;
+}
+
+test "round-trip: NON echo via IPv6 loopback" {
+    const port: u16 = 19715;
+
+    var server = Server.init(testing.allocator, .{
+        .port = port,
+        .bind_address = "::1",
+        .buffer_count = 8,
+        .buffer_size = 1280,
+        .rate_limit_ip_count = 0,
+    }, echo_handler) catch return;
+    defer server.deinit();
+    try setup_for_test(&server);
+
+    const request_packet = coapz.Packet{
+        .kind = .non_confirmable,
+        .code = .get,
+        .msg_id = 0x1234,
+        .token = &.{ 0xAA, 0xBB },
+        .options = &.{},
+        .payload = "ipv6",
+        .data_buf = &.{},
+    };
+    const wire = try request_packet.write(testing.allocator);
+    defer testing.allocator.free(wire);
+
+    const client_fd = test_client_ip("::1", port) catch return;
+    defer posix.close(client_fd);
+
+    const raw = try send_tick_recv(&server, client_fd, wire);
+    defer testing.allocator.free(raw);
+
+    const response = try coapz.Packet.read(testing.allocator, raw);
+    defer response.deinit(testing.allocator);
+
+    try testing.expectEqual(.content, response.code);
+    try testing.expectEqualSlices(u8, "ipv6", response.payload);
+}

From 7a41d9b81af6e3b28badebe7e3c5a98776823f33 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 14:50:37 +0100
Subject: [PATCH 12/15] ipv6: bench --ipv6 flag, update roadmap

---
 bench/client.zig | 13 ++++++++++---
 docs/ROADMAP.md  | 13 ++++---------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/bench/client.zig b/bench/client.zig
index 978ebec..f53e5e2 100644
--- a/bench/client.zig
+++ b/bench/client.zig
@@ -17,6 +17,7 @@ const bench_psk: coap.Psk = .{
 
 const SuiteConfig = struct {
     host: []const u8 = "127.0.0.1",
+    bind_address: []const u8 = "0.0.0.0",
     port: u16 = 5683,
     warmup_count: u32 = 1_000,
     window_size: u16 = 256,
@@ -165,7 +166,7 @@ pub fn main() !void {
             if (need_restart) {
                 kill_server(&server_pid);
                 const psk: ?coap.Psk = if (s.use_dtls) bench_psk else null;
-                server_pid = try fork_server(port, srv_tc, psk, counters);
+                server_pid = try fork_server(port, srv_tc, psk, counters, config.bind_address);
                 std.Thread.sleep(150 * std.time.ns_per_ms);
                 current_group = group;
             }
@@ -818,7 +819,7 @@ fn run_bench(
 
 fn make_client_socket(host: []const u8, port: u16) !posix.socket_t {
     const dest = try std.net.Address.parseIp(host, port);
-    const fd = try posix.socket(posix.AF.INET, posix.SOCK.DGRAM, 0);
+    const fd = try posix.socket(dest.any.family, posix.SOCK.DGRAM, 0);
     errdefer posix.close(fd);
     try posix.connect(fd, &dest.any, dest.getOsSockLen());
 
@@ -876,7 +877,7 @@ fn alloc_shared_counters() !*ServerCounters {
     return ptr;
 }
 
-fn fork_server(port: u16, thread_count: u16, psk: ?coap.Psk, counters: ?*ServerCounters) !posix.pid_t {
+fn fork_server(port: u16, thread_count: u16, psk: ?coap.Psk, counters: ?*ServerCounters, bind_address: []const u8) !posix.pid_t {
     const pid = try posix.fork();
     if (pid == 0) {
         // Silence server log output (info/warn) so it doesn't break the
@@ -891,6 +892,7 @@ fn fork_server(port: u16, thread_count: u16, psk: ?coap.Psk, counters: ?*ServerC
                 std.heap.page_allocator,
                 .{
                     .port = port,
+                    .bind_address = bind_address,
                     .buffer_count = 512,
                     .buffer_size = 1280,
                     .thread_count = thread_count,
@@ -906,6 +908,7 @@ fn fork_server(port: u16, thread_count: u16, psk: ?coap.Psk, counters: ?*ServerC
                 std.heap.page_allocator,
                 .{
                     .port = port,
+                    .bind_address = bind_address,
                     .buffer_count = 512,
                     .buffer_size = 1280,
                     .thread_count = thread_count,
@@ -1141,6 +1144,7 @@ fn print_usage() void {
             "  --window <n>      Client sliding window size (default: 256)\n" ++
             "  --threads <n>     Thread count, 0 = nproc (default: 0)\n" ++
             "  --no-server       Don't fork embedded echo server\n" ++
+            "  --ipv6            Use IPv6 loopback (::1) instead of IPv4\n" ++
             "\n" ++
             "Filters:\n" ++
             "  --plain-only      Skip DTLS scenarios\n" ++
@@ -1182,6 +1186,9 @@ fn parse_args() SuiteConfig {
             config.thread_count = std.fmt.parseInt(u16, val, 10) catch 0;
         } else if (std.mem.eql(u8, arg, "--no-server")) {
             config.embedded_server = false;
+        } else if (std.mem.eql(u8, arg, "--ipv6")) {
+            config.host = "::1";
+            config.bind_address = "::";
         } else if (std.mem.eql(u8, arg, "--plain-only")) {
             config.filter_dtls = false;
         } else if (std.mem.eql(u8, arg, "--dtls-only")) {
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
index 364965c..c713e39 100644
--- a/docs/ROADMAP.md
+++ b/docs/ROADMAP.md
@@ -52,18 +52,13 @@ These are protocol violations or mandatory omissions in the base CoAP spec.
   on first response via `poll()`, `waitForResponse()`, or `routeObserve()`.
 
 ### 1.4 IPv6 support (§1)
-- **Status:** `[ ]` hardcoded IPv4
+- **Status:** `[x]` done
 - **Issue:** AF_INET hardcoded in Io.zig, Client.zig, Server.zig. `sockaddr.in`
   cast in `decode_recv()` would overflow for `sockaddr_in6`. RFC 7252 treats
   IPv6 as essential.
-- **Impact:** Cannot deploy on IPv6-only networks (increasingly common in IoT).
-- **Effort:** Medium. Requires:
-  - sockaddr union (in/in6) throughout
-  - Buffer sizing for 28-byte sockaddr_in6
-  - Address hashing changes in exchange.zig, rate_limiter.zig, Session.zig
-  - Dual-stack or family-specific socket creation
-- **Perf note:** Larger sockaddr means more cache pressure in address hashing.
-  Use a compact address representation internally (hash, not raw sockaddr).
+- **Resolution:** Family auto-detected from bind/host address. Dual-stack via
+  `IPV6_V6ONLY=0` when binding `"::"`. Family-aware address hashing in exchange,
+  rate_limiter, DTLS Cookie, and DTLS Session. Bench supports `--ipv6` flag.
 
 ### 1.5 Option order validation on decode (§5.4.6)
 - **Status:** `[x]` done — structurally enforced by delta encoding

From ccfdfdceeab212e40fe4fc8ad42f1a0a1b39b018 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 15:14:35 +0100
Subject: [PATCH 13/15] remove separate responses docs from IPv6 branch

---
 .../plans/2026-03-15-separate-responses.md    | 479 ------------------
 .../2026-03-15-separate-responses-design.md   | 270 ----------
 2 files changed, 749 deletions(-)
 delete mode 100644 docs/superpowers/plans/2026-03-15-separate-responses.md
 delete mode 100644 docs/superpowers/specs/2026-03-15-separate-responses-design.md

diff --git a/docs/superpowers/plans/2026-03-15-separate-responses.md b/docs/superpowers/plans/2026-03-15-separate-responses.md
deleted file mode 100644
index 448946f..0000000
--- a/docs/superpowers/plans/2026-03-15-separate-responses.md
+++ /dev/null
@@ -1,479 +0,0 @@
-# Separate (Delayed) Responses Implementation Plan
-
-> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Allow handlers to defer responses for slow operations. Server sends empty ACK immediately, then delivers the response later as a CON with retransmission.
-
-**Architecture:** Handler calls `deferResponse(req)` to capture context, returns `null` (existing empty ACK behavior). `sendResponse()` enqueues via lock-free MPSC queue. Tick loop drains queue, sends CON responses, drives retransmission. ACK/RST from client frees slots.
-
-**Tech Stack:** Zig 0.15, Linux io_uring, lock-free MPSC ring buffer, pre-allocated retransmission pool.
-
-**Spec:** `docs/superpowers/specs/2026-03-15-separate-responses-design.md`
-
-**Depends on:** IPv6 plan should be completed first (address type changes).
-
----
-
-## File Map
-
-- **Create:** `src/mpsc.zig` — bounded lock-free MPSC ring buffer
-- **Create:** `src/separate.zig` — separate response retransmission pool
-- **Modify:** `src/Server.zig` — integration: deferResponse, sendResponse, tick additions, ACK/RST matching, atomic msg_id
-- **Modify:** `src/handler.zig` — DeferredCtx type
-- **Modify:** `src/root.zig` — re-export DeferredCtx, add separate/mpsc to test runner
-- **Modify:** `README.md` — document separate responses
-- **Modify:** `docs/ROADMAP.md` — mark 1.2 done
-
----
-
-## Chunk 1: Data Structures
-
-### Task 1: MPSC ring buffer (`src/mpsc.zig`)
-
-**Files:**
-- Create: `src/mpsc.zig`
-
-A bounded, lock-free, multi-producer single-consumer ring buffer. Uses atomic CAS on head for producers and a publication flag per slot to prevent reading partial writes.
-
-- [ ] **Step 1: Define the Queue struct and Entry**
-
-```zig
-const std = @import("std");
-
-pub fn MpscQueue(comptime T: type) type {
-    return struct {
-        const Self = @This();
-
-        const Slot = struct {
-            data: T,
-            ready: std.atomic.Value(bool),
-        };
-
-        buffer: []Slot,
-        mask: u32,
-        head: std.atomic.Value(u32),  // producers reserve via CAS
-        tail: u32,                     // consumer only
-
-        pub fn init(allocator: std.mem.Allocator, capacity: u16) !Self {
-            // Round up to power of two.
-            var size: u32 = 1;
-            while (size < capacity) size <<= 1;
-            const buffer = try allocator.alloc(Slot, size);
-            for (buffer) |*slot| {
-                slot.ready = std.atomic.Value(bool).init(false);
-            }
-            return .{
-                .buffer = buffer,
-                .mask = size - 1,
-                .head = std.atomic.Value(u32).init(0),
-                .tail = 0,
-            };
-        }
-
-        pub fn deinit(self: *Self, allocator: std.mem.Allocator) void {
-            allocator.free(self.buffer);
-        }
-
-        /// Push an item (any thread). Returns error.Full if queue is at capacity.
-        pub fn push(self: *Self, item: T) error{Full}!void {
-            while (true) {
-                const head = self.head.load(.acquire);
-                const tail = @atomicLoad(u32, &self.tail, .acquire);
-                if (head -% tail >= self.mask + 1) return error.Full;
-                if (self.head.cmpxchgWeak(head, head +% 1, .acq_rel, .monotonic)) |_| {
-                    continue; // CAS failed, retry
-                }
-                const slot = &self.buffer[head & self.mask];
-                slot.data = item;
-                slot.ready.store(true, .release);
-                return;
-            }
-        }
-
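-        /// Pop an item (consumer thread only). Returns null if empty. Copy the data out promptly: once tail advances, a producer may reuse the slot.
-        pub fn pop(self: *Self) ?*T {
-            const slot = &self.buffer[self.tail & self.mask];
-            if (!slot.ready.load(.acquire)) return null;
-            slot.ready.store(false, .release);
-            const ptr = &slot.data;
-            @atomicStore(u32, &self.tail, self.tail +% 1, .release); // pairs with the acquire load in push()
-            return ptr;
-        }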
-
-        pub fn isEmpty(self: *const Self) bool {
-            return self.head.load(.acquire) == self.tail;
-        }
-    };
-}
-```
-
-- [ ] **Step 2: Write tests**
-
-```zig
-test "single producer single consumer" {
-    var q = try MpscQueue(u32).init(testing.allocator, 4);
-    defer q.deinit(testing.allocator);
-
-    try q.push(42);
-    try q.push(99);
-    try testing.expectEqual(@as(u32, 42), q.pop().?.*);
-    try testing.expectEqual(@as(u32, 99), q.pop().?.*);
-    try testing.expect(q.pop() == null);
-}
-
-test "full queue returns error" {
-    var q = try MpscQueue(u32).init(testing.allocator, 2);
-    defer q.deinit(testing.allocator);
-
-    try q.push(1);
-    try q.push(2);
-    try testing.expectError(error.Full, q.push(3));
-}
-
-test "wrap around" {
-    var q = try MpscQueue(u32).init(testing.allocator, 2);
-    defer q.deinit(testing.allocator);
-
-    try q.push(1);
-    _ = q.pop();
-    try q.push(2);
-    try q.push(3);
-    try testing.expectEqual(@as(u32, 2), q.pop().?.*);
-    try testing.expectEqual(@as(u32, 3), q.pop().?.*);
-}
-```
-
-- [ ] **Step 3: Run tests**
-
-Run: `zig build test`
-
-- [ ] **Step 4: Commit**
-
-```bash
-git add src/mpsc.zig
-git commit -m "mpsc: lock-free bounded MPSC ring buffer"
-```
-
----
-
-### Task 2: Separate response pool (`src/separate.zig`)
-
-**Files:**
-- Create: `src/separate.zig`
-
-Pre-allocated pool for tracking outgoing CON responses pending ACK. Pattern follows exchange.zig: open-addressing hash table, intrusive free list.
-
-- [ ] **Step 1: Define the pool struct**
-
-```zig
-const std = @import("std");
-const constants = @import("constants.zig");
-
-const SeparatePool = @This();
-
-pub const Config = struct {
-    count: u16 = 16,
-    response_size: u16 = 1280,
-};
-
-pub const State = enum(u8) { free, pending };
-
-pub const Slot = struct {
-    state: State,
-    msg_id: u16,
-    peer: std.net.Address,
-    retransmit_count: u4,
-    next_retransmit_ns: i128,
-    timeout_ns: u64,
-    wire_len: u16,
-    is_dtls: bool,
-    session_idx: u16,
-    next_free: u16,
-};
-
-slots: []Slot,
-wire_buffer: []u8,
-table: []u16,
-table_mask: u16,
-free_head: u16,
-count_active: u16,
-config: Config,
-
-const empty_sentinel: u16 = 0xFFFF;
-```
-
-- [ ] **Step 2: Implement init, deinit, insert, find, remove, cached_wire**
-
-Follow exchange.zig patterns exactly. The key difference: keyed on msg_id only (not peer+msg_id), since server-generated msg_ids are unique.
-
-- [ ] **Step 3: Write tests**
-
-```zig
-test "init and deinit" { ... }
-test "insert and find by msg_id" { ... }
-test "remove frees slot" { ... }
-test "pool exhaustion returns null" { ... }
-```
-
-- [ ] **Step 4: Run tests**
-
-Run: `zig build test`
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add src/separate.zig
-git commit -m "separate: pre-allocated retransmission pool for deferred responses"
-```
-
----
-
-## Chunk 2: Server Integration
-
-### Task 3: DeferredCtx type and Server integration
-
-**Files:**
-- Modify: `src/handler.zig` — add DeferredCtx struct
-- Modify: `src/Server.zig` — add state, deferResponse, sendResponse, tick additions
-- Modify: `src/root.zig` — re-export
-
-- [ ] **Step 1: Add DeferredCtx to handler.zig**
-
-```zig
-/// Context for a deferred (separate) response. Returned by
-/// `Server.deferResponse()`. Pass to `Server.sendResponse()` to
-/// deliver the late response from any thread.
-pub const DeferredCtx = struct {
-    peer: std.net.Address,
-    token: [8]u8,
-    token_len: u4, // CoAP token length is 0-8 bytes (RFC 7252 §3); 8 does not fit in u3
-    is_dtls: bool,
-    session_idx: u16,
-};
-```
-
-- [ ] **Step 2: Convert `next_msg_id` to atomic in Server.zig**
-
-Change line 133: `next_msg_id: u16` → `next_msg_id: std.atomic.Value(u16)`
-
-Update `init()` to use `std.atomic.Value(u16).init(...)`.
-
-Update `nextMsgId()` to use `fetchAdd(1, .monotonic)`.
-
-- [ ] **Step 3: Add separate pool and MPSC queue state to Server**
-
-Add fields after existing state:
-
-```zig
-separate_pool: ?SeparatePool,
-separate_queue: ?mpsc.MpscQueue(SeparateEntry),
-```
-
-Where `SeparateEntry` holds the pre-encoded wire data + metadata:
-
-```zig
-const SeparateEntry = struct {
-    peer: std.net.Address,
-    wire_len: u16,
-    is_dtls: bool,
-    session_idx: u16,
-    wire: [1280]u8,
-};
-```
-
-Initialize in `init()` when `config.separate_response_count > 0`.
-
-- [ ] **Step 4: Implement `deferResponse()`**
-
-```zig
-pub fn deferResponse(server: *const Server, request: handler.Request) ?handler.DeferredCtx {
-    if (server.config.separate_response_count == 0) return null;
-    var ctx: handler.DeferredCtx = .{
-        .peer = request.peer_address,
-        .token = undefined,
-        .token_len = @intCast(request.packet.token.len),
-        .is_dtls = request.is_secure,
-        .session_idx = 0, // TODO: pass session index through Request
-    };
-    @memcpy(ctx.token[0..request.packet.token.len], request.packet.token);
-    return ctx;
-}
-```
-
-- [ ] **Step 5: Implement `sendResponse()` (thread-safe)**
-
-```zig
-pub fn sendResponse(
-    server: *Server,
-    ctx: handler.DeferredCtx,
-    response: handler.Response,
-) !void {
-    const queue = &(server.separate_queue orelse return error.SeparateResponsesDisabled);
-    const msg_id = server.next_msg_id.fetchAdd(1, .monotonic);
-
-    const pkt = coapz.Packet{
-        .kind = .confirmable,
-        .code = response.code,
-        .msg_id = msg_id,
-        .token = ctx.token[0..ctx.token_len],
-        .options = response.options,
-        .payload = response.payload,
-        .data_buf = &.{},
-    };
-
-    var entry: SeparateEntry = .{
-        .peer = ctx.peer,
-        .wire_len = 0,
-        .is_dtls = ctx.is_dtls,
-        .session_idx = ctx.session_idx,
-        .wire = undefined,
-    };
-
-    const wire = pkt.writeBuf(&entry.wire) catch return error.BufferTooSmall;
-    entry.wire_len = @intCast(wire.len);
-
-    queue.push(entry) catch return error.SeparatePoolFull;
-}
-```
-
-- [ ] **Step 6: Add tick loop additions**
-
-In `tick()`, after existing CQE processing and before load level update:
-
-```zig
-// Drain separate response queue.
-if (server.separate_queue) |*queue| {
-    while (queue.pop()) |entry| {
-        server.sendSeparateResponse(entry);
-    }
-}
-
-// Retransmit pending separate responses.
-if (server.separate_pool) |*pool| {
-    if (pool.count_active > 0) {
-        server.retransmitSeparateResponses();
-    }
-}
-```
-
-Implement `sendSeparateResponse()`: allocate pool slot, send wire data (encrypt for DTLS), start retransmit timer.
-
-Implement `retransmitSeparateResponses()`: scan pending slots, retransmit on timeout (re-encrypt for DTLS), free after max_retransmit.
-
-- [ ] **Step 7: Add ACK/RST matching for separate responses**
-
-In `handle_recv()`, after the existing RST handling (line ~832):
-
-```zig
-// ACK matches a pending separate response.
-if (packet.kind == .acknowledgement) {
-    if (server.separate_pool) |*pool| {
-        if (pool.find(packet.msg_id)) |slot_idx| {
-            pool.remove(slot_idx);
-        }
-    }
-    return;
-}
-```
-
-For RST (extend existing block): also check separate pool in addition to exchange pool.
-
-Same changes needed in `process_dtls_coap()`.
-
-- [ ] **Step 8: Run tests**
-
-Run: `zig build test`
-
-- [ ] **Step 9: Commit**
-
-```bash
-git add src/Server.zig src/handler.zig src/root.zig
-git commit -m "server: separate response support with MPSC queue and retransmission"
-```
-
----
-
-### Task 4: Integration tests
-
-**Files:**
-- Modify: `src/Server.zig` tests
-
-- [ ] **Step 1: Test — deferred CON gets empty ACK, then separate CON response**
-
-```zig
-test "separate response: deferred CON gets empty ACK then CON response" {
-    // 1. Create server with a handler that defers
-    // 2. Send CON request
-    // 3. Verify empty ACK received (code=0.00)
-    // 4. Call server.sendResponse() with the deferred context
-    // 5. Tick the server
-    // 6. Verify CON response received (new msg_id, same token, code=2.05)
-    // 7. Send ACK for the CON response
-    // 8. Tick — verify separate pool slot freed
-}
-```
-
-- [ ] **Step 2: Test — separate response retransmission**
-
-```zig
-test "separate response: retransmits CON if no ACK" {
-    // 1. Send deferred response
-    // 2. Don't ACK
-    // 3. Tick multiple times past retransmit timeout
-    // 4. Verify retransmission received (same msg_id, same payload)
-}
-```
-
-- [ ] **Step 3: Test — RST cancels separate response**
-
-```zig
-test "separate response: RST cancels retransmission" {
-    // 1. Send deferred response
-    // 2. Client sends RST with matching msg_id
-    // 3. Verify pool slot freed, no more retransmissions
-}
-```
-
-- [ ] **Step 4: Run all tests and benchmarks**
-
-```bash
-zig build test
-zig build bench -Doptimize=ReleaseFast
-```
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add src/Server.zig
-git commit -m "separate: integration tests for deferred responses"
-```
-
----
-
-### Task 5: Documentation and roadmap
-
-**Files:**
-- Modify: `README.md`
-- Modify: `docs/ROADMAP.md`
-
-- [ ] **Step 1: Add separate responses section to README**
-
-In the Server features list, add:
-```
-- Separate (delayed) responses for async handlers (RFC 7252 §5.2.2)
-```
-
-Add a "Separate Responses" section in the handler docs with usage example.
-
-Document `separate_response_count` config field.
-
-- [ ] **Step 2: Update roadmap**
-
-Mark 1.2 as `[x]` done.
-
-- [ ] **Step 3: Commit**
-
-```bash
-git add README.md docs/ROADMAP.md
-git commit -m "docs: separate responses in README and roadmap"
-```
diff --git a/docs/superpowers/specs/2026-03-15-separate-responses-design.md b/docs/superpowers/specs/2026-03-15-separate-responses-design.md
deleted file mode 100644
index 9eb32e8..0000000
--- a/docs/superpowers/specs/2026-03-15-separate-responses-design.md
+++ /dev/null
@@ -1,270 +0,0 @@
-# Separate (Delayed) Responses Design (RFC 7252 §5.2.2, Roadmap 1.2)
-
-## Goal
-
-Allow handlers to defer responses for slow operations (I/O, inter-service calls). The server sends an empty ACK immediately, then delivers the actual response later as a new CON message with retransmission.
-
-## RFC 7252 §5.2.2 Flow
-
-```
-Client                  Server
-  |  CON [0xAB01]  GET  -->  |   (1) Client sends CON request
-  |  <--  ACK [0xAB01]      |   (2) Server sends empty ACK immediately
-  |       ... time ...       |   (3) Handler does slow work
-  |  <--  CON [0xAB02] 2.05 |   (4) Server sends response as NEW CON
-  |  ACK [0xAB02]  -->      |   (5) Client ACKs the separate response
-```
-
-Key: step 4 uses a **new msg_id** but the **same token** as the original request.
-
-## Architecture
-
-No handler signature change. The existing `null` return for CON (= empty ACK) is the
-entry point. A new `Server.sendResponse()` method delivers the late response. An MPSC
-queue makes it thread-safe. A pre-allocated retransmission pool drives CON reliability
-for outgoing separate responses.
-
-### Zero-cost for synchronous handlers
-
-- The MPSC queue drain in `tick()` is a single atomic load (queue empty = no work).
-- The retransmission scan is skipped when the pool is empty (one comparison).
-- No new branches in the normal piggybacked response path.
-
-## API
-
-### Deferring a response
-
-The handler calls `server.deferResponse(req)` to capture request context, then
-returns `null` (triggering the automatic empty ACK). The returned `DeferredCtx`
-is a small value type (no heap allocation) that holds everything needed to deliver
-the response later.
-
-```zig
-fn handler(ctx: *AppState, req: coap.Request) ?coap.Response {
-    if (is_fast(req)) return coap.Response.ok("quick");
-
-    // Slow: capture context, enqueue background work.
-    const deferred = ctx.server.deferResponse(req) orelse
-        return coap.Response.withCode(.internal_server_error);
-    ctx.enqueue_work(deferred);
-    return null; // server sends empty ACK
-}
-```
-
-`DeferredCtx` is a plain struct — safe to copy, send across threads, store in queues:
-
-```zig
-const DeferredCtx = struct {
-    peer: std.net.Address,
-    token: [8]u8,
-    token_len: u4, // CoAP token length is 0-8 bytes (RFC 7252 §3); 8 does not fit in u3
-    is_dtls: bool,
-    session_idx: u16, // DTLS session table index (only valid when is_dtls)
-};
-```
-
-### Delivering the late response
-
-```zig
-// From any thread (background worker, I/O callback, etc.):
-try server.sendResponse(deferred, .{
-    .code = .content,
-    .payload = result_data,
-});
-```
-
-`sendResponse()` encodes the response as a CON packet, enqueues the wire bytes
-into the MPSC queue, and returns. The tick loop sends it and handles retransmission.
-
-Returns `error.SeparatePoolFull` if the MPSC submission queue is full.
-
-## Internal Components
-
-### SeparateResponse pool (new: `src/separate.zig`)
-
-```
-Config = struct {
-    /// Max concurrent separate responses pending ACK.
-    count: u16 = 16,
-    /// Max encoded response size.
-    response_size: u16 = 1280,
-};
-
-Slot = struct {
-    state: enum { free, pending },
-    msg_id: u16,
-    peer: std.net.Address,
-    retransmit_count: u4,
-    next_retransmit_ns: i128,
-    timeout_ns: u64,
-    wire_len: u16,
-    is_dtls: bool,
-    session_idx: u16,
-    next_free: u16,
-};
-```
-
-Pre-allocated pool with:
-- `slots: []Slot` — retransmission state per pending response.
-- `wire_buffer: []u8` — plaintext CoAP packets, `count * response_size` bytes.
-- `table: []u16` — hash table keyed on msg_id for O(1) ACK matching.
-- `free_head: u16` — intrusive free list.
-
-Methods:
-- `insert(msg_id, peer, wire_data, is_dtls, session_idx, now_ns) ?u16`
-- `find(msg_id) ?u16` — look up by msg_id (for ACK matching).
-- `remove(slot_idx)` — free slot, return to free list.
-- `cached_wire(slot_idx) []const u8` — get plaintext CoAP data.
-
-**Important:** The pool stores **plaintext CoAP** bytes, not encrypted wire bytes.
-For DTLS, each retransmission must re-encrypt with a fresh DTLS record sequence
-number (RFC 6347 requires unique sequence numbers; replaying the same encrypted
-record would be rejected by the client's replay window). For plain UDP, the
-plaintext IS the wire format, so retransmission sends the stored bytes directly.
-
-### MPSC Queue (new: `src/mpsc.zig`)
-
-Bounded lock-free ring buffer for cross-thread submission.
-
-```
-Entry = struct {
-    peer: std.net.Address,
-    wire_len: u16,
-    is_dtls: bool,
-    session_idx: u16,
-    wire: [response_size]u8,  // plaintext CoAP packet
-    ready: std.atomic.Value(bool), // publication flag
-};
-
-Queue = struct {
-    buffer: []Entry,
-    mask: u32,
-    head: std.atomic.Value(u32), // producers reserve via CAS
-    tail: u32,                    // consumer (tick loop only)
-
-    fn push(entry) error{Full}!void  // any thread
-    fn pop() ?*Entry                 // tick loop only
-};
-```
-
-**Publication protocol (prevents reading partial writes):**
-1. Producer reserves slot via atomic CAS on `head`.
-2. Producer writes entry data into `buffer[slot]`.
-3. Producer sets `buffer[slot].ready.store(true, .release)`.
-4. Consumer checks `buffer[tail].ready.load(.acquire)` before reading.
-5. Consumer clears `ready` after processing.
-
-### Server.zig integration
-
-**New config field:**
-```zig
-/// Max concurrent separate (deferred) responses. 0 = disabled.
-separate_response_count: u16 = 16,
-```
-
-**New state:**
-- `separate_pool: SeparateResponse` — retransmission tracking.
-- `separate_queue: mpsc.Queue` — cross-thread submission queue.
-
-**Message ID generation:**
-The existing `next_msg_id: u16` is converted to `std.atomic.Value(u16)` and used
-for both piggybacked responses (tick loop) and separate responses (`sendResponse()`
-from any thread). Single counter eliminates msg_id collisions. The tick loop uses
-`fetchAdd(1, .monotonic)` instead of the current `id +% 1`.
-
-**`deferResponse(req: Request) ?DeferredCtx`:**
-Captures token, peer address, and DTLS session index. Returns null if
-`separate_response_count == 0` (feature disabled). No allocation — just copies
-fields into a stack struct.
-
-**`sendResponse(ctx: DeferredCtx, response: Response) !void` (thread-safe):**
-1. Generate msg_id via `next_msg_id.fetchAdd(1, .monotonic)`.
-2. Encode response as CON packet (with ctx.token, new msg_id) into stack buffer.
-3. Push plaintext wire bytes + peer + DTLS info into MPSC queue.
-4. Return error if queue is full.
-
-**`tick()` additions (after existing CQE processing):**
-
-1. **Drain MPSC queue:** Pop entries, allocate retransmission slots, send.
-   - For plain UDP: send stored plaintext directly.
-   - For DTLS: encrypt plaintext via `send_dtls_packet()` using session_idx.
-2. **Retransmission scan:** For each pending separate slot:
-   - If `now >= next_retransmit_ns` and `retransmit_count < constants.max_retransmit`:
-     retransmit (re-encrypt for DTLS), double timeout, increment count.
-   - If `retransmit_count >= constants.max_retransmit`: free the slot (timed out).
-   - Initial timeout: `randomizedTimeout(constants.ack_timeout_ms)` (2-3s per RFC 7252 §4.2).
-   - Timeout doubles on each retransmit (exponential backoff).
-
-**`handle_recv()` addition (ACK/RST matching):**
-
-Currently the server processes RST (cancels exchange) and ignores ACK. Add:
-
-- **ACK:** If `packet.kind == .acknowledgement`, look up `packet.msg_id` in the
-  separate pool. If found, remove the slot (response delivered successfully).
-- **RST:** Also check the separate pool (in addition to the existing exchange pool
-  check). If found, remove the slot (client rejected the separate response).
-
-## Interactions with Existing Mechanisms
-
-### Client retransmits original CON after empty ACK
-
-The empty ACK for the original request is cached in the exchange pool (existing
-behavior at `Server.zig:1011-1031`). If the client retransmits its CON (same
-msg_id) before receiving the ACK, the server's duplicate detection retransmits
-the cached empty ACK. This is correct per RFC 7252.
-
-### Exchange pool eviction
-
-The exchange pool entry (cached empty ACK) has a lifetime of `exchange_lifetime_ms`
-(~247s). If the handler takes longer than this to call `sendResponse()`, the entry
-is evicted and a client retransmission would trigger the handler again (duplicate
-invocation). Applications should complete separate responses well within this window.
-The retransmission window for the separate response itself is ~45s (`max_retransmit=4`
-with exponential backoff), so the practical deadline is driven by application logic,
-not the protocol.
-
-### Separate response is NOT cached in the exchange pool
-
-The exchange pool entry for the original msg_id holds the empty ACK. The separate
-response has its own msg_id and its own retransmission tracking in the separate pool.
-These are independent — no interaction.
-
-### Future: server-side Observe (roadmap 2.1)
-
-Observe notifications are also server-initiated CON messages with retransmission.
-The separate pool's retransmission mechanism can be reused or shared. This design
-keeps the pool generic (msg_id + wire data + retransmit state) to enable reuse.
-
-## Thread Safety
-
-- `deferResponse()` is called from the handler (tick loop thread) — no synchronization needed.
-- `sendResponse()` is safe to call from any thread — only touches the MPSC queue
-  (lock-free CAS) and the atomic msg_id counter.
-- The separate pool and retransmission state are only accessed from the tick loop.
-- For `thread_count > 1`: each server thread has its own pools. `DeferredCtx` does not
-  record the server instance, so the application must route `sendResponse()` to the right one.
-
-## Performance
-
-- **Synchronous handler path:** One atomic load to check queue (empty = skip). One
-  comparison to check separate pool (empty = skip). Both cache-hot. Negligible.
-- **Separate response path:** One atomic CAS (queue push) + one send + one slot
-  allocation. Same order as a normal response.
-- **Memory:** `16 * 1280 = 20 KB` wire buffer + `16 * ~96B` slots + queue. ~25 KB total.
-- **DTLS retransmission:** Re-encryption cost per retransmit (~1us for AES-128-CCM-8).
-  Acceptable — retransmits are rare.
-
-## Edge Cases
-
-- **Application never responds:** Server's retransmission pool entry stays until
-  `max_retransmit` (4 retransmits with exponential backoff, ~45s total), then freed.
-  Client times out independently.
-- **Client sends RST for separate CON:** Server matches RST msg_id to separate pool,
-  frees the slot. Checked in addition to exchange pool (both are scanned for RST).
-- **Queue full:** `sendResponse()` returns `error.SeparatePoolFull`. Application retries or drops.
-- **Duplicate ACK:** `find()` returns null (already removed). Harmless.
-- **Stale/wrong token in sendResponse:** Server sends a CON the client doesn't recognize.
-  Client RSTs it. Server frees the slot. Harmless.
-- **Token reuse by client:** RFC 7252 §5.3.1 says clients SHOULD keep in-use tokens unique
-  per endpoint pair. If the client does not, the separate response may match the wrong
-  request. This is a client bug, not a server concern.

From 25bd4590547c3f97e31c0fc29b7c01df15c0e08e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 15:15:15 +0100
Subject: [PATCH 14/15] remove specs and plans docs from repo

---
 docs/superpowers/plans/2026-03-15-ipv6.md     | 519 ------------------
 .../specs/2026-03-15-ipv6-design.md           | 159 ------
 2 files changed, 678 deletions(-)
 delete mode 100644 docs/superpowers/plans/2026-03-15-ipv6.md
 delete mode 100644 docs/superpowers/specs/2026-03-15-ipv6-design.md

diff --git a/docs/superpowers/plans/2026-03-15-ipv6.md b/docs/superpowers/plans/2026-03-15-ipv6.md
deleted file mode 100644
index 8d39a89..0000000
--- a/docs/superpowers/plans/2026-03-15-ipv6.md
+++ /dev/null
@@ -1,519 +0,0 @@
-# IPv6 Support Implementation Plan
-
-> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Remove all IPv4-only hardcoding; support IPv6 and dual-stack via auto-detect from bind/host address string.
-
-**Architecture:** Address family is derived from the bind address (`"0.0.0.0"` → AF_INET, `"::"` → AF_INET6 dual-stack). A family-aware address hashing helper is shared by exchange, rate_limiter, DTLS cookie, and DTLS session. The hot path (`decode_recv`) gains one well-predicted branch.
-
-**Tech Stack:** Zig 0.15, Linux io_uring, IPv6 `sockaddr_in6`.
-
-**Spec:** `docs/superpowers/specs/2026-03-15-ipv6-design.md`
-
----
-
-## File Map
-
-- **Modify:** `src/exchange.zig` — family-aware `addr_hash()` and `peer_key()`
-- **Modify:** `src/rate_limiter.zig` — `AddrKey` type, replace `ip_addr: u32` API
-- **Modify:** `src/Io.zig` — remove AF_INET restriction, IPv6 socket, decode_recv
-- **Modify:** `src/Server.zig` — `addr_recv` sizing, `addrs_response` type, `send_raw` namelen, rate limit call site, `msg_recv.namelen`
-- **Modify:** `src/Client.zig` — remove AF_INET restriction, create socket from parsed family
-- **Modify:** `src/dtls/Cookie.zig` — family-aware address bytes in HMAC
-- **Modify:** `src/dtls/Session.zig` — family-aware `addrHash()` and `addrEqual()`
-- **Modify:** `README.md` — update config docs, roadmap
-- **Modify:** `docs/ROADMAP.md` — mark 1.4 done
-
----
-
-## Chunk 1: Address Hashing and Rate Limiter
-
-### Task 1: Family-aware address hashing in exchange.zig
-
-**Files:**
-- Modify: `src/exchange.zig:113-136` (addr_hash, peer_key)
-
-- [ ] **Step 1: Write test for IPv6 address hashing**
-
-Add test in `src/exchange.zig` tests section:
-
-```zig
-test "addr_hash differentiates IPv4 and IPv6" {
-    const v4 = try std.net.Address.parseIp("127.0.0.1", 5683);
-    const v6 = try std.net.Address.parseIp("::1", 5683);
-    try testing.expect(Exchange.addr_hash(v4) != Exchange.addr_hash(v6));
-}
-
-test "peer_key differentiates IPv4 and IPv6" {
-    const v4 = try std.net.Address.parseIp("127.0.0.1", 5683);
-    const v6 = try std.net.Address.parseIp("::1", 5683);
-    const k4 = Exchange.peer_key(v4, 0x1234);
-    const k6 = Exchange.peer_key(v6, 0x1234);
-    try testing.expect(k4 != k6);
-}
-
-test "addr_hash different IPv6 addresses produce different hashes" {
-    const a = try std.net.Address.parseIp("::1", 5683);
-    const b = try std.net.Address.parseIp("fe80::1", 5683);
-    try testing.expect(Exchange.addr_hash(a) != Exchange.addr_hash(b));
-}
-```
-
-- [ ] **Step 2: Run tests — expect failures since current hashing truncates IPv6**
-
-Run: `zig build test`
-
-- [ ] **Step 3: Implement family-aware hashing**
-
-Replace `addr_hash` and `peer_key` to branch on family:
-
-```zig
-pub fn addr_hash(address: std.net.Address) u32 {
-    var hash: u32 = 0x811c9dc5;
-    const bytes = addrBytes(&address);
-    for (bytes) |b| {
-        hash ^= b;
-        hash *%= 0x01000193;
-    }
-    return hash;
-}
-
-pub fn peer_key(address: std.net.Address, message_id: u16) u64 {
-    var hash: u64 = 0xcbf29ce484222325;
-    const bytes = addrBytes(&address);
-    for (bytes) |b| {
-        hash ^= b;
-        hash *%= 0x100000001b3;
-    }
-    hash ^= @as(u64, message_id);
-    hash *%= 0x100000001b3;
-    return hash;
-}
-
-fn addrBytes(address: *const std.net.Address) []const u8 { // pointer param: the slice must not point into a by-value copy
-    return switch (address.any.family) {
-        posix.AF.INET => std.mem.asBytes(&address.in),
-        posix.AF.INET6 => std.mem.asBytes(&address.in6),
-        else => std.mem.asBytes(&address.in),
-    };
-}
-```
-
-Note: import `posix` at the top: `const posix = std.posix;`
-
-- [ ] **Step 4: Run tests — all pass**
-
-Run: `zig build test`
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add src/exchange.zig
-git commit -m "exchange: family-aware address hashing for IPv6"
-```
-
----
-
-### Task 2: AddrKey type in rate_limiter.zig
-
-**Files:**
-- Modify: `src/rate_limiter.zig` (full file)
-
-- [ ] **Step 1: Write test for IPv6 rate limiting**
-
-Add test:
-
-```zig
-test "IPv6 addresses independent from IPv4" {
-    var rl = try RateLimiter.init(testing.allocator, .{
-        .ip_count = 8,
-        .tokens_per_sec = 10,
-        .burst = 2,
-    });
-    defer rl.deinit(testing.allocator);
-
-    const v4 = AddrKey.fromAddress(try std.net.Address.parseIp("127.0.0.1", 5683));
-    const v6 = AddrKey.fromAddress(try std.net.Address.parseIp("::1", 5683));
-    const now: i64 = 1_000_000_000;
-
-    try testing.expect(rl.allow(v4, now));
-    try testing.expect(rl.allow(v4, now));
-    try testing.expect(!rl.allow(v4, now)); // exhausted
-
-    // IPv6 has its own bucket
-    try testing.expect(rl.allow(v6, now));
-    try testing.expect(rl.allow(v6, now));
-    try testing.expect(!rl.allow(v6, now));
-}
-```
-
-- [ ] **Step 2: Define `AddrKey` and update `Slot`**
-
-Add `AddrKey` struct:
-
-```zig
-pub const AddrKey = struct {
-    family: u16,
-    addr: [16]u8,
-
-    pub const zero: AddrKey = .{ .family = 0, .addr = .{0} ** 16 };
-
-    pub fn fromAddress(address: std.net.Address) AddrKey {
-        return switch (address.any.family) {
-            posix.AF.INET => .{
-                .family = posix.AF.INET,
-                .addr = blk: {
-                    var a: [16]u8 = .{0} ** 16;
-                    const src: [4]u8 = @bitCast(address.in.sa.addr);
-                    @memcpy(a[0..4], &src);
-                    break :blk a;
-                },
-            },
-            posix.AF.INET6 => .{
-                .family = posix.AF.INET6,
-                .addr = address.in6.sa.addr,
-            },
-            else => zero,
-        };
-    }
-
-    pub fn eql(a: AddrKey, b: AddrKey) bool {
-        return a.family == b.family and std.mem.eql(u8, &a.addr, &b.addr);
-    }
-
-    pub fn hash(self: AddrKey) u64 {
-        var h: u64 = 0xcbf29ce484222325;
-        const fam_bytes: [2]u8 = @bitCast(self.family);
-        for (fam_bytes) |b| {
-            h ^= b;
-            h *%= 0x100000001b3;
-        }
-        for (self.addr) |b| {
-            h ^= b;
-            h *%= 0x100000001b3;
-        }
-        return h;
-    }
-};
-```
-
-Add `const posix = std.posix;` import.
-
-- [ ] **Step 3: Replace `ip_addr: u32` with `addr_key: AddrKey` throughout**
-
-Change `Slot.ip_addr: u32` → `Slot.addr_key: AddrKey`.
-
-Change `allow(self, ip_addr: u32, now_ns)` → `allow(self, addr_key: AddrKey, now_ns)`.
-
-Change `find_slot(self, key, ip_addr)` → `find_slot(self, key, addr_key)` with `self.slots[slot_idx].addr_key.eql(addr_key)`.
-
-Change `allocate_slot(self, key, ip_addr, now_ns)` → `allocate_slot(self, key, addr_key, now_ns)`.
-
-Change `hash_ip(ip: u32)` → use `addr_key.hash()` directly.
-
-Change `remove_slot` to use `self.slots[si].addr_key.hash()` for rehashing.
-
-Update `reset()` to use `AddrKey.zero`.
-
-- [ ] **Step 4: Update existing tests to use AddrKey**
-
-Replace all `const ip: u32 = 0x7F000001` with `const ip = AddrKey.fromAddress(try std.net.Address.parseIp("127.0.0.1", 5683))` and raw IP literals with AddrKey equivalents.
-
-- [ ] **Step 5: Run tests**
-
-Run: `zig build test`
-Expected: All pass.
-
-- [ ] **Step 6: Commit**
-
-```bash
-git add src/rate_limiter.zig
-git commit -m "rate_limiter: AddrKey for IPv4/IPv6 address-agnostic rate limiting"
-```
-
----
-
-## Chunk 2: Transport Layer
-
-### Task 3: Io.zig — IPv6 socket and decode_recv
-
-**Files:**
-- Modify: `src/Io.zig:74-128` (setup) and `src/Io.zig:189-219` (decode_recv)
-
-- [ ] **Step 1: Update `setup()` — remove AF_INET restriction, create socket from family**
-
-Replace lines 78-86:
-
-```zig
-const family = address.any.family;
-const fd = try posix.socket(family, posix.SOCK.DGRAM, 0);
-io.fd_socket = fd;
-
-// Enable dual-stack for IPv6 sockets.
-if (family == posix.AF.INET6) {
-    const v6only = std.mem.toBytes(@as(c_int, 0));
-    posix.setsockopt(fd, posix.IPPROTO.IPV6, linux.IPV6.V6ONLY, &v6only) catch |err| {
-        log.debug("IPV6_V6ONLY: {}", .{err});
-    };
-}
-```
-
-Remove the `if (address.any.family != posix.AF.INET) return error.UnsupportedAddressFamily;` line and the comment above it.
-
-- [ ] **Step 2: Update `decode_recv()` — handle both address families**
-
-Replace lines 205-212:
-
-```zig
-const peer_family: u16 = @bitCast(io.buffers[name_offset..][0..2].*);
-const net_address: std.net.Address = if (peer_family == posix.AF.INET) blk: {
-    const sa: *const linux.sockaddr.in = @ptrCast(@alignCast(io.buffers.ptr + name_offset));
-    break :blk .{ .in = sa.* };
-} else if (peer_family == posix.AF.INET6) blk: {
-    const sa: *const linux.sockaddr.in6 = @ptrCast(@alignCast(io.buffers.ptr + name_offset));
-    break :blk .{ .in6 = sa.* };
-} else return error.UnsupportedAddressFamily;
-```
-
-- [ ] **Step 3: Run tests**
-
-Run: `zig build test`
-Expected: All pass (existing tests use IPv4, still work).
-
-- [ ] **Step 4: Add IPv6 SO_REUSEPORT test**
-
-```zig
-test "SO_REUSEPORT allows two IPv6 instances on same port" {
-    const port: u16 = 19692;
-    const allocator = testing.allocator;
-
-    var io1 = try Io.init(allocator, 4, 256);
-    defer io1.deinit(allocator);
-    io1.setup(port, "::") catch |err| switch (err) {
-        error.AddressInUse => return, // CI may not support IPv6
-        else => return err,
-    };
-
-    var io2 = try Io.init(allocator, 4, 256);
-    defer io2.deinit(allocator);
-    try io2.setup(port, "::");
-}
-```
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add src/Io.zig
-git commit -m "io: IPv6 socket support, dual-stack, family-aware decode_recv"
-```
-
----
-
-### Task 4: Server.zig — address storage, send path, recv buffer
-
-**Files:**
-- Modify: `src/Server.zig` — struct fields, init, listen, send_raw, rate limiting
-
-- [ ] **Step 1: Change `addr_recv` type**
-
-Line 125: Change `addr_recv: linux.sockaddr` → `addr_recv: linux.sockaddr.storage`
-
-(`sockaddr.storage` is large enough for any address family.)
-
-If `linux.sockaddr.storage` is not available in Zig 0.15's linux definitions, use a raw byte buffer: `addr_recv: [128]u8 align(8)` and adjust casts.
-
-- [ ] **Step 2: Update `msg_recv.namelen` in `listen()`**
-
-Line 391: Change `@sizeOf(linux.sockaddr)` → `@sizeOf(@TypeOf(server.addr_recv))`
-
-- [ ] **Step 3: Change `addrs_response` type**
-
-Line 115: Change `addrs_response: []linux.sockaddr` → `addrs_response: []std.net.Address`
-
-Update the allocation in `init()` (around line 270):
-```zig
-const addrs_response = try allocator.alloc(std.net.Address, batch);
-```
-
-Update `deinit()` to free `std.net.Address` slice.
-
-- [ ] **Step 4: Update `send_raw()`**
-
-Line 1635: Change `server.addrs_response[index] = peer_address.any` → `server.addrs_response[index] = peer_address`
-
-Line 1643: Leave `.name = @ptrCast(&server.addrs_response[index])` unchanged.
-(The pointer cast still works because `std.net.Address` is an extern union whose sockaddr variants all start at offset 0.)
-
-Line 1644: Change `.namelen = @sizeOf(linux.sockaddr)` → `.namelen = peer_address.getOsSockLen()`
-
-**Important:** We need to store the namelen per response too, since different peers may have different families on a dual-stack socket. Add a `namelen_response: []u32` field, or compute from the stored Address at send time.
-
-Simpler approach: compute from the stored address:
-```zig
-.namelen = server.addrs_response[index].getOsSockLen(),
-```
-
-- [ ] **Step 5: Update rate limiting call site**
-
-Line ~805: Change `recv.peer_address.in.sa.addr` → `RateLimiter.AddrKey.fromAddress(recv.peer_address)`.
-
-The `allow()` call becomes `rl.allow(RateLimiter.AddrKey.fromAddress(recv.peer_address), ...)`.
-
-- [ ] **Step 6: Run tests**
-
-Run: `zig build test`
-Expected: All pass.
-
-- [ ] **Step 7: Commit**
-
-```bash
-git add src/Server.zig
-git commit -m "server: IPv6-capable address storage, send path, recv buffer"
-```
-
----
-
-### Task 5: Client.zig — remove AF_INET restriction
-
-**Files:**
-- Modify: `src/Client.zig:380-388`
-
-- [ ] **Step 1: Remove AF_INET check and hardcoded socket family**
-
-Line 381: Remove `if (dest.any.family != posix.AF.INET) return error.UnsupportedAddressFamily;`
-
-Line 383: Change `posix.AF.INET` → `dest.any.family`
-
-- [ ] **Step 2: Update Config doc comment**
-
-Line 37: Change `/// Server IPv4 address.` → `/// Server address (IPv4 or IPv6).`
-
-- [ ] **Step 3: Run tests**
-
-Run: `zig build test`
-
-- [ ] **Step 4: Commit**
-
-```bash
-git add src/Client.zig
-git commit -m "client: support IPv6 host addresses"
-```
-
----
-
-## Chunk 3: DTLS and Docs
-
-### Task 6: DTLS Cookie.zig and Session.zig
-
-**Files:**
-- Modify: `src/dtls/Cookie.zig:8-9`
-- Modify: `src/dtls/Session.zig:349-363`
-
-- [ ] **Step 1: Fix Cookie.zig address hashing**
-
-Replace line 9 (`@bitCast(client_addr.any)`) with family-aware hashing. Use the `addrBytes` helper pattern from exchange.zig, or import it:
-
-```zig
-const addr_bytes = switch (client_addr.any.family) {
-    posix.AF.INET => std.mem.asBytes(&client_addr.in),
-    posix.AF.INET6 => std.mem.asBytes(&client_addr.in6),
-    else => std.mem.asBytes(&client_addr.in),
-};
-// ...
-h.update(addr_bytes);
-```
-
-- [ ] **Step 2: Fix Session.zig addrHash and addrEqual**
-
-```zig
-fn addrHash(addr: std.net.Address) u64 {
-    const bytes = switch (addr.any.family) {
-        posix.AF.INET => std.mem.asBytes(&addr.in),
-        posix.AF.INET6 => std.mem.asBytes(&addr.in6),
-        else => std.mem.asBytes(&addr.in),
-    };
-    var hash: u64 = 0xcbf29ce484222325;
-    for (bytes) |b| {
-        hash ^= b;
-        hash *%= 0x100000001b3;
-    }
-    return hash;
-}
-
-fn addrEqual(a: std.net.Address, b: std.net.Address) bool {
-    if (a.any.family != b.any.family) return false;
-    return switch (a.any.family) {
-        posix.AF.INET => std.mem.eql(u8, std.mem.asBytes(&a.in), std.mem.asBytes(&b.in)),
-        posix.AF.INET6 => std.mem.eql(u8, std.mem.asBytes(&a.in6), std.mem.asBytes(&b.in6)),
-        else => false,
-    };
-}
-```
-
-- [ ] **Step 3: Add IPv6 tests for Cookie and Session**
-
-- [ ] **Step 4: Run tests**
-
-Run: `zig build test`
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add src/dtls/Cookie.zig src/dtls/Session.zig
-git commit -m "dtls: fix address truncation for IPv6 in Cookie and Session"
-```
-
----
-
-### Task 7: Integration test, README, roadmap
-
-**Files:**
-- Modify: `src/Server.zig` tests — add IPv6 round-trip test
-- Modify: `README.md`
-- Modify: `docs/ROADMAP.md`
-
-- [ ] **Step 1: Add IPv6 loopback round-trip test**
-
-```zig
-test "round-trip: NON echo via IPv6 loopback" {
-    const port: u16 = 19715;
-
-    var server = Server.init(testing.allocator, .{
-        .port = port,
-        .bind_address = "::1",
-        .buffer_count = 8,
-        .buffer_size = 1280,
-    }, echo_handler) catch |err| switch (err) {
-        error.AddressNotAvailable => return, // skip if no IPv6
-        else => return err,
-    };
-    defer server.deinit();
-    try setup_for_test(&server);
-
-    // ... send NON via IPv6, verify response
-}
-```
-
-Note: the `test_client` helper needs updating to support IPv6 (use `parseIp` + family-based socket).
-
-- [ ] **Step 2: Update README roadmap checklist**
-
-Mark `[x] IPv6` in the roadmap section.
-
-- [ ] **Step 3: Update `docs/ROADMAP.md`**
-
-Change 1.4 status to `[x]` done.
-
-- [ ] **Step 4: Run full test suite and benchmarks**
-
-```bash
-zig build test
-zig build bench -Doptimize=ReleaseFast
-```
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add src/Server.zig README.md docs/ROADMAP.md
-git commit -m "ipv6: integration test, update README and roadmap"
-```
diff --git a/docs/superpowers/specs/2026-03-15-ipv6-design.md b/docs/superpowers/specs/2026-03-15-ipv6-design.md
deleted file mode 100644
index f911db9..0000000
--- a/docs/superpowers/specs/2026-03-15-ipv6-design.md
+++ /dev/null
@@ -1,159 +0,0 @@
-# IPv6 Support Design (RFC 7252 §1, Roadmap 1.4)
-
-## Goal
-
-Remove all IPv4-only hardcoding. Support IPv6 and dual-stack operation. Address family is auto-detected from the bind/host address string — no new config fields.
-
-## Architecture
-
-The bind address string determines the socket family. `"0.0.0.0"` creates AF_INET, `"::"` creates AF_INET6 with `IPV6_V6ONLY=0` (dual-stack, accepts both v4 and v6 clients). The hot path gains one well-predicted branch in `decode_recv()` to interpret the sockaddr.
-
-**Dual-stack note:** When an AF_INET6 socket receives from an IPv4 client, the kernel presents the peer as an IPv4-mapped IPv6 address (`::ffff:a.b.c.d`, family AF_INET6). All rate limiting and exchange hashing keys will be in IPv6 format. The same IPv4 client therefore produces different keys on a server bound to `0.0.0.0` than on one bound to `::`; this is expected and correct.
-
-## Changes by File
-
-### Io.zig
-
-**`setup()`:**
-- Remove the `AF.INET` guard and `UnsupportedAddressFamily` error.
-- Create socket with `address.any.family` instead of hardcoded `AF.INET`.
-- For `AF.INET6`: set `IPV6_V6ONLY` to `0` (dual-stack) via `setsockopt`.
-- `bind()` already uses `address.getOsSockLen()` — no change needed.
-
-**`addr_recv` field:**
-- Change from `linux.sockaddr` (16 bytes) to `linux.sockaddr.in6` (28 bytes) or
-  use a buffer large enough for both families. The kernel writes the peer address
-  here during recvmsg; if the buffer is too small, the address is truncated.
-
-**`msg_recv.namelen`:**
-- Change from hardcoded `@sizeOf(linux.sockaddr)` to `@sizeOf(linux.sockaddr.in6)`.
-  This is safe for both families — the kernel writes the actual address length
-  and the unused bytes are ignored.
-
-**`decode_recv()`:**
-- Read the family from the first 2 bytes of the name area in the recvmsg buffer.
-- Branch on family:
-  - `AF.INET`: cast to `sockaddr.in`, construct `Address` via `.{ .in = sa.* }`.
-  - `AF.INET6`: cast to `sockaddr.in6`, construct `Address` via `.{ .in6 = sa.* }`.
-  - Other: return `error.UnsupportedAddressFamily`.
-- The branch is well-predicted since all packets on a given socket share the same family.
-
-### Server.zig
-
-**`addrs_response` storage:**
-- Change type from `[]linux.sockaddr` (16 bytes) to `[]std.net.Address`.
-  `std.net.Address` is the natural choice since `decode_recv` already returns it.
-
-**`send_raw()`:**
-- Store full `peer_address` (not just `.any` which truncates IPv6).
-- Use `peer_address.getOsSockLen()` for `namelen` instead of `@sizeOf(linux.sockaddr)`.
-- The `.name` pointer must point to the start of the stored `Address`; every sockaddr
-  variant overlays at offset 0, so the same pointer is valid for IPv4 and IPv6.
-
-**Rate limiting call site (line ~805):**
-- Change `recv.peer_address.in.sa.addr` to use the new `AddrKey.fromAddress()`.
-
-**Config:**
-- `bind_address` default stays `"0.0.0.0"` for backward compatibility.
-- Doc comment updated to mention `"::"` for dual-stack IPv6.
-
-### Client.zig
-
-**`init()`:**
-- Remove `AF.INET` family check and `UnsupportedAddressFamily` error.
-- Create socket with `dest.any.family` instead of hardcoded `AF.INET`.
-
-**Config:**
-- `host` default stays `"127.0.0.1"` for backward compatibility.
-- Doc comment updated to mention IPv6 (`"::1"`).
-
-### rate_limiter.zig
-
-**New `AddrKey` type:**
-
-```
-AddrKey = struct {
-    family: u16,
-    addr: [16]u8,   // 4 bytes used for IPv4, 16 for IPv6
-    // zero-initialized remainder for IPv4
-
-    fn fromAddress(address: std.net.Address) AddrKey
-    fn eql(a: AddrKey, b: AddrKey) bool
-    fn hash(self: AddrKey) u32   // FNV-1a over family + addr
-}
-```
-
-**Slot change:** `ip_addr: u32` → `addr_key: AddrKey`
-
-**API change:** `allow(ip_addr: u32, now_ns: i64)` → `allow(addr_key: AddrKey, now_ns: i64)`
-
-Table indexing uses `addr_key.hash()`. Equality uses `addr_key.eql()`, so a hash collision never merges two addresses.
-Port is intentionally excluded — multiple connections from the same IP share a
-token bucket, which is correct for abuse prevention.
-
-### exchange.zig
-
-**`addr_hash()` and `peer_key()`:**
-
-Replace `@bitCast(address.any)` → `[16]u8` with a family-aware approach:
-
-```
-fn addr_hash(address: std.net.Address) u32 {
-    const bytes = switch (address.any.family) {
-        AF.INET => std.mem.asBytes(&address.in),
-        AF.INET6 => std.mem.asBytes(&address.in6),
-        else => std.mem.asBytes(&address.in),
-    };
-    // FNV-1a over bytes
-}
-```
-
-Branch on family and hash the correct union variant's bytes. This avoids
-tagged-union layout ambiguity and captures the full address for both families.
-Same pattern for `peer_key()`.
-
-### dtls/Cookie.zig
-
-**`generate()` and `verify()` (Security-critical):**
-
-Line 9: `const addr_bytes: [16]u8 = @bitCast(client_addr.any)` truncates IPv6
-addresses to 16 bytes — different IPv6 clients could produce the same cookie,
-weakening spoofing protection.
-
-Fix: use the same family-aware byte extraction as exchange.zig. Hash the full
-`sockaddr.in` or `sockaddr.in6` depending on family.
-
-### dtls/Session.zig
-
-**`addrHash()` and `addrEqual()` (Security-critical):**
-
-Lines 350-361: Same `@bitCast(addr.any)` truncation issue. Two different IPv6
-addresses could hash/compare as equal, causing session table collisions and
-potentially routing decrypted data to the wrong session.
-
-Fix: `addrHash` uses family-aware full-address hashing. `addrEqual` compares
-the correct union variant based on family.
-
-### Tests
-
-- Existing tests use `"127.0.0.1"` / `parseIp("127.0.0.1")` — continue to work (IPv4).
-- New tests bind to `"::1"` (IPv6 loopback) and verify round-trip.
-- Rate limiter tests: add IPv6 address test cases.
-- Exchange tests: add IPv6 address hashing test.
-- DTLS Cookie/Session tests: add IPv6 address cases.
-- handler.zig tests use `initIp4` — no change needed (tests don't touch the network).
-- Server/Client test helpers: update `test_client` to support IPv6 loopback.
-
-## Performance
-
-- **Hot path (decode_recv):** One branch on family (well-predicted). Same cost as before for IPv4-only deployments.
-- **Rate limiter:** `AddrKey` comparison is 18 bytes instead of 4. Still < 1 cache line. Negligible.
-- **Exchange hashing:** Hashes up to 28 bytes instead of 16 for IPv6. Negligible.
-- **addrs_response:** `std.net.Address` (~128 bytes) vs `sockaddr` (16 bytes). With batch of 256, adds ~28 KB. Acceptable.
-- **DTLS hashing:** Same as exchange — family branch + full address hash. Negligible.
-
-## Non-goals
-
-- IPv6 multicast (roadmap item 5.1, separate feature).
-- Configurable `IPV6_V6ONLY` — always dual-stack when binding `::`. Users who want v6-only can bind a specific IPv6 address.
-- DNS resolution (roadmap item 5.4, separate feature).

From e38be0efd41fe5e8387982fef804226ddcb82721 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoffer=20Vikstr=C3=B6m?= <cvi@heya.tech>
Date: Mon, 16 Mar 2026 15:16:35 +0100
Subject: [PATCH 15/15] readme: link to protocol compliance roadmap

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index bca24ad..ac9d07b 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,8 @@ High-performance CoAP server and client library for Zig, built on Linux io_uring
 - DTLS 1.2 PSK handshake and encrypted transport
 - Pre-allocated in-flight tracking, zero hot-path allocations
 
+See the [protocol compliance roadmap](docs/ROADMAP.md) for planned features.
+
 ## Quick Start
 
 ### Server