gRPC builder stream silently dies, builder never reconnects #1674

## Summary

The gRPC bidirectional stream between `hydra-builder` and the queue runner silently dies. When this happens:

1. The **builder process keeps running** but stops receiving new work — no child `nix` processes, idle CPU
2. The **queue runner** still considers the builder's job slots as occupied (`currentJobs: 4`) and never reclaims them
3. `sinceLastPing` on the runner side grows indefinitely (observed 51,696s / ~14h and 39,156s / ~10.9h)
4. Despite `--ping-interval 10`, the builder neither detects the dead stream nor attempts to reconnect

The only remedy found so far is to manually restart the builder service (`launchctl kickstart -k`), after which it immediately reconnects and starts receiving builds.

## Observed pattern

From the builder logs, the sequence before going silent is typically:

```
INFO  Finished building <drv>
INFO  Start uploading paths to queue runner directly
INFO  Finished uploading paths to queue runner directly. elapsed=138ms
INFO  Successfully completed build process for <drv>
INFO  Building <next-drv>                    ← receives new build
      warning: file ... does not exist in binary cache ...  ← fetching inputs
      ... (silence — no more log entries for 10+ hours)
```

The builder appears to get stuck during the input-fetching phase of a build, and the gRPC stream dies while it is stuck. Two possible causes: the long input fetch blocks the ping response, or a network interruption goes undetected.
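To make the first hypothesis concrete, here is a minimal sketch of how a blocking call can starve a heartbeat that shares its event loop. It is written in Python `asyncio`, not the builder's actual runtime, and it assumes (unverified) that pings and input fetching share an executor. The names `heartbeat`, `blocking_fetch`, and `offloaded_fetch` are hypothetical.

```python
import asyncio
import time


async def heartbeat(ticks, interval):
    # Stand-in for the periodic ping: records a timestamp each interval.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def blocking_fetch(seconds):
    # Hypothetical input fetch that blocks the event loop thread.
    time.sleep(seconds)


async def offloaded_fetch(seconds):
    # Same work, moved to a worker thread so the loop stays responsive.
    await asyncio.to_thread(time.sleep, seconds)


async def pings_during_fetch(fetch, interval=0.01, fetch_seconds=0.2):
    """Return how many heartbeats fired while `fetch` was running."""
    ticks = []
    task = asyncio.create_task(heartbeat(ticks, interval))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    start = time.monotonic()
    await fetch(fetch_seconds)
    end = time.monotonic()
    task.cancel()
    return sum(1 for t in ticks if start < t < end)
```

Calling `asyncio.run(pings_during_fetch(blocking_fetch))` reports no heartbeats during the fetch, while the `offloaded_fetch` variant keeps pinging. If the builder behaves similarly, a stuck fetch would look exactly like a dead peer from the runner's side.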

From the queue runner's `/status` endpoint:
```json
{
  "hostname": "builder-A",
  "sinceLastPing": 51696,
  "currentJobs": 4,
  "failedBuilds": 0,
  "succeededBuilds": 161
}
```

Other builders on the same network report `sinceLastPing` values of 2 to 6 seconds and are healthy.

## Expected behavior

- The builder should detect the dead gRPC stream (via ping timeout or TCP keepalive) and reconnect automatically
- The queue runner should have a configurable timeout for `sinceLastPing` — after which it marks the builder as disconnected, reclaims the job slots, and reschedules the builds on other machines
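The two behaviors above could look roughly like the following sketch. It is not hydra code: `PingWatchdog`, `Builder`, `reap_stale_builders`, `missed_allowed`, and `ping_timeout` are hypothetical names, and the real builder and runner may structure their state quite differently.

```python
from dataclasses import dataclass, field


class PingWatchdog:
    """Builder side: decide when a silent stream should be torn down."""

    def __init__(self, ping_interval, missed_allowed=3):
        # Tolerate a few missed pings before declaring the stream dead.
        self.deadline = ping_interval * missed_allowed
        self.last_ack = None

    def ack(self, now):
        self.last_ack = now

    def should_reconnect(self, now):
        return self.last_ack is not None and now - self.last_ack > self.deadline


@dataclass
class Builder:
    hostname: str
    since_last_ping: float  # seconds, mirrors `sinceLastPing` in /status
    current_jobs: list = field(default_factory=list)  # derivations in flight
    connected: bool = True


def reap_stale_builders(builders, ping_timeout):
    """Runner side: disconnect silent builders and return their jobs for rescheduling."""
    requeue = []
    for b in builders:
        if b.connected and b.since_last_ping > ping_timeout:
            b.connected = False
            requeue.extend(b.current_jobs)
            b.current_jobs = []  # the slots are free again
    return requeue
```

With `--ping-interval 10` and three tolerated misses, the watchdog would ask for a reconnect after 30 seconds of silence. On the runner side, a builder reporting `sinceLastPing: 51696` would be reaped on the first pass for any reasonable `ping_timeout`.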

## Environment

- Queue runner: `hydra-queue-runner 0.1.0-c1fe4808`
- Builder: `hydra-builder 0.1.0-c1fe4808`
- `--ping-interval 10` configured on all builders
- 10 darwin builders on a LAN, connected to queue runner via IPv6
- Observed across multiple builders, recurring after restarts

## Workaround

Restart the builder service. It reconnects immediately and starts receiving builds within seconds.
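Until this is fixed, stale builders can at least be spotted automatically. The sketch below assumes the `/status` data can be reduced to a list of per-builder objects shaped like the sample above. The overall layout of the endpoint's response is not shown in this report, so that reduction, the function name `stale_hostnames`, and the 300 second threshold are all assumptions.

```python
import json


def stale_hostnames(builders, threshold_seconds=300):
    """Return hostnames whose last ping is older than the threshold."""
    return [
        b["hostname"]
        for b in builders
        if b.get("sinceLastPing", 0) > threshold_seconds
    ]


# Sample entries using the field names from the /status excerpt above.
sample = json.loads("""
[
  {"hostname": "builder-A", "sinceLastPing": 51696, "currentJobs": 4},
  {"hostname": "builder-B", "sinceLastPing": 4, "currentJobs": 2}
]
""")

for host in stale_hostnames(sample):
    print(f"{host} looks stuck; restart its builder service")
```

The output of such a check could feed an alert or a scheduled restart on the affected machine.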
