fix(#135): clean up Traefik network connections on tenant deletion #144
dennisonbertram merged 2 commits into main
Traefik accumulates connections to tenant networks that are never disconnected, preventing network removal and causing unbounded growth.

- Add NetworkDisconnect, NetworkList, RemoveNetwork to Docker client interface and implementation
- Disconnect Traefik and remove tenant network in StopAllForTenant after all containers are stopped
- Add orphaned network sweep to GC loop: finds ah-tenant-* networks with 0-1 containers (only Traefik), disconnects and removes them
- Add traefik_networks count to /v1/system/health/detailed with warning at >150 and degradation at >200
- Update MockDockerClient with new method stubs
- Add tests for all three behaviors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
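The teardown order described in the commit message (disconnect Traefik first, then remove the network) can be sketched as below. This is only an illustration: the NetworkCleaner interface, its method signatures, cleanupTenantNetwork, and fakeClient are assumptions made for the example and are not taken from the repository.

```go
package main

import (
	"context"
	"fmt"
)

// NetworkCleaner is a hypothetical subset of the Docker client interface.
// The method names come from the commit message; the signatures are assumed.
type NetworkCleaner interface {
	NetworkDisconnect(ctx context.Context, network, container string) error
	RemoveNetwork(ctx context.Context, network string) error
}

// cleanupTenantNetwork disconnects Traefik from the tenant network and then
// removes the network. It stops at the first error, because Docker refuses to
// remove a network that still has a container attached.
func cleanupTenantNetwork(ctx context.Context, c NetworkCleaner, network, traefik string) error {
	if err := c.NetworkDisconnect(ctx, network, traefik); err != nil {
		return fmt.Errorf("disconnect %s from %s: %w", traefik, network, err)
	}
	if err := c.RemoveNetwork(ctx, network); err != nil {
		return fmt.Errorf("remove network %s: %w", network, err)
	}
	return nil
}

// fakeClient is an illustrative in-memory stand-in that records calls.
type fakeClient struct {
	calls          []string
	failDisconnect bool
}

func (f *fakeClient) NetworkDisconnect(_ context.Context, network, container string) error {
	f.calls = append(f.calls, "disconnect "+network+" "+container)
	if f.failDisconnect {
		return fmt.Errorf("simulated disconnect failure")
	}
	return nil
}

func (f *fakeClient) RemoveNetwork(_ context.Context, network string) error {
	f.calls = append(f.calls, "remove "+network)
	return nil
}
```

In this sketch a failed disconnect aborts the removal. A real implementation might instead log the error and let the GC sweep retry later.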
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9fe3e5e0ee
    // A network with exactly 1 container is orphaned if that container is Traefik
    // (the only reason Traefik is still connected is because disconnect was missed).
    // Networks with 2+ containers have active tenant workloads; leave them alone.
    if net.Containers <= 1 {
Exclude freshly created tenant networks from orphan sweep
The orphan predicate net.Containers <= 1 will also match newly created tenant networks that are temporarily empty during normal deploy/restart flows, because services.Manager calls EnsureNetwork(...) before image pull and before the container is attached. If GC runs in that window, it can remove the network and the later container create/start path can fail with a missing network, turning an otherwise valid deployment into a transient failure. This cleanup path needs a race guard (for example, a minimum-age check similar to minResourceAge used for containers/volumes, or additional ownership/state checks) before deleting empty tenant networks.
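A minimum-age guard of the kind suggested above might look like the following sketch. The networkInfo type, the isOrphaned function, and the five-minute minNetworkAge value are all hypothetical; the repository's actual types and the value of minResourceAge are not shown in this review.

```go
package main

import (
	"strings"
	"time"
)

// networkInfo is a hypothetical summary of a Docker network; the real
// struct returned by NetworkList in the repository may differ.
type networkInfo struct {
	Name       string
	Containers int
	Created    time.Time
}

// minNetworkAge is an assumed grace period chosen for illustration only.
const minNetworkAge = 5 * time.Minute

// isOrphaned reports whether a tenant network is safe to sweep.
// A network qualifies only if it has the tenant prefix, has at most one
// attached container, and is older than the grace period, so a network
// created moments ago by a deploy is never treated as orphaned.
// Note: this sketch does not verify that a single remaining container
// is actually Traefik.
func isOrphaned(n networkInfo, now time.Time) bool {
	if !strings.HasPrefix(n.Name, "ah-tenant-") {
		return false
	}
	if n.Containers > 1 {
		return false
	}
	return now.Sub(n.Created) >= minNetworkAge
}
```

Passing now as a parameter, rather than calling time.Now() inside the predicate, keeps the check deterministic in tests.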
Summary
- ah-tenant-* networks with 0 containers
- traefik_networks count
- NetworkDisconnect, NetworkList, RemoveNetwork to Docker client interface

Closes #135
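The commit message gives two thresholds for the traefik_networks count: a warning above 150 and degradation above 200. A minimal sketch of that classification follows; the function name and the status strings are assumptions, not the values the health endpoint actually returns.

```go
package main

// Thresholds taken from the commit message; both are exclusive ("> 150", "> 200").
const (
	traefikNetworksWarn     = 150
	traefikNetworksDegraded = 200
)

// traefikNetworkStatus maps the number of networks Traefik is attached to
// onto a health status. The status strings are hypothetical.
func traefikNetworkStatus(count int) string {
	switch {
	case count > traefikNetworksDegraded:
		return "degraded"
	case count > traefikNetworksWarn:
		return "warning"
	default:
		return "ok"
	}
}
```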
Test plan
- go build ./... passes
- go test ./... passes (pre-existing flaky test unrelated)

🤖 Generated with Claude Code