fix(#135): clean up Traefik network connections on tenant deletion#144

Merged
dennisonbertram merged 2 commits into main from
issue-135-traefik-cleanup
Mar 22, 2026
@dennisonbertram
Owner

Summary

  • Disconnects tenant Docker networks from Traefik on tenant deletion
  • GC detects and removes orphaned ah-tenant-* networks with 0-1 containers (at most Traefik still attached)
  • Detailed health endpoint reports traefik_networks count
  • Health degrades when network count exceeds 200 (warns at 150)
  • Adds NetworkDisconnect, NetworkList, RemoveNetwork to Docker client interface
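
The health thresholds in the summary can be illustrated with a minimal sketch. The function name, constant names, and status labels below are hypothetical and not taken from the PR; only the 150 and 200 limits come from the description.

```go
package main

import "fmt"

// Thresholds from the PR summary; the constant names are hypothetical.
const (
	warnNetworks     = 150
	degradedNetworks = 200
)

// networkHealth maps a Traefik network count to a status label.
// The labels "ok", "warning", and "degraded" are assumptions.
func networkHealth(count int) string {
	switch {
	case count > degradedNetworks:
		return "degraded"
	case count > warnNetworks:
		return "warning"
	default:
		return "ok"
	}
}

func main() {
	for _, n := range []int{10, 150, 151, 200, 201} {
		fmt.Printf("traefik_networks=%d status=%s\n", n, networkHealth(n))
	}
}
```

Both comparisons are strict, so a count of exactly 150 or 200 stays in the lower band, matching the ">150" and ">200" wording in the commit message.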

Closes #135

Test plan

  • GC tests: orphaned network removal, skip active networks
  • Service tests: disconnect called during tenant cleanup
  • Health tests: network count included, degradation threshold
  • go build ./... passes
  • go test ./... passes (pre-existing flaky test unrelated)

🤖 Generated with Claude Code

dennisonbertram and others added 2 commits March 22, 2026 07:42
Traefik accumulates connections to tenant networks that are never
disconnected, preventing network removal and causing unbounded growth.

- Add NetworkDisconnect, NetworkList, RemoveNetwork to Docker client
  interface and implementation
- Disconnect Traefik and remove tenant network in StopAllForTenant
  after all containers are stopped
- Add orphaned network sweep to GC loop: finds ah-tenant-* networks
  with 0-1 containers (only Traefik), disconnects and removes them
- Add traefik_networks count to /v1/system/health/detailed with
  warning at >150 and degradation at >200
- Update MockDockerClient with new method stubs
- Add tests for all three behaviors
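
As a rough illustration of the cleanup order described in this commit message, here is a sketch of a disconnect-then-remove step. Only the three method names come from the PR; the interface name, the method signatures, the fake client, and the example network and container names are assumptions.

```go
package main

import (
	"context"
	"fmt"
)

// NetworkClient is a hypothetical slice of the Docker client interface.
// The method names match the PR; the signatures are assumed.
type NetworkClient interface {
	NetworkDisconnect(ctx context.Context, network, container string) error
	NetworkList(ctx context.Context, prefix string) ([]string, error)
	RemoveNetwork(ctx context.Context, network string) error
}

// fakeClient records calls in order so the sequence can be inspected.
type fakeClient struct {
	calls []string
}

func (f *fakeClient) NetworkDisconnect(_ context.Context, network, container string) error {
	f.calls = append(f.calls, "disconnect "+container+" from "+network)
	return nil
}

func (f *fakeClient) NetworkList(_ context.Context, _ string) ([]string, error) {
	return nil, nil
}

func (f *fakeClient) RemoveNetwork(_ context.Context, network string) error {
	f.calls = append(f.calls, "remove "+network)
	return nil
}

// cleanupTenantNetwork disconnects Traefik first, then removes the network.
// It is meant to run only after all tenant containers have been stopped.
func cleanupTenantNetwork(ctx context.Context, c NetworkClient, network, traefik string) error {
	if err := c.NetworkDisconnect(ctx, network, traefik); err != nil {
		return fmt.Errorf("disconnect %s from %s: %w", traefik, network, err)
	}
	if err := c.RemoveNetwork(ctx, network); err != nil {
		return fmt.Errorf("remove network %s: %w", network, err)
	}
	return nil
}

// demoCleanup runs the cleanup against the fake and returns the recorded calls.
// "ah-tenant-acme" and "traefik" are illustrative names.
func demoCleanup() []string {
	fake := &fakeClient{}
	if err := cleanupTenantNetwork(context.Background(), fake, "ah-tenant-acme", "traefik"); err != nil {
		panic(err)
	}
	return fake.calls
}

func main() {
	for _, call := range demoCleanup() {
		fmt.Println(call)
	}
}
```

Disconnecting before removal matters because Docker refuses to remove a network that still has an attached endpoint, which is the failure mode the commit message describes.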

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9fe3e5e0ee

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// A network with exactly 1 container is orphaned if that container is Traefik
// (the only reason Traefik is still connected is because disconnect was missed).
// Networks with 2+ containers have active tenant workloads — leave them alone.
if net.Containers <= 1 {

P1: Exclude freshly created tenant networks from orphan sweep

The orphan predicate net.Containers <= 1 will also match newly created tenant networks that are temporarily empty during normal deploy/restart flows, because services.Manager calls EnsureNetwork(...) before image pull and before the container is attached. If GC runs in that window, it can remove the network and the later container create/start path can fail with a missing network, turning an otherwise valid deployment into a transient failure. This cleanup path needs a race guard (for example, a minimum-age check similar to minResourceAge used for containers/volumes, or additional ownership/state checks) before deleting empty tenant networks.
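
One way to picture the suggested race guard is a predicate that also checks network age. This is a sketch under assumptions, not the repository's code: the networkInfo type, its Created field, the isOrphaned helper, and the 10-minute value of minNetworkAge are all hypothetical, chosen only to mirror the minResourceAge idea mentioned above.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// networkInfo is a hypothetical view of a Docker network.
// The Created field is assumed to be available from the network listing.
type networkInfo struct {
	Name       string
	Containers int
	Created    time.Time
}

// minNetworkAge is an assumed grace period; the value is illustrative.
const minNetworkAge = 10 * time.Minute

// isOrphaned reports whether a tenant network is safe to sweep:
// it has the tenant prefix, at most one attached container,
// and is older than the grace period.
func isOrphaned(n networkInfo, now time.Time) bool {
	if !strings.HasPrefix(n.Name, "ah-tenant-") {
		return false
	}
	if n.Containers > 1 {
		return false
	}
	return now.Sub(n.Created) >= minNetworkAge
}

// demoOrphanCheck evaluates three illustrative networks at a fixed time.
func demoOrphanCheck() (fresh, old, busy bool) {
	now := time.Date(2026, 3, 22, 8, 0, 0, 0, time.UTC)
	fresh = isOrphaned(networkInfo{"ah-tenant-new", 0, now.Add(-30 * time.Second)}, now)
	old = isOrphaned(networkInfo{"ah-tenant-stale", 1, now.Add(-time.Hour)}, now)
	busy = isOrphaned(networkInfo{"ah-tenant-live", 3, now.Add(-time.Hour)}, now)
	return fresh, old, busy
}

func main() {
	fresh, old, busy := demoOrphanCheck()
	fmt.Println("fresh empty network swept:", fresh)
	fmt.Println("old network with one container swept:", old)
	fmt.Println("old network with three containers swept:", busy)
}
```

An age check narrows the race window but does not close it for deployments slower than the grace period, and this sketch still does not verify that the single remaining container is Traefik, so an ownership or state check may still be needed.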

@dennisonbertram dennisonbertram merged commit a1b1dac into main Mar 22, 2026
1 check failed

Development

Successfully merging this pull request may close these issues.

fix: clean up Traefik network connections on tenant deletion

1 participant