feat(environments): per-agent dev environment isolation with tooling#44
feat(environments): per-agent dev environment isolation with tooling#44dennisonbertram wants to merge 2 commits intomainfrom
Conversation
Add sandboxed dev environments as a new resource type alongside services
and databases. Each environment is a gVisor-isolated container with a
persistent /workspace volume, shell access via WebSocket exec, and file
upload/download support.
API endpoints:
- POST/GET /v1/environments — create and list environments
- GET/DELETE /v1/environments/{id} — get and delete
- POST /v1/environments/{id}/start|stop — lifecycle control
- GET /v1/environments/{id}/exec — WebSocket bidirectional exec
- POST/GET /v1/environments/{id}/files — upload/download files
Key design decisions:
- WebSocket exec (not SSH) for simpler proxy, no key management
- Writable filesystem (unlike services) for package installation
- Persistent /workspace volume survives container restarts
- Auto-stop after configurable idle timeout (default 30min)
- Same per-tenant network so environments reach tenant's services/DBs
- Allowed base images: node:20, python:3.12, golang:1.25, ubuntu:24.04
Includes: migration (state_010), environment manager, idle detector,
Docker client extensions, reconciler/GC updates, 16 passing tests.
Closes #41
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2687abb96f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| _, err = m.db.ExecContext(ctx, | ||
| `UPDATE environments SET status = 'running', last_activity_at = ?, updated_at = ? WHERE id = ?`, | ||
| now, now, envID) |
There was a problem hiding this comment.
Reject start when stopped environment has no container
After reconciliation, missing environment containers are explicitly cleared (container_id = ''), but Start still updates the record to status='running' even when there is no container to start. In this state the API reports success yet /exec and /files calls fail because GetContainerID returns “environment has no container”. Start should fail (or recreate the container) instead of transitioning to running with an empty container_id.
Useful? React with 👍 / 👎.
| if err := tx.Commit(); err != nil { | ||
| return nil, fmt.Errorf("commit quota check: %w", err) | ||
| } |
There was a problem hiding this comment.
Keep quota check and insert in one transaction
The environment quota count is committed before the new row is inserted, so concurrent creates can both pass the count check and then both insert, exceeding max_environments. This is a classic check-then-act race; quota enforcement needs to include both the count check and insert in the same serialized transaction (or an equivalent atomic SQL pattern).
Useful? React with 👍 / 👎.
| r.Body = http.MaxBytesReader(w, r.Body, 50<<20) | ||
|
|
||
| if err := r.ParseMultipartForm(50 << 20); err != nil { |
There was a problem hiding this comment.
Fix effective upload limit for environment file API
This handler advertises a 50MB multipart limit, but requests are already wrapped by the global 1MB maxBodySize middleware in setupRoutes, so uploads over 1MB are rejected before this limit is applied. As written, the file-upload endpoint cannot support the documented/expected 50MB payloads.
Useful? React with 👍 / 👎.
…eview CRITICAL fixes: - WebSocket CheckOrigin now rejects browser requests with Origin header to prevent cross-site WebSocket hijacking - ExecAttach returns proper close function to prevent FD/connection leaks on every exec session - Reorganize maxBodySize middleware per-group so 50MB file uploads work (global 1MB cap was blocking them) - Safe string truncation in container naming to prevent panics on short IDs - Path traversal hardening: strict /workspace/ prefix check, tar header filename sanitization against ../../ attacks HIGH fixes: - Atomic quota check + INSERT in single transaction for serialization - WebSocket DoS controls: 1MB read limit, 60s deadlines, ping/pong - File download validates exact filename match (not first file in tar) - Global rate limiter added to WebSocket route group - Fix RunDatabase comment to match gVisor runtime code MEDIUM fixes: - TouchActivity now tenant-scoped (defense in depth) - Start/Stop SQL includes tenant_id in WHERE clause - IdleDetector: remove double rows.Close(), check rows.Err() LOW fixes: - Circuit breaker backoff uses actual circuit_open_count - Content-Disposition uses %q for proper filename quoting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
/workspacevolume, WebSocket shell exec, and file transfernode:20,python:3.12,golang:1.25,ubuntu:24.04New API surface
/v1/environments/v1/environments/v1/environments/{id}/v1/environments/{id}/v1/environments/{id}/start/v1/environments/{id}/stop/v1/environments/{id}/exec/v1/environments/{id}/files/v1/environments/{id}/files/*Files changed
internal/environments/— manager, idle detector, tests (16 passing)internal/api/environments.go— HTTP/WebSocket handlersinternal/db/migrations/state_010_environments.sql— schema + quotaTest plan
go build ./...cleanls, upload/download fileCloses #41
🤖 Generated with Claude Code