fix(reasoning_gym_env): support concurrent WebSocket sessions#619
fix(reasoning_gym_env): support concurrent WebSocket sessions#619sergiopaniego wants to merge 2 commits intometa-pytorch:mainfrom
Conversation
The server hardcoded `max_concurrent_envs=1`, which causes "server at capacity" errors when more than one client connects (e.g. TRL `GRPOTrainer` with `num_generations > 1` opens parallel rollouts). Match the textarena_env pattern: - Read `MAX_CONCURRENT_ENVS` env var (default 8) - Pass a factory function so each session gets its own environment instance `ReasoningGymEnvironment` already declares `SUPPORTS_CONCURRENT_SESSIONS = True` and has no `__init__` args, so the factory is a thin wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile SummaryThis PR fixes a bug in Confidence Score: 4/5Safe to merge — a minimal, well-scoped bug fix with only a P2 style note. The change is a clean one-file fix that exactly mirrors a proven pattern in the codebase. The only finding is a P2 style suggestion (unguarded No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant GRPOTrainer
participant Server as FastAPI Server
participant Factory as create_reasoning_gym_environment()
participant Env1 as ReasoningGymEnvironment (session 1)
participant Env2 as ReasoningGymEnvironment (session 2)
Note over Server: max_concurrent = int(os.getenv("MAX_CONCURRENT_ENVS", "8"))
GRPOTrainer->>Server: WS /ws (session 1)
Server->>Factory: call factory
Factory->>Env1: ReasoningGymEnvironment()
Env1-->>Server: instance 1
Server-->>GRPOTrainer: session 1 established
GRPOTrainer->>Server: WS /ws (session 2)
Server->>Factory: call factory
Factory->>Env2: ReasoningGymEnvironment()
Env2-->>Server: instance 2
Server-->>GRPOTrainer: session 2 established
Note over Env1,Env2: Each session owns its own dataset iterator — no shared state
Prompt To Fix All With AIThis is a comment left during a code review.
Path: envs/reasoning_gym_env/server/app.py
Line: 48
Comment:
**Unguarded `int()` on env var**
If `MAX_CONCURRENT_ENVS` is set to a non-integer value (e.g. `"auto"` or an empty string), `int()` raises `ValueError` at module-import time, crashing the server before it can start with no useful error message. The `textarena_env` copy of this line has the same gap — consider adding a fallback for consistency in both places.
```suggestion
_max_concurrent_raw = os.getenv("MAX_CONCURRENT_ENVS", "8")
try:
max_concurrent = int(_max_concurrent_raw)
except ValueError:
raise ValueError(
f"MAX_CONCURRENT_ENVS must be an integer, got: {_max_concurrent_raw!r}"
) from None
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "fix(reasoning_gym_env): support concurre..." | Re-trigger Greptile |
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Darktex
left a comment
There was a problem hiding this comment.
Note: This is an automated review by Claude Code, not a human review.
Clean, correct bug fix. No Tier 1 or Tier 2 issues identified.
The change follows the established textarena_env pattern exactly: env-var-controlled concurrency limit with a safe int() parse and a one-liner factory so each WebSocket session gets its own ReasoningGymEnvironment instance. ReasoningGymEnvironment already declares SUPPORTS_CONCURRENT_SESSIONS = True and carries no shared mutable state between instances, so the factory is safe with no further changes required.
Automated checks: lint passes on the affected file, no debug code in scope.
Automated review by Claude Code | Learn more
Darktex
left a comment
There was a problem hiding this comment.
Note: This is an automated review by Claude Code, not a human review.
Alignment Review Report
Automated Checks
- Lint: PASS -
ruff check,ruff format --check, andusort checkall pass on the post-PR file. - Debug code: CLEAN - No debug artifacts introduced by this PR. Pre-existing findings in
src/openenv/auto/andsrc/openenv/cli/are out of scope.
Tier 1: Fixes Required
None.
The change is mechanically sound:
import osis placed correctly after the module docstring and before thetry:import block. Passesusort/ruffwithout complaint.int(_max_concurrent_raw)is called at module load time, so a misconfigured env var raises immediately on startup (explicit fail-fast, not a hidden runtime crash).- The factory function
create_reasoning_gym_environmenthas the correct return type (ReasoningGymEnvironment), matching whatcreate_app'senvparameter expects (aCallable[[], Environment]). max_concurrentisint, matchingmax_concurrent_envs: Optional[int]increate_app.
Tier 2: Alignment Discussion
None identified.
Walking through each relevant invariant:
| Invariant | Status |
|---|---|
| One env = one trajectory (RFC 004 / PRINCIPLES.md) | Aligned - concurrent sessions each get their own ReasoningGymEnvironment instance via the factory; no trajectory multiplexing occurs. |
| Agents cannot reset (INVARIANTS.md) | Aligned - no change to what is exposed via MCP tools. |
| Client-server separation | Aligned - change is entirely inside server/app.py; client code untouched. |
| Rewards inside environment | Aligned - reward computation in ReasoningGymEnvironment.step() is untouched. |
| Gymnasium API signatures | Aligned - reset / step / state signatures unchanged. |
The PR also correctly honours the environment's pre-existing SUPPORTS_CONCURRENT_SESSIONS = True declaration, which the old hardcoded max_concurrent_envs=1 silently contradicted. The pattern mirrors envs/textarena_env/server/app.py exactly: env-var-driven cap + factory function.
Summary
- 0 mechanical issues to fix
- 0 alignment points for human review
Straightforward bug fix. Approved.
Automated review by Claude Code | Learn more
Darktex
left a comment
There was a problem hiding this comment.
Note: This is an automated review by Claude Code, not a human review.
Summary
This PR fixes a hard-coded max_concurrent_envs=1 in envs/reasoning_gym_env/server/app.py that caused all but the first WebSocket session to receive a "server at capacity" error. It reads the limit from MAX_CONCURRENT_ENVS (defaulting to 8) and wraps instantiation in a factory function so each session gets its own ReasoningGymEnvironment instance — the same pattern already in use by textarena_env.
Tier 1 — Bugs / Quality
- None spotted. The
int()parse of the env var is guarded with aValueErroron bad input.ReasoningGymEnvironment.__init__takes no arguments, making the factory a correct one-liner. Lint and format checks pass on the file.
One minor observation: the textarena_env reference implementation does not pass max_concurrent_envs explicitly (it relies on the create_app default), while this PR passes it explicitly. That is not a bug — being explicit is fine — but it is a small divergence from the pattern the PR claims to mirror. Not worth blocking.
Tier 2 — Alignment
Aligned with PRINCIPLES.md / INVARIANTS.md. The change:
- Stays entirely inside
server/app.py; no client-server boundary is crossed. - Does not touch reward computation, reset/step/state signatures, or MCP tooling.
- Honors the pre-existing
SUPPORTS_CONCURRENT_SESSIONS = Truedeclaration on the environment class; the bug was purely thatapp.pynever wired it through. - Follows the "one env = one trajectory" principle by giving each WebSocket session its own instance via the factory, rather than multiplexing.
Verdict
Clean bug fix with correct factory pattern, no invariant concerns; approve.
Automated review by Claude Code | Learn more
Summary
envs/reasoning_gym_env/server/app.pyhardcodedmax_concurrent_envs=1, so opening more than one WebSocket session against the deployed Space returned "server at capacity" errors. This PR matches the existingtextarena_envpattern: the limit is read from theMAX_CONCURRENT_ENVSenv var (default8) and the server is given a factory function so each session gets its own environment instance.ReasoningGymEnvironmentalready declaresSUPPORTS_CONCURRENT_SESSIONS = Trueand takes no__init__args, so the factory is a one-liner. The bug surfaced while running the new GRPO end-to-end tutorial (#618) against the deployed Space — TRL'sGRPOTraineropensgradient_accumulation_steps × per_device_train_batch_sizeparallel WebSocket sessions per generation batch and only the first was succeeding.Type of Change
Alignment Checklist
Before submitting, verify:
.claude/docs/PRINCIPLES.mdand this PR aligns with our principles.claude/docs/INVARIANTS.mdand no invariants are violated/pre-submit-pr(orbash .claude/hooks/lint.shand tests) and addressed all issuesRFC Status
Test Plan
MAX_CONCURRENT_ENVS=2env var override caps at 2, third connection rejectedMAX_CONCURRENT_ENVS=8) is the same as the previous hardcodedmax_concurrent_envs=1would have been at8Claude Code Review