fix(reasoning_gym_env): support concurrent WebSocket sessions by sergiopaniego · Pull Request #619 · meta-pytorch/OpenEnv

sergiopaniego · 2026-04-28T15:33:18Z

Summary

envs/reasoning_gym_env/server/app.py hardcoded max_concurrent_envs=1, so opening more than one WebSocket session against the deployed Space returned "server at capacity" errors. This PR matches the existing textarena_env pattern: the limit is read from the MAX_CONCURRENT_ENVS env var (default 8) and the server is given a factory function so each session gets its own environment instance.

ReasoningGymEnvironment already declares SUPPORTS_CONCURRENT_SESSIONS = True and takes no __init__ args, so the factory is a one-liner. The bug surfaced while running the new GRPO end-to-end tutorial (#618) against the deployed Space — TRL's GRPOTrainer opens gradient_accumulation_steps × per_device_train_batch_size parallel WebSocket sessions per generation batch and only the first was succeeding.

Type of Change

Alignment Checklist

Before submitting, verify:

I have read .claude/docs/PRINCIPLES.md and this PR aligns with our principles
I have checked .claude/docs/INVARIANTS.md and no invariants are violated
I have run /pre-submit-pr (or bash .claude/hooks/lint.sh and tests) and addressed all issues

RFC Status

Not required (bug fix, docs, minor refactoring)
RFC exists: #___
RFC needed (will create before merge)

Test Plan

Build and run the server, open ≥2 concurrent WebSocket sessions, verify both succeed
MAX_CONCURRENT_ENVS=2 env var override caps at 2, third connection rejected
Diff is purely additive: behavior with the default value (MAX_CONCURRENT_ENVS=8) is the same as the previous hardcoded max_concurrent_envs=1 would have been at 8

Claude Code Review

## Alignment Review Report

### Automated Checks
- Lint: PASS for the PR scope. The hook flags formatting in
  `envs/chat_env/`, `envs/repl_env/`, and `envs/textarena_env/`, none of
  which this PR touches — pre-existing on `main`, out of scope.
- Debug code: CLEAN for the PR scope. The hook flags `console.print`
  calls in `src/openenv/cli/commands/push.py` and pre-existing TODOs in
  `src/openenv/cli/`, none of which this PR touches.

### Open RFCs Context
- **RFC 005 (Agentic Harness Integration)** — *In Review*. This RFC
  covers wrapping external agentic harnesses inside OpenEnv containers
  and does not touch session capacity or factory wiring; no overlap
  with this PR.

### PR Scope
- 1 file changed: `envs/reasoning_gym_env/server/app.py` (+17 / -3).
  Adds an `os` import, an env-var read for `MAX_CONCURRENT_ENVS`, a
  factory function `create_reasoning_gym_environment`, and updates the
  `create_app` call to pass the factory and the env-var value.

### Tier 1: Fixes Required
None.

### Tier 2: Alignment Discussion

#### Principle Conflicts
None identified. The change:
- Mirrors the existing `envs/textarena_env/server/app.py` pattern
  (`MAX_CONCURRENT_ENVS` env var + factory function) — no new pattern
  introduced.
- Does not touch the client–server boundary (no client imports added,
  no server internals exposed).
- Does not move reward computation; rewards still live inside the env.
- Honors the env's pre-existing `SUPPORTS_CONCURRENT_SESSIONS = True`
  declaration; the bug was that `app.py` didn't wire it through.

#### RFC Conflicts
None identified.

### Summary
- 0 mechanical issues to fix
- 0 alignment points for human review
- 0 RFC conflicts to discuss

The server hardcoded `max_concurrent_envs=1`, which causes "server at capacity" errors when more than one client connects (e.g. TRL `GRPOTrainer` with `num_generations > 1` opens parallel rollouts). Match the textarena_env pattern: - Read `MAX_CONCURRENT_ENVS` env var (default 8) - Pass a factory function so each session gets its own environment instance `ReasoningGymEnvironment` already declares `SUPPORTS_CONCURRENT_SESSIONS = True` and has no `__init__` args, so the factory is a thin wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-04-28T15:39:27Z

Greptile Summary

This PR fixes a bug in envs/reasoning_gym_env/server/app.py where max_concurrent_envs was hardcoded to 1, causing every WebSocket session beyond the first to be rejected with "server at capacity." The fix reads MAX_CONCURRENT_ENVS from the environment (default 8) and introduces a factory function so each session receives its own ReasoningGymEnvironment instance — directly mirroring the established textarena_env pattern and correctly wiring the env's pre-existing SUPPORTS_CONCURRENT_SESSIONS = True declaration.

Confidence Score: 4/5

Safe to merge — a minimal, well-scoped bug fix with only a P2 style note.

The change is a clean one-file fix that exactly mirrors a proven pattern in the codebase. The only finding is a P2 style suggestion (unguarded int() on an env var), which is consistent with the existing textarena_env copy of the same line and does not affect correctness under normal deployment.

No files require special attention.

Important Files Changed

Filename	Overview
envs/reasoning_gym_env/server/app.py	Adds `MAX_CONCURRENT_ENVS` env-var read (default 8) and a factory function so each WebSocket session gets its own `ReasoningGymEnvironment` instance; correctly mirrors the `textarena_env` pattern. One minor gap: `int()` on the env var is unguarded.

Sequence Diagram

sequenceDiagram
    participant GRPOTrainer
    participant Server as FastAPI Server
    participant Factory as create_reasoning_gym_environment()
    participant Env1 as ReasoningGymEnvironment (session 1)
    participant Env2 as ReasoningGymEnvironment (session 2)

    Note over Server: max_concurrent = int(os.getenv("MAX_CONCURRENT_ENVS", "8"))

    GRPOTrainer->>Server: WS /ws (session 1)
    Server->>Factory: call factory
    Factory->>Env1: ReasoningGymEnvironment()
    Env1-->>Server: instance 1
    Server-->>GRPOTrainer: session 1 established

    GRPOTrainer->>Server: WS /ws (session 2)
    Server->>Factory: call factory
    Factory->>Env2: ReasoningGymEnvironment()
    Env2-->>Server: instance 2
    Server-->>GRPOTrainer: session 2 established

    Note over Env1,Env2: Each session owns its own dataset iterator — no shared state

Prompt To Fix All With AI

This is a comment left during a code review.
Path: envs/reasoning_gym_env/server/app.py
Line: 48

Comment:
**Unguarded `int()` on env var**

If `MAX_CONCURRENT_ENVS` is set to a non-integer value (e.g. `"auto"` or an empty string), `int()` raises `ValueError` at module-import time, crashing the server before it can start with no useful error message. The `textarena_env` copy of this line has the same gap — consider adding a fallback for consistency in both places.

```suggestion
_max_concurrent_raw = os.getenv("MAX_CONCURRENT_ENVS", "8")
try:
    max_concurrent = int(_max_concurrent_raw)
except ValueError:
    raise ValueError(
        f"MAX_CONCURRENT_ENVS must be an integer, got: {_max_concurrent_raw!r}"
    ) from None
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix(reasoning_gym_env): support concurre..." | Re-trigger Greptile}

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Darktex

Note: This is an automated review by Claude Code, not a human review.

Clean, correct bug fix. No Tier 1 or Tier 2 issues identified.

The change follows the established textarena_env pattern exactly: env-var-controlled concurrency limit with a safe int() parse and a one-liner factory so each WebSocket session gets its own ReasoningGymEnvironment instance. ReasoningGymEnvironment already declares SUPPORTS_CONCURRENT_SESSIONS = True and carries no shared mutable state between instances, so the factory is safe with no further changes required.

Automated checks: lint passes on the affected file, no debug code in scope.

Automated review by Claude Code | Learn more

Darktex

Note: This is an automated review by Claude Code, not a human review.

Alignment Review Report

Automated Checks

Lint: PASS - ruff check, ruff format --check, and usort check all pass on the post-PR file.
Debug code: CLEAN - No debug artifacts introduced by this PR. Pre-existing findings in src/openenv/auto/ and src/openenv/cli/ are out of scope.

Tier 1: Fixes Required

None.

The change is mechanically sound:

import os is placed correctly after the module docstring and before the try: import block. Passes usort/ruff without complaint.
int(_max_concurrent_raw) is called at module load time, so a misconfigured env var raises immediately on startup (explicit fail-fast, not a hidden runtime crash).
The factory function create_reasoning_gym_environment has the correct return type (ReasoningGymEnvironment), matching what create_app's env parameter expects (a Callable[[], Environment]).
max_concurrent is int, matching max_concurrent_envs: Optional[int] in create_app.

Tier 2: Alignment Discussion

None identified.

Walking through each relevant invariant:

Invariant	Status
One env = one trajectory (RFC 004 / PRINCIPLES.md)	Aligned - concurrent sessions each get their own `ReasoningGymEnvironment` instance via the factory; no trajectory multiplexing occurs.
Agents cannot reset (INVARIANTS.md)	Aligned - no change to what is exposed via MCP tools.
Client-server separation	Aligned - change is entirely inside `server/app.py`; client code untouched.
Rewards inside environment	Aligned - reward computation in `ReasoningGymEnvironment.step()` is untouched.
Gymnasium API signatures	Aligned - `reset` / `step` / `state` signatures unchanged.

The PR also correctly honours the environment's pre-existing SUPPORTS_CONCURRENT_SESSIONS = True declaration, which the old hardcoded max_concurrent_envs=1 silently contradicted. The pattern mirrors envs/textarena_env/server/app.py exactly: env-var-driven cap + factory function.

Summary

0 mechanical issues to fix
0 alignment points for human review

Straightforward bug fix. Approved.

Automated review by Claude Code | Learn more

Darktex

Note: This is an automated review by Claude Code, not a human review.

Summary

This PR fixes a hard-coded max_concurrent_envs=1 in envs/reasoning_gym_env/server/app.py that caused all but the first WebSocket session to receive a "server at capacity" error. It reads the limit from MAX_CONCURRENT_ENVS (defaulting to 8) and wraps instantiation in a factory function so each session gets its own ReasoningGymEnvironment instance — the same pattern already in use by textarena_env.

Tier 1 — Bugs / Quality

None spotted. The int() parse of the env var is guarded with a ValueError on bad input. ReasoningGymEnvironment.__init__ takes no arguments, making the factory a correct one-liner. Lint and format checks pass on the file.

One minor observation: the textarena_env reference implementation does not pass max_concurrent_envs explicitly (it relies on the create_app default), while this PR passes it explicitly. That is not a bug — being explicit is fine — but it is a small divergence from the pattern the PR claims to mirror. Not worth blocking.

Tier 2 — Alignment

Aligned with PRINCIPLES.md / INVARIANTS.md. The change:

Stays entirely inside server/app.py; no client-server boundary is crossed.
Does not touch reward computation, reset/step/state signatures, or MCP tooling.
Honors the pre-existing SUPPORTS_CONCURRENT_SESSIONS = True declaration on the environment class; the bug was purely that app.py never wired it through.
Follows the "one env = one trajectory" principle by giving each WebSocket session its own instance via the factory, rather than multiplexing.

Verdict

Clean bug fix with correct factory pattern, no invariant concerns; approve.

Automated review by Claude Code | Learn more

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 28, 2026

greptile-apps Bot reviewed Apr 28, 2026

View reviewed changes

Comment thread envs/reasoning_gym_env/server/app.py Outdated

Update envs/reasoning_gym_env/server/app.py

da7f0eb

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Darktex approved these changes Apr 28, 2026

View reviewed changes

Darktex approved these changes Apr 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(reasoning_gym_env): support concurrent WebSocket sessions#619

fix(reasoning_gym_env): support concurrent WebSocket sessions#619
sergiopaniego wants to merge 2 commits intometa-pytorch:mainfrom
sergiopaniego:feature/reasoning-gym-concurrent-sessions

sergiopaniego commented Apr 28, 2026

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Uh oh!

Uh oh!

Darktex left a comment

Uh oh!

Darktex left a comment

Uh oh!

Darktex left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sergiopaniego commented Apr 28, 2026

Summary

Type of Change

Alignment Checklist

RFC Status

Test Plan

Claude Code Review

Uh oh!

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Darktex left a comment

Choose a reason for hiding this comment

Uh oh!

Darktex left a comment

Choose a reason for hiding this comment

Alignment Review Report

Automated Checks

Tier 1: Fixes Required

Tier 2: Alignment Discussion

Summary

Uh oh!

Darktex left a comment

Choose a reason for hiding this comment

Summary

Tier 1 — Bugs / Quality

Tier 2 — Alignment

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants