Skip to content

fix(reasoning_gym_env): support concurrent WebSocket sessions#619

Open
sergiopaniego wants to merge 2 commits intometa-pytorch:mainfrom
sergiopaniego:feature/reasoning-gym-concurrent-sessions
Open

fix(reasoning_gym_env): support concurrent WebSocket sessions#619
sergiopaniego wants to merge 2 commits intometa-pytorch:mainfrom
sergiopaniego:feature/reasoning-gym-concurrent-sessions

Conversation

@sergiopaniego
Copy link
Copy Markdown
Contributor

Summary

envs/reasoning_gym_env/server/app.py hardcoded max_concurrent_envs=1, so opening more than one WebSocket session against the deployed Space returned "server at capacity" errors. This PR matches the existing textarena_env pattern: the limit is read from the MAX_CONCURRENT_ENVS env var (default 8) and the server is given a factory function so each session gets its own environment instance.

ReasoningGymEnvironment already declares SUPPORTS_CONCURRENT_SESSIONS = True and takes no __init__ args, so the factory is a one-liner. The bug surfaced while running the new GRPO end-to-end tutorial (#618) against the deployed Space — TRL's GRPOTrainer opens gradient_accumulation_steps × per_device_train_batch_size parallel WebSocket sessions per generation batch and only the first was succeeding.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • New environment
  • Refactoring

Alignment Checklist

Before submitting, verify:

  • I have read .claude/docs/PRINCIPLES.md and this PR aligns with our principles
  • I have checked .claude/docs/INVARIANTS.md and no invariants are violated
  • I have run /pre-submit-pr (or bash .claude/hooks/lint.sh and tests) and addressed all issues

RFC Status

  • Not required (bug fix, docs, minor refactoring)
  • RFC exists: #___
  • RFC needed (will create before merge)

Test Plan

  • Build and run the server, open ≥2 concurrent WebSocket sessions, verify both succeed
  • MAX_CONCURRENT_ENVS=2 env var override caps at 2, third connection rejected
  • Diff is purely additive: behavior with the default value (MAX_CONCURRENT_ENVS=8) is the same as the previous hardcoded max_concurrent_envs=1 would have been at 8

Claude Code Review

## Alignment Review Report

### Automated Checks
- Lint: PASS for the PR scope. The hook flags formatting in
  `envs/chat_env/`, `envs/repl_env/`, and `envs/textarena_env/`, none of
  which this PR touches — pre-existing on `main`, out of scope.
- Debug code: CLEAN for the PR scope. The hook flags `console.print`
  calls in `src/openenv/cli/commands/push.py` and pre-existing TODOs in
  `src/openenv/cli/`, none of which this PR touches.

### Open RFCs Context
- **RFC 005 (Agentic Harness Integration)** — *In Review*. This RFC
  covers wrapping external agentic harnesses inside OpenEnv containers
  and does not touch session capacity or factory wiring; no overlap
  with this PR.

### PR Scope
- 1 file changed: `envs/reasoning_gym_env/server/app.py` (+17 / -3).
  Adds an `os` import, an env-var read for `MAX_CONCURRENT_ENVS`, a
  factory function `create_reasoning_gym_environment`, and updates the
  `create_app` call to pass the factory and the env-var value.

### Tier 1: Fixes Required
None.

### Tier 2: Alignment Discussion

#### Principle Conflicts
None identified. The change:
- Mirrors the existing `envs/textarena_env/server/app.py` pattern
  (`MAX_CONCURRENT_ENVS` env var + factory function) — no new pattern
  introduced.
- Does not touch the client–server boundary (no client imports added,
  no server internals exposed).
- Does not move reward computation; rewards still live inside the env.
- Honors the env's pre-existing `SUPPORTS_CONCURRENT_SESSIONS = True`
  declaration; the bug was that `app.py` didn't wire it through.

#### RFC Conflicts
None identified.

### Summary
- 0 mechanical issues to fix
- 0 alignment points for human review
- 0 RFC conflicts to discuss

The server hardcoded `max_concurrent_envs=1`, which causes "server at capacity"
errors when more than one client connects (e.g. TRL `GRPOTrainer` with
`num_generations > 1` opens parallel rollouts).

Match the textarena_env pattern:
- Read `MAX_CONCURRENT_ENVS` env var (default 8)
- Pass a factory function so each session gets its own environment instance

`ReasoningGymEnvironment` already declares `SUPPORTS_CONCURRENT_SESSIONS = True`
and has no `__init__` args, so the factory is a thin wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 28, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR fixes a bug in envs/reasoning_gym_env/server/app.py where max_concurrent_envs was hardcoded to 1, causing every WebSocket session beyond the first to be rejected with "server at capacity." The fix reads MAX_CONCURRENT_ENVS from the environment (default 8) and introduces a factory function so each session receives its own ReasoningGymEnvironment instance — directly mirroring the established textarena_env pattern and correctly wiring the env's pre-existing SUPPORTS_CONCURRENT_SESSIONS = True declaration.

Confidence Score: 4/5

Safe to merge — a minimal, well-scoped bug fix with only a P2 style note.

The change is a clean one-file fix that exactly mirrors a proven pattern in the codebase. The only finding is a P2 style suggestion (unguarded int() on an env var), which is consistent with the existing textarena_env copy of the same line and does not affect correctness under normal deployment.

No files require special attention.

Important Files Changed

Filename Overview
envs/reasoning_gym_env/server/app.py Adds MAX_CONCURRENT_ENVS env-var read (default 8) and a factory function so each WebSocket session gets its own ReasoningGymEnvironment instance; correctly mirrors the textarena_env pattern. One minor gap: int() on the env var is unguarded.

Sequence Diagram

sequenceDiagram
    participant GRPOTrainer
    participant Server as FastAPI Server
    participant Factory as create_reasoning_gym_environment()
    participant Env1 as ReasoningGymEnvironment (session 1)
    participant Env2 as ReasoningGymEnvironment (session 2)

    Note over Server: max_concurrent = int(os.getenv("MAX_CONCURRENT_ENVS", "8"))

    GRPOTrainer->>Server: WS /ws (session 1)
    Server->>Factory: call factory
    Factory->>Env1: ReasoningGymEnvironment()
    Env1-->>Server: instance 1
    Server-->>GRPOTrainer: session 1 established

    GRPOTrainer->>Server: WS /ws (session 2)
    Server->>Factory: call factory
    Factory->>Env2: ReasoningGymEnvironment()
    Env2-->>Server: instance 2
    Server-->>GRPOTrainer: session 2 established

    Note over Env1,Env2: Each session owns its own dataset iterator — no shared state
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: envs/reasoning_gym_env/server/app.py
Line: 48

Comment:
**Unguarded `int()` on env var**

If `MAX_CONCURRENT_ENVS` is set to a non-integer value (e.g. `"auto"` or an empty string), `int()` raises `ValueError` at module-import time, crashing the server before it can start with no useful error message. The `textarena_env` copy of this line has the same gap — consider adding a fallback for consistency in both places.

```suggestion
_max_concurrent_raw = os.getenv("MAX_CONCURRENT_ENVS", "8")
try:
    max_concurrent = int(_max_concurrent_raw)
except ValueError:
    raise ValueError(
        f"MAX_CONCURRENT_ENVS must be an integer, got: {_max_concurrent_raw!r}"
    ) from None
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(reasoning_gym_env): support concurre..." | Re-trigger Greptile

Comment thread envs/reasoning_gym_env/server/app.py Outdated
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Clean, correct bug fix. No Tier 1 or Tier 2 issues identified.

The change follows the established textarena_env pattern exactly: env-var-controlled concurrency limit with a safe int() parse and a one-liner factory so each WebSocket session gets its own ReasoningGymEnvironment instance. ReasoningGymEnvironment already declares SUPPORTS_CONCURRENT_SESSIONS = True and carries no shared mutable state between instances, so the factory is safe with no further changes required.

Automated checks: lint passes on the affected file, no debug code in scope.


Automated review by Claude Code | Learn more

Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Alignment Review Report

Automated Checks

  • Lint: PASS - ruff check, ruff format --check, and usort check all pass on the post-PR file.
  • Debug code: CLEAN - No debug artifacts introduced by this PR. Pre-existing findings in src/openenv/auto/ and src/openenv/cli/ are out of scope.

Tier 1: Fixes Required

None.

The change is mechanically sound:

  • import os is placed correctly after the module docstring and before the try: import block. Passes usort/ruff without complaint.
  • int(_max_concurrent_raw) is called at module load time, so a misconfigured env var raises immediately on startup (explicit fail-fast, not a hidden runtime crash).
  • The factory function create_reasoning_gym_environment has the correct return type (ReasoningGymEnvironment), matching what create_app's env parameter expects (a Callable[[], Environment]).
  • max_concurrent is int, matching max_concurrent_envs: Optional[int] in create_app.

Tier 2: Alignment Discussion

None identified.

Walking through each relevant invariant:

Invariant Status
One env = one trajectory (RFC 004 / PRINCIPLES.md) Aligned - concurrent sessions each get their own ReasoningGymEnvironment instance via the factory; no trajectory multiplexing occurs.
Agents cannot reset (INVARIANTS.md) Aligned - no change to what is exposed via MCP tools.
Client-server separation Aligned - change is entirely inside server/app.py; client code untouched.
Rewards inside environment Aligned - reward computation in ReasoningGymEnvironment.step() is untouched.
Gymnasium API signatures Aligned - reset / step / state signatures unchanged.

The PR also correctly honours the environment's pre-existing SUPPORTS_CONCURRENT_SESSIONS = True declaration, which the old hardcoded max_concurrent_envs=1 silently contradicted. The pattern mirrors envs/textarena_env/server/app.py exactly: env-var-driven cap + factory function.

Summary

  • 0 mechanical issues to fix
  • 0 alignment points for human review

Straightforward bug fix. Approved.


Automated review by Claude Code | Learn more

Copy link
Copy Markdown
Contributor

@Darktex Darktex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This is an automated review by Claude Code, not a human review.


Summary

This PR fixes a hard-coded max_concurrent_envs=1 in envs/reasoning_gym_env/server/app.py that caused all but the first WebSocket session to receive a "server at capacity" error. It reads the limit from MAX_CONCURRENT_ENVS (defaulting to 8) and wraps instantiation in a factory function so each session gets its own ReasoningGymEnvironment instance — the same pattern already in use by textarena_env.

Tier 1 — Bugs / Quality

  • None spotted. The int() parse of the env var is guarded with a ValueError on bad input. ReasoningGymEnvironment.__init__ takes no arguments, making the factory a correct one-liner. Lint and format checks pass on the file.

One minor observation: the textarena_env reference implementation does not pass max_concurrent_envs explicitly (it relies on the create_app default), while this PR passes it explicitly. That is not a bug — being explicit is fine — but it is a small divergence from the pattern the PR claims to mirror. Not worth blocking.

Tier 2 — Alignment

Aligned with PRINCIPLES.md / INVARIANTS.md. The change:

  • Stays entirely inside server/app.py; no client-server boundary is crossed.
  • Does not touch reward computation, reset/step/state signatures, or MCP tooling.
  • Honors the pre-existing SUPPORTS_CONCURRENT_SESSIONS = True declaration on the environment class; the bug was purely that app.py never wired it through.
  • Follows the "one env = one trajectory" principle by giving each WebSocket session its own instance via the factory, rather than multiplexing.

Verdict

Clean bug fix with correct factory pattern, no invariant concerns; approve.


Automated review by Claude Code | Learn more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants