
feat(rl): add rollout collector and JSONL export CLI #335

Open

MagellaX wants to merge 7 commits into hud-evals:main from MagellaX:feature/rollout-collector-mvp

Conversation

MagellaX (Contributor) commented Feb 17, 2026

Summary

  • add hud.rl rollout schema and collector utilities to build offline rollout records from run_dataset
  • add hud rollout collect CLI command to execute tasks and export trajectories to JSONL
  • support group_size repetition per task and deterministic rollout IDs
  • add focused tests for collector grouping/coercion/export and CLI flow
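The deterministic rollout IDs and JSONL export mentioned above can be sketched as follows. This is a hypothetical illustration, not the actual hud.rl implementation: the real code may hash different fields, but the idea is a stable digest over (source, task index, attempt) so that repeating a task group_size times yields distinct yet reproducible IDs.

```python
import hashlib
import json


def make_rollout_id(source: str, task_index: int, attempt: int) -> str:
    """Derive a stable rollout ID: re-running the same (source, task, attempt)
    always produces the same ID. Sketch only; field choice is an assumption."""
    digest = hashlib.sha256(f"{source}:{task_index}:{attempt}".encode()).hexdigest()
    return f"rollout-{digest[:16]}"


def export_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line (the JSONL export format)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# group_size=3 repetitions of task 0 get distinct but reproducible IDs:
ids = [make_rollout_id("hud-evals/tasks", 0, attempt) for attempt in range(3)]
assert len(set(ids)) == 3
assert ids == [make_rollout_id("hud-evals/tasks", 0, a) for a in range(3)]
```

Determinism here is what makes exports joinable across runs: collecting the same dataset twice produces records that can be matched by ID rather than by fuzzy prompt comparison.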

Impact

This is a low-risk first step toward RL data workflows in the SDK.
Users can now collect structured trajectories from existing eval tasks without changing the training stack.

Continuation

This PR intentionally keeps scope tight. It sets up the next step to add richer trajectory metadata (e.g. action-level reward shaping fields) without reworking evaluation execution.


Note

Medium Risk
Introduces new execution/export pathways on top of run_dataset and tightens JSON/JSONL task parsing in strict mode; incorrect coercion or stricter validation could affect rollout collection and local task ingestion behavior.

Overview
Adds a new hud rollout collect CLI workflow to execute eval tasks and export structured rollout trajectories to JSONL, including support for per-task repetition via --group-size, concurrency/step limits, tool allow/deny lists, and deterministic rollout_ids.

Introduces a new hud.rl module with RolloutRecord schema plus collector utilities that load tasks (local JSON/JSONL or dataset source + split), run them through run_dataset, coerce results into a normalized Trace, and write records to disk. Task loading is tightened for rollout paths by adding a strict mode to _load_raw_from_file to error on non-object entries with clearer location context, and new unit tests cover collector behavior and the CLI flow.
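The strict-mode loading described above can be sketched roughly as below. The function name `load_tasks_strict` is a hypothetical stand-in for the PR's `_load_raw_from_file` strict path; the point it illustrates is rejecting non-object JSONL entries with file-and-line location context.

```python
import json
from pathlib import Path


def load_tasks_strict(path: str) -> list[dict]:
    """Load tasks from a .jsonl file, one JSON object per line.

    Sketch of strict mode: any entry that parses but is not a JSON object
    raises with the file and line number, so bad entries are easy to locate.
    """
    tasks = []
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        entry = json.loads(line)
        if not isinstance(entry, dict):
            raise ValueError(
                f"{path}:{lineno}: expected a JSON object per line, "
                f"got {type(entry).__name__}"
            )
        tasks.append(entry)
    return tasks
```

Failing loudly at load time, rather than coercing silently, is what keeps rollout collection from ingesting malformed tasks and exporting corrupted trajectories.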

Written by Cursor Bugbot for commit 92658ed. This will update automatically on new commits.

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e97fb4b914


return []

expanded_tasks = _expand_tasks(raw_tasks, group_size=group_size)
source_name = source if isinstance(source, str) else name


P2: Preserve split in rollout source IDs

When source is a HuggingFace dataset and split is non-default, tasks are loaded from source:split but source_name is still set to the unsuffixed source. Because build_rollout_records() uses source in make_rollout_id(), collecting different splits can produce identical rollout IDs for matching prompts/task indexes, and the exported source field also loses split provenance. This can cause collisions or incorrect joins when combining train/test exports.


Comment on lines 53 to 56
config = {"verbose": verbose, "validate_api_key": False}
if allowed_tools:
    config["allowed_tools"] = allowed_tools
return OperatorAgent, config


P2: Pass --model through to OpenAI rollouts

The OpenAI branch ignores the CLI --model value and always instantiates OperatorAgent with its default model, even though collect advertises --model as backend-specific. This means users cannot pin or vary OpenAI models for rollout collection, which can silently skew experiment reproducibility and comparisons.


cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.
