feat(rl): add rollout collector and JSONL export CLI #335
MagellaX wants to merge 7 commits into hud-evals:main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e97fb4b914
hud/rl/collector.py
Outdated
        return []

    expanded_tasks = _expand_tasks(raw_tasks, group_size=group_size)
    source_name = source if isinstance(source, str) else name
Preserve split in rollout source IDs
When source is a HuggingFace dataset and split is non-default, tasks are loaded from source:split but source_name is still set to the unsuffixed source. Because build_rollout_records() uses source in make_rollout_id(), collecting different splits can produce identical rollout IDs for matching prompts/task indexes, and the exported source field also loses split provenance. This can cause collisions or incorrect joins when combining train/test exports.
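One possible shape for the fix, as a minimal sketch rather than the PR's actual code: fold the split into the source name before it reaches the ID helper. Both functions below are hypothetical stand-ins (the real `make_rollout_id()` signature is not shown in this thread), and treating `"train"` as the default split is an assumption.

```python
import hashlib
from typing import Optional


def qualified_source(source: str, split: Optional[str], default_split: str = "train") -> str:
    """Return 'source:split' for a non-default split, else the bare source.

    Assumption: "train" is the default split; adjust to the loader's real default.
    """
    if split and split != default_split:
        return f"{source}:{split}"
    return source


def make_rollout_id(source: str, task_index: int, group_index: int, prompt: str) -> str:
    """Hypothetical stand-in: hash the identifying fields into a stable 16-char ID."""
    # A separator that cannot appear in normal text avoids ambiguous concatenations.
    payload = "\x1f".join([source, str(task_index), str(group_index), prompt])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Same prompt and indexes, different splits: the IDs no longer collide.
train_id = make_rollout_id(qualified_source("org/data", "train"), 0, 0, "Open the page")
test_id = make_rollout_id(qualified_source("org/data", "test"), 0, 0, "Open the page")
```

Passing the qualified name through also keeps split provenance in the exported `source` field, so train and test exports can be told apart after they are combined.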
hud/cli/rollout.py
Outdated
    config = {"verbose": verbose, "validate_api_key": False}
    if allowed_tools:
        config["allowed_tools"] = allowed_tools
    return OperatorAgent, config
Pass --model through to OpenAI rollouts
The OpenAI branch ignores the CLI --model value and always instantiates OperatorAgent with its default model, even though collect advertises --model as backend-specific. This means users cannot pin or vary OpenAI models for rollout collection, which can silently skew experiment reproducibility and comparisons.
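A sketch of what forwarding the flag could look like, under the assumption that the agent accepts a `model` entry in its config (this thread does not confirm the key name). `build_openai_config` is a hypothetical helper that mirrors the snippet above without importing `OperatorAgent`.

```python
from typing import Any, Optional


def build_openai_config(
    verbose: bool,
    allowed_tools: Optional[list[str]] = None,
    model: Optional[str] = None,
) -> dict[str, Any]:
    """Build the agent config, forwarding --model only when the user set it."""
    config: dict[str, Any] = {"verbose": verbose, "validate_api_key": False}
    if allowed_tools:
        config["allowed_tools"] = allowed_tools
    if model:
        # Assumption: the agent reads its model name from a "model" config key.
        config["model"] = model
    return config


# With no --model, the agent's own default still applies because the key is absent.
default_config = build_openai_config(verbose=False)
pinned_config = build_openai_config(verbose=False, model="example-model")
```

Leaving the key out when `--model` is unset preserves the current default behavior while letting users pin a model for reproducible runs.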
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Summary
- `hud.rl` rollout schema and collector utilities to build offline rollout records from `run_dataset`
- `hud rollout collect` CLI command to execute tasks and export trajectories to JSONL
- `group_size` repetition per task and deterministic rollout IDs

Impact
This is a low-risk first step toward RL data workflows in the SDK.
Users can now collect structured trajectories from existing eval tasks without changing the training stack.
Continuation
This PR intentionally keeps scope tight. It sets up the next step to add richer trajectory metadata (e.g. action-level reward shaping fields) without reworking evaluation execution.
Note
Medium Risk
Introduces new execution/export pathways on top of `run_dataset` and tightens JSON/JSONL task parsing in strict mode; incorrect coercion or stricter validation could affect rollout collection and local task ingestion behavior.

Overview
Adds a new `hud rollout collect` CLI workflow to execute eval tasks and export structured rollout trajectories to JSONL, including support for per-task repetition via `--group-size`, concurrency/step limits, tool allow/deny lists, and deterministic `rollout_id`s.

Introduces a new `hud.rl` module with a `RolloutRecord` schema plus collector utilities that load tasks (local JSON/JSONL or dataset source + split), run them through `run_dataset`, coerce results into a normalized `Trace`, and write records to disk. Task loading is tightened for rollout paths by adding a `strict` mode to `_load_raw_from_file` that errors on non-object entries with clearer location context, and new unit tests cover collector behavior and the CLI flow.

Written by Cursor Bugbot for commit 92658ed.
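To illustrate the strict-mode behavior described in the overview, here is a minimal sketch of JSONL parsing that rejects non-object entries and reports where they occurred. It is not the PR's `_load_raw_from_file`; the function name, the `origin` parameter, and the error format are all assumptions.

```python
import json
from typing import Any


def load_jsonl_strict(text: str, origin: str = "<string>") -> list[dict[str, Any]]:
    """Parse JSONL text, raising ValueError with 'origin:line' on bad entries."""
    tasks: list[dict[str, Any]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # Blank lines are skipped, not treated as errors.
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{origin}:{line_no}: invalid JSON ({exc.msg})") from exc
        if not isinstance(entry, dict):
            # Strict mode: every entry must be a JSON object (a task mapping).
            kind = type(entry).__name__
            raise ValueError(f"{origin}:{line_no}: expected a JSON object, got {kind}")
        tasks.append(entry)
    return tasks


good = load_jsonl_strict('{"prompt": "a"}\n\n{"prompt": "b"}\n', origin="tasks.jsonl")

try:
    load_jsonl_strict('{"prompt": "a"}\n[1, 2]\n', origin="tasks.jsonl")
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

Reporting the file and line in the message is what makes a strict failure actionable: the user can fix the offending entry instead of guessing which task was silently dropped.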