
[Fix] fix SP for InternS1 VL RL #1656

Open
tina-wen wants to merge 1 commit into InternLM:main from tina-wen:rl_sp


@tina-wen
Contributor

tina-wen commented on Apr 6, 2026

Root Cause

Under sequence parallelism (SP), entropy statistics and GRPO batch loss calibration were computed from sharded tensors, so each SP rank saw only its own slice of the sequence. This could produce incorrect token counts and inconsistent metrics across SP ranks. In addition, GRPO loss batching did not yet accept the SP context forwarded from the worker.
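To illustrate the failure mode, here is a minimal pure-Python sketch, not the project's code. It assumes ignored positions are labeled `-100` and that the sequence is split evenly across two SP ranks. `masked_mean` and the toy data are hypothetical and exist only for this example.

```python
IGNORE_INDEX = -100  # assumption: label value for tokens excluded from the loss


def masked_mean(values, labels):
    """Return (mean over non-ignored positions, number of non-ignored positions)."""
    kept = [v for v, lab in zip(values, labels) if lab != IGNORE_INDEX]
    mean = sum(kept) / len(kept) if kept else 0.0
    return mean, len(kept)


# One toy sequence of 8 tokens: 5 prompt tokens (ignored) and 3 response tokens.
labels = [-100, -100, -100, -100, -100, 11, 12, 13]
entropy = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]

# Split the sequence into contiguous shards, one per SP rank.
sp_size = 2
chunk = len(labels) // sp_size
per_rank = [
    masked_mean(entropy[r * chunk:(r + 1) * chunk], labels[r * chunk:(r + 1) * chunk])
    for r in range(sp_size)
]

# Reference value computed from the full, unsharded sequence.
global_stats = masked_mean(entropy, labels)

# Averaging the per-rank means does not recover the global mean.
naive_average = sum(mean for mean, _ in per_rank) / sp_size
```

In this toy case rank 0 sees no valid tokens at all, rank 1 sees all three, and the average of the per-rank means differs from the mean over the full sequence.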

Fix

Gather shifted labels and logprobs before entropy aggregation under SP, pass sp_mesh into RL loss batching from the training worker, and make GRPO batch construction use the SP-aware token counting path.
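The gather-then-aggregate idea can be sketched in pure Python as follows. This is a simulation under stated assumptions, not the PR's implementation: `gather_along_sequence` is a hypothetical stand-in for a collective all-gather over the SP group, `entropy_stats` is an illustrative helper, and `-100` is assumed to be the ignore label.

```python
IGNORE_INDEX = -100  # assumption: label value for tokens excluded from the loss


def gather_along_sequence(shards):
    """Hypothetical stand-in for an SP all-gather: concatenate shards in rank order."""
    full = []
    for shard in shards:
        full.extend(shard)
    return full


def entropy_stats(entropy, labels):
    """Return (mean entropy over valid tokens, valid token count)."""
    kept = [e for e, lab in zip(entropy, labels) if lab != IGNORE_INDEX]
    count = len(kept)
    return (sum(kept) / count if count else 0.0), count


# Per-rank shards of shifted labels and per-token entropy (toy data).
label_shards = [[-100, -100, -100, -100], [-100, 11, 12, 13]]
entropy_shards = [[0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 3.0]]

# Every rank gathers the full sequence first, then aggregates.
full_labels = gather_along_sequence(label_shards)
full_entropy = gather_along_sequence(entropy_shards)
results = [entropy_stats(full_entropy, full_labels) for _ in label_shards]

# SP-aware token count: sum of local valid-token counts across ranks.
local_counts = [sum(lab != IGNORE_INDEX for lab in shard) for shard in label_shards]
global_token_count = sum(local_counts)
```

Because each rank aggregates over the same gathered tensors, all ranks report identical statistics, and the summed token count matches the count from the full sequence.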

tina-wen changed the title from "[Fix] fix SP for VL-241B RL" to "[Fix] fix SP for InternS1 VL RL" on Apr 7, 2026