init: OR-accumulate hasMultiRankNvml across all rank pairs#2257
Open
EylonKrause wants to merge 1 commit into
Open
init: OR-accumulate hasMultiRankNvml across all rank pairs#2257EylonKrause wants to merge 1 commit into
EylonKrause wants to merge 1 commit into
Conversation
hasMultiRankNvml was assigned with = inside the nested rank-pair loop, so after the loop it reflected only the last pair (i=nranks-1, j=nranks-2) instead of whether ANY pair shares host+nvmlDev. The flag gates NVLS setup (ncclNvlsInit force-disables NVLS when set), so a multi-rank-per-NVML-device config could be mis-detected. Restore the OR-reduction semantics of the pre-refactor isMultiRankGpu check with |= (comm is calloc-zeroed). Signed-off-by: EylonKrause <eylon1909@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
In
initTransportsRank,comm->hasMultiRankNvmlis assigned with=inside the nested rank-pair loop:Each iteration overwrites the field, so after the loops it reflects only the last evaluated pair (
i=nranks-1, j=nranks-2), discarding every earlier pair. The field is documented as "if multiple ranks are using the NVML device" — a global OR over all pairs — and it gates NVLS setup:ncclNvlsInitforce-disables NVLS (and makesNCCL_NVLS_ENABLE=1an error) when it is set. With the bug, a configuration where a non-terminal pair of ranks shares host+nvmlDevbut the last pair does not is mis-detected asfalse(NVLS not disabled when it should be), and vice-versa. The pre-refactorisMultiRankGpucheck was a monotonic OR; the refactor replaced it with a per-iteration=, so this is a regression of the original semantics.Related Issues
None.
Changes & Impact
src/init.cc(initTransportsRank): use|=so the flag OR-accumulates across all rank pairs.commisncclCalloc-zeroed, so it startsfalseand the accumulation is well-defined. Restores the documented global-OR semantics. The common case (no ranks sharing a device) is unchanged; only multi-rank-per-NVML-device configurations (theNCCL_MULTI_RANK_GPU_ENABLEcase) are affected, and in the intended direction. Non-breaking.Performance Impact
None.
Testing
make src.build.=overwrites each iteration;|=accumulates) and match the pre-refactorisMultiRankGpuOR semantics. Demonstrating the downstream NVLS effect at runtime requires multiple ranks sharing one NVML device, i.e. multi-rank hardware.