init: do not reject a config that sets only minCTAs or only maxCTAs#2256
Open
EylonKrause wants to merge 1 commit into
Open
init: do not reject a config that sets only minCTAs or only maxCTAs#2256EylonKrause wants to merge 1 commit into
EylonKrause wants to merge 1 commit into
Conversation
The min/maxCTAs validation in parseCommConfig guards the <=0 checks with NCCL_CONFIG_UNDEF_INT but the minCTAs>maxCTAs check does not. Since unset fields are NCCL_CONFIG_UNDEF_INT (INT_MIN), setting only minCTAs (maxCTAs left unset) makes minCTAs>maxCTAs (>INT_MIN) true, so a valid config is rejected with ncclInvalidArgument. Guard the comparison with the same UNDEF checks used by the sibling clauses. Signed-off-by: EylonKrause <eylon1909@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
parseCommConfigvalidates theminCTAs/maxCTAsconfig fields with three clauses. The first two guard the unset sentinelNCCL_CONFIG_UNDEF_INTbefore their<= 0checks, but the third clause —internalConfigPtr->minCTAs > internalConfigPtr->maxCTAs— does not. Unset config fields areNCCL_CONFIG_UNDEF_INT(INT_MIN), so a config that sets onlyminCTAs(leavingmaxCTAsunset) makesminCTAs > INT_MINtrue, andncclCommInitRankConfig/ncclCommSplitreject an otherwise-valid configuration withncclInvalidArgument.minCTAsandmaxCTAsare documented as independently settable, so setting just one should be allowed.Related Issues
None.
Changes & Impact
src/init.cc(parseCommConfig): guard theminCTAs > maxCTAscomparison with the sameNCCL_CONFIG_UNDEF_INTchecks the two sibling clauses already use, so it only fires when both bounds are actually set. A genuinely inverted pair (e.g.minCTAs=16, maxCTAs=4) is still rejected. Non-breaking.Performance Impact
None.
Testing
make src.build.ncclCommInitRankConfig(1 rank),config.minCTAs=4,maxCTAsleft unset:init.cc NCCL WARN Invalid config min/max channels attribute value 4/-2147483648→ returnsncclInvalidArgument(-2147483648is the unsetmaxCTAssentinel).ncclSuccess.