Skip to content

net_ib: free GIN barrier buffer on failure#2227

Open
wanglei875 wants to merge 1 commit into
NVIDIA:devfrom
wanglei875:fix-gin-ib-p2pbarrier-cleanup
Open

net_ib: free GIN barrier buffer on failure#2227
wanglei875 wants to merge 1 commit into
NVIDIA:devfrom
wanglei875:fix-gin-ib-p2pbarrier-cleanup

Conversation

@wanglei875

@wanglei875 wanglei875 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Description

ncclGinIbP2PBarrier allocates a temporary dummy buffer and then calls ncclGinIbAllGather. If the allgather fails, the existing NCCLCHECK path returns before freeing the buffer.

Use a local cleanup label so the dummy buffer is released on both success and failure paths.

Validation:

  • git diff --check
  • source-level invariant check for the ncclGinIbP2PBarrier cleanup path

Copilot AI review requested due to automatic review settings June 9, 2026 03:23

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves error handling in the IB P2P barrier to correctly propagate failures and ensure allocated memory is released on error paths.

Changes:

  • Switches ncclGinIbP2PBarrier to use NCCLCHECKGOTO with an out cleanup label.
  • Ensures dummy is initialized and freed in a shared cleanup path.
  • Returns the actual ncclResult_t status instead of always returning success.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +168 to +173
int* dummy = nullptr;
NCCLCHECKGOTO(ncclIbMalloc((void**)&dummy, cComm->nranks * sizeof(int)), ret, out);
NCCLCHECKGOTO(ncclGinIbAllGather(cComm, dummy + cComm->rank, dummy, sizeof(int)), ret, out);
out:
if (dummy) free(dummy);
return ret;
Signed-off-by: WangLei <1539790288@qq.com>
@xiaofanl-nvidia

Copy link
Copy Markdown
Collaborator

++ @jynv to mirror if reasonable.

@xiaofanl-nvidia xiaofanl-nvidia requested a review from jynv June 23, 2026 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants