
net/ib: Print qp_num in completion error log #2220

Open
zrss wants to merge 1 commit into NVIDIA:master from zrss:qp-num-completion-log

zrss commented Jun 7, 2026

Description

This PR includes wc.qp_num in the detailed NET/IB completion error log.

NCCL already prints qp_num in the nearby CQE error log, but the detailed completion error line, which carries the peer, status, vendor error, local/remote GID, and HCA, does not. Adding qp_num to that detailed line makes RoCE failures such as IBV_WC_RETRY_EXC_ERR(12) easier to correlate with earlier QP creation/connect logs.
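
To illustrate the correlation use case, here is a minimal sketch of pulling the `qp_num=` value out of a log line so it can be matched against earlier QP creation/connect lines. `parse_qp_num` is a hypothetical helper written for this example and is not part of NCCL. It assumes the value is printed as a plain decimal or `0x`-prefixed number directly after `qp_num=`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an NCCL API): extract the number that follows
 * the first "qp_num=" in a log line.
 * Returns 0 and stores the value in *out on success, or -1 if the key is
 * missing or the value is not a valid unsigned 32-bit number. */
static int parse_qp_num(const char *line, uint32_t *out) {
  static const char key[] = "qp_num=";
  assert(line != NULL && out != NULL);

  const char *p = strstr(line, key);
  if (p == NULL) return -1;          /* key not present in this line */
  p += sizeof(key) - 1;              /* skip past "qp_num=" */

  /* Require a leading digit so signs and whitespace are rejected. */
  if (*p < '0' || *p > '9') return -1;

  char *end = NULL;
  errno = 0;
  unsigned long v = strtoul(p, &end, 0);  /* base 0: decimal or 0x hex */
  if (end == p || errno == ERANGE || v > UINT32_MAX) return -1;

  *out = (uint32_t)v;
  return 0;
}
```

Calling `parse_qp_num` on both the completion error line and the earlier QP setup lines gives a key for joining the two; the exact wording of those earlier lines is not shown in this PR, so any matching logic beyond the `qp_num=` key would need to be adapted to the real output.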

Related Issues

Closes #2219

Changes & Impact

This is a logging-only change.

The detailed NET/IB completion error WARN now changes from:

NET/IB: Got completion from peer ... status=... opcode=... vendor_err=... localGid ... remoteGid ... hca ...

to:

NET/IB: Got completion from peer ... status=... opcode=... vendor_err=... qp_num=... localGid ... remoteGid ... hca ...
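
As a rough sketch of what the change amounts to, the snippet below formats the work-completion fields with `qp_num` appended after `vendor_err`. It is not the NCCL source: `struct wc_fields` is a hypothetical stand-in for the relevant members of `struct ibv_wc` (so the example builds without libibverbs), `format_completion_fields` is a made-up name, and the elided parts of the real message (peer, GIDs, HCA) are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct ibv_wc members used in the log line. */
struct wc_fields {
  int status;           /* e.g. 12 for IBV_WC_RETRY_EXC_ERR */
  int opcode;
  uint32_t vendor_err;
  uint32_t qp_num;      /* the field this PR adds to the detailed line */
};

/* Illustrative only: writes "status=... opcode=... vendor_err=... qp_num=..."
 * into buf. Returns the snprintf result, i.e. the length the full string
 * needs (excluding the terminating NUL), or a negative value on error. */
static int format_completion_fields(char *buf, size_t len,
                                    const struct wc_fields *wc) {
  assert(buf != NULL && wc != NULL);
  return snprintf(buf, len, "status=%d opcode=%d vendor_err=%u qp_num=%u",
                  wc->status, wc->opcode,
                  (unsigned)wc->vendor_err, (unsigned)wc->qp_num);
}
```

Since `qp_num` is already available in the work completion being reported, printing it adds one more format argument on an error path only, which is consistent with the PR being a logging-only change.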

Performance Impact

No expected performance impact.

Signed-off-by: zrss <huangzhesi@gmail.com>
@xiaofanl-nvidia

Collaborator

@stephenmsachs @thomasgillis please take a look and mirror if this is good.

@stephenmsachs

Collaborator

Looks good to me.
/mirror

Development

Successfully merging this pull request may close these issues.

[RFE]: Include qp_num in detailed RoCE completion error log