Fix NchwcTransformer null deref on node with unresolved schema#29194
Fix NchwcTransformer null deref on node with unresolved schema#29194tairenpiao wants to merge 2 commits into
Conversation
7ef2476 to
4b2b5a3
Compare
There was a problem hiding this comment.
Pull request overview
Fixes a crash during InferenceSession::Initialize() on x86_64 when NchwcTransformer processes a graph containing nodes whose op schema has not been resolved yet (node.Op() == nullptr). The change hardens a shared graph utility predicate used by transformers to avoid a null dereference, and adds a regression test that loads a minimal repro model.
Changes:
- Add a null-check for
node.Op()ingraph_utils::IsSupportedOptypeVersionAndDomainbefore consultingOpSchema::Deprecated(). - Add a regression test that loads and initializes a minimal
Transpose -> Cast(fp16->fp32) -> MatMulmodel at Level3 optimizations.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/test/optimizer/nchwc_optimizer_test.cc | Adds a regression smoke test that loads the repro model and asserts session initialization succeeds. |
| onnxruntime/core/graph/graph_utils.cc | Guards node.Op()->Deprecated() with node.Op() != nullptr to prevent a null dereference when schemas are unresolved. |
| return (node.OpType() == op_type && | ||
| // we don't have op schemas in the minimal build so there's no way to check the deprecated flag | ||
| #if !defined(ORT_MINIMAL_BUILD) | ||
| !node.Op()->Deprecated() && | ||
| // node.Op() can be null for a node whose schema is not yet resolved. | ||
| node.Op() != nullptr && !node.Op()->Deprecated() && | ||
| #endif | ||
| MatchesOpSinceVersion(node, versions) && MatchesOpSetDomain(node, domain)); |
There was a problem hiding this comment.
Pulled node.Op() into a local op_schema and reused it. Thanks!
4b2b5a3 to
ce4447a
Compare
|
Hi @tianleiwu. Could you please take a look? |
tianleiwu
left a comment
There was a problem hiding this comment.
Review
The fix is correct and well-targeted. Guarding node.Op() against null in IsSupportedOptypeVersionAndDomain and treating an unresolved-schema node as "not supported" (return false) is the safe, conservative behavior, and it matches existing patterns in the codebase (e.g. insert_cast_transformer.cc checks if (!schema)). The focused regression test plus model data is a nice touch. LGTM.
Suggestion (optional, out of this PR's diff)
There is an identical unguarded pattern in a sibling optimizer at onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc:185:
#if !defined(ORT_MINIMAL_BUILD)
!op_node.Op()->Deprecated() &&
#endifThis runs over arbitrary graph nodes (QGemm/QAttention/etc. in s8_overflow_ops) and would crash the same way on a node whose schema is unresolved (op_node.Op() == nullptr). Since this PR is specifically hardening the Op()->Deprecated() pattern against null schemas, it may be worth applying the same guard there too (here or in a quick follow-up) so the two paths stay consistent. Not blocking.
tianleiwu
left a comment
There was a problem hiding this comment.
Review
The fix is correct and complete. Guarding node.Op() against null in IsSupportedOptypeVersionAndDomain and treating an unresolved-schema node as "not supported" (returning false) is the safe, conservative behavior and matches existing patterns in the codebase. The same hardening has now been applied to the sibling avx2_weight_s8_to_u8.cc path, so both Op()->Deprecated() dereferences are protected and the two paths stay consistent — that resolves the follow-up I raised on the previous commit. Caching node.Op() into a local op_schema also addresses the earlier readability note. The focused regression test plus model data is a nice touch.
LGTM.
One optional nitpick inline regarding the test's build-config placement; non-blocking.
| #ifndef DISABLE_CONTRIB_OPS | ||
|
|
||
| // Regression test for #28392: NchwcTransformer must not crash when a node's schema is unresolved (node.Op() == nullptr). | ||
| TEST(NchwcOptimizerTests, MatMulCastDoesNotCrashOnUnresolvedSchema) { |
There was a problem hiding this comment.
Nitpick (non-blocking): this regression model is a plain Transpose -> Cast -> MatMul chain of standard ONNX ops, so the test does not actually depend on contrib ops. Placing it inside the #ifndef DISABLE_CONTRIB_OPS block means the regression is not guarded in contrib-ops-disabled builds. Since the crash path is in core graph utilities, consider hoisting this test outside the contrib-ops guard so it runs in more configurations. Not blocking.
|
@tianleiwu Seems the CI has download issue. |
Problem
On x86_64 with
ORT_ENABLE_ALL,InferenceSessioninitialization crashes (SIGSEGV) insideNchwcTransformerfor a model containing aTranspose → Cast(fp16→fp32) → MatMulchain on a 3D dynamic shape with a singleton dim. Stack bottoms out ingraph_utils::IsSupportedOptypeVersionAndDomain.Root cause
graph_utils::IsSupportedOptypeVersionAndDomaindereferencesnode.Op()->Deprecated()unconditionally.node.Op()can benullptrwhen a node's schema is not resolved (e.g. a node added by an earlier transformer before the graph is re-resolved), so this is a null dereference.Fix
Guard with
node.Op() != nullptrbefore the deref — matching the existing pattern atgraph.cc:3547. A node whose schema is unresolved cannot match a specific op, so returningfalseis correct. Adds a regression test that loads the reporter's minimal model atORT_ENABLE_ALL.Scope
The crash is x86_64-only (NCHWc enabled,
MlasNchwcGetBlockSize() > 1). On other platforms the guard is a no-op and the test is a harmless smoke test.Fixes #28392