Migrate Kerncap to rocprof-sdk #117

Open
coleramos425 wants to merge 7 commits into main from colramos/kerncap-rocprofsdk-registration

Collaborator

@coleramos425 coleramos425 commented Apr 9, 2026

In ROCm 7.x, rocprofiler-register intercepts tool loading before HSA_TOOLS_LIB is checked, which breaks kerncap in modern ROCm builds (e.g., the latest vLLM containers). The fix is to export rocprofiler_configure() instead of relying on OnLoad(), matching Omniprobe's migration (commit 0806c60).

Validating this change also revealed an oversight in how the Triton reproducer is generated: some autotuner configs were not being picked up, so reproducer.py ran with missing data. This PR fixes that as well, so the autotuner arguments are populated correctly in the reproducer.

Signed-off-by: coleramos425 <colramos@amd.com>
…hesize missing config

The standalone extractor copied @triton.autotune decorators verbatim,
  but never extracted the module-level variables they reference (e.g.
  autotune_configs), causing NameError on import. Strip autotune
  decorators and keep only @triton.jit so the reproducer can call the
  bare JITFunction directly with pinned config kwargs.
  Also synthesize autotune_config from per-arg is_autotune_config flags
  when the top-level key is missing (warm autotuner cache), and pass
  autotune_stripped to the template so it skips the .fn unwrap on
  already-bare JITFunctions. Register a pyrepr Jinja2 filter (repr())
  so config values render as valid Python literals.
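The decorator stripping described in this commit message can be sketched as follows. This is a minimal illustration and not the PR's actual code: the helper names `strip_autotune` and `_is_autotune` and the sample kernel are hypothetical, and it assumes the kernel source is plain Python that `ast` can parse (Python 3.9+ for `ast.unparse`).

```python
import ast

# Hypothetical kernel source with a multi-line @triton.autotune decorator.
SAMPLE = '''
@triton.autotune(
    configs=autotune_configs,
    key=["n"],
)
@triton.jit
def add_kernel(x_ptr, n, BLOCK: tl.constexpr):
    pass
'''


def _is_autotune(decorator: ast.expr) -> bool:
    """Return True for `@triton.autotune` or `@triton.autotune(...)`."""
    target = decorator.func if isinstance(decorator, ast.Call) else decorator
    return ast.unparse(target) == "triton.autotune"


def strip_autotune(source: str) -> str:
    """Drop @triton.autotune decorators; keep @triton.jit and any others."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.decorator_list = [
                d for d in node.decorator_list if not _is_autotune(d)
            ]
    # Note: ast.unparse discards comments and original formatting.
    return ast.unparse(tree)


stripped = strip_autotune(SAMPLE)
print(stripped)
```

Because the stripped source no longer references `autotune_configs`, importing it does not need that module-level variable, which is the NameError this commit addresses.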

Signed-off-by: coleramos425 <colramos@amd.com>
Replace tojson (warmup) and bare rendering (timed run) with pyrepr
  for autotune config kwargs. tojson rendered Python booleans as JSON
  false/true, crashing the reproducer with NameError. Bare rendering
  left strings unquoted. pyrepr (Python repr()) handles all types
  correctly. Also skip .fn unwrap when autotune was already stripped
  in the standalone extraction path.
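The difference between the three rendering strategies can be seen with the standard library alone. In this sketch `json.dumps` stands in for Jinja2's `tojson` and `str` for bare rendering; the config keys and the `render_kwargs` helper are hypothetical.

```python
import json

# Hypothetical pinned autotune config with an int, a bool, and a string.
config = {"BLOCK_SIZE": 128, "USE_MASK": False, "mode": "fast"}


def render_kwargs(cfg, fmt):
    """Render a dict as Python keyword arguments using the formatter `fmt`."""
    return ", ".join(f"{key}={fmt(value)}" for key, value in cfg.items())


as_json = render_kwargs(config, json.dumps)  # booleans become JSON `false`
as_bare = render_kwargs(config, str)         # strings lose their quotes
as_repr = render_kwargs(config, repr)        # valid Python literals

print(as_json)
print(as_bare)
print(as_repr)

# Only the repr() form evaluates back to the original config.
round_trip = eval(f"dict({as_repr})")
```

In Jinja2, a filter like this is typically registered by assigning to the environment's `filters` mapping, for example `env.filters["pyrepr"] = repr`.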

Signed-off-by: coleramos425 <colramos@amd.com>
…esis

Add fixture with multi-line @triton.autotune decorator and tests for:
  - autotune decorator stripping in standalone extraction
  - non-autotuned kernel passes through unchanged (regression guard)
  - autotune_config synthesized from per-arg is_autotune_config flags
  - non-standalone path still uses .fn bypass
  - explicit autotune_config in metadata takes precedence
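The synthesis and precedence rules these tests cover might look like the sketch below. The metadata layout (an `args` list of dicts with `name`, `value`, and `is_autotune_config`) and the function name `resolve_autotune_config` are assumptions made for illustration; the actual capture format is not shown in this PR description.

```python
def resolve_autotune_config(metadata: dict) -> dict:
    """Return pinned autotune kwargs for the reproducer.

    An explicit top-level `autotune_config` takes precedence. Otherwise the
    config is synthesized from args flagged `is_autotune_config`.
    """
    explicit = metadata.get("autotune_config")
    if explicit is not None:
        return dict(explicit)
    return {
        arg["name"]: arg["value"]
        for arg in metadata.get("args", [])
        if arg.get("is_autotune_config")
    }


# Hypothetical metadata from a warm autotuner cache: no top-level key.
warm_cache = {
    "args": [
        {"name": "x_ptr", "value": None, "is_autotune_config": False},
        {"name": "BLOCK_SIZE", "value": 256, "is_autotune_config": True},
        {"name": "num_warps", "value": 4, "is_autotune_config": True},
    ]
}

# Hypothetical metadata where the explicit key is present and should win.
explicit_meta = dict(warm_cache, autotune_config={"BLOCK_SIZE": 64})

synthesized = resolve_autotune_config(warm_cache)
preferred = resolve_autotune_config(explicit_meta)
print(synthesized, preferred)
```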

Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
@coleramos425 coleramos425 changed the title from "Migrate Kerncap to rocprof" to "Migrate Kerncap to rocprof-sdk" Apr 9, 2026
@coleramos425 coleramos425 marked this pull request as ready for review April 9, 2026 16:54
Copilot AI review requested due to automatic review settings April 9, 2026 16:54
Contributor

Copilot AI left a comment

Pull request overview

This PR migrates kerncap’s HIP capture tool-loading mechanism to rocprofiler-sdk registration (to work with ROCm 7.x tool-loading behavior) and improves Triton reproducer generation so autotuner configs are reliably pinned.

Changes:

  • Switch HIP interception from HSA_TOOLS_LIB-based OnLoad() to exporting rocprofiler_configure() and driving loading via LD_PRELOAD.
  • Improve Triton reproducer generation by stripping @triton.autotune in standalone extraction and synthesizing pinned autotune kwargs from captured args when needed.
  • Update docs and add/extend unit tests to cover the new capture env behavior and Triton autotune reproducer paths.
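The LD_PRELOAD prepend behavior mentioned in the first bullet can be illustrated with a small sketch. The function name `build_capture_env` and the library path are hypothetical, and the real logic in capturer.py may differ.

```python
import os


def build_capture_env(lib_path, base_env=None):
    """Return a copy of the environment with `lib_path` first in LD_PRELOAD.

    Any LD_PRELOAD entries the user already set are kept after the
    capture library, separated by a colon.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


# Hypothetical install location for the capture library.
LIB = "/opt/kerncap/lib/libkerncap.so"

fresh = build_capture_env(LIB, base_env={})
chained = build_capture_env(LIB, base_env={"LD_PRELOAD": "/usr/lib/libfoo.so"})
print(fresh["LD_PRELOAD"])
print(chained["LD_PRELOAD"])
```

The returned dict can be passed as `env=` to `subprocess.run` when launching the target application; the caller's own environment is left unmodified.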

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| kerncap/src/kerncap.hip | Adds rocprofiler-sdk registration entrypoint and deprecates OnLoad() HSA_TOOLS_LIB path. |
| kerncap/src/CMakeLists.txt | Links against rocprofiler-sdk. |
| kerncap/kerncap/capturer.py | Switches capture injection from HSA_TOOLS_LIB to LD_PRELOAD (with prepend behavior). |
| kerncap/kerncap/reproducer.py | Adds pyrepr Jinja filter, strips autotune decorators for standalone Triton modules, synthesizes autotune config from args. |
| kerncap/kerncap/templates/triton_reproducer.py.j2 | Avoids .fn bypass for stripped kernels; uses pyrepr for config kwarg rendering. |
| kerncap/tests/unit/test_capturer.py | New tests validating LD_PRELOAD env setup and triton delegation. |
| kerncap/tests/unit/test_reproducer.py | Adds tests for autotune stripping and autotune-config synthesis/precedence. |
| kerncap/tests/unit/fixtures/sample_autotuned_kernel.py | Fixture for validating autotune decorator stripping. |
| kerncap/src/kerncap_log.hpp | Updates injection mechanism wording in comments. |
| kerncap/README.md | Documents LD_PRELOAD usage and ROCm 7.0+ dependency (rocprofiler-sdk). |
| kerncap/CLAUDE.md | Updates internal docs to reflect LD_PRELOAD + rocprofiler-sdk registration. |
| docs/src/content/docs/tools/kerncap.mdx | Updates user docs from HSA_TOOLS_LIB to LD_PRELOAD + ROCm 7.0+. |
| docs/src/content/docs/getting-started/installation.mdx | Notes kerncap requires ROCm 7.0+. |
| AGENTS.md | Updates repo-level agent guidance about kerncap ROCm 7.0+ workflow targeting. |

coleramos425 and others added 2 commits April 9, 2026 13:13
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: coleramos425 <colramos@amd.com>
2 participants