Signed-off-by: coleramos425 <colramos@amd.com>
…hesize missing config

The standalone extractor copied @triton.autotune decorators verbatim but never extracted the module-level variables they reference (e.g. autotune_configs), causing a NameError on import. Strip autotune decorators and keep only @triton.jit so the reproducer can call the bare JITFunction directly with pinned config kwargs.

Also synthesize autotune_config from per-arg is_autotune_config flags when the top-level key is missing (warm autotuner cache), and pass autotune_stripped to the template so it skips the .fn unwrap on already-bare JITFunctions.

Register a pyrepr Jinja2 filter (repr()) so config values render as valid Python literals.

Signed-off-by: coleramos425 <colramos@amd.com>
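The decorator stripping described in this commit can be sketched with Python's standard `ast` module. This is a minimal illustration, not kerncap's actual implementation. `strip_autotune`, `_is_autotune`, and the fixture source are hypothetical, and the sketch assumes the decorator is spelled literally as `triton.autotune`, so an aliased import would not match. The returned flag plays a role analogous to the `autotune_stripped` value passed to the template.

```python
import ast

# Hypothetical fixture: a kernel wrapped in a multi-line autotune decorator that
# references a module-level name (autotune_configs) the extractor never copied.
SOURCE = '''
import triton
import triton.language as tl

@triton.autotune(
    configs=autotune_configs,
    key=["n_elements"],
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(0)
'''


def _is_autotune(decorator: ast.expr) -> bool:
    """True for `@triton.autotune(...)` or a bare `@triton.autotune`."""
    target = decorator.func if isinstance(decorator, ast.Call) else decorator
    return (
        isinstance(target, ast.Attribute)
        and target.attr == "autotune"
        and isinstance(target.value, ast.Name)
        and target.value.id == "triton"
    )


def strip_autotune(source: str) -> tuple[str, bool]:
    """Drop @triton.autotune decorators, keeping @triton.jit and everything else.

    Returns the rewritten source and whether any decorator was removed.
    """
    tree = ast.parse(source)
    stripped = False
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kept = [d for d in node.decorator_list if not _is_autotune(d)]
            stripped |= len(kept) != len(node.decorator_list)
            node.decorator_list = kept
    # ast.unparse (Python 3.9+) regenerates source; comments and layout are lost.
    return ast.unparse(tree), stripped
```

Working on the syntax tree instead of with regexes handles multi-line decorators like the one in the fixture without special cases. The trade-off is that `ast.unparse` discards comments and the original formatting.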
Replace tojson (warmup) and bare rendering (timed run) with pyrepr for autotune config kwargs. tojson rendered Python booleans as JSON true/false, crashing the reproducer with a NameError; bare rendering left strings unquoted. pyrepr (Python repr()) renders these values, including booleans and strings, as valid Python literals.

Also skip the .fn unwrap when autotune was already stripped in the standalone extraction path.

Signed-off-by: coleramos425 <colramos@amd.com>
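The failure modes described in this commit can be reproduced with the standard library alone. The sketch below uses `json.dumps` to approximate what Jinja2's `tojson` emits (the real filter also HTML-escapes some characters), `str` to stand in for bare `{{ value }}` interpolation, and `repr` to stand in for the `pyrepr` filter. The config values and helper names are hypothetical. In Jinja2 itself, such a filter is registered by assigning into `Environment.filters`, for example `env.filters["pyrepr"] = repr`.

```python
import json

# Hypothetical pinned autotune config, as it might appear in captured metadata.
config = {"BLOCK_SIZE": 128, "EVEN_K": True, "ACTIVATION": "relu", "num_warps": 4}


def render_kwargs(cfg: dict, render) -> str:
    """Render a config dict as keyword-argument text, e.g. 'A=1, B=2'."""
    return ", ".join(f"{name}={render(value)}" for name, value in cfg.items())


as_json = render_kwargs(config, json.dumps)  # roughly what `tojson` produces
as_bare = render_kwargs(config, str)         # bare interpolation
as_repr = render_kwargs(config, repr)        # the pyrepr approach


def kwargs_eval(kwarg_text: str) -> dict:
    """Evaluate rendered kwargs as Python, as the generated reproducer would."""
    return eval(f"dict({kwarg_text})", {"__builtins__": {"dict": dict}})
```

Here `as_json` contains `EVEN_K=true` and `as_bare` contains `ACTIVATION=relu`. Both raise `NameError` when evaluated, while `as_repr` round-trips to the original dict.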
…esis

Add fixture with multi-line @triton.autotune decorator and tests for:
- autotune decorator stripping in standalone extraction
- non-autotuned kernel passes through unchanged (regression guard)
- autotune_config synthesized from per-arg is_autotune_config flags
- non-standalone path still uses .fn bypass
- explicit autotune_config in metadata takes precedence

Signed-off-by: coleramos425 <colramos@amd.com>
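The synthesis and precedence rules covered by these tests can be sketched as follows. The metadata layout is an assumption for illustration only: an `args` list of dicts with `name`, `value`, and `is_autotune_config` keys, plus an optional top-level `autotune_config`. The function name is hypothetical, and kerncap's real capture format may differ.

```python
from typing import Any


def resolve_autotune_config(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return the pinned autotune kwargs for a captured Triton launch.

    An explicit top-level "autotune_config" takes precedence. When that key is
    missing (e.g. a warm autotuner cache), the config is synthesized from the
    args flagged with is_autotune_config (hypothetical metadata layout).
    """
    explicit = metadata.get("autotune_config")
    if explicit is not None:
        return dict(explicit)
    return {
        arg["name"]: arg["value"]
        for arg in metadata.get("args", [])
        if arg.get("is_autotune_config", False)
    }
```

Returning a copy of the explicit config keeps later template rendering from mutating the captured metadata.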
Signed-off-by: coleramos425 <colramos@amd.com>
Pull request overview
This PR migrates kerncap’s HIP capture tool-loading mechanism to rocprofiler-sdk registration (to work with ROCm 7.x tool-loading behavior) and improves Triton reproducer generation so autotuner configs are reliably pinned.
Changes:
- Switch HIP interception from `HSA_TOOLS_LIB`-based `OnLoad()` to exporting `rocprofiler_configure()` and driving loading via `LD_PRELOAD`.
- Improve Triton reproducer generation by stripping `@triton.autotune` in standalone extraction and synthesizing pinned autotune kwargs from captured args when needed.
- Update docs and add/extend unit tests to cover the new capture env behavior and Triton autotune reproducer paths.
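On the Python side, moving from `HSA_TOOLS_LIB` to `LD_PRELOAD` mainly changes how the capture environment is assembled. The per-file summary describes `capturer.py` as using prepend behavior. The sketch below is a minimal, hypothetical version of that behavior and is not kerncap's code. The function name and library path are illustrative. It assumes existing `LD_PRELOAD` entries may be separated by colons or whitespace, both of which the glibc dynamic loader accepts.

```python
import os
import re
from typing import Mapping, Optional


def build_capture_env(tool_lib: str, base_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with tool_lib prepended to LD_PRELOAD.

    Prepending instead of overwriting keeps libraries the user already preloads,
    in their original order, after the capture tool.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = [p for p in re.split(r"[:\s]+", env.get("LD_PRELOAD", "")) if p]
    # Avoid listing the tool twice if the caller already preloaded it.
    entries = [tool_lib] + [p for p in existing if p != tool_lib]
    env["LD_PRELOAD"] = ":".join(entries)
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` when launching the target application.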
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| kerncap/src/kerncap.hip | Adds a rocprofiler-sdk registration entry point and deprecates the `OnLoad()` / `HSA_TOOLS_LIB` path. |
| kerncap/src/CMakeLists.txt | Links against rocprofiler-sdk. |
| kerncap/kerncap/capturer.py | Switches capture injection from `HSA_TOOLS_LIB` to `LD_PRELOAD` (with prepend behavior). |
| kerncap/kerncap/reproducer.py | Adds a `pyrepr` Jinja filter, strips autotune decorators for standalone Triton modules, and synthesizes the autotune config from args. |
| kerncap/kerncap/templates/triton_reproducer.py.j2 | Avoids the `.fn` bypass for stripped kernels; uses `pyrepr` to render config kwargs. |
| kerncap/tests/unit/test_capturer.py | New tests validating `LD_PRELOAD` env setup and Triton delegation. |
| kerncap/tests/unit/test_reproducer.py | Adds tests for autotune stripping and autotune-config synthesis/precedence. |
| kerncap/tests/unit/fixtures/sample_autotuned_kernel.py | Fixture for validating autotune decorator stripping. |
| kerncap/src/kerncap_log.hpp | Updates the injection-mechanism wording in comments. |
| kerncap/README.md | Documents `LD_PRELOAD` usage and the ROCm 7.0+ dependency (rocprofiler-sdk). |
| kerncap/CLAUDE.md | Updates internal docs to reflect `LD_PRELOAD` + rocprofiler-sdk registration. |
| docs/src/content/docs/tools/kerncap.mdx | Updates user docs from `HSA_TOOLS_LIB` to `LD_PRELOAD` + ROCm 7.0+. |
| docs/src/content/docs/getting-started/installation.mdx | Notes that kerncap requires ROCm 7.0+. |
| AGENTS.md | Updates repo-level agent guidance about kerncap's ROCm 7.0+ workflow targeting. |
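The template change for `triton_reproducer.py.j2` depends on which object the reproducer indexes with `[grid]`. The sketch below uses stand-in classes rather than Triton itself. It assumes, as in current Triton sources, that the `Autotuner` wrapper exposes the wrapped `JITFunction` as `.fn` and that `JITFunction.fn` is the undecorated Python function. Both are internal details that may change. `launchable` and the stand-in class names are hypothetical.

```python
from typing import Any, Callable


class JITFunctionStandIn:
    """Stand-in for Triton's JITFunction: launchable via kernel[grid](...).

    Like the real class (an internal detail), it keeps the raw function in `.fn`.
    """

    def __init__(self, fn: Callable[..., Any]) -> None:
        self.fn = fn

    def __getitem__(self, grid):
        # Return a launcher that just records what it was called with.
        return lambda *args, **kwargs: ("launched", grid, args, kwargs)


class AutotunerStandIn:
    """Stand-in for Triton's Autotuner: wraps a JITFunction, exposed as `.fn`."""

    def __init__(self, jit_fn: JITFunctionStandIn) -> None:
        self.fn = jit_fn


def launchable(kernel: Any, autotune_stripped: bool) -> Any:
    """Pick the object the reproducer should index with [grid]."""
    # When autotune was stripped, `kernel` is already the bare JITFunction, and
    # its `.fn` would be the plain Python function, which cannot be launched.
    return kernel if autotune_stripped else kernel.fn
```

Unwrapping unconditionally would turn a stripped kernel into a plain function. That is why the template needs the `autotune_stripped` flag and cannot infer the right call from the kernel name alone.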
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: coleramos425 <colramos@amd.com>
ROCm 7.x's `rocprofiler-register` intercepts tool loading before `HSA_TOOLS_LIB` is checked, which breaks kerncap in modern ROCm builds (e.g. the latest vLLM containers). The fix is to export `rocprofiler_configure()` instead of relying on `OnLoad()`, matching Omniprobe's migration (commit 0806c60).

Validation of this change also revealed an oversight in Triton reproducer generation: some autotuner configs weren't being picked up, so data was missing when running the generated `reproducer.py`. This PR also fixes that so that args to the autotuner can properly populate that