
fix(metrix): fix MCP list_available_metrics and shell quoting #115

Open
mawad-amd wants to merge 8 commits into main from muhaawad/fix-mcp-metrics-and-quoting

Conversation

@mawad-amd
Member

Summary

  • list_available_metrics returned hardcoded, nonexistent metric names: The MCP tool returned memory.l2_cache_hit_rate, compute.cu_utilization, and compute.wave_occupancy — none of which exist in the actual METRIC_CATALOG. The real metric is memory.l2_hit_rate, and there are no cu_utilization or wave_occupancy metrics. The tool now dynamically queries METRIC_CATALOG and groups results by category.

  • Shell quoting broke profiling commands with quoted arguments: rocprof_wrapper.py used str.split() to tokenize the command string, which splits inside quotes. A command like python -c "import torch; print(torch.cuda.is_available())" would be split into ["python", "-c", "\"import", "torch;", ...], causing SyntaxError: unterminated string literal in the spawned Python process. Now uses shlex.split() which respects shell quoting semantics.
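The quoting failure described above is easy to reproduce in isolation. This standalone sketch (not code from the PR) contrasts the two tokenizers on the exact command shape that broke:

```python
import shlex

cmd = 'python -c "import torch; print(torch.cuda.is_available())"'

# Naive whitespace tokenization splits inside the quoted -c argument:
print(cmd.split())
# ['python', '-c', '"import', 'torch;', 'print(torch.cuda.is_available())"']

# Shell-aware tokenization keeps the quoted program text as one argv entry:
print(shlex.split(cmd))
# ['python', '-c', 'import torch; print(torch.cuda.is_available())']
```

With `str.split()`, the spawned interpreter receives `"import` as its whole program, which is exactly the `SyntaxError: unterminated string literal` the agent hit.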

Context

Found during a benchmark comparing 4 methods of GPU profiling tool integration (Vanilla, MCP, Skill, Native IntelliKit). The MCP config hit both bugs: list_available_metrics told the LLM agent to use non-existent metrics (causing 2 failed profile_metrics calls), and profile_metrics broke when the agent passed python -c "..." commands (causing 2 more failures). The agent had to fall back to writing a wrapper script before profiling succeeded.

Test plan

  • list_available_metrics returns only metric names from METRIC_CATALOG (e.g. memory.l2_hit_rate, not memory.l2_cache_hit_rate)
  • profile_metrics with command="python -c 'import torch; print(1+1)'" correctly passes the quoted argument to rocprofv3
  • Existing profiling with simple commands (./app, python script.py) still works
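The second test-plan item can be checked without a GPU, since only argv construction matters. This is a simplified stand-in (the `rocprofv3` prefix here is an assumption; the real wrapper adds more flags before `--`):

```python
import shlex

def build_argv(command: str) -> list[str]:
    # Hypothetical re-creation of the wrapper's final step: append the
    # target command after "--" using shell-aware tokenization (the fix).
    prof_cmd = ["rocprofv3", "--"]
    prof_cmd.extend(shlex.split(command))
    return prof_cmd

argv = build_argv("python -c 'import torch; print(1+1)'")
# The single-quoted program survives as one argv element:
assert argv == ["rocprofv3", "--", "python", "-c", "import torch; print(1+1)"]
```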

🤖 Generated with Claude Code

1. list_available_metrics returned hardcoded metric names that don't
   exist in the catalog (memory.l2_cache_hit_rate, compute.cu_utilization,
   compute.wave_occupancy). Now queries METRIC_CATALOG dynamically and
   groups by category.

2. rocprof_wrapper used str.split() to tokenize the profiled command,
   which breaks quoted arguments like `python -c "import torch; ..."`.
   Now uses shlex.split() to preserve shell quoting semantics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 3, 2026 22:57
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes two MCP-facing issues in the Metrix profiler tooling: incorrect metric discovery from the MCP server and incorrect command tokenization when profiling commands include quoted arguments.

Changes:

  • Switch rocprofv3 command tokenization from str.split() to shlex.split() to preserve quoted arguments.
  • Update MCP list_available_metrics to build its output dynamically and group metrics by category.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| metrix/src/metrix/profiler/rocprof_wrapper.py | Improves robustness of profiling command argument parsing by respecting shell-style quoting. |
| metrix/src/metrix/mcp/server.py | Reworks metric listing to be dynamic and categorized for MCP clients. |


```diff
 # Add target command (use shlex.split to preserve quoted arguments,
 # e.g. python -c "import torch; ..." won't be split inside quotes)
 prof_cmd.append("--")
-prof_cmd.extend(command.split())
+prof_cmd.extend(shlex.split(command))
```

Copilot AI Apr 3, 2026


shlex.split(command) can raise ValueError (e.g., unmatched quotes). Right now that exception will bubble up and likely crash callers; consider catching ValueError here and raising a RuntimeError with an actionable message that includes the original command string.

Suggested change

```diff
-prof_cmd.extend(shlex.split(command))
+try:
+    prof_cmd.extend(shlex.split(command))
+except ValueError as exc:
+    raise RuntimeError(
+        f"Failed to parse command for rocprofv3: {command!r}. "
+        "Please check for unmatched quotes or other shell syntax issues."
+    ) from exc
```
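The failure mode this suggestion guards against is straightforward to demonstrate; `shlex.split` raises `ValueError` whenever the input has unbalanced quoting:

```python
import shlex

# shlex.split raises ValueError on malformed quoting, e.g. an unmatched quote:
err = None
try:
    shlex.split('python -c "print(1)')   # note the missing closing quote
except ValueError as exc:
    err = exc

print(err)   # No closing quotation
```

Without the suggested try/except, that bare `ValueError` would propagate to MCP callers with no mention of which command string was at fault.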

Comment on lines +169 to +172

```diff
 # Add target command (use shlex.split to preserve quoted arguments,
 # e.g. python -c "import torch; ..." won't be split inside quotes)
 prof_cmd.append("--")
-prof_cmd.extend(command.split())
+prof_cmd.extend(shlex.split(command))
```

Copilot AI Apr 3, 2026


This change modifies how the target command is tokenized (from str.split() to shlex.split()), but there doesn’t appear to be a unit test asserting correct behavior for quoted/escaped arguments. Adding a test that captures the constructed subprocess.run argv for a command like python -c "print(1)" would prevent regressions.
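One way to write such a regression test without a GPU is to intercept `subprocess.run` and inspect the argv it receives. The wrapper's real entry point isn't shown in this review, so a stand-in invocation is used below; only the interception pattern is the point:

```python
import shlex
import subprocess

# Sketch of the suggested regression test. The "rocprofv3" prefix and the
# direct subprocess.run call stand in for the wrapper's actual code path.
captured = {}
real_run = subprocess.run

def fake_run(argv, *args, **kwargs):
    captured["argv"] = argv
    return subprocess.CompletedProcess(argv, 0, stdout="", stderr="")

subprocess.run = fake_run
try:
    # Stand-in for rocprof_wrapper building its argv for a quoted command:
    subprocess.run(["rocprofv3", "--", *shlex.split('python -c "print(1)"')])
finally:
    subprocess.run = real_run

# The quoted program text must survive as a single argv element:
assert captured["argv"][-2:] == ["-c", "print(1)"]
```

In a pytest suite the same substitution would normally be done with the `monkeypatch` fixture rather than manual save/restore.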

Comment on lines +77 to +82

```python
metrics = sorted(METRIC_CATALOG.keys())

# Group by category for better discoverability
by_category = {}
for name in metrics:
    cat = METRIC_CATALOG[name]["category"].value
```

Copilot AI Apr 3, 2026


list_available_metrics() is still not listing the actually supported metrics for the current backend/architecture. It returns METRIC_CATALOG.keys(), but some metrics exist in backend YAML (e.g. compute.gpu_utilization in backends/counter_defs.yaml) and are therefore profileable while not present in METRIC_CATALOG, so they will be missing from this list. Consider sourcing metrics from Metrix().backend.get_available_metrics() (or Metrix().list_metrics()) and then grouping by category using METRIC_CATALOG when metadata exists, with a fallback category (e.g., prefix before .) for metrics not in the catalog.

Suggested change

```diff
-metrics = sorted(METRIC_CATALOG.keys())
+profiler = Metrix()
+if hasattr(profiler, "list_metrics"):
+    metrics = sorted(profiler.list_metrics())
+elif hasattr(profiler, "backend") and hasattr(profiler.backend, "get_available_metrics"):
+    metrics = sorted(profiler.backend.get_available_metrics())
+else:
+    metrics = sorted(METRIC_CATALOG.keys())

 # Group by category for better discoverability
 by_category = {}
 for name in metrics:
-    cat = METRIC_CATALOG[name]["category"].value
+    metric_meta = METRIC_CATALOG.get(name, {})
+    category = metric_meta.get("category")
+    if category is not None:
+        cat = getattr(category, "value", category)
+    else:
+        cat = name.split(".", 1)[0] if "." in name else "other"
```
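The fallback-category behavior this suggestion describes can be sketched in isolation. The toy catalog and sample metric names below are invented for illustration; only `memory.l2_hit_rate` is a real catalog name from this PR:

```python
# Metrics missing from the catalog get a category from their "name." prefix,
# or "other" when the name has no dot. SAMPLE_CATALOG is a made-up stand-in
# for METRIC_CATALOG; real entries carry an enum, not a plain string.
SAMPLE_CATALOG = {"memory.l2_hit_rate": {"category": "memory"}}
metrics = ["compute.gpu_utilization", "hbm_bandwidth", "memory.l2_hit_rate"]

by_category = {}
for name in sorted(metrics):
    meta = SAMPLE_CATALOG.get(name, {})
    cat = meta.get("category") or (name.split(".", 1)[0] if "." in name else "other")
    by_category.setdefault(cat, []).append(name)

print(by_category)
# {'compute': ['compute.gpu_utilization'], 'other': ['hbm_bandwidth'],
#  'memory': ['memory.l2_hit_rate']}
```

This keeps YAML-only metrics like `compute.gpu_utilization` discoverable even though they carry no catalog metadata.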

mawad-amd and others added 7 commits April 3, 2026 16:10
- test_mcp_server.py: 6 tests verifying list_available_metrics returns
  only metrics from METRIC_CATALOG, includes known good names, and
  rejects the old hardcoded wrong names (l2_cache_hit_rate, cu_utilization,
  wave_occupancy).

- test_rocprof_wrapper.py: 3 tests verifying shlex.split preserves
  quoted arguments in profiled commands (double quotes, single quotes,
  and simple unquoted commands).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap shlex.split() in try/except ValueError -> RuntimeError with
  actionable message including the original command string
- list_available_metrics now queries Metrix().list_metrics() for
  backend-aware metrics (includes YAML-defined metrics), with fallback
  to METRIC_CATALOG when no GPU is available
- Add test for unmatched quotes raising RuntimeError
- Skip GPU-dependent test when hipcc/ROCm is not available
- Use >= instead of == for catalog size assertion (backend may have
  YAML-only metrics)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The SKILL.md examples showed `metrix ./my_app` but the actual CLI
requires `metrix profile ./my_app`. This caused LLM agents to hit
a parse error on first attempt and waste 2 round-trips recovering
via `--help`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add gfx950 (MI350, CDNA 4) to the supported devices list
- Remove "Use --device <arch>" hint that pointed to a non-existent
  flag, which caused LLM agents to hallucinate the parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add gfx1201 (RX 9070) and gfx1151 (RX 9060) which already have
backends implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both share the same gfx950 backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- profile_metrics: detailed docstring describing args, return format,
  and shell quoting behavior. Default to collecting all available
  metrics when none are specified instead of just one.
- list_available_metrics: list categories in docstring, describe
  return structure.
- SKILL.md: mention both MCP tools in the execution path section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>