
fix(metrix): fix MCP list_available_metrics and shell quoting #115

Open
mawad-amd wants to merge 8 commits into main from muhaawad/fix-mcp-metrics-and-quoting

Conversation

@mawad-amd
Member

Summary

  • list_available_metrics returned hardcoded, nonexistent metric names: The MCP tool returned memory.l2_cache_hit_rate, compute.cu_utilization, and compute.wave_occupancy — none of which exist in the actual METRIC_CATALOG. The real metric is memory.l2_hit_rate, and there are no cu_utilization or wave_occupancy metrics. The tool now dynamically queries METRIC_CATALOG and groups results by category.

  • Shell quoting broke profiling commands with quoted arguments: rocprof_wrapper.py used str.split() to tokenize the command string, which splits inside quotes. A command like python -c "import torch; print(torch.cuda.is_available())" would be split into ["python", "-c", "\"import", "torch;", ...], causing SyntaxError: unterminated string literal in the spawned Python process. Now uses shlex.split() which respects shell quoting semantics.
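The quoting failure described above is easy to reproduce in isolation. This standalone sketch (not code from the PR) contrasts the two tokenizers on the exact command shape that broke:

```python
import shlex

cmd = 'python -c "import torch; print(torch.cuda.is_available())"'

# Naive whitespace tokenization splits inside the quoted -c argument:
print(cmd.split())
# ['python', '-c', '"import', 'torch;', 'print(torch.cuda.is_available())"']

# Shell-aware tokenization keeps the quoted program text as one argv entry:
print(shlex.split(cmd))
# ['python', '-c', 'import torch; print(torch.cuda.is_available())']
```

With `str.split()`, the spawned interpreter receives `"import` as its whole program, which is exactly the `SyntaxError: unterminated string literal` the agent hit.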

Context

Found during a benchmark comparing 4 methods of GPU profiling tool integration (Vanilla, MCP, Skill, Native IntelliKit). The MCP config hit both bugs: list_available_metrics told the LLM agent to use non-existent metrics (causing 2 failed profile_metrics calls), and profile_metrics broke when the agent passed python -c "..." commands (causing 2 more failures). The agent had to fall back to writing a wrapper script before profiling succeeded.

Test plan

  • list_available_metrics returns only metric names from METRIC_CATALOG (e.g. memory.l2_hit_rate, not memory.l2_cache_hit_rate)
  • profile_metrics with command="python -c 'import torch; print(1+1)'" correctly passes the quoted argument to rocprofv3
  • Existing profiling with simple commands (./app, python script.py) still works
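The second test-plan item can be checked without a GPU, since only argv construction matters. This is a simplified stand-in (the `rocprofv3` prefix here is an assumption; the real wrapper adds more flags before `--`):

```python
import shlex

def build_argv(command: str) -> list[str]:
    # Hypothetical re-creation of the wrapper's final step: append the
    # target command after "--" using shell-aware tokenization (the fix).
    prof_cmd = ["rocprofv3", "--"]
    prof_cmd.extend(shlex.split(command))
    return prof_cmd

argv = build_argv("python -c 'import torch; print(1+1)'")
# The single-quoted program survives as one argv element:
assert argv == ["rocprofv3", "--", "python", "-c", "import torch; print(1+1)"]
```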

🤖 Generated with Claude Code

1. list_available_metrics returned hardcoded metric names that don't
   exist in the catalog (memory.l2_cache_hit_rate, compute.cu_utilization,
   compute.wave_occupancy). Now queries METRIC_CATALOG dynamically and
   groups by category.

2. rocprof_wrapper used str.split() to tokenize the profiled command,
   which breaks quoted arguments like `python -c "import torch; ..."`.
   Now uses shlex.split() to preserve shell quoting semantics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 3, 2026 22:57
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes two MCP-facing issues in the Metrix profiler tooling: incorrect metric discovery from the MCP server and incorrect command tokenization when profiling commands include quoted arguments.

Changes:

  • Switch rocprofv3 command tokenization from str.split() to shlex.split() to preserve quoted arguments.
  • Update MCP list_available_metrics to build its output dynamically and group metrics by category.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| metrix/src/metrix/profiler/rocprof_wrapper.py | Improves robustness of profiling command argument parsing by respecting shell-style quoting. |
| metrix/src/metrix/mcp/server.py | Reworks metric listing to be dynamic and categorized for MCP clients. |


```diff
 # Add target command (use shlex.split to preserve quoted arguments,
 # e.g. python -c "import torch; ..." won't be split inside quotes)
 prof_cmd.append("--")
-prof_cmd.extend(command.split())
+prof_cmd.extend(shlex.split(command))
```

Copilot AI Apr 3, 2026


shlex.split(command) can raise ValueError (e.g., unmatched quotes). Right now that exception will bubble up and likely crash callers; consider catching ValueError here and raising a RuntimeError with an actionable message that includes the original command string.

Suggested change

```diff
-prof_cmd.extend(shlex.split(command))
+try:
+    prof_cmd.extend(shlex.split(command))
+except ValueError as exc:
+    raise RuntimeError(
+        f"Failed to parse command for rocprofv3: {command!r}. "
+        "Please check for unmatched quotes or other shell syntax issues."
+    ) from exc
```
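The failure mode this suggestion guards against is straightforward to demonstrate; `shlex.split` raises `ValueError` whenever the input has unbalanced quoting:

```python
import shlex

# shlex.split raises ValueError on malformed quoting, e.g. an unmatched quote:
err = None
try:
    shlex.split('python -c "print(1)')   # note the missing closing quote
except ValueError as exc:
    err = exc

print(err)   # No closing quotation
```

Without the suggested try/except, that bare `ValueError` would propagate to MCP callers with no mention of which command string was at fault.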

Comment on lines +169 to +172

```diff
 # Add target command (use shlex.split to preserve quoted arguments,
 # e.g. python -c "import torch; ..." won't be split inside quotes)
 prof_cmd.append("--")
-prof_cmd.extend(command.split())
+prof_cmd.extend(shlex.split(command))
```

Copilot AI Apr 3, 2026


This change modifies how the target command is tokenized (from str.split() to shlex.split()), but there doesn’t appear to be a unit test asserting correct behavior for quoted/escaped arguments. Adding a test that captures the constructed subprocess.run argv for a command like python -c "print(1)" would prevent regressions.
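One way to write such a regression test without a GPU is to intercept `subprocess.run` and inspect the argv it receives. The wrapper's real entry point isn't shown in this review, so a stand-in invocation is used below; only the interception pattern is the point:

```python
import shlex
import subprocess

# Sketch of the suggested regression test. The "rocprofv3" prefix and the
# direct subprocess.run call stand in for the wrapper's actual code path.
captured = {}
real_run = subprocess.run

def fake_run(argv, *args, **kwargs):
    captured["argv"] = argv
    return subprocess.CompletedProcess(argv, 0, stdout="", stderr="")

subprocess.run = fake_run
try:
    # Stand-in for rocprof_wrapper building its argv for a quoted command:
    subprocess.run(["rocprofv3", "--", *shlex.split('python -c "print(1)"')])
finally:
    subprocess.run = real_run

# The quoted program text must survive as a single argv element:
assert captured["argv"][-2:] == ["-c", "print(1)"]
```

In a pytest suite the same substitution would normally be done with the `monkeypatch` fixture rather than manual save/restore.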

Comment on lines +77 to +82

```python
metrics = sorted(METRIC_CATALOG.keys())

# Group by category for better discoverability
by_category = {}
for name in metrics:
    cat = METRIC_CATALOG[name]["category"].value
```

Copilot AI Apr 3, 2026


list_available_metrics() is still not listing the actually supported metrics for the current backend/architecture. It returns METRIC_CATALOG.keys(), but some metrics exist in backend YAML (e.g. compute.gpu_utilization in backends/counter_defs.yaml) and are therefore profileable while not present in METRIC_CATALOG, so they will be missing from this list. Consider sourcing metrics from Metrix().backend.get_available_metrics() (or Metrix().list_metrics()) and then grouping by category using METRIC_CATALOG when metadata exists, with a fallback category (e.g., prefix before .) for metrics not in the catalog.

Suggested change

```diff
-metrics = sorted(METRIC_CATALOG.keys())
+profiler = Metrix()
+if hasattr(profiler, "list_metrics"):
+    metrics = sorted(profiler.list_metrics())
+elif hasattr(profiler, "backend") and hasattr(profiler.backend, "get_available_metrics"):
+    metrics = sorted(profiler.backend.get_available_metrics())
+else:
+    metrics = sorted(METRIC_CATALOG.keys())

 # Group by category for better discoverability
 by_category = {}
 for name in metrics:
-    cat = METRIC_CATALOG[name]["category"].value
+    metric_meta = METRIC_CATALOG.get(name, {})
+    category = metric_meta.get("category")
+    if category is not None:
+        cat = getattr(category, "value", category)
+    else:
+        cat = name.split(".", 1)[0] if "." in name else "other"
```
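The fallback-category behavior this suggestion describes can be sketched in isolation. The toy catalog and sample metric names below are invented for illustration; only `memory.l2_hit_rate` is a real catalog name from this PR:

```python
# Metrics missing from the catalog get a category from their "name." prefix,
# or "other" when the name has no dot. SAMPLE_CATALOG is a made-up stand-in
# for METRIC_CATALOG; real entries carry an enum, not a plain string.
SAMPLE_CATALOG = {"memory.l2_hit_rate": {"category": "memory"}}
metrics = ["compute.gpu_utilization", "hbm_bandwidth", "memory.l2_hit_rate"]

by_category = {}
for name in sorted(metrics):
    meta = SAMPLE_CATALOG.get(name, {})
    cat = meta.get("category") or (name.split(".", 1)[0] if "." in name else "other")
    by_category.setdefault(cat, []).append(name)

print(by_category)
# {'compute': ['compute.gpu_utilization'], 'other': ['hbm_bandwidth'],
#  'memory': ['memory.l2_hit_rate']}
```

This keeps YAML-only metrics like `compute.gpu_utilization` discoverable even though they carry no catalog metadata.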

mawad-amd and others added 7 commits April 3, 2026 16:10
- test_mcp_server.py: 6 tests verifying list_available_metrics returns
  only metrics from METRIC_CATALOG, includes known good names, and
  rejects the old hardcoded wrong names (l2_cache_hit_rate, cu_utilization,
  wave_occupancy).

- test_rocprof_wrapper.py: 3 tests verifying shlex.split preserves
  quoted arguments in profiled commands (double quotes, single quotes,
  and simple unquoted commands).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap shlex.split() in try/except ValueError -> RuntimeError with
  actionable message including the original command string
- list_available_metrics now queries Metrix().list_metrics() for
  backend-aware metrics (includes YAML-defined metrics), with fallback
  to METRIC_CATALOG when no GPU is available
- Add test for unmatched quotes raising RuntimeError
- Skip GPU-dependent test when hipcc/ROCm is not available
- Use >= instead of == for catalog size assertion (backend may have
  YAML-only metrics)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The SKILL.md examples showed `metrix ./my_app` but the actual CLI
requires `metrix profile ./my_app`. This caused LLM agents to hit
a parse error on first attempt and waste 2 round-trips recovering
via `--help`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add gfx950 (MI350, CDNA 4) to the supported devices list
- Remove "Use --device <arch>" hint that pointed to a non-existent
  flag, which caused LLM agents to hallucinate the parameter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add gfx1201 (RX 9070) and gfx1151 (RX 9060) which already have
backends implemented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both share the same gfx950 backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- profile_metrics: detailed docstring describing args, return format,
  and shell quoting behavior. Default to collecting all available
  metrics when none are specified instead of just one.
- list_available_metrics: list categories in docstring, describe
  return structure.
- SKILL.md: mention both MCP tools in the execution path section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>