
feat: add Grafana + mcp-grafana to compose stack #39

Merged: tallpsmith merged 21 commits into main from 012-grafana-compose on Mar 10, 2026


@tallpsmith (Owner)

Summary

  • Adds Grafana (port 3000) with auto-provisioned PCP Valkey + Vector datasources and the unsigned PCP plugin installed via ZIP URL
  • Adds mcp-grafana (port 8000/SSE) wired to Grafana with basic auth for AI agent access to dashboards
  • CI parity maintained — Grafana health wait step + GRAFANA_URL env var added to E2E job
  • 4 new E2E tests: Grafana health, datasource provisioning, Valkey connectivity, mcp-grafana SSE
  • Fixes pre-existing pmmcp healthcheck bug (podman splits CMD array args on semicolons; switched to CMD-SHELL)
  • Makes E2E mandatory in pre-commit (removes PMPROXY_URL guard), adds Grafana env vars to Justfile
  • Documents VM-aware pre-push workflow in CLAUDE.md (Claude can't run podman in a VM)

Docs impact

  • README.md: Updated service list (4→6), added Grafana quickstart section, renumbered steps
  • CLAUDE.md: Added Grafana compose gotchas, podman CMD-SHELL gotcha, VM pre-push guidance
  • docker-compose.yml: Inline comments on all new services explaining env vars and auth decisions

Test plan

  • just ci passes (lint + format + 497 unit/integration tests, 90% coverage)
  • podman compose up -d --wait — all 9 containers healthy in 22s
  • 4 Grafana E2E tests pass locally (health, datasources, Valkey connectivity, mcp-grafana SSE)
  • Full E2E suite passes locally (25/27 — 2 pre-existing timeseries timing failures unrelated to this PR)
  • CI fully green — unit-integration (22s) + E2E (1m24s) including Grafana tests

Commits

Provision Valkey (historical/timeseries) and Vector (live) datasources
pointing at pmproxy for auto-configuration on compose up.

Tests verify Grafana health, PCP datasource provisioning, Valkey
datasource connectivity, and the mcp-grafana SSE endpoint; all of them
fail until the compose services are wired up.
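The datasource provisioning test can be sketched as below. This is a minimal, hypothetical version, not the PR's test code: it assumes Grafana's `GET /api/datasources` HTTP API, which returns a JSON list of datasource objects with a `type` field. The two PCP type identifiers are also assumptions; check them against the installed plugin's `plugin.json` files. The pure helper is kept separate from the HTTP call so it can be unit-tested without a running stack.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumed type IDs for the PCP plugin's Valkey and Vector datasources (hypothetical).
EXPECTED_PCP_TYPES = frozenset({
    "performancecopilot-valkey-datasource",
    "performancecopilot-vector-datasource",
})


def missing_pcp_datasources(datasources: list[dict]) -> set[str]:
    """Return the expected PCP datasource types that are not provisioned."""
    provisioned = {ds.get("type") for ds in datasources}
    return set(EXPECTED_PCP_TYPES - provisioned)


def fetch_datasources(grafana_url: str, auth_header: Optional[str] = None) -> list[dict]:
    """GET /api/datasources from a running Grafana and decode the JSON list."""
    headers = {"Accept": "application/json"}
    if auth_header:
        headers["Authorization"] = auth_header
    req = urllib.request.Request(f"{grafana_url.rstrip('/')}/api/datasources", headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def check_pcp_datasources_provisioned() -> None:
    # Hypothetical E2E check: needs the compose stack up and GRAFANA_URL set.
    datasources = fetch_datasources(os.environ["GRAFANA_URL"])
    assert not missing_pcp_datasources(datasources)
```

In a pytest suite, the last function would be named `test_...` so it gets collected.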
Grafana gets the PCP plugin (unsigned ZIP install), auto-provisioned Valkey
and Vector datasources, anonymous admin access for the browser, and basic auth
for mcp-grafana API access. mcp-grafana runs SSE on port 8000.
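The basic auth used for mcp-grafana's API access is HTTP Basic authentication (RFC 7617). The client sends `Authorization: Basic <base64(username:password)>`. Below is a minimal sketch of building that header. The `admin`/`admin` pair in the usage example is Grafana's out-of-the-box default; the credentials the compose file actually passes to mcp-grafana are not shown here.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    # Credentials are joined with a colon, UTF-8 encoded, then base64-encoded.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

For example, `basic_auth_header("admin", "admin")` returns `"Basic YWRtaW46YWRtaW4="`. Anonymous admin covers the browser, while an API client like mcp-grafana sends a header like this one on each request.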
Maintains compose/CI parity — CI now polls Grafana health before
running E2E tests, same pattern as the existing pmproxy wait.
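The wait step comes down to polling Grafana's `/api/health` endpoint until it reports a healthy database. The PR's CI step most likely does this in shell, so the Python below is an equivalent sketch rather than the workflow's actual code. The timeout and interval defaults are arbitrary assumptions. The clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def grafana_healthy(grafana_url: str) -> bool:
    """True when Grafana's /api/health answers 200 with "database": "ok"."""
    try:
        with urllib.request.urlopen(f"{grafana_url.rstrip('/')}/api/health", timeout=5) as resp:
            return resp.status == 200 and json.load(resp).get("database") == "ok"
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, HTTP error, or a body that isn't JSON: not healthy yet.
        return False


def wait_until(
    check: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() every `interval` seconds until it passes or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Usage would be `wait_until(lambda: grafana_healthy(os.environ["GRAFANA_URL"]))`, failing the job when it returns `False`.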
README gets an updated service list, a Grafana quickstart section, and
renumbered steps. CLAUDE.md gets Grafana-specific compose gotchas
(unsigned plugin, auth, provisioning).

Spec, plan, research, data model, contracts, quickstart, and
checklists for the Grafana compose integration feature.

Podman splits CMD array arguments on semicolons, breaking Python
one-liners. CMD-SHELL lets the shell handle quoting. Also switched
to a TCP liveness check (socket connect) instead of an HTTP probe to
avoid 503s when pmproxy isn't reachable.
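A TCP liveness check only asks whether something is accepting connections on the port. It stays green even when pmmcp's upstream (pmproxy) is down, which is when an HTTP probe would see 503s. Below is a minimal Python sketch of that check. The host and port the real healthcheck targets are not stated here, so they are left as parameters.

```python
import socket


def tcp_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the address and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout is an OSError subclass).
        return False
```

In the compose file, logic like this would be inlined as a `python -c` one-liner in a `CMD-SHELL` healthcheck, with the exit code set from the result. A multi-statement one-liner like that is exactly what the semicolon splitting broke under the exec-form `CMD` array.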
Podman splits CMD array arguments on semicolons, breaking Python
one-liners. Documented the workaround (use CMD-SHELL) to save
future us from the same debugging session.

All 17 tasks verified: compose stack healthy, E2E tests pass,
quickstart validated. Two pre-existing failures in test_tools.py
(timeseries/compare_windows empty data) are unrelated.

E2E is no longer optional; pre-commit always runs it. The Justfile e2e
recipe now sets GRAFANA_URL and MCP_GRAFANA_URL for the new tests.
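On the test side, the two variables might be consumed as below. This is a hypothetical sketch. The fallback defaults simply mirror the ports in the summary (Grafana on 3000, mcp-grafana on 8000). The content-type check reflects how an SSE endpoint identifies itself (`text/event-stream`) and is not necessarily the PR's exact assertion. The mcp-grafana SSE path is not given here, so it is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class E2EUrls:
    grafana: str
    mcp_grafana: str


def e2e_urls(env: Mapping[str, str] = os.environ) -> E2EUrls:
    """Resolve service URLs from the environment the Justfile e2e recipe sets."""
    return E2EUrls(
        # Assumed fallbacks matching the published compose ports.
        grafana=env.get("GRAFANA_URL", "http://localhost:3000"),
        mcp_grafana=env.get("MCP_GRAFANA_URL", "http://localhost:8000"),
    )


def is_sse_content_type(content_type: str) -> bool:
    """SSE responses use text/event-stream; parameters such as charset are ignored."""
    return content_type.split(";", 1)[0].strip().lower() == "text/event-stream"
```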
Claude in a VM can't run podman/docker, so document the fallback
(just ci) and a prompt pattern for getting the user to run the full
E2E suite on their host.

Captures the design for guiding Claude toward the coordinator prompt
instead of bypassing the investigation hierarchy and calling raw tools.

- Correct config class: ServerConfig not PmproxyConfig (PMMCP_ prefix)
- Fix test_config.py: create, not append (file doesn't exist)
- Add README.md to Task 6 for env var documentation
- Note spec deviation rationale for config class choice

Per issue #10: configurable Grafana folder (default pmmcp-triage)
and HTML fallback report directory (default ~/.pmmcp/reports).
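The two settings could be read as below. The commits place them on `ServerConfig` with the `PMMCP_` env prefix, and the README documents `PMMCP_GRAFANA_FOLDER` and `PMMCP_REPORT_DIR`. How `ServerConfig` actually loads them is not shown here, so this standard-library version is only a sketch of the defaults and overrides. It uses a hypothetical `ReportSettings` class.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

DEFAULT_GRAFANA_FOLDER = "pmmcp-triage"
DEFAULT_REPORT_DIR = "~/.pmmcp/reports"


@dataclass(frozen=True)
class ReportSettings:
    """Hypothetical stand-in for the two new ServerConfig fields."""

    grafana_folder: str
    report_dir: Path

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ReportSettings":
        folder = env.get("PMMCP_GRAFANA_FOLDER", DEFAULT_GRAFANA_FOLDER)
        # Expand "~" so HTML fallback reports land under the user's home directory.
        report_dir = Path(os.path.expanduser(env.get("PMMCP_REPORT_DIR", DEFAULT_REPORT_DIR)))
        return cls(grafana_folder=folder, report_dir=report_dir)
```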
Tool descriptions are the last thing Claude reads before deciding what
to call. These one-liners nudge toward coordinate_investigation for
broad investigations instead of bypassing the prompt hierarchy.

Moves the coordinator reference from an afterthought to an IMPORTANT
block at the top. Adds a Grafana datasource discovery workflow with a
fallback cascade. Per issue #10.

Tells Claude this prompt is typically dispatched by the coordinator,
nudging toward coordinate_investigation for broad investigations.

After synthesis, the coordinator now instructs Claude to create a Grafana
dashboard (pmmcp-triage folder, YYYY-MM-DD naming, pmmcp-generated tag)
with a fallback cascade to HTML or text. Per issue #10.
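The coordinator expresses this as prompt instructions, not code, but the ordering is easy to make concrete. Only the folder, the tag, the date format, and the Grafana, then HTML, then text order come from the commit. In this hypothetical sketch, the rest of the title format and the renderer callables are illustrative assumptions.

```python
from datetime import date
from typing import Callable, Sequence

GRAFANA_FOLDER = "pmmcp-triage"
DASHBOARD_TAG = "pmmcp-generated"


def dashboard_spec(day: date, subject: str, folder: str = GRAFANA_FOLDER) -> dict:
    """Describe a triage dashboard: date-prefixed title, fixed tag, target folder."""
    return {
        "title": f"{day.isoformat()} {subject}",  # isoformat() yields YYYY-MM-DD
        "tags": [DASHBOARD_TAG],
        "folder": folder,
    }


def deliver_report(renderers: Sequence[tuple[str, Callable[[], str]]]) -> tuple[str, str]:
    """Try each (name, renderer) pair in order and return the first success.

    Renderers signal failure by raising. For example: Grafana dashboard first,
    then an HTML file, then a plain-text summary.
    """
    errors: list[str] = []
    for name, render in renderers:
        try:
            return name, render()
        except Exception as exc:  # any failure moves on to the next option
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all report renderers failed: " + "; ".join(errors))
```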
CLAUDE.md: dashboard conventions table and prompt hierarchy guidance.
README.md: PMMCP_GRAFANA_FOLDER and PMMCP_REPORT_DIR env vars.

tallpsmith merged commit 22a7fe6 into main on Mar 10, 2026. 4 checks passed.