
DRAFT: Align telemetry middleware with MCP OTEL semantic conventions #3683

Closed
ChrisJBurns wants to merge 5 commits into main from otel/standard-metrics-attributes


ChrisJBurns (Collaborator) commented Feb 6, 2026

Summary

  • Rename span attributes to match MCP OTEL spec (mcp.method.name, jsonrpc.request.id, gen_ai.tool.name, gen_ai.tool.call.arguments, gen_ai.prompt.name, network.transport)
  • Update HTTP attributes to stable OTEL semantic conventions (http.request.method, url.full, http.response.status_code, etc.)
  • Add standard metrics: mcp.server.operation.duration and mcp.server.session.duration histograms with spec-defined bucket boundaries
  • Emit mcp.server.session.duration on HTTP DELETE session termination (per MCP streamable-http spec)
  • Source session ID from Mcp-Session-Id HTTP header instead of JSON-RPC _meta
  • Derive network.protocol.version from actual HTTP request instead of hardcoding per transport type
  • Add session tracking with TTL-based cleanup to prevent memory leaks
  • Update span naming to {method} {target} format (e.g., tools/call get_weather)
  • Add error.type, client.address/client.port, gen_ai.operation.name attributes
  • Map transport values to standard network.transport/network.protocol.name
  • Propagate metric creation errors with warning log and no-op fallback
  • Initialize tool call counter once at startup instead of per-request
  • Unexport mcpOperationDurationBuckets (no external consumers)
  • Remove non-standard http.duration_ms attribute (span timestamps capture duration)
  • Add telemetry migration guide (docs/telemetry-migration.md) with attribute rename tables, PromQL examples, and migration checklist
  • Remove duplicate mcp.resource.uri attribute for resources/read
  • Replace hand-rolled contains() with slices.Contains() in tests
  • Fix t.Parallel() on test that mutates env vars (//nolint:paralleltest,tparallel)
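
As a rough illustration of the "derive from the actual HTTP request" bullets, the sketch below shows one way to compute client.address, client.port, and network.protocol.version. The helper names echo those in the test plan, but the signatures and bodies here are assumptions for illustration, not the PR's code.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strconv"
)

// parseRemoteAddr splits a "host:port" value (as found in
// http.Request.RemoteAddr) into client.address and client.port.
// Hypothetical body; the PR's helper may behave differently.
func parseRemoteAddr(remoteAddr string) (string, int) {
	host, portStr, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		// No port or malformed input: keep the raw value, report no port.
		return remoteAddr, 0
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return host, 0
	}
	return host, port
}

// httpProtocolVersion formats network.protocol.version from the request's
// major/minor numbers instead of hardcoding it per transport.
// Assumption: HTTP/2 and later are reported without a minor part.
func httpProtocolVersion(protoMajor, protoMinor int) string {
	if protoMajor >= 2 {
		return strconv.Itoa(protoMajor)
	}
	return fmt.Sprintf("%d.%d", protoMajor, protoMinor)
}

func main() {
	r := &http.Request{ProtoMajor: 1, ProtoMinor: 1, RemoteAddr: "192.0.2.10:54321"}
	host, port := parseRemoteAddr(r.RemoteAddr)
	fmt.Println(host, port, httpProtocolVersion(r.ProtoMajor, r.ProtoMinor))
}
```

Passing the major and minor numbers as plain integers keeps the version helper trivially testable without constructing a request.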

Addresses #3399.

PR Stack (2/3): #3682 (propagation) → This PR → #3684 (client spans)

Test plan

  • Updated span name expectations in all tests
  • Updated attribute name expectations (legacy → stable OTEL conventions)
  • Added tests for standard mcp.server.operation.duration metric with correct attributes
  • Added tests for session tracking and RecordSessionEnd
  • Added tests for session end on HTTP DELETE (success and error cases)
  • Added tests for mapTransport, parseRemoteAddr, httpProtocolVersion
  • Updated integration test HTTP attribute assertions to stable names
  • go build ./... passes
  • go test ./pkg/telemetry/... passes
  • task lint-fix passes with 0 issues

Screenshots

[Screenshot: traces in Tempo/Grafana]

[Screenshot: metrics in Prometheus/Grafana]

🤖 Generated with Claude Code

ChrisJBurns and others added 2 commits February 6, 2026 20:12
Add MetaCarrier (TextMapCarrier for MCP _meta fields) and
InjectMetaTraceContext for injecting traceparent/tracestate
into outgoing MCP requests, per the MCP OTEL specification.

This enables distributed tracing across vMCP → backend boundaries
using the standard W3C Trace Context format propagated through
MCP params._meta.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
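
A carrier over params._meta, as this commit describes, could look roughly like the sketch below. The local `textMapCarrier` interface copies the method set of OpenTelemetry's `propagation.TextMapCarrier` so the example compiles without the SDK. The `metaCarrier` type and its handling of non-string values are assumptions, not the PR's `MetaCarrier`.

```go
package main

import (
	"fmt"
	"sort"
)

// textMapCarrier mirrors the method set of OpenTelemetry's
// propagation.TextMapCarrier (Get, Set, Keys).
type textMapCarrier interface {
	Get(key string) string
	Set(key, value string)
	Keys() []string
}

// metaCarrier adapts a decoded MCP params._meta object to the carrier
// interface. Hypothetical type for illustration.
type metaCarrier map[string]any

// Get returns the string stored under key; missing or non-string
// values read as the empty string.
func (c metaCarrier) Get(key string) string {
	s, _ := c[key].(string)
	return s
}

// Set stores a propagation field such as traceparent or tracestate.
func (c metaCarrier) Set(key, value string) { c[key] = value }

// Keys lists every key in _meta, sorted for deterministic output.
func (c metaCarrier) Keys() []string {
	keys := make([]string, 0, len(c))
	for k := range c {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// Compile-time check that metaCarrier satisfies the carrier interface.
var _ textMapCarrier = metaCarrier{}

func main() {
	meta := metaCarrier{"progressToken": 7}
	// Example traceparent value taken from the W3C Trace Context spec.
	meta.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(meta.Get("traceparent"), meta.Keys())
}
```

With the real SDK, a value of this shape could be handed to a propagator's Inject and Extract methods, since it satisfies the same three-method interface.
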
Rename attributes to match the MCP OpenTelemetry specification:
- mcp.method → mcp.method.name
- mcp.request.id → jsonrpc.request.id
- mcp.tool.name → gen_ai.tool.name
- mcp.tool.arguments → gen_ai.tool.call.arguments
- mcp.prompt.name → gen_ai.prompt.name
- mcp.transport → network.transport (with standard value mapping)

Add standard metrics: mcp.server.operation.duration and
mcp.server.session.duration histograms with spec-defined bucket
boundaries. Add session tracking with TTL-based cleanup.

Update span naming to "{method} {target}" format, add error.type
attribute, client.address/port, and gen_ai.operation.name.

Addresses #3399.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
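
The rename list and the "{method} {target}" span naming from this commit can be sketched as a lookup table plus a small formatter. The map contents come from the commit message. The function names, and the fallback to the bare method when there is no target, are assumptions.

```go
package main

import "fmt"

// renamedAttributes maps legacy span attribute keys to the names listed
// in the commit message.
var renamedAttributes = map[string]string{
	"mcp.method":         "mcp.method.name",
	"mcp.request.id":     "jsonrpc.request.id",
	"mcp.tool.name":      "gen_ai.tool.name",
	"mcp.tool.arguments": "gen_ai.tool.call.arguments",
	"mcp.prompt.name":    "gen_ai.prompt.name",
	"mcp.transport":      "network.transport",
}

// migrateKey returns the new key for a legacy attribute, or the input
// unchanged when it was not renamed. Hypothetical helper.
func migrateKey(key string) string {
	if renamed, ok := renamedAttributes[key]; ok {
		return renamed
	}
	return key
}

// spanName builds "{method} {target}". Assumption: with no target
// (for example a method that names no tool), the method alone is used.
func spanName(method, target string) string {
	if target == "" {
		return method
	}
	return method + " " + target
}

func main() {
	fmt.Println(migrateKey("mcp.tool.name"))
	fmt.Println(spanName("tools/call", "get_weather"))
}
```

A table like this is mainly useful when rewriting dashboard queries or alert rules that still reference the legacy keys.
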

codecov bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 83.14607% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.30%. Comparing base (0761277) to head (65d3bd9).
⚠️ Report is 6 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/telemetry/middleware.go | 83.14% | 36 Missing and 9 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3683      +/-   ##
==========================================
+ Coverage   66.26%   66.30%   +0.03%     
==========================================
  Files         427      427              
  Lines       41765    41923     +158     
==========================================
+ Hits        27676    27795     +119     
- Misses      11977    12010      +33     
- Partials     2112     2118       +6     


@ChrisJBurns ChrisJBurns changed the base branch from otel/propagation-foundation to main February 9, 2026 18:38
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
@github-actions github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Feb 9, 2026
ChrisJBurns and others added 2 commits February 9, 2026 19:39
- Source session ID from Mcp-Session-Id HTTP header instead of _meta
- Derive protocol version from actual HTTP request, not transport type
- Update HTTP attributes to stable OTEL semantic conventions
- Emit mcp.server.session.duration on HTTP DELETE session termination
- Unexport MCPOperationDurationBuckets (no external consumers)
- Propagate metric creation errors with no-op fallback
- Initialize tool call counter once at startup instead of per-request
- Add telemetry migration guide for renamed/new/removed attributes
- Remove old-to-new attribute name comments from code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
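
Session tracking with TTL cleanup and a duration reported at session end (for example on HTTP DELETE) might be structured along these lines. Every name here (`sessionTracker`, `Touch`, `End`, `Sweep`, `fakeClock`) is hypothetical. Measuring the TTL from the last activity rather than from session start is also an assumption, since the PR text does not say which it uses.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeClock is a deterministic clock for the example; production code
// would pass time.Now instead.
type fakeClock struct{ now time.Time }

func (c *fakeClock) Now() time.Time { return c.now }
func (c *fakeClock) AdvanceSeconds(n int) {
	c.now = c.now.Add(time.Duration(n) * time.Second)
}

type sessionEntry struct{ start, lastSeen time.Time }

// sessionTracker remembers when each session (keyed by the
// Mcp-Session-Id header value) started and was last active.
type sessionTracker struct {
	mu       sync.Mutex
	ttl      time.Duration
	now      func() time.Time
	sessions map[string]sessionEntry
}

func newSessionTracker(ttl time.Duration, now func() time.Time) *sessionTracker {
	return &sessionTracker{ttl: ttl, now: now, sessions: make(map[string]sessionEntry)}
}

// Touch records activity, creating the entry on first sight.
func (t *sessionTracker) Touch(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	now := t.now()
	e, ok := t.sessions[id]
	if !ok {
		e.start = now
	}
	e.lastSeen = now
	t.sessions[id] = e
}

// End removes the session and returns how long it lasted, which is the
// value a session-duration histogram would record.
func (t *sessionTracker) End(id string) (time.Duration, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.sessions[id]
	if !ok {
		return 0, false
	}
	delete(t.sessions, id)
	return t.now().Sub(e.start), true
}

// Sweep evicts sessions idle for longer than the TTL so that sessions
// which never send DELETE do not accumulate. It returns the count removed.
func (t *sessionTracker) Sweep() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	now, removed := t.now(), 0
	for id, e := range t.sessions {
		if now.Sub(e.lastSeen) > t.ttl {
			delete(t.sessions, id)
			removed++
		}
	}
	return removed
}

func main() {
	clk := &fakeClock{}
	tr := newSessionTracker(time.Minute, clk.Now)
	tr.Touch("session-a")
	clk.AdvanceSeconds(30)
	d, ok := tr.End("session-a")
	fmt.Println(d, ok)
}
```

Injecting the clock keeps the eviction logic testable without sleeping. A real middleware would typically call Sweep from a background ticker.
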
Per OTEL HTTP semantic conventions for server spans, 4xx client
errors should leave span status unset rather than setting it to
Error. Only 5xx server errors should set codes.Error and the
error.type attribute.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
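
The status rule in this commit reduces to a small decision function, sketched below without the OpenTelemetry SDK. `spanOutcome` and `serverSpanOutcome` are hypothetical names. Using the numeric status code as the error.type value is an assumption based on the general OTEL HTTP conventions, not something the commit message states.

```go
package main

import (
	"fmt"
	"strconv"
)

// spanOutcome describes what a server span should record for a response.
type spanOutcome struct {
	MarkError bool   // true: set span status to Error; false: leave it unset
	ErrorType string // value for error.type; empty means do not set it
}

// serverSpanOutcome applies the server-side rule: only 5xx responses
// mark the span as an error, while 4xx and below leave the status unset.
func serverSpanOutcome(statusCode int) spanOutcome {
	if statusCode >= 500 {
		return spanOutcome{MarkError: true, ErrorType: strconv.Itoa(statusCode)}
	}
	return spanOutcome{}
}

func main() {
	for _, code := range []int{200, 404, 503} {
		fmt.Printf("%d -> %+v\n", code, serverSpanOutcome(code))
	}
}
```

In middleware using the SDK, a true `MarkError` would translate to setting codes.Error on the span together with the error.type attribute.
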
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels Feb 9, 2026
github-actions bot (Contributor) left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@ChrisJBurns ChrisJBurns changed the title Align telemetry middleware with MCP OTEL semantic conventions DRAFT: Align telemetry middleware with MCP OTEL semantic conventions Feb 9, 2026