Description
Problem
Operators using swamp have no visibility into what the CLI is doing internally during model method execution, workflow runs, or datastore operations. When something is slow or fails, there's no structured observability data to debug cause-and-effect chains. Users currently have to rely on log output and manual timing to understand performance or diagnose issues.
Proposed Solution
Add native OpenTelemetry instrumentation to swamp's core execution paths, emitting traces that capture the full lifecycle of operations. This would give operators the ability to:
- See cause-and-effect chains: A single trace spanning workflow → job → step → method execution → resource writes, showing how operations compose
- Debug performance issues: Identify slow vault resolutions, CEL evaluations, datastore syncs, lock acquisitions, or method executions with precise timing
- Correlate with extension traces: User-defined extension models (like @bixu/github/repo) already emit their own OTel spans. Native swamp tracing would let these appear as children of the CLI's orchestration spans, producing a complete end-to-end trace
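For extension spans to appear as children of the CLI's spans, the parent span context has to reach the extension somehow. The sketch below shows one common way to carry it: the W3C Trace Context `traceparent` string. How swamp would hand this value to an extension (an environment variable, a field on the method context, or something else) is not specified here and is left as an assumption; the `SpanContext` type and both function names are illustrative only.

```typescript
// Fields carried by a W3C Trace Context `traceparent` value (version 00).
interface SpanContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  sampled: boolean;
}

// Serialize the active span so a child process or extension can adopt it as parent.
function formatTraceparent(ctx: SpanContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.sampled ? "01" : "00"}`;
}

// Parse a traceparent value; returns null if it is malformed.
function parseTraceparent(value: string): SpanContext | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(value.trim());
  if (!m) return null;
  const [, traceId = "", spanId = "", flags = ""] = m;
  // All-zero trace or span IDs are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

An extension receiving `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` would start its own spans under trace `4bf92f35...` with `00f067aa0ba902b7` as the parent span ID, so the GitHub API calls would nest under the step that triggered them.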
Key instrumentation points
- CLI command dispatch (root span per invocation)
- Repository initialization (datastore sync, lock acquisition)
- Model method execution (CEL evaluation, argument resolution, vault lookups, method execute() call)
- Workflow orchestration (workflow run → job → step, with data chaining resolution)
- Data lifecycle (resource writes, garbage collection)
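To make the intended parent-child shape concrete, here is a minimal in-memory sketch of nested spans for the workflow → job → step → method chain. It is not the OpenTelemetry SDK and not swamp's code: `SketchTracer`, `withSpan`, and the span names are hypothetical, and a real implementation would use the OTel tracer and context APIs instead of passing parent IDs by hand.

```typescript
interface SpanRecord {
  name: string;
  spanId: number;
  parentId: number | null;
  durationMs: number;
}

class SketchTracer {
  readonly finished: SpanRecord[] = [];
  private nextId = 1;

  // Run fn inside a new span; the span is recorded when fn settles,
  // whether it resolves or throws.
  async withSpan<T>(
    name: string,
    parentId: number | null,
    fn: (spanId: number) => Promise<T> | T,
  ): Promise<T> {
    const spanId = this.nextId++;
    const start = Date.now();
    try {
      return await fn(spanId);
    } finally {
      this.finished.push({ name, spanId, parentId, durationMs: Date.now() - start });
    }
  }
}

// Hypothetical orchestration: each layer passes its span ID down as the parent.
async function runWorkflow(tracer: SketchTracer): Promise<void> {
  await tracer.withSpan("workflow.run", null, (wf) =>
    tracer.withSpan("job", wf, (job) =>
      tracer.withSpan("step", job, (step) =>
        tracer.withSpan("method.execute", step, () => {
          // Placeholder for the model method's real work.
        }))));
}
```

Spans finish innermost first, so `finished` ends with the root `workflow.run` span, and every other record points at its parent through `parentId`. That linkage is what lets a trace viewer render the run as a single tree.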
Transport and configuration
- Default to OTLP/HTTP (/v1/traces endpoint) for maximum compatibility in heterogeneous network environments (proxies, load balancers, firewalls that may not support gRPC)
- Configuration via environment variables following OTel conventions:
  - OTEL_EXPORTER_OTLP_ENDPOINT: collector endpoint
  - OTEL_EXPORTER_OTLP_HEADERS: auth headers (e.g. x-honeycomb-team=<key>)
  - OTEL_SERVICE_NAME: defaults to swamp
  - OTEL_TRACES_EXPORTER: defaults to otlp (set to none to disable)
- Tracing should be off by default (zero overhead when not configured) and activate when OTEL_EXPORTER_OTLP_ENDPOINT is set
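The activation rules above could be resolved roughly as follows. This is a sketch under assumptions, not swamp's implementation: `TracingConfig` and `resolveTracingConfig` are hypothetical names, and only the `otlp` and `none` exporter values are handled. It relies on two OTel conventions: the generic endpoint variable is a base URL to which `/v1/traces` is appended, and the headers variable is a comma-separated list of `key=value` pairs.

```typescript
// Hypothetical shape; swamp's real configuration type may differ.
interface TracingConfig {
  tracesUrl: string;
  headers: Record<string, string>;
  serviceName: string;
}

// Resolve tracing settings from an environment map.
// Returns null when tracing should stay disabled.
function resolveTracingConfig(
  env: Record<string, string | undefined>,
): TracingConfig | null {
  const endpoint = env["OTEL_EXPORTER_OTLP_ENDPOINT"]?.trim();
  if (!endpoint) return null; // off by default: no endpoint, no tracing

  const exporter = (env["OTEL_TRACES_EXPORTER"] ?? "otlp").trim().toLowerCase();
  if (exporter === "none") return null; // explicit opt-out

  // Parse "k1=v1,k2=v2" into a header map, skipping malformed pairs.
  const headers: Record<string, string> = {};
  for (const pair of (env["OTEL_EXPORTER_OTLP_HEADERS"] ?? "").split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue;
    headers[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }

  return {
    // Strip trailing slashes, then append the OTLP/HTTP traces path.
    tracesUrl: endpoint.replace(/\/+$/, "") + "/v1/traces",
    headers,
    serviceName: env["OTEL_SERVICE_NAME"]?.trim() || "swamp",
  };
}
```

With `OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318` this resolves to `http://localhost:4318/v1/traces` and a service name of `swamp`. With no endpoint set it returns null, so the CLI can skip tracer setup entirely and keep the zero-overhead default.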
Signal priority
If only one OTel signal can be enabled in the first iteration, it should be traces. Traces provide the most immediate value for understanding swamp's execution model, which is inherently hierarchical (workflow → job → step → method → API call). Metrics and logs can follow later.
Alternatives Considered
- Structured logging only: Provides some observability but lacks the hierarchical parent-child relationships that make traces valuable for understanding swamp's execution model
- Extension-only tracing (current state): Extensions like @bixu/github/repo can emit their own spans, but without native swamp instrumentation these spans are orphaned: there's no parent context from the CLI's orchestration layer to connect them into a complete trace
Additional Context
We've built a @bixu/opentelemetry extension model and added tracing to @bixu/github/repo as a proof-of-concept. The extension-level tracing works well but highlights the gap: we can see the GitHub API calls but not the swamp orchestration around them (vault resolution, CEL evaluation, workflow scheduling). Native instrumentation would close that gap.