feat(langchain): add OpenTelemetry GenAI semantic conventions#3673
jemo21k wants to merge 8 commits into traceloop:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Adds LangGraph and LangChain instrumentation: graph invocation, command routing, agent factory and middleware hook wrappers; GenAI semantic attributes including graph_structure; a graph JSON extraction utility; and test updates aligning span names and validating new attributes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Pregel as LangGraph (Pregel)
    participant Command
    participant Middleware as AgentMiddleware
    participant Tool
    participant Tracer as OpenTelemetry (Tracer)
    Client->>Pregel: call stream()/astream()
    activate Pregel
    Tracer->>Tracer: start graph span "invoke_agent {graph_name}"
    Pregel->>Command: emit/process Command
    Tracer->>Tracer: create command span if goto present
    Pregel->>Middleware: before_model / before_agent
    Tracer->>Tracer: start middleware span ("execute_task")
    Middleware-->>Pregel: continue
    Pregel->>Tool: execute tool
    Tracer->>Tracer: start tool span ("execute_tool")
    Tool-->>Pregel: result
    Pregel-->>Client: yield output
    Tracer->>Tracer: end graph span
    deactivate Pregel
```

```mermaid
sequenceDiagram
    participant Client
    participant Factory as Agent Factory
    participant Agent
    participant Tracer as OpenTelemetry (Tracer)
    Client->>Factory: create_react_agent()/create_agent()
    activate Factory
    Tracer->>Tracer: start agent-creation span
    Factory->>Agent: instantiate (tools, system prompts)
    Tracer->>Tracer: record agent id/name/tools as GenAI attributes
    Factory-->>Client: return Agent
    deactivate Factory
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Important
Looks good to me! 👍
Reviewed everything up to 2790746 in 28 seconds.

- Reviewed 2469 lines of code in 11 files
- Skipped 0 files when reviewing
- Skipped posting 0 draft comments

Workflow ID: wflow_cU019DWFBKTkKPeW
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/__init__.py`:
- Around line 220-223: The guard currently calls
self._wrap_agent_factories(tracer) only when is_package_available("langchain")
is True, which prevents wrapping langgraph.prebuilt.create_react_agent if
langgraph is installed without langchain; update the conditional logic so
_wrap_middleware_hooks(tracer) remains gated by
is_package_available("langchain") but _wrap_agent_factories(tracer) is called
when is_package_available("langgraph") is True (or when langchain is present),
i.e., separate the guards to check is_package_available("langgraph") before
calling _wrap_agent_factories(tracer) and keep existing call to
_wrap_middleware_hooks(tracer) under the langchain check, referencing the
_wrap_agent_factories and _wrap_middleware_hooks methods and
langgraph.prebuilt.create_react_agent.
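The separated guards described above can be sketched as follows; `is_package_available`, `wrap_agent_factories`, and `wrap_middleware_hooks` are stand-ins for the instrumentor's own helpers, so treat this as an illustration rather than the PR's actual code.

```python
def apply_wrappers(is_package_available, wrap_agent_factories, wrap_middleware_hooks):
    """Apply instrumentation wrappers based on which packages are importable."""
    applied = []
    # Agent factories (e.g. langgraph.prebuilt.create_react_agent) only need
    # langgraph, so they must not be gated behind langchain alone.
    if is_package_available("langgraph") or is_package_available("langchain"):
        wrap_agent_factories()
        applied.append("agent_factories")
    # Middleware hooks live under langchain.agents.middleware, so this gate stays.
    if is_package_available("langchain"):
        wrap_middleware_hooks()
        applied.append("middleware_hooks")
    return applied
```

With only langgraph installed, `apply_wrappers` now wraps the agent factories but skips the middleware hooks.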
In
`@packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py`:
- Around line 288-302: The code mutates a list pulled from the OpenTelemetry
context (first_child_pending[0] = False), which is not thread-safe; instead
update the context with an immutable flag value. Replace the mutable-list
pattern around LANGGRAPH_FIRST_CHILD_PENDING_KEY so you read first_child_pending
= context_api.get_value(LANGGRAPH_FIRST_CHILD_PENDING_KEY), treat it as a
boolean (e.g., True means first child), and when flipping it after creating the
span (in the branch that calls self.tracer.start_span with
context=set_span_in_context(graph_span)), write the new boolean back into the
context using context_api.set_value(LANGGRAPH_FIRST_CHILD_PENDING_KEY, False,
<appropriate_context>) rather than mutating the list; update any other code that
reads this key to expect a boolean, and keep the span creation code
(self.tracer.start_span/span_name/graph_span) logic unchanged.
- Around line 923-935: Remove the dead/duplicate on_retriever_error method (the
earlier definition that calls self._handle_error(...) then retrieves and ends
the span) so only the later correct on_retriever_error implementation remains;
do not reintroduce the redundant _get_span/_end_span calls because _handle_error
already ends the span and removes it from self.spans, and ensure references to
on_retriever_error, _handle_error, _get_span, _end_span, and self.spans are
consistent after removal.
- Around line 516-523: The code calls
context_api.attach(context_api.set_value("langgraph_current_node", name)) and
discards the returned token, leaking context; capture the returned token (from
context_api.attach) and store it (e.g., push onto a handler-owned stack like
self._langgraph_tokens) when setting the node in the langchain callback handler,
and ensure you call context_api.detach(token) to pop/cleanup the context when
the node finishes (e.g., in the corresponding exit/cleanup/finally block or when
clearing metadata); if detaching at that lifecycle point is infeasible,
explicitly document the intentional persistence of "langgraph_current_node"
instead of leaking silently.
- Around line 879-881: The assignment that constructs a new SpanHolder and
assigns it into self.spans (creating SpanHolder(span=..., workflow_name=...,
entity_name=..., entity_path=...)) is incorrect and redundant: SpanHolder
requires token, context, and children and _create_span already creates and
stores a proper SpanHolder (including token/context/children). Remove this
explicit SpanHolder construction/assignment so the previously created SpanHolder
from _create_span remains intact and you don't lose the context token or leak
context.
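The immutable-flag pattern asked for in the first comment above can be sketched like this; a plain dict stands in for the OpenTelemetry `Context` (whose `set_value` likewise returns a new context rather than mutating in place), and `start_span` is a placeholder callable.

```python
LANGGRAPH_FIRST_CHILD_PENDING_KEY = "langgraph_first_child_pending"

def maybe_start_first_child_span(ctx, start_span):
    """Start the first child span exactly once, flipping the flag immutably."""
    if ctx.get(LANGGRAPH_FIRST_CHILD_PENDING_KEY):
        span = start_span()
        # Write the new boolean back into a *new* context instead of mutating
        # a shared list pulled from the old one (thread-safe by construction).
        new_ctx = {**ctx, LANGGRAPH_FIRST_CHILD_PENDING_KEY: False}
        return span, new_ctx
    return None, ctx
```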
In
`@packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/patch.py`:
- Around line 78-106: _get_graph_name currently calls config.get('run_name')
without ensuring config is a dict which can raise AttributeError for
RunnableConfig or other types; update the logic after extracting config (in
function _get_graph_name) to first check if isinstance(config, dict) and use
config.get('run_name') in that case, otherwise attempt to read an attribute like
getattr(config, "run_name", None) (or similar accessor) before falling back to
instance.get_name() and the default "LangGraph".
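A minimal sketch of the defensive lookup described above; the function name and exact fallback order here are illustrative, while `run_name`, `instance.get_name()`, and the `"LangGraph"` default come from the comment itself.

```python
def get_graph_name(instance, config):
    """Resolve a graph name without assuming config is a plain dict."""
    if isinstance(config, dict):
        run_name = config.get("run_name")
    else:
        # RunnableConfig-like objects: fall back to attribute access.
        run_name = getattr(config, "run_name", None)
    if run_name:
        return run_name
    get_name = getattr(instance, "get_name", None)
    if callable(get_name):
        return get_name()
    return "LangGraph"
```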
In `@packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py`:
- Around line 771-794: Rename and reword the test to match its actual assertion:
change the test name test_middleware_failure_status to something like
test_middleware_super_call_success_on_inner_failure and update the docstring to
state that the wrapper around AgentMiddleware.before_model records the super()
call as "success" even when the middleware's own before_model (defined in
FailingMiddleware) raises a ValueError; keep the invocation of
FailingMiddleware().before_model(...) and the assertion that
middleware_span.attributes[SpanAttributes.GEN_AI_TASK_STATUS] == "success", and
ensure comments mention that the wrapper only observes the super() call
succeeding before the exception in FailingMiddleware.before_model is raised.
- Line 4: Update the import for AgentMiddleware to use the public documented
path: replace the non-public import that references
langchain.agents.middleware.types with the documented public import from
langchain.agents.middleware so tests use the stable API (look for the import
statement referencing AgentMiddleware and adjust it to import from
langchain.agents.middleware).
🧹 Nitpick comments (12)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/langgraph_utils.py (2)
76-78: Consider preserving the original exception type or using a custom exception.

Re-raising as a bare `Exception` erases the original exception type, making it harder for callers to handle specific failure modes. Since the only caller (`_set_graph_span_attributes` in `patch.py`) catches `Exception` broadly and logs at debug level, this is low-impact in practice.

♻️ Suggested improvement

```diff
-        except Exception as e:
-            # Re-raise to let caller handle
-            raise Exception(f"Failed to extract graph structure: {e}") from e
+        except Exception:
+            logger.debug("Failed to extract graph structure", exc_info=True)
+            return json.dumps({"nodes": [], "edges": []})
```

Alternatively, if you want to keep the re-raise pattern, at least use a domain-specific exception class.
38-43: Node extraction relies on iteration order and doesn't handle non-string node IDs.

If `graph.nodes` yields non-string node IDs (e.g., integers or custom objects), they'll be appended as-is but then used as dict keys in edges and serialized via `json.dumps`, which may produce unexpected results or fail. Consider coercing node IDs to strings for safety.

packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/patch.py (3)
22-27: Context key `"langgraph_current_node"` is used as a raw string literal in multiple files.

`LANGGRAPH_FLOW_KEY`, `LANGGRAPH_GRAPH_SPAN_KEY`, and `LANGGRAPH_FIRST_CHILD_PENDING_KEY` are properly defined as constants (Lines 23-27), but `"langgraph_current_node"` (used here at Line 203 and in `callback_handler.py` Line 520) is not. This creates a hidden coupling that's easy to break during refactoring.

♻️ Define the missing constant alongside the others

```diff
 LANGGRAPH_FLOW_KEY = "langgraph_flow"
 LANGGRAPH_GRAPH_SPAN_KEY = "langgraph_graph_span"
 LANGGRAPH_FIRST_CHILD_PENDING_KEY = "langgraph_first_child_pending"
+LANGGRAPH_CURRENT_NODE_KEY = "langgraph_current_node"
```

Then use `LANGGRAPH_CURRENT_NODE_KEY` in both `patch.py` Line 203 and `callback_handler.py` Line 520.

Also applies to: 195-239
15-20: `_generate_agent_id` uses `id(instance)`, which is non-deterministic across runs.

`id()` returns the memory address, so the same logical agent will produce different IDs across process restarts or even within the same process if the object is garbage-collected and recreated. This makes the `gen_ai.agent.id` attribute unsuitable for correlating agents across traces from different processes.

This may be acceptable if the ID is only meant to be unique within a single trace, but worth documenting the limitation explicitly.
316-331: Use `GenAITaskStatus` enum values instead of string literals.

`GenAITaskStatus` was defined in the semantic conventions package but these wrappers use hardcoded `"success"`/`"failure"` strings. This applies to Lines 325, 328, 355, 358 here, and similarly throughout `callback_handler.py`.

♻️ Use the enum

```diff
+from opentelemetry.semconv_ai import GenAIOperationName, GenAITaskStatus, SpanAttributes
 ...
-        span.set_attribute(SpanAttributes.GEN_AI_TASK_STATUS, "success")
+        span.set_attribute(SpanAttributes.GEN_AI_TASK_STATUS, GenAITaskStatus.SUCCESS.value)
 ...
-        span.set_attribute(SpanAttributes.GEN_AI_TASK_STATUS, "failure")
+        span.set_attribute(SpanAttributes.GEN_AI_TASK_STATUS, GenAITaskStatus.FAILURE.value)
```

Also applies to: 346-361
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
847-878: `on_retriever_start` bypasses `_create_task_span` and manually creates the span.

Unlike `on_tool_start` and `on_chain_start`, `on_retriever_start` calls `_create_span` directly rather than `_create_task_span`. This means it misses the GenAI provider name detection logic (`LangGraph` vs `langchain` based on context) that `_create_task_span` provides. Consider using `_create_task_span` for consistency, or at minimum add the `GEN_AI_PROVIDER_NAME` attribute.

packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
330-348: Use `GenAITaskStatus` enum values instead of raw strings for task status attributes.

The `GenAITaskStatus` enum is defined with `SUCCESS = "success"` and `FAILURE = "failure"`, but the codebase uses raw string literals instead of enum values when setting `SpanAttributes.GEN_AI_TASK_STATUS`. This occurs in callback_handler.py (lines 556, 815, 899, 978) and patch.py (lines 325, 328, 355, 358). Use `GenAITaskStatus.SUCCESS.value` and `GenAITaskStatus.FAILURE.value` for consistency and type safety.

Additionally, some operation names in `GenAIOperationName` (EXECUTE_TASK, LLM_REQUEST, VECTOR_DB_RETRIEVE) are not defined in the OpenTelemetry GenAI semantic specification; they are custom extensions. If these are intentional, document the rationale. The spec-compliant values (CREATE_AGENT, INVOKE_AGENT, EXECUTE_TOOL) align correctly with the specification.

packages/opentelemetry-instrumentation-langchain/tests/test_agents.py (3)
40-57: Set assertion silently deduplicates repeated span names.

The assertion uses `set()`, but several span names appear twice (e.g., `"execute_task RunnableLambda"` on Lines 41 and 49, `"execute_task ChatPromptTemplate"` on Lines 44 and 52, etc.). Since a set discards duplicates, this only verifies that each unique name is present — it cannot catch a regression where one of the two expected occurrences is missing.

This is pre-existing behavior, but since these lines are being modified, consider switching to a multiset comparison to also assert correct counts:

Suggested approach

```diff
+from collections import Counter
 ...
-    assert set([span.name for span in spans]) == {
-        "execute_task RunnableLambda",
-        ...
-    }
+    assert Counter([span.name for span in spans]) == Counter([
+        "execute_task RunnableLambda",
+        "execute_task RunnableParallel<agent_scratchpad>",
+        "execute_task RunnableAssign<agent_scratchpad>",
+        "execute_task ChatPromptTemplate",
+        "ChatOpenAI.chat",
+        "execute_task ToolsAgentOutputParser",
+        "execute_task RunnableSequence",
+        "execute_tool tavily_search_results_json",
+        "execute_task RunnableLambda",
+        "execute_task RunnableParallel<agent_scratchpad>",
+        "execute_task RunnableAssign<agent_scratchpad>",
+        "execute_task ChatPromptTemplate",
+        "ChatOpenAI.chat",
+        "execute_task ToolsAgentOutputParser",
+        "execute_task RunnableSequence",
+        "AgentExecutor.workflow",
+    ])
```
84-101: Same set-deduplication concern as above applies here.

Same pattern — duplicate span names are lost inside the `set()`. The same `Counter`-based fix would strengthen this assertion too.

178-194: Same set-deduplication concern applies to this third test.

Consistent with the other two test functions.
packages/opentelemetry-instrumentation-langchain/tests/test_langgraph.py (2)
669-676: Weak assertion: `>= 1` when exactly one goto is expected.

The graph has a single `Command(goto="target_node")`, so exactly one goto span should be produced. Using `>= 1` could mask a regression where duplicate goto spans are emitted.

Suggested tightening

```diff
-    assert len(goto_spans) >= 1
+    assert len(goto_spans) == 1
```
748-768: Same weak `>= 1` assertion pattern; tighten to `== 1`.

Only one middleware call is made, so exactly one span is expected.

Suggested fix

```diff
-    assert len(middleware_spans) >= 1
+    assert len(middleware_spans) == 1
```
Adds gen_ai.* attributes for LangChain/LangGraph instrumentation, aligned with OpenTelemetry GenAI Agent Spans specification.

Changes:
- Add GenAI span naming: invoke_agent, execute_task, execute_tool, create_agent, goto
- Add gen_ai.* attributes: provider.name, operation.name, agent.*, task.*, conversation.id, tool.definitions, workflow.structure
- Add Command routing instrumentation with goto spans
- Add create_react_agent factory instrumentation
- Add retriever and middleware hook instrumentation
- Update tests for new span naming conventions

Co-Authored-By: Ronen Schaffer <ronensc@users.noreply.github.com>
Co-Authored-By: dany-moshkovich <dany-moshkovich@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed from 2790746 to 0995cb2
- Remove dead on_retriever_error method (shadowed by later definition)
- Remove redundant SpanHolder creation in on_retriever_start
- Add isinstance(config, dict) check in _get_graph_name
- Add LANGGRAPH_CURRENT_NODE_KEY constant and use it consistently
- Separate LangGraph/LangChain availability guards for agent factories
- Rename test to match actual assertion behavior
- Add documentation for intentional context token non-detachment

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py`:
- Around line 834-884: The on_retriever_start handler computes workflow_name,
entity_path, name (entity_name) and metadata but does not forward them to
_create_span, so the returned SpanHolder lacks those fields; change the
_create_span call in on_retriever_start to pass workflow_name=workflow_name,
entity_name=name, entity_path=entity_path, and metadata=metadata (in addition to
the existing run_id, parent_run_id, span_name, SpanKind.CLIENT) so the
SpanHolder is populated and child spans can resolve
get_workflow_name/get_entity_path correctly; keep the existing
_set_span_attribute calls if desired for redundancy.
In
`@packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/patch.py`:
- Around line 143-149: When the wrapped generator or async_wrapper raises, the
finally block ends graph_span without marking it errored; update both the
generator wrapper (the for-item loop using wrapped(*args, **kwargs) and finally)
and the async_wrapper (async for / await wrapper) to catch exceptions and set
graph_span status to ERROR before ending: in the except block capture the
exception and call graph_span.set_status(...) using the OpenTelemetry
Status/StatusCode ERROR with the exception message, then re-raise; ensure you
still call graph_span.end() and context_api.detach(graph_span_ctx) and
context_api.detach(langgraph_ctx) in the finally.
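The error-status handling requested above can be sketched as a generator wrapper. This is an illustration, not the PR's code: the span here is any object exposing `set_status`/`end` (real code would pass `Status(StatusCode.ERROR, str(exc))` from `opentelemetry.trace`), and `detach_callbacks` stands in for the two `context_api.detach` calls.

```python
def traced_stream(wrapped_iter, graph_span, detach_callbacks):
    """Yield items from wrapped_iter, marking the span errored on exception."""
    def gen():
        try:
            for item in wrapped_iter:
                yield item
        except Exception as exc:
            # Mark the span before ending it, then re-raise to the caller.
            graph_span.set_status("ERROR", str(exc))
            raise
        finally:
            # Teardown runs whether the stream completes or raises.
            graph_span.end()
            for detach in detach_callbacks:
                detach()
    return gen()
```

The async variant is identical except for `async for` and `async def`.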
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/patch.py (1)
234-237: `str(goto_destinations)` produces Python list repr, not a portable format.

`str(goto_destinations)` yields `"['node1', 'node2']"` (Python repr with single quotes). Use `json.dumps` for a consistent, parseable serialization:

♻️ Suggested change

```diff
         else:
             span.set_attribute(
-                SpanAttributes.LANGGRAPH_COMMAND_GOTO_NODES, str(goto_destinations)
+                SpanAttributes.LANGGRAPH_COMMAND_GOTO_NODES, json.dumps(goto_destinations)
             )
```
- Pass workflow_name, entity_name, entity_path, metadata to _create_span in on_retriever_start for proper SpanHolder population
- Set graph span status to ERROR on exception in sync/async wrappers
- Use json.dumps instead of str() for goto_destinations serialization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py`:
- Around line 834-888: The on_retriever_start handler creates the span via
_create_span and therefore never gets the GenAI provider/task attributes set
(missing SpanAttributes.GEN_AI_PROVIDER_NAME); update on_retriever_start to
either call _create_task_span instead of _create_span (passing the same run_id,
parent_run_id, span_name, SpanKind.CLIENT, workflow_name, entity_name/name,
entity_path, metadata) so the provider and task-level attributes are applied
consistently, or after creating the span explicitly set
SpanAttributes.GEN_AI_PROVIDER_NAME to the provider value used elsewhere (use
the same provider resolution used by _create_task_span) and ensure
GenAIOperationName.VECTOR_DB_RETRIEVE is still set.
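The provider resolution the comment refers to can be sketched as below; per the earlier review note, spans are attributed to `LangGraph` when a LangGraph flow is active in context and to `langchain` otherwise. `flow_active` stands in for checking `context_api.get_value(LANGGRAPH_FLOW_KEY)`, so treat the helper name and exact values as assumptions drawn from that note.

```python
def resolve_provider_name(flow_active):
    """Pick the gen_ai.provider.name value the same way _create_task_span does."""
    return "LangGraph" if flow_active else "langchain"
```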
🧹 Nitpick comments (7)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/patch.py (4)
15-20: Agent ID is non-deterministic across process restarts.
`id(instance)` returns the CPython memory address, which changes on every restart and can be reused after GC. The generated `GEN_AI_AGENT_ID` is therefore only meaningful within a single process lifetime. If downstream consumers (dashboards, alerts) correlate by agent ID across runs, they'll see fragmented data.

If cross-run stability isn't a requirement, consider adding a brief note in the docstring to set expectations (e.g., "unique within a single process execution").
122-188: Significant code duplication between sync and async graph wrappers.

The sync `wrapper` (Lines 122–153) and `async_wrapper` (Lines 155–186) share identical logic for name extraction, context setup, attribute setting, error handling, and teardown — differing only in `for`/`async for` and `yield`. Consider extracting the shared context-management into a helper (or a context manager) to reduce the surface area for future divergence.
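One way to factor out the shared logic is a small context manager, sketched here under the assumption that the setup/teardown steps can be injected as callables (the helper name and signature are illustrative, not the PR's actual code):

```python
from contextlib import contextmanager

@contextmanager
def graph_invocation(start_span, attach_contexts, detach_all):
    """Shared setup/teardown for the sync and async graph wrappers."""
    span = start_span()
    tokens = attach_contexts(span)
    try:
        yield span
    except Exception as exc:
        span.set_status("ERROR", str(exc))
        raise
    finally:
        span.end()
        detach_all(tokens)
```

The sync wrapper then reduces to `with graph_invocation(...): for item in wrapped(...): yield item`, and the async wrapper is the same body with `async for`.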
262-281: `Send = type(None)` fallback is clever but fragile — consider a clearer sentinel.

When `langgraph.types.Send` isn't importable, you assign `Send = type(None)`, then guard with `Send is not type(None)`. This works, but is non-obvious on first read. A simple `Send = None` sentinel with `Send is not None and isinstance(...)` would be more idiomatic.

Proposed simplification

```diff
 try:
     from langgraph.types import Send
 except ImportError:
-    # If Send is not available, just handle strings
-    Send = type(None)
+    Send = None

 destinations = []
 if isinstance(goto, str):
     destinations.append(goto)
-elif Send is not type(None) and isinstance(goto, Send):
+elif Send is not None and isinstance(goto, Send):
     destinations.append(goto.node)
 elif isinstance(goto, (list, tuple)):
     for item in goto:
         if isinstance(item, str):
             destinations.append(item)
-        elif Send is not type(None) and isinstance(item, Send):
+        elif Send is not None and isinstance(item, Send):
             destinations.append(item.node)
```
232-232: Consider using a standard `GenAIOperationName` value or alternative approach for the LangGraph "goto" command.

While `"goto"` appears intentional for LangGraph's routing commands, it bypasses the `GenAIOperationName` enum (which defines `CREATE_AGENT`, `INVOKE_AGENT`, `EXECUTE_TASK`, `EXECUTE_TOOL`, `LLM_REQUEST`, and `VECTOR_DB_RETRIEVE`). Since "goto" represents control flow rather than a GenAI operation, either:

- Map it to an existing operation like `EXECUTE_TASK.value` if semantically appropriate, or
- Use a different span attribute (e.g., `GEN_AI_TASK_KIND`) for LangGraph-specific command types rather than reusing `GEN_AI_OPERATION_NAME`

This improves type safety and clarity about which attributes represent standard GenAI operations.
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (3)
343-358: Inconsistent span naming: agents keep legacy format while tools/tasks use GenAI convention.

Agent spans use `{name}.{kind.value}` (e.g., `"AgentExecutor.workflow"`), while tools use `"execute_tool {name}"` and tasks use `"execute_task {name}"`. The comment says "Keep existing workflow span naming" — if this is intentional for backward compatibility, consider adding a brief code comment explaining the migration plan (or confirming there isn't one).
867-874: Redundant attribute setting in `on_retriever_start`.

`TRACELOOP_WORKFLOW_NAME` and `TRACELOOP_ENTITY_PATH` are already set by `_create_span` (Lines 310–311) since `workflow_name` and `entity_path` are now passed as keyword arguments. Lines 873–874 set them again redundantly.

Remove redundant lines

```diff
         # Set GenAI semantic convention attributes
         _set_span_attribute(
             span, SpanAttributes.GEN_AI_OPERATION_NAME, GenAIOperationName.VECTOR_DB_RETRIEVE.value
         )
         _set_span_attribute(span, SpanAttributes.TRACELOOP_SPAN_KIND, TraceloopSpanKindValues.TASK.value)
         _set_span_attribute(span, SpanAttributes.TRACELOOP_ENTITY_NAME, name)
-        _set_span_attribute(span, SpanAttributes.TRACELOOP_WORKFLOW_NAME, workflow_name)
-        _set_span_attribute(span, SpanAttributes.TRACELOOP_ENTITY_PATH, entity_path)
```
461-463: Pre-existing: `_create_llm_span` overwrites `SpanHolder`, losing the span-context token from `_create_span`.

`_create_span` (Line 431) stores a `SpanHolder` with the span-context attachment token. Then Line 461 creates a new `SpanHolder` with only the suppression token, discarding the original. The span-context token is never detached. This is a pre-existing issue not introduced by this PR, but worth tracking as tech debt — the context leak accumulates over the lifetime of the process.
- Set provider name based on LangGraph flow context in retriever spans
- Remove redundant TRACELOOP_WORKFLOW_NAME and TRACELOOP_ENTITY_PATH attributes (already set by _create_span)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py (1)
781-790: ⚠️ Potential issue | 🟠 Major

`metadata` not forwarded to `_create_task_span` — tool spans lose metadata attributes.

`on_tool_start` receives `metadata` (Line 769) but doesn't pass it to `_create_task_span`. Compare with `on_chain_start` (Line 507) which does pass it. This means tool spans won't get metadata-based span attributes or association properties.

🐛 Proposed fix

```diff
         span = self._create_task_span(
             run_id,
             parent_run_id,
             name,
             TraceloopSpanKindValues.TOOL,
             workflow_name,
             name,
             entity_path,
+            metadata=metadata,
             serialized=serialized,
         )
```
🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-langchain/opentelemetry/instrumentation/langchain/callback_handler.py`:
- Around line 879-890: The guard in on_retriever_start is missing the not
should_emit_events() check, causing input attributes to be set when event
emission is enabled; update the conditional to match other handlers by changing
the existing should_send_prompts() check to not should_emit_events() and
should_send_prompts(), so that before calling json.dumps and _set_span_attribute
for SpanAttributes.TRACELOOP_ENTITY_INPUT and SpanAttributes.GEN_AI_TASK_INPUT
you verify both not should_emit_events() and should_send_prompts().
- Around line 911-928: The guard that wraps the prompt/document output currently
uses should_send_prompts() alone; update the condition to match the pattern in
on_chain_end/on_tool_end by adding a negation check for should_emit_events()
(i.e., if should_send_prompts() and not should_emit_events():) so the block that
builds docs_output, dumps output_json, and calls _set_span_attribute for
SpanAttributes.TRACELOOP_ENTITY_OUTPUT and SpanAttributes.GEN_AI_TASK_OUTPUT
only runs when prompts should be sent and events are not being emitted.
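The guard pattern both comments ask for can be sketched as a small helper. The function name and plain-dict attribute store are illustrative; the real code sets `SpanAttributes.TRACELOOP_ENTITY_INPUT`/`GEN_AI_TASK_INPUT` and their output counterparts on an OTel span.

```python
import json

def maybe_set_io_attributes(span_attrs, emit_events, send_prompts, input_value, output_value):
    """Capture I/O on the span only when events are NOT emitted and prompts are allowed."""
    if not emit_events and send_prompts:
        span_attrs["gen_ai.task.input"] = json.dumps(input_value)
        span_attrs["gen_ai.task.output"] = json.dumps(output_value)
    return span_attrs
```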
- Pass metadata to _create_task_span in on_tool_start
- Add missing not should_emit_events() guard in on_retriever_start
- Add missing not should_emit_events() guard in on_retriever_end

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
galkleinman left a comment
Great PR! I have couple of comments that are worth paying attention to.
LMK if you have things you want to discuss there.
On the callback_handler, btw, there should also be an on_retriever_error(), I think.
```python
# Set current node in context for Command source tracking.
# Note: We intentionally don't store/detach this token because in async
# scenarios, detaching tokens created in a different context causes errors.
# The context is automatically cleaned up when the graph execution completes.
if metadata and metadata.get("langgraph_node") == name:
    try:
        context_api.attach(
            context_api.set_value(LANGGRAPH_CURRENT_NODE_KEY, name)
        )
    except Exception:
        pass
```
Context token is attached but never detached
I understand the comment about async detach errors, but never detaching a context token is a leak — in long-running services this accumulates over time. Could we explore using a `contextvars.ContextVar` instead? That would sidestep the OTel attach/detach lifecycle entirely while still being async-safe.
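A sketch of the `contextvars` alternative proposed here; the variable name mirrors the existing context key, and unlike `context_api.attach`, a `ContextVar.set` is scoped per asyncio task, so no explicit detach is needed for correctness (reset tokens are optional).

```python
import contextvars

# Each asyncio task (and each thread) sees its own value, so concurrent graph
# runs can't clobber each other's current node, and nothing leaks on exit.
langgraph_current_node = contextvars.ContextVar("langgraph_current_node", default=None)

def set_current_node(name):
    langgraph_current_node.set(name)

def get_current_node():
    return langgraph_current_node.get()
```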
Sure, I'll implement a solution where we detach it also.
```python
# GenAI Operation Attributes (OpenTelemetry GenAI Semantic Conventions)
GEN_AI_OPERATION_NAME = "gen_ai.operation.name"
GEN_AI_PROVIDER_NAME = "gen_ai.provider.name"
GEN_AI_CONVERSATION_ID = "gen_ai.conversation.id"

# GenAI Agent Attributes
GEN_AI_AGENT_ID = "gen_ai.agent.id"
GEN_AI_AGENT_NAME = "gen_ai.agent.name"
GEN_AI_SYSTEM_INSTRUCTIONS = "gen_ai.system_instructions"

# GenAI Task Attributes
GEN_AI_TASK_ID = "gen_ai.task.id"
GEN_AI_TASK_NAME = "gen_ai.task.name"
GEN_AI_TASK_PARENT_ID = "gen_ai.task.parent.id"
GEN_AI_TASK_INPUT = "gen_ai.task.input"
GEN_AI_TASK_OUTPUT = "gen_ai.task.output"
GEN_AI_TASK_STATUS = "gen_ai.task.status"
GEN_AI_TASK_KIND = "gen_ai.task.kind"

# GenAI Tool Attributes
GEN_AI_TOOL_NAME = "gen_ai.tool.name"
GEN_AI_TOOL_DESCRIPTION = "gen_ai.tool.description"
GEN_AI_TOOL_TYPE = "gen_ai.tool.type"
GEN_AI_TOOL_DEFINITIONS = "gen_ai.tool.definitions"
GEN_AI_TOOL_CALL_ARGUMENTS = "gen_ai.tool.call.arguments"
GEN_AI_TOOL_CALL_RESULT = "gen_ai.tool.call.result"

# GenAI Workflow Attributes
GEN_AI_WORKFLOW_STRUCTURE = "gen_ai.workflow.structure"

# LangGraph-specific Attributes (vendor namespace)
LANGGRAPH_COMMAND_SOURCE_NODE = "langgraph.command.source_node"
LANGGRAPH_COMMAND_GOTO_NODE = "langgraph.command.goto_node"
LANGGRAPH_COMMAND_GOTO_NODES = "langgraph.command.goto_nodes"


class Events(Enum):
    DB_QUERY_EMBEDDINGS = "db.query.embeddings"
    DB_QUERY_RESULT = "db.query.result"
    DB_SEARCH_EMBEDDINGS = "db.search.embeddings"
    DB_SEARCH_RESULT = "db.search.result"


class EventAttributes(Enum):
    # Query Embeddings
    DB_QUERY_EMBEDDINGS_VECTOR = "db.query.embeddings.vector"

    # Query Result (canonical format)
    DB_QUERY_RESULT_ID = "db.query.result.id"
    DB_QUERY_RESULT_SCORE = "db.query.result.score"
    DB_QUERY_RESULT_DISTANCE = "db.query.result.distance"
    DB_QUERY_RESULT_METADATA = "db.query.result.metadata"
    DB_QUERY_RESULT_VECTOR = "db.query.result.vector"
    DB_QUERY_RESULT_DOCUMENT = "db.query.result.document"

    # SEARCH
    DB_SEARCH_EMBEDDINGS_VECTOR = "db.search.embeddings.vector"

    DB_SEARCH_RESULT_QUERY_ID = "db.search.query.id"  # For multi-vector searches
    DB_SEARCH_RESULT_ID = "db.search.result.id"
    DB_SEARCH_RESULT_SCORE = "db.search.result.score"
    DB_SEARCH_RESULT_DISTANCE = "db.search.result.distance"
    DB_SEARCH_RESULT_ENTITY = "db.search.result.entity"


class LLMRequestTypeValues(Enum):
    COMPLETION = "completion"
    CHAT = "chat"
    RERANK = "rerank"
    EMBEDDING = "embedding"
    UNKNOWN = "unknown"


class TraceloopSpanKindValues(Enum):
    WORKFLOW = "workflow"
    TASK = "task"
    AGENT = "agent"
    TOOL = "tool"
    UNKNOWN = "unknown"


class GenAIOperationName(Enum):
    """
    Operation names for GenAI spans following OpenTelemetry GenAI semantic conventions.
    See: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
    """

    CREATE_AGENT = "create_agent"
    INVOKE_AGENT = "invoke_agent"
    EXECUTE_TASK = "execute_task"
    EXECUTE_TOOL = "execute_tool"
    LLM_REQUEST = "llm_request"
    VECTOR_DB_RETRIEVE = "vector_db_retrieve"


class GenAITaskStatus(Enum):
    """Task execution status values for gen_ai.task.status attribute."""

    SUCCESS = "success"
    FAILURE = "failure"
```
Why aren't we just importing from the official OTel semconv package the attributes that already exist there?
Regarding the semantic conventions alignment - I checked the official package and the GenAI attributes are still in incubation (opentelemetry.semconv._incubating.attributes.gen_ai_attributes.GenAiOperationNameValues). They only have CHAT, EMBED, and TEXT_COMPLETION right now.
Our code uses agent-specific operations like CREATE_AGENT, INVOKE_AGENT, EXECUTE_TASK, etc. that aren't in the official spec yet.
Should I import and use the official values where they exist, or just keep our custom enum and document which values are extensions? What's your preference?
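If we go the import-where-available route, a hybrid enum could look roughly like this (a sketch; the incubating module path and member names are version-dependent and should be verified against the installed opentelemetry-semconv release, hence the fallback):

```python
from enum import Enum

try:
    # Official incubating values; the private module path may change between
    # opentelemetry-semconv releases, so treat this import as best-effort.
    from opentelemetry.semconv._incubating.attributes.gen_ai_attributes import (
        GenAiOperationNameValues,
    )
    _CHAT = GenAiOperationNameValues.CHAT.value
except (ImportError, AttributeError):
    _CHAT = "chat"  # spec-defined value, used when the package is unavailable


class GenAIOperationName(Enum):
    # Present in the official (incubating) spec
    CHAT = _CHAT
    # Extensions not yet in the official spec; documented as custom values
    CREATE_AGENT = "create_agent"
    INVOKE_AGENT = "invoke_agent"
    EXECUTE_TASK = "execute_task"
    EXECUTE_TOOL = "execute_tool"
```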
```python
graph_structure = extract_graph_json(instance)
graph_span.set_attribute(SpanAttributes.GEN_AI_WORKFLOW_STRUCTURE, graph_structure)
```
The current approach serializes the graph structure as a JSON string in a single attribute. OTel natively supports string[] attributes, which would be more idiomatic, more compact, and directly queryable by trace backends (no JSON parsing needed):

```python
span.set_attribute("gen_ai.workflow.nodes", ["node1", "node2", "node3"])
span.set_attribute("gen_ai.workflow.edges", ["node1 -> node2", "node1 -> node3"])
```

This eliminates the JSON overhead and the confusing grouped-edge format, and makes the data first-class in any OTel-compatible backend. The extraction function also simplifies significantly: no json.dumps, no edge grouping needed.
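A simplified extraction under that scheme might look like this (a sketch: the function name and `nodes`/`edges` access on a Pregel-like object are assumptions, demonstrated here with a stand-in object rather than a real LangGraph instance):

```python
from types import SimpleNamespace


def extract_graph_structure(graph):
    """Return (nodes, edges) as flat string lists for OTel string[] attributes."""
    # Skip LangGraph's special bookkeeping nodes like "__start__" / "__end__"
    nodes = [name for name in graph.nodes if not name.startswith("__")]
    edges = [f"{edge.source} -> {edge.target}" for edge in graph.edges]
    return nodes, edges


# Minimal stand-in for a Pregel-like graph (real instances come from LangGraph)
demo = SimpleNamespace(
    nodes={"__start__": None, "agent": None, "tools": None},
    edges=[SimpleNamespace(source="agent", target="tools")],
)
nodes, edges = extract_graph_structure(demo)
```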
```python
def _generate_agent_id(name: str, instance: Any) -> str:
    """Generate a unique agent ID by hashing the instance memory address."""
    instance_id = str(id(instance))
    hash_input = f"{name}_{instance_id}"
    hashed = hashlib.sha256(hash_input.encode()).hexdigest()[:16]
    return f"{name}_{hashed}"
```
Python reuses memory addresses after garbage collection, so id(instance) can produce the same value for different objects across invocations.
Use uuid.uuid4() for a truly unique ID?
About the agent ID generation - you're right that id(instance) can get reused after GC. We want the same graph instance to have the same gen_ai.agent.id across both the create_agent and invoke_agent spans so backends can correlate them.

Approach 1: Cache a UUID on the instance

```python
def _generate_agent_id(name: str, instance: Any) -> str:
    if not hasattr(instance, "_otel_agent_id"):
        unique_id = uuid.uuid4().hex[:16]
        instance._otel_agent_id = f"{name}_{unique_id}"
    return instance._otel_agent_id
```

Approach 2: Use a WeakKeyDictionary mapping (avoids mutating the instance)

```python
from weakref import WeakKeyDictionary

# Module level
_agent_id_cache: WeakKeyDictionary = WeakKeyDictionary()


def _generate_agent_id(name: str, instance: Any) -> str:
    if instance not in _agent_id_cache:
        unique_id = uuid.uuid4().hex[:16]
        _agent_id_cache[instance] = f"{name}_{unique_id}"
    return _agent_id_cache[instance]
```

The WeakKeyDictionary automatically cleans up when instances are GC'd, and we don't mutate the graph instance.

Which approach do you prefer, or is there a better pattern you'd recommend?
```python
except Exception as e:
    # Re-raise to let caller handle
    raise Exception(f"Failed to extract graph structure: {e}") from e
```
Consider either letting the original exception propagate naturally, or defining a small custom exception if you want the wrapper message.
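If the wrapper message is worth keeping, a small dedicated exception gives callers something specific to catch while `from e` preserves the original traceback (a sketch; `GraphExtractionError` and `safe_extract` are hypothetical names, not part of the package):

```python
class GraphExtractionError(RuntimeError):
    """Raised when a graph structure cannot be extracted (hypothetical name)."""


def safe_extract(extractor):
    """Call a zero-arg extractor, wrapping any failure in a typed error."""
    try:
        return extractor()
    except Exception as e:
        # "from e" chains the original exception, preserving its traceback
        raise GraphExtractionError(f"Failed to extract graph structure: {e}") from e
```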
```python
# Set GenAI semantic convention attributes
# Check LangGraph flow context to set appropriate provider
langgraph_flow = context_api.get_value(LANGGRAPH_FLOW_KEY)
provider_name = "LangGraph" if langgraph_flow else "langchain"
```
The provider name should be lowercase according to the OTel semconv.
```python
# Context key for marking LangGraph flow
LANGGRAPH_FLOW_KEY = "langgraph_flow"
# Context key for storing the graph span as parent for callback-created spans
LANGGRAPH_GRAPH_SPAN_KEY = "langgraph_graph_span"
# Context key for tracking if first child of graph span is pending (mutable list [bool])
LANGGRAPH_FIRST_CHILD_PENDING_KEY = "langgraph_first_child_pending"
# Context key for tracking current node (for Command source tracking)
LANGGRAPH_CURRENT_NODE_KEY = "langgraph_current_node"
```
As I mentioned during our meeting, I'm not sure the context tricks will work here, or at least they won't work in some cases, since we're using the LangChain callback mechanism, which is invoked with different contexts. Look at the SpanHolder examples - this is the only (ugly) way of passing data between different callbacks of the "same context".
You mentioned that context tricks won't work with LangChain callbacks and that SpanHolder is the way to go. We're currently using 4 context keys:

- LANGGRAPH_FLOW_KEY - marks LangGraph vs. langchain flow
- LANGGRAPH_GRAPH_SPAN_KEY - stores the graph span for parent-child linking
- LANGGRAPH_FIRST_CHILD_PENDING_KEY - tracks the first child with a mutable list pattern
- LANGGRAPH_CURRENT_NODE_KEY - tracks the current node for Command source tracking

Our tests (including the async ones) are all passing, but I want to understand which context keys are problematic. Should we refactor all of them to use SpanHolder, or just specific ones? Is this a blocker for merging or something we can address in follow-up work?
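For reference, the SpanHolder direction could look roughly like this (a sketch, not the package's actual SpanHolder API): per-run state lives in a dict keyed by the run_id that LangChain passes to every callback of the same run, which would replace all four context keys.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional
from uuid import UUID, uuid4


@dataclass
class SpanHolder:
    # One mutable record per run; field names are illustrative stand-ins
    # for the four context keys discussed above.
    span: Any = None
    is_langgraph_flow: bool = False
    first_child_pending: bool = True
    current_node: Optional[str] = None


# Keyed by the run_id LangChain hands to every callback of the same run
_holders: Dict[UUID, SpanHolder] = {}


def get_holder(run_id: UUID) -> SpanHolder:
    """Fetch or create the shared state for a run."""
    return _holders.setdefault(run_id, SpanHolder())
```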
You mentioned on_retriever_error() should exist - it's already there at callback_handler.py lines 1030-1039. Did you miss it in the review, or were you referring to something else about that handler?
…edback

- Replace JSON string gen_ai.workflow.structure with separate arrays
- Add gen_ai.workflow.nodes and gen_ai.workflow.edges attributes
- Update extract_graph_json() → extract_graph_structure(), returning a tuple
- Edges now formatted as "source -> target" strings for better readability

Addresses maintainer request to use native OTel string[] instead of JSON.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Change gen_ai.provider.name from "LangGraph" to "langgraph"
- Update all occurrences in patch.py and callback_handler.py
- Update test assertions to match the lowercase convention

Addresses maintainer feedback on inconsistent casing. OTel semantic conventions use lowercase for provider names.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…tructure

Remove the try/except wrapper that re-raised as a generic Exception. The caller already handles exceptions with debug logging.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Thanks for the detailed review! I've pushed fixes for the items we agree on, and have a few questions on the remaining points. Done:
This PR adds OpenTelemetry GenAI semantic convention support to LangChain/LangGraph instrumentation, aligned with the official GenAI Agent Spans specification.
Fixes: #3439
Spec references:
Changes
Span Naming (GenAI convention)
- invoke_agent {name} - Graph/agent invocations
- execute_task {name} - Task spans
- execute_tool {name} - Tool spans
- create_agent {name} - Agent factory calls
- goto {target} - Command routing

New Attributes
- gen_ai.operation.name - Operation type
- gen_ai.provider.name - Provider (LangGraph/langchain)
- gen_ai.agent.id/name - Agent identification
- gen_ai.task.id/name/status/kind - Task tracking
- gen_ai.conversation.id - Thread/session tracking
- gen_ai.tool.definitions - Tool schemas (OpenAI format)
- gen_ai.workflow.structure - Graph topology JSON
- gen_ai.system_instructions - Agent system prompts
- langgraph.command.source_node/goto_node - Command routing

Instrumentation Coverage
- Pregel.stream/astream (graph invocation)
- Command.__init__ (routing)
- create_react_agent factory
- AgentMiddleware hooks

Acknowledgments:
Special thanks to @galkleinman for guidance on the semantic conventions approach.
feat(instrumentation): ... or fix(instrumentation): ....

Important
Added comprehensive LangGraph instrumentation with GenAI semantic conventions, including graph invocation tracing, middleware hook wrapping, agent factory instrumentation, and retriever operation tracking.
LangGraph Instrumentation
- _wrap_langgraph_components() method to wrap Pregel.stream and Pregel.astream for graph invocation tracing, and Command.__init__ to capture routing commands with source and destination node information.
- _wrap_agent_factories() to instrument both LangGraph prebuilt agents (create_react_agent) and LangChain agents (create_agent) at their module locations and re-export locations.
- _wrap_middleware_hooks() to wrap synchronous and asynchronous AgentMiddleware hook methods (before_model, after_model, before_agent, after_agent and their async variants).
- _uninstrument() method to properly unwrap all LangGraph components, middleware hooks, and agent factory functions.

GenAI Semantic Conventions
- SpanAttributes class including operation names, provider, conversation ID, agent attributes (ID, name, system instructions), task attributes (ID, name, parent ID, input, output, status, kind), tool attributes (name, description, type, definitions, call arguments, results), and workflow structure.
- GenAIOperationName enum with values for create_agent, invoke_agent, execute_task, execute_tool, llm_request, and vector_db_retrieve operations, and GenAITaskStatus enum with success and failure values.

Callback Handler Enhancements
- on_chain_start() to extract conversation ID from config and set LangGraph node context; updated _create_span() to check for LangGraph flow context and parent first child spans to graph spans.
- _create_task_span() with GenAI semantic convention attributes including provider name, operation name, and kind-specific attributes (agent name/ID, tool name/type/description, task name/ID/parent ID).
- on_retriever_start(), on_retriever_end(), and on_retriever_error() methods to handle retriever callbacks with vector_db_retrieve span naming and GenAI semantic convention attributes.
- _create_llm_span().
- on_chain_end(), on_tool_end(), and _handle_error() to set task status to success or failure.

Utilities and Helpers
- langgraph_utils.py module with extract_graph_json() function to extract nodes and edges from LangGraph Pregel instances, filtering special nodes and grouping edges by source.
- patch.py module with helper functions for generating unique agent IDs, managing span lifecycle and context, extracting graph names, setting GenAI attributes on spans, extracting tool definitions, and wrapping middleware hooks and agent factories.

Test Updates
- Updated tests (test_agents.py, test_chains.py, test_documents_chains.py, test_lcel.py, test_llms.py) to use the new span naming convention: changed from "ComponentName.task" to "execute_task ComponentName" format.
- New tests in test_langgraph.py including test_create_react_agent_span, test_retriever_span_attributes, test_middleware_hook_span_attributes, test_langgraph_custom_name, test_command_with_goto, test_send_extraction, test_create_agent_with_system_prompt, test_async_middleware_hook, and test_middleware_failure_status to verify GenAI semantic convention attributes and LangGraph-specific functionality.
- "invoke_agent LangGraph" span as parent of workflow spans.