feat(azure-search): Add Azure AI Search instrumentation #3667

Open
maheshbabugorantla wants to merge 41 commits into traceloop:main from
maheshbabugorantla:mbg/traceloop-azure-ai-search-instrumentation

Conversation


@maheshbabugorantla maheshbabugorantla commented Feb 8, 2026


Add OpenTelemetry instrumentation for Azure AI Search (the azure-search-documents SDK). This instrumentation provides comprehensive observability for Azure AI Search operations, including document search, indexing, and index/indexer management.

Features

  • SearchClient instrumentation (10 methods): search, get_document, get_document_count, upload_documents, merge_documents, delete_documents, merge_or_upload_documents, index_documents, autocomplete, suggest

  • SearchIndexClient instrumentation (15 methods): Index CRUD, listing, statistics, text analysis, synonym map CRUD, and name-only listing

  • SearchIndexerClient instrumentation (21 methods): Indexer, data source connection, and skillset management including name-only listing methods

  • SearchIndexingBufferedSender instrumentation (6 methods): Buffered document operations and flush

  • Async support: async variants of all 52 sync methods are also instrumented (104 instrumented methods in total)

  • Vector/Hybrid/Semantic search support: Captures query_type, top_k, filters, vector query kind, vector weight, and oversampling parameters

  • Response attribute capturing: Search results count, document batch success/failure counts, autocomplete/suggest results count, service statistics, scoring profiles

  • Content capture with toggle: Request/response content captured as indexed span attributes (documents, suggestions, vector embeddings), gated by TRACELOOP_TRACE_CONTENT env var (default: enabled). Uses indexed attributes (not span.add_event()) so content is visible in APM backends like Elastic APM.

  • OpenTelemetry semantic conventions compliance: 44 custom span attributes following GenAI specification patterns

  • Error handling: All attribute extraction wrapped with @dont_throw decorator for safety
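
The wrapping flow these features describe can be sketched with a stub span. The `StubSpan` class and `wrap_search` helper below are illustrative stand-ins, not the PR's actual implementation:

```python
class StubSpan:
    """Minimal stand-in for an OpenTelemetry span that records attributes."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        # Skip unset parameters, mirroring a _set_span_attribute-style helper.
        if value is not None:
            self.attributes[key] = value


def wrap_search(span_name, original, kwargs):
    """Start a span, record request attributes, call through, record results."""
    span = StubSpan(span_name)
    span.set_attribute("azure_search.search.text", kwargs.get("search_text"))
    span.set_attribute("azure_search.search.top", kwargs.get("top"))
    results = original(**kwargs)
    span.set_attribute("azure_search.search.results_count", len(results))
    return span, results


# Stand-in for SearchClient.search returning two hits.
span, results = wrap_search(
    "azure_search.search",
    lambda **kw: ["doc1", "doc2"],
    {"search_text": "luxury", "top": 5},
)
print(span.attributes["azure_search.search.results_count"])  # → 2
```

The real wrappers additionally handle exceptions (setting error status on the span) and suppression keys; this sketch shows only the attribute-recording happy path.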

Semantic Conventions Added (44 attributes)

# Core attributes
azure_search.index_name
azure_search.search.text
azure_search.search.top
azure_search.search.skip
azure_search.search.filter
azure_search.search.query_type
azure_search.search.results_count
azure_search.search.scoring_profile
azure_search.search.scoring_parameters
azure_search.search.facets
azure_search.search.order_by
azure_search.search.highlight_fields
azure_search.search.highlight_pre_tag
azure_search.search.highlight_post_tag
azure_search.search.search_mode
azure_search.search.minimum_coverage
azure_search.search.select_fields

# Vector search attributes
azure_search.search.vector_query_kind
azure_search.search.vector_weight
azure_search.search.vector_oversampling
azure_search.search.vector_exhaustive
azure_search.search.vector_filter_mode

# Semantic search attributes
azure_search.search.semantic_configuration
azure_search.search.semantic_query
azure_search.search.semantic_max_wait

# Document attributes
azure_search.document.key
azure_search.document.count
azure_search.document.succeeded_count
azure_search.document.failed_count

# Autocomplete/Suggest
azure_search.suggester_name
azure_search.autocomplete.results_count
azure_search.suggest.results_count
azure_search.analyzer_name

# Indexer attributes
azure_search.indexer_name
azure_search.indexer_status
azure_search.data_source_name
azure_search.data_source_type
azure_search.skillset_name
azure_search.skillset_skill_count
azure_search.documents_processed
azure_search.documents_failed

# Synonym map attributes
azure_search.synonym_map.name
azure_search.synonym_map.synonyms_count

# Service statistics
azure_search.service.document_count
azure_search.service.index_count

# Content capture attributes (gated by TRACELOOP_TRACE_CONTENT)
db.query.result.document.{index}
db.search.result.entity.{index}
db.search.embeddings.vector.{index}
db.query.result.id.{index}
db.query.result.metadata.{index}
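
The indexed-attribute layout above (rather than `span.add_event()`) can be sketched as follows; the `capture_documents` helper and its `max_items` cap are hypothetical, shown only to illustrate how the `{index}` keys are produced:

```python
import json


def capture_documents(span_attributes, documents, enabled=True, max_items=10):
    # Write each document under an indexed key so APM backends that drop span
    # events (e.g. Elastic APM) still surface the content. The gate mirrors
    # the TRACELOOP_TRACE_CONTENT toggle described above.
    if not enabled:
        return
    for i, doc in enumerate(documents[:max_items]):
        span_attributes[f"db.query.result.document.{i}"] = json.dumps(doc)


attrs = {}
capture_documents(attrs, [{"id": "1", "name": "Hotel A"}, {"id": "2"}])
print(sorted(attrs))  # → ['db.query.result.document.0', 'db.query.result.document.1']
```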

Testing

  • 258 tests total (228 unit tests + 30 integration tests)

  • 30 VCR cassettes for API call recording/replay

  • Unit tests cover all attribute extraction, dispatch pipelines, content capture toggle, error handling, and instrumentor lifecycle

  • 14 multi-step workflow tests (7 sync + 7 async) validate that traces tell a debuggable story for production troubleshooting: search pipeline, document lifecycle CRUD, typeahead pipeline, bulk ingestion with partial failure, index management pipeline, content privacy toggle, and error-then-retry

  • Integration tests cover SearchClient, SearchIndexClient, SearchIndexerClient, and SynonymMap operations with real Azure API responses

Screenshots (Elastic APM / Kibana)

Tested with the sample app (azure_search_app.py) against a live Azure AI Search instance.

Trace Waterfall — Index, Upload, and Merge Operations

azure_hotel_search_demo.workflow (8.6s) showing create_index, get_service_statistics, upload_documents, merge_documents, and batch_index_documents spans with color-coded types (service, internal, Azure AI Search, HTTP):


Upload Documents — Content Capture as Indexed Span Attributes

azure_search.upload_documents span detail showing input documents captured as db.query.result.document.{i} and indexing results as db.query.result.metadata.{i}, plus document_count, succeeded_count, and failed_count:


Trace Waterfall — Search, Vector, Hybrid, and Autocomplete Operations

Continuation showing text_search, vector_search, hybrid_search, search_with_scoring_profile, and autocomplete spans with nested azure_search.search and HTTP POST calls:


Autocomplete — Span Attributes and Content Capture

azure_search.autocomplete span with index_name, search_text=lux, suggester_name=sg, content captured as db.search.result.entity.0={"text": "luxury", "query_plus_text": "luxury"}, and autocomplete_results_count=1:


Fixes #2303

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Summary by CodeRabbit

  • New Features
    • OpenTelemetry instrumentation for Azure AI Search with vendor metadata, rich span attributes, and configurable content tracing.
  • Documentation
    • Package README and a detailed span-attributes guide with usage, examples, and developer guidance.
  • Tests
    • Extensive integration test suite with many recorded HTTP cassettes covering search, indexing, indexers, synonym maps, and related flows.
  • Chores
    • Packaging/project config, lint/python settings, sample app env/deps, Flake8 config, and user-facing missing-instrumentation warnings.

Add 23 semantic convention attributes for Azure AI Search instrumentation
following OpenTelemetry GenAI specification patterns:

Core Attributes:
- azure_search.index_name, azure_search.search.text
- azure_search.search.top, azure_search.search.skip
- azure_search.search.filter, azure_search.search.query_type
- azure_search.document.count, azure_search.document.key
- azure_search.suggester_name, azure_search.analyzer_name

Indexer Pipeline Attributes:
- azure_search.indexer_name, azure_search.indexer.status
- azure_search.indexer.documents_processed/failed
- azure_search.data_source_name, azure_search.data_source.type
- azure_search.skillset_name, azure_search.skillset.skill_count

Response Attributes:
- azure_search.search.results_count
- azure_search.document.succeeded_count/failed_count
- azure_search.autocomplete.results_count
- azure_search.suggest.results_count

Implement AzureSearchInstrumentor with comprehensive method coverage:

SearchClient (10 methods):
- search, get_document, get_document_count
- upload_documents, merge_documents, delete_documents
- merge_or_upload_documents, index_documents
- autocomplete, suggest

SearchIndexClient (7 methods):
- create_index, create_or_update_index, delete_index
- get_index, list_indexes, get_index_statistics, analyze_text

SearchIndexerClient (18 methods):
- Indexer management: create/update/delete/get/run/reset indexers
- Data source connections: create/update/delete/get operations
- Skillset management: create/update/delete/get operations

Features:
- Full sync and async support (67 total instrumented methods)
- Response attribute capturing for search results and batch operations
- Span attributes for vector/hybrid/semantic search parameters
- @dont_throw decorator for graceful error handling

Add 51 tests covering all instrumentation functionality:

Unit Tests (33 tests):
- SearchClient span creation and attributes
- SearchIndexClient index management operations
- SearchIndexerClient pipeline operations
- Instrumentation lifecycle (instrument/uninstrument)
- Response attribute extraction

Integration Tests (18 tests):
- VCR cassette-based API call recording/replay
- Real SDK behavior verification

Test Configuration:
- VCR config with header filtering for api-key/authorization
- allow_playback_repeats for search iterator support
- InMemorySpanExporter for span verification
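
A minimal sketch of that VCR configuration, following pytest-vcr's convention of a `vcr_config` fixture that returns keyword arguments for vcrpy (in the real conftest.py this function would carry a `@pytest.fixture` decorator, and the exact options may differ):

```python
def vcr_config():
    # Options are illustrative: strip credentials from recorded cassettes and
    # allow a cassette interaction to be replayed more than once, since search
    # result iterators can re-issue the same request during paging.
    return {
        "filter_headers": ["api-key", "authorization"],  # never record secrets
        "allow_playback_repeats": True,
    }


print(vcr_config()["allow_playback_repeats"])  # → True
```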

Configure Poetry package for opentelemetry-instrumentation-azure-search:

Dependencies:
- python >=3.9,<4
- opentelemetry-api ^1.38.0
- opentelemetry-instrumentation >=0.59b0
- opentelemetry-semantic-conventions-ai ^0.49.6
- azure-search-documents >=11.0.0

Dev Dependencies:
- pytest, pytest-vcr for testing
- vcrpy >=7.0.0 (fixed urllib3 compatibility)

Configuration:
- .python-version: 3.9.5
- poetry.toml: in-project virtualenvs
- OpenTelemetry instrumentor entry point registered

Add Azure AI Search as optional dependency in traceloop-sdk:

Installation:
- pip install 'traceloop-sdk[azure-search]'
- pip install 'traceloop-sdk[all]'

Usage:
- Traceloop.init() with auto-instrumentation
- Instruments.AZURE_SEARCH for explicit selection
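
A hedged wiring sketch of the explicit-selection path (assumes traceloop-sdk is installed with the azure-search extra):

```python
from traceloop.sdk import Traceloop
from traceloop.sdk.instruments import Instruments

# Instrument only Azure AI Search rather than every supported library.
Traceloop.init(
    app_name="azure_search_demo",
    instruments={Instruments.AZURE_SEARCH},
)
```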

Fixes traceloop#2303

Align with repository-wide migration from Poetry to UV:

Package Configuration:
- Update pyproject.toml to use [project] format with hatchling
- Add [tool.uv.sources] for local semconv-ai dependency
- Update .python-version to 3.10
- Replace poetry.lock with uv.lock

Nx Project Configuration:
- Update project.json to use nx:run-commands executor
- Replace poetry commands with uv commands
- Update lint target to use ruff instead of flake8

Test Fixes:
- Fix unused variable warnings from ruff linter
- All 51 tests passing

… and async support

- Add vector search attributes: vector_queries_count, vector_fields,
  k_nearest_neighbors, vector_exhaustive, vector_filter_mode
- Add semantic search attributes: semantic_configuration_name,
  query_caption, query_answer
- Add additional search attributes: search_mode, scoring_profile,
  select, search_fields
- Fix async wrapping: separate _async_wrap that properly awaits coroutines
- Add error handling: set StatusCode.ERROR with description on exceptions
- Refactor _wrap into _sync_wrap/_async_wrap with shared helpers
- Add 21 new tests covering all new functionality
- Add 12 new semantic convention attributes
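
The sync/async split can be illustrated with a simplified stand-in for the `_sync_wrap`/`_async_wrap` pair; here the `record` list plays the role of span start/end, and the key point is that async callables get a wrapper that awaits the coroutine before closing the span:

```python
import asyncio


def make_wrapper(func, record):
    # Simplified stand-in for the refactor described above: dispatch on
    # whether the wrapped callable is a coroutine function.
    if asyncio.iscoroutinefunction(func):
        async def async_wrapper(*args, **kwargs):
            record.append("start")
            try:
                return await func(*args, **kwargs)  # properly await the coroutine
            finally:
                record.append("end")
        return async_wrapper

    def sync_wrapper(*args, **kwargs):
        record.append("start")
        try:
            return func(*args, **kwargs)
        finally:
            record.append("end")
    return sync_wrapper


async def fake_search():
    await asyncio.sleep(0)
    return 3


events = []
result = asyncio.run(make_wrapper(fake_search, events)())
print(result, events)  # → 3 ['start', 'end']
```

Without the `await`, the sync wrapper would return the coroutine object and end the span before any work ran, which is the bug the refactor fixes.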

Add pytest-cov>=7.0.0 to the test dependency group and configure
coverage settings in pyproject.toml to enable measuring test coverage
for the azure_search instrumentation package.

Add TestSyncWrapDispatch class with tests exercising the full _sync_wrap
dispatch pipeline for every method type: search, document CRUD,
autocomplete, suggest, index management, indexer management, data source,
and skillset operations. Each test verifies the correct span is created
with expected attributes and that the wrapped function's return value
passes through correctly.

…tions

Add TestAttributeFunctionEdgeCases class testing boundary conditions
for all attribute setter functions: positional args vs kwargs fallbacks,
generator inputs, enum value extraction, missing/None values, non-standard
types, and direct function invocations for search, document batch,
index management, analyze_text, indexer, data source, and skillset
operations.

Add TestUtilsAndLifecycle class covering the dont_throw decorator
behavior (exception suppression, custom exception logger callbacks),
the suppression key bypass path, _wrap function's sync/async delegation,
and instrumentor uninstrument/reinstrument cycle.

Add TestRemainingCoverageGaps class targeting every uncovered branch:
unconvertible documents (TypeError in list()), falsy fields/actions/
index/indexer/data_source/skillset args, non-string non-object index
names, missing analyzer names, ImportError during _instrument, Exception
during _uninstrument, missing class on module, and dont_throw with no
exception_logger configured.

149 tests, 100% statement and branch coverage across all 5 source files.

…ibutes

Add missing semantic convention attributes for Azure AI Search:
- Synonym map operations: name, synonyms_count
- Service statistics: document_count, index_count
- Extended vector search: query_kind, weight, oversampling
- Search parameters: facets, order_by

…r instrumentation

Extend instrumentation to cover the full Azure AI Search SDK surface:
- SearchIndexClient: synonym map CRUD (6 methods), get_service_statistics,
  list_index_names, get_synonym_map_names
- SearchIndexerClient: get_indexer_names, get_data_source_connection_names,
  get_skillset_names
- SearchIndexingBufferedSender: upload, merge, delete, merge_or_upload,
  index_documents, flush
- Async variants for all new methods
- Attribute extraction for synonym map name/count, service document/index
  counts, and response dispatch for new operations

…ONTENT

Add should_send_content() utility gated by TRACELOOP_TRACE_CONTENT env var
(default: enabled). Supports per-request override via OpenTelemetry context
API with override_enable_content_tracing key. Truthy values: true, 1, yes, on.
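
A sketch of that gating logic under the stated rules (simplified: the real implementation reads the per-request override from the OpenTelemetry context API via the `override_enable_content_tracing` key, not a parameter):

```python
import os

TRUTHY = {"true", "1", "yes", "on"}


def _is_truthy(value):
    return str(value).strip().lower() in TRUTHY


def should_send_content(override=None):
    # A per-request override wins over the env var, which defaults to
    # enabled when unset.
    if override is not None:
        return bool(override)
    return _is_truthy(os.environ.get("TRACELOOP_TRACE_CONTENT", "true"))


os.environ["TRACELOOP_TRACE_CONTENT"] = "0"
print(should_send_content())               # → False
print(should_send_content(override=True))  # → True
```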

… and service stats

Add 1174 lines of new unit tests covering:
- TestShouldSendContent: env var toggle, truthy/falsy values, context override
- TestContentCapture: get_document, autocomplete, suggest, upload/merge/delete
  documents, vector embeddings, index_documents, content disabled/re-enabled
- TestSynonymMap: create, get, list, update, delete with list-based synonyms
- TestServiceStatistics: document_count, index_count response attributes
- TestBufferedSender: method registration and db.system attribute
- TestNameOnlyListingMethods: list_index_names, get_indexer/datasource/skillset
  names, dispatch handling

…lient, and content capture

Extend integration test suite with 12 new tests and VCR cassettes:
- TestSynonymMapIntegration (6 tests): create, get, list, update, delete,
  list names — with proper setup/teardown and list-based synonyms constructor
- TestSearchIndexerClientIntegration (3 tests): get_indexer_names,
  get_data_source_connection_names, get_skillset_names — each creates
  resources before listing to ensure non-empty responses
- TestSearchIndexClientIntegration (2 tests): get_service_statistics,
  list_index_names
- TestSearchClientIntegration (1 test): content_disabled_no_content_attributes
- 12 new VCR cassettes recorded against live Azure AI Search service
- Fix version.py to match pyproject.toml (0.49.6 -> 0.51.1)
- Add PyPI badge to README matching other instrumentation packages
- Remove stale traceloop-sdk[azure-search] extras reference since
  azure-search is a required dependency, not an optional extra

CLAassistant commented Feb 8, 2026

CLA assistant check
All committers have signed the CLA.


ellipsis-dev bot commented Feb 8, 2026

⚠️ This PR is too big for Ellipsis, but support for larger PRs is coming soon. If you want us to prioritize this feature, let us know at help@ellipsis.dev


Generated with ❤️ by ellipsis.dev


coderabbitai bot commented Feb 8, 2026

📝 Walkthrough

Walkthrough

Adds a new OpenTelemetry Azure AI Search instrumentation package with instrumentor, wrappers, utils, semantic conventions, extensive VCR-backed integration tests and fixtures, packaging/tooling and sample-app/SDK wiring, developer docs, and runtime warning utilities; includes content-capture gating and per-request overrides.

Changes

Cohort / File(s) Summary
Core Instrumentation
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/__init__.py, packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py
Adds AzureSearchInstrumentor, installs sync/async wrappers for many Azure Search client methods, starts CLIENT spans, extracts rich request/response attributes, handles errors and suppression, and gates content capture.
Config & Utils
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/config.py, .../utils.py, .../version.py
Introduces Config.exception_logger, content-gating helpers (should_send_content, max_content_items, max_content_length), _is_truthy, dont_throw decorator, and package __version__.
Semantic Conventions
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
Adds numerous azure_search.* span attribute constants for search params, document/indexer/synonym-map metadata, vector/semantic search fields, and service statistics.
Docs & Guides
packages/opentelemetry-instrumentation-azure-search/README.md, packages/opentelemetry-instrumentation-azure-search/docs/SPAN_ATTRIBUTES_GUIDE.md
New README and developer guide documenting instrumentation usage, per-operation attributes, content capture behavior, per-request overrides, and extension/testing guidance.
Tests & Fixtures
packages/opentelemetry-instrumentation-azure-search/tests/test_azure_search_integration.py, packages/opentelemetry-instrumentation-azure-search/tests/conftest.py
Adds a large VCR-backed integration test suite and pytest fixtures (env defaults, VCR config, in-memory exporter, exporter clearing) exercising many client operations and asserting spans/attributes.
VCR Cassettes
packages/opentelemetry-instrumentation-azure-search/tests/cassettes/**
Many new cassette YAML fixtures covering search, index management, suggest/autocomplete, analyzer, synonym maps, indexers, service stats, and CRUD flows to support deterministic integration tests.
Packaging & Tooling
packages/opentelemetry-instrumentation-azure-search/pyproject.toml, .../project.json, .../.flake8, .../.python-version
Adds package pyproject (metadata, deps, entry point), Nx project.json targets, .flake8 config, and .python-version (3.10) for build/test/lint tooling.
SDK Integration & Warnings
packages/traceloop-sdk/pyproject.toml, packages/traceloop-sdk/traceloop/sdk/instruments.py, packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py, packages/traceloop-sdk/traceloop/sdk/utils/instrumentation_warnings.py
Wires AZURE_SEARCH into SDK: adds enum member, init_azure_search_instrumentor(), library-detection and missing-instrumentation warning logic, and centralized deduplicated warnings with INSTRUMENT_TO_EXTRA and suppression env var.
Sample App
packages/sample-app/.env.example, packages/sample-app/pyproject.toml
Adds AZURE_SEARCH env vars and runtime dependency azure-search-documents; registers local instrumentation source.
Misc & Docs
packages/traceloop-sdk/README.md
Updates README installation/usage to include Azure Search and available extras.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Instr as AzureSearchInstrumentor
    participant Wrapper as Instrumentation Wrapper
    participant Tracer as OpenTelemetry Tracer
    participant Client as Azure Search Client
    participant Exporter as Span Exporter

    App->>Instr: instrument()
    Instr->>Wrapper: install wrappers (WRAPPED_METHODS)
    App->>Client: call instrumented_method(...)
    Client->>Wrapper: wrapped_method_call
    Wrapper->>Tracer: start_as_current_span("azure_search.*")
    Wrapper->>Wrapper: set request attributes
    Wrapper->>Client: invoke original method
    Client-->>Wrapper: response / exception
    Wrapper->>Wrapper: set response attributes / status (maybe content)
    Wrapper->>Tracer: end span
    Tracer->>Exporter: export span
    Wrapper-->>App: return result

Estimated Code Review Effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I hopped through code to trace each doc and call,
Spans sprouted carrots down the telemetry hall,
Indexes, vectors, and attributes in tow,
Toggles for content so only needed bits show,
A rabbit's cheer — hop, tag, and let traces grow!

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
- Docstring Coverage (⚠️ Warning): docstring coverage is 71.79%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
- Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
- Title Check (✅ Passed): the title 'feat(azure-search): Add Azure AI Search instrumentation' clearly and concisely summarizes the main change: adding OpenTelemetry instrumentation for the Azure AI Search SDK.
- Linked Issues Check (✅ Passed): the PR fully addresses the coding requirements from linked issue #2303: an auto-instrumentor for the Azure AI Search SDK, 52+ synchronous and async methods instrumented, coverage of search/indexing/management operations, telemetry support for RAG workflows, and defensive attribute extraction.
- Out of Scope Changes Check (✅ Passed): all changes directly support the main objective: core instrumentation (azure_search package), semantic conventions, integration with traceloop-sdk, sample app integration, and comprehensive tests with VCR cassettes. No unrelated changes detected.


No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/utils.py (1)

63-91: Missing functools.wraps on the inner wrappers.

Without @functools.wraps(func), the decorated functions lose their __name__, __doc__, and __module__ attributes. This affects debuggability (e.g., stack traces, logging) and any downstream code that inspects function metadata.

Proposed fix
+import functools
 import asyncio
 import logging
 import os
 import traceback
 ...
-    async def async_wrapper(*args, **kwargs):
+    @functools.wraps(func)
+    async def async_wrapper(*args, **kwargs):
         try:
             return await func(*args, **kwargs)
         except Exception as e:
             _handle_exception(e, func, logger)

-    def sync_wrapper(*args, **kwargs):
+    @functools.wraps(func)
+    def sync_wrapper(*args, **kwargs):
         try:
             return func(*args, **kwargs)
         except Exception as e:
             _handle_exception(e, func, logger)
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py (1)

272-330: top is read from kwargs twice.

Line 279 sets AZURE_SEARCH_SEARCH_TOP from kwargs.get("top"), and then lines 290-292 read kwargs.get("top") again to set VECTOR_DB_QUERY_TOP_K. Consider reusing a local variable.

Proposed fix
+    top = kwargs.get("top")
     _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_TEXT, search_text)
-
-    _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_TOP, kwargs.get("top"))
+    _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_TOP, top)
     _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_SKIP, kwargs.get("skip"))
     _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_FILTER, kwargs.get("filter"))
     ...
-    # Set top_k for vector DB convention
-    top = kwargs.get("top")
     if top:
         _set_span_attribute(span, SpanAttributes.VECTOR_DB_QUERY_TOP_K, top)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…BUTES_GUIDE

- Replace poetry commands with uv (pytest, ruff check)
- Update routing example to show _set_request_attributes() dispatcher pattern
- Add notes for _set_response_attributes() and content capture dispatchers
- Update document dates

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/traceloop-sdk/README.md (1)

42-52: ⚠️ Potential issue | 🟡 Minor

Quick Start example uses deprecated OpenAI API (openai.ChatCompletion.create).

The openai.ChatCompletion.create class-based API was removed in openai >= 1.0. Since the current openai SDK is well past 1.0, this example will fail for most users.

Suggested update
 ```python
 Traceloop.init(app_name="joke_generation_service")
+from openai import OpenAI
+
+client = OpenAI()

 @workflow(name="joke_creation")
 def create_joke():
-    completion = openai.ChatCompletion.create(
-        model="gpt-3.5-turbo",
+    completion = client.chat.completions.create(
+        model="gpt-4o-mini",
         messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
     )

     return completion.choices[0].message.content
```
🤖 Fix all issues with AI agents
In `packages/opentelemetry-instrumentation-azure-search/docs/SPAN_ATTRIBUTES_GUIDE.md`:
- Around line 623-641: The example in _set_analyze_text_attributes leaves
analyzer_name uninitialized when analyze_request is falsy, causing a NameError;
initialize analyzer_name to None at the start (before the if analyze_request
block) or set it from kwargs first, then overwrite from analyze_request if
present, ensuring subsequent checks and the enum handling use a defined variable
(reference: function _set_analyze_text_attributes and symbol analyzer_name).
- Around line 509-517: Replace all occurrences of "poetry run <command>" with
"uv run <command>" in the documentation snippet (specifically the pytest and
flake8 examples in SPAN_ATTRIBUTES_GUIDE.md around the pytest example and the
later reference at line ~658); update the two example blocks so they use "uv run
pytest tests/test_azure_search_integration.py --record-mode=all" and "uv run
pytest ... --record-mode=none" and change any "poetry run flake8" mention to "uv
run flake8" to comply with the project's package manager convention.

In `packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py`:
- Around line 340-353: The helper _set_document_batch_attributes currently
converts non-__len__ document iterables to a list to measure length (setting
SpanAttributes.AZURE_SEARCH_DOCUMENT_COUNT), which can exhaust generators before
the actual call in _sync_wrap/_set_request_attributes; update
_set_document_batch_attributes to only set the document count when the documents
object has a __len__ attribute and otherwise skip counting (do not call list()
or otherwise consume the iterable) so generators/iterators are left intact for
the downstream Azure SDK call.
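
The suggested fix amounts to a length guard like the following hypothetical `set_document_count` helper:

```python
def set_document_count(attributes, documents):
    # Only report a count when it is free to compute; never materialize a
    # generator that the downstream Azure SDK call still needs to consume.
    if hasattr(documents, "__len__"):
        attributes["azure_search.document.count"] = len(documents)


attrs = {}
gen = (d for d in [{"id": "1"}, {"id": "2"}])
set_document_count(attrs, gen)          # generator: skip counting, leave intact
print(attrs, next(gen)["id"])           # → {} 1

set_document_count(attrs, [{"id": "1"}, {"id": "2"}])  # list: counting is safe
print(attrs["azure_search.document.count"])            # → 2
```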

In `packages/opentelemetry-instrumentation-azure-search/README.md`:
- Around line 3-5: The README's badge image tag is missing alt text causing an
accessibility lint (MD045); update the <img> element for the PyPI badge (the
<img> tag shown in the diff) to include a descriptive alt attribute (e.g.,
alt="PyPI package: opentelemetry-instrumentation-azure-search") so screen
readers can convey the badge purpose.

In `packages/opentelemetry-instrumentation-azure-search/tests/test_azure_search_integration.py`:
- Around line 801-817: The TestSearchIndexerClientIntegration class uses
INTEGRATION_TEST_INDEX (e.g., in calls with
target_index_name=INTEGRATION_TEST_INDEX) but lacks the setup_test_index fixture
that ensures the index exists; add a fixture named setup_test_index (matching
the other test classes' implementation) or an autouse/class-scoped fixture that
creates or validates the index before tests run so SearchIndexerClient
operations don't fail when this class runs in isolation.

In `packages/traceloop-sdk/pyproject.toml`:
- Line 31: Add a missing `[tool.uv.sources]` entry for the package name
"opentelemetry-instrumentation-azure-search": locate the `[tool.uv.sources]`
section and insert an entry mapping that package name to its local/editable
source directory (the same pattern used for the other instrumentation deps),
place it in alphabetical order with the other entries, and ensure the key
exactly matches "opentelemetry-instrumentation-azure-search" so local editable
resolution works during development.

In `packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py`:
- Around line 600-605: The VERTEXAI and VOYAGEAI branches still use the old
pattern; change them to assign the init result to the same `result` variable
(e.g., `result = init_vertexai_instrumentor(should_enrich_metrics,
base64_image_uploader)` and `result = init_voyageai_instrumentor()`), then set
`instrument_set = True` only when `result` is truthy so the existing post-loop
warning logic that checks `result` (used elsewhere) will run consistently for
`init_vertexai_instrumentor` and `init_voyageai_instrumentor`.
🧹 Nitpick comments (14)
packages/opentelemetry-instrumentation-azure-search/pyproject.toml (1)

30-46: Consider removing autopep8 from dev dependencies since Ruff is already configured.

Having both autopep8 and ruff for formatting/linting is redundant. Other instrumentation packages in this repo typically rely solely on Ruff. As per coding guidelines, Ruff is the designated linter.

Also, pytest-asyncio may need an asyncio_mode configuration (e.g., auto or strict) in a [tool.pytest.ini_options] section to avoid warnings or unexpected behavior with async tests.

Proposed changes
 [dependency-groups]
 dev = [
-  "autopep8>=2.2.0,<3",
   "pytest-sugar==1.0.0",
   "pytest>=8.2.2,<9",
   "ruff>=0.4.0",
 ]

And add:

[tool.pytest.ini_options]
asyncio_mode = "auto"
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/utils.py (1)

23-45: Missing @functools.wraps(func) on the inner wrapper.

Without functools.wraps, the decorated functions lose their __name__, __doc__, and __module__ attributes, which degrades debuggability and introspection. The func.__name__ reference in the logger still works (it captures from the closure), but any external code inspecting the decorated function will see "wrapper" instead of the original name.

Proposed fix
+import functools
 import logging
 import os
 import traceback
 def dont_throw(func):
     """..."""
     logger = logging.getLogger(func.__module__)
 
+    @functools.wraps(func)
     def wrapper(*args, **kwargs):
         try:
             return func(*args, **kwargs)
         except Exception as e:
             logger.debug(
                 "OpenLLMetry failed to trace in %s, error: %s",
                 func.__name__,
                 traceback.format_exc(),
             )
             if Config.exception_logger:
                 Config.exception_logger(e)
 
     return wrapper
packages/traceloop-sdk/traceloop/sdk/utils/instrumentation_warnings.py (1)

46-73: The function only acts when target_library_installed=True, making the parameter a bit misleading.

The function name warn_missing_instrumentation and its target_library_installed parameter might confuse callers — it only logs when the library is installed (suggesting the user should add the instrumentation extra). The if target_library_installed: guard on line 68 is the only code path that does anything. Consider whether the parameter should be checked by the caller instead, or rename for clarity.

This is minor — the current usage in tracing.py passes target_library_installed=True explicitly, which works correctly.
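
The caller-side variant suggested above could look like this minimal sketch (the function name matches the file under review, but the simplified signature and message are hypothetical):

```python
import logging

logger = logging.getLogger(__name__)

def warn_missing_instrumentation(library_name: str) -> None:
    # Hypothetical simplified variant: the caller has already verified that
    # the target library is installed, so this helper logs unconditionally.
    logger.warning(
        "%s is installed but no instrumentation was set up; "
        "install the matching instrumentation extra.",
        library_name,
    )
```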

packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py (2)

36-103: Dispatch functions _set_request_attributes and _set_response_attributes are not guarded with @dont_throw.

While the individual helper functions they call are protected, the dispatch functions themselves could theoretically throw (e.g., if method comparison or delegation fails unexpectedly). More importantly, an exception in these dispatchers would propagate up to _sync_wrap/_async_wrap, set the span to ERROR status, and re-raise — effectively breaking the user's call because of instrumentation. Adding @dont_throw would make the instrumentation fully transparent.

Proposed fix
+@dont_throw
 def _set_request_attributes(span, method, instance, args, kwargs):
     """Set all pre-call span attributes based on the method being called."""
+@dont_throw
 def _set_response_attributes(span, method, response, args, kwargs):
     """Set all post-call span attributes from the response."""

Also applies to: 105-131


722-753: Content attributes for documents are unbounded — large batches could create oversized spans.

Both _set_document_batch_request_content_attributes (line 729) and _set_index_documents_request_content_attributes (line 748) iterate over all documents/actions, setting a span attribute per item. Similarly, the response content functions (lines 803, 830) iterate all results. For large batches (e.g., thousands of documents), this could produce spans with thousands of attributes, causing performance issues or exceeding backend limits.

Consider adding a reasonable cap (e.g., first 100 items) consistent with how other instrumentation packages handle content capture.

packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/__init__.py (2)

724-727: Use lazy %-style formatting in logger.debug.

f-strings are eagerly evaluated even when the log level is above DEBUG. Use %-formatting for log calls to defer string construction.

Proposed fix
-                logger.debug(f"Could not wrap {module}.{wrap_object}.{wrap_method}")
+                logger.debug(
+                    "Could not wrap %s.%s.%s", module, wrap_object, wrap_method
+                )

20-693: Consider generating sync/async method lists programmatically to reduce ~600 lines of duplication.

Each async list is an exact mirror of its sync counterpart with .aio appended to the module path. A helper function could derive one from the other, eliminating the risk of the two drifting out of sync.

Example approach
def _make_async_methods(sync_methods):
    return [
        {**m, "module": m["module"] + ".aio"} if ".aio" not in m["module"] else m
        for m in sync_methods
    ]

# Replace the explicit async lists:
ASYNC_SEARCH_CLIENT_METHODS = _make_async_methods(SEARCH_CLIENT_METHODS)
ASYNC_SEARCH_INDEX_CLIENT_METHODS = _make_async_methods(SEARCH_INDEX_CLIENT_METHODS)
ASYNC_SEARCH_INDEXER_CLIENT_METHODS = _make_async_methods(SEARCH_INDEXER_CLIENT_METHODS)
ASYNC_BUFFERED_SENDER_METHODS = _make_async_methods(BUFFERED_SENDER_METHODS)
packages/opentelemetry-instrumentation-azure-search/tests/cassettes/test_azure_search_integration/TestSearchIndexClientIntegration.test_get_service_statistics.yaml (1)

1-80: Two identical interactions recorded — verify this is intentional.

The cassette contains two identical GET requests to the servicestats endpoint. If the test only calls get_service_statistics once, the second interaction is unused and could be trimmed. If the test calls it twice (e.g., to verify idempotency or span creation), this is fine.

packages/opentelemetry-instrumentation-azure-search/docs/SPAN_ATTRIBUTES_GUIDE.md (1)

21-21: Add a language specifier to the fenced code block.

Flagged by markdownlint (MD040). Since this block represents a directory tree, use a text or plaintext language identifier.

-```
+```text
 packages/opentelemetry-instrumentation-azure-search/
packages/opentelemetry-instrumentation-azure-search/tests/conftest.py (2)

15-16: Hardcoded Azure Search endpoint leaks infrastructure naming.

The fallback "https://traceloop-otel-os.search.windows.net" exposes the actual Azure resource name used for recording. Consider using a more obviously-fake placeholder (e.g., https://placeholder.search.windows.net) and adjusting VCR cassette host matching accordingly, or document that this must match the cassette recordings.


52-54: Redundant exporter clearing with per-class fixtures in integration tests.

This clear_exporter fixture is autouse=True at function scope, so it already runs before every test. Each test class in test_azure_search_integration.py also defines its own clear_exporter_before_test fixture that does the same thing. The duplication is harmless but unnecessary — consider removing the per-class fixtures from the integration test file, or vice versa.

packages/opentelemetry-instrumentation-azure-search/tests/test_azure_search_integration.py (3)

45-78: Duplicated setup_test_index fixture across test classes.

TestSearchClientIntegration.setup_test_index (lines 45–78) and TestSearchIndexClientIntegration.setup_test_index (lines 386–419) are identical. Consider extracting this into a shared fixture in conftest.py (scoped to session or module) to reduce duplication and ensure the index is set up once for all classes.

Also applies to: 386-419


148-149: Move import time to module level.

import time appears inside three different test methods. While this works, placing it at the module level with the other imports is cleaner.

Also applies to: 288-289, 325-326


825-833: Duplicated placeholder connection string.

The same Azure Storage placeholder connection string appears twice (lines 828–831 and 876–879). Extract it to a module-level constant for maintainability.

Proposed fix

Add near the top of the file:

PLACEHOLDER_STORAGE_CONNECTION_STRING = (
    "DefaultEndpointsProtocol=https;AccountName=placeholder;"
    "AccountKey=placeholder;EndpointSuffix=core.windows.net"
)

Then reference it in the fixtures:

             connection_string=os.environ.get(
                 "AZURE_STORAGE_CONNECTION_STRING",
-                "DefaultEndpointsProtocol=https;AccountName=placeholder;AccountKey=placeholder;EndpointSuffix=core.windows.net",
+                PLACEHOLDER_STORAGE_CONNECTION_STRING,
             ),

Also applies to: 873-881

Comment on lines 600 to 605
elif instrument == Instruments.VERTEXAI:
if init_vertexai_instrumentor(should_enrich_metrics, base64_image_uploader):
instrument_set = True
elif instrument == Instruments.VOYAGEAI:
if init_voyageai_instrumentor():
instrument_set = True

⚠️ Potential issue | 🟡 Minor

VERTEXAI and VOYAGEAI branches not converted to the new result pattern.

These two branches still use the old if init_*(): instrument_set = True pattern instead of result = init_*(), which means they bypass the post-loop warning logic on lines 622-626. If the instrumentor fails but the target library is installed, no missing-instrumentation warning will be emitted for these two.

Proposed fix
         elif instrument == Instruments.VERTEXAI:
-            if init_vertexai_instrumentor(should_enrich_metrics, base64_image_uploader):
-                instrument_set = True
+            result = init_vertexai_instrumentor(should_enrich_metrics, base64_image_uploader)
         elif instrument == Instruments.VOYAGEAI:
-            if init_voyageai_instrumentor():
-                instrument_set = True
+            result = init_voyageai_instrumentor()
🤖 Prompt for AI Agents
In `@packages/traceloop-sdk/traceloop/sdk/tracing/tracing.py` around lines 600 -
605, The VERTEXAI and VOYAGEAI branches still use the old pattern; change them
to assign the init result to the same `result` variable (e.g., `result =
init_vertexai_instrumentor(should_enrich_metrics, base64_image_uploader)` and
`result = init_voyageai_instrumentor()`), then set `instrument_set = True` only
when `result` is truthy so the existing post-loop warning logic that checks
`result` (used elsewhere) will run consistently for `init_vertexai_instrumentor`
and `init_voyageai_instrumentor`.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-azure-search/docs/SPAN_ATTRIBUTES_GUIDE.md`:
- Around line 21-34: The fenced code block that lists the package tree (starting
with "packages/opentelemetry-instrumentation-azure-search/") is missing a
language identifier; update the Markdown fenced block to include a language
(e.g., add "text" after the opening ```), so the block becomes ```text ... ```
to satisfy markdownlint and improve rendering.

…atch size

Previously, _set_document_batch_attributes() called list() on documents
without __len__, which would exhaust generators/iterators before the
actual Azure SDK call received them. Now we simply skip the count for
non-sized iterables, preserving the user's data intact.
…tegration tests

Ensures the target index exists before indexer tests run, matching the
pattern used by SearchClient and SearchIndexClient test classes.
- Add alt text to PyPI badge image (MD045)
- Add language identifier to fenced code block (MD040)
- Initialize analyzer_name before conditional in example to prevent NameError
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py`:
- Around line 506-510: The synonyms count logic incorrectly treats an empty
string as one item; in the block that sets synonyms_count (the branch handling
"if synonyms:" and the elif for isinstance(synonyms, str)), change the string
path to first compute s = synonyms.strip(), then if not s set synonyms_count = 0
else set synonyms_count = len(s.split("\n")). Update the code that assigns
synonyms_count for string inputs accordingly (refer to the synonyms variable and
synonyms_count assignment in wrapper.py) so empty/whitespace-only synonym
strings yield a count of 0.
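
The fix described in that item amounts to something like this sketch (the helper name is hypothetical; the real code assigns synonyms_count inline in wrapper.py):

```python
def count_synonyms(synonyms) -> int:
    # Hypothetical helper: empty or whitespace-only synonym strings count
    # as zero rules; non-empty strings count newline-separated rules;
    # other sized inputs fall back to len().
    if isinstance(synonyms, str):
        s = synonyms.strip()
        return len(s.split("\n")) if s else 0
    return len(synonyms) if synonyms else 0
```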
- Around line 449-468: The code in _set_data_source_attributes assigns
data_source_type from getattr(data_source, "type", None) but may receive a
SearchIndexerDataSourceType enum; convert it to a primitive string before
passing to _set_span_attribute. Update _set_data_source_attributes so after
obtaining data_source_type it normalizes enums (e.g., use data_source_type.value
if present, otherwise str(data_source_type) or None) and then call
_set_span_attribute(span, SpanAttributes.AZURE_SEARCH_DATA_SOURCE_TYPE,
data_source_type) with that normalized string.
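
The enum normalization described above can be sketched as a small helper (the name is hypothetical; the real fix inlines the logic in _set_data_source_attributes):

```python
from enum import Enum

def normalize_enum(value):
    # Return a primitive string suitable for a span attribute: unwrap Enum
    # members via .value, stringify anything else, pass None through.
    if value is None:
        return None
    if isinstance(value, Enum):
        return str(value.value)
    return str(value)
```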

In
`@packages/opentelemetry-instrumentation-azure-search/tests/test_azure_search_integration.py`:
- Around line 351-372: The test test_content_disabled_no_content_attributes
currently calls search_client.get_document_count(), which never emits content
attributes, so replace that call with an operation that normally produces
content attributes (for example call search_client.search(...) or
search_client.get_document(...) or search_client.upload_documents(...)) while
keeping the monkeypatch of TRACELOOP_TRACE_CONTENT="false"; this ensures the
test exercises the content suppression gate and then assert on spans as before.
Use the existing symbols test_content_disabled_no_content_attributes,
TRACELOOP_TRACE_CONTENT, and one of search_client.search,
search_client.get_document, or search_client.upload_documents to implement the
change.
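
The gate that test must exercise is an env-var check along these lines (a sketch; the exact helper in the package may differ):

```python
import os

def should_send_content() -> bool:
    # Content capture is enabled by default and suppressed only when
    # TRACELOOP_TRACE_CONTENT is explicitly set to "false".
    return os.getenv("TRACELOOP_TRACE_CONTENT", "true").lower() != "false"
```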
- Around line 858-873: The setup currently calls create_data_source_connection
and then create_indexer (creating ds_connection and indexer) but if
create_indexer fails the data source is never removed; wrap the resource setup
and test assertions in a guaranteed teardown block: after creating ds_connection
and indexer (or attempting to), ensure cleanup of the
SearchIndexerDataSourceConnection and SearchIndexer happens in a finally (or by
registering addCleanup/addfinalizer) so deletion of ds_connection and indexer
always runs regardless of failures in create_indexer or assertions.
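
The guaranteed-teardown shape looks roughly like this; FakeIndexerClient is a stand-in so the pattern is runnable without Azure, and the resource names are illustrative:

```python
class FakeIndexerClient:
    # Stand-in for SearchIndexerClient, tracking created resources.
    def __init__(self):
        self.resources = set()

    def create(self, name, fail=False):
        if fail:
            raise RuntimeError(f"create failed for {name}")
        self.resources.add(name)

    def delete(self, name):
        self.resources.discard(name)


def run_indexer_test(client, indexer_fails=False):
    # Create the data source first, then the indexer; nested finally
    # blocks guarantee both are deleted even if create_indexer or the
    # test body raises.
    client.create("ds-conn")
    try:
        client.create("indexer", fail=indexer_fails)
        try:
            pass  # test assertions would run here
        finally:
            client.delete("indexer")
    finally:
        client.delete("ds-conn")
```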
🧹 Nitpick comments (3)
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py (2)

714-726: Unbounded per-document span attributes may hit OTel limits or degrade performance.

The content-capture loops (here and in _set_index_documents_request_content_attributes, _set_document_batch_response_content_attributes, _set_index_documents_response_content_attributes, _set_autocomplete_content_attributes, _set_suggest_content_attributes) iterate over the entire collection without a cap. The default OTel SDK SpanLimits.max_number_of_attributes is 128 — attributes beyond that are silently dropped. For large batches (thousands of documents), this generates many JSON strings that are never recorded, wasting CPU and memory on the hot path.

Consider adding a configurable cap (e.g., first 50 items) and recording the total count separately so users know data was truncated.

Illustrative cap pattern
+MAX_CONTENT_ITEMS = 50  # configurable cap for per-item span attributes
+
 @dont_throw
 def _set_document_batch_request_content_attributes(span, args, kwargs):
     """Set indexed db.query.result.N.document attributes for each input document."""
     documents = kwargs.get("documents") or (args[0] if args else None)
     if not documents:
         return
 
-    for i, doc in enumerate(documents):
+    for i, doc in enumerate(documents):
+        if i >= MAX_CONTENT_ITEMS:
+            break
         _set_span_attribute(
             span,
             f"{EventAttributes.DB_QUERY_RESULT_DOCUMENT.value}.{i}",
             _safe_json_dumps(doc),
         )

789-839: Duplicated response content extraction logic.

_set_document_batch_response_content_attributes and _set_index_documents_response_content_attributes share identical loop bodies (extracting key, succeeded, status_code, error_message and setting indexed attributes). The same applies to their non-content counterparts at Lines 547–590. A shared helper accepting the results list would reduce ~40 lines of duplication.

packages/opentelemetry-instrumentation-azure-search/tests/test_azure_search_integration.py (1)

37-101: Consider extracting duplicated fixtures into conftest.py or a base class.

index_client_setup, setup_test_index, clear_exporter_before_test, and index_client are duplicated nearly verbatim across all four test classes. Moving them to session- or module-scoped fixtures in conftest.py would reduce ~120 lines of boilerplate and make it easier to keep the setup logic consistent.

…ring

- Convert SearchIndexerDataSourceType enum to string before setting
  span attribute, matching the pattern used for query_type and
  vector_filter_mode enums
- Fix edge case where empty synonyms string incorrectly reports count
  of 1 instead of 0
…and content gate

- Use get_document() instead of get_document_count() in content-disabled
  test so it actually exercises the content suppression gate
- Wrap indexer client test assertions in try/finally to guarantee
  resource cleanup even when assertions fail
- Update VCR cassette for content-disabled test
…ead of IndexDocumentsResult

The SDK's index_documents() returns a plain list of IndexingResult, not
an IndexDocumentsResult object. The response handler was checking
response.results which returned None, silently skipping succeeded/failed
count attributes.
…ertions

- Add helper functions (_get_only_span, _assert_base_span, _span_attrs)
  to eliminate duplication and enforce consistent assertions
- Assert StatusCode.OK on every span via _assert_base_span
- Verify response attribute values (succeeded_count, failed_count,
  results_count) against actual VCR cassette data
- Parse and verify content capture attribute values via json.loads()
  instead of existence-only checks
- Remove redundant per-class clear_exporter fixtures (conftest handles it)
- Remove unnecessary time.sleep() calls (no-op during VCR playback)
- Remove try/except around SDK calls (VCR playback is deterministic)
- Use try/finally for guaranteed cleanup in create/delete test pairs
- Add factory functions for client construction
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py`:
- Around line 412-428: The span attribute helpers
(_set_indexer_management_attributes, _set_skillset_attributes, and
_set_data_source_attributes) currently pass through the raw first-arg for
non-create operations which may be an object; update their non-create branch to
mirror the synonym-map/name resolution logic used elsewhere (i.e., if the
resolved value is not a string, attempt to extract a .name attribute from the
object) so that when callers pass an indexer/skillset/data_source object you set
the AZURE_SEARCH_*_NAME attribute to the object's name rather than the object
itself.
- Around line 730-735: The loop that sets one span attribute per document (see
the enumerate(documents) usage in the response helpers) must be limited to avoid
unbounded attributes; define a constant max (e.g., MAX_SPAN_ATTR_ITEMS = 100)
and only iterate up to that cap when setting attributes in functions like
_set_document_batch_request_content_attributes,
_set_index_documents_request_content_attributes,
_set_autocomplete_content_attributes, _set_suggest_content_attributes, and the
response helpers (the enumerate(documents) loop shown); if documents were
truncated, set an additional span attribute indicating truncation (e.g.,
"<EventAttributes>.<type>.truncated_count") and emit a debug/warning log when
truncation occurs so operators can see when batches exceeded the cap.
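
Under those assumptions (the cap constant and attribute names are illustrative, and a plain dict stands in for a real span), the capped loop could look like:

```python
MAX_SPAN_ATTR_ITEMS = 100  # illustrative cap, per the suggestion above

def set_document_content_attributes(span_attrs: dict, documents,
                                    prefix="db.query.result"):
    # Record at most MAX_SPAN_ATTR_ITEMS per-item attributes; count the
    # overflow and surface it as a truncation marker.
    truncated = 0
    for i, doc in enumerate(documents):
        if i >= MAX_SPAN_ATTR_ITEMS:
            truncated += 1
            continue
        span_attrs[f"{prefix}.{i}.document"] = str(doc)
    if truncated:
        span_attrs[f"{prefix}.truncated_count"] = truncated
```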
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py (1)

575-599: _set_index_documents_response_attributes duplicates _set_document_batch_response_attributes.

The succeeded/failed counting logic (Lines 575-599) is nearly identical to Lines 553-572. The only difference is the initial response unwrapping (list vs object with .results). Same applies to the content variants at Lines 798-852 vs 824-852. A small shared helper would reduce this duplication.

Sketch: shared helper
def _count_indexing_results(results):
    """Count succeeded/failed from a list of IndexingResult."""
    succeeded = sum(1 for r in results if getattr(r, "succeeded", False))
    return succeeded, len(results) - succeeded

def _extract_results_list(response):
    """Normalize response to a list of IndexingResult."""
    if isinstance(response, list):
        return response
    results = getattr(response, "results", None)
    return results if isinstance(results, list) else None

…gging validation

Add 14 workflow tests (7 sync + 7 async) that validate traces tell a
debuggable story for 2am production troubleshooting:

- search_pipeline: upload→search trace correlation with content attrs
- document_lifecycle: 5-step CRUD audit trail with mutation tracking
- typeahead_pipeline: autocomplete+suggest results counts and content
- bulk_ingestion_partial_failure: per-document metadata with error details
- index_management_pipeline: 5-span deployment pipeline correlation
- content_privacy_across_pipeline: zero content attrs when disabled
- error_then_retry_success: StatusCode.ERROR + OK in same trace
- Add object-to-name resolution for non-create indexer, data source,
  and skillset operations (mirrors synonym map pattern)
- Add configurable content item cap via TRACELOOP_TRACE_CONTENT_MAX_ITEMS
  env var (default: 100) to avoid exceeding OTel SpanLimits on large batches
- Refactor duplicated indexing result content logic into shared helper
- Fix doc example in SPAN_ATTRIBUTES_GUIDE.md to not consume generators
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-azure-search/docs/SPAN_ATTRIBUTES_GUIDE.md`:
- Around line 127-138: The example currently consumes generators by calling
list(documents) in the else-branch (variable documents), which can exhaust
iterators; remove that else-block or replace the logic to only count when
__len__ exists. Update the snippet around documents to follow the safe pattern
used later: if documents and hasattr(documents, "__len__") then set count and
call _set_span_attribute(span, ATTR_DOCUMENT_COUNT, count); otherwise skip
counting and do not convert the iterable to a list.
- Around line 432-445: The "Good" example currently consumes iterators by
calling len(list(documents)); change it to "✅ Good - safe type handling that
doesn't consume iterators" and only use len(documents) when hasattr(documents,
'__len__') — otherwise skip counting (set count = None) to avoid exhausting
generators; also update the "❌ Bad" example to show the consuming pattern (e.g.,
converting documents to list and then len) and label it as "❌ Bad - consumes
generators/iterators" so readers don't follow the destructive pattern.
🧹 Nitpick comments (5)
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/utils.py (1)

42-64: dont_throw should preserve the wrapped function's metadata with functools.wraps.

Without @functools.wraps(func), all functions decorated with @dont_throw (dozens in wrapper.py) lose their __name__, __doc__, and __module__. This complicates debugging and breaks any introspection that relies on function identity.

Proposed fix
 import logging
 import os
 import traceback
+import functools
 
 from opentelemetry import context as context_api
 from opentelemetry.instrumentation.azure_search.config import Config
@@ -49,6 +50,7 @@
     # Obtain a logger specific to the function's module
     logger = logging.getLogger(func.__module__)
 
+    @functools.wraps(func)
     def wrapper(*args, **kwargs):
         try:
             return func(*args, **kwargs)
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py (4)

17-21: _set_span_attribute guard is fine but the trailing return is unnecessary.

Nit: the bare return on line 21 has no effect since the function would return None anyway. Not blocking.


36-103: Dispatcher functions lack @dont_throw protection.

_set_request_attributes and _set_response_attributes are the only attribute-setting functions not wrapped with @dont_throw. While the individual helpers they call are protected, an unexpected exception in the dispatcher itself (e.g., if method were ever None and a future refactor introduced .startswith()) would propagate uncaught into _sync_wrap/_async_wrap and crash the instrumented call.

The risk is low today since the bodies are just string equality checks, but adding @dont_throw here would be consistent with the rest of the module and future-proof.

Proposed fix
+@dont_throw
 def _set_request_attributes(span, method, instance, args, kwargs):
     """Set all pre-call span attributes based on the method being called."""
+@dont_throw
 def _set_response_attributes(span, method, response, args, kwargs):
     """Set all post-call span attributes from the response."""

Also applies to: 105-131


210-264: top is fetched from kwargs twice.

Line 217 sets AZURE_SEARCH_SEARCH_TOP from kwargs.get("top"), then lines 228-230 fetch kwargs.get("top") again to set VECTOR_DB_QUERY_TOP_K. Consider reusing the local variable.

Proposed fix
-    _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_TOP, kwargs.get("top"))
+    top = kwargs.get("top")
+    _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_TOP, top)
     _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_SKIP, kwargs.get("skip"))
     _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_FILTER, kwargs.get("filter"))
 
@@ -225,9 +226,7 @@
         _set_span_attribute(span, SpanAttributes.AZURE_SEARCH_SEARCH_QUERY_TYPE, qt_str)
 
     # Set top_k for vector DB convention
-    top = kwargs.get("top")
     if top:
         _set_span_attribute(span, SpanAttributes.VECTOR_DB_QUERY_TOP_K, top)

713-767: max_content_items() is called on every loop iteration, re-reading the env var each time.

Each of these content-capture loops (lines 721, 740, 761, 787, 806, 818) calls max_content_items() per iteration, which reads os.getenv each time. Consider hoisting the call before the loop.

Example for one function (apply similarly to others)
 @dont_throw
 def _set_search_vector_embeddings_attributes(span, kwargs):
     """Set indexed db.search.embeddings.N.vector attributes for vector queries."""
     vector_queries = kwargs.get("vector_queries")
     if not vector_queries:
         return
 
+    cap = max_content_items()
     for i, vq in enumerate(vector_queries):
-        if i >= max_content_items():
+        if i >= cap:
             break

AsyncSearchItemPaged.get_count() is a coroutine function, not a sync
method. Calling it synchronously returned a coroutine object instead of
the count, producing "Invalid type coroutine for attribute" warnings.

Split into sync _set_search_response_attributes (skips coroutine
functions) and async _set_search_response_attributes_async (awaits
get_count). The async variant is called from _async_wrap for search.
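
The sync/async split described in that commit can be sketched like this (a dict stands in for the span, and the attribute name is an assumption):

```python
import asyncio

def set_search_count_sync(span_attrs: dict, results):
    # Skip coroutine functions entirely so a coroutine object is never
    # recorded as an attribute value.
    get_count = getattr(results, "get_count", None)
    if get_count is None or asyncio.iscoroutinefunction(get_count):
        return
    span_attrs["search.results.count"] = get_count()

async def set_search_count_async(span_attrs: dict, results):
    # Await get_count when the paged result exposes it as a coroutine.
    get_count = getattr(results, "get_count", None)
    if get_count is None:
        return
    count = get_count()
    if asyncio.iscoroutine(count):
        count = await count
    span_attrs["search.results.count"] = count
```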
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py`:
- Line 242: The span attribute for search mode is being set with the enum object
directly; update the call that uses _set_span_attribute with
SpanAttributes.AZURE_SEARCH_SEARCH_MODE to pass the enum's underlying string
(e.g., use kwargs.get("search_mode").value when present, or None if missing)
similar to how query_type, vector_filter_mode, query_caption, and query_answer
are handled so the recorded span contains a primitive string instead of an enum
instance.
- Around line 560-578: The async helper _set_search_response_attributes_async
must keep its manual try/except because the current dont_throw decorator in
utils.py is not async-aware; update dont_throw to detect
asyncio.iscoroutinefunction and return an async wrapper that awaits the inner
call and catches exceptions (mirroring the anthropic implementation) so async
functions like _set_search_response_attributes_async can safely use `@dont_throw`
without losing await/exception handling; ensure the updated dont_throw preserves
the existing synchronous behavior, retains logging of exceptions, and reference
the decorator name dont_throw and the function
_set_search_response_attributes_async when making the change.
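
An async-aware dont_throw along those lines could look like this (a sketch of the pattern, not the repo's exact implementation):

```python
import asyncio
import functools
import logging
import traceback

def dont_throw(func):
    # Swallow and log exceptions from instrumentation helpers; return an
    # async wrapper for coroutine functions so awaiting still works.
    logger = logging.getLogger(func.__module__)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.debug("failed to trace in %s: %s",
                         func.__name__, traceback.format_exc())

    @functools.wraps(func)
    async def async_wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception:
            logger.debug("failed to trace in %s: %s",
                         func.__name__, traceback.format_exc())

    return async_wrapper if asyncio.iscoroutinefunction(func) else wrapper
```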
🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-azure-search/opentelemetry/instrumentation/azure_search/wrapper.py (2)

36-38: Consider adding @dont_throw to the dispatch functions for defensive consistency.

_set_request_attributes and _set_response_attributes are the only undecorated functions in the call chain between the user's wrapped call and the span attribute helpers. While their bodies are safe if/elif chains over string comparisons (calling @dont_throw-decorated helpers), decorating them would guard against any future refactoring that introduces riskier logic. This is purely defensive.

Also applies to: 105-108


751-753: max_content_items() is re-evaluated on every loop iteration.

Each call reads and parses an environment variable. Consider caching the result once before each loop:

Example fix (apply to all content loops)
 def _set_search_vector_embeddings_attributes(span, kwargs):
     vector_queries = kwargs.get("vector_queries")
     if not vector_queries:
         return
 
+    limit = max_content_items()
     for i, vq in enumerate(vector_queries):
-        if i >= max_content_items():
+        if i >= limit:
             break

Also applies to: 770-772, 791-793, 817-819, 836-838, 848-850

…ES_GUIDE

Replace all doc examples that consume generators with list() to use the
safe hasattr(__len__) pattern instead, matching the actual production
code behavior.
… enum

- Update dont_throw decorator to detect async functions and return an
  async wrapper that awaits the inner call (mirrors Anthropic pattern)
- Apply @dont_throw to _set_search_response_attributes_async, removing
  the manual try/except
- Convert search_mode enum to string before setting span attribute,
  consistent with query_type, vector_filter_mode, etc.
Avoid calling max_content_items() on every loop iteration by hoisting
the result into a local variable before the loop.
…e consistency

Decorate _set_request_attributes and _set_response_attributes with
@dont_throw so the entire attribute-setting call chain is guarded
against unexpected exceptions.
- Replace method in [...] list checks with module-level frozensets for O(1) dispatch
- Hoist should_send_content() into _sync_wrap/_async_wrap (1 call per span, not 2)
- Cache max_content_items() and max_content_length() once per span, pass through chain
- Add TRACELOOP_TRACE_CONTENT_MAX_LENGTH env var (default 16KB) to cap serialized content
- Merge batch response counting + content iteration into single-pass loop
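
The dispatch and config-hoisting changes in that commit can be illustrated like this (method names come from the instrumented list above; the env-var name and default follow the commit message, but the helper shapes are assumptions):

```python
import os

# O(1) membership checks instead of `method in [...]` list scans.
DOCUMENT_BATCH_METHODS = frozenset({
    "upload_documents", "merge_documents", "delete_documents",
    "merge_or_upload_documents",
})

def max_content_items(default=100):
    # Read TRACELOOP_TRACE_CONTENT_MAX_ITEMS once per span, then pass the
    # value down the call chain rather than re-reading it per iteration.
    try:
        return int(os.getenv("TRACELOOP_TRACE_CONTENT_MAX_ITEMS", default))
    except (TypeError, ValueError):
        return default

def is_document_batch(method: str) -> bool:
    return method in DOCUMENT_BATCH_METHODS
```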


Development

Successfully merging this pull request may close these issues.

🚀 Feature: Support for Azure AI Search
