fix: sanitize gemini openai tool schemas #7703

Closed
bugkeep wants to merge 1 commit into AstrBotDevs:master from bugkeep:codex/fix-gemini-openai-tool-schema

@bugkeep bugkeep commented Apr 21, 2026

Summary

  • Sanitize OpenAI-compatible tool parameter schemas for Gemini models using the existing Gemini-compatible schema subset.
  • Keep the default OpenAI schema unchanged for non-Gemini providers.
  • Add regression coverage for nested unsupported fields such as examples, default, and additionalProperties.
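
A minimal sketch of the allow-list idea described above, assuming a hypothetical `sanitize_schema` helper and an illustrative `ALLOWED_KEYS` set. Neither name comes from the PR, and the real Gemini-compatible subset in AstrBot may differ:

```python
from typing import Any

# Assumed allow-list for illustration only; the PR's actual subset may differ.
ALLOWED_KEYS = {
    "type", "description", "enum", "properties",
    "required", "items", "nullable", "format",
}


def sanitize_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` that keeps only allow-listed keywords."""
    result: dict[str, Any] = {}
    for key, value in schema.items():
        if key not in ALLOWED_KEYS:
            continue  # drops examples, default, additionalProperties, ...
        if key == "properties" and isinstance(value, dict):
            # Property names are user data: keep every name, recurse into each sub-schema.
            result[key] = {
                name: sanitize_schema(sub)
                for name, sub in value.items()
                if isinstance(sub, dict)
            }
        elif key == "items" and isinstance(value, dict):
            result[key] = sanitize_schema(value)
        else:
            result[key] = value
    return result
```

Because the helper builds a new dictionary at every level, the tool's original `parameters` stay untouched and can still be sent unchanged to non-Gemini providers.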

Fixes #7572

Testing

  • uv run pytest tests/unit/test_tool_google_schema.py tests/unit/test_func_tool_manager.py -q
  • uv run pytest tests/unit -q
  • uv run ruff format .
  • uv run ruff check .

Summary by Sourcery

Sanitize OpenAI-compatible tool parameter schemas when used with Gemini models while preserving existing behavior for other providers.

New Features:

  • Add an option to generate Gemini-compatible OpenAI tool schemas for function tools.

Bug Fixes:

  • Ensure Gemini tool schemas exclude unsupported fields such as examples, default, and additionalProperties, including in nested structures.
  • Restore and clean up temporary module registrations in the Google schema tests to avoid cross-test side effects.
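
One way such a cleanup could look, as a sketch only: a hypothetical `temporary_modules` context manager (not taken from the PR's tests) that registers stub modules in `sys.modules` and restores the previous state afterwards, so later tests see the real modules again:

```python
import sys
import types
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_modules(stubs: dict[str, types.ModuleType]) -> Iterator[None]:
    """Register stub modules in sys.modules, then restore the previous state."""
    missing = object()  # sentinel for "module was not registered before"
    saved = {name: sys.modules.get(name, missing) for name in stubs}
    sys.modules.update(stubs)
    try:
        yield
    finally:
        for name, previous in saved.items():
            if previous is missing:
                sys.modules.pop(name, None)  # remove stubs we introduced
            else:
                sys.modules[name] = previous  # put the original module back
```

The `finally` block runs even when the test body raises, which is what prevents a failing test from leaking its stubs into the rest of the suite.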

Enhancements:

  • Refactor Gemini schema handling into a shared helper used by both Google GenAI and OpenAI-style tool schemas.

Tests:

  • Extend unit tests to cover Gemini-compatible OpenAI tool schema sanitization and regression cases for nested unsupported fields.

@dosubot dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and area:provider (The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner.) labels Apr 21, 2026

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/agent/tool.py" line_range="277" />
<code_context>
+
+        return result
+
+    def openai_schema(
+        self,
+        omit_empty_parameter_field: bool = False,
</code_context>
<issue_to_address>
**issue (complexity):** Consider splitting Gemini-specific behavior into dedicated helpers and an explicit `openai_gemini_schema` method so `openai_schema` keeps a single, clear responsibility and provider concerns stay separated.

You can keep the new functionality but localize the complexity by:

1. Splitting the Gemini behavior out of `openai_schema` into a dedicated method.
2. Making the boolean flag a thin compatibility wrapper instead of the primary contract.
3. Keeping `_google_compatible_schema` provider-focused and adding an OpenAI adapter.

For example:

```python
@staticmethod
def _gemini_function_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Convert schema to the subset accepted by Gemini function declarations."""
    # (move the current _google_compatible_schema implementation here)
    ...
```

Then make OpenAI’s Gemini-compatible variant explicit:

```python
def openai_schema(
    self,
    omit_empty_parameter_field: bool = False,
    gemini_compatible_schema: bool = False,
) -> list[dict]:
    """
    Convert tools to OpenAI API function calling schema format.

    gemini_compatible_schema is kept for backwards compatibility; prefer
    openai_gemini_schema() for new code.
    """
    if gemini_compatible_schema:
        return self.openai_gemini_schema(omit_empty_parameter_field)

    result = []
    for tool in self.tools:
        func_def = {"type": "function", "function": {"name": tool.name}}
        if tool.description:
            func_def["function"]["description"] = tool.description

        if tool.parameters is not None:
            if (
                tool.parameters and tool.parameters.get("properties")
            ) or not omit_empty_parameter_field:
                func_def["function"]["parameters"] = tool.parameters
        result.append(func_def)
    return result


def openai_gemini_schema(
    self,
    omit_empty_parameter_field: bool = False,
) -> list[dict]:
    """OpenAI schema adapted to Gemini-compatible subset."""
    result = []
    for tool in self.tools:
        func_def = {"type": "function", "function": {"name": tool.name}}
        if tool.description:
            func_def["function"]["description"] = tool.description

        if tool.parameters is not None:
            if (
                tool.parameters and tool.parameters.get("properties")
            ) or not omit_empty_parameter_field:
                func_def["function"]["parameters"] = self._gemini_function_schema(
                    tool.parameters
                )
        result.append(func_def)
    return result
```

`google_schema` then stays clearly provider-specific:

```python
def google_schema(self) -> dict:
    """Convert tools to Google GenAI API format."""
    tools = []
    for tool in self.tools:
        d: dict[str, Any] = {"name": tool.name}
        if tool.description:
            d["description"] = tool.description
        if tool.parameters:
            d["parameters"] = self._gemini_function_schema(tool.parameters)
        tools.append(d)

    declarations: dict[str, Any] = {}
    if tools:
        declarations["function_declarations"] = tools
    return declarations
```

Finally, you can keep the deprecated method’s surface area minimal by delegating to the primary OpenAI method without exposing the new mode in its signature:

```python
@deprecated(reason="Use openai_schema() instead", version="4.0.0")
def get_func_desc_openai_style(
    self,
    omit_empty_parameter_field: bool = False,
):
    # If you really need Gemini semantics here, call openai_gemini_schema()
    return self.openai_schema(omit_empty_parameter_field)
```

This preserves all current behavior but:

- `openai_schema` regains a single clear responsibility.
- Gemini-specific behavior is discoverable via `openai_gemini_schema()` and `_gemini_function_schema`.
- Provider-specific schema logic is not conflated under a “google-compatible” helper reused from OpenAI.
</issue_to_address>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors Gemini-compatible schema generation into a static method and adds a gemini_compatible_schema flag to openai_schema to sanitize tool parameters for Gemini models. The openai_source provider is updated to apply this sanitization for Gemini models, and unit tests are added to verify the behavior. Feedback was provided to explicitly set the nullable property when converting type lists that include "null" to ensure nullability is preserved in the Gemini API.
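
The provider-side gating summarized above might be sketched as follows. Everything here is an assumption for illustration: `is_gemini_model`, `build_tools_payload`, and the substring check on the model name are hypothetical and may not match how the PR's `openai_source` change detects Gemini models:

```python
from typing import Any, Callable


def is_gemini_model(model: str) -> bool:
    # Assumption: a substring check on the model name; the PR may detect Gemini differently.
    return "gemini" in model.lower()


def build_tools_payload(
    model: str,
    tools: list[dict[str, Any]],
    sanitize: Callable[[dict[str, Any]], dict[str, Any]],
) -> list[dict[str, Any]]:
    """Apply `sanitize` to each tool's parameters only for Gemini models."""
    if not is_gemini_model(model):
        return tools  # default OpenAI schema stays untouched
    payload = []
    for tool in tools:
        function = dict(tool["function"])  # shallow copy, so the input is not mutated
        if "parameters" in function:
            function["parameters"] = sanitize(function["parameters"])
        payload.append({**tool, "function": function})
    return payload
```

Passing the sanitizer in as a callable keeps the gating logic independent of whichever schema-cleaning strategy is eventually chosen.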

Comment on lines +232 to +234
# Gemini API expects 'type' to be a string, while JSON Schema allows lists.
if isinstance(origin_type, list):
    target_type = next((t for t in origin_type if t != "null"), "string")

medium

When converting a list of types (e.g., ["string", "null"]) to a single type for Gemini, the information about nullability is lost unless the nullable keyword is also present in the original schema. While Gemini supports the nullable field, it might be safer to explicitly set result["nullable"] = True if "null" was present in the origin_type list, ensuring the model knows the field can be null.

Suggested change
- # Gemini API expects 'type' to be a string, while JSON Schema allows lists.
- if isinstance(origin_type, list):
-     target_type = next((t for t in origin_type if t != "null"), "string")
+ # Gemini API expects 'type' to be a string, while JSON Schema allows lists.
+ if isinstance(origin_type, list):
+     if "null" in origin_type:
+         result["nullable"] = True
+     target_type = next((t for t in origin_type if t != "null"), "string")
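
The suggested change can be tried in isolation with a small standalone sketch. `collapse_type` is a hypothetical wrapper written for illustration; only the list handling mirrors the reviewed snippet:

```python
from typing import Any


def collapse_type(origin_type: Any) -> dict[str, Any]:
    """Map a JSON Schema `type` (string or list) to a single type plus `nullable`."""
    result: dict[str, Any] = {}
    if isinstance(origin_type, list):
        if "null" in origin_type:
            result["nullable"] = True  # preserve nullability explicitly
        # First non-null entry wins; fall back to "string" as in the reviewed code.
        result["type"] = next((t for t in origin_type if t != "null"), "string")
    elif origin_type is not None:
        result["type"] = origin_type
    return result


print(collapse_type(["string", "null"]))  # {'nullable': True, 'type': 'string'}
```

Note that a union such as `["integer", "string"]` still collapses to its first member, so this only preserves nullability, not the full set of allowed types.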

Soulter commented Apr 21, 2026

I think it's better to delete the example field instead of using a field whitelist.
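
For comparison, a deny-list approach along the lines of this comment could look like the sketch below. `strip_keys` and `UNSUPPORTED_KEYS` are hypothetical names, and the set contains only `examples` because that is the only field the comment and the linked issue name:

```python
from typing import Any

# Assumed deny-list; extend it if other keywords turn out to be rejected.
UNSUPPORTED_KEYS = frozenset({"examples"})


def strip_keys(schema: Any, unsupported: frozenset[str] = UNSUPPORTED_KEYS) -> Any:
    """Recursively drop deny-listed schema keywords, leaving everything else as-is."""
    if isinstance(schema, list):
        return [strip_keys(item, unsupported) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned: dict[str, Any] = {}
    for key, value in schema.items():
        if key in unsupported:
            continue
        if key == "properties" and isinstance(value, dict):
            # Keys here are property names, not keywords: a property that
            # happens to be called "examples" must survive.
            cleaned[key] = {
                name: strip_keys(sub, unsupported) for name, sub in value.items()
            }
        else:
            cleaned[key] = strip_keys(value, unsupported)
    return cleaned
```

The trade-off is that a deny-list passes through every keyword it does not know about, so fields such as `default` or `additionalProperties` would still reach the API unless they are added to the set.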

@bugkeep bugkeep closed this Apr 22, 2026

Development

Successfully merging this pull request may close these issues.

[Bug] Gemini model returns a 400 error at the tool declaration stage: Unknown name "examples" at function_declarations