
Add multimodal support for Gemini tool calls (Blob & Arrays)#57

Open
aubrypaul wants to merge 6 commits into main from gemini-multimodal-tool-outputs

@aubrypaul
Contributor

No description provided.

@coderabbitai
Contributor

coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Multimodal function-call support: tool responses can include file/blob parts alongside JSON payloads.
  • Improvements

    • Unified handling of strings, objects, blobs, and arrays across providers; message framing now includes multimodal parts for clearer function-call results.
    • Added more verbose logging and made formatting more consistent.
  • Tests

    • New tests and utilities to validate functions returning single files and file arrays (PDF/file scenarios).
  • Documentation

    • Notice about limitations when functions return files, including that original filenames may not be preserved.

Walkthrough

Added multimodal function-calling: functions returning blobs or blob arrays are converted into Gemini inline parts and emitted as parts in tool responses; non-blob results remain JSON. Introduced helpers for Gemini inline parts, updated handlers for Gemini and OpenAI function-calling, and added tests that generate PDF/blob outputs.
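The routing described in the walkthrough can be sketched in plain JavaScript. This is an illustration, not the PR's code: classifyFunctionResponse is a hypothetical name, and a "blob-like" value is assumed to expose getBytes() and getContentType(), as Apps Script Blob objects do.

```javascript
// Hypothetical sketch of the walkthrough's routing, not the PR's implementation.
// Assumption: blob-like values expose getBytes() and getContentType() (Apps Script Blob style).
function isBlobLike(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof value.getBytes === "function" &&
    typeof value.getContentType === "function"
  );
}

// Decide whether a function's return value goes out as inline parts or as JSON.
function classifyFunctionResponse(result) {
  if (isBlobLike(result)) {
    return { kind: "blob", blobs: [result] };
  }
  if (Array.isArray(result) && result.length > 0 && result.every(isBlobLike)) {
    return { kind: "blobArray", blobs: result };
  }
  // Strings, plain objects, empty arrays and mixed arrays all stay on the JSON path.
  return { kind: "json", json: result };
}
```

With this rule a mixed array such as [blob, "text"] falls through to the JSON path, which is the data-loss case raised later in the review.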

Changes

  • Core handler + helpers (src/code.gs): Added createGeminiInlinePart and blobToGeminiInlinePart; updated addFile and core logic to produce Gemini inline parts from blobs; added logging and minor formatting tweaks.
  • Gemini function-calling flow (src/code.gs): Refactored _handleGeminiToolCalls to convert blob/blob-array responses into Gemini inline parts and emit tool-role responses containing parts arrays, or jsonResponse for JSON payloads.
  • OpenAI function-calling flow (src/code.gs): Refactored _handleOpenAIToolCalls to detect blob-like values and arrays, convert them into blob-specific content formats, and stringify non-blob objects; function_call_output now carries either a JSON payload or multimodal parts.
  • Tests & PDF/blob utilities (src/testFunctions.gs): Added tests testFunctionCallingReturnBlob and testFunctionCallingReturnBlobArray; added _escapeHtml, _createSimplePdf, generateReceipt, and generateWelcomePack; updated model constants to gpt-5.2 and gemini-3-pro-preview.
  • Docs (README.md): Inserted a limitation note about function-calling file returns: the model may not preserve original filenames, so deterministic script-side filenames are recommended.
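A minimal sketch of the two helpers named above, assuming the Gemini REST JSON field names (inlineData, mimeType, data) and an optional displayName, whose support is debated later in this review. Apps Script would encode with Utilities.base64Encode(blob.getBytes()); Node's Buffer is swapped in so the sketch runs outside Apps Script. The bodies are illustrative, not the PR's code.

```javascript
// Illustrative versions of the helpers named in the PR; the PR's bodies may differ.
// Build a Gemini inline part; displayName is only added when a name is supplied.
function createGeminiInlinePart(mimeType, base64Data, displayName) {
  const inlineData = { mimeType: mimeType, data: base64Data };
  if (displayName) {
    inlineData.displayName = displayName;
  }
  return { inlineData: inlineData };
}

// Convert a blob-like object (getBytes/getContentType/getName) into an inline part.
function blobToGeminiInlinePart(blob) {
  // Apps Script equivalent: Utilities.base64Encode(blob.getBytes()).
  // Buffer.from wraps each value into 0-255, so Apps Script's signed bytes also work.
  const base64 = Buffer.from(blob.getBytes()).toString("base64");
  return createGeminiInlinePart(blob.getContentType(), base64, blob.getName());
}
```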

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Handler as ToolCallHandler
    participant Exec as FunctionExecutor
    participant Conv as BlobConverter
    participant API as APIClient

    Client->>Handler: invoke function call (Gemini/OpenAI)
    Handler->>Exec: execute target function
    Exec-->>Handler: returns (string | object | blob | [blobs])

    alt blob or [blobs]
        Handler->>Conv: convert blob(s) to Gemini inline parts
        Conv-->>Handler: return parts array
        Handler->>API: send tool-role response with `parts` array
    else JSON/object or string
        Handler->>API: send jsonResponse payload
    end

    API-->>Client: deliver API response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check: ❓ Inconclusive. No pull request description was provided by the author, making it impossible to assess relevance to the changeset. Resolution: Add a description explaining the changes, such as why multimodal support is needed and how the implementation handles blob returns in tool responses.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately describes the main objective of the changeset: adding multimodal support (Blob and array handling) for Gemini tool calls.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/code.gs (2)

490-513: ⚠️ Potential issue | 🟡 Minor

Mixed arrays (blobs + non-blobs) silently lose binary data.

When functionResponse is an array where only some elements are blob-like, the every(isBlobLike) check on line 495 fails and the entire array falls into the jsonResponse path (line 498). Blob objects don't JSON-serialize meaningfully (they'll become empty objects or lose their binary content).

Consider either:

  1. Partitioning the array into blob vs. non-blob items and handling each group separately.
  2. Throwing or logging a warning for mixed arrays so callers know the blobs are dropped.
💡 Options 1 and 2 combined: partition the array and warn on mixed input
            if (Array.isArray(functionResponse)) {
-            if (functionResponse.length > 0 && functionResponse.every(isBlobLike)) {
-              multimodalParts = functionResponse.map(blobToGeminiInlinePart);
-            } else { // non-blob arrays
-              jsonResponse = functionResponse;
-            }
+            const blobs = functionResponse.filter(isBlobLike);
+            const nonBlobs = functionResponse.filter(item => !isBlobLike(item));
+            if (blobs.length > 0 && nonBlobs.length > 0) {
+              console.warn('[GenAIApp] - Mixed array (blobs + non-blobs) returned by function. Blobs will be sent as multimodal parts; non-blobs as JSON.');
+            }
+            if (blobs.length > 0) {
+              multimodalParts = blobs.map(blobToGeminiInlinePart);
+            }
+            if (nonBlobs.length > 0) {
+              jsonResponse = nonBlobs;
+            }
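The partition-and-warn approach in the diff above, plus the fail-fast alternative, can be exercised in isolation. splitMixedArray and its strict/warn parameters are hypothetical, and blob-likeness is again assumed to mean exposing getBytes() and getContentType().

```javascript
// Hypothetical helpers illustrating the review suggestion; not the PR's code.
function isBlobLike(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof value.getBytes === "function" &&
    typeof value.getContentType === "function"
  );
}

// Split an array result into blobs (for inline parts) and everything else (for JSON).
// strict = true fails fast on mixed input; otherwise `warn` is called and both groups are kept.
function splitMixedArray(items, strict, warn) {
  const blobs = items.filter(isBlobLike);
  const nonBlobs = items.filter((item) => !isBlobLike(item));
  const mixed = blobs.length > 0 && nonBlobs.length > 0;
  if (mixed && strict) {
    throw new Error("Function returned an array mixing blobs and non-blob values.");
  }
  if (mixed) {
    warn("[GenAIApp] - Mixed array (blobs + non-blobs) returned by function. " +
      "Blobs will be sent as multimodal parts; non-blobs as JSON.");
  }
  return { blobs: blobs, jsonResponse: nonBlobs.length > 0 ? nonBlobs : null };
}
```

For a mock blob whose members are all methods, JSON.stringify([blob]) gives "[{}]", which illustrates why the unpartitioned JSON path drops the file contents.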

483-487: 🧹 Nitpick | 🔵 Trivial

Logging: also log a summary of the response type after classification, for easier debugging.

The verbose log at line 485 is placed after the function call (line 482) but before the response processing (lines 488+). This is fine for knowing the function was called, but consider also logging a summary of the response type (blob, array, string, etc.) after the classification logic to aid debugging multimodal flows.
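The response-type summary suggested above could come from a small helper like the following. describeResponseType is a hypothetical name, and blob-likeness is assumed to mean exposing getBytes() and getContentType(); its output could be appended to the existing verbose log message once classification is done.

```javascript
// Hypothetical helper for a post-classification debug log; not part of the PR.
function isBlobLike(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof value.getBytes === "function" &&
    typeof value.getContentType === "function"
  );
}

// Return a short, log-friendly description of a function's return value.
function describeResponseType(value) {
  if (value === null) {
    return "null";
  }
  if (Array.isArray(value)) {
    const blobCount = value.filter(isBlobLike).length;
    return `array(length=${value.length}, blobs=${blobCount})`;
  }
  if (isBlobLike(value)) {
    return `blob(${value.getContentType()})`;
  }
  return typeof value; // "string", "object", "number", ...
}
```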

🤖 Fix all issues with AI agents
In `@src/code.gs`:
- Around line 1779-1786: The helper createGeminiInlinePart currently accepts a
filename and sets inlineData.displayName, but displayName is not a supported
field so the parameter and mapping are unused; remove the filename parameter
from createGeminiInlinePart and delete the displayName property from the
returned inlineData object, and update any callers of createGeminiInlinePart to
stop passing a filename (or, if Gemini actually expects filename at a different
level, instead move that value out of inlineData to the correct parent field
where used); reference function name createGeminiInlinePart when making these
changes and ensure all call sites are adjusted accordingly.
- Around line 196-202: The addImage logic is constructing Gemini parts using
snake_case fields (inline_data, mime_type), inconsistent with the camelCase used elsewhere;
update addImage to call the existing createGeminiInlinePart helper instead of
building parts manually so the payload uses camelCase (inlineData, mimeType).
Locate addImage and replace its parts array construction with a call to
createGeminiInlinePart(fileInfo.mimeType, blobToBase64, fileInfo.fileName) (or
pass parameters in the helper's expected order), ensuring the parts property
uses the helper return value so all field names match the Gemini REST API.
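A sketch of what addImage could look like when it reuses a single helper, so plain user messages and tool responses share one field-naming convention. buildGeminiImageMessage and the fileInfo shape are hypothetical, and this two-argument helper deliberately leaves out the filename/displayName parameter discussed in the first bullet.

```javascript
// Hypothetical sketch; the PR's createGeminiInlinePart also accepts a filename.
function createGeminiInlinePart(mimeType, base64Data) {
  return { inlineData: { mimeType: mimeType, data: base64Data } };
}

// Build a user turn with a text prompt plus one image, using only camelCase keys.
function buildGeminiImageMessage(prompt, fileInfo, base64Data) {
  return {
    role: "user",
    parts: [
      { text: prompt },
      createGeminiInlinePart(fileInfo.mimeType, base64Data)
    ]
  };
}
```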


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/testFunctions.gs`:
- Around line 189-193: The receipt currently interpolates amount directly in
generateReceipt and can drop trailing zeros or show float artifacts; before
calling _createSimplePdf (and before embedding in the template array), format
the numeric amount to a deterministic two-decimal string (e.g., use
amount.toFixed(2) or Intl.NumberFormat with minimumFractionDigits:2) and then
interpolate that formatted string into the `Customer: ...`/`Amount: €...` lines
so receipts always show two decimal places; update generateReceipt to compute
formattedAmount and pass `Amount: €${formattedAmount}` to _createSimplePdf.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37fd6af and c71010d.

📒 Files selected for processing (1)
  • src/testFunctions.gs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/code.gs (1)

1594-1625: 🧹 Nitpick | 🔵 Trivial

OpenAI blob-output path is correct; minor log-placement inconsistency.

The OpenAI Responses API function_call_output.output accepts either a string or an array of ResponseInputFile/ResponseInputImage/ResponseInputText objects, and the docs explicitly state that for functions returning files, an array of file objects can be passed. The blobToResponseInputFileContent → array approach is therefore valid.

Minor nit: the Gemini handler logs function ${functionName}() called by Gemini before blob processing (line 1485), while the OpenAI handler logs the equivalent after blob processing (line 1624). Aligning the log position for consistency would make traces easier to correlate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/code.gs` around lines 1594 - 1625, The verbose logging for OpenAI handler
is placed after blob/object processing, causing inconsistent trace ordering with
the Gemini handler; move the console.log(`[GenAIApp] - function
${functionName}() called by OpenAI.`) so it runs immediately when the function
is invoked (i.e., before the block that inspects and converts functionResponse
using isBlobLike and blobToResponseInputFileContent), keeping the same verbose
conditional and message text to align log placement with the Gemini handler.
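A sketch of the OpenAI side that also applies the log-placement fix above. It assumes the Responses API shapes quoted in this comment: a function_call_output item with call_id and output, where output may be a string or an array of input_file objects carrying a base64 data URL in file_data. blobToInputFile and buildFunctionCallOutput are hypothetical stand-ins for the PR's blobToResponseInputFileContent and handler code, and Buffer replaces Utilities.base64Encode so it runs in Node.

```javascript
// Hypothetical sketch; field shapes are assumptions based on the review comment.
function isBlobLike(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof value.getBytes === "function" &&
    typeof value.getContentType === "function"
  );
}

// Convert a blob-like object into an input_file content item with a data URL.
function blobToInputFile(blob) {
  const base64 = Buffer.from(blob.getBytes()).toString("base64");
  return {
    type: "input_file",
    filename: blob.getName(),
    file_data: `data:${blob.getContentType()};base64,${base64}`
  };
}

function buildFunctionCallOutput(callId, functionName, result, verbose) {
  // Log first, matching the Gemini handler's ordering.
  if (verbose) {
    console.log(`[GenAIApp] - function ${functionName}() called by OpenAI.`);
  }
  let output;
  if (isBlobLike(result)) {
    output = [blobToInputFile(result)];
  } else if (Array.isArray(result) && result.length > 0 && result.every(isBlobLike)) {
    output = result.map(blobToInputFile);
  } else if (typeof result === "string") {
    output = result;
  } else {
    output = JSON.stringify(result); // non-blob objects are sent as a JSON string
  }
  return { type: "function_call_output", call_id: callId, output: output };
}
```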
♻️ Duplicate comments (2)
src/code.gs (2)

1781-1799: New helpers are correctly implemented.

createGeminiInlinePart uses the correct camelCase field names (inlineData, mimeType) required by the Gemini REST API, and displayName inside inlineData is a valid field — the Gemini docs confirm it must be unique when referenced via {"$ref": "..."} from functionResponse.response. Both helpers look good.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/code.gs` around lines 1781 - 1799, Both helpers are correctly
implemented: keep createGeminiInlinePart and blobToGeminiInlinePart as-is (they
use inlineData, mimeType and displayName correctly and properly base64-encode
blob bytes via Utilities.base64Encode(blob.getBytes())); no code changes
required—leave the functions createGeminiInlinePart and blobToGeminiInlinePart
unchanged and mark the change approved.

196-202: addFile now correctly uses createGeminiInlinePart (camelCase).

The fix is correct. Note that addImage (lines 119–124) still builds Gemini content with inline_data / mime_type (snake_case) rather than the same helper — that inconsistency was flagged in a prior review and remains unaddressed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/code.gs` around lines 196 - 202, addImage currently constructs Gemini
content using snake_case keys (inline_data, mime_type) instead of reusing the
helper createGeminiInlinePart like addFile does; update the addImage
implementation to call createGeminiInlinePart(fileInfo.mimeType, blobToBase64,
fileInfo.fileName) (or equivalent arg order used by createGeminiInlinePart) and
remove the manual inline_data/mime_type object so both addFile and addImage
consistently use the same camelCase helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/code.gs`:
- Around line 1494-1499: The current Array branch treats mixed arrays as
"non-blob" and lets Blob objects be stringified to "{}", losing data; update the
Array handling around functionResponse so you detect mixed arrays (use
functionResponse.some(isBlobLike) && functionResponse.some(x => !isBlobLike(x)))
and then either throw an error or filter with a clear warning. If failing fast:
throw a descriptive Error indicating a mixed blob/non-blob array was received;
if filtering: call processLogger.warn (or equivalent) and build multimodalParts
= functionResponse.filter(isBlobLike).map(blobToGeminiInlinePart) and set
jsonResponse = functionResponse.filter(x => !isBlobLike(x)) so blobs are
preserved and non-blob elements are handled separately instead of being lost
during JSON.stringify.
- Around line 1530-1534: The code pushes function results into contents with
role: "tool", which is invalid for generateContent; change the pushed object to
use role: "user" and ensure the pushed parts (responseParts) are formatted as
function responses (e.g., parts of the form { functionResponse: { name, response } }) so the
generateContent API accepts them; update the block that constructs the contents
array where responseParts is used (the push that currently sets role: "tool") to
set role: "user" and include the appropriate functionResponse part metadata.
- Around line 1515-1526: The code unconditionally adds parts to the
functionResponse (responseParts.push -> functionResponse) when multimodalParts
exists, which breaks Gemini 2.5 compatibility; update the logic to check the
active model/version (e.g., your model identifier or config used when calling
the API) and only include the parts field for Gemini 3+ models (Gemini 3
series/Vertex AI) while leaving it out for Gemini 2.5; also when parts are
supported and jsonResponse is empty, ensure the JSON response emitted by
jsonResponse (used with functionName) includes proper references to attached
parts using the {"$ref":"<displayName>"} format so the model can resolve the
blobs (use multimodalParts and functionName/jsonResponse as the reference
points).
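The three points above (role, model gating, $ref references) can be combined into one sketch. It assumes the Gemini 3 multimodal function-response shape described in this review, where a functionResponse carries its own parts and its JSON response points at them with {"$ref": "<displayName>"}. supportsMultimodalFunctionResponse is a hypothetical name-prefix heuristic, not an official capability check, and buildGeminiFunctionResponseTurn is likewise hypothetical.

```javascript
// Hypothetical heuristic: treat model names starting with "gemini-3" as supporting
// parts inside functionResponse. A real check may need a maintained allow-list.
function supportsMultimodalFunctionResponse(model) {
  return /^gemini-3/.test(model);
}

// Build the turn that returns a function result to Gemini.
// inlineParts are assumed to look like { inlineData: { mimeType, data, displayName } }.
function buildGeminiFunctionResponseTurn(model, functionName, jsonResponse, inlineParts) {
  const response = {};
  if (jsonResponse !== null && jsonResponse !== undefined) {
    // Wrapping keeps `response` a JSON object even for arrays or strings.
    response.result = jsonResponse;
  }
  const functionResponse = { name: functionName, response: response };
  if (inlineParts.length > 0 && supportsMultimodalFunctionResponse(model)) {
    functionResponse.parts = inlineParts;
    // Point the model at each attached file by its displayName (assumed unique).
    response.files = inlineParts.map((part) => ({ $ref: part.inlineData.displayName }));
  }
  // Function results go back in a "user" turn as functionResponse parts.
  return { role: "user", parts: [{ functionResponse: functionResponse }] };
}
```

On models that fail the check, this sketch simply does not attach the inline parts; a caller would still need to decide whether to warn or fall back.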


ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c71010d and d25a30f.

📒 Files selected for processing (2)
  • README.md
  • src/code.gs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
src/testFunctions.gs (1)

189-194: ⚠️ Potential issue | 🟡 Minor

Format currency to fixed decimal places.

The amount parameter is interpolated directly, which can produce floating-point artifacts or inconsistent decimal places in receipts (e.g., 42.5 instead of 42.50).

💵 Suggested fix
 function generateReceipt(customerName, amount) {
+  const formattedAmount = Number(amount).toFixed(2);
   return _createSimplePdf("receipt.pdf", "Receipt", [
     `Customer: ${customerName}`,
-    `Amount: €${amount}`
+    `Amount: €${formattedAmount}`
   ]);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/testFunctions.gs` around lines 189 - 194, The receipt prints raw numeric
`amount` which can show floating artifacts or inconsistent decimals; update
generateReceipt to format the amount to two decimal places (e.g., using
amount.toFixed(2) or Intl.NumberFormat for Euro) before interpolating so the
string passed to _createSimplePdf is like `Amount: €${formattedAmount}`; modify
generateReceipt (and any callers if necessary) to compute formattedAmount and
pass that into the array given to _createSimplePdf.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/testFunctions.gs`:
- Around line 160-165: The _escapeHtml helper currently escapes &, <, and > but
not quotes; update function _escapeHtml to also replace double quotes (") and
single quotes (') with &quot; and &#39; respectively so it is safe for use in
attribute contexts; modify the replace chain on _escapeHtml (the
String(value).replace(...).replace(...)) to include .replace(/"/g,
"&quot;").replace(/'/g, "&#39;") while preserving existing escapes and behavior.

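The escaping chain suggested above can be sketched as follows; this is an illustration of the reviewer's proposal rather than the PR's final code. "&" is replaced first so the entities produced by later replacements are not escaped a second time.

```javascript
// Escape text for HTML element content and quoted attribute values.
function _escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first to avoid double-escaping
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```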

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled


📥 Commits

Reviewing files that changed from the base of the PR and between d25a30f and 30aa615.

📒 Files selected for processing (1)
  • src/testFunctions.gs
