fix(inference): gate Responses fallback by endpoint host#4236

Open
YOMXXX wants to merge 4 commits into tinyhumansai:main from YOMXXX:fix/GH-4203-responses-host-gate

Conversation

@YOMXXX YOMXXX commented Jun 28, 2026

Contributor

Summary

  • Add an endpoint-host Responses API capability gate for cloud providers.
  • Keep custom / unknown slugs permissive unless their endpoint host matches a known built-in chat-completions-only provider.
  • Demote deterministic /responses route-missing 404s: cache the endpoint as unsupported, log at info, skip Sentry, and still propagate the terminal error.
  • Add regression coverage for NVIDIA-style custom slugs and the first /responses 404 demotion path.
  • Repair stale localized README links that block the required Markdown Link Check.

Problem

  • Issue #4203 ("Responses API: host-based capability gate for chat-only endpoints + demote first 404", TAURI-RUST-5A1) reports repeated Sentry noise when a custom or renamed provider slug points at a chat-only OpenAI-compatible host such as NVIDIA.
  • The previous slug-only gate allowed those custom slugs to attempt /responses, guaranteeing an extra 404 on fresh processes and a false-positive Sentry event.
  • The repository-wide Markdown Link Check currently fails on unrelated stale localized README links; those links must be fixed for this PR to become mergeable.

Solution

  • Introduce cloud_endpoint_supports_responses_api(slug, endpoint) so built-in slugs keep the explicit capability table while custom slugs are also checked against known built-in chat-only hosts.
  • Use that endpoint-aware gate when building bearer-auth cloud providers.
  • Treat route-missing /responses 404s on fallback paths as endpoint capability discovery instead of reportable provider failures.
  • Replace stale localized README links for Reddit, GitBook voice, and star-history so the docs link gate can pass.
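
A minimal sketch of how such a gate could behave, for readers who want the decision order spelled out. This is not the code in this PR: the host table, the built-in capability table, and the hand-rolled `endpoint_host` parser are assumptions made for illustration (the NVIDIA host shown is an assumed example entry), and only the name `cloud_endpoint_supports_responses_api(slug, endpoint)` comes from the description above.

```rust
/// Hypothetical list of hosts that only serve `/chat/completions`.
/// The real table in the PR is not shown here; this entry is an assumed example.
const CHAT_ONLY_HOSTS: &[&str] = &["integrate.api.nvidia.com"];

/// Hypothetical built-in capability table: `Some(bool)` for built-in slugs,
/// `None` for custom or unknown slugs.
fn builtin_supports_responses(slug: &str) -> Option<bool> {
    match slug {
        "openai" => Some(true),
        "nvidia" => Some(false),
        _ => None,
    }
}

/// Extract a lowercase host from an endpoint URL using only the standard library.
/// Returns `None` when no host is present. IPv6 literals are not handled.
fn endpoint_host(endpoint: &str) -> Option<String> {
    let rest = endpoint.trim();
    // Drop the scheme if there is one.
    let rest = rest.split_once("://").map(|(_, r)| r).unwrap_or(rest);
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    // Drop any userinfo, then any port.
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Built-in slugs follow the explicit table; custom slugs are permissive
/// unless their endpoint host is a known chat-only host.
pub fn cloud_endpoint_supports_responses_api(slug: &str, endpoint: &str) -> bool {
    if let Some(supported) = builtin_supports_responses(slug) {
        return supported;
    }
    match endpoint_host(endpoint) {
        Some(host) => !CHAT_ONLY_HOSTS.iter().any(|known| *known == host.as_str()),
        // An unparseable endpoint stays permissive.
        None => true,
    }
}
```

Under these assumptions, a custom slug such as `my-nim` pointed at the chat-only host returns `false`, while the same slug pointed at an unknown proxy host returns `true`.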

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/pr-ci.yml. Local full coverage was not run; required CI coverage gate will enforce this before merge.
  • Coverage matrix updated — N/A: behaviour-only provider fallback fix; no feature row added/removed/renamed.
  • All affected feature IDs from the matrix are listed in the PR description under ## Related — N/A: no matrix feature ID.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md) — N/A: provider fallback internals only.
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Runtime impact is limited to cloud inference provider selection and fallback error handling.
  • Custom OpenAI-compatible providers on unknown hosts remain allowed to use /responses fallback.
  • Known chat-only provider hosts avoid deterministic /responses retries and Sentry noise.
  • Docs impact is limited to stale link cleanup in localized README files.
  • No migration, security, or new network dependency impact.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

  • Key: TAURI-RUST-5A1
  • URL: N/A

Commit & Branch

Validation Run

  • pnpm --filter openhuman-app format:check — N/A: Rust-only app path; docs/Rust change.
  • pnpm typecheck — N/A: no frontend/TypeScript changes.
  • Focused tests:
    • GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib custom_slug_on_builtin_chat_only_host_does_not_expose_responses_api
    • GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib missing_responses_route_404_marks_unsupported_without_sentry_event
    • GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::inference::provider::factory::factory_tests
    • GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib openhuman::inference::provider::compatible::tests
    • GGML_NATIVE=OFF cargo test --manifest-path Cargo.toml --lib responses
  • Rust fmt/check (if changed):
    • cargo fmt --manifest-path Cargo.toml --check
    • git diff --check
    • GGML_NATIVE=OFF cargo check --manifest-path Cargo.toml
  • Tauri fmt/check (if changed): N/A: no Tauri changes.

Validation Blocked

  • command: lychee --version
  • error: local lychee binary is not installed.
  • impact: local link-check replication unavailable; GitHub Markdown Link Check validates the docs change.

Behavior Changes

  • Intended behavior change: custom slugs pointed at known chat-only provider hosts no longer advertise Responses API fallback.
  • User-visible effect: fewer duplicate failed provider calls and fewer false-positive Sentry reports for unsupported /responses routes.
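
The 404 demotion can be pictured with a small sketch. It is illustrative only: the type and function names are hypothetical, the predicate (status 404 on a path ending in `/responses`) is an assumption about what "route-missing" means, and `eprintln!` stands in for an info-level log call.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Outcome of a failed `/responses` fallback attempt (illustrative names).
#[derive(Debug, PartialEq, Eq)]
pub enum FallbackFailure {
    /// The route does not exist on this endpoint: capability discovery.
    RouteMissing,
    /// Any other failure remains a reportable provider error.
    Reportable,
}

/// Assumed predicate: a 404 whose request path ends in `/responses`.
/// The PR's actual predicate may inspect more than status and path.
pub fn classify_failure(status: u16, request_path: &str) -> FallbackFailure {
    let path = request_path.trim_end_matches('/');
    if status == 404 && path.ends_with("/responses") {
        FallbackFailure::RouteMissing
    } else {
        FallbackFailure::Reportable
    }
}

/// In-memory record of endpoints known to lack `/responses`.
#[derive(Default)]
pub struct UnsupportedEndpoints {
    seen: Mutex<HashSet<String>>,
}

impl UnsupportedEndpoints {
    pub fn mark(&self, endpoint: &str) {
        self.seen.lock().unwrap().insert(endpoint.to_string());
    }

    pub fn contains(&self, endpoint: &str) -> bool {
        self.seen.lock().unwrap().contains(endpoint)
    }
}

/// Returns `true` when the failure should be sent to the error tracker.
/// In both cases the caller still propagates the terminal error.
pub fn record_failure(
    cache: &UnsupportedEndpoints,
    endpoint: &str,
    status: u16,
    request_path: &str,
) -> bool {
    match classify_failure(status, request_path) {
        FallbackFailure::RouteMissing => {
            // Remember the endpoint so later calls skip the fallback.
            cache.mark(endpoint);
            // Stand-in for an info-level log line.
            eprintln!("info: {} has no /responses route; fallback disabled", endpoint);
            false
        }
        FallbackFailure::Reportable => true,
    }
}
```

Returning a boolean instead of calling the error tracker directly keeps the sketch testable without a Sentry client; the real wiring may differ.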

Parity Contract

  • Legacy behavior preserved: OpenAI still uses /responses; unknown custom proxies remain permissive.
  • Guard/fallback/dispatch parity checks: built-in capability table remains authoritative for built-in slugs; endpoint-host guard only narrows custom slugs on known chat-only hosts.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): N/A
  • Canonical PR: this PR
  • Resolution (closed/superseded/updated): N/A

Summary by CodeRabbit

  • New Features

    • Improved automatic detection of whether a provider endpoint supports the Responses API, making retries/fallbacks more reliable.
  • Bug Fixes

    • Tightened fallback gating for known “chat-completions-only” host endpoints, while keeping unknown/custom endpoints permissive.
    • When the /responses route returns 404, fallback handling proceeds without generating Sentry events.
  • Tests

    • Added regression coverage for 404 “missing /responses” fallback behavior and Sentry suppression.
  • Documentation

    • Updated README navigation/link details and refreshed Star History embeds across multiple languages.

@YOMXXX YOMXXX requested a review from a team June 28, 2026 13:08
coderabbitai Bot commented Jun 28, 2026

Contributor

📝 Walkthrough

Adds endpoint-host-aware Responses API fallback gating, updates missing-route 404 handling to log and suppress Sentry, and refreshes Reddit, native voice, and Star History links in localized READMEs.

Changes

Responses API host-based capability gate and 404 demotion

| Layer / File(s) | Summary |
| --- | --- |
| Host-based capability helper: src/openhuman/config/schema/cloud_providers.rs | Adds cloud_endpoint_supports_responses_api(slug, endpoint), private endpoint host helpers, the updated import, and a unit test covering custom slugs on chat-only and permissive hosts. |
| Factory wiring and missing-route handling: src/openhuman/inference/provider/factory.rs, src/openhuman/inference/provider/compatible_helpers.rs, src/openhuman/inference/provider/compatible_tests.rs | Replaces slug-based fallback gating with the endpoint-aware helper, extracts the missing-route 404 predicate, adds the info log for fallback disablement, and verifies the no-Sentry regression path. |

README link and embed updates

| Layer / File(s) | Summary |
| --- | --- |
| README navigation and embed links: docs/README.de.md, docs/README.ja-JP.md, docs/README.ko.md, docs/README.ur-pk.md, docs/README.zh-CN.md | Updates Reddit link formatting, native voice documentation targets, and Star History embed URLs across the localized README files. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • tinyhumansai/openhuman#4009: Adds the earlier builtin-slug Responses API gate that this PR extends to endpoint-host-aware detection.
  • tinyhumansai/openhuman#4065: Touches the same Responses fallback/unsupported caching path that this PR’s 404 handling and regression test exercise.
  • tinyhumansai/openhuman#4068: Modifies the same chat_via_responses missing-route 404 handling and Sentry behavior updated here.

Suggested labels

rust-core, sentry-traced-bug, bug

Suggested reviewers

  • oxoxDev
  • sanil-23

🐇 I hopped by the endpoint lane,
Sniffed the host and spared the pain.
A 404 got a quieter tune,
And README links now bloom in June.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | The localized README link fixes in docs/*.md are unrelated to the linked inference issue and go beyond the requested scope. | Move the README link cleanup to a separate docs-only PR or remove it from this change unless it is required for the issue. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise and accurately describes the main inference change: gating Responses fallback by endpoint host. |
| Linked Issues check | ✅ Passed | The PR implements the host-based gate and first-/responses 404 demotion requested in #4203, with regression coverage. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |

@coderabbitai coderabbitai Bot added the bug, rust-core (Core Rust runtime in src/: CLI, core_server, shared infrastructure), and sentry-traced-bug (Bug identified via Sentry triage) labels Jun 28, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
src/openhuman/config/schema/cloud_providers.rs (1)

240-254: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add a debug/trace breadcrumb for the host-based capability decision.

This new gate decides whether /responses stays enabled, but there is no debug trail for the builtin/custom branch, the permissive parse-failure path, or the final host-match result. Please log the normalized host and final decision here so routing changes are diagnosable in production. As per coding guidelines, "Add debug logging to entry/exit, branches, external calls, retries/timeouts, state transitions, and errors using log/tracing at debug/trace level in Rust".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/config/schema/cloud_providers.rs` around lines 240 - 254, Add a
debug/trace breadcrumb in cloud_endpoint_supports_responses_api so the
host-based /responses decision is observable in production. Log the
builtin/custom branch, the endpoint_host parse-failure fallback, and the
normalized host plus final boolean result using log/tracing at debug or trace
level. Use the existing cloud_endpoint_supports_responses_api and endpoint_host
flow to place the logs without changing the capability logic.

Source: Coding guidelines
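
One way to satisfy this nitpick without touching the capability logic is to have the gate report which branch it took, so the caller can emit a single debug line. The sketch below is hypothetical: `GateBranch`, `gate_with_breadcrumb`, and the message format are illustrative names, and the returned string stands in for whatever `log`/`tracing` call the codebase uses.

```rust
/// Which branch of the gate produced the decision (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateBranch {
    BuiltinTable,
    HostParseFailed,
    HostChatOnly,
    HostUnknown,
}

/// Decide and describe in one step.
/// `builtin` is the built-in table lookup for the slug, `host` is the
/// normalized endpoint host if parsing succeeded.
pub fn gate_with_breadcrumb(
    builtin: Option<bool>,
    host: Option<&str>,
    chat_only_hosts: &[&str],
) -> (bool, GateBranch, String) {
    let (supported, branch) = match (builtin, host) {
        // Built-in slugs: the explicit table is authoritative.
        (Some(s), _) => (s, GateBranch::BuiltinTable),
        // Custom slug, unparseable endpoint: stay permissive.
        (None, None) => (true, GateBranch::HostParseFailed),
        // Custom slug on a known chat-only host: disable `/responses`.
        (None, Some(h)) if chat_only_hosts.iter().any(|k| k.eq_ignore_ascii_case(h)) => {
            (false, GateBranch::HostChatOnly)
        }
        // Custom slug on an unknown host: stay permissive.
        (None, Some(_)) => (true, GateBranch::HostUnknown),
    };
    // The caller would pass this line to a debug- or trace-level logger.
    let line = format!(
        "responses gate: branch={:?} host={} supported={}",
        branch,
        host.unwrap_or("<none>"),
        supported
    );
    (supported, branch, line)
}
```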

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c33294d5-e3df-4e4f-ab9e-5fd6155bed69

📥 Commits

Reviewing files that changed from the base of the PR and between 5a41a4f and 7871d08.

📒 Files selected for processing (4)
  • src/openhuman/config/schema/cloud_providers.rs
  • src/openhuman/inference/provider/compatible_helpers.rs
  • src/openhuman/inference/provider/compatible_tests.rs
  • src/openhuman/inference/provider/factory.rs

coderabbitai Bot previously approved these changes Jun 28, 2026
@YOMXXX YOMXXX force-pushed the fix/GH-4203-responses-host-gate branch from be63c48 to 91c03cf June 28, 2026 13:53
YOMXXX commented Jun 28, 2026

Contributor Author

The localized README link fixes are intentionally included because the repository-wide required Markdown Link Check fails on current upstream/main for those stale links, even though #4203 is a Rust inference fix. They are isolated in the docs commit and unblock the required gate; no product behavior changes.

Labels

bug, rust-core (Core Rust runtime in src/: CLI, core_server, shared infrastructure), sentry-traced-bug (Bug identified via Sentry triage)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Responses API: host-based capability gate for chat-only endpoints + demote first 404 (TAURI-RUST-5A1)

1 participant