
feat: add console.x.ai routing so basic accounts can use the grok-4.3/4.20 series (web search enabled by default) #542

Open
cloudriver8 wants to merge 4 commits into chenyme:main from cloudriver8:feat/console-x-ai-routing

@cloudriver8

Summary

Routes 6 console-series models (grok-4.3 / grok-4 / grok-4.20 / grok-4.20-reasoning / grok-4.20-non-reasoning / grok-4.20-multi-agent) through console.x.ai/v1/responses. Free basic accounts can then call them without topping up account credit.

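Main features (feat)

  • New protocol layer: app/dataplane/reverse/protocol/xai_console.py

    • Supports the full Responses API request / response format
    • Structured input array: multimodal content (text + image_url/base64), conversation history, function_call / function_call_output
    • Native function calling: OpenAI-compatible tools / tool_choice are passed through as-is instead of going through ToolSieve XML injection
    • The instructions field aggregates role=system messages, which improves reasoning-model performance
    • Complete SSE streaming adapter, ConsoleStreamAdapter, handling 14+ upstream event types
  • Model registration: app/control/model/registry.py + app/control/model/spec.py

    • New console_model field marks models that use the console route
    • The 6 new public model names map to the real upstream model IDs
  • Routing / endpoints

    • app/dataplane/reverse/runtime/endpoint_table.py adds CONSOLE_RESPONSES
    • app/dataplane/reverse/planner.py routes models for which spec.is_console() is true to console
    • app/products/openai/chat.py adds _console_completions (streaming + non-streaming). It automatically injects the web_search tool, extracts search_sources, and adds them as a root-level field of the response
    • app/products/openai/responses.py adds _console_responses_dispatch, which passes the upstream Responses format and SSE events through directly
    • app/products/anthropic/messages.py reuses the Chat Completions bridge, so /v1/messages also supports the new models
  • Automatic bypass of exhausted credits: app/control/account/invalid_credentials.py + state_machine.py

    • HTTP 402 (trial credits used up) is mapped to FeedbackKind.RATE_LIMITED
    • The account pool automatically skips tokens whose credits are exhausted and switches to other available accounts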
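To make the message-to-payload translation above concrete, here is a minimal sketch of how Chat Completions messages could be folded into a Responses-style payload. Every role=system message is joined into instructions, the remaining messages become the input array, caller tools pass through unchanged, and a web_search tool is appended. This is not the PR's build_console_payload. The following are all assumptions:

- the function name build_payload_sketch
- the "\n\n" separator
- the {"type": "web_search"} tool shape
- plain-string message content

```python
from typing import Any, Optional


def build_payload_sketch(
    model: str,
    messages: list[dict[str, Any]],
    tools: Optional[list[dict[str, Any]]] = None,
    enable_web_search: bool = True,
) -> dict[str, Any]:
    """Hypothetical sketch: fold Chat Completions messages into a
    Responses-style payload. Assumes every message content is a string."""
    # Aggregate all role=system messages into one instructions string.
    system_parts = [m["content"] for m in messages if m.get("role") == "system"]
    # All non-system messages become the structured input array, in order.
    input_items = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m.get("role") != "system"
    ]
    payload: dict[str, Any] = {"model": model, "input": input_items}
    if system_parts:
        payload["instructions"] = "\n\n".join(system_parts)

    # Pass caller tools through unchanged (native function calling).
    merged_tools = list(tools or [])
    # Assumed tool shape; inject web_search only if it is not already present.
    if enable_web_search and not any(t.get("type") == "web_search" for t in merged_tools):
        merged_tools.append({"type": "web_search"})
    if merged_tools:
        payload["tools"] = merged_tools
    return payload
```

Example: with one system message "Be brief." and one user message, the sketch returns instructions="Be brief.", a one-item input array, and tools=[{"type": "web_search"}]. Keeping system text out of input mirrors the PR's rationale that reasoning models respond better to a dedicated instructions field.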
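The 402 handling can be pictured as two steps: classify the upstream status, then stop handing out the affected token. The sketch below uses hypothetical stand-ins (FeedbackKindSketch, classify_status, AccountPoolSketch) rather than the project's FeedbackKind or state machine. Mapping 402 to RATE_LIMITED comes from the PR; treating every other non-2xx status as a generic error is an assumption. The real pool likely also handles cooldowns and recovery, which this sketch omits.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FeedbackKindSketch(Enum):
    """Hypothetical stand-in for the project's FeedbackKind enum."""
    SUCCESS = "success"
    RATE_LIMITED = "rate_limited"
    OTHER_ERROR = "other_error"


def classify_status(status: int) -> FeedbackKindSketch:
    # Per the PR, HTTP 402 (trial credits exhausted) counts as rate limiting.
    if status == 402:
        return FeedbackKindSketch.RATE_LIMITED
    if 200 <= status < 300:
        return FeedbackKindSketch.SUCCESS
    # Treating every other status as a generic error is an assumption.
    return FeedbackKindSketch.OTHER_ERROR


@dataclass
class AccountPoolSketch:
    """Hypothetical pool that returns the first token not marked exhausted."""
    tokens: list[str]
    exhausted: set[str] = field(default_factory=set)

    def report(self, token: str, status: int) -> None:
        # Mark the token unusable once upstream reports exhausted credits.
        if classify_status(status) is FeedbackKindSketch.RATE_LIMITED:
            self.exhausted.add(token)

    def next_token(self) -> Optional[str]:
        # Skip exhausted tokens; None means no account can serve the request.
        for token in self.tokens:
            if token not in self.exhausted:
                return token
        return None
```

Reusing the rate-limited path means the pool needs no new state for "out of credits". It simply treats the token like any other temporarily unusable account.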

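Fixes (fix)

  • multi-agent source extraction: upstream grok-4.20-multi-agent does not emit web_search_call items. It publishes URLs only as message annotations (start_index == end_index == 0). The extractor previously read only web_search_call items, so search_sources for multi-agent was always empty. A new fallback extracts URLs from the annotations, deduplicates them, and injects them.

  • WebUI model dropdown filtering: the /webui/api/models endpoint previously listed every model with enabled=True, including super/heavy tiers that the account pool does not have at all. Selecting one failed with No available accounts for this model tier. The endpoint now reuses the same _available_pools + _model_available_for_pools filtering logic as /v1/models, so the two endpoints produce consistent output.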
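The multi-agent fallback described above can be sketched as a two-pass scan over a Responses-style output array. The first pass reads web_search_call items. The second pass reads url_citation annotations, deduplicating by URL so sources already collected are not repeated. This is a hypothetical sketch, not the PR's extract_console_search_sources. The item shapes are assumptions modeled on OpenAI Responses API conventions:

- action.sources[].url on web_search_call items
- top-level url/title on annotations

```python
from typing import Any, Optional


def extract_sources_sketch(output: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Hypothetical sketch: collect search sources from a Responses-style
    output array, falling back to url_citation annotations."""
    sources: list[dict[str, str]] = []
    seen: set[str] = set()

    def add(url: Optional[str], title: Optional[str] = None) -> None:
        # Dedupe on URL; drop titles that merely repeat the URL.
        if not url or url in seen:
            return
        seen.add(url)
        entry = {"url": url}
        if title and title != url:
            entry["title"] = title
        sources.append(entry)

    # Pass 1: web_search_call items (assumed shape: action.sources[].url).
    for item in output:
        if item.get("type") == "web_search_call":
            for src in (item.get("action") or {}).get("sources") or []:
                add(src.get("url"), src.get("title"))

    # Pass 2 (fallback): url_citation annotations on message content parts,
    # which is where multi-agent responses publish their URLs.
    for item in output:
        if item.get("type") != "message":
            continue
        for part in item.get("content") or []:
            for ann in part.get("annotations") or []:
                if ann.get("type") == "url_citation":
                    add(ann.get("url"), ann.get("title"))
    return sources
```

The fallback runs after the primary pass and skips URLs it has already seen. For a single-agent response whose annotations only cite URLs already found in web_search_call items, the result is identical to the primary pass alone.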

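No changes to existing behavior

  • The grok.com path (grok-4.20-fast / 0309-non-reasoning / imagine, etc.) is untouched
  • The new ModelSpec fields default to None, so existing models are unaffected
  • Older models such as grok-4.20-multi-agent-0309 remain in the registry (HEAVY tier), so users with heavy accounts can still use them
  • The WebUI filter shares its logic with /v1/models, so their behavior is aligned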
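Why a None default leaves old models alone can be shown with a small dataclass. In this hypothetical sketch, routing to the console endpoint depends only on whether console_model is set. ModelSpecSketch and pick_endpoint are illustrative names, "GROK_COM" is a placeholder for the existing grok.com endpoint, and "upstream-id" stands in for a real upstream model ID. Only console_model, default_reasoning_effort, is_console(), and CONSOLE_RESPONSES appear in the PR itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpecSketch:
    """Hypothetical subset of a model spec; field names other than
    console_model and default_reasoning_effort are illustrative."""
    name: str
    tier: str
    enabled: bool = True
    # Upstream console.x.ai model ID; None keeps the model on the grok.com path.
    console_model: Optional[str] = None
    default_reasoning_effort: Optional[str] = None

    def is_console(self) -> bool:
        return self.console_model is not None


def pick_endpoint(spec: ModelSpecSketch) -> str:
    # "GROK_COM" is a placeholder name for the existing grok.com endpoint.
    return "CONSOLE_RESPONSES" if spec.is_console() else "GROK_COM"
```

Because existing registry entries never pass console_model, they keep is_console() == False and take the unchanged grok.com route.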

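Background

  • Motivation: basic accounts cannot use advanced models such as grok-4.3 / 4.20 reasoning. However, xAI exposes a console.x.ai/v1/responses entry point that accepts the same SSO cookie.
  • Known limitation: the xAI server does not return the reasoning summary in plaintext, even with reasoning.summary='detailed' + include=['reasoning.encrypted_content']. The client sees only the reasoning_tokens count. This is upstream behavior, similar to OpenAI o1.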

Testing

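Automated checks

  • python -m py_compile: syntax check passes for all files
  • python -m pyflakes: only pre-existing unused-import warnings; no issues in the new code

Live deployment (AWS EC2 / Debian 12 / 88 basic accounts)

Non-streaming call to /v1/chat/completions: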

curl -X POST http://127.0.0.1:8000/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"grok-4.3","messages":[{"role":"user","content":"Who is the current CEO of OpenAI? One sentence."}]}'

Response (truncated):

{
  "model": "grok-4.3",
  "choices": [{"message": {
    "content": "The current CEO of OpenAI is Sam Altman.[[1]](https://www.clay.com/dossier/openai-ceo)[[2]](https://en.wikipedia.org/wiki/Sam_Altman)",
    "annotations": [{"type": "url_citation", "url_citation": {"url": "https://www.clay.com/dossier/openai-ceo", "title": "1", "start_index": 40, "end_index": 86}}]
  }}],
  "usage": {"prompt_tokens": 3806, "completion_tokens": 1022, "reasoning_tokens": 493, "total_tokens": 4828},
  "search_sources": [{"url": "https://en.wikipedia.org/wiki/Sam_Altman"}]
}
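The flattened usage object above (prompt_tokens, completion_tokens, reasoning_tokens, total_tokens) has to be derived from whatever the upstream reports. Below is a minimal sketch, assuming the console endpoint reports usage in the OpenAI Responses shape: input_tokens, output_tokens, total_tokens, and output_tokens_details.reasoning_tokens. The function name is hypothetical, and the PR does not show the upstream usage format.

```python
from typing import Any


def responses_usage_to_chat(usage: dict[str, Any]) -> dict[str, int]:
    """Hypothetical sketch: map Responses-style usage onto the flat
    Chat Completions usage shape shown in the example response."""
    prompt = int(usage.get("input_tokens", 0))
    completion = int(usage.get("output_tokens", 0))
    details = usage.get("output_tokens_details") or {}
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        # Reasoning text is not returned upstream, only this count.
        "reasoning_tokens": int(details.get("reasoning_tokens", 0)),
        # Fall back to the sum when the upstream omits total_tokens.
        "total_tokens": int(usage.get("total_tokens", prompt + completion)),
    }
```

Surfacing reasoning_tokens at the top level of usage matches the example response above. It is the only reasoning signal a client gets, given the known limitation on reasoning summaries.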

Multi-model / multi-endpoint real-world validation

Model                   | Endpoint             | Streaming / Result
grok-4.3                | /v1/chat/completions | ✅ 10 sources, reasoning=493
grok-4.3                | /v1/chat/completions | ✅ SSE OK
grok-4.3                | /v1/responses        | ✅ output array passed through
grok-4.3                | /v1/messages         | ✅ complete Anthropic SSE event stream
grok-4.20-non-reasoning | chat                 | ✅ 24 sources, reasoning=0
grok-4.20-reasoning     | chat                 | ✅ 5 sources, reasoning=417
grok-4.20-multi-agent   | chat                 | ✅ 5 sources (fallback active), reasoning=1609

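WebUI filtering verification

Before the fix: /webui/api/models returned 30+ models, including super/heavy tiers.
After the fix: the output matches /v1/models and lists only the 9 models that basic accounts can use: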

grok-4, grok-4.3, grok-4.20, grok-4.20-fast, grok-4.20-multi-agent,
grok-4.20-non-reasoning, grok-4.20-reasoning, grok-4.20-0309-non-reasoning,
grok-imagine-image-lite

Related

  • N/A

… fallback

Multi-agent console models (grok-4.20-multi-agent) skip web_search_call
output items entirely and publish citation URLs only as document-level
annotations on the assistant message with start_index == end_index == 0.
The previous extractor relied solely on web_search_call items, leaving
search_sources empty for these responses despite usage reporting real
reasoning_tokens and visible citations.

- extract_console_search_sources() now falls back to message annotations
  after exhausting web_search_call sources, deduping against URLs already
  collected so single-agent responses are unchanged.
- ConsoleStreamAdapter mirrors the same fallback inside the SSE handler
  for response.output_text.annotation.added events.
- Strip title when it duplicates the URL (common in multi-agent output).
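The streaming side of the same fallback can be sketched as a small collector fed one SSE line at a time. It picks up sources from both web_search_call items and response.output_text.annotation.added events, deduping by URL across the two. This is a hypothetical illustration, not the PR's ConsoleStreamAdapter. The payload field names are assumptions modeled on OpenAI Responses streaming events:

- item on response.output_item.done
- annotation on response.output_text.annotation.added

```python
import json
from typing import Any, Optional


class StreamSourceCollectorSketch:
    """Hypothetical sketch of the streaming fallback: gather sources from
    SSE data lines, deduping by URL across both event kinds."""

    def __init__(self) -> None:
        self.sources: list[dict[str, str]] = []
        self._seen: set[str] = set()

    def _add(self, url: Optional[str], title: Optional[str]) -> None:
        if not url or url in self._seen:
            return
        self._seen.add(url)
        entry = {"url": url}
        if title and title != url:  # strip titles that duplicate the URL
            entry["title"] = title
        self.sources.append(entry)

    def feed(self, line: str) -> None:
        # Only "data:" lines carry JSON payloads; ignore "event:" lines etc.
        if not line.startswith("data:"):
            return
        raw = line[len("data:"):].strip()
        if not raw or raw == "[DONE]":
            return
        event: dict[str, Any] = json.loads(raw)
        etype = event.get("type")
        if etype == "response.output_item.done":
            item = event.get("item") or {}
            if item.get("type") == "web_search_call":
                for src in (item.get("action") or {}).get("sources") or []:
                    self._add(src.get("url"), src.get("title"))
        elif etype == "response.output_text.annotation.added":
            ann = event.get("annotation") or {}
            if ann.get("type") == "url_citation":
                self._add(ann.get("url"), ann.get("title"))
```

Keeping one seen-set for both event kinds lets the stream and non-stream paths produce the same deduplicated source list, whichever kind of event arrives first.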

Verified end-to-end against grok-4.20-multi-agent on AWS deployment:
sources count goes from 0 to a non-empty list while annotations and
inline [[N]](url) citations remain intact.

The /webui/api/models endpoint listed every enabled ModelSpec regardless
of which account pool the deployment actually has. Users with only basic
accounts saw super/heavy-tier model names in the dropdown that
immediately failed with 'No available accounts for this model tier'
when selected.

Reuse the same _available_pools + _model_available_for_pools filter that
/v1/models already applies, so both endpoints stay in sync. Dropdown
now shows only the subset of enabled models the configured account pool
can actually serve.

Verified on AWS deployment with 88 basic accounts: WebUI dropdown shrinks
from ~30 entries to the 9 basic-tier-eligible models (console + grok.com
basic + image), eliminating the unreachable-tier confusion.
timi778 added a commit to joyce677/grok2api that referenced this pull request May 18, 2026
Import PR chenyme#542 from chenyme/grok2api with console.x.ai Responses routing for grok-4.3 and grok-4.20 model variants, default web search support, account feedback handling for 402 responses, and WebUI model availability filtering.
timi778 added a commit to timi778/grok2api that referenced this pull request May 18, 2026
highkay added a commit to highkay/grok2api that referenced this pull request May 19, 2026
@imjcal

imjcal commented May 19, 2026

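Reporting a bug: in your version, the new models are not affected by the Global Additional Instructions setting ("Injects a unified system message into every request to constrain model behavior or fix a persona."). It has no effect.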

- Add default_reasoning_effort field to ModelSpec

- Set high for grok-4/grok-4.3/grok-4.20, leave others unset

- Auto-fill default effort in chat.py and responses.py dispatch

- Forward reasoning_effort param through router.py

- Verified: reasoning_tokens 190->273 (+44%) without explicit effort
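The auto-fill described in this commit can be sketched as a lookup that applies only when the caller did not set reasoning_effort. Here a plain dict stands in for the default_reasoning_effort field on ModelSpec. The three "high" entries come from the commit message; the function name and dict are hypothetical.

```python
from typing import Any, Optional

# Per the commit message: "high" for grok-4 / grok-4.3 / grok-4.20, others unset.
DEFAULT_REASONING_EFFORT: dict[str, str] = {
    "grok-4": "high",
    "grok-4.3": "high",
    "grok-4.20": "high",
}


def fill_default_effort(model: str, request: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical sketch: add reasoning_effort only when the caller omitted it."""
    if request.get("reasoning_effort") is not None:
        return request  # an explicit caller value always wins
    default: Optional[str] = DEFAULT_REASONING_EFFORT.get(model)
    if default is None:
        return request  # model has no default; leave the request untouched
    return {**request, "reasoning_effort": default}
```

Returning a new dict rather than mutating the caller's request keeps the original payload intact for logging or retries.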
@cloudriver8
Author

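"Reporting a bug: in your version, the new models are not affected by the Global Additional Instructions setting ("Injects a unified system message into every request to constrain model behavior or fix a persona."). It has no effect."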

Not a code bug. build_console_payload (grok2api/app/dataplane/reverse/protocol/xai_console.py, lines 267-333) correctly reads from features.custom_instruction. The problem was that the value in config.toml had been placed under the [app] section instead of [features]. It works after moving it to the correct section.
