Skip to content

fix(kb): 修复知识库向量维度不一致时的错误提示与更新状态#7866

Open
Sisyphbaous-DT-Project wants to merge 4 commits intoAstrBotDevs:masterfrom
Sisyphbaous-DT-Project:fix/kb-validate-faiss-index-dimension
Open

fix(kb): 修复知识库向量维度不一致时的错误提示与更新状态#7866
Sisyphbaous-DT-Project wants to merge 4 commits intoAstrBotDevs:masterfrom
Sisyphbaous-DT-Project:fix/kb-validate-faiss-index-dimension

Conversation

@Sisyphbaous-DT-Project
Copy link
Copy Markdown
Contributor

@Sisyphbaous-DT-Project Sisyphbaous-DT-Project commented Apr 28, 2026

修复 #7794 喵。

这次 PR 主要修复知识库上传和更新过程中 embedding 维度不一致时的错误归因问题喵。

之前在已有 FAISS 索引维度和当前 embedding provider 配置维度不一致时,错误可能会在后续写入阶段才暴露,并且容易被包装成比较泛化的 storage error 喵。

这样用户看到的错误信息不够明确,也不容易判断到底是索引维度、embedding provider,还是文档写入本身出了问题喵。

Modifications / 改动点 喵

  • 在打开已有 FAISS index 时,增加 index 实际维度和当前 embedding provider 维度的校验喵。

  • 如果两者不一致,会直接抛出用户可读的 KnowledgeBaseUploadError,提示用户使用原 embedding 模型,或者删除并重建知识库索引喵。

  • 将批量生成 embedding 时的普通异常包装为 stage="embedding"KnowledgeBaseUploadError,避免继续被上层误报为写入知识库索引失败喵。

  • 调整 update_kb() 的失败处理逻辑,让知识库重新初始化失败时向 API 调用方返回错误,而不是静默返回旧 helper 并显示更新成功喵。

  • 在失败回滚时保留旧的知识库 helper 和旧的可用 vec_db,避免一次失败更新影响现有知识库继续使用喵。

  • 补充了 FAISS index 维度不匹配、维度一致、批量 embedding 失败包装,以及 update_kb() 初始化失败回滚的单元测试喵。

  • This is NOT a breaking change. / 这不是一个破坏性变更喵。

Screenshots or Test Results / 运行截图或测试结果 喵

本次改动已补充对应单元测试,覆盖以下场景喵。

  • 空内容批量插入时不会调用 embedding provider、document storage 或 embedding storage 喵。

  • embedding provider 返回的向量数量与文本分块数量不一致时,会返回明确的向量化失败信息喵。

  • 批量生成 embedding 抛出普通异常时,会被归类为 embedding 阶段错误,而不是 storage 阶段错误喵。

  • 已有 FAISS index 维度与当前 provider 维度不一致时,会直接抛出明确的维度不匹配错误喵。

  • 已有 FAISS index 维度与当前 provider 维度一致时,可以正常加载已有索引喵。

  • update_kb() 重新初始化失败时,会向调用方暴露错误,同时保留旧 helper 和旧 vec_db 喵。

  • update_kb() 重新初始化成功后,才会切换到新的 helper,并关闭旧 helper 喵。

建议验证命令如下喵。

pytest tests/unit/test_faiss_vec_db.py tests/unit/test_kb_manager_resilience.py

Checklist / 检查清单 喵

  • If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc. / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过喵。

  • My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above. / 我的更改经过了良好的测试,**并已在上方提供了“验证步骤”和“运行截图”**喵。

  • I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml. / 我确认这次没有引入新的依赖库喵。

  • My changes do not introduce malicious code. / 我的更改没有引入恶意代码喵。

Summary by Sourcery

Improve knowledge base update resilience and provide clearer, user-facing errors for FAISS index and embedding failures.

Bug Fixes:

  • Ensure knowledge base updates surface reinitialization failures to callers while keeping the previous helper and vector DB available.
  • Validate FAISS index dimensions against the configured embedding model and fail early when they mismatch, instead of deferring to later storage errors.
  • Classify batch embedding failures as embedding-stage errors rather than generic storage errors, preserving clearer error attribution.

Enhancements:

  • Add rollback logic that restores prior knowledge base configuration and init error state when reinitialization of a new helper fails.

Tests:

  • Extend FAISS vec DB tests to cover dimension mismatch/match, embedding count mismatch details, and wrapping of batch embedding failures into user-facing errors.
  • Extend knowledge base manager resilience tests to cover surfacing reinitialization errors, preserving old helpers, and switching helpers only after successful reinit.

@auto-assign auto-assign Bot requested review from Fridemn and advent259141 April 28, 2026 08:00
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Apr 28, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In KnowledgeBaseManager.update_kb, both KnowledgeBaseUploadError and generic exceptions are converted to ValueError, which drops the structured stage/details information; consider either re‑raising the original KnowledgeBaseUploadError or wrapping it while preserving its type and metadata so callers can programmatically distinguish embedding/index issues.
  • When wrapping generic failures in FaissVecDB.insert_batch into KnowledgeBaseUploadError, you only include content_count in details; consider also including the original exception type/message (or a sanitized version) in details to aid debugging while keeping user_message user‑friendly.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `KnowledgeBaseManager.update_kb`, both `KnowledgeBaseUploadError` and generic exceptions are converted to `ValueError`, which drops the structured `stage`/`details` information; consider either re‑raising the original `KnowledgeBaseUploadError` or wrapping it while preserving its type and metadata so callers can programmatically distinguish embedding/index issues.
- When wrapping generic failures in `FaissVecDB.insert_batch` into `KnowledgeBaseUploadError`, you only include `content_count` in `details`; consider also including the original exception type/message (or a sanitized version) in `details` to aid debugging while keeping `user_message` user‑friendly.

## Individual Comments

### Comment 1
<location path="astrbot/core/db/vec_db/faiss_impl/vec_db.py" line_range="120-127" />
<code_context>
+                max_retries=max_retries,
+                progress_callback=progress_callback,
+            )
+        except KnowledgeBaseUploadError:
+            raise
+        except Exception as exc:
+            raise KnowledgeBaseUploadError(
+                stage="embedding",
+                user_message=f"向量化失败:批量生成嵌入向量时出错。{exc}",
+                details={"content_count": content_count},
+            ) from exc
         end = time.time()
         logger.debug(
</code_context>
<issue_to_address>
**issue (bug_risk):** Broad `except Exception` also wraps cancellation and system-level errors into `KnowledgeBaseUploadError`.

This will also catch `asyncio.CancelledError` (and similar system-level exceptions), wrapping them in `KnowledgeBaseUploadError` and breaking cooperative cancellation/shutdown. Prefer explicitly re-raising cancellation and known errors, and only wrapping unexpected ones, e.g.:

```python
        except KnowledgeBaseUploadError:
            raise
        except asyncio.CancelledError:
            raise
        except Exception as exc:
            ...
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/db/vec_db/faiss_impl/vec_db.py
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the robustness of knowledge base management by introducing dimension validation for FAISS indices and improving error handling during the update process. Key changes include the addition of a rollback mechanism in KnowledgeBaseManager.update_kb to restore state if re-initialization fails, and the wrapping of embedding generation processes to surface structured errors via KnowledgeBaseUploadError. Feedback focuses on preserving the structured information within KnowledgeBaseUploadError by avoiding its conversion to a generic ValueError and suggests refactoring repetitive error-handling logic into a shared helper function to improve maintainability.

Comment thread astrbot/core/knowledge_base/kb_mgr.py Outdated
f"知识库 {kb.kb_name}({kb.kb_id}) 重新初始化失败,继续使用旧实例: {e}",
exc_info=True,
)
raise ValueError(str(e)) from e
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

KnowledgeBaseUploadError 重新包装为 ValueError 会导致原始异常中包含的结构化信息(如 stagedetails)丢失。KnowledgeBaseUploadError 专门用于携带用户可见的错误信息及其上下文,直接抛出该异常可以让上层调用方(如 API 接口)更好地利用这些详情来返回更精确的错误响应。建议直接抛出原始异常。此外,根据项目规范,新增的知识库附件处理功能应配备相应的单元测试。

Suggested change
raise ValueError(str(e)) from e
raise e
References
  1. New functionality, such as handling attachments, should be accompanied by corresponding unit tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里确实存在结构化错误信息丢失的问题。最初这里将 KnowledgeBaseUploadError 转成 ValueError,主要是为了避免 update_kb() 初始化失败时被路由层误判为更新成功;但这样会丢失原始异常里的 stagedetails。我已经调整为在 KnowledgeBaseUploadError 分支中保留原异常类型并直接上抛,只在通用异常分支里继续使用 ValueError 暴露初始化失败信息。这样既能避免“伪成功”,也能保留 embedding/index 相关错误的结构化上下文。

exc_info=True,
)
return kb_helper
raise ValueError(f"知识库重新初始化失败:{e}") from e
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

同上,将通用异常包装为 ValueError 可能会掩盖错误的本质。虽然这里是为了向 API 调用方暴露错误,但建议考虑使用更具体的异常类型(例如将其包装为 KnowledgeBaseUploadError),或者至少确保错误消息足够清晰且不丢失原始异常的上下文。另外,由于此处处理逻辑与前文高度相似,建议将其重构为共享的助手函数以避免代码重复。

References
  1. When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里关于重复逻辑抽取成共享 helper 的方向我认同,不过这次我没有继续做这一步重构。当前这次 PR 的目标是尽量以最小 diff 修复 #7794 对应的两个核心问题:索引维度校验,以及初始化失败时不要伪成功。目前已经先修复了更关键的一点:KnowledgeBaseUploadError 不再在 update_kb() 中被降级为 ValueError,从而保留了结构化错误信息。至于进一步抽取共享 helper,我倾向于放在后续单独的整理里处理,避免这次 PR scope 继续扩大。

@Sisyphbaous-DT-Project Sisyphbaous-DT-Project changed the title fix(kb): validate FAISS index dimension and surface embedding errors fix(kb): 修复知识库向量维度不一致时的错误提示与更新状态 Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant