Add transformers.js support for offline local embeddings #390
sharshenov wants to merge 2 commits into
Pull request overview
Adds a new local/offline embedding provider powered by Transformers.js so the server can generate embeddings without external services (e.g., Ollama or hosted APIs), targeting CPU-only deployments.
Changes:
- Introduces `TransformersJSEmbeddings` (Transformers.js + ONNX) with lazy model initialization and dimension auto-detection.
- Wires a new `transformers:` provider into embedding config/factory and DocumentStore initialization.
- Adds dependency + docs for configuration (including `TRANSFORMERS_DEVICE`) and caching guidance.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/store/embeddings/TransformersJSEmbeddings.ts | New embeddings implementation using @huggingface/transformers pipeline |
| src/store/embeddings/TransformersJSEmbeddings.test.ts | Unit tests with a mocked transformers pipeline |
| src/store/embeddings/EmbeddingFactory.ts | Adds transformers provider and instantiation logic |
| src/store/embeddings/EmbeddingConfig.ts | Extends supported provider union to include transformers |
| src/store/DocumentStore.ts | Uses provider-specific dimension detection for Transformers.js |
| package.json | Adds @huggingface/transformers runtime dependency |
| package-lock.json | Locks new HF/ONNX runtime dependency tree |
| docs/guides/embedding-models.md | Documents new provider usage and environment variables |
| README.md | Notes Docker cache persistence for Transformers.js models |
| "@alpinejs/collapse": "^3.15.8", | ||
| "@fastify/formbody": "^8.0.2", | ||
| "@fastify/static": "^8.3.0", | ||
| "@huggingface/transformers": "^4.1.0", |
Adding @huggingface/transformers as a required dependency pulls in native, install-scripted transitive deps (notably onnxruntime-node and sharp). This can cause install failures or much slower installs for users who never enable the transformers provider.
Consider making Transformers.js support optional via lazy/dynamic import plus optionalDependencies/feature flagging, or splitting it into a separate installable extra so the default install path stays lightweight.
If this dependency becomes optional, distribution via Docker gets more complicated. Should two separate Docker images be shipped then?
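One way to picture the lazy/dynamic import suggested above is the sketch below. It is not code from this PR: `loadOptionalModule` is a hypothetical helper, and the install hint in the error message is an assumption about how the project might want to guide users.

```typescript
// Hypothetical helper: load an optional package only when a feature needs it.
async function loadOptionalModule(
  specifier: string,
  feature: string,
): Promise<unknown> {
  try {
    // A non-literal specifier means the package does not have to be
    // installed for the rest of the code base to type-check and start.
    return await import(specifier);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(
      `The "${feature}" feature needs the optional package "${specifier}" ` +
        `(try: npm install ${specifier}). Original error: ${reason}`,
    );
  }
}
```

With something like this, the provider would call `loadOptionalModule("@huggingface/transformers", "transformers embeddings")` on first use and narrow the result before reading `pipeline` from it. A Docker image could still preinstall the package, so a single image might remain sufficient.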
| this.isInitialized = true; | ||
| } | ||
|
|
||
| return this.encoder!; |
This violates our TypeScript/Biome guidelines: non-null assertions are forbidden in this repo, and npm run lint flags this line. Please return a narrowed local value instead, e.g. assign/validate the encoder before returning it.
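A minimal sketch of the "narrowed local value" pattern, assuming a simplified shape for the class. `Encoder`, `LazyEncoderHolder`, and the injected `loadEncoder` factory are stand-ins for illustration, not the names used in `TransformersJSEmbeddings.ts`.

```typescript
// Stand-in for whatever the real feature-extraction pipeline returns (assumption).
type Encoder = (text: string) => Promise<number[]>;

class LazyEncoderHolder {
  private encoder: Encoder | undefined;
  private readonly loadEncoder: () => Promise<Encoder>;

  // `loadEncoder` is a hypothetical factory, injected to keep the sketch self-contained.
  constructor(loadEncoder: () => Promise<Encoder>) {
    this.loadEncoder = loadEncoder;
  }

  async getEncoder(): Promise<Encoder> {
    let encoder = this.encoder;
    if (encoder === undefined) {
      encoder = await this.loadEncoder();
      this.encoder = encoder;
    }
    // `encoder` is narrowed to `Encoder` on every path, so no `!` is needed.
    return encoder;
  }
}
```

Note that two concurrent first calls would each invoke the factory in this sketch; caching the in-flight promise instead of the resolved encoder is one way to avoid that.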
| ``` | ||
|
|
||
| **Pre-configured models:** | ||
| - `Xenova/bge-small-en-v1.5` (384d, recommended default) |
Low priority: these local models are 384d/768d, while the default database vector dimension remains 1536. It would be helpful to mention that users can set embeddings.vectorDimension / DOCS_MCP_EMBEDDINGS_VECTOR_DIMENSION to match the selected local model and avoid padding/storage overhead.
Good spot! I can try to detect the model's dimension early and auto-configure the server, while still allowing the setting to be overridden (as it can be currently).
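To make the overhead concrete, here is a rough sketch of what padding a 384d vector to the 1536d default costs. Both helpers are hypothetical, and the 32-bit float storage is an assumption; the PR does not show how vectors are encoded in the database.

```typescript
// Hypothetical helper: zero-pad an embedding up to a fixed database dimension.
function padToDimension(vector: number[], targetDimension: number): number[] {
  if (vector.length > targetDimension) {
    throw new Error(
      `Vector has ${vector.length} dimensions, more than the target ${targetDimension}`,
    );
  }
  const padding = new Array<number>(targetDimension - vector.length).fill(0);
  return vector.concat(padding);
}

// Bytes per stored vector, assuming 32-bit floats (an assumption about storage).
function float32Bytes(dimension: number): number {
  return dimension * Float32Array.BYTES_PER_ELEMENT;
}

const native = new Array<number>(384).fill(0.1);
const padded = padToDimension(native, 1536);
// Under the float32 assumption: 1536 bytes native vs 6144 bytes padded.
console.log(float32Bytes(native.length), float32Bytes(padded.length));
```

Under these assumptions, three quarters of each stored vector would be zeros, which is the overhead that matching `embeddings.vectorDimension` to the model avoids.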
Thanks for the contribution! This is a great addition overall and fits really well with the local/offline story of the project. Besides the inline comments already left, there are a few smaller gaps that would be helpful to address in the next update:
Thank you for putting this together.
| "sentence-transformers/static-similarity-mrl-multilingual-v1": 1024, | ||
| "manu/sentence_croissant_alpha_v0.3": 2048, | ||
| "BAAI/bge-small-en-v1.5": 512, | ||
| "BAAI/bge-small-en-v1.5": 384, |
I just found that the model is cached in the wrong directory in Docker, as if TRANSFORMERS_CACHE is not respected. I'll investigate and fix it later. UPD: fixed.
| import { TransformersJSEmbeddings } from "./TransformersJSEmbeddings"; | ||
|
|
| ModelConfigurationError, | ||
| UnsupportedProviderError, | ||
| } from "./embeddings/EmbeddingFactory"; | ||
| import { FixedDimensionEmbeddings } from "./embeddings/FixedDimensionEmbeddings"; | ||
| import { TransformersJSEmbeddings } from "./embeddings/TransformersJSEmbeddings"; | ||
| import { |
| // For Transformers.js, use the built-in dimension detection with timeout | ||
| const dimensionPromise = this.embeddings.getVectorDimension(); | ||
| let timeoutId: NodeJS.Timeout | undefined; | ||
| const timeoutPromise = new Promise<never>((_, reject) => { | ||
| timeoutId = setTimeout(() => { | ||
| reject( | ||
| new Error( | ||
| `Embedding service connection timed out after ${this.embeddingInitTimeoutMs / 1000} seconds`, | ||
| ), | ||
| ); | ||
| }, this.embeddingInitTimeoutMs); |
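The excerpt above is cut off before the race itself. A generic version of the same timeout pattern might look like the sketch below; `withTimeout` is a hypothetical helper name, not a function from this PR.

```typescript
// Hypothetical helper: reject if `work` does not settle within `timeoutMs`.
async function withTimeout<T>(
  work: Promise<T>,
  timeoutMs: number,
  message: string,
): Promise<T> {
  let timeoutId: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timeoutId = setTimeout(() => reject(new Error(message)), timeoutMs);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive or fire late.
    clearTimeout(timeoutId);
  }
}
```

Clearing the timer in `finally` covers both outcomes: a fast success leaves no pending timer behind, and a timeout does not leave a stale handle.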
| // Smart default for vectorDimension based on embedding provider | ||
| // Only applies when DOCS_MCP_EMBEDDINGS_VECTOR_DIMENSION is not explicitly set | ||
| // and the model is a transformers provider (avoids 1536d padding for 384d/768d models) | ||
| if ( | ||
| !process.env.DOCS_MCP_EMBEDDINGS_VECTOR_DIMENSION && | ||
| getAtPath(mergedInput, ["embeddings", "vectorDimension"]) === | ||
| DEFAULT_CONFIG.embeddings.vectorDimension | ||
| ) { | ||
| const modelSpec = getAtPath(mergedInput, ["app", "embeddingModel"]); | ||
| if (typeof modelSpec === "string" && modelSpec) { | ||
| const normalizedSpec = normalizeEnvValue(modelSpec); | ||
| const colonIndex = normalizedSpec.indexOf(":"); | ||
| const provider = | ||
| colonIndex === -1 ? "openai" : normalizedSpec.substring(0, colonIndex); | ||
|
|
||
| if (provider === "transformers") { | ||
| const model = | ||
| colonIndex === -1 ? normalizedSpec : normalizedSpec.substring(colonIndex + 1); | ||
| const knownDims = EmbeddingConfig.getKnownModelDimensions(model); | ||
| if (knownDims !== null) { | ||
| setAtPath(mergedInput, ["embeddings", "vectorDimension"], knownDims); | ||
| } | ||
| } | ||
| } | ||
| } |
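The same decision can be read as a small pure function. The sketch below is a simplification under stated assumptions: `parseModelSpec` and `resolveVectorDimension` are hypothetical names, the `explicit` parameter stands in for the environment variable and config checks, and the table only contains entries visible in this PR's diff.

```typescript
// Entries copied from the diff in this PR; the real table is larger.
const KNOWN_DIMENSIONS = new Map<string, number>([
  ["sentence-transformers/static-similarity-mrl-multilingual-v1", 1024],
  ["manu/sentence_croissant_alpha_v0.3", 2048],
  ["BAAI/bge-small-en-v1.5", 384],
]);

const DEFAULT_DIMENSION = 1536;

// Split "provider:model"; a spec without a colon is treated as an OpenAI model.
function parseModelSpec(spec: string): { provider: string; model: string } {
  const colonIndex = spec.indexOf(":");
  if (colonIndex === -1) {
    return { provider: "openai", model: spec };
  }
  return {
    provider: spec.substring(0, colonIndex),
    model: spec.substring(colonIndex + 1),
  };
}

// An explicit setting always wins; otherwise only known `transformers:`
// models get a model-specific default.
function resolveVectorDimension(spec: string, explicit?: number): number {
  if (explicit !== undefined) {
    return explicit;
  }
  const { provider, model } = parseModelSpec(spec);
  if (provider !== "transformers") {
    return DEFAULT_DIMENSION;
  }
  return KNOWN_DIMENSIONS.get(model) ?? DEFAULT_DIMENSION;
}
```

For example, `resolveVectorDimension("transformers:BAAI/bge-small-en-v1.5")` yields 384, while a bare `text-embedding-3-small` keeps the 1536 default.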
| vi.mock("./embeddings/TransformersJSEmbeddings", async () => { | ||
| const actual = await vi.importActual< | ||
| typeof import("./embeddings/TransformersJSEmbeddings") | ||
| >("./embeddings/TransformersJSEmbeddings"); | ||
| return { | ||
| TransformersJSEmbeddings: class extends (actual as any).TransformersJSEmbeddings { | ||
| async getVectorDimension() { | ||
| return mockGetVectorDimension(); | ||
| } | ||
| async embedQuery(text: string) { | ||
| return mockEmbedQuery(text); | ||
| } |
| | `requestTimeoutMs` | `30000` | Timeout for each embedding API request (ms). | | ||
| | `initTimeoutMs` | `30000` | Timeout for the initial test embedding during model initialization (ms). | | ||
| | `vectorDimension` | `1536` | Dimension of the vector space. Must be a positive integer (minimum 1). Override with `DOCS_MCP_EMBEDDINGS_VECTOR_DIMENSION`. Changing this value triggers a model change confirmation on next startup. | | ||
| | `vectorDimension` | `1536` (or model-specific for `transformers:` models) | Dimension of the vector space. Must be a positive integer (minimum 1). For Transformers.js models, the server auto-detects the model's native dimension (e.g., 384 for `bge-small-en-v1.5`). Override with `DOCS_MCP_EMBEDDINGS_VECTOR_DIMENSION`. Changing this value triggers a model change confirmation on next startup. | |
| ### Vector Dimension Override | ||
|
|
||
| The vector dimension defaults to the model's native dimension (e.g., 1536 for `text-embedding-3-small`). You can override it with `embeddings.vectorDimension` in the config file or `DOCS_MCP_EMBEDDINGS_VECTOR_DIMENSION` as an environment variable. The value must be a positive integer (minimum 1). | ||
| The vector dimension defaults to the model's native dimension (e.g., 1536 for `text-embedding-3-small`, 384 for `BAAI/bge-small-en-v1.5`). For Transformers.js models not from the list of known models, it is auto-detected via inference. For other providers, it is discovered by embedding a test string. You can override it with `embeddings.vectorDimension` in the config file or `DOCS_MCP_EMBEDDINGS_VECTOR_DIMENSION` as an environment variable. The value must be a positive integer (minimum 1). |
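"Discovered by embedding a test string" can be sketched as follows. `detectVectorDimension` and the `EmbedQuery` type are hypothetical; the server's actual probe text and error handling are not shown in this PR.

```typescript
// Stand-in for a provider's query-embedding function (assumption).
type EmbedQuery = (text: string) => Promise<number[]>;

// Embed a short probe string and report the length of the resulting vector.
async function detectVectorDimension(
  embedQuery: EmbedQuery,
  probe = "test",
): Promise<number> {
  const vector = await embedQuery(probe);
  if (vector.length === 0) {
    throw new Error("Embedding provider returned an empty vector");
  }
  return vector.length;
}
```

Because this runs real inference (or a real API call), it is the step that benefits from an initialization timeout such as `initTimeoutMs`.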
| Embeddings stored as BLOB in documents table: | ||
|
|
||
| - 1536-dimensional vectors by default (configurable via `embeddings.vectorDimension`) | ||
| - 1536-dimensional vectors by default or model-specific for `transformers:` models, e.g., 384 for `BAAI/bge-small-en-v1.5`, (configurable via `embeddings.vectorDimension`) |
| **Note:** When using Transformers.js embeddings, models are cached inside the container in /models. Mount the volume to persist the models cache: | ||
| ```bash | ||
| -v docs-mcp-models:/models |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Add Transformers.js support for completely local, offline embedding inference on CPU-only servers without requiring Ollama or external APIs.
This PR adds native Transformers.js support, enabling: