1 change: 1 addition & 0 deletions docs/docs.json
@@ -304,6 +304,7 @@
"expanded": true,
"pages": [
"tutorials/agents/index",
"tutorials/agents/nvidia-rag-blueprint/index",
"tutorials/agents/time-travel-rag/index",
"tutorials/agents/multimodal-agent/index"
]
1 change: 1 addition & 0 deletions docs/tutorials/agents/index.mdx
@@ -10,6 +10,7 @@ and agent applications built with LanceDB.
| Project | Description |
|:----------|:------------|
| **Contextual RAG** <br /> <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Contextual-RAG/Anthropic_Contextual_RAG.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a>[View on GitHub](https://github.com/lancedb/vectordb-recipes/tree/main/examples/Contextual-RAG) | Improves retrieval by combating the "lost in the middle" problem. This technique uses an LLM to generate succinct context for each document chunk, then prepends that context to the chunk before embedding, leading to more accurate retrieval. |
| **NVIDIA RAG Blueprint with LanceDB** <br /> [Read the tutorial](/tutorials/agents/nvidia-rag-blueprint/)<br />[View on GitHub](https://github.com/lancedb/vectordb-recipes/tree/main/examples/nvidia-rag-blueprint-lancedb) | Shows how to use LanceDB as the retrieval layer for NVIDIA RAG Blueprint with a Docker-first, retrieval-only integration path that includes hybrid search and pluggable rerankers. |
| **Matryoshka Embeddings** <br /> <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/RAG-with_MatryoshkaEmbed-Llamaindex/main.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a>[View on GitHub](https://github.com/lancedb/vectordb-recipes/tree/main/tutorials/RAG-with_MatryoshkaEmbed-Llamaindex) | Demonstrates a RAG pipeline using Matryoshka Embeddings with LanceDB and LlamaIndex. This method allows for efficient storage and retrieval of nested, variable-sized embeddings. |
| **HyDE (Hypothetical Document Embeddings)** <br /> <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Advance-RAG-with-HyDE/main.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a>[View on GitHub](https://github.com/lancedb/vectordb-recipes/tree/main/examples/Advance-RAG-with-HyDE) | An advanced RAG technique that uses an LLM to generate a "hypothetical" document in response to a query. This hypothetical document is then used to retrieve actual, similar documents, improving relevance. |
| **Late Chunking** <br /> <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Advanced_RAG_Late_Chunking/Late_Chunking_(Chunked_Pooling).ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a>[View on GitHub](https://github.com/lancedb/vectordb-recipes/tree/main/examples/Advanced_RAG_Late_Chunking) | An advanced RAG method that first embeds the entire document with a long-context embedding model and only then pools the token embeddings into per-chunk vectors (chunked pooling). Because each chunk embedding is computed with full-document context, it retains information that is lost when chunking happens before embedding. |
175 changes: 175 additions & 0 deletions docs/tutorials/agents/nvidia-rag-blueprint/index.mdx
@@ -0,0 +1,175 @@
---
title: "NVIDIA RAG Blueprint with LanceDB"
sidebarTitle: "NVIDIA RAG Blueprint"
description: "Use LanceDB as the retrieval layer for NVIDIA RAG Blueprint with a Docker-first, retrieval-only reference integration."
---

## What this tutorial shows

If you are using [NVIDIA RAG Blueprints](https://build.nvidia.com/blueprints) and want to evaluate LanceDB in that stack, this tutorial gives you a concrete starting point. It shows how to use LanceDB as the retrieval layer for a Docker-based NVIDIA RAG deployment through a small, script-driven reference integration: LanceDB OSS is embedded directly in the NVIDIA containers, the collection is prepared ahead of time, and the RAG server retrieves from it for search and generation. The example is intentionally retrieval-only, but it includes hybrid search and reranker selection so you can see how LanceDB fits into a realistic NVIDIA retrieval workflow.

<Tip>
The runnable example for this tutorial lives in the
[VectorDB recipes repository](https://github.com/lancedb/vectordb-recipes/tree/main/examples/nvidia-rag-blueprint-lancedb).
</Tip>


## How NVIDIA organizes vector databases

NVIDIA's [RAG Blueprint documentation](https://docs.nvidia.com/rag/latest/readme.html) describes three patterns for vector database support:

1. Built-in backends such as Milvus, where NVIDIA owns both ingestion and retrieval.
2. Built-in alternatives such as Elasticsearch, where NVIDIA still owns the end-to-end flow but switches the backend through configuration.
3. The custom vector database path, where you implement a `VDBRag` backend yourself and register it in NVIDIA's factory.

The LanceDB example shown below fits into the third category. More specifically, it follows NVIDIA's
**retrieval-only** custom backend path: the data is prepared in LanceDB ahead of time, and NVIDIA
RAG Blueprint is then pointed at that existing collection for search and generation. It does not
yet teach NVIDIA's ingestor how to write new documents into LanceDB automatically.

## Deployment model

This reference integration uses **LanceDB OSS as an embedded retrieval library**, not as a separate
database service. In practice, `APP_VECTORSTORE_NAME` is set to `lancedb`, `APP_VECTORSTORE_URL`
points to a local filesystem path inside the NVIDIA containers, the LanceDB collection is prepared
ahead of time, and the NVIDIA RAG server loads the LanceDB adapter to retrieve directly from that
local dataset.

## What the recipe contains

The recipe at
[`examples/nvidia-rag-blueprint-lancedb`](https://github.com/lancedb/vectordb-recipes/tree/main/examples/nvidia-rag-blueprint-lancedb)
is organized around a small number of practical pieces. The data-prep script builds a demo
LanceDB collection from scratch, generates embeddings through the LanceDB embedding registry, and
creates a full-text index so hybrid retrieval works immediately. The adapter file shows the
retrieval-only integration point for NVIDIA RAG Blueprint, while the Docker override and NVIDIA
change guide show the minimal configuration and source changes needed to run the example against
NVIDIA's containers.
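Mapped to files, those pieces are (only the files named in this tutorial are shown):

```text
examples/nvidia-rag-blueprint-lancedb/
├── prepare_lancedb.py            # data-prep script: builds the demo collection
├── lancedb_vdb.py                # retrieval-only adapter for NVIDIA RAG Blueprint
├── docker-compose.override.yml   # Docker override for the NVIDIA containers
└── nvidia_blueprint_changes.md   # guide to the NVIDIA source changes
```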

## End-to-end flow

### 1. Prepare the LanceDB collection

From the [recipe directory](https://github.com/lancedb/vectordb-recipes/tree/main/examples/nvidia-rag-blueprint-lancedb):

```bash
uv sync
uv run prepare_lancedb.py --embedder demo-keyword --reranker mrr
```

That script creates:

- a local LanceDB dataset under `data/`
- a collection named `nvidia_blueprint_demo`
- automatic embeddings generated at ingest time
- an FTS index for hybrid search

The default embedder is an offline demo embedder so the example stays easy to run. If you want a
more realistic setup, the same script can switch to a sentence-transformers embedder.
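The recipe's actual `demo-keyword` embedder is registered through the LanceDB embedding registry. As a rough stand-alone illustration of the idea behind an offline keyword embedder, here is a hypothetical sketch (the function name, hashing scheme, and 64-dimension size are illustrative, not the recipe's):

```python
import hashlib


def demo_keyword_embed(text: str, dims: int = 64) -> list[float]:
    """Hypothetical offline embedder: hash each token into a fixed-size
    bag-of-words vector, then L2-normalize. No model download required."""
    vec = [0.0] * dims
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]
```

An embedder like this is deterministic and dependency-free, which is what keeps the example easy to run; swapping in sentence-transformers trades that convenience for real semantic similarity.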

### 2. Patch the NVIDIA blueprint

Follow the instructions in the recipe's
[`nvidia_blueprint_changes.md`](https://github.com/lancedb/vectordb-recipes/tree/main/examples/nvidia-rag-blueprint-lancedb/nvidia_blueprint_changes.md).
The essential changes are:

- add LanceDB dependencies to the NVIDIA environment
- copy `lancedb_vdb.py` into the NVIDIA source tree
- register the `lancedb` branch in NVIDIA's VDB factory

NVIDIA's [RAG blueprint documentation](https://docs.nvidia.com/rag/latest/readme.html) and custom-VDB guide provide useful background if you want more context before applying the LanceDB-specific changes.
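The factory registration amounts to one extra branch. A hypothetical sketch of the shape of that change (NVIDIA's real factory and the adapter's constructor signature are defined in its source tree and in `nvidia_blueprint_changes.md`):

```python
def get_vdb(name: str, **kwargs):
    """Hypothetical sketch of NVIDIA's VDB factory with a lancedb branch added."""
    if name == "lancedb":
        # The real patch imports the adapter file copied into NVIDIA's tree.
        from lancedb_vdb import LanceDBVDB
        return LanceDBVDB(**kwargs)
    raise ValueError(f"unsupported vector store: {name}")
```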

### 3. Start the Docker deployment

Set the absolute path to the recipe directory:

```bash
export LANCEDB_RECIPE_DIR=/absolute/path/to/vectordb-recipes/examples/nvidia-rag-blueprint-lancedb
```

Then from the NVIDIA repo root:

```bash
docker compose \
-f deploy/compose/docker-compose-rag-server.yaml \
-f "$LANCEDB_RECIPE_DIR"/docker-compose.override.yml \
up -d --build

docker compose \
-f deploy/compose/docker-compose-ingestor-server.yaml \
-f "$LANCEDB_RECIPE_DIR"/docker-compose.override.yml \
up -d --build
```

The key environment values are:

- `APP_VECTORSTORE_NAME=lancedb`
- `APP_VECTORSTORE_URL=/opt/lancedb-recipe/data`
- `COLLECTION_NAME=nvidia_blueprint_demo`
- `APP_VECTORSTORE_SEARCHTYPE=hybrid`
- `LANCEDB_RERANKER=mrr`
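In compose terms, the override boils down to mounting the recipe directory into the containers and setting those variables. A hedged sketch of that shape (the service name below is illustrative; the recipe's actual `docker-compose.override.yml` is authoritative):

```yaml
services:
  rag-server:                       # illustrative service name
    environment:
      APP_VECTORSTORE_NAME: "lancedb"
      APP_VECTORSTORE_URL: "/opt/lancedb-recipe/data"
      COLLECTION_NAME: "nvidia_blueprint_demo"
      APP_VECTORSTORE_SEARCHTYPE: "hybrid"
      LANCEDB_RERANKER: "mrr"
    volumes:
      - ${LANCEDB_RECIPE_DIR}:/opt/lancedb-recipe
```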

## Verifying the integration

### Search

```bash
curl -X POST http://localhost:8081/v1/search \
-H 'Content-Type: application/json' \
-d '{
"query": "How do I replace Milvus in the NVIDIA RAG blueprint with LanceDB?",
"use_knowledge_base": true,
"collection_names": ["nvidia_blueprint_demo"],
"vdb_top_k": 3,
"reranker_top_k": 0
}'
```

### Generate

```bash
curl -N -X POST http://localhost:8081/v1/generate \
-H 'Content-Type: application/json' \
-d '{
"messages": [{"role":"user","content":"Summarize the LanceDB integration approach."}],
"use_knowledge_base": true,
"collection_names": ["nvidia_blueprint_demo"],
"vdb_top_k": 3,
"reranker_top_k": 0
}'
```

## Hybrid retrieval and rerankers

This example is meant to demonstrate more than a trivial vector lookup:

- LanceDB hybrid retrieval combines vector search with full-text search
- the recipe creates the FTS index as part of dataset prep
- the adapter supports `RRFReranker`, `MRRReranker`, and `CrossEncoderReranker`
- the default example uses `MRRReranker`, not a plain weighted linear combination

That matters for NVIDIA partner workloads because product names, storage platforms, and technical
jargon often need exact lexical matching as well as semantic retrieval.
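Of the listed rerankers, reciprocal rank fusion is the easiest to reason about: a result's fused score is the sum of 1/(k + rank) across the vector and FTS rankings it appears in, so no score normalization or hand-tuned weights are needed. A minimal stand-alone sketch of that rule (not LanceDB's implementation):

```python
def rrf_fuse(vector_ranking: list[str], fts_ranking: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in (vector_ranking, fts_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# A doc ranked well in both lists beats one that tops only a single list.
fused = rrf_fuse(["a", "b", "c"], ["b", "c", "a"])
```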

## How this can be extended

The current example follows NVIDIA's **custom retrieval-only backend** path. In practice, that
means the LanceDB collection is created ahead of time and NVIDIA RAG Blueprint is then pointed at
that existing collection for search and generation. The sample data in `prepare_lancedb.py` exists
only to make that flow runnable end to end: it creates a small local collection, inserts a few
documents, generates embeddings, and builds an FTS index so the NVIDIA side has something real to
query.

A fuller integration is possible. NVIDIA's custom `VDBRag` interface also supports the pattern used
by built-in backends such as Milvus and Elasticsearch, where NVIDIA owns both ingestion and
retrieval. To make LanceDB work that way, a complete LanceDB backend would need to implement the
ingestion methods NVIDIA documents, especially `create_collection` and `write_to_index`, along with
the retrieval and collection-management methods expected by the rest of the stack.

The open work is in defining how NVIDIA's ingestor should write
records into LanceDB, how that storage is shared between the ingestor and the RAG server, and how
document and collection metadata should be exposed so the broader NVIDIA APIs behave correctly.
Until those pieces exist, this example should be read as: prepare LanceDB first, then let NVIDIA
retrieve from it.