
feat: add LightRAG + DocumentDB playground #362

Open
hossain-rayhan wants to merge 8 commits into documentdb:main from hossain-rayhan:rayhan/lightrag-playground

@hossain-rayhan
Collaborator

Summary

Adds a self-contained playground under documentdb-playground/lightrag/ that deploys LightRAG — a graph-based RAG engine — backed by DocumentDB as its MongoDB-compatible storage layer, with Ollama for in-cluster LLM and embedding inference.

What's included

  • Sample DocumentDB instance (documentdb.yaml) sized for the playground, pinned to operator images 0.110.0 (required for the _id lookup fix).
  • Local Helm chart (helm/lightrag/) with an init container that no-ops three mongo_impl.py index-creation methods that DocumentDB doesn't support (createIndex with collation, $listSearchIndexes, Atlas $vectorSearch). LightRAG runs unmodified otherwise.
  • Ollama deployment (helm/ollama.yaml) with a 10 GiB PVC for model persistence, sized for qwen2.5:3b + nomic-embed-text on a Standard_D4-class node.
  • scripts/deploy.sh and scripts/cleanup.sh: deploy.sh pulls the DocumentDB connection string from status.connectionString, resolves the embedded credential lookups, swaps the ClusterIP for the in-cluster DNS name, strips the conflicting replicaSet=rs0 parameter, and runs helm upgrade --install on the chart; cleanup.sh removes the release and its resources.
  • README.md — architecture diagram, quick start, OpenAI swap, slow-CPU tuning section, compatibility-patches table, verification, and troubleshooting for the three failure modes I hit during validation (replica-set conflict, Ollama OOM, LLM_TIMEOUT).
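Among the URI clean-up steps deploy.sh performs before installing the chart is dropping replicaSet=rs0. A minimal bash sketch of that one step follows, assuming the parameter sits in the query string after the first `?`; `strip_replicaset_param` is a hypothetical name, not the function used in the PR, and the real script may do this differently (for example with sed).

```bash
# Hypothetical helper, not taken from deploy.sh: remove any replicaSet=...
# parameter from a MongoDB URI while keeping the other query parameters
# in their original order.
strip_replicaset_param() {
  local uri="$1" base query kept="" param
  local -a params
  # No query string: nothing to strip.
  if [[ "$uri" != *\?* ]]; then
    printf '%s\n' "$uri"
    return
  fi
  base="${uri%%\?*}"   # everything before the first '?'
  query="${uri#*\?}"   # everything after the first '?'
  IFS='&' read -ra params <<< "$query"
  for param in "${params[@]}"; do
    [[ "$param" == replicaSet=* ]] && continue
    kept="${kept:+$kept&}$param"
  done
  # Re-attach the '?' only if at least one parameter survived.
  printf '%s\n' "${base}${kept:+?$kept}"
}
```

In a deploy script it would be applied as `MONGO_URI="$(strip_replicaset_param "$MONGO_URI")"`.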

Verification

End-to-end deploy on AKS (2× Standard_D4, k8s 1.35, DocumentDB operator 0.2.0, images 0.110.0):

  • DocumentDB cluster healthy
  • helm install lightrag succeeds (revision 3)
  • Document insertion → entity extraction → graph build → query verified working through the WebUI
  • helm lint clean

Storage choices

| Store | Backend | Notes |
| --- | --- | --- |
| KV | MongoKVStorage | Documents, chunks, entities |
| Graph | MongoGraphStorage | Knowledge graph nodes and edges |
| Doc status | MongoDocStatusStorage | Per-document processing state |
| Vectors | NanoVectorDBStorage | Local file (PVC); DocumentDB has no $vectorSearch |
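As a rough illustration, the storage mapping above could be expressed as a Helm values override whose `env` map the chart renders into the pod's .env file. The `LIGHTRAG_*_STORAGE` variable names are assumed from LightRAG's environment-based configuration and are not copied from this PR's helm/lightrag-values.yaml; `write_storage_values` is a hypothetical helper.

```bash
# Hypothetical helper: write a Helm values override that selects the
# storage backends from the table. Variable names are assumptions based on
# LightRAG's env-based configuration, not copied from lightrag-values.yaml.
write_storage_values() {
  local out="$1"
  cat > "$out" <<'EOF'
env:
  LIGHTRAG_KV_STORAGE: MongoKVStorage
  LIGHTRAG_GRAPH_STORAGE: MongoGraphStorage
  LIGHTRAG_DOC_STATUS_STORAGE: MongoDocStatusStorage
  # DocumentDB has no $vectorSearch, so vectors stay on the local PVC.
  LIGHTRAG_VECTOR_STORAGE: NanoVectorDBStorage
EOF
}
```

The resulting file would be passed as an extra `-f` argument to `helm upgrade --install`.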

Scope

  • Adds files under documentdb-playground/lightrag/ only.
  • No changes to the operator, CRDs, Helm chart, or CI workflows.
  • The helm/lightrag chart is intentionally local-only — playground sample, not a published product chart.

Checklist

  • helm lint passes
  • bash -n clean on both scripts
  • No secrets / personal identifiers in tree
  • DCO sign-off on all commits
  • Verified end-to-end on AKS

German and others added 7 commits April 30, 2026 11:09
Add a self-contained playground demonstrating LightRAG (graph-based RAG
engine) using DocumentDB as its MongoDB-compatible storage backend.

Includes:
- Helm chart with init-container that patches LightRAG for DocumentDB
  compatibility (skips unsupported createIndex with collation)
- Ollama deployment manifest for in-cluster LLM inference
- Automated deploy.sh and cleanup.sh scripts
- Comprehensive README with architecture, setup, configuration,
  DocumentDB compatibility matrix, and troubleshooting guide

Storage mapping:
- KV, Graph, DocStatus → MongoKVStorage/MongoGraphStorage/MongoDocStatusStorage (DocumentDB)
- Vectors → NanoVectorDBStorage (local, since DocumentDB lacks $vectorSearch)

Tested end-to-end on Kind with DocumentDB Kubernetes Operator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace manual service/secret lookups with the connection string from
the DocumentDB resource status field, matching the pattern documented
in the official networking docs. The status field contains embedded
kubectl commands that are resolved via eval.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
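A minimal sketch of the eval step, assuming the status field embeds `$(...)` command substitutions. Here `printf` stands in for the real kubectl secret lookups so the example runs without a cluster, and `resolve_connection_string` is a hypothetical name. Wrapping the raw value in escaped double quotes (the `eval "echo \"...\""` pattern from the later quoting fix) lets eval run the substitutions while keeping `&` literal; an unquoted eval would treat `&` as the background operator. Because eval executes whatever the string contains, it should only be fed a status field from a cluster you trust.

```bash
# Hypothetical helper: expand the $(...) lookups embedded in a connection
# string while keeping '&' and '?' literal. eval runs arbitrary code, so only
# feed it a status field you trust.
resolve_connection_string() {
  eval "echo \"$1\""
}

# Stand-in for status.connectionString: printf replaces the real kubectl
# secret lookups so this runs without a cluster.
raw='mongodb://$(printf demo-user):$(printf demo-pass)@10.0.0.5:10260/?directConnection=true&tls=true'
MONGO_URI="$(resolve_connection_string "$raw")"
echo "$MONGO_URI"
```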
Document the three patched methods, why each is needed for DocumentDB
compatibility, how the init container applies the patches, and the
impact on functionality.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Findings from AKS E2E testing of the KEDA playground apply here too:
- Fix eval quoting: use eval "echo \"...\"" so that & in the connection
  string's query parameters is not interpreted as the shell background
  operator
- Replace ClusterIP with DNS name for cross-namespace service
  resolution (status.connectionString uses ClusterIP)
- Update README eval example to use the corrected two-step pattern

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
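The ClusterIP-to-DNS swap can be sketched as a small sed rewrite. This assumes the host in status.connectionString is an IPv4 literal followed by `:port` and that the cluster uses the default `cluster.local` domain. The DocumentDB service name is not spelled out here, so the caller passes it in; `swap_host_for_service_dns` is a hypothetical name.

```bash
# Hypothetical helper: replace the IPv4 ClusterIP in a MongoDB URI with the
# service's in-cluster DNS name so clients in other namespaces can resolve it.
# Assumes the default "cluster.local" cluster domain.
swap_host_for_service_dns() {
  local uri="$1" service="$2" namespace="$3"
  printf '%s\n' "$uri" |
    sed -E "s/@[0-9]{1,3}(\.[0-9]{1,3}){3}:/@${service}.${namespace}.svc.cluster.local:/"
}
```

In a deploy script the call would look like `swap_host_for_service_dns "$MONGO_URI" <documentdb-service-name> "$DOCUMENTDB_NAMESPACE"`.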
Signed-off-by: Rayhan Hossain <rhossain@microsoft.com>
Signed-off-by: Rayhan Hossain <rhossain@microsoft.com>
Copilot AI review requested due to automatic review settings April 30, 2026 18:18
Contributor

Copilot AI left a comment


Pull request overview

Adds a self-contained LightRAG playground under documentdb-playground/lightrag/ to demonstrate LightRAG running on Kubernetes with DocumentDB as the MongoDB-compatible backend and Ollama for in-cluster inference.

Changes:

  • Adds deploy/cleanup scripts to stand up Ollama + LightRAG and wire LightRAG to DocumentDB.status.connectionString.
  • Introduces a local-only Helm chart for LightRAG (Service/Deployment/PVC/Secret + compatibility patch logic).
  • Adds a sample DocumentDB manifest and a detailed README for setup, validation, and troubleshooting.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| documentdb-playground/lightrag/scripts/deploy.sh | Deploy script that applies Ollama, pulls models, derives MONGO_URI from DocumentDB status, and installs the LightRAG chart. |
| documentdb-playground/lightrag/scripts/cleanup.sh | Cleanup script for uninstalling the Helm release and deleting associated resources. |
| documentdb-playground/lightrag/helm/ollama.yaml | Standalone Ollama Namespace/PVC/Deployment/Service manifest. |
| documentdb-playground/lightrag/helm/lightrag/Chart.yaml | Helm chart metadata for the local LightRAG chart. |
| documentdb-playground/lightrag/helm/lightrag/values.yaml | Default values for the local LightRAG chart (image, resources, env, persistence). |
| documentdb-playground/lightrag/helm/lightrag/templates/deployment.yaml | LightRAG Deployment (init-container “patch” + main container + volumes). |
| documentdb-playground/lightrag/helm/lightrag/templates/service.yaml | Service exposing the LightRAG HTTP port. |
| documentdb-playground/lightrag/helm/lightrag/templates/secret.yaml | Secret providing a .env file to the pod. |
| documentdb-playground/lightrag/helm/lightrag/templates/pvc.yaml | PVCs for rag storage and inputs. |
| documentdb-playground/lightrag/helm/lightrag/templates/_helpers.tpl | Helper templates for names/labels and .env rendering. |
| documentdb-playground/lightrag/helm/lightrag/templates/NOTES.txt | Post-install instructions for accessing and testing LightRAG. |
| documentdb-playground/lightrag/helm/lightrag-values.yaml | Playground-specific override values (timeouts, storage mapping, default MONGO_URI placeholder). |
| documentdb-playground/lightrag/documentdb.yaml | Sample DocumentDB CR + credentials secret pinned to 0.110.0 images. |
| documentdb-playground/lightrag/README.md | End-to-end documentation, architecture, configuration, verification, and troubleshooting. |

- Helm v3.0+ and kubectl configured for your cluster
- ~10 GiB free memory on a single node (Ollama needs 4–8 GiB for the default
model, LightRAG needs ~2 GiB)
- Python with `pymongo` if you want to run the verification script in

Copilot AI Apr 30, 2026


Prerequisites mention "Python with pymongo" to run a verification script, but the Verification section only shows curl/jq commands and there is no verification script checked into this playground directory. Either add the referenced script or adjust the prerequisite text to match what's actually provided (e.g., require jq instead).

Suggested change
- Python with `pymongo` if you want to run the verification script in
- `jq` if you want to run the command-line checks in

Comment on lines +40 to +51
containers:
  - name: ollama
    image: ollama/ollama:latest
    ports:
      - containerPort: 11434
    resources:
      requests:
        cpu: 500m
        memory: 4Gi
      limits:
        cpu: "4"
        memory: 8Gi

Copilot AI Apr 30, 2026


The Ollama image is pinned to ollama/ollama:latest, which makes the playground non-reproducible and can break unexpectedly as upstream updates the tag. Consider pinning to a tested version (and optionally documenting how to override it) so the README's "tested version" claims remain accurate over time.

Comment on lines +13 to +25
NAMESPACE="${LIGHTRAG_NAMESPACE:-lightrag}"
DOCUMENTDB_NAMESPACE="${DOCUMENTDB_NAMESPACE:-documentdb-test}"
DOCUMENTDB_CLUSTER="${DOCUMENTDB_CLUSTER:-documentdb-cluster}"

echo "=== LightRAG + DocumentDB Deployment ==="

# 1. Create namespace and deploy Ollama
echo ""
echo "--- Step 1: Deploy Ollama ---"
kubectl apply -f "$OLLAMA_MANIFEST"
echo "Waiting for Ollama pod to be ready (first pull of ollama image can take a few minutes)..."
kubectl wait --for=condition=Ready pod -l app=ollama -n "$NAMESPACE" --timeout=300s


Copilot AI Apr 30, 2026


LIGHTRAG_NAMESPACE is advertised as configurable, but helm/ollama.yaml (applied with kubectl apply -f) hard-codes metadata.namespace: lightrag for the PVC, Deployment, and Service, and also creates Namespace/lightrag. If LIGHTRAG_NAMESPACE is set to anything else, the script waits for the Ollama pod in the wrong namespace, and helm upgrade -n $NAMESPACE fails unless that namespace already exists. Consider generating/applying the Ollama resources with the selected namespace (e.g., create the namespace in the script and template or sed-substitute the manifest, or move Ollama into the Helm chart so it inherits .Release.Namespace).
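A minimal sketch of the sed-substitution option, assuming the Namespace object is the only resource in helm/ollama.yaml whose name is `lightrag`; `render_ollama_manifest` is a hypothetical helper. deploy.sh would pipe its output into `kubectl apply -f -`, and cleanup.sh would then target the same `$NAMESPACE`.

```bash
# Hypothetical helper: rewrite the hard-coded "lightrag" namespace in the
# Ollama manifest read from stdin. Assumes the Namespace object is the only
# resource whose metadata.name is "lightrag".
render_ollama_manifest() {
  local namespace="$1"
  sed -E \
    -e "s/^([[:space:]]*)namespace: lightrag[[:space:]]*\$/\1namespace: ${namespace}/" \
    -e "s/^([[:space:]]*)name: lightrag[[:space:]]*\$/\1name: ${namespace}/"
}

# Intended use in deploy.sh:
#   render_ollama_manifest "$NAMESPACE" < "$OLLAMA_MANIFEST" | kubectl apply -f -
```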

Comment on lines +74 to +78
helm upgrade --install lightrag "$CHART_DIR" \
-n "$NAMESPACE" \
-f "$VALUES_FILE" \
--set "env.MONGO_URI=$MONGO_URI" \
--wait --timeout 5m

Copilot AI Apr 30, 2026


When the connection string can't be read, MONGO_URI is set to an empty string, but the Helm install still forces --set env.MONGO_URI=$MONGO_URI, which overwrites any manually configured env.MONGO_URI in the values file with empty. If the intent is to let users edit helm/lightrag-values.yaml when auto-detection fails, the script should skip the --set env.MONGO_URI=... override when MONGO_URI is empty (or fail fast and require the user to provide it).

Suggested change
helm upgrade --install lightrag "$CHART_DIR" \
-n "$NAMESPACE" \
-f "$VALUES_FILE" \
--set "env.MONGO_URI=$MONGO_URI" \
--wait --timeout 5m
HELM_ARGS=(
  upgrade --install lightrag "$CHART_DIR"
  -n "$NAMESPACE"
  -f "$VALUES_FILE"
  --wait --timeout 5m
)
if [ -n "$MONGO_URI" ]; then
  HELM_ARGS+=(--set "env.MONGO_URI=$MONGO_URI")
fi
helm "${HELM_ARGS[@]}"

Comment on lines +74 to +77
helm upgrade --install lightrag "$CHART_DIR" \
-n "$NAMESPACE" \
-f "$VALUES_FILE" \
--set "env.MONGO_URI=$MONGO_URI" \

Copilot AI Apr 30, 2026


Passing the MongoDB URI via helm --set env.MONGO_URI=... is fragile because Helm treats commas as key separators and some passwords/URIs can contain characters that require escaping. To make this robust, prefer --set-string plus escaping commas, or write a small temporary values YAML and pass it via -f so arbitrary URIs/passwords work reliably.

Suggested change
helm upgrade --install lightrag "$CHART_DIR" \
-n "$NAMESPACE" \
-f "$VALUES_FILE" \
--set "env.MONGO_URI=$MONGO_URI" \
MONGO_VALUES_FILE="$(mktemp)"
trap 'rm -f "$MONGO_VALUES_FILE"' EXIT
ESCAPED_MONGO_URI="$(printf '%s' "$MONGO_URI" | sed "s/'/''/g")"
cat >"$MONGO_VALUES_FILE" <<EOF
env:
  MONGO_URI: '$ESCAPED_MONGO_URI'
EOF
helm upgrade --install lightrag "$CHART_DIR" \
-n "$NAMESPACE" \
-f "$VALUES_FILE" \
-f "$MONGO_VALUES_FILE" \
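
The comment's other option, --set-string with escaped commas, could look like the sketch below. Helm's --set/--set-string parser splits values on unescaped commas and treats a backslash as an escape character, so this hypothetical `escape_helm_set_value` helper doubles backslashes first and then escapes commas.

```bash
# Hypothetical helper: escape a value for helm --set / --set-string.
# Backslashes are doubled first so that the backslashes added in front of
# commas are not themselves re-escaped.
escape_helm_set_value() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/,/\\,/g'
}

# Intended use:
#   --set-string "env.MONGO_URI=$(escape_helm_set_value "$MONGO_URI")"
```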

Comment on lines +5 to +24
NAMESPACE="${LIGHTRAG_NAMESPACE:-lightrag}"

echo "=== Cleaning up LightRAG deployment ==="

echo "Uninstalling LightRAG Helm release..."
helm uninstall lightrag -n "$NAMESPACE" 2>/dev/null || true

echo "Deleting PVCs..."
kubectl delete pvc -l app.kubernetes.io/name=lightrag -n "$NAMESPACE" 2>/dev/null || true

echo "Deleting Ollama..."
kubectl delete deployment ollama -n "$NAMESPACE" 2>/dev/null || true
kubectl delete service ollama -n "$NAMESPACE" 2>/dev/null || true
# ollama-models PVC is not labeled app.kubernetes.io/name=lightrag, so the
# label-selector delete above skips it. Delete it explicitly so a Ctrl-C
# before the namespace delete does not orphan the PV.
kubectl delete pvc ollama-models -n "$NAMESPACE" 2>/dev/null || true

echo "Deleting namespace..."
kubectl delete namespace "$NAMESPACE" 2>/dev/null || true

Copilot AI Apr 30, 2026


This script has the same namespace override problem as deploy.sh: it deletes resources from $NAMESPACE, but helm/ollama.yaml hard-codes namespace: lightrag. If LIGHTRAG_NAMESPACE is set to a different value, this cleanup will leave the Ollama resources (and possibly the lightrag namespace/PVC) behind. Either keep the namespace fixed everywhere, or make Ollama's namespace configurable and ensure cleanup targets the actual namespace used.

Comment on lines +246 to +247
| Kubernetes | 1.30+ |


Copilot AI Apr 30, 2026


The docs state the operator requires Kubernetes 1.35+ (because it uses ImageVolume). This README lists Kubernetes 1.30+ in the version compatibility table, which is inconsistent with the operator's documented prerequisites and could mislead users. Update this to 1.35+ (and/or call out that the operator, not LightRAG, drives the minimum cluster version).

Suggested change
| Kubernetes | 1.30+ |
| Kubernetes | 1.35+ |
The minimum Kubernetes version for this playground is driven by the
DocumentDB operator requirement for ImageVolume support, not by LightRAG
itself.
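A pre-flight check along these lines could make the 1.35+ requirement fail early in deploy.sh. It is a sketch, not part of the PR: `version_at_least` is a hypothetical helper that relies on GNU `sort -V` for version ordering, and the server version is assumed to come from `kubectl version -o json`.

```bash
# Hypothetical pre-flight helper: succeeds when $2 >= $1 in version order.
# Relies on `sort -V` (GNU coreutils) to compare dotted versions.
version_at_least() {
  local minimum="$1" actual="$2"
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n1)" = "$minimum" ]
}

# Example wiring (requires kubectl and jq; shown commented out):
#   server="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
#   version_at_least 1.35 "${server#v}" || {
#     echo "DocumentDB operator needs Kubernetes 1.35+ (ImageVolume); found ${server}" >&2
#     exit 1
#   }
```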


image:
  repository: ghcr.io/hkuds/lightrag
  tag: latest

Copilot AI Apr 30, 2026


Both this values file and the chart defaults use image.tag: latest for LightRAG. For a "last verified" playground, using latest makes deployments drift over time and can break without changes in this repo. Consider pinning the tag to a known-good version/digest (and document how to override) so the environment is reproducible.

Suggested change
  tag: latest
  # Pin to a known-good release for reproducible "last verified" playground
  # deployments. Override explicitly at deploy time if you need a newer image.
  tag: "<pin-to-known-good-version-or-digest>"

Comment on lines +19 to +72
# Init container patches LightRAG's MongoDB storage layer for DocumentDB
# compatibility. DocumentDB does not support createIndex with collation,
# $listSearchIndexes, or secondary index creation (it hangs). The patch
# stubs out these calls so initialization completes cleanly.
initContainers:
  - name: patch-for-documentdb
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
    command: ["/app/.venv/bin/python3", "-B", "-c"]
    args:
      - |
        import os, glob

        targets = [
            "/app/lightrag/kg/mongo_impl.py",
            "/app/.venv/lib/python3.12/site-packages/lightrag/kg/mongo_impl.py",
        ]
        for fp in targets:
            if not os.path.exists(fp):
                continue
            with open(fp) as f:
                content = f.read()
            if "Skipping index creation (DocumentDB)" in content:
                print(f"Already patched: {fp}")
                continue
            patched = content
            patched = patched.replace(
                "async def create_and_migrate_indexes_if_not_exists(self):",
                "async def create_and_migrate_indexes_if_not_exists(self):\n"
                "        logger.info(f'[{self.workspace}] Skipping index creation (DocumentDB)')\n"
                "        return\n"
                "    async def _orig_create_indexes(self):",
            )
            patched = patched.replace(
                "async def create_search_index_if_not_exists(self):",
                "async def create_search_index_if_not_exists(self):\n"
                "        logger.info(f'[{self.workspace}] Skipping search index (DocumentDB)')\n"
                "        return\n"
                "    async def _orig_create_search_index(self):",
            )
            patched = patched.replace(
                "async def create_vector_index_if_not_exists(self):",
                "async def create_vector_index_if_not_exists(self):\n"
                "        logger.info('Skipping vector index (DocumentDB)')\n"
                "        return\n"
                "    async def _orig_create_vector_index(self):",
            )
            with open(fp, "w") as f:
                f.write(patched)
            cache_dir = os.path.join(os.path.dirname(fp), "__pycache__")
            for cf in glob.glob(os.path.join(cache_dir, "mongo_impl*")):
                os.remove(cf)
            print(f"Patched: {fp}")
        print("DocumentDB compatibility patches applied")
containers:

Copilot AI Apr 30, 2026


The init container is patching /app/.../mongo_impl.py inside its own container filesystem, but the main container does not share that filesystem (only volumes are shared between containers). As written, these patches will not be visible to the lightrag container, so any DocumentDB compatibility issues the patch is meant to address will still occur. To make this work, apply the patch in the main container entrypoint before starting lightrag-server, or mount a shared volume at the target path(s) and have the init container write the patched file(s) into that shared volume (e.g., mount an emptyDir/projected volume and overlay the specific file via subPath).
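One way to follow the first suggestion (patch in the main container's entrypoint) is a small wrapper that edits mongo_impl.py in place and then execs the server. The sketch below swaps the init container's Python `str.replace` for GNU sed: rather than renaming the original body to an `_orig_*` method, it inserts `logger.info(...)` and `return` right after each `def` line, leaving the original body in place as unreachable code. `patch_mongo_impl` is hypothetical, `\n` in a sed replacement is a GNU extension, and the sketch assumes mongo_impl.py already defines `logger`, as the existing patch does.

```bash
set -euo pipefail

# Hypothetical entrypoint helper: make the three index-creation methods in
# LightRAG's mongo_impl.py return immediately. Uses GNU sed (\n in the
# replacement); the original method bodies stay in place but never run.
patch_mongo_impl() {
  local fp="$1" method
  [ -f "$fp" ] || return 0
  # Idempotent: skip files that were already patched.
  if grep -qF '(DocumentDB)' "$fp"; then
    return 0
  fi
  for method in create_and_migrate_indexes_if_not_exists \
                create_search_index_if_not_exists \
                create_vector_index_if_not_exists; do
    # Capture the def line's indentation (\1) and indent the new lines 4 more.
    sed -i -e "s/^\( *\)async def ${method}(self):\$/&\n\1    logger.info('Skipping ${method} (DocumentDB)')\n\1    return/" "$fp"
  done
  # Drop stale bytecode so Python recompiles the patched module.
  rm -f "$(dirname "$fp")"/__pycache__/mongo_impl*
}

# In the lightrag container, the wrapper would end with something like:
#   patch_mongo_impl /app/.venv/lib/python3.12/site-packages/lightrag/kg/mongo_impl.py
#   exec lightrag-server
```

Because the patch runs in the same container that starts the server, no shared volume or subPath overlay is needed.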

Comment on lines +22 to +25
{{- range $key, $val := .Values.env -}}
{{- if not $first -}}{{- "\n" -}}{{- end -}}
{{- $first = false -}}
{{ $key }}={{ $val }}

Copilot AI Apr 30, 2026


range $key, $val := .Values.env iterates over a map, which is not ordered. That makes the generated .env content non-deterministic, and it also makes checksum/config in the Deployment flap between upgrades even when values don't change, causing unnecessary rollouts. Consider iterating over keys .Values.env | sortAlpha and then indexing into the map for values, so both the Secret content and checksum are stable.

Suggested change
{{- range $key, $val := .Values.env -}}
{{- if not $first -}}{{- "\n" -}}{{- end -}}
{{- $first = false -}}
{{ $key }}={{ $val }}
{{- $env := .Values.env -}}
{{- range $key := keys $env | sortAlpha -}}
{{- if not $first -}}{{- "\n" -}}{{- end -}}
{{- $first = false -}}
{{ $key }}={{ index $env $key }}

documentdb-triage-tool (bot) added the documentation, ecosystem, and enhancement labels on Apr 30, 2026
@documentdb-triage-tool

🤖 Auto-triaged by documentdb-triage-tool.

Applied: ecosystem, documentation, enhancement
Project fields suggested: Component playground · Priority P2 · Effort L · Status Needs Review
Confidence: 0.92 (mixed)

Reasoning

component from path globs (playground, docs); effort from diff stats (848+0 LOC, 14 files); LLM: Adds a substantial new self-contained LightRAG playground under documentdb-playground/ with Helm charts, deploy scripts, and documentation, touching multiple files within the playground component.

If a label is wrong, remove it manually and ping @patty-chow so the rules can be tuned. The bot will not re-label items that already have component labels.

Signed-off-by: Rayhan Hossain <rhossain@microsoft.com>

2 participants