feat: add LightRAG + DocumentDB playground #362

hossain-rayhan wants to merge 8 commits into documentdb:main from
Conversation
Add a self-contained playground demonstrating LightRAG (graph-based RAG engine) using DocumentDB as its MongoDB-compatible storage backend. Includes:

- Helm chart with init-container that patches LightRAG for DocumentDB compatibility (skips unsupported createIndex with collation)
- Ollama deployment manifest for in-cluster LLM inference
- Automated deploy.sh and cleanup.sh scripts
- Comprehensive README with architecture, setup, configuration, DocumentDB compatibility matrix, and troubleshooting guide

Storage mapping:

- KV, Graph, DocStatus → MongoKVStorage/MongoGraphStorage (DocumentDB)
- Vectors → NanoVectorDBStorage (local, since DocumentDB lacks $vectorSearch)

Tested end-to-end on Kind with DocumentDB Kubernetes Operator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace manual service/secret lookups with the connection string from the DocumentDB resource status field, matching the pattern documented in the official networking docs. The status field contains embedded kubectl commands that are resolved via eval. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document the three patched methods, why each is needed for DocumentDB compatibility, how the init container applies the patches, and the impact on functionality. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Findings from AKS E2E testing of the KEDA playground apply here too:

- Fix eval quoting: use `eval "echo \"...\""` to prevent `&` in connection string query params from being interpreted as the shell background operator
- Replace ClusterIP with DNS name for cross-namespace service resolution (status.connectionString uses ClusterIP)
- Update README eval example to use the corrected two-step pattern

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
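The quoting fix described above can be sketched as follows. The template string here is illustrative (the real one comes from `status.connectionString` and embeds kubectl lookups), but the mechanics are the same: the connection string's `&` must be re-evaluated inside double quotes, not bare.

```shell
# Illustrative template: an embedded command substitution plus query params
# containing '&'. Single quotes defer all expansion until the eval step.
TEMPLATE='mongodb://$(echo admin)@host:10260/?tls=true&replicaSet=rs0'

# Broken: a bare eval would see the unquoted '&' and treat everything
# before it as a background job:
#   eval echo $TEMPLATE        # <- '&' becomes the background operator

# Corrected two-step pattern: wrap the expansion in escaped double quotes so
# eval re-parses the whole string as a single quoted word, resolving the
# embedded command substitution while keeping '&' literal.
RESOLVED=$(eval "echo \"$TEMPLATE\"")
echo "$RESOLVED"
```

The same pattern applies when the template embeds `$(kubectl get secret ...)` lookups, which is what the deploy script resolves.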
Signed-off-by: Rayhan Hossain <rhossain@microsoft.com>
Pull request overview
Adds a self-contained LightRAG playground under documentdb-playground/lightrag/ to demonstrate LightRAG running on Kubernetes with DocumentDB as the MongoDB-compatible backend and Ollama for in-cluster inference.
Changes:
- Adds deploy/cleanup scripts to stand up Ollama + LightRAG and wire LightRAG to DocumentDB.status.connectionString.
- Introduces a local-only Helm chart for LightRAG (Service/Deployment/PVC/Secret + compatibility patch logic).
- Adds a sample DocumentDB manifest and a detailed README for setup, validation, and troubleshooting.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| documentdb-playground/lightrag/scripts/deploy.sh | Deploy script that applies Ollama, pulls models, derives MONGO_URI from DocumentDB status, and installs the LightRAG chart. |
| documentdb-playground/lightrag/scripts/cleanup.sh | Cleanup script for uninstalling the Helm release and deleting associated resources. |
| documentdb-playground/lightrag/helm/ollama.yaml | Standalone Ollama Namespace/PVC/Deployment/Service manifest. |
| documentdb-playground/lightrag/helm/lightrag/Chart.yaml | Helm chart metadata for the local LightRAG chart. |
| documentdb-playground/lightrag/helm/lightrag/values.yaml | Default values for the local LightRAG chart (image, resources, env, persistence). |
| documentdb-playground/lightrag/helm/lightrag/templates/deployment.yaml | LightRAG Deployment (init-container “patch” + main container + volumes). |
| documentdb-playground/lightrag/helm/lightrag/templates/service.yaml | Service exposing the LightRAG HTTP port. |
| documentdb-playground/lightrag/helm/lightrag/templates/secret.yaml | Secret providing a .env file to the pod. |
| documentdb-playground/lightrag/helm/lightrag/templates/pvc.yaml | PVCs for rag storage and inputs. |
| documentdb-playground/lightrag/helm/lightrag/templates/_helpers.tpl | Helper templates for names/labels and .env rendering. |
| documentdb-playground/lightrag/helm/lightrag/templates/NOTES.txt | Post-install instructions for accessing and testing LightRAG. |
| documentdb-playground/lightrag/helm/lightrag-values.yaml | Playground-specific override values (timeouts, storage mapping, default MONGO_URI placeholder). |
| documentdb-playground/lightrag/documentdb.yaml | Sample DocumentDB CR + credentials secret pinned to 0.110.0 images. |
| documentdb-playground/lightrag/README.md | End-to-end documentation, architecture, configuration, verification, and troubleshooting. |
```markdown
- Helm v3.0+ and kubectl configured for your cluster
- ~10 GiB free memory on a single node (Ollama needs 4–8 GiB for the default
  model, LightRAG needs ~2 GiB)
- Python with `pymongo` if you want to run the verification script in
```
Prerequisites mention "Python with pymongo" to run a verification script, but the Verification section only shows curl/jq commands and there is no verification script checked into this playground directory. Either add the referenced script or adjust the prerequisite text to match what's actually provided (e.g., require jq instead).
Suggested change:

```diff
-- Python with `pymongo` if you want to run the verification script in
+- `jq` if you want to run the command-line checks in
```
```yaml
containers:
  - name: ollama
    image: ollama/ollama:latest
    ports:
      - containerPort: 11434
    resources:
      requests:
        cpu: 500m
        memory: 4Gi
      limits:
        cpu: "4"
        memory: 8Gi
```
The Ollama image is pinned to ollama/ollama:latest, which makes the playground non-reproducible and can break unexpectedly as upstream updates the tag. Consider pinning to a tested version (and optionally documenting how to override it) so the README's "tested version" claims remain accurate over time.
```shell
NAMESPACE="${LIGHTRAG_NAMESPACE:-lightrag}"
DOCUMENTDB_NAMESPACE="${DOCUMENTDB_NAMESPACE:-documentdb-test}"
DOCUMENTDB_CLUSTER="${DOCUMENTDB_CLUSTER:-documentdb-cluster}"

echo "=== LightRAG + DocumentDB Deployment ==="

# 1. Create namespace and deploy Ollama
echo ""
echo "--- Step 1: Deploy Ollama ---"
kubectl apply -f "$OLLAMA_MANIFEST"
echo "Waiting for Ollama pod to be ready (first pull of ollama image can take a few minutes)..."
kubectl wait --for=condition=Ready pod -l app=ollama -n "$NAMESPACE" --timeout=300s
```
LIGHTRAG_NAMESPACE is advertised as configurable, but kubectl apply -f helm/ollama.yaml hard-codes metadata.namespace: lightrag for the PVC/Deployment/Service (and creates Namespace/lightrag). If LIGHTRAG_NAMESPACE is set to anything else, the script will wait in the wrong namespace and helm upgrade -n $NAMESPACE will fail unless the namespace already exists. Consider generating/applying the Ollama resources with the selected namespace (e.g., create the namespace in the script and template/sed-substitute the manifest, or move Ollama into the Helm chart so it inherits .Release.Namespace).
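One possible shape for the sed-substitute approach suggested above, sketched under the assumption that the manifest hard-codes the literal string `namespace: lightrag` (the inline `MANIFEST` stand-in below is hypothetical; the real script would read `helm/ollama.yaml`):

```shell
# Hedged sketch: rewrite the hard-coded Ollama namespace to the selected one.
NAMESPACE="${LIGHTRAG_NAMESPACE:-demo-ns}"

# Stand-in for a fragment of helm/ollama.yaml with a hard-coded namespace.
MANIFEST='apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: lightrag'

# Substitute every "namespace: lightrag" occurrence; in deploy.sh the result
# would then be piped to `kubectl apply -f -`.
RENDERED=$(printf '%s\n' "$MANIFEST" | sed "s/namespace: lightrag/namespace: $NAMESPACE/")
printf '%s\n' "$RENDERED"

# In deploy.sh this would replace the plain `kubectl apply -f "$OLLAMA_MANIFEST"`:
#   kubectl create namespace "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -
#   sed "s/namespace: lightrag/namespace: $NAMESPACE/" "$OLLAMA_MANIFEST" | kubectl apply -f -
```

Moving Ollama into the Helm chart (so it inherits `.Release.Namespace`) avoids the text substitution entirely, at the cost of coupling its lifecycle to the release.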
```shell
helm upgrade --install lightrag "$CHART_DIR" \
  -n "$NAMESPACE" \
  -f "$VALUES_FILE" \
  --set "env.MONGO_URI=$MONGO_URI" \
  --wait --timeout 5m
```
When the connection string can't be read, MONGO_URI is set to an empty string, but the Helm install still forces --set env.MONGO_URI=$MONGO_URI, which overwrites any manually configured env.MONGO_URI in the values file with empty. If the intent is to let users edit helm/lightrag-values.yaml when auto-detection fails, the script should skip the --set env.MONGO_URI=... override when MONGO_URI is empty (or fail fast and require the user to provide it).
Suggested change:

```diff
-helm upgrade --install lightrag "$CHART_DIR" \
-  -n "$NAMESPACE" \
-  -f "$VALUES_FILE" \
-  --set "env.MONGO_URI=$MONGO_URI" \
-  --wait --timeout 5m
+HELM_ARGS=(
+  upgrade --install lightrag "$CHART_DIR"
+  -n "$NAMESPACE"
+  -f "$VALUES_FILE"
+  --wait --timeout 5m
+)
+if [ -n "$MONGO_URI" ]; then
+  HELM_ARGS+=(--set "env.MONGO_URI=$MONGO_URI")
+fi
+helm "${HELM_ARGS[@]}"
```
```shell
helm upgrade --install lightrag "$CHART_DIR" \
  -n "$NAMESPACE" \
  -f "$VALUES_FILE" \
  --set "env.MONGO_URI=$MONGO_URI" \
```
Passing the MongoDB URI via helm --set env.MONGO_URI=... is fragile because Helm treats commas as key separators and some passwords/URIs can contain characters that require escaping. To make this robust, prefer --set-string plus escaping commas, or write a small temporary values YAML and pass it via -f so arbitrary URIs/passwords work reliably.
Suggested change:

```diff
-helm upgrade --install lightrag "$CHART_DIR" \
-  -n "$NAMESPACE" \
-  -f "$VALUES_FILE" \
-  --set "env.MONGO_URI=$MONGO_URI" \
+MONGO_VALUES_FILE="$(mktemp)"
+trap 'rm -f "$MONGO_VALUES_FILE"' EXIT
+ESCAPED_MONGO_URI="$(printf '%s' "$MONGO_URI" | sed "s/'/''/g")"
+cat >"$MONGO_VALUES_FILE" <<EOF
+env:
+  MONGO_URI: '$ESCAPED_MONGO_URI'
+EOF
+helm upgrade --install lightrag "$CHART_DIR" \
+  -n "$NAMESPACE" \
+  -f "$VALUES_FILE" \
+  -f "$MONGO_VALUES_FILE" \
```
```shell
NAMESPACE="${LIGHTRAG_NAMESPACE:-lightrag}"

echo "=== Cleaning up LightRAG deployment ==="

echo "Uninstalling LightRAG Helm release..."
helm uninstall lightrag -n "$NAMESPACE" 2>/dev/null || true

echo "Deleting PVCs..."
kubectl delete pvc -l app.kubernetes.io/name=lightrag -n "$NAMESPACE" 2>/dev/null || true

echo "Deleting Ollama..."
kubectl delete deployment ollama -n "$NAMESPACE" 2>/dev/null || true
kubectl delete service ollama -n "$NAMESPACE" 2>/dev/null || true
# ollama-models PVC is not labeled app.kubernetes.io/name=lightrag, so the
# label-selector delete above skips it. Delete it explicitly so a Ctrl-C
# before the namespace delete does not orphan the PV.
kubectl delete pvc ollama-models -n "$NAMESPACE" 2>/dev/null || true

echo "Deleting namespace..."
kubectl delete namespace "$NAMESPACE" 2>/dev/null || true
```
This script has the same namespace override problem as deploy.sh: it deletes resources from $NAMESPACE, but helm/ollama.yaml hard-codes namespace: lightrag. If LIGHTRAG_NAMESPACE is set to a different value, this cleanup will leave the Ollama resources (and possibly the lightrag namespace/PVC) behind. Either keep the namespace fixed everywhere, or make Ollama's namespace configurable and ensure cleanup targets the actual namespace used.
```markdown
| Kubernetes | 1.30+ |
```
The docs state the operator requires Kubernetes 1.35+ (because it uses ImageVolume). This README lists Kubernetes 1.30+ in the version compatibility table, which is inconsistent with the operator's documented prerequisites and could mislead users. Update this to 1.35+ (and/or call out that the operator, not LightRAG, drives the minimum cluster version).
Suggested change:

```diff
-| Kubernetes | 1.30+ |
+| Kubernetes | 1.35+ |
```

The minimum Kubernetes version for this playground is driven by the DocumentDB operator requirement for ImageVolume support, not by LightRAG itself.
```yaml
image:
  repository: ghcr.io/hkuds/lightrag
  tag: latest
```
Both this values file and the chart defaults use image.tag: latest for LightRAG. For a "last verified" playground, using latest makes deployments drift over time and can break without changes in this repo. Consider pinning the tag to a known-good version/digest (and document how to override) so the environment is reproducible.
Suggested change:

```diff
-tag: latest
+# Pin to a known-good release for reproducible "last verified" playground
+# deployments. Override explicitly at deploy time if you need a newer image.
+tag: "<pin-to-known-good-version-or-digest>"
```
```yaml
# Init container patches LightRAG's MongoDB storage layer for DocumentDB
# compatibility. DocumentDB does not support createIndex with collation,
# $listSearchIndexes, or secondary index creation (it hangs). The patch
# stubs out these calls so initialization completes cleanly.
initContainers:
  - name: patch-for-documentdb
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
    command: ["/app/.venv/bin/python3", "-B", "-c"]
    args:
      - |
        import os, glob

        targets = [
            "/app/lightrag/kg/mongo_impl.py",
            "/app/.venv/lib/python3.12/site-packages/lightrag/kg/mongo_impl.py",
        ]
        for fp in targets:
            if not os.path.exists(fp):
                continue
            with open(fp) as f:
                content = f.read()
            if "Skipping index creation (DocumentDB)" in content:
                print(f"Already patched: {fp}")
                continue
            patched = content
            patched = patched.replace(
                "async def create_and_migrate_indexes_if_not_exists(self):",
                "async def create_and_migrate_indexes_if_not_exists(self):\n"
                "        logger.info(f'[{self.workspace}] Skipping index creation (DocumentDB)')\n"
                "        return\n"
                "    async def _orig_create_indexes(self):",
            )
            patched = patched.replace(
                "async def create_search_index_if_not_exists(self):",
                "async def create_search_index_if_not_exists(self):\n"
                "        logger.info(f'[{self.workspace}] Skipping search index (DocumentDB)')\n"
                "        return\n"
                "    async def _orig_create_search_index(self):",
            )
            patched = patched.replace(
                "async def create_vector_index_if_not_exists(self):",
                "async def create_vector_index_if_not_exists(self):\n"
                "        logger.info('Skipping vector index (DocumentDB)')\n"
                "        return\n"
                "    async def _orig_create_vector_index(self):",
            )
            with open(fp, "w") as f:
                f.write(patched)
            cache_dir = os.path.join(os.path.dirname(fp), "__pycache__")
            for cf in glob.glob(os.path.join(cache_dir, "mongo_impl*")):
                os.remove(cf)
            print(f"Patched: {fp}")
        print("DocumentDB compatibility patches applied")
containers:
```
The init container is patching /app/.../mongo_impl.py inside its own container filesystem, but the main container does not share that filesystem (only volumes are shared between containers). As written, these patches will not be visible to the lightrag container, so any DocumentDB compatibility issues the patch is meant to address will still occur. To make this work, apply the patch in the main container entrypoint before starting lightrag-server, or mount a shared volume at the target path(s) and have the init container write the patched file(s) into that shared volume (e.g., mount an emptyDir/projected volume and overlay the specific file via subPath).
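A minimal sketch of the shared-volume variant this comment suggests, with hypothetical volume and script names (the mount path assumes the site-packages location quoted above; the real chart would need to match its actual image layout): the init container writes the patched file into an emptyDir, and the main container overlays only that file via `subPath`, so the rest of the package still comes from the image.

```yaml
# Hypothetical sketch: share only the patched mongo_impl.py between containers.
volumes:
  - name: patched-kg            # illustrative volume name
    emptyDir: {}
initContainers:
  - name: patch-for-documentdb
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
    command: ["/bin/sh", "-c"]
    args:
      - |
        # Run the patch logic (as in the existing init container), then copy
        # the patched file into the shared volume for the main container.
        /app/.venv/bin/python3 -B /patch/apply_patch.py   # hypothetical script
        cp /app/.venv/lib/python3.12/site-packages/lightrag/kg/mongo_impl.py \
           /shared/mongo_impl.py
    volumeMounts:
      - name: patched-kg
        mountPath: /shared
containers:
  - name: lightrag
    volumeMounts:
      # Overlay the single patched file over the image's copy.
      - name: patched-kg
        mountPath: /app/.venv/lib/python3.12/site-packages/lightrag/kg/mongo_impl.py
        subPath: mongo_impl.py
```

Applying the patch in the main container's entrypoint before starting lightrag-server is the simpler alternative, since it needs no volume plumbing at all.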
```
{{- range $key, $val := .Values.env -}}
{{- if not $first -}}{{- "\n" -}}{{- end -}}
{{- $first = false -}}
{{ $key }}={{ $val }}
```
range $key, $val := .Values.env iterates over a map, which is not ordered. That makes the generated .env content non-deterministic, and it also makes checksum/config in the Deployment flap between upgrades even when values don't change, causing unnecessary rollouts. Consider iterating over keys .Values.env | sortAlpha and then indexing into the map for values, so both the Secret content and checksum are stable.
Suggested change:

```diff
-{{- range $key, $val := .Values.env -}}
-{{- if not $first -}}{{- "\n" -}}{{- end -}}
-{{- $first = false -}}
-{{ $key }}={{ $val }}
+{{- $env := .Values.env -}}
+{{- range $key := keys $env | sortAlpha -}}
+{{- if not $first -}}{{- "\n" -}}{{- end -}}
+{{- $first = false -}}
+{{ $key }}={{ index $env $key }}
```
|
🤖 Auto-triaged by documentdb-triage-tool. Applied. Reasoning: component from path globs (playground, docs); effort from diff stats (848+0 LOC, 14 files); LLM: Adds a substantial new self-contained LightRAG playground under documentdb-playground/ with Helm charts, deploy scripts, and documentation, touching multiple files within the playground component. If a label is wrong, remove it manually and ping
Signed-off-by: Rayhan Hossain <rhossain@microsoft.com>
Summary

Adds a self-contained playground under `documentdb-playground/lightrag/` that deploys LightRAG — a graph-based RAG engine — backed by DocumentDB as its MongoDB-compatible storage layer, with Ollama for in-cluster LLM and embedding inference.

What's included

- Sample DocumentDB CR (`documentdb.yaml`) sized for the playground, pinned to operator images `0.110.0` (required for the `_id` lookup fix).
- Local Helm chart (`helm/lightrag/`) with an init container that no-ops three `mongo_impl.py` index-creation methods that DocumentDB doesn't support (`createIndex` with collation, `$listSearchIndexes`, Atlas `$vectorSearch`). LightRAG runs unmodified otherwise.
- Ollama manifest (`helm/ollama.yaml`) with a 10 GiB PVC for model persistence, sized for `qwen2.5:3b` + `nomic-embed-text` on a Standard_D4-class node.
- `scripts/deploy.sh` / `scripts/cleanup.sh` — pulls the DocumentDB connection string from `status.connectionString`, resolves embedded credential lookups, swaps ClusterIP for in-cluster DNS, strips the conflicting `replicaSet=rs0` parameter, and `helm upgrade --install`s the chart.
- `README.md` — architecture diagram, quick start, OpenAI swap, slow-CPU tuning section, compatibility-patches table, verification, and troubleshooting for the three failure modes I hit during validation (replica-set conflict, Ollama OOM, `LLM_TIMEOUT`).

Verification

End-to-end deploy on AKS (2× Standard_D4, k8s 1.35, DocumentDB operator 0.2.0, images 0.110.0):

- `helm install lightrag` succeeds (revision 3)
- `helm lint` clean

Storage choices

| LightRAG store | Backend |
|---|---|
| KV | `MongoKVStorage` (DocumentDB) |
| Graph | `MongoGraphStorage` (DocumentDB) |
| Doc status | `MongoDocStatusStorage` (DocumentDB) |
| Vectors | `NanoVectorDBStorage` (local; DocumentDB lacks `$vectorSearch`) |

Scope

- Changes are limited to `documentdb-playground/lightrag/` only.
- The `helm/lightrag` chart is intentionally local-only — playground sample, not a published product chart.

Checklist

- `helm lint` passes
- `bash -n` clean on both scripts