feat: add LightRAG + DocumentDB playground #362

hossain-rayhan wants to merge 8 commits into documentdb:main from
Conversation
Add a self-contained playground demonstrating LightRAG (graph-based RAG engine) using DocumentDB as its MongoDB-compatible storage backend. Includes:

- Helm chart with init-container that patches LightRAG for DocumentDB compatibility (skips unsupported createIndex with collation)
- Ollama deployment manifest for in-cluster LLM inference
- Automated deploy.sh and cleanup.sh scripts
- Comprehensive README with architecture, setup, configuration, DocumentDB compatibility matrix, and troubleshooting guide

Storage mapping:

- KV, Graph, DocStatus → MongoKVStorage/MongoGraphStorage (DocumentDB)
- Vectors → NanoVectorDBStorage (local, since DocumentDB lacks $vectorSearch)

Tested end-to-end on Kind with DocumentDB Kubernetes Operator.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace manual service/secret lookups with the connection string from the DocumentDB resource status field, matching the pattern documented in the official networking docs. The status field contains embedded kubectl commands that are resolved via eval. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Document the three patched methods, why each is needed for DocumentDB compatibility, how the init container applies the patches, and the impact on functionality. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Findings from AKS E2E testing of the KEDA playground apply here too:

- Fix eval quoting: use `eval "echo \"...\""` to prevent `&` in connection string query params from being interpreted as the shell background operator
- Replace ClusterIP with DNS name for cross-namespace service resolution (status.connectionString uses ClusterIP)
- Update README eval example to use the corrected two-step pattern

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
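The quoting fix described above can be sketched as follows. The template string here is illustrative (the real one comes from `status.connectionString` and embeds kubectl lookups), but the mechanics are the same: the connection string's `&` must be re-evaluated inside double quotes, not bare.

```shell
# Illustrative template: an embedded command substitution plus query params
# containing '&'. Single quotes defer all expansion until the eval step.
TEMPLATE='mongodb://$(echo admin)@host:10260/?tls=true&replicaSet=rs0'

# Broken: a bare eval would see the unquoted '&' and treat everything
# before it as a background job:
#   eval echo $TEMPLATE        # <- '&' becomes the background operator

# Corrected two-step pattern: wrap the expansion in escaped double quotes so
# eval re-parses the whole string as a single quoted word, resolving the
# embedded command substitution while keeping '&' literal.
RESOLVED=$(eval "echo \"$TEMPLATE\"")
echo "$RESOLVED"
```

The same pattern applies when the template embeds `$(kubectl get secret ...)` lookups, which is what the deploy script resolves.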
Signed-off-by: Rayhan Hossain <rhossain@microsoft.com>
Pull request overview
Adds a self-contained LightRAG playground under documentdb-playground/lightrag/ to demonstrate LightRAG running on Kubernetes with DocumentDB as the MongoDB-compatible backend and Ollama for in-cluster inference.
Changes:
- Adds deploy/cleanup scripts to stand up Ollama + LightRAG and wire LightRAG to DocumentDB.status.connectionString.
- Introduces a local-only Helm chart for LightRAG (Service/Deployment/PVC/Secret + compatibility patch logic).
- Adds a sample DocumentDB manifest and a detailed README for setup, validation, and troubleshooting.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| documentdb-playground/lightrag/scripts/deploy.sh | Deploy script that applies Ollama, pulls models, derives MONGO_URI from DocumentDB status, and installs the LightRAG chart. |
| documentdb-playground/lightrag/scripts/cleanup.sh | Cleanup script for uninstalling the Helm release and deleting associated resources. |
| documentdb-playground/lightrag/helm/ollama.yaml | Standalone Ollama Namespace/PVC/Deployment/Service manifest. |
| documentdb-playground/lightrag/helm/lightrag/Chart.yaml | Helm chart metadata for the local LightRAG chart. |
| documentdb-playground/lightrag/helm/lightrag/values.yaml | Default values for the local LightRAG chart (image, resources, env, persistence). |
| documentdb-playground/lightrag/helm/lightrag/templates/deployment.yaml | LightRAG Deployment (init-container “patch” + main container + volumes). |
| documentdb-playground/lightrag/helm/lightrag/templates/service.yaml | Service exposing the LightRAG HTTP port. |
| documentdb-playground/lightrag/helm/lightrag/templates/secret.yaml | Secret providing a .env file to the pod. |
| documentdb-playground/lightrag/helm/lightrag/templates/pvc.yaml | PVCs for rag storage and inputs. |
| documentdb-playground/lightrag/helm/lightrag/templates/_helpers.tpl | Helper templates for names/labels and .env rendering. |
| documentdb-playground/lightrag/helm/lightrag/templates/NOTES.txt | Post-install instructions for accessing and testing LightRAG. |
| documentdb-playground/lightrag/helm/lightrag-values.yaml | Playground-specific override values (timeouts, storage mapping, default MONGO_URI placeholder). |
| documentdb-playground/lightrag/documentdb.yaml | Sample DocumentDB CR + credentials secret pinned to 0.110.0 images. |
| documentdb-playground/lightrag/README.md | End-to-end documentation, architecture, configuration, verification, and troubleshooting. |
```markdown
- Helm v3.0+ and kubectl configured for your cluster
- ~10 GiB free memory on a single node (Ollama needs 4–8 GiB for the default
  model, LightRAG needs ~2 GiB)
- Python with `pymongo` if you want to run the verification script in
```
Prerequisites mention "Python with pymongo" to run a verification script, but the Verification section only shows curl/jq commands and there is no verification script checked into this playground directory. Either add the referenced script or adjust the prerequisite text to match what's actually provided (e.g., require jq instead).
Suggested change:

```diff
-- Python with `pymongo` if you want to run the verification script in
+- `jq` if you want to run the command-line checks in
```
```yaml
containers:
  - name: ollama
    image: ollama/ollama:latest
    ports:
      - containerPort: 11434
    resources:
      requests:
        cpu: 500m
        memory: 4Gi
      limits:
        cpu: "4"
        memory: 8Gi
```
The Ollama image is pinned to ollama/ollama:latest, which makes the playground non-reproducible and can break unexpectedly as upstream updates the tag. Consider pinning to a tested version (and optionally documenting how to override it) so the README's "tested version" claims remain accurate over time.
```shell
NAMESPACE="${LIGHTRAG_NAMESPACE:-lightrag}"
DOCUMENTDB_NAMESPACE="${DOCUMENTDB_NAMESPACE:-documentdb-test}"
DOCUMENTDB_CLUSTER="${DOCUMENTDB_CLUSTER:-documentdb-cluster}"

echo "=== LightRAG + DocumentDB Deployment ==="

# 1. Create namespace and deploy Ollama
echo ""
echo "--- Step 1: Deploy Ollama ---"
kubectl apply -f "$OLLAMA_MANIFEST"
echo "Waiting for Ollama pod to be ready (first pull of ollama image can take a few minutes)..."
kubectl wait --for=condition=Ready pod -l app=ollama -n "$NAMESPACE" --timeout=300s
```
LIGHTRAG_NAMESPACE is advertised as configurable, but kubectl apply -f helm/ollama.yaml hard-codes metadata.namespace: lightrag for the PVC/Deployment/Service (and creates Namespace/lightrag). If LIGHTRAG_NAMESPACE is set to anything else, the script will wait in the wrong namespace and helm upgrade -n $NAMESPACE will fail unless the namespace already exists. Consider generating/applying the Ollama resources with the selected namespace (e.g., create the namespace in the script and template/sed-substitute the manifest, or move Ollama into the Helm chart so it inherits .Release.Namespace).
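One possible shape for the sed-substitute approach suggested above, sketched under the assumption that the manifest hard-codes the literal string `namespace: lightrag` (the inline `MANIFEST` stand-in below is hypothetical; the real script would read `helm/ollama.yaml`):

```shell
# Hedged sketch: rewrite the hard-coded Ollama namespace to the selected one.
NAMESPACE="${LIGHTRAG_NAMESPACE:-demo-ns}"

# Stand-in for a fragment of helm/ollama.yaml with a hard-coded namespace.
MANIFEST='apiVersion: v1
kind: Service
metadata:
  name: ollama
  namespace: lightrag'

# Substitute every "namespace: lightrag" occurrence; in deploy.sh the result
# would then be piped to `kubectl apply -f -`.
RENDERED=$(printf '%s\n' "$MANIFEST" | sed "s/namespace: lightrag/namespace: $NAMESPACE/")
printf '%s\n' "$RENDERED"

# In deploy.sh this would replace the plain `kubectl apply -f "$OLLAMA_MANIFEST"`:
#   kubectl create namespace "$NAMESPACE" --dry-run=client -o yaml | kubectl apply -f -
#   sed "s/namespace: lightrag/namespace: $NAMESPACE/" "$OLLAMA_MANIFEST" | kubectl apply -f -
```

Moving Ollama into the Helm chart (so it inherits `.Release.Namespace`) avoids the text substitution entirely, at the cost of coupling its lifecycle to the release.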
```shell
helm upgrade --install lightrag "$CHART_DIR" \
  -n "$NAMESPACE" \
  -f "$VALUES_FILE" \
  --set "env.MONGO_URI=$MONGO_URI" \
  --wait --timeout 5m
```
When the connection string can't be read, MONGO_URI is set to an empty string, but the Helm install still forces --set env.MONGO_URI=$MONGO_URI, which overwrites any manually configured env.MONGO_URI in the values file with empty. If the intent is to let users edit helm/lightrag-values.yaml when auto-detection fails, the script should skip the --set env.MONGO_URI=... override when MONGO_URI is empty (or fail fast and require the user to provide it).
Suggested change:

```diff
-helm upgrade --install lightrag "$CHART_DIR" \
-  -n "$NAMESPACE" \
-  -f "$VALUES_FILE" \
-  --set "env.MONGO_URI=$MONGO_URI" \
-  --wait --timeout 5m
+HELM_ARGS=(
+  upgrade --install lightrag "$CHART_DIR"
+  -n "$NAMESPACE"
+  -f "$VALUES_FILE"
+  --wait --timeout 5m
+)
+if [ -n "$MONGO_URI" ]; then
+  HELM_ARGS+=(--set "env.MONGO_URI=$MONGO_URI")
+fi
+helm "${HELM_ARGS[@]}"
```
```shell
helm upgrade --install lightrag "$CHART_DIR" \
  -n "$NAMESPACE" \
  -f "$VALUES_FILE" \
  --set "env.MONGO_URI=$MONGO_URI" \
```
Passing the MongoDB URI via helm --set env.MONGO_URI=... is fragile because Helm treats commas as key separators and some passwords/URIs can contain characters that require escaping. To make this robust, prefer --set-string plus escaping commas, or write a small temporary values YAML and pass it via -f so arbitrary URIs/passwords work reliably.
Suggested change:

```diff
-helm upgrade --install lightrag "$CHART_DIR" \
-  -n "$NAMESPACE" \
-  -f "$VALUES_FILE" \
-  --set "env.MONGO_URI=$MONGO_URI" \
+MONGO_VALUES_FILE="$(mktemp)"
+trap 'rm -f "$MONGO_VALUES_FILE"' EXIT
+ESCAPED_MONGO_URI="$(printf '%s' "$MONGO_URI" | sed "s/'/''/g")"
+cat >"$MONGO_VALUES_FILE" <<EOF
+env:
+  MONGO_URI: '$ESCAPED_MONGO_URI'
+EOF
+helm upgrade --install lightrag "$CHART_DIR" \
+  -n "$NAMESPACE" \
+  -f "$VALUES_FILE" \
+  -f "$MONGO_VALUES_FILE" \
```
```shell
NAMESPACE="${LIGHTRAG_NAMESPACE:-lightrag}"

echo "=== Cleaning up LightRAG deployment ==="

echo "Uninstalling LightRAG Helm release..."
helm uninstall lightrag -n "$NAMESPACE" 2>/dev/null || true

echo "Deleting PVCs..."
kubectl delete pvc -l app.kubernetes.io/name=lightrag -n "$NAMESPACE" 2>/dev/null || true

echo "Deleting Ollama..."
kubectl delete deployment ollama -n "$NAMESPACE" 2>/dev/null || true
kubectl delete service ollama -n "$NAMESPACE" 2>/dev/null || true
# ollama-models PVC is not labeled app.kubernetes.io/name=lightrag, so the
# label-selector delete above skips it. Delete it explicitly so a Ctrl-C
# before the namespace delete does not orphan the PV.
kubectl delete pvc ollama-models -n "$NAMESPACE" 2>/dev/null || true

echo "Deleting namespace..."
kubectl delete namespace "$NAMESPACE" 2>/dev/null || true
```
This script has the same namespace override problem as deploy.sh: it deletes resources from $NAMESPACE, but helm/ollama.yaml hard-codes namespace: lightrag. If LIGHTRAG_NAMESPACE is set to a different value, this cleanup will leave the Ollama resources (and possibly the lightrag namespace/PVC) behind. Either keep the namespace fixed everywhere, or make Ollama's namespace configurable and ensure cleanup targets the actual namespace used.
```markdown
| Kubernetes | 1.30+ |
```
The docs state the operator requires Kubernetes 1.35+ (because it uses ImageVolume). This README lists Kubernetes 1.30+ in the version compatibility table, which is inconsistent with the operator's documented prerequisites and could mislead users. Update this to 1.35+ (and/or call out that the operator, not LightRAG, drives the minimum cluster version).
Suggested change:

```diff
-| Kubernetes | 1.30+ |
+| Kubernetes | 1.35+ |
```

The minimum Kubernetes version for this playground is driven by the DocumentDB operator requirement for ImageVolume support, not by LightRAG itself.
```yaml
image:
  repository: ghcr.io/hkuds/lightrag
  tag: latest
```
Both this values file and the chart defaults use image.tag: latest for LightRAG. For a "last verified" playground, using latest makes deployments drift over time and can break without changes in this repo. Consider pinning the tag to a known-good version/digest (and document how to override) so the environment is reproducible.
Suggested change:

```diff
-tag: latest
+# Pin to a known-good release for reproducible "last verified" playground
+# deployments. Override explicitly at deploy time if you need a newer image.
+tag: "<pin-to-known-good-version-or-digest>"
```
```yaml
# Init container patches LightRAG's MongoDB storage layer for DocumentDB
# compatibility. DocumentDB does not support createIndex with collation,
# $listSearchIndexes, or secondary index creation (it hangs). The patch
# stubs out these calls so initialization completes cleanly.
initContainers:
  - name: patch-for-documentdb
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
    command: ["/app/.venv/bin/python3", "-B", "-c"]
    args:
      - |
        import os, glob

        targets = [
            "/app/lightrag/kg/mongo_impl.py",
            "/app/.venv/lib/python3.12/site-packages/lightrag/kg/mongo_impl.py",
        ]
        for fp in targets:
            if not os.path.exists(fp):
                continue
            with open(fp) as f:
                content = f.read()
            if "Skipping index creation (DocumentDB)" in content:
                print(f"Already patched: {fp}")
                continue
            patched = content
            patched = patched.replace(
                "async def create_and_migrate_indexes_if_not_exists(self):",
                "async def create_and_migrate_indexes_if_not_exists(self):\n"
                "        logger.info(f'[{self.workspace}] Skipping index creation (DocumentDB)')\n"
                "        return\n"
                "    async def _orig_create_indexes(self):",
            )
            patched = patched.replace(
                "async def create_search_index_if_not_exists(self):",
                "async def create_search_index_if_not_exists(self):\n"
                "        logger.info(f'[{self.workspace}] Skipping search index (DocumentDB)')\n"
                "        return\n"
                "    async def _orig_create_search_index(self):",
            )
            patched = patched.replace(
                "async def create_vector_index_if_not_exists(self):",
                "async def create_vector_index_if_not_exists(self):\n"
                "        logger.info('Skipping vector index (DocumentDB)')\n"
                "        return\n"
                "    async def _orig_create_vector_index(self):",
            )
            with open(fp, "w") as f:
                f.write(patched)
            cache_dir = os.path.join(os.path.dirname(fp), "__pycache__")
            for cf in glob.glob(os.path.join(cache_dir, "mongo_impl*")):
                os.remove(cf)
            print(f"Patched: {fp}")
        print("DocumentDB compatibility patches applied")
containers:
```
The init container is patching /app/.../mongo_impl.py inside its own container filesystem, but the main container does not share that filesystem (only volumes are shared between containers). As written, these patches will not be visible to the lightrag container, so any DocumentDB compatibility issues the patch is meant to address will still occur. To make this work, apply the patch in the main container entrypoint before starting lightrag-server, or mount a shared volume at the target path(s) and have the init container write the patched file(s) into that shared volume (e.g., mount an emptyDir/projected volume and overlay the specific file via subPath).
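A minimal sketch of the shared-volume variant this comment suggests, with hypothetical volume and script names (the mount path assumes the site-packages location quoted above; the real chart would need to match its actual image layout): the init container writes the patched file into an emptyDir, and the main container overlays only that file via `subPath`, so the rest of the package still comes from the image.

```yaml
# Hypothetical sketch: share only the patched mongo_impl.py between containers.
volumes:
  - name: patched-kg            # illustrative volume name
    emptyDir: {}
initContainers:
  - name: patch-for-documentdb
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
    command: ["/bin/sh", "-c"]
    args:
      - |
        # Run the patch logic (as in the existing init container), then copy
        # the patched file into the shared volume for the main container.
        /app/.venv/bin/python3 -B /patch/apply_patch.py   # hypothetical script
        cp /app/.venv/lib/python3.12/site-packages/lightrag/kg/mongo_impl.py \
           /shared/mongo_impl.py
    volumeMounts:
      - name: patched-kg
        mountPath: /shared
containers:
  - name: lightrag
    volumeMounts:
      # Overlay the single patched file over the image's copy.
      - name: patched-kg
        mountPath: /app/.venv/lib/python3.12/site-packages/lightrag/kg/mongo_impl.py
        subPath: mongo_impl.py
```

Applying the patch in the main container's entrypoint before starting lightrag-server is the simpler alternative, since it needs no volume plumbing at all.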
```
{{- range $key, $val := .Values.env -}}
{{- if not $first -}}{{- "\n" -}}{{- end -}}
{{- $first = false -}}
{{ $key }}={{ $val }}
```
range $key, $val := .Values.env iterates over a map, which is not ordered. That makes the generated .env content non-deterministic, and it also makes checksum/config in the Deployment flap between upgrades even when values don't change, causing unnecessary rollouts. Consider iterating over keys .Values.env | sortAlpha and then indexing into the map for values, so both the Secret content and checksum are stable.
Suggested change:

```diff
-{{- range $key, $val := .Values.env -}}
-{{- if not $first -}}{{- "\n" -}}{{- end -}}
-{{- $first = false -}}
-{{ $key }}={{ $val }}
+{{- $env := .Values.env -}}
+{{- range $key := keys $env | sortAlpha -}}
+{{- if not $first -}}{{- "\n" -}}{{- end -}}
+{{- $first = false -}}
+{{ $key }}={{ index $env $key }}
```
|
🤖 Auto-triaged by documentdb-triage-tool. Applied. Reasoning: component from path globs (playground, docs); effort from diff stats (848+0 LOC, 14 files); LLM: Adds a substantial new self-contained LightRAG playground under documentdb-playground/ with Helm charts, deploy scripts, and documentation, touching multiple files within the playground component. If a label is wrong, remove it manually and ping
Signed-off-by: Rayhan Hossain <rhossain@microsoft.com>
Summary

Adds a self-contained playground under `documentdb-playground/lightrag/` that deploys LightRAG — a graph-based RAG engine — backed by DocumentDB as its MongoDB-compatible storage layer, with Ollama for in-cluster LLM and embedding inference.

What's included

- Sample DocumentDB CR (`documentdb.yaml`) sized for the playground, pinned to operator images `0.110.0` (required for the `_id` lookup fix).
- Local Helm chart (`helm/lightrag/`) with an init container that no-ops three `mongo_impl.py` index-creation methods that DocumentDB doesn't support (`createIndex` with collation, `$listSearchIndexes`, Atlas `$vectorSearch`). LightRAG runs unmodified otherwise.
- Ollama manifest (`helm/ollama.yaml`) with a 10 GiB PVC for model persistence, sized for `qwen2.5:3b` + `nomic-embed-text` on a Standard_D4-class node.
- `scripts/deploy.sh` / `scripts/cleanup.sh` — pulls the DocumentDB connection string from `status.connectionString`, resolves embedded credential lookups, swaps ClusterIP for in-cluster DNS, strips the conflicting `replicaSet=rs0` parameter, and `helm upgrade --install`s the chart.
- `README.md` — architecture diagram, quick start, OpenAI swap, slow-CPU tuning section, compatibility-patches table, verification, and troubleshooting for the three failure modes I hit during validation (replica-set conflict, Ollama OOM, `LLM_TIMEOUT`).

Verification

End-to-end deploy on AKS (2× Standard_D4, k8s 1.35, DocumentDB operator 0.2.0, images 0.110.0):

- `helm install lightrag` succeeds (revision 3)
- `helm lint` clean

Storage choices

| LightRAG store | Backend |
|---|---|
| KV | `MongoKVStorage` (DocumentDB) |
| Graph | `MongoGraphStorage` (DocumentDB) |
| Doc status | `MongoDocStatusStorage` (DocumentDB) |
| Vectors | `NanoVectorDBStorage` (local; DocumentDB lacks `$vectorSearch`) |

Scope

- Changes are limited to `documentdb-playground/lightrag/` only.
- The `helm/lightrag` chart is intentionally local-only — playground sample, not a published product chart.

Checklist

- `helm lint` passes
- `bash -n` clean on both scripts