weaviate · parikshitiiitb · Apr 29, 2026
diff --git a/integrations/llm-agent-frameworks/langchain/rag-with-multi-tenancy/.env.example b/integrations/llm-agent-frameworks/langchain/rag-with-multi-tenancy/.env.example
@@ -0,0 +1,2 @@
+# No keys needed for local development
+# Run with: jupyter notebook
diff --git a/integrations/llm-agent-frameworks/langchain/rag-with-multi-tenancy/README.md b/integrations/llm-agent-frameworks/langchain/rag-with-multi-tenancy/README.md
@@ -0,0 +1,34 @@
+# Enterprise Multi-Tenant RAG with Weaviate + LangChain
+
+A production pattern for RAG systems serving multiple isolated tenants 
+from a single Weaviate collection: the architecture behind enterprise 
+internal knowledge bases.
+
+## What this covers
+
+- Multi-tenant collection setup with per-tenant isolation
+- Scoped document ingestion per tenant
+- Tenant-aware hybrid search (semantic + BM25)
+- Minimal LangChain integration demonstrating tenant-aware retrieval
+- Basic handling of empty tenant queries
+
+## When to use this pattern
+
+Use multi-tenancy when:
+- Multiple teams/BUs share infrastructure but need data isolation
+- You can't afford separate Weaviate instances per team
+- You need tenant-level access control
+
+## Setup
+
+### Local (default)
+1. `pip install -r requirements.txt`
+2. Run the notebook top to bottom
+
+### Optional: Production (Weaviate Cloud + OpenAI)
+See the "MODE 2" section in the notebook for migration steps.
+
+## Author
+
+[Parikshit Sharma](https://github.com/parikshitiiitb) — 
+Principal ML Engineer, production RAG systems
diff --git a/...ns/llm-agent-frameworks/langchain/rag-with-multi-tenancy/enterprise_multitenant_rag.ipynb b/...ns/llm-agent-frameworks/langchain/rag-with-multi-tenancy/enterprise_multitenant_rag.ipynb
@@ -0,0 +1,309 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# RUNNING THIS NOTEBOOK\n",
+        "\n",
+        "## MODE 1 - Local (default, no setup needed)\n",
+        "Uses Weaviate Embedded + sentence-transformers. Runs on CPU. Free.\n",
+        "\n",
+        "Run:\n",
+        "`jupyter notebook`\n",
+        "\n",
+        "## MODE 2 - Production (Weaviate Cloud + OpenAI)\n",
+        "1. Create a free cluster at console.weaviate.cloud, then copy your cluster URL and API key.\n",
+        "2. Get an OpenAI API key at platform.openai.com.\n",
+        "3. Replace the client setup in Cell 1 with:\n",
+        "\n",
+        "```python\n",
+        "client = weaviate.connect_to_weaviate_cloud(\n",
+        "    cluster_url=\"YOUR_WEAVIATE_URL\",\n",
+        "    auth_credentials=weaviate.auth.AuthApiKey(\"YOUR_WEAVIATE_API_KEY\"),\n",
+        "    headers={\"X-OpenAI-Api-Key\": \"YOUR_OPENAI_API_KEY\"},\n",
+        ")\n",
+        "```\n",
+        "\n",
+        "4. Replace the collection creation in Cell 1 with:\n",
+        "\n",
+        "```python\n",
+        "client.collections.create(\n",
+        "    name=\"EnterpriseKnowledgeBase\",\n",
+        "    multi_tenancy_config=Configure.multi_tenancy(enabled=True),\n",
+        "    vectorizer_config=Configure.Vectorizer.text2vec_openai(),\n",
+        "    ...\n",
+        ")\n",
+        "```\n",
+        "\n",
+        "5. Replace `tenant_search()` in Cell 5 with LangChain `WeaviateVectorStore` (see langchain-weaviate docs for a full example)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# Enterprise Multi-Tenant RAG with Weaviate + LangChain\n",
+        "\n",
+        "This notebook demonstrates a **production-style multi-tenant RAG system** using Weaviate.\n",
+        "\n",
+        "Key features:\n",
+        "- Tenant isolation\n",
+        "- Hybrid search (vector + BM25)\n",
+        "- Local embeddings (no API keys)\n",
+        "- LangChain-style retrieval"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 1. Set Up Weaviate (Local Embedded) + Create Collection"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import weaviate\n",
+        "from weaviate.embedded import EmbeddedOptions\n",
+        "from weaviate.classes.config import Configure, Property, DataType\n",
+        "\n",
+        "client = weaviate.WeaviateClient(\n",
+        "    embedded_options=EmbeddedOptions()\n",
+        ")\n",
+        "\n",
+        "client.connect()\n",
+        "print(\"Connected:\", client.is_ready())\n",
+        "\n",
+        "\n",
+        "# Create the collection; delete any previous copy first so the notebook can be re-run\n",
+        "client.collections.delete(\"EnterpriseKnowledgeBase\")\n",
+        "\n",
+        "client.collections.create(\n",
+        "    name=\"EnterpriseKnowledgeBase\",\n",
+        "    multi_tenancy_config=Configure.multi_tenancy(enabled=True),\n",
+        "    properties=[\n",
+        "        Property(name=\"content\", data_type=DataType.TEXT),\n",
+        "        Property(name=\"source\", data_type=DataType.TEXT),\n",
+        "        Property(name=\"department\", data_type=DataType.TEXT),\n",
+        "    ]\n",
+        ")\n",
+        "\n",
+        "print(\"Collection created\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 2. Create Tenants (Isolation Layer)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from weaviate.classes.tenants import Tenant\n",
+        "\n",
+        "collection = client.collections.get(\"EnterpriseKnowledgeBase\")\n",
+        "existing = collection.tenants.get()\n",
+        "\n",
+        "if not existing:\n",
+        "    collection.tenants.create([\n",
+        "        Tenant(name=\"engineering\"),\n",
+        "        Tenant(name=\"finance\"),\n",
+        "        Tenant(name=\"legal\"),\n",
+        "    ])\n",
+        "\n",
+        "print(\"Tenants ensured\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 3. Local Embeddings (No API Key Needed)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from sentence_transformers import SentenceTransformer\n",
+        "\n",
+        "model = SentenceTransformer(\"all-MiniLM-L6-v2\")\n",
+        "\n",
+        "def embed(text):\n",
+        "    return model.encode(text).tolist()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 4. Ingest Documents per Tenant\n",
+        "\n",
+        "Documents are written through a tenant-scoped handle (`collection.with_tenant(...)`), so each tenant's data stays isolated from the others."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "engineering_docs = [\n",
+        "    {\"content\": \"RAG uses semantic chunking.\", \"source\": \"eng\", \"department\": \"engineering\"},\n",
+        "    {\"content\": \"Pipeline serves 5M predictions/day.\", \"source\": \"eng\", \"department\": \"engineering\"},\n",
+        "]\n",
+        "\n",
+        "finance_docs = [\n",
+        "    {\"content\": \"Budget allocated $2.4M.\", \"source\": \"fin\", \"department\": \"finance\"},\n",
+        "    {\"content\": \"Cloud cost reduced 38%.\", \"source\": \"fin\", \"department\": \"finance\"},\n",
+        "]\n",
+        "\n",
+        "eng = collection.with_tenant(\"engineering\")\n",
+        "fin = collection.with_tenant(\"finance\")\n",
+        "\n",
+        "with eng.batch.dynamic() as batch:\n",
+        "    for doc in engineering_docs:\n",
+        "        batch.add_object(\n",
+        "            properties=doc,\n",
+        "            vector=embed(doc[\"content\"])\n",
+        "        )\n",
+        "\n",
+        "with fin.batch.dynamic() as batch:\n",
+        "    for doc in finance_docs:\n",
+        "        batch.add_object(\n",
+        "            properties=doc,\n",
+        "            vector=embed(doc[\"content\"])\n",
+        "        )\n",
+        "\n",
+        "print(\"Data ingested\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 5. Hybrid Search (Vector + Keyword)\n",
+        "\n",
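+        "Hybrid search combines vector similarity with BM25 keyword matching; `alpha=0.5` weights the two equally, which helps when queries mix natural language with exact terms."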
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "def tenant_search(tenant_name, query, top_k=2):\n",
+        "    col = client.collections.get(\"EnterpriseKnowledgeBase\")\n",
+        "    tenant_col = col.with_tenant(tenant_name)\n",
+        "\n",
+        "    query_vec = embed(query)\n",
+        "\n",
+        "    results = tenant_col.query.hybrid(\n",
+        "        query=query,\n",
+        "        vector=query_vec,\n",
+        "        alpha=0.5,  # 0 = keyword only, 1 = vector only\n",
+        "        limit=top_k,\n",
+        "        return_properties=[\"content\", \"source\", \"department\"]\n",
+        "    )\n",
+        "\n",
+        "    if not results.objects:\n",
+        "        print(f\"No results for tenant: {tenant_name}\")\n",
+        "        return []\n",
+        "\n",
+        "    return results.objects\n",
+        "\n",
+        "results = tenant_search(\"engineering\", \"RAG system\")\n",
+        "for r in results:\n",
+        "    print(r.properties[\"content\"])"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 6. Metadata Filtering (Enterprise Control Layer)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from weaviate.classes.query import Filter\n",
+        "\n",
+        "def filtered_search(tenant_name, query):\n",
+        "    col = client.collections.get(\"EnterpriseKnowledgeBase\")\n",
+        "    tenant_col = col.with_tenant(tenant_name)\n",
+        "\n",
+        "    results = tenant_col.query.hybrid(\n",
+        "        query=query,\n",
+        "        vector=embed(query),\n",
+        "        alpha=0.5,\n",
+        "        filters=Filter.by_property(\"department\").equal(tenant_name),\n",
+        "        limit=2,\n",
+        "        return_properties=[\"content\"]\n",
+        "    )\n",
+        "\n",
+        "    return results.objects\n",
+        "\n",
+        "results = filtered_search(\"engineering\", \"pipeline\")\n",
+        "for r in results:\n",
+        "    print(r.properties[\"content\"])"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## 7. LangChain-style Retriever (Minimal)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "from langchain_core.documents import Document\n",
+        "\n",
+        "def langchain_retriever(tenant, query):\n",
+        "    results = tenant_search(tenant, query)\n",
+        "\n",
+        "    return [\n",
+        "        Document(page_content=r.properties[\"content\"], metadata=r.properties)\n",
+        "        for r in results\n",
+        "    ]\n",
+        "\n",
+        "docs = langchain_retriever(\"engineering\", \"RAG\")\n",
+        "for d in docs:\n",
+        "    print(d.page_content)"
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": ".venv-review",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python",
+      "version": "3.13.2"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 2
+}
diff --git a/integrations/llm-agent-frameworks/langchain/rag-with-multi-tenancy/requirements.txt b/integrations/llm-agent-frameworks/langchain/rag-with-multi-tenancy/requirements.txt
@@ -0,0 +1,4 @@
+weaviate-client>=4.0.0
+sentence-transformers>=2.7.0
+langchain>=0.1.0
+notebook>=7.0.0
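Review note on the `alpha=0.5` argument passed to `query.hybrid()` in `tenant_search()` and `filtered_search()`: the notebook does not show what the weighting does to a ranking, so a small standalone sketch may help readers. It assumes min-max normalization of each score set followed by a weighted sum, which is a simplified illustration and not Weaviate's actual implementation. The helper names `min_max` and `fuse` and the sample scores are hypothetical and do not come from the notebook.

```python
def min_max(scores):
    """Scale scores to [0, 1]; if all scores are equal, map each to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(vector_scores, keyword_scores, alpha=0.5):
    """Blend two score sets: alpha weights vector, (1 - alpha) weights keyword."""
    v = min_max(vector_scores) if vector_scores else {}
    k = min_max(keyword_scores) if keyword_scores else {}
    # A document missing from one result set contributes 0.0 for that side.
    fused = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three documents (not produced by the notebook).
vector_scores = {"doc_a": 0.82, "doc_b": 0.40, "doc_c": 0.10}
keyword_scores = {"doc_b": 3.1, "doc_c": 1.2}

ranked = fuse(vector_scores, keyword_scores, alpha=0.5)
for doc, score in ranked:
    print(doc, round(score, 3))
```

With these sample scores, `alpha=1.0` ranks purely by the vector side and `alpha=0.0` purely by the keyword side, so sweeping `alpha` on a few real queries per tenant is a cheap way to choose a value other than the default `0.5`.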