
LightRAG + ArangoDB: Multi-Hop Reasoning with Knowledge Graphs

A complete implementation of LightRAG using ArangoDB as the storage backend, demonstrating how knowledge graphs enable multi-hop reasoning that traditional RAG systems cannot achieve.

🎯 What This Project Demonstrates

Traditional RAG retrieves documents based on vector similarity alone. It struggles with questions that require connecting information across multiple documents.

LightRAG + ArangoDB builds a knowledge graph from your documents, enabling:

  • Multi-hop reasoning: Answer questions that require traversing relationships between entities
  • Better context retrieval: Find relevant information through graph connections, not just semantic similarity
  • Unified storage: Documents, embeddings, and relationships all stored in ArangoDB

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • OpenAI API key (for embeddings and LLM)

1. Clone and Configure

```bash
git clone https://github.com/YOUR_USERNAME/lightrag-arangodb.git
cd lightrag-arangodb

# Create environment file
cp .env.example .env

# Add your OpenAI API key to .env
nano .env
```

2. Start Services

```bash
# Start ArangoDB and the LightRAG container
docker compose up -d --build

# Create the multihop database for the demo
docker exec arangodb-single arangosh \
  --server.password openSesame \
  --javascript.execute-string "db._createDatabase('multihop');"
```

3. Run the Multi-Hop Demo

```bash
docker exec lightrag-app python multi_hop_demo.py
```

4. Re-run Queries Without Ingestion (Optional)

After running the full demo once, you can re-run queries against the existing database without re-ingesting:

```bash
# Copy query script to container
docker cp query_multihop.py lightrag-app:/app/

# Run queries only (saves results to JSON)
docker exec lightrag-app python query_multihop.py

# Get the results file
docker cp lightrag-app:/app/multihop_results.json ./
```

📊 Understanding Multi-Hop Reasoning

The `multi_hop_demo.py` script uses completely fabricated data about a fictional company called "Nexova Technologies" to prove that answers come from the knowledge graph, not the LLM's training data.

Example: Information Spread Across Documents

Document 1: "PJ Kowalski is the lead architect at Nexova Technologies. He reports directly to Daniel Chen."

Document 2: "Daniel Chen is the VP of Engineering. His office is on the 4th floor."

Query: "What floor is the office of the person PJ Kowalski reports to?"

Reasoning Path: PJ Kowalski → reports to Daniel Chen → office on 4th floor

A traditional vector search might not connect these documents, but the knowledge graph traverses:

  1. Find "PJ Kowalski" entity
  2. Follow relationship to "Daniel Chen" (reports to)
  3. Find "4th floor" connected to Daniel Chen's office
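
The three traversal steps above can be sketched with a toy in-memory graph. This is an illustration only: in the running system the walk happens as a graph query inside ArangoDB, and the relation names here (`reports_to`, `office_on`) are stand-ins, not the exact edge labels LightRAG extracts.

```python
# Toy entity graph standing in for the one LightRAG builds from the documents.
entities = {
    "PJ Kowalski": [("reports_to", "Daniel Chen")],
    "Daniel Chen": [("office_on", "4th floor")],
    "4th floor": [],
}

def traverse(start, max_hops=2):
    """Breadth-first walk: collect every edge reachable within max_hops."""
    hops, frontier = [], [start]
    for _ in range(max_hops):
        nxt = []
        for node in frontier:
            for relation, target in entities.get(node, []):
                hops.append((node, relation, target))
                nxt.append(target)
        frontier = nxt
    return hops

print(traverse("PJ Kowalski"))
# [('PJ Kowalski', 'reports_to', 'Daniel Chen'), ('Daniel Chen', 'office_on', '4th floor')]
```

A vector index has no equivalent of this walk: unless both documents happen to be semantically close to the query, one of the two hops is simply never retrieved.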

📁 Project Structure

```
├── arangodb_impl.py      # ArangoDB storage implementations for LightRAG
├── demo.py               # Basic demo with sample documents
├── multi_hop_demo.py     # Multi-hop reasoning demonstration (ingest + query)
├── query_multihop.py     # Query-only script (no ingestion, saves results to JSON)
├── docker-compose.yml    # Docker services configuration
├── Dockerfile            # LightRAG container build
├── requirements.txt      # Python dependencies
└── .env.example          # Environment template
```

📊 Results Output

The `query_multihop.py` script saves detailed results to `multihop_results.json`:

```json
{
  "timestamp": "2026-02-25T...",
  "summary": {
    "naive_correct": 3,
    "hybrid_correct": 7,
    "total_queries": 8,
    "winner": "hybrid"
  },
  "queries": [
    {
      "query": "What color is the car driven by the manager of employee NX-4472?",
      "expected": "red",
      "reasoning": "NX-4472 = PJ → PJ's manager = Daniel → Daniel's car = cherry red",
      "hops": 3,
      "naive": { "response": "...", "correct": false },
      "hybrid": { "response": "...", "correct": true }
    }
  ]
}
```
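
A file with this shape is easy to post-process. The snippet below inlines a sample payload mirroring the structure above (field names taken from that example) instead of reading the file from disk:

```python
import json

# Sample payload mirroring multihop_results.json; in practice you would use
# json.load(open("multihop_results.json")) instead.
results = json.loads("""
{
  "summary": {"naive_correct": 3, "hybrid_correct": 7,
              "total_queries": 8, "winner": "hybrid"},
  "queries": [
    {"query": "...", "hops": 3,
     "naive": {"correct": false}, "hybrid": {"correct": true}}
  ]
}
""")

s = results["summary"]
print(f"naive {s['naive_correct']}/{s['total_queries']} vs "
      f"hybrid {s['hybrid_correct']}/{s['total_queries']} -> winner: {s['winner']}")

# Queries the graph answered correctly but plain vector search missed
graph_wins = [q for q in results["queries"]
              if q["hybrid"]["correct"] and not q["naive"]["correct"]]
print(f"{len(graph_wins)} graph-only win(s)")
```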

🔧 Storage Implementations

This project implements all four LightRAG storage backends for ArangoDB:

| Storage Type | Class | Purpose |
|--------------|-------|---------|
| Graph | `ArangoDBStorage` | Entity nodes and relationship edges |
| KV | `ArangoDBKVStorage` | Document chunks and metadata |
| Vector | `ArangoDBVectorStorage` | Embeddings for semantic search |
| Doc Status | `ArangoDBDocStatusStorage` | Document processing tracking |

🔍 Query Modes

LightRAG supports different query modes:

```python
from lightrag import LightRAG, QueryParam

# Vector-only search (like traditional RAG)
result = await rag.aquery(query, param=QueryParam(mode="naive"))

# Knowledge graph + vector search (multi-hop reasoning)
result = await rag.aquery(query, param=QueryParam(mode="hybrid", enable_rerank=False))
```

🌐 Accessing ArangoDB

Once the services are running, open the ArangoDB web interface at http://localhost:8529 and log in as `root` with password `openSesame`.

Explore the knowledge graph visually to see entities and their relationships!

📝 Configuration

Environment Variables

```bash
# Required
OPENAI_API_KEY=sk-your-key-here

# ArangoDB (defaults work with docker-compose)
ARANGO_HOST=http://arangodb:8529
ARANGO_USERNAME=root
ARANGO_PASSWORD=openSesame
ARANGO_DATABASE=multihop
```
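
One way the scripts might consume this configuration is sketched below; this is an assumption about the loading code (the repo may use python-dotenv or read the variables elsewhere), with defaults matching the docker-compose values above:

```python
import os

# Read connection settings, falling back to the docker-compose defaults.
cfg = {
    "host": os.getenv("ARANGO_HOST", "http://arangodb:8529"),
    "username": os.getenv("ARANGO_USERNAME", "root"),
    "password": os.getenv("ARANGO_PASSWORD", "openSesame"),
    "database": os.getenv("ARANGO_DATABASE", "multihop"),
}

# OPENAI_API_KEY has no sensible default and must be provided.
if not os.getenv("OPENAI_API_KEY"):
    print("warning: OPENAI_API_KEY is not set")

print(cfg)
```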


📄 License

MIT License - feel free to use this for your own projects!


Built with ❤️ using LightRAG and ArangoDB
