
datainsightat/knowledge_base_rag

Local Knowledge Base (RAG) Template

This template provides a simple, local Retrieval-Augmented Generation (RAG) system for your data engineering documents. It ingests PDFs, Markdown files, plain-text files, and Jupyter notebooks into a local vector database (ChromaDB) that you can then query.
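Conceptually, querying a vector database means embedding the question and returning the stored documents whose embeddings are most similar to it. The sketch below illustrates that idea with a tiny in-memory store and cosine similarity. It is a simplification for illustration only: the `TinyVectorStore` class is hypothetical, the vectors are toy values, and ChromaDB's actual indexing and default distance metric may differ from this brute-force scan.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


@dataclass
class TinyVectorStore:
    """Hypothetical brute-force store: keeps (id, text, vector) rows in memory."""
    rows: list[tuple[str, str, list[float]]] = field(default_factory=list)

    def add(self, doc_id: str, text: str, vector: list[float]) -> None:
        self.rows.append((doc_id, text, vector))

    def query(self, vector: list[float], k: int = 3) -> list[tuple[str, float]]:
        # Score every stored vector against the query, highest similarity first.
        scored = [(doc_id, cosine_similarity(vector, vec)) for doc_id, _, vec in self.rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = TinyVectorStore()
    # Toy 3-dimensional "embeddings"; all-MiniLM-L6-v2 actually produces 384-dimensional vectors.
    store.add("dbt_macros.md", "Use macros to keep SQL DRY.", [0.9, 0.1, 0.0])
    store.add("spark_tuning.pdf", "Tune shuffle partitions.", [0.0, 0.2, 0.9])
    store.add("airflow.ipynb", "Define DAGs with operators.", [0.1, 0.9, 0.1])
    print(store.query([1.0, 0.0, 0.0], k=2))
```

A real vector database replaces the linear scan with an index so that lookups stay fast as the document set grows.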

Features

  • Multi-format Support: Handles .pdf, .md, .txt, and .ipynb files.
  • Local Embeddings: Uses sentence-transformers/all-MiniLM-L6-v2 (runs entirely on your local CPU or GPU; no API keys required).
  • Persistent Storage: Saves the vector database locally in ./chroma_db.
  • Easy Querying: A simple CLI for asking questions about your document set.
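Multi-format support mostly comes down to dispatching on the file extension and turning each file into plain text before it is embedded. The following is a minimal, standard-library-only sketch of that step, not the template's actual `ingest.py`: `extract_text` and `notebook_to_text` are hypothetical names, notebooks are read as JSON (the `.ipynb` format stores each cell's `source` as either a string or a list of strings), and PDF extraction is left as a stub because it needs a third-party parser such as `pypdf`.

```python
import json
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".md", ".txt", ".ipynb"}


def notebook_to_text(raw: str) -> str:
    """Concatenate markdown and code cell sources from a Jupyter notebook (JSON)."""
    notebook = json.loads(raw)
    parts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") not in ("markdown", "code"):
            continue  # skip raw cells; cell outputs are never included
        source = cell.get("source", "")
        # The notebook format allows "source" to be a single string or a list of lines.
        parts.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(parts)


def extract_text(path: Path) -> str:
    """Hypothetical dispatcher: return the plain text of one supported file."""
    suffix = path.suffix.lower()
    if suffix in (".md", ".txt"):
        return path.read_text(encoding="utf-8")
    if suffix == ".ipynb":
        return notebook_to_text(path.read_text(encoding="utf-8"))
    if suffix == ".pdf":
        # Assumption: a PDF library (e.g. pypdf) would be plugged in here.
        raise NotImplementedError("PDF extraction requires a third-party parser")
    raise ValueError(f"Unsupported file type: {suffix}")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        nb = Path(tmp) / "demo.ipynb"
        nb.write_text(json.dumps({"cells": [
            {"cell_type": "markdown", "source": ["# Title\n", "Intro"]},
            {"cell_type": "code", "source": "print('hi')"},
        ]}), encoding="utf-8")
        print(extract_text(nb))
```

Keeping extraction separate from embedding means a new format only needs one more branch in the dispatcher.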

Setup

  1. Install Dependencies:

    pip install -r requirements.txt
  2. Prepare Your Documents: Place your documents in a folder (e.g., data/my_docs).

  3. Ingest Data: Run the ingestion script to process your documents and build the database.

    python ingest.py --source_dir /path/to/your/documents
  4. Query the Knowledge Base: Ask questions about your data.

    python query.py "What are the best practices for dbt macros?"
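The two commands above take a `--source_dir` option and a single quoted question, respectively. The sketch below shows one way such a command-line interface could be wired up with `argparse`, together with a file-discovery helper. It is an illustration under assumptions, not the template's code: the function names are hypothetical, and the recursive walk over `--source_dir` (including subfolders) is assumed behaviour.

```python
import argparse
import tempfile
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".md", ".txt", ".ipynb"}


def build_ingest_parser() -> argparse.ArgumentParser:
    """Mirror the documented `python ingest.py --source_dir ...` invocation."""
    parser = argparse.ArgumentParser(description="Ingest documents into the local vector DB.")
    parser.add_argument("--source_dir", required=True, type=Path,
                        help="Folder containing the documents to ingest.")
    return parser


def build_query_parser() -> argparse.ArgumentParser:
    """Mirror the documented `python query.py "question"` invocation."""
    parser = argparse.ArgumentParser(description="Ask a question against the knowledge base.")
    parser.add_argument("question", help="Natural-language question, quoted on the shell.")
    return parser


def collect_documents(source_dir: Path) -> list[Path]:
    """Recursively find files with a supported extension (assumed behaviour)."""
    return sorted(
        p for p in source_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    args = build_ingest_parser().parse_args(["--source_dir", "data/my_docs"])
    question = build_query_parser().parse_args(
        ["What are the best practices for dbt macros?"]).question
    print(args.source_dir, "|", question)

    # Demonstrate file discovery on a throwaway directory tree.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        for name in ("a.md", "sub/b.ipynb", "c.csv"):
            (root / name).write_text("demo", encoding="utf-8")
        print([p.relative_to(root).as_posix() for p in collect_documents(root)])
```

Sorting the discovered paths keeps ingestion order deterministic between runs, which makes re-ingestion easier to debug.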

Requirements

See requirements.txt.

About

Data engineers juggle knowledge scattered across PDFs, Markdown files, and notebooks, and AI agents lack access to this internal context. The goal: build a local "second brain" that ingests these files so agents can answer questions based strictly on internal knowledge, without sending sensitive data to the cloud.