SanjaySinghRajpoot/pdf-rag

PDF RAG Application

A full-stack PDF document query system that allows users to upload PDF documents and query their content using AI-powered semantic search. Built with FastAPI backend and React frontend.

🚀 Features

  • PDF Upload & Processing: Securely upload PDF documents for processing
  • Semantic Search: Query documents using natural language
  • AI-Powered Responses: Get accurate, contextual answers from your documents
  • Vector Embeddings: Splits documents into chunks and embeds each chunk with OpenAI embeddings for semantic retrieval
  • PostgreSQL with pgvector: Efficient similarity search capabilities
  • Modern UI: Clean, responsive interface built with React and Tailwind CSS
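The chunking step behind the vector-embedding feature can be pictured with a minimal sketch. The function name `chunk_text`, the character-based window, and the default sizes are illustrative assumptions; the repository's actual chunking strategy is not described here.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly.

    Overlap keeps a sentence that straddles a boundary visible in both
    neighbouring chunks, which tends to help retrieval.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be sent to the embedding model and stored alongside its vector.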

🏗️ Architecture

  • Backend: FastAPI with Python
  • Frontend: React with TypeScript and Vite
  • Database: PostgreSQL with pgvector extension
  • AI/ML: OpenAI API for embeddings and text generation
  • Styling: Tailwind CSS with custom design system
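To make the retrieval step concrete: pgvector performs the similarity search inside PostgreSQL, but the underlying idea can be shown in plain Python. The sketch below ranks stored chunks by cosine similarity to a query vector. The helper names and the tiny two-dimensional vectors are assumptions for illustration only; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float],
          chunks: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The top-ranked chunks are what a RAG pipeline typically passes to the text-generation model as context.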

📋 Prerequisites

  • Python 3.10+
  • Node.js 18+
  • PostgreSQL 14+ with pgvector extension
  • OpenAI API key

🛠️ Installation & Setup

1. Clone the Repository

git clone <repository-url>
cd pdf-rag

2. Backend Setup

Navigate to backend directory

cd backend

Create Python virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Environment configuration

  1. Copy the sample environment file:
cp sample.env .env
  2. Edit the .env file with your configuration:
# Database
DATABASE_URL=postgresql://username:password@localhost:5432/pdf_rag_db

# OpenAI
OPENAI_API_KEY=your_openai_api_key_here

# App Settings
ENVIRONMENT=development
LOG_LEVEL=INFO
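
As a rough illustration of how these variables might be consumed, the sketch below reads them from the process environment and fails early when a required one is missing. The `Settings` class and `load_settings` function are hypothetical names, not taken from the backend code. Note that a `.env` file is not read automatically by Python; a loader such as python-dotenv is commonly used for that, and whether this project uses one is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    openai_api_key: str
    environment: str = "development"
    log_level: str = "INFO"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("DATABASE_URL", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        database_url=env["DATABASE_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        environment=env.get("ENVIRONMENT", "development"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Failing at startup with a clear message is usually easier to debug than a connection error later on.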

Database setup

  1. Create PostgreSQL database:
CREATE DATABASE pdf_rag_db;
  2. Enable the pgvector extension (while connected to pdf_rag_db, since extensions are enabled per database):
CREATE EXTENSION vector;
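
Once the extension is enabled, a similarity query against a vector column might look like the sketch below. The table name `document_chunks`, its columns, and the 1536-dimension size are assumptions for illustration (1536 matches some OpenAI embedding models, but the model used here is not stated). The `<=>` operator is pgvector's cosine-distance operator, and the `%s` placeholders follow the psycopg parameter style.

```python
# Hypothetical schema: one row per chunk, with its embedding.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS document_chunks (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1536)
);
"""

# Nearest chunks first; smaller cosine distance means more similar.
SEARCH_SQL = """
SELECT content
FROM document_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_pgvector_literal(vec) -> str:
    """Format a sequence of numbers in pgvector's text form, e.g. [1.0,0.5]."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def build_search_params(query_vec, limit: int = 5) -> tuple:
    """Return the parameter tuple to pass alongside SEARCH_SQL."""
    return (to_pgvector_literal(query_vec), limit)
```

With a database driver, the call would be along the lines of `cursor.execute(SEARCH_SQL, build_search_params(vec))`.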

Run the backend

# From backend directory
python main.py

The backend will be available at http://localhost:8000

3. Frontend Setup

Navigate to frontend directory

# From the repository root
cd frontend

Install dependencies

npm install
# or using bun
bun install

Environment configuration

Create .env file in frontend directory:

VITE_API_BASE_URL=http://localhost:8000

Run the frontend

npm run dev
# or using bun
bun dev

The frontend will be available at http://localhost:8080

🐳 Docker Setup (Alternative)

You can also run the application using Docker:

# Build and run with docker-compose
docker-compose up --build

This starts the backend, frontend, and database services together.

🔧 Development

Backend Development

  • API documentation: http://localhost:8000/docs
  • Health check: http://localhost:8000/api/v1/health
  • Statistics: http://localhost:8000/api/v1/stats
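
A quick way to confirm the backend is up is to call the health and statistics URLs listed above from a script. This is a minimal sketch using only the standard library; it assumes both endpoints return JSON, which is not confirmed here, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join a base URL and a path without doubling or dropping slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """GET the given path and decode the body as JSON (assumed format)."""
    with urllib.request.urlopen(endpoint(path, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage, with the backend running locally:
#   print(fetch_json("/api/v1/health"))
#   print(fetch_json("/api/v1/stats"))
```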

Frontend Development

  • Built with Vite for fast hot reloading
  • Tailwind CSS for styling
  • TypeScript for type safety
  • React Query for API state management

Key Backend Files

Key Frontend Files

📡 API Endpoints

  • POST /api/v1/ingest - Upload and process PDF documents
  • POST /api/v1/query - Query documents with natural language
  • GET /api/v1/health - Health check endpoint
  • GET /api/v1/stats - System statistics
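
For orientation, a query call from a Python client might be assembled as in the sketch below. The JSON field name `query` is an assumption, since the request schema is not documented here; the interactive docs at /docs on the running backend show the actual shape.

```python
import json
import urllib.request


def build_query_request(question: str,
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Prepare (but do not send) a POST to the query endpoint.

    The body shape {"query": ...} is hypothetical.
    """
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it against a running backend:
#   with urllib.request.urlopen(build_query_request("What is this PDF about?")) as resp:
#       print(resp.read().decode("utf-8"))
```

Uploading to /api/v1/ingest would most likely need a multipart form body instead of JSON, which is easier with a dedicated HTTP client library.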

🎨 UI Components

The frontend uses a custom design system with:

  • Responsive layout
  • Dark theme optimized
  • Custom gradients and animations
  • Accessible components built with Radix UI

🧪 Testing

Backend Testing

cd backend
python -m pytest

Frontend Testing

cd frontend
npm test

🚀 Production Deployment

  1. Set environment variables for production
  2. Configure PostgreSQL with proper security settings
  3. Set up reverse proxy (nginx recommended)
  4. Enable SSL/TLS certificates
  5. Configure CORS settings appropriately
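
For the CORS step, one common approach is to keep the allowed origins in an environment variable and parse them at startup instead of allowing every origin. The sketch below assumes a hypothetical `ALLOWED_ORIGINS` variable holding a comma-separated list; it is not one of the variables this project defines.

```python
import os


def parse_allowed_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks and trailing slashes."""
    return [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]


def allowed_origins_from_env(env=None,
                             default: str = "http://localhost:8080") -> list[str]:
    """Read ALLOWED_ORIGINS (hypothetical) or fall back to the dev frontend URL."""
    env = os.environ if env is None else env
    return parse_allowed_origins(env.get("ALLOWED_ORIGINS", default))


# In a FastAPI app the resulting list would typically be handed to the
# built-in CORS middleware, for example:
#   from fastapi.middleware.cors import CORSMiddleware
#   app.add_middleware(
#       CORSMiddleware,
#       allow_origins=allowed_origins_from_env(),
#       allow_methods=["GET", "POST"],
#       allow_headers=["*"],
#   )
```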

📝 Environment Variables

Backend

  • DATABASE_URL - PostgreSQL connection string
  • OPENAI_API_KEY - OpenAI API key for embeddings and text generation
  • ENVIRONMENT - deployment environment (development/production)
  • LOG_LEVEL - logging level

Frontend

  • VITE_API_BASE_URL - Backend API base URL

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

This project is licensed under the MIT License.
