tsejavhaa/Multi_engine_OCR

OCR Suite — Multi-Engine Text Extraction

A collection of OCR implementations powered by Tesseract, EasyOCR, PaddleOCR, Docling, and Mistral OCR, plus a unified FastAPI hub that lets you switch between engines on the fly.

OCR Models

Tesseract — Tessaract/

Python bindings (pytesseract) for Google's Tesseract OCR engine. Fast, runs locally, and works offline; the Tesseract binary must be installed on the system.

| Interface | File | Port |
|---|---|---|
| Desktop (PyQt5) | pyqt.py | n/a |
| Web (Flask) | web_app.py | 5000 |

Metrics: confidence %, word/char/line count, quality label (Excellent/Good/Fair/Poor). Formats: PNG, JPG, PDF, BMP, WEBP, GIF, TIFF.
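The metrics above can be derived from pytesseract's word-level output. This is a minimal sketch, assuming pytesseract and Pillow are installed. The quality thresholds (90/75/50) and the helper names `quality_label`, `summarize`, and `ocr_with_metrics` are hypothetical and are not taken from web_app.py or pyqt.py.

```python
from statistics import mean


def quality_label(confidence: float) -> str:
    """Map a mean confidence (0-100) to a label. Thresholds are hypothetical."""
    if confidence >= 90:
        return "Excellent"
    if confidence >= 75:
        return "Good"
    if confidence >= 50:
        return "Fair"
    return "Poor"


def summarize(data: dict) -> dict:
    """Summarize the dict from pytesseract.image_to_data(..., output_type=Output.DICT)."""
    words, confs, lines = [], [], set()
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # str, int or float depending on the version
        if conf < 0 or not text.strip():  # -1 marks page/block/line rows, not words
            continue
        words.append(text.strip())
        confs.append(conf)
        # A line is identified by its (page, block, paragraph, line) numbers.
        lines.add((data["page_num"][i], data["block_num"][i],
                   data["par_num"][i], data["line_num"][i]))
    confidence = mean(confs) if confs else 0.0
    return {
        "confidence": round(confidence, 2),
        "words": len(words),
        "chars": sum(len(w) for w in words),
        "lines": len(lines),
        "quality": quality_label(confidence),
    }


def ocr_with_metrics(path: str) -> tuple:
    """Run Tesseract on an image and return (text, metrics)."""
    # Imported lazily so the helpers above work without Tesseract installed.
    import pytesseract
    from PIL import Image

    image = Image.open(path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    return pytesseract.image_to_string(image), summarize(data)
```

Averaging only rows with a non-negative confidence keeps the structural rows Tesseract emits for pages, blocks, and lines from dragging the mean down.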

EasyOCR — EasyOCR/

Deep-learning OCR (PyTorch) with per-block confidence. No system dependencies.

| Interface | File | Port |
|---|---|---|
| Web (Flask) | app.py | 5005 |

Metrics: mean/min/median confidence, 10-bin histogram, CER/WER (with ground truth). Formats: PNG, JPG, PDF, BMP, GIF, TIFF, WEBP.
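EasyOCR's `Reader.readtext` returns `(bbox, text, confidence)` tuples, so the listed statistics can be computed directly from them. The sketch below works under that assumption. CER and WER are computed as Levenshtein edit distance divided by the reference length (characters for CER, whitespace-split words for WER), which is the standard definition but may differ in detail from app.py. All function names are hypothetical.

```python
from statistics import mean, median


def confidence_stats(results):
    """results: list of (bbox, text, confidence) tuples, confidences in [0, 1]."""
    confs = [float(conf) for _, _, conf in results]
    histogram = [0] * 10  # bins [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]
    if not confs:
        return {"mean": 0.0, "min": 0.0, "median": 0.0, "histogram": histogram}
    for c in confs:
        histogram[min(int(c * 10), 9)] += 1  # 1.0 falls into the last bin
    return {"mean": mean(confs), "min": min(confs),
            "median": median(confs), "histogram": histogram}


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate against a ground-truth string."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate against a ground-truth string."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

With EasyOCR installed, `easyocr.Reader(["en"]).readtext("scan.png")` yields a list in the shape `confidence_stats` expects, and the joined text of its results can be passed to `cer`/`wer` as the hypothesis.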

PaddleOCR — PaddleOCR/

Uses PaddleOCR-VL, a vision-language model (0.9B parameters) with document layout detection, table/chart extraction, and Markdown output. Heavy: requires ~1.8 GB of disk and 8 GB of RAM.

| Interface | File | Port |
|---|---|---|
| Web (Flask) | app.py | 5002 |
| Notebook | paddle_ocr_m1.ipynb | n/a |

Formats: PNG, JPG, PDF, BMP, TIFF.
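Here is a sketch of driving PaddleOCR-VL from Python. It assumes PaddleOCR 3.x, which provides a `PaddleOCRVL` pipeline whose results offer `save_to_markdown` and `save_to_json`. The extension check mirrors the Formats line and runs before the heavy model loads. `SUPPORTED`, `check_input`, and `to_markdown` are hypothetical names, and the `pipeline` parameter exists only so the logic can be exercised without the model.

```python
from pathlib import Path

# Extensions assumed from the Formats line (PNG, JPG, PDF, BMP, TIFF).
SUPPORTED = {".png", ".jpg", ".jpeg", ".pdf", ".bmp", ".tif", ".tiff"}


def check_input(path: str) -> Path:
    """Reject unsupported files before paying the cost of loading the model."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED:
        raise ValueError(f"Unsupported format: {p.suffix or '(no extension)'}")
    return p


def to_markdown(path: str, output_dir: str = "output", pipeline=None) -> int:
    """Run PaddleOCR-VL on one file, save Markdown + JSON, return the result count."""
    src = check_input(path)  # validate first: model loading is expensive
    if pipeline is None:
        # Assumes PaddleOCR 3.x, which ships the PaddleOCRVL pipeline.
        from paddleocr import PaddleOCRVL
        pipeline = PaddleOCRVL()  # fetches model weights on first use
    count = 0
    for res in pipeline.predict(str(src)):  # typically one result per page/image
        res.save_to_markdown(save_path=output_dir)
        res.save_to_json(save_path=output_dir)
        count += 1
    return count
```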

Docling — DoclingOCR/

IBM's Docling document-understanding pipeline. Combines OCR with layout analysis and exports plain text and Markdown.

| Interface | File | Port |
|---|---|---|
| Web (Flask) | flask_app/app.py | 5003 |
| Notebook | docling.ipynb | n/a |

Formats: PDF, PNG, JPG, TIFF, BMP, DOCX.
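Docling's documented entry point is `DocumentConverter`; its `convert` result carries a `DoclingDocument` with `export_to_markdown()` and `export_to_text()`. Below is a minimal sketch of the text + Markdown export, assuming a recent `docling` release. The `convert` helper, its `output_dir` handling, and the injectable `converter` argument are hypothetical.

```python
from pathlib import Path


def convert(path: str, output_dir=None, converter=None) -> dict:
    """Convert a document with Docling and return its plain-text and Markdown exports."""
    if converter is None:
        # Lazy import: Docling downloads its layout/OCR models on the first run.
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()
    doc = converter.convert(path).document  # a DoclingDocument
    exports = {"txt": doc.export_to_text(), "md": doc.export_to_markdown()}
    if output_dir is not None:
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        for ext, content in exports.items():
            # e.g. report.pdf -> report.txt and report.md
            (out / f"{Path(path).stem}.{ext}").write_text(content, encoding="utf-8")
    return exports
```

Reusing one `DocumentConverter` across files avoids reinitializing the pipeline for every document.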

Mistral OCR — Notebooks/Mistral/

Cloud-based OCR via the Mistral API. Requires MISTRAL_API_KEY (e.g. set in .env). Handles tables, math, and mixed-language documents.

| Interface | File |
|---|---|
| Notebook | mistral_ocr.ipynb |

Formats: URL, local image (base64), PDF (Files API upload → signed URL).
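The three input routes listed above map onto the `document` argument of Mistral's OCR endpoint. The sketch below assumes the `mistralai` v1 Python SDK (`Mistral`, `client.ocr.process`, `client.files.upload`, `client.files.get_signed_url`) and the `mistral-ocr-latest` model alias. The helper names are hypothetical, and the notebook may structure this differently.

```python
import base64
import mimetypes
import os
from pathlib import Path


def image_data_uri(filename: str, data: bytes) -> str:
    """Encode a local image as a base64 data URI."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def document_payload(source: str, client=None) -> dict:
    """Build the `document` argument for client.ocr.process from a URL or local path."""
    if source.startswith(("http://", "https://")):
        mime = mimetypes.guess_type(source)[0] or ""
        if mime.startswith("image/"):
            return {"type": "image_url", "image_url": source}
        return {"type": "document_url", "document_url": source}
    path = Path(source)
    if path.suffix.lower() == ".pdf":
        # Local PDFs: upload through the Files API, then OCR a signed URL.
        with path.open("rb") as fh:
            uploaded = client.files.upload(
                file={"file_name": path.name, "content": fh}, purpose="ocr")
        signed = client.files.get_signed_url(file_id=uploaded.id)
        return {"type": "document_url", "document_url": signed.url}
    # Local images are sent inline as a base64 data URI.
    return {"type": "image_url", "image_url": image_data_uri(path.name, path.read_bytes())}


def mistral_ocr(source: str) -> str:
    """OCR a URL, local image, or local PDF and join the pages' Markdown."""
    from mistralai import Mistral  # mistralai v1 SDK, imported lazily

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    response = client.ocr.process(model="mistral-ocr-latest",
                                  document=document_payload(source, client))
    return "\n\n".join(page.markdown for page in response.pages)
```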

FastAPI Hub — fastapi_app/

Unified web interface that switches between Tesseract, EasyOCR, Docling, and Mistral without restarting.

fastapi_app/
├── app.py              # FastAPI backend (port 8000)
├── templates/
│   └── index.html      # Minimalist UI
└── uploads/            # Temp files (auto-cleaned)

Features: drag-and-drop upload, image preview, metadata grid, a settings modal for engine selection, Mistral API key input, and copy-to-clipboard.
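One way to switch engines without restarting is a registry keyed by engine name, with each upload written to `uploads/` only for the duration of a request. This is a hypothetical sketch, not the hub's code. `ENGINES`, `register`, and `run_ocr` are invented names, and real entries would wrap Tesseract, EasyOCR, Docling, or Mistral. Deleting the temp file right after OCR is one assumed way to implement the auto-cleaning mentioned above.

```python
import os
import tempfile
from typing import Callable, Dict, Optional

# Hypothetical registry: engine name -> function(path, api_key) -> extracted text.
ENGINES: Dict[str, Callable[[str, Optional[str]], str]] = {}


def register(name: str):
    """Decorator that adds an OCR function to the registry under `name`."""
    def decorator(fn):
        ENGINES[name] = fn
        return fn
    return decorator


def run_ocr(engine: str, filename: str, data: bytes,
            api_key: Optional[str] = None, upload_dir: str = "uploads") -> str:
    """Save an upload to a temp file, run the selected engine, then delete the file."""
    if engine not in ENGINES:
        raise ValueError(f"Unknown engine {engine!r}; choose from {sorted(ENGINES)}")
    if engine == "mistral" and not api_key:
        raise ValueError("Mistral requires an API key")
    os.makedirs(upload_dir, exist_ok=True)
    suffix = os.path.splitext(filename)[1]  # keep the extension for format detection
    with tempfile.NamedTemporaryFile(dir=upload_dir, suffix=suffix, delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        return ENGINES[engine](path, api_key)
    finally:
        os.remove(path)  # keep uploads/ clean even if the engine fails
```

In a FastAPI route, `data` could come from `await file.read()` on an `UploadFile`, with the engine name and API key taken from the settings modal's form fields.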

Run:

cd fastapi_app
uvicorn app:app --reload --port 8000

Project Structure

OCR/
├── README.md                   # This file
├── .env                        # API keys (add to .gitignore if not already)
├── .gitignore
├── .python-version
├── requirements.txt            # Frozen Python dependencies
├── uploads/                    # Shared temp upload directory
│
├── Tessaract/                  # Tesseract OCR (PyQt5 + Flask)
│   ├── pyqt.py
│   ├── web_app.py
│   ├── templates/index.html
│   └── READMD.md               # (note: filename typo)
│
├── EasyOCR/                    # EasyOCR (Flask)
│   ├── app.py
│   ├── templates/index.html
│   ├── model/                  # Cached .pth files
│   ├── output/
│   └── uploads/
│
├── PaddleOCR/                  # PaddleOCR-VL (Flask + Notebook)
│   ├── app.py
│   ├── templates/index.html
│   ├── paddle_ocr_m1.ipynb
│   ├── model/
│   ├── input/
│   ├── output/
│   └── uploads/
│
├── DoclingOCR/                 # Docling (Flask + Notebook)
│   ├── flask_app/app.py
│   ├── templates/index.html
│   ├── docling.ipynb
│   ├── Docling/input/
│   ├── Docling/output/
│   └── DoclingOCR/model/
│
├── Notebooks/Mistral/          # Mistral OCR (Notebook)
│   ├── mistral_ocr.ipynb
│   └── data/                   # Sample invoice files
│
└── fastapi_app/                # Unified FastAPI hub
    ├── app.py
    ├── templates/index.html
    └── uploads/

Quick Comparison

| Engine | Type | Speed | Internet | Disk size | Best for |
|---|---|---|---|---|---|
| Tesseract | System binary | ★★★★ | No | 15 MB | Quick, reliable OCR |
| EasyOCR | DL (PyTorch) | ★★★ | No | ~100 MB | Per-block confidence |
| PaddleOCR | VL model | ★★ | No | ~1.8 GB | Layout + table extraction |
| Docling | Pipeline | ★★ | No* | ~770 MB | Structured documents |
| Mistral | Cloud API | ★★★★★ | Yes | n/a (cloud) | Complex docs, math, tables |

*First run downloads models; subsequent runs are offline.

About

Multi-engine OCR suite with Tesseract, EasyOCR, PaddleOCR, Docling, and Mistral OCR — desktop apps, web UIs, Jupyter notebooks, and a unified FastAPI hub for document text extraction.
