A collection of OCR implementations powered by Tesseract, EasyOCR, PaddleOCR, Docling, and Mistral OCR, plus a unified FastAPI hub that lets you switch between engines on the fly.
Pytesseract binding for Google's Tesseract OCR engine. Fast, local, works offline.
| Interface | File | Port |
|---|---|---|
| Desktop (PyQt5) | pyqt.py | — |
| Web (Flask) | web_app.py | 5000 |
Metrics: confidence %, word/char/line count, quality label (Excellent/Good/Fair/Poor). Formats: PNG, JPG, PDF, BMP, WEBP, GIF, TIFF.
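The metrics above can be sketched in a few lines. This is a minimal illustration, not the code in `web_app.py`: the function name `text_metrics` and the label cut-offs (90/75/50) are assumptions, since the README names the labels but not their thresholds.

```python
from statistics import mean

# Hypothetical cut-offs; the README lists the labels but not the thresholds.
QUALITY_THRESHOLDS = [(90.0, "Excellent"), (75.0, "Good"), (50.0, "Fair")]


def text_metrics(text: str, word_confidences: list[float]) -> dict:
    """Summarise OCR output: mean confidence %, counts, and a quality label."""
    # Negative confidences are treated as "no text in this box" and skipped.
    valid = [c for c in word_confidences if c >= 0]
    confidence = mean(valid) if valid else 0.0
    # Pick the first label whose cut-off is met; fall back to "Poor".
    label = next(
        (name for cutoff, name in QUALITY_THRESHOLDS if confidence >= cutoff),
        "Poor",
    )
    return {
        "confidence": round(confidence, 1),
        "words": len(text.split()),
        "chars": len(text),
        "lines": len([ln for ln in text.splitlines() if ln.strip()]),
        "quality": label,
    }
```

Per-word confidences would typically come from the OCR engine's detailed output; here they are passed in as a plain list so the sketch stays independent of any library.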
Deep-learning OCR (PyTorch) with per-block confidence. No system dependencies.
| Interface | File | Port |
|---|---|---|
| Web (Flask) | app.py | 5005 |
Metrics: mean/min/median confidence, 10-bin histogram, CER/WER (with ground truth). Formats: PNG, JPG, PDF, BMP, GIF, TIFF, WEBP.
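For reference, CER and WER are edit distances normalised by the length of the ground truth, measured in characters and words respectively. The sketch below shows one standard way to compute them, along with the confidence summary. The function names are illustrative and it assumes confidences lie in the range 0 to 1; it is not necessarily how `app.py` implements them.

```python
from statistics import mean, median


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def confidence_summary(confidences: list[float], bins: int = 10) -> dict:
    """Mean/min/median plus a fixed-width histogram; expects a non-empty list."""
    histogram = [0] * bins
    for c in confidences:
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        histogram[min(int(c * bins), bins - 1)] += 1
    return {
        "mean": mean(confidences),
        "min": min(confidences),
        "median": median(confidences),
        "histogram": histogram,
    }
```

Both rates can exceed 1.0 when the OCR output is much longer than the ground truth, because insertions count as errors.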
Vision-Language Model (0.9B params) with document layout detection, table/chart extraction, and markdown output. Heavy: requires roughly 1.8 GB of disk space and 8 GB of RAM.
| Interface | File | Port |
|---|---|---|
| Web (Flask) | app.py | 5002 |
| Notebook | paddle_ocr_m1.ipynb | — |
Formats: PNG, JPG, PDF, BMP, TIFF.
IBM's Docling document understanding pipeline. Combines OCR with layout analysis, exports plain text + markdown.
| Interface | File | Port |
|---|---|---|
| Web (Flask) | flask_app/app.py | 5003 |
| Notebook | docling.ipynb | — |
Formats: PDF, PNG, JPG, TIFF, BMP, DOCX.
Cloud-based OCR via Mistral API. Requires MISTRAL_API_KEY. Supports tables, math, mixed-language documents.
| Interface | File |
|---|---|
| Notebook | mistral_ocr.ipynb |
Formats: URL, local image (base64), PDF (Files API upload → signed URL).
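The three input paths differ mainly in how the document reference is built. The sketch below prepares that reference with the standard library only. The helper names are hypothetical, and the payload shape (`image_url` / `document_url` dictionaries) is an assumption based on Mistral's published OCR examples, so check it against `mistral_ocr.ipynb` and the current API docs before relying on it.

```python
import base64
import mimetypes
from pathlib import Path


def image_payload(source: str) -> dict:
    """Build an image reference from a public URL or a local file path."""
    if source.startswith(("http://", "https://")):
        return {"type": "image_url", "image_url": source}
    # Local file: embed the bytes as a base64 data URL.
    path = Path(source)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": f"data:{mime};base64,{encoded}"}


def pdf_payload(signed_url: str) -> dict:
    """Build a PDF reference from a signed URL obtained after a file upload."""
    # Assumed shape; the upload and signing steps themselves are not shown here.
    return {"type": "document_url", "document_url": signed_url}
```

Either dictionary would then be passed as the document argument of the OCR request, with `MISTRAL_API_KEY` supplied through the environment.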
Unified web interface that switches between Tesseract, EasyOCR, Docling, and Mistral without restarting.
fastapi_app/
├── app.py # FastAPI backend (port 8000)
├── templates/
│ └── index.html # Minimalist UI
└── uploads/ # Temp files (auto-cleaned)
Features: drag-drop upload, image preview, metadata grid, settings modal for engine selection, Mistral API key input, copy-to-clipboard.
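Switching engines without a restart usually comes down to picking a handler per request instead of at startup. A minimal sketch of that idea follows, using a plain registry. The names `ENGINES`, `register`, and `run_ocr` are hypothetical, the handlers are placeholders, and the real `fastapi_app/app.py` may be organised differently.

```python
from typing import Callable, Dict

# Hypothetical registry: engine name -> function that takes a file path
# and returns the extracted text.
ENGINES: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds an OCR handler to the registry under `name`."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        ENGINES[name] = func
        return func
    return decorator


@register("tesseract")
def run_tesseract(path: str) -> str:
    # Placeholder: a real handler would call the Tesseract binding here.
    return f"[tesseract] {path}"


@register("easyocr")
def run_easyocr(path: str) -> str:
    # Placeholder: a real handler would call an EasyOCR reader here.
    return f"[easyocr] {path}"


def run_ocr(engine: str, path: str) -> str:
    """Dispatch one request to the engine chosen in the settings modal."""
    try:
        handler = ENGINES[engine]
    except KeyError:
        raise ValueError(
            f"Unknown engine {engine!r}; choose from {sorted(ENGINES)}"
        ) from None
    return handler(path)
```

In a web endpoint, the engine name would arrive as a form field alongside the uploaded file, and heavy models could be loaded lazily inside each handler so unused engines cost nothing.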
Run:
cd fastapi_app
uvicorn app:app --reload --port 8000

OCR/
├── README.md # This file
├── .env # API keys (add to .gitignore if not already)
├── .gitignore
├── .python-version
├── requirements.txt # Frozen Python dependencies
├── uploads/ # Shared temp upload directory
│
├── Tessaract/ # Tesseract OCR (PyQt5 + Flask)
│ ├── pyqt.py
│ ├── web_app.py
│ ├── templates/index.html
│ └── READMD.md # (note: filename typo)
│
├── EasyOCR/ # EasyOCR (Flask)
│ ├── app.py
│ ├── templates/index.html
│ ├── model/ # Cached .pth files
│ ├── output/
│ └── uploads/
│
├── PaddleOCR/ # PaddleOCR-VL (Flask + Notebook)
│ ├── app.py
│ ├── templates/index.html
│ ├── paddle_ocr_m1.ipynb
│ ├── model/
│ ├── input/
│ ├── output/
│ └── uploads/
│
├── DoclingOCR/ # Docling (Flask + Notebook)
│ ├── flask_app/app.py
│ ├── templates/index.html
│ ├── docling.ipynb
│ ├── Docling/input/
│ ├── Docling/output/
│ └── DoclingOCR/model/
│
├── Notebooks/Mistral/ # Mistral OCR (Notebook)
│ ├── mistral_ocr.ipynb
│ └── data/ # Sample invoice files
│
└── fastapi_app/ # Unified FastAPI hub
    ├── app.py
    ├── templates/index.html
    └── uploads/
| Engine | Type | Speed | Needs internet | Disk footprint | Best for |
|---|---|---|---|---|---|
| Tesseract | System binary | ★★★★ | No | 15 MB | Quick, reliable OCR |
| EasyOCR | DL (PyTorch) | ★★★ | No | ~100 MB | Per-block confidence |
| PaddleOCR | VL model | ★★ | No | ~1.8 GB | Layout + table extraction |
| Docling | Pipeline | ★★ | No* | ~770 MB | Structured documents |
| Mistral | Cloud API | ★★★★★ | Yes | — | Complex docs, math, tables |
*First run downloads models; subsequent runs are offline.
