An OCRmyPDF plugin that replaces Tesseract with Apple's native frameworks on Apple Silicon Macs. OCR runs on the Neural Engine via Vision, rasterization uses PDFKit and CoreGraphics, and optional post-processing uses Foundation Models (on-device LLM) — no cloud calls, no external binaries.
| Capability | Framework |
|---|---|
| Text recognition | Apple Vision (VNRecognizeTextRequest) |
| PDF rasterization | PDFKit + CoreGraphics |
| Text layer generation | CoreGraphics (CGPDFContext) |
| OCR error correction | Foundation Models (Apple Intelligence) |
| Language detection | NaturalLanguage (NLLanguageRecognizer) |
| Orientation detection | Vision (VNRecognizeTextRequest) |
The plugin hooks into OCRmyPDF's plugin system and completely replaces the Tesseract backend. The rest of OCRmyPDF's pipeline (page splitting, output assembly, metadata, encryption handling) runs unchanged.
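Registration happens through a Python entry point in the `ocrmypdf` group. A hypothetical `pyproject.toml` fragment showing the shape of that registration (the module path here is an assumption, not copied from the project):

```toml
[project.entry-points."ocrmypdf"]
apple_sdk = "ocrmypdf_apple_sdk.plugin"
```

OCRmyPDF scans this entry-point group at startup, so any environment with the package installed picks up the plugin without a `--plugin` flag.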
Apple Vision runs approximately 2.8× faster than Tesseract end-to-end on Apple Silicon (M-series). The Neural Engine accelerates the `accurate` recognition level; no GPU or CPU tuning is needed.
- macOS 26 (Tahoe) or later — required for the full Vision and Foundation Models API surface
- Apple Silicon (M1 or later) — the Neural Engine path requires an M-series chip
- Python 3.13 or 3.14
- uv — the project uses uv for dependency management; plain pip works too
The plugin checks the macOS version at startup and exits with a clear error if the requirement is not met.
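A minimal sketch of such a startup check in plain Python; the function names are illustrative, not the plugin's actual API:

```python
import platform
import sys

def parse_macos_major(version: str) -> int:
    """Return the major component of a macOS version string ('' on non-macOS -> 0)."""
    try:
        return int(version.split(".")[0])
    except ValueError:
        return 0

def check_macos(minimum: int = 26) -> None:
    """Exit with a clear error when the running macOS is older than `minimum`."""
    detected = platform.mac_ver()[0]
    if parse_macos_major(detected) < minimum:
        sys.exit(f"ocrmypdf-apple-sdk requires macOS {minimum}+ "
                 f"(detected: {detected or 'not macOS'})")
```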
```bash
git clone https://github.com/yangm2/ocrmypdf-apple-sdk-plugin.git
cd ocrmypdf-apple-sdk-plugin
uv sync
```

The plugin auto-registers via its ocrmypdf entry point — no extra flags needed when using ocrmypdf through the same environment.
```bash
pip install .
```

OCRmyPDF discovers the plugin automatically through the ocrmypdf entry point group.
Note: The `apple-fm-sdk` dependency (for `--apple-postprocess`) is Apple's official Python SDK for Foundation Models. It is available at github.com/apple/python-apple-fm-sdk and is not yet on PyPI. If you are not using `--apple-postprocess`, you can omit it by removing the `apple-fm-sdk` line from `pyproject.toml`.
```bash
# Add a searchable text layer to a scanned PDF
ocrmypdf input.pdf output.pdf

# Force re-OCR even if the PDF already has text
ocrmypdf --force-ocr input.pdf output.pdf

# Skip pages that already have text (default behaviour)
ocrmypdf --skip-text input.pdf output.pdf
```

Because the plugin auto-loads from the entry point, no `--plugin` flag is required. If you are running from a uv project:
```bash
uv run ocrmypdf input.pdf output.pdf
```

The plugin works with image files directly — not just PDFs. OCRmyPDF accepts JPEG, PNG, TIFF, BMP, GIF, and WebP. The Vision framework additionally handles HEIC/HEIF natively, so iPhone camera photos work without conversion.
```bash
# JPEG scan
ocrmypdf scan.jpg output.pdf

# PNG screenshot
ocrmypdf screenshot.png output.pdf

# HEIC photo from iPhone
ocrmypdf IMG_1234.heic output.pdf
```

iPhone camera photos are a good use case. A few notes:
- EXIF orientation is respected automatically — Vision reads the orientation tag and sees the image right-side-up regardless of how the phone was held. `--rotate-pages` is not needed.
- Resolution is high and contrast is generally good, so `--apple-preprocess` is rarely needed.
- `--image-dpi` may be needed if the image lacks embedded DPI metadata, though iPhone photos typically embed it.
All Apple-specific options are grouped under *Apple SDK* in `ocrmypdf --help`.
`--apple-ocr-level` controls the Vision recognition level.
| Level | Engine | Speed | Accuracy |
|---|---|---|---|
| `accurate` (default) | Neural Engine | ~650 ms/page | Highest — on par with commercial OCR |
| `fast` | CPU only | ~520 ms/page | Good for clear, well-scanned documents |
When to use `fast`:
- Batch jobs where throughput matters more than marginal quality gains
- Documents with clean, high-contrast scan quality (typed text, no handwriting)
- Interactive pipelines where the user is waiting
When to use `accurate` (default):
- Mixed documents, low-contrast scans, or anything with unusual fonts
- Final archival output where quality is the priority
- Documents with Asian scripts, Arabic, or other complex scripts
```bash
ocrmypdf --apple-ocr-level fast large_batch.pdf output.pdf
```

`--apple-preprocess` applies CoreImage image preprocessing to each page before OCR:
- Auto-levels (contrast stretch)
- Unsharp mask (edge sharpening)
This can improve recognition on degraded scans. It has no effect on already high-quality images (the operations are no-ops on uniform or near-uniform input).
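As a rough illustration of the auto-levels step, here is a pure-Python grayscale contrast stretch. The plugin itself uses CoreImage; this sketch only mirrors the no-op-on-uniform-input behavior described above, and `auto_levels` is a hypothetical name:

```python
def auto_levels(pixels: list[int], lo_pct: float = 0.01, hi_pct: float = 0.99) -> list[int]:
    """Linear contrast stretch on 8-bit grayscale values: map the low and high
    percentiles to 0 and 255. Uniform input maps to itself (a no-op)."""
    ranked = sorted(pixels)
    lo = ranked[int(lo_pct * (len(ranked) - 1))]
    hi = ranked[int(hi_pct * (len(ranked) - 1))]
    if hi == lo:  # uniform or near-uniform input: nothing to stretch
        return pixels[:]
    return [min(255, max(0, round((p - lo) * 255 / (hi - lo)))) for p in pixels]
```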
When to use:
- Old or faded documents with low contrast
- Photocopies or faxes with grey backgrounds
- Any scan where `accurate` mode still misses characters
When not to use:
- Clean, high-quality scans — preprocessing adds latency without benefit
- Documents where preserving exact pixel appearance matters (the preprocessing modifies the image passed to Vision, not the output PDF)
```bash
ocrmypdf --apple-preprocess faded_scan.pdf output.pdf
```

`--apple-postprocess` runs Foundation Models (Apple's on-device LLM, part of Apple Intelligence) on the extracted text to correct OCR recognition errors — digit/letter confusions, split words, missing punctuation.
Requirements:
- Apple Intelligence must be enabled in System Settings → Apple Intelligence & Siri
- The on-device model must be downloaded (happens automatically after enabling)
The plugin checks availability at runtime and silently skips post-processing if Foundation Models is not available, so it is safe to include the flag in scripts.
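The graceful-degradation pattern can be sketched as follows; `corrector` is a hypothetical stand-in for the Foundation Models call, not the plugin's actual interface:

```python
from typing import Callable, Optional

def maybe_postprocess(text: str,
                      corrector: Optional[Callable[[str], str]] = None) -> str:
    """Run LLM correction when a model is available; otherwise return the
    OCR text unchanged, so the flag is always safe to pass."""
    if corrector is None:  # model unavailable: silently skip
        return text
    try:
        return corrector(text)
    except Exception:
        return text  # a post-processing failure must never break OCR output
```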
When to use:
- Documents with OCR errors that matter — forms, contracts, medical records
- Historical documents or non-standard typefaces prone to character confusion
- When the text layer will be used for full-text search or copy-paste
When not to use:
- Large batch jobs — Foundation Models adds latency per page (model inference on-device)
- Documents where exact spacing and layout must be preserved character-for-character
- When Apple Intelligence is not available on the system (the flag is silently ignored)
```bash
ocrmypdf --apple-postprocess important_document.pdf output.pdf
```

Options compose freely:
```bash
# Best quality: preprocess + accurate OCR + LLM correction
ocrmypdf --apple-preprocess --apple-postprocess degraded_scan.pdf output.pdf

# Maximum speed: fast OCR, no extras
ocrmypdf --apple-ocr-level fast large_batch.pdf output.pdf

# Fast batch with preprocessing, skip LLM
ocrmypdf --apple-ocr-level fast --apple-preprocess batch_input.pdf output.pdf
```

`--rotate-pages`: recommended. The plugin implements orientation detection using the Vision framework — it tries all four cardinal rotations and picks the one with the highest OCR confidence. This is likely more accurate than Tesseract's orientation detection, particularly for non-Latin scripts.
```bash
ocrmypdf --rotate-pages input.pdf output.pdf
```

`--deskew`: no benefit — skip it. `get_deskew()` always returns 0.0; OCRmyPDF's deskew step (which uses Tesseract internally) is bypassed. The Vision framework is robust to minor skew anyway.
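The try-all-rotations strategy can be sketched independently of Vision; here `score` is a hypothetical callable standing in for an OCR confidence pass over the rotated page:

```python
from typing import Callable, Iterable

def detect_orientation(score: Callable[[int], float],
                       rotations: Iterable[int] = (0, 90, 180, 270)) -> int:
    """Return the rotation (in degrees) whose OCR pass produced the highest
    confidence; stand-in for the plugin's Vision-based detection."""
    return max(rotations, key=score)
```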
`--clean`: situational. OCRmyPDF's `--clean` runs unpaper to remove scan artifacts such as border shadows, punch holes, and grey backgrounds. The plugin's `--apple-preprocess` already applies auto-contrast and unsharp mask before OCR, so `--clean` adds little for typical scans. It may help with very low-quality originals that have physical artifacts `--apple-preprocess` does not address.
The plugin accepts both Tesseract-style ISO 639-3 codes (e.g. `eng`, `fra`, `jpn`) and Vision BCP-47 codes (e.g. `en-US`, `fr-FR`, `ja-JP`). Specify languages with the standard `-l` / `--language` flag:
```bash
# Single language
ocrmypdf -l fra french_document.pdf output.pdf

# Multiple languages ('+' separator)
ocrmypdf -l eng+jpn mixed_document.pdf output.pdf
```

Supported languages include: English, French, German, Spanish, Italian, Portuguese, Dutch, Danish, Norwegian, Swedish, Polish, Czech, Romanian, Turkish, Russian, Ukrainian, Arabic, Thai, Vietnamese, Indonesian, Malay, Japanese, Korean, Simplified Chinese, and Traditional Chinese.
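Normalizing between the two code schemes could look like this sketch; the mapping table and function names are illustrative, not the plugin's internals:

```python
# Hypothetical subset; the plugin's actual table covers all supported languages.
ISO3_TO_BCP47 = {"eng": "en-US", "fra": "fr-FR", "jpn": "ja-JP", "deu": "de-DE"}

def normalize_language(code: str) -> str:
    """Accept a Tesseract-style ISO 639-3 code or a BCP-47 tag; return BCP-47."""
    if "-" in code:  # already BCP-47 (e.g. 'en-US')
        return code
    return ISO3_TO_BCP47.get(code.lower(), code)

def split_languages(spec: str) -> list[str]:
    """Split OCRmyPDF's '+'-separated -l value into normalized tags."""
    return [normalize_language(part) for part in spec.split("+")]
```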
Run `ocrmypdf --list-langs` to see the authoritative list from Vision at runtime.
When no language is specified, Vision auto-detects the script. After OCR, NaturalLanguage identifies the dominant language and logs it at DEBUG level (`--verbose 2`).
**`ocrmypdf-apple-sdk requires macOS 26+`**
The plugin enforces a macOS 26 minimum. Earlier macOS versions lack required Vision and Foundation Models APIs.
**Foundation Models post-processing silently skipped**
Enable Apple Intelligence in System Settings → Apple Intelligence & Siri and wait for the on-device model to finish downloading. Run with `--verbose 2` to see the reason logged.
**`PriorOcrFoundError`**
The input PDF already has a text layer. Use `--force-ocr` to re-OCR, or `--redo-ocr` to replace only the text layer.
**Plugin loaded twice / `ValueError: plugin already registered`**
Never pass `plugins=["ocrmypdf_apple_sdk"]` when calling `ocrmypdf.ocr()` in Python — the plugin loads automatically via its entry point.
ocrmypdf-AppleOCR (mkyt) is the other macOS-native plugin. Both use Apple Vision for OCR. The key differences:
| | ocrmypdf-apple-sdk (this project) | ocrmypdf-AppleOCR (mkyt) |
|---|---|---|
| Rasterization | PDFKit + CoreGraphics (no Ghostscript) | Ghostscript (unchanged) |
| Tesseract dependency | None — fully blocked | Required (for orientation detection) |
| OCR engine | `VNRecognizeTextRequest` (accurate/fast) | `VNRecognizeTextRequest` + `VKCImageAnalyzer` (LiveText) |
| Vertical CJK text | Not supported | Supported via LiveText mode |
| Word-level boxes | Yes | Line-level only |
| Handwriting mode | `--apple-handwriting` | Not supported |
| LLM post-processing | `--apple-postprocess` (Foundation Models) | Not supported |
| Language detection | NaturalLanguage framework | Not supported |
| Min macOS | 26 | 12+ (13+ for LiveText) |
The main OCR capability this project lacks is AppleOCR's LiveText mode, which uses VisionKit's private `VKCImageAnalyzer` API instead of `VNRecognizeTextRequest`. LiveText provides two things the standard Vision API does not:
- Vertical CJK text layout: `VKCImageAnalyzer` exposes a `layoutDirection()` on each text line. For vertically-set Japanese, Chinese, and Korean, it returns a vertical layout code that allows correct bounding-box orientation in the output PDF. With `VNRecognizeTextRequest` only, vertical CJK columns get axis-aligned boxes that don't reflect the actual text direction.
- Quad geometry: results use four corner points (`quad()`) instead of axis-aligned rectangles, enabling precise text-layer placement for skewed or rotated lines.
The tradeoffs: LiveText is a private API (`VKCImageAnalyzer` is not in the public SDK), requires manual PyObjC metadata registration, needs a separate subprocess with a Cocoa event loop, and does not expose confidence scores.
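To illustrate what quad geometry carries beyond axis-aligned rectangles, here is a small sketch; `quad_to_bbox` is a hypothetical helper, not code from either plugin:

```python
def quad_to_bbox(quad: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Collapse a 4-corner quad into an axis-aligned (x0, y0, x1, y1) box.
    For a skewed line, the box covers more area than the text occupies,
    which is exactly the precision that quad geometry preserves."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return (min(xs), min(ys), max(xs), max(ys))
```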
- No custom parallelism — OCRmyPDF handles page-level parallelism via its worker pool. Apple frameworks dispatch internally to CPU/GPU/ANE. Adding extra threading would contend with both.
- Stateless engine — `OcrEngine` methods are static; no state is stored on the engine instance. Safe for use across forked processes.
- PyObjC bridging — all Apple framework calls go through PyObjC. Objects are converted to plain Python types before returning across process boundaries (PyObjC objects are not picklable).
- No subprocess calls — the plugin uses framework APIs directly.
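The conversion to picklable types might look like this sketch; `to_plain` and the observation attributes are assumptions for illustration, not the plugin's actual code:

```python
import pickle
from types import SimpleNamespace

def to_plain(obs) -> dict:
    """Extract plain Python types from a Vision-style observation so the
    result can safely cross a multiprocessing boundary."""
    return {
        "text": str(obs.text),
        "confidence": float(obs.confidence),
        "bbox": tuple(float(v) for v in obs.bbox),
    }

# Stand-in object; real code would receive a PyObjC observation instead.
fake = SimpleNamespace(text="hello", confidence=0.97, bbox=(0.1, 0.2, 0.5, 0.1))
plain = to_plain(fake)
assert pickle.loads(pickle.dumps(plain)) == plain  # round-trips cleanly
```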
OCRmyPDF's Ghostscript rasterizer is replaced by PDFKit for the default `auto` rasterizer setting.
```bash
uv sync                                      # install all deps including dev extras
uv run ruff format src/ tests/               # format
uv run ruff check src/ tests/                # lint
uv run pyrefly check                         # typecheck
uv run pytest tests/                         # unit + integration + property tests
uv run pytest benchmarks/ --benchmark-only   # performance benchmarks
```

See DEVELOPMENT.md for the full implementation plan, architecture decisions, test strategy, and benchmark methodology.