
ocrmypdf-apple-sdk

An OCRmyPDF plugin that replaces Tesseract with Apple's native frameworks on Apple Silicon Macs. OCR runs on the Neural Engine via Vision, rasterization uses PDFKit and CoreGraphics, and optional post-processing uses Foundation Models (on-device LLM) — no cloud calls, no external binaries.

What it does

| Capability | Framework |
| --- | --- |
| Text recognition | Apple Vision (VNRecognizeTextRequest) |
| PDF rasterization | PDFKit + CoreGraphics |
| Text layer generation | CoreGraphics (CGPDFContext) |
| OCR error correction | Foundation Models (Apple Intelligence) |
| Language detection | NaturalLanguage (NLLanguageRecognizer) |
| Orientation detection | Vision (VNRecognizeTextRequest) |

The plugin hooks into OCRmyPDF's plugin system and completely replaces the Tesseract backend. The rest of OCRmyPDF's pipeline (page splitting, output assembly, metadata, encryption handling) runs unchanged.

Performance

Apple Vision runs approximately 2.8× faster than Tesseract end-to-end on Apple Silicon (M-series). The Neural Engine accelerates the accurate recognition level; no GPU or CPU tuning is needed.

Prerequisites

  • macOS 26 (Tahoe) or later — required for the full Vision and Foundation Models API surface
  • Apple Silicon (M1 or later) — the Neural Engine path requires an M-series chip
  • Python 3.13 or 3.14
  • uv — the project uses uv for dependency management; plain pip works too

The plugin checks the macOS version at startup and exits with a clear error if the requirement is not met.

Installation

From source (development)

git clone https://github.com/yangm2/ocrmypdf-apple-sdk-plugin.git
cd ocrmypdf-apple-sdk-plugin
uv sync

The plugin auto-registers via its ocrmypdf entry point — no extra flags needed when using ocrmypdf through the same environment.
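For reference, entry-point registration for an OCRmyPDF plugin is declared in pyproject.toml. The fragment below is illustrative — the group follows OCRmyPDF's plugin discovery convention, but the name/module pair shown here is an assumption, not copied from this project's actual pyproject.toml:

```toml
# Hypothetical fragment — OCRmyPDF discovers installed plugins through
# the "ocrmypdf" entry point group; the key and module are illustrative.
[project.entry-points."ocrmypdf"]
apple-sdk = "ocrmypdf_apple_sdk"
```

Because discovery happens through the entry point group, installing the package into the same environment as ocrmypdf is all that is needed.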

Into an existing environment

pip install .

OCRmyPDF discovers the plugin automatically through the ocrmypdf entry point group.

Note: The apple-fm-sdk dependency (for --apple-postprocess) is Apple's official Python SDK for Foundation Models. It is available at github.com/apple/python-apple-fm-sdk and is not yet on PyPI. If you are not using --apple-postprocess, you can omit it by removing the apple-fm-sdk line from pyproject.toml.

Basic usage

# Add a searchable text layer to a scanned PDF
ocrmypdf input.pdf output.pdf

# Force re-OCR even if the PDF already has text
ocrmypdf --force-ocr input.pdf output.pdf

# Skip pages that already have text (default behaviour)
ocrmypdf --skip-text input.pdf output.pdf

Because the plugin auto-loads from the entry point, no --plugin flag is required. If you are running from a uv project:

uv run ocrmypdf input.pdf output.pdf

Image input

The plugin works with image files directly — not just PDFs. OCRmyPDF accepts JPEG, PNG, TIFF, BMP, GIF, and WebP. The Vision Framework additionally handles HEIC/HEIF natively, so iPhone camera photos work without conversion.

# JPEG scan
ocrmypdf scan.jpg output.pdf

# PNG screenshot
ocrmypdf screenshot.png output.pdf

# HEIC photo from iPhone
ocrmypdf IMG_1234.heic output.pdf

iPhone camera photos are a good use case. A few notes:

  • EXIF orientation is respected automatically — Vision reads the orientation tag and sees the image right-side-up regardless of how the phone was held. --rotate-pages is not needed.
  • Resolution is high and contrast is generally good, so --apple-preprocess is rarely needed.
  • --image-dpi may be needed if the image lacks embedded DPI metadata, though iPhone photos typically embed it.
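The EXIF handling mentioned above comes down to mapping the eight standard EXIF orientation values to a rotation (plus optional mirroring). A minimal sketch of that mapping, independent of any Apple API — Vision applies the equivalent correction internally:

```python
# The eight standard EXIF orientation values (1-8) mapped to the clockwise
# rotation needed to display the image upright, and whether it is mirrored.
EXIF_ORIENTATION = {
    1: (0, False),    # normal
    2: (0, True),     # mirrored horizontally
    3: (180, False),  # upside down
    4: (180, True),   # mirrored vertically
    5: (90, True),    # mirrored, then rotated 90° CW
    6: (90, False),   # rotated 90° CW (typical portrait phone photo)
    7: (270, True),   # mirrored, then rotated 270° CW
    8: (270, False),  # rotated 270° CW
}

def upright_rotation(exif_orientation: int) -> int:
    """Degrees of clockwise rotation needed to view the image upright."""
    rotation, _mirrored = EXIF_ORIENTATION.get(exif_orientation, (0, False))
    return rotation
```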

Options

All Apple-specific options are grouped under Apple SDK in --help.

--apple-ocr-level {accurate|fast}

Controls the Vision recognition level.

| Level | Engine | Speed | Accuracy |
| --- | --- | --- | --- |
| accurate (default) | Neural Engine | ~650 ms/page | Highest — on par with commercial OCR |
| fast | CPU only | ~520 ms/page | Good for clear, well-scanned documents |

When to use fast:

  • Batch jobs where throughput matters more than marginal quality gains
  • Documents with clean, high-contrast scan quality (typed text, no handwriting)
  • Interactive pipelines where the user is waiting

When to use accurate (default):

  • Mixed documents, low-contrast scans, or anything with unusual fonts
  • Final archival output where quality is the priority
  • Documents with Asian scripts, Arabic, or other complex scripts

ocrmypdf --apple-ocr-level fast large_batch.pdf output.pdf

--apple-preprocess

Applies CoreImage image preprocessing to each page before OCR:

  • Auto-levels (contrast stretch)
  • Unsharp mask (edge sharpening)

This can improve recognition on degraded scans. It has no effect on already high-quality images (the operations are no-ops on uniform or near-uniform input).

When to use:

  • Old or faded documents with low contrast
  • Photocopies or faxes with grey backgrounds
  • Any scan where accurate mode still misses characters

When not to use:

  • Clean, high-quality scans — preprocessing adds latency without benefit
  • Documents where preserving exact pixel appearance matters (the preprocessing modifies the image passed to Vision, not the output PDF)

ocrmypdf --apple-preprocess faded_scan.pdf output.pdf
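The auto-levels step is essentially a contrast stretch: remap the observed minimum and maximum pixel values onto the full 0–255 range. A pure-Python sketch of the idea — the plugin itself does this via CoreImage filters, not this code:

```python
def auto_levels(pixels: list[int]) -> list[int]:
    """Contrast-stretch 8-bit grayscale values to span the full 0-255 range.

    Illustrative only. On uniform input (lo == hi) this is a no-op, which
    is why preprocessing adds nothing for already high-contrast scans.
    """
    lo, hi = min(pixels), max(pixels)
    if lo == hi:
        return list(pixels)  # no dynamic range to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]
```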

--apple-postprocess

Runs Foundation Models (Apple's on-device LLM, part of Apple Intelligence) on the extracted text to correct OCR recognition errors — digit/letter confusions, split words, missing punctuation.

Requirements:

  • Apple Intelligence must be enabled in System Settings → Apple Intelligence & Siri
  • The on-device model must be downloaded (happens automatically after enabling)

The plugin checks availability at runtime and silently skips post-processing if Foundation Models is not available, so it is safe to include the flag in scripts.
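The availability check described above follows a common graceful-degradation pattern. A simplified sketch — the function and module structure here are illustrative, not the plugin's actual API:

```python
import logging

log = logging.getLogger("ocrmypdf_apple_sdk")

def fm_available() -> bool:
    """Hypothetical probe: true only if the Foundation Models SDK imports.
    Real code would also query whether the on-device model is ready."""
    try:
        import apple_fm_sdk  # noqa: F401 -- not on PyPI; see Installation
    except ImportError:
        return False
    return True

def postprocess(text: str) -> str:
    """Return corrected text, or the input unchanged if the model is
    unavailable -- which is why --apple-postprocess is safe in scripts."""
    if not fm_available():
        log.debug("Foundation Models unavailable; skipping post-processing")
        return text
    ...  # run the on-device model here
    return text
```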

When to use:

  • Documents with OCR errors that matter — forms, contracts, medical records
  • Historical documents or non-standard typefaces prone to character confusion
  • When the text layer will be used for full-text search or copy-paste

When not to use:

  • Large batch jobs — Foundation Models adds latency per page (model inference on-device)
  • Documents where exact spacing and layout must be preserved character-for-character
  • When Apple Intelligence is not available on the system (the flag is silently ignored)

ocrmypdf --apple-postprocess important_document.pdf output.pdf

Combining options

Options compose freely:

# Best quality: preprocess + accurate OCR + LLM correction
ocrmypdf --apple-preprocess --apple-postprocess degraded_scan.pdf output.pdf

# Maximum speed: fast OCR, no extras
ocrmypdf --apple-ocr-level fast large_batch.pdf output.pdf

# Fast batch with preprocessing, skip LLM
ocrmypdf --apple-ocr-level fast --apple-preprocess batch_input.pdf output.pdf

OCRmyPDF option compatibility

--rotate-pages

Recommended. The plugin implements orientation detection using Vision Framework — it tries all four cardinal rotations and picks the one with the highest OCR confidence. This is likely more accurate than Tesseract's orientation detection, particularly for non-Latin scripts.

ocrmypdf --rotate-pages input.pdf output.pdf
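The confidence-based selection reduces to: OCR the page at 0°, 90°, 180°, and 270°, then keep the rotation whose results have the best mean confidence. A sketch of the selection logic with the recognizer stubbed out (`recognize` stands in for a Vision text-recognition call and is an assumption, not the plugin's real signature):

```python
from typing import Callable

def detect_orientation(
    page,  # opaque page handle passed through to the recognizer
    recognize: Callable[[object, int], list[float]],
) -> int:
    """Try all four cardinal rotations and return the one whose OCR
    results have the highest mean confidence. `recognize(page, degrees)`
    returns per-line confidence scores for the page at that rotation."""
    def mean_confidence(degrees: int) -> float:
        confidences = recognize(page, degrees)
        return sum(confidences) / len(confidences) if confidences else 0.0
    return max((0, 90, 180, 270), key=mean_confidence)
```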

--deskew

No benefit — skip it. get_deskew() always returns 0.0; OCRmyPDF's deskew step (which uses Tesseract internally) is bypassed. Vision Framework is robust to minor skew anyway.
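In plugin terms this is just the engine's deskew hook reporting zero. A sketch of the stub, with the method signature simplified from OCRmyPDF's OcrEngine interface (this class is illustrative, not the plugin's actual engine):

```python
class AppleOcrEngineSketch:
    """Illustrative stub showing why --deskew is a no-op with this plugin:
    reporting 0.0 skew means OCRmyPDF never rotates the page image."""

    @staticmethod
    def get_deskew(input_file, options) -> float:
        # Vision tolerates minor skew, so no correction angle is reported.
        return 0.0
```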

--clean

Situational. OCRmyPDF's --clean runs unpaper to remove scan artifacts such as border shadows, punch holes, and grey backgrounds. The plugin's --apple-preprocess already applies auto-contrast and unsharp mask before OCR, so --clean adds little for typical scans. It may help with very low-quality originals that have physical artifacts --apple-preprocess does not address.

Language support

The plugin accepts both Tesseract-style ISO 639-3 codes (e.g. eng, fra, jpn) and Vision BCP-47 codes (e.g. en-US, fr-FR, ja-JP). Specify languages with the standard -l / --language flag:

# Single language
ocrmypdf -l fra french_document.pdf output.pdf

# Multiple languages ('+' separator)
ocrmypdf -l eng+jpn mixed_document.pdf output.pdf
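Accepting both code styles implies a small translation table from ISO 639-3 to BCP-47 before the languages are handed to Vision. A hedged sketch of that normalization — the table below covers only a few languages and is illustrative, not the plugin's actual mapping:

```python
# Illustrative ISO 639-3 -> BCP-47 table; the plugin's real table is larger.
ISO639_TO_BCP47 = {
    "eng": "en-US",
    "fra": "fr-FR",
    "deu": "de-DE",
    "jpn": "ja-JP",
    "kor": "ko-KR",
}

def normalize_langs(spec: str) -> list[str]:
    """Turn a Tesseract-style '+'-separated spec into BCP-47 codes.
    Codes already in BCP-47 form (they contain '-') pass through as-is."""
    out = []
    for code in spec.split("+"):
        out.append(code if "-" in code else ISO639_TO_BCP47.get(code, code))
    return out
```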

Supported languages include: English, French, German, Spanish, Italian, Portuguese, Dutch, Danish, Norwegian, Swedish, Polish, Czech, Romanian, Turkish, Russian, Ukrainian, Arabic, Thai, Vietnamese, Indonesian, Malay, Japanese, Korean, Chinese Simplified, Chinese Traditional.

Run ocrmypdf --list-langs to see the authoritative list from Vision at runtime.

When no language is specified, Vision auto-detects the script. After OCR, NaturalLanguage identifies the dominant language and logs it at DEBUG level (--verbose 2).

Troubleshooting

ocrmypdf-apple-sdk requires macOS 26+ The plugin enforces a macOS 26 minimum. Earlier macOS versions lack required Vision and Foundation Models APIs.

Foundation Models post-processing silently skipped Enable Apple Intelligence in System Settings → Apple Intelligence & Siri and wait for the on-device model to finish downloading. Run with --verbose 2 to see the reason logged.

PriorOcrFoundError The input PDF already has a text layer. Use --force-ocr to re-OCR, or --redo-ocr to replace only the text layer.

Plugin loaded twice / ValueError: plugin already registered Never pass plugins=["ocrmypdf_apple_sdk"] when calling ocrmypdf.ocr() in Python — the plugin loads automatically via its entry point.

Comparison with ocrmypdf-AppleOCR

ocrmypdf-AppleOCR (mkyt) is the other macOS-native plugin. Both use Apple Vision for OCR. The key differences:

| | ocrmypdf-apple-sdk (this project) | ocrmypdf-AppleOCR (mkyt) |
| --- | --- | --- |
| Rasterization | PDFKit + CoreGraphics (no Ghostscript) | Ghostscript (unchanged) |
| Tesseract dependency | None — fully replaced | Required (for orientation detection) |
| OCR engine | VNRecognizeTextRequest (accurate/fast) | VNRecognizeTextRequest + VKCImageAnalyzer (LiveText) |
| Vertical CJK text | Not supported | Supported via LiveText mode |
| Word-level boxes | Yes | Line-level only |
| Handwriting mode | --apple-handwriting | Not supported |
| LLM post-processing | --apple-postprocess (Foundation Models) | Not supported |
| Language detection | NaturalLanguage framework | Not supported |
| Min macOS | 26 | 12+ (13+ for LiveText) |

LiveText mode (VKCImageAnalyzer)

The main OCR capability this project lacks is AppleOCR's LiveText mode, which uses VisionKit's private VKCImageAnalyzer API instead of VNRecognizeTextRequest. LiveText provides two things the standard Vision API does not:

  • Vertical CJK text layout: VKCImageAnalyzer exposes a layoutDirection() on each text line. For vertically-set Japanese, Chinese, and Korean, it returns a vertical layout code that allows correct bounding-box orientation in the output PDF. With VNRecognizeTextRequest only, vertical CJK columns get axis-aligned boxes that don't reflect the actual text direction.
  • Quad geometry: results use four corner points (quad()) instead of axis-aligned rectangles, enabling precise text layer placement for skewed or rotated lines.

The tradeoffs: LiveText is a private API (VKCImageAnalyzer is not in the public SDK), requires manual PyObjC metadata registration, needs a separate subprocess with a Cocoa event loop, and does not expose confidence scores.

Architecture notes

  • No custom parallelism — OCRmyPDF handles page-level parallelism via its worker pool. Apple frameworks dispatch internally to CPU/GPU/ANE. Adding extra threading would contend with both.
  • Stateless engine — OcrEngine methods are static; no state is stored on the engine instance. Safe for use across forked processes.
  • pyobjc bridging — all Apple framework calls go through pyobjc. Objects are converted to plain Python types before returning across process boundaries (pyobjc objects are not picklable).
  • No subprocess calls — the plugin uses framework APIs directly. ocrmypdf's Ghostscript rasterizer is replaced by PDFKit for the default auto rasterizer setting.
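The pickling constraint in the third bullet is the load-bearing one: results must cross OCRmyPDF's worker-process boundary, so anything pyobjc-backed has to be flattened into plain Python types first. A generic sketch of that conversion (the field names and the shape of `obs` are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RecognizedWord:
    """Plain-Python result record. Unlike a pyobjc-wrapped Vision
    observation, this is picklable and safe to return from a worker."""
    text: str
    confidence: float
    # Normalized bounding box (0-1 coordinates), as Vision reports them.
    x: float
    y: float
    width: float
    height: float

def flatten_observation(obs) -> RecognizedWord:
    """Convert a Vision-like observation into plain types before it
    crosses a process boundary. `obs` is any object exposing the
    hypothetical attributes read below."""
    return RecognizedWord(
        text=str(obs.text),
        confidence=float(obs.confidence),
        x=float(obs.x), y=float(obs.y),
        width=float(obs.width), height=float(obs.height),
    )
```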

Development

uv sync                                     # install all deps including dev extras
uv run ruff format src/ tests/              # format
uv run ruff check src/ tests/               # lint
uv run pyrefly check                        # typecheck
uv run pytest tests/                        # unit + integration + property tests
uv run pytest benchmarks/ --benchmark-only  # performance benchmarks

See DEVELOPMENT.md for the full implementation plan, architecture decisions, test strategy, and benchmark methodology.
