
ocrmypdf-apple-sdk

An OCRmyPDF plugin that replaces Tesseract with Apple's native frameworks on Apple Silicon Macs. OCR runs on the Neural Engine via Vision, rasterization uses PDFKit and CoreGraphics, and optional post-processing uses Foundation Models (on-device LLM) — no cloud calls, no external binaries.

What it does

| Capability | Framework |
| --- | --- |
| Text recognition | Apple Vision (VNRecognizeTextRequest) |
| PDF rasterization | PDFKit + CoreGraphics |
| Text layer generation | CoreGraphics (CGPDFContext) |
| OCR error correction | Foundation Models (Apple Intelligence) |
| Language detection | NaturalLanguage (NLLanguageRecognizer) |
| Orientation detection | Vision (VNRecognizeTextRequest) |

The plugin hooks into OCRmyPDF's plugin system and completely replaces the Tesseract backend. The rest of OCRmyPDF's pipeline (page splitting, output assembly, metadata, encryption handling) runs unchanged.

Performance

Apple Vision runs approximately 2.8× faster than Tesseract end-to-end on Apple Silicon (M-series). The Neural Engine accelerates the accurate recognition level; no GPU or CPU tuning is needed.

Prerequisites

  • macOS 26 (Tahoe) or later — required for the full Vision and Foundation Models API surface
  • Apple Silicon (M1 or later) — the Neural Engine path requires an M-series chip
  • Python 3.13 or 3.14
  • uv — the project uses uv for dependency management; plain pip works too

The plugin checks the macOS version at startup and exits with a clear error if the requirement is not met.

Installation

From source (development)

git clone https://github.com/yangm2/ocrmypdf-apple-sdk-plugin.git
cd ocrmypdf-apple-sdk-plugin
uv sync

The plugin auto-registers via its ocrmypdf entry point — no extra flags needed when using ocrmypdf through the same environment.
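For reference, entry-point registration for an OCRmyPDF plugin is declared in pyproject.toml. The fragment below is illustrative — the group follows OCRmyPDF's plugin discovery convention, but the name/module pair shown here is an assumption, not copied from this project's actual pyproject.toml:

```toml
# Hypothetical fragment — OCRmyPDF discovers installed plugins through
# the "ocrmypdf" entry point group; the key and module are illustrative.
[project.entry-points."ocrmypdf"]
apple-sdk = "ocrmypdf_apple_sdk"
```

Because discovery happens through the entry point group, installing the package into the same environment as ocrmypdf is all that is needed.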

Into an existing environment

pip install .

OCRmyPDF discovers the plugin automatically through the ocrmypdf entry point group.

Note: The apple-fm-sdk dependency (for --apple-postprocess) is Apple's official Python SDK for Foundation Models. It is available at github.com/apple/python-apple-fm-sdk and is not yet on PyPI. If you are not using --apple-postprocess, you can omit it by removing the apple-fm-sdk line from pyproject.toml.

Basic usage

# Add a searchable text layer to a scanned PDF
ocrmypdf input.pdf output.pdf

# Force re-OCR even if the PDF already has text
ocrmypdf --force-ocr input.pdf output.pdf

# Skip pages that already have text (default behaviour)
ocrmypdf --skip-text input.pdf output.pdf

Because the plugin auto-loads from the entry point, no --plugin flag is required. If you are running from a uv project:

uv run ocrmypdf input.pdf output.pdf

Image input

The plugin works with image files directly — not just PDFs. OCRmyPDF accepts JPEG, PNG, TIFF, BMP, GIF, and WebP. The Vision Framework additionally handles HEIC/HEIF natively, so iPhone camera photos work without conversion.

# JPEG scan
ocrmypdf scan.jpg output.pdf

# PNG screenshot
ocrmypdf screenshot.png output.pdf

# HEIC photo from iPhone
ocrmypdf IMG_1234.heic output.pdf

iPhone camera photos are a good use case. A few notes:

  • EXIF orientation is respected automatically — Vision reads the orientation tag and sees the image right-side-up regardless of how the phone was held. --rotate-pages is not needed.
  • Resolution is high and contrast is generally good, so --apple-preprocess is rarely needed.
  • --image-dpi may be needed if the image lacks embedded DPI metadata, though iPhone photos typically embed it.
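The EXIF handling mentioned above comes down to mapping the eight standard EXIF orientation values to a rotation (plus optional mirroring). A minimal sketch of that mapping, independent of any Apple API — Vision applies the equivalent correction internally:

```python
# The eight standard EXIF orientation values (1-8) mapped to the clockwise
# rotation needed to display the image upright, and whether it is mirrored.
EXIF_ORIENTATION = {
    1: (0, False),    # normal
    2: (0, True),     # mirrored horizontally
    3: (180, False),  # upside down
    4: (180, True),   # mirrored vertically
    5: (90, True),    # mirrored, then rotated 90° CW
    6: (90, False),   # rotated 90° CW (typical portrait phone photo)
    7: (270, True),   # mirrored, then rotated 270° CW
    8: (270, False),  # rotated 270° CW
}

def upright_rotation(exif_orientation: int) -> int:
    """Degrees of clockwise rotation needed to view the image upright."""
    rotation, _mirrored = EXIF_ORIENTATION.get(exif_orientation, (0, False))
    return rotation
```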

Options

All Apple-specific options are grouped under Apple SDK in --help.

--apple-ocr-level {accurate|fast}

Controls the Vision recognition level.

| Level | Engine | Speed | Accuracy |
| --- | --- | --- | --- |
| accurate (default) | Neural Engine | ~650 ms/page | Highest — on par with commercial OCR |
| fast | CPU only | ~520 ms/page | Good for clear, well-scanned documents |

When to use fast:

  • Batch jobs where throughput matters more than marginal quality gains
  • Documents with clean, high-contrast scan quality (typed text, no handwriting)
  • Interactive pipelines where the user is waiting

When to use accurate (default):

  • Mixed documents, low-contrast scans, or anything with unusual fonts
  • Final archival output where quality is the priority
  • Documents with Asian scripts, Arabic, or other complex scripts

ocrmypdf --apple-ocr-level fast large_batch.pdf output.pdf

--apple-preprocess

Applies CoreImage image preprocessing to each page before OCR:

  • Auto-levels (contrast stretch)
  • Unsharp mask (edge sharpening)

This can improve recognition on degraded scans. It has no effect on already high-quality images (the operations are no-ops on uniform or near-uniform input).

When to use:

  • Old or faded documents with low contrast
  • Photocopies or faxes with grey backgrounds
  • Any scan where accurate mode still misses characters

When not to use:

  • Clean, high-quality scans — preprocessing adds latency without benefit
  • Documents where preserving exact pixel appearance matters (the preprocessing modifies the image passed to Vision, not the output PDF)

ocrmypdf --apple-preprocess faded_scan.pdf output.pdf
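The auto-levels step is essentially a contrast stretch: remap the observed minimum and maximum pixel values onto the full 0–255 range. A pure-Python sketch of the idea — the plugin itself does this via CoreImage filters, not this code:

```python
def auto_levels(pixels: list[int]) -> list[int]:
    """Contrast-stretch 8-bit grayscale values to span the full 0-255 range.

    Illustrative only. On uniform input (lo == hi) this is a no-op, which
    is why preprocessing adds nothing for already high-contrast scans.
    """
    lo, hi = min(pixels), max(pixels)
    if lo == hi:
        return list(pixels)  # no dynamic range to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]
```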

--apple-postprocess

Runs Foundation Models (Apple's on-device LLM, part of Apple Intelligence) on the extracted text to correct OCR recognition errors — digit/letter confusions, split words, missing punctuation.

Requirements:

  • Apple Intelligence must be enabled in System Settings → Apple Intelligence & Siri
  • The on-device model must be downloaded (happens automatically after enabling)

The plugin checks availability at runtime and silently skips post-processing if Foundation Models is not available, so it is safe to include the flag in scripts.
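The availability check described above follows a common graceful-degradation pattern. A simplified sketch — the function and module structure here are illustrative, not the plugin's actual API:

```python
import logging

log = logging.getLogger("ocrmypdf_apple_sdk")

def fm_available() -> bool:
    """Hypothetical probe: true only if the Foundation Models SDK imports.
    Real code would also query whether the on-device model is ready."""
    try:
        import apple_fm_sdk  # noqa: F401 -- not on PyPI; see Installation
    except ImportError:
        return False
    return True

def postprocess(text: str) -> str:
    """Return corrected text, or the input unchanged if the model is
    unavailable -- which is why --apple-postprocess is safe in scripts."""
    if not fm_available():
        log.debug("Foundation Models unavailable; skipping post-processing")
        return text
    ...  # run the on-device model here
    return text
```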

When to use:

  • Documents with OCR errors that matter — forms, contracts, medical records
  • Historical documents or non-standard typefaces prone to character confusion
  • When the text layer will be used for full-text search or copy-paste

When not to use:

  • Large batch jobs — Foundation Models adds latency per page (model inference on-device)
  • Documents where exact spacing and layout must be preserved character-for-character
  • When Apple Intelligence is not available on the system (the flag is silently ignored)

ocrmypdf --apple-postprocess important_document.pdf output.pdf

Combining options

Options compose freely:

# Best quality: preprocess + accurate OCR + LLM correction
ocrmypdf --apple-preprocess --apple-postprocess degraded_scan.pdf output.pdf

# Maximum speed: fast OCR, no extras
ocrmypdf --apple-ocr-level fast large_batch.pdf output.pdf

# Fast batch with preprocessing, skip LLM
ocrmypdf --apple-ocr-level fast --apple-preprocess batch_input.pdf output.pdf

OCRmyPDF option compatibility

--rotate-pages

Recommended. The plugin implements orientation detection using Vision Framework — it tries all four cardinal rotations and picks the one with the highest OCR confidence. This is likely more accurate than Tesseract's orientation detection, particularly for non-Latin scripts.

ocrmypdf --rotate-pages input.pdf output.pdf
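The confidence-based selection reduces to: OCR the page at 0°, 90°, 180°, and 270°, then keep the rotation whose results have the best mean confidence. A sketch of the selection logic with the recognizer stubbed out (`recognize` stands in for a Vision text-recognition call and is an assumption, not the plugin's real signature):

```python
from typing import Callable

def detect_orientation(
    page,  # opaque page handle passed through to the recognizer
    recognize: Callable[[object, int], list[float]],
) -> int:
    """Try all four cardinal rotations and return the one whose OCR
    results have the highest mean confidence. `recognize(page, degrees)`
    returns per-line confidence scores for the page at that rotation."""
    def mean_confidence(degrees: int) -> float:
        confidences = recognize(page, degrees)
        return sum(confidences) / len(confidences) if confidences else 0.0
    return max((0, 90, 180, 270), key=mean_confidence)
```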

--deskew

No benefit — skip it. get_deskew() always returns 0.0; OCRmyPDF's deskew step (which uses Tesseract internally) is bypassed. Vision Framework is robust to minor skew anyway.
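In plugin terms this is just the engine's deskew hook reporting zero. A sketch of the stub, with the method signature simplified from OCRmyPDF's OcrEngine interface (this class is illustrative, not the plugin's actual engine):

```python
class AppleOcrEngineSketch:
    """Illustrative stub showing why --deskew is a no-op with this plugin:
    reporting 0.0 skew means OCRmyPDF never rotates the page image."""

    @staticmethod
    def get_deskew(input_file, options) -> float:
        # Vision tolerates minor skew, so no correction angle is reported.
        return 0.0
```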

--clean

Situational. OCRmyPDF's --clean runs unpaper to remove scan artifacts such as border shadows, punch holes, and grey backgrounds. The plugin's --apple-preprocess already applies auto-contrast and unsharp mask before OCR, so --clean adds little for typical scans. It may help with very low-quality originals that have physical artifacts --apple-preprocess does not address.

Language support

The plugin accepts both Tesseract-style ISO 639-3 codes (e.g. eng, fra, jpn) and Vision BCP-47 codes (e.g. en-US, fr-FR, ja-JP). Specify languages with the standard -l / --language flag:

# Single language
ocrmypdf -l fra french_document.pdf output.pdf

# Multiple languages ('+' separator)
ocrmypdf -l eng+jpn mixed_document.pdf output.pdf
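Accepting both code styles implies a small translation table from ISO 639-3 to BCP-47 before the languages are handed to Vision. A hedged sketch of that normalization — the table below covers only a few languages and is illustrative, not the plugin's actual mapping:

```python
# Illustrative ISO 639-3 -> BCP-47 table; the plugin's real table is larger.
ISO639_TO_BCP47 = {
    "eng": "en-US",
    "fra": "fr-FR",
    "deu": "de-DE",
    "jpn": "ja-JP",
    "kor": "ko-KR",
}

def normalize_langs(spec: str) -> list[str]:
    """Turn a Tesseract-style '+'-separated spec into BCP-47 codes.
    Codes already in BCP-47 form (they contain '-') pass through as-is."""
    out = []
    for code in spec.split("+"):
        out.append(code if "-" in code else ISO639_TO_BCP47.get(code, code))
    return out
```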

Supported languages include: English, French, German, Spanish, Italian, Portuguese, Dutch, Danish, Norwegian, Swedish, Polish, Czech, Romanian, Turkish, Russian, Ukrainian, Arabic, Thai, Vietnamese, Indonesian, Malay, Japanese, Korean, Chinese Simplified, Chinese Traditional.

Run ocrmypdf --list-langs to see the authoritative list from Vision at runtime.

When no language is specified, Vision auto-detects the script. After OCR, NaturalLanguage identifies the dominant language and logs it at DEBUG level (--verbose 2).

Troubleshooting

ocrmypdf-apple-sdk requires macOS 26+ The plugin enforces a macOS 26 minimum. Earlier macOS versions lack required Vision and Foundation Models APIs.

Foundation Models post-processing silently skipped Enable Apple Intelligence in System Settings → Apple Intelligence & Siri and wait for the on-device model to finish downloading. Run with --verbose 2 to see the reason logged.

PriorOcrFoundError The input PDF already has a text layer. Use --force-ocr to re-OCR, or --redo-ocr to replace only the text layer.

Plugin loaded twice / ValueError: plugin already registered Never pass plugins=["ocrmypdf_apple_sdk"] when calling ocrmypdf.ocr() in Python — the plugin loads automatically via its entry point.

Comparison with ocrmypdf-AppleOCR

ocrmypdf-AppleOCR (mkyt) is the other macOS-native plugin. Both use Apple Vision for OCR. The key differences:

| | ocrmypdf-apple-sdk (this project) | ocrmypdf-AppleOCR (mkyt) |
| --- | --- | --- |
| Rasterization | PDFKit + CoreGraphics (no Ghostscript) | Ghostscript (unchanged) |
| Tesseract dependency | None — fully replaced | Required (for orientation detection) |
| OCR engine | VNRecognizeTextRequest (accurate/fast) | VNRecognizeTextRequest + VKCImageAnalyzer (LiveText) |
| Vertical CJK text | Not supported | Supported via LiveText mode |
| Word-level boxes | Yes | Line-level only |
| Handwriting mode | --apple-handwriting | Not supported |
| LLM post-processing | --apple-postprocess (Foundation Models) | Not supported |
| Language detection | NaturalLanguage framework | Not supported |
| Min macOS | 26 | 12+ (13+ for LiveText) |

LiveText mode (VKCImageAnalyzer)

The main OCR capability this project lacks is AppleOCR's LiveText mode, which uses VisionKit's private VKCImageAnalyzer API instead of VNRecognizeTextRequest. LiveText provides two things the standard Vision API does not:

  • Vertical CJK text layout: VKCImageAnalyzer exposes a layoutDirection() on each text line. For vertically-set Japanese, Chinese, and Korean, it returns a vertical layout code that allows correct bounding-box orientation in the output PDF. With VNRecognizeTextRequest only, vertical CJK columns get axis-aligned boxes that don't reflect the actual text direction.
  • Quad geometry: results use four corner points (quad()) instead of axis-aligned rectangles, enabling precise text layer placement for skewed or rotated lines.

The tradeoffs: LiveText is a private API (VKCImageAnalyzer is not in the public SDK), requires manual PyObjC metadata registration, needs a separate subprocess with a Cocoa event loop, and does not expose confidence scores.

Architecture notes

  • No custom parallelism — OCRmyPDF handles page-level parallelism via its worker pool. Apple frameworks dispatch internally to CPU/GPU/ANE. Adding extra threading would contend with both.
  • Stateless engine — OcrEngine methods are static; no state is stored on the engine instance. Safe for use across forked processes.
  • pyobjc bridging — all Apple framework calls go through pyobjc. Objects are converted to plain Python types before returning across process boundaries (pyobjc objects are not picklable).
  • No subprocess calls — the plugin uses framework APIs directly. ocrmypdf's Ghostscript rasterizer is replaced by PDFKit for the default auto rasterizer setting.
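The pickling constraint in the third bullet is the load-bearing one: results must cross OCRmyPDF's worker-process boundary, so anything pyobjc-backed has to be flattened into plain Python types first. A generic sketch of that conversion (the field names and the shape of `obs` are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RecognizedWord:
    """Plain-Python result record. Unlike a pyobjc-wrapped Vision
    observation, this is picklable and safe to return from a worker."""
    text: str
    confidence: float
    # Normalized bounding box (0-1 coordinates), as Vision reports them.
    x: float
    y: float
    width: float
    height: float

def flatten_observation(obs) -> RecognizedWord:
    """Convert a Vision-like observation into plain types before it
    crosses a process boundary. `obs` is any object exposing the
    hypothetical attributes read below."""
    return RecognizedWord(
        text=str(obs.text),
        confidence=float(obs.confidence),
        x=float(obs.x), y=float(obs.y),
        width=float(obs.width), height=float(obs.height),
    )
```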

Development

uv sync                                     # install all deps including dev extras
uv run ruff format src/ tests/              # format
uv run ruff check src/ tests/               # lint
uv run pyrefly check                        # typecheck
uv run pytest tests/                        # unit + integration + property tests
uv run pytest benchmarks/ --benchmark-only  # performance benchmarks

See DEVELOPMENT.md for the full implementation plan, architecture decisions, test strategy, and benchmark methodology.
