Conversation

@rafaeltuelho

  • Add DeepSeekOcrModel with automatic device detection (CUDA/MPS)
  • CUDA uses bfloat16 precision and flash_attention_2 (optimal)
  • MPS uses float16 precision and eager attention (Apple Silicon fallback)
  • Auto-switch to MPS-compatible model (Dogacel/DeepSeek-OCR-Metal-MPS)
  • Add PyTorch 2.7.0+ version validation for MPS support
  • Add clear error messages for device/version incompatibilities
  • Update test_e2e_ocr_conversion.py with CUDA/MPS device support
  • Add manual test script for DeepSeek-OCR validation
  • Update documentation with MPS support information
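
For context, a minimal sketch of the device-selection logic these bullets describe. This is illustrative only, not the PR's actual code; the function name detect_device and its return shape are assumptions:

```python
import torch

def detect_device():
    """Pick device, dtype, and attention implementation for DeepSeek-OCR.

    CUDA is preferred (bfloat16 + flash_attention_2); MPS is the Apple
    Silicon fallback (float16 + eager attention, PyTorch 2.7.0+ only);
    anything else is an error, since the model needs GPU acceleration.
    """
    if torch.cuda.is_available():
        return "cuda", torch.bfloat16, "flash_attention_2"
    if torch.backends.mps.is_available():
        # MPS needs PyTorch 2.7.0+ for the aten::_upsample_bicubic2d_aa op.
        major, minor = (int(p) for p in torch.__version__.split(".")[:2])
        if (major, minor) < (2, 7):
            raise RuntimeError(
                f"DeepSeek-OCR on MPS requires PyTorch 2.7.0+, found {torch.__version__}"
            )
        # Upstream weights assume CUDA; Apple Silicon would instead load
        # the community MPS port (Dogacel/DeepSeek-OCR-Metal-MPS).
        return "mps", torch.float16, "eager"
    raise RuntimeError("DeepSeek-OCR requires a CUDA or MPS device")
```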

Note:

Resolves #2497

Checklist:

  • [x] Documentation has been updated, if necessary.
  • [x] Examples have been added, if necessary.
  • [x] Tests have been added, if necessary.

@github-actions
Contributor

github-actions bot commented Dec 4, 2025

DCO Check Passed

Thanks @rafaeltuelho, all your commits are properly signed off. 🎉

@dosubot

dosubot bot commented Dec 4, 2025

Related Documentation

Checked 4 published document(s) in 1 knowledge base(s). No updates required.


@mergify

mergify bot commented Dec 4, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@rafaeltuelho force-pushed the feature/deepseek-ocr-integration branch from 74b3bcd to 076c3ad on December 4, 2025 at 02:02
@codecov

codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 59.56284% with 74 lines in your changes missing coverage. Please review.

Files with missing lines               Patch %   Lines
docling/models/deepseek_ocr_model.py   55.68%    74 Missing ⚠️


@rafaeltuelho
Author

It seems the coverage gap in docling/models/deepseek_ocr_model.py comes from the CI environment having no GPU (CUDA/MPS), so the DeepSeek-OCR tests can't run there. The DOCLING_TEST_DEEPSEECOCR environment variable is also not set in CI.

I had to use Google Colab (T4) to test it manually.

What is the recommended approach here?

- Add DeepSeekOcrModel with automatic device detection (CUDA → MPS → Error)
- Add DeepSeekOcrOptions for configuring the OCR engine
- Support CUDA with bfloat16 and flash_attention_2 (optimal performance)
- Support MPS (Apple Silicon) with float16 and eager attention (requires PyTorch 2.7.0+)
- Auto-switch to MPS-compatible model (Dogacel/DeepSeek-OCR-Metal-MPS) on Apple Silicon
- Add clear error messages for unsupported configurations
- Add mock-based unit tests for CI coverage without GPU hardware
- Update E2E tests with DOCLING_TEST_DEEPSEECOCR environment variable guard

Note: MPS support requires PyTorch 2.7.0+ for the aten::_upsample_bicubic2d_aa operator.
See: https://github.com/Dogacel/DeepSeek-OCR-Metal-MPS/discussions

Signed-off-by: Rafael T. C. Soares <[email protected]>
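
A rough sketch of what the environment-variable guard and a mock-based unit test mentioned above could look like. Names are assumptions, and detect_device refers to the hypothetical helper sketched earlier, not the PR's actual API:

```python
import os
from unittest import mock

import pytest

# E2E guard: DeepSeek-OCR tests only run when explicitly opted in,
# since CI has no CUDA/MPS hardware.
requires_deepseek_ocr = pytest.mark.skipif(
    os.getenv("DOCLING_TEST_DEEPSEECOCR") is None,
    reason="set DOCLING_TEST_DEEPSEECOCR to run DeepSeek-OCR e2e tests",
)

def test_cuda_settings_selected_when_cuda_available():
    # Unit-level coverage without GPU hardware: patch the torch
    # availability check and assert the CUDA configuration is chosen.
    with mock.patch("torch.cuda.is_available", return_value=True):
        device, _dtype, attn = detect_device()  # hypothetical helper
    assert device == "cuda"
    assert attn == "flash_attention_2"
```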
@rafaeltuelho force-pushed the feature/deepseek-ocr-integration branch from 076c3ad to 4e93020 on December 4, 2025 at 21:16
@dolfim-ibm
Contributor

@rafaeltuelho Thanks for starting the contribution. It is definitely something that was on our radar as well.

The key question we would like to assess is whether this model should be exposed as an OCR engine or as a model in the VLM pipeline.

@simonschoe

@rafaeltuelho Not sure if that aligns with what @dolfim-ibm refers to: as a user, it would be amazing to be able to integrate DeepSeek-OCR as an external service, i.e., via API calls, instead of as a local model running as part of the regular pipeline.

@dolfim-ibm
Contributor

@rafaeltuelho Not sure if that aligns with what @dolfim-ibm refers to: as a user, it would be amazing to be able to integrate DeepSeek-OCR as an external service, i.e., via API calls, instead of as a local model running as part of the regular pipeline.

@simonschoe Untested, but I think you could already use DeepSeek-OCR with a markdown prompt via Docling's VLM API settings: https://docling-project.github.io/docling/examples/vlm_pipeline_api_model/
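
A rough, untested sketch of that configuration, loosely following the linked example. The endpoint URL and served model name are assumptions, and exact import paths and option names may differ across Docling versions:

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    ApiVlmOptions,
    ResponseFormat,
    VlmPipelineOptions,
)
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

pipeline_options = VlmPipelineOptions(
    enable_remote_services=True,  # required for API-backed VLM calls
    vlm_options=ApiVlmOptions(
        url="http://localhost:8000/v1/chat/completions",  # assumed endpoint serving DeepSeek-OCR
        params={"model": "deepseek-ocr"},  # assumed served model name
        prompt="Convert this page to markdown.",
        timeout=120,
        response_format=ResponseFormat.MARKDOWN,
    ),
)

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)
doc = converter.convert("input.pdf").document
print(doc.export_to_markdown())
```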

@rafaeltuelho
Author

rafaeltuelho commented Dec 5, 2025

The key question we would like to assess is whether this model should be exposed as an OCR engine or as a model in the VLM pipeline.

@dolfim-ibm That's a good question. So far I have only used/tested the VLM for picture description. Is it possible to run the VLM pipeline for OCR-based conversion?

I tried to follow the same approach used by other OCR engines (e.g., EasyOcr) already supported in Docling; a sketch of that pattern follows below.
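
For reference, the existing OCR-engine pattern in Docling looks roughly like this. The commented DeepSeekOcrOptions line shows where the PR's new options class would slot in; treat it as a sketch against this branch, not merged API:

```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Existing pattern: OCR engines plug into the PDF pipeline via ocr_options.
pipeline_options = PdfPipelineOptions(
    do_ocr=True,
    ocr_options=EasyOcrOptions(lang=["en"]),
)

# The PR follows the same pattern, so the new options class would slot in
# the same way (DeepSeekOcrOptions is added by this branch, not released):
# pipeline_options.ocr_options = DeepSeekOcrOptions()

converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
```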
