-
Notifications
You must be signed in to change notification settings - Fork 3.3k
feat: Add DeepSeek-OCR integration #2721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Add DeepSeek-OCR integration #2721
Conversation
|
✅ DCO Check Passed Thanks @rafaeltuelho, all your commits are properly signed off. 🎉 |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
74b3bcd to
076c3ad
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
It seems the reason for the lack of coverage in I had to use Google Colab (T4) to test it manually. What is the recommended approach here? |
- Add DeepSeekOcrModel with automatic device detection (CUDA → MPS → Error) - Add DeepSeekOcrOptions for configuring the OCR engine - Support CUDA with bfloat16 and flash_attention_2 (optimal performance) - Support MPS (Apple Silicon) with float16 and eager attention (requires PyTorch 2.7.0+) - Auto-switch to MPS-compatible model (Dogacel/DeepSeek-OCR-Metal-MPS) on Apple Silicon - Add clear error messages for unsupported configurations - Add mock-based unit tests for CI coverage without GPU hardware - Update E2E tests with DOCLING_TEST_DEEPSEECOCR environment variable guard Note: MPS support requires PyTorch 2.7.0+ for aten::_upsample_bicubic2d_aa operator. See: https://github.com/Dogacel/DeepSeek-OCR-Metal-MPS/discussions Signed-off-by: Rafael T. C. Soares <[email protected]>
076c3ad to
4e93020
Compare
|
@rafaeltuelho Thanks for the starting the contribution. It is definitely something which was on our radar as well. The key question we would like to assess is if this model should be exposed as OCR engine or as model in the VLM pipeline. |
|
@rafaeltuelho not sure if that aligns with what @dolfim-ibm refers to: as a user it would be amazing to be able to integrate deepseek ocr as an external service, i.e., via api calls, instead of as a local model as part of the regular pipeline |
@simonschoe Untested, but I think you could already use DeepSeekOCR with the markdown prompt for the VLM API Docling settings. https://docling-project.github.io/docling/examples/vlm_pipeline_api_model/ |
@dolfim-ibm That's a good question. I remember I have only used/tested the VLM for Picture description. Is it possible to run the VLM pipeline for OCR-based conversion? I tried to follow the same approach used by other OCR (eg: EasyOcr) already supported in Docling |
Note:
Resolves #2497
Checklist: