[AI] AI object mask tool #20378

Open
andriiryzhkov wants to merge 37 commits into darktable-org:master from andriiryzhkov:split/ai-object-mask

Conversation

@andriiryzhkov
Contributor

@andriiryzhkov andriiryzhkov commented Feb 21, 2026

Adds a new mask tool that lets users select objects in the image by clicking. Built on the AI subsystem from #20322.

How it works

AI object mask is an interactive single-object selection tool. The user activates the object mask tool, waits for the image to be encoded (background thread), then clicks to place foreground/background point prompts. The model segments the object in real time. Right-click finalizes the selection by vectorizing the raster mask into Bézier path forms that integrate with darktable's existing mask system.

Architecture

  • Segmentation engine (src/ai/segmentation.c): Implements the two-stage encoder/decoder pipeline. Supports both SAM2.1 (multi-mask + IoU selection + low-res refinement) and SegNext (single mask, full-res refinement). Encoder outputs are cached so multiple clicks don't re-encode.

  • Object mask tool (src/develop/masks/object.c): Runs image encoding in a background thread to keep the UI responsive. Displays a "working..." overlay during encoding. Supports foreground clicks (label 1), background clicks (label 0), and box prompts (SAM only).

  • Raster-to-vector (src/common/ras2vect.c): Extended with cleanup (turdsize), smoothing (alphamax), and boundary sign output for hole detection.
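
The encoder-output caching mentioned above can be sketched as follows. This is a minimal illustrative version, not the actual code from src/ai/segmentation.c; `embedding_cache_t`, `get_embedding`, and `encode` are hypothetical names. The idea is that the expensive encoder runs once per image, and each point-prompt click only re-runs the cheap decoder against the cached embedding:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of an encoder-output cache: the embedding is keyed
   on the image identity, so repeated decoder calls reuse it instead of
   re-running the two-stage pipeline's encoder on every click. */
typedef struct embedding_cache_t
{
  uint64_t image_hash;   /* identity of the encoded image */
  float *embedding;      /* encoder output tensor */
  size_t embedding_size; /* number of floats */
} embedding_cache_t;

/* Returns the cached embedding if it matches, otherwise encodes and caches.
   encode() stands in for the real model inference call. */
static const float *get_embedding(embedding_cache_t *cache, uint64_t image_hash,
                                  float *(*encode)(uint64_t, size_t *))
{
  if(cache->embedding && cache->image_hash == image_hash)
    return cache->embedding; /* cache hit: no re-encode on later clicks */

  free(cache->embedding);
  cache->embedding = encode(image_hash, &cache->embedding_size);
  cache->image_hash = image_hash;
  return cache->embedding;
}
```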

Models

The segmentation engine supports both SegNext and SAM model architectures. SegNext is the default — it produces good enough results and is compliant with the Open Source AI Definition. Models are downloaded on demand by the AI subsystem from the model repository: https://github.com/andriiryzhkov/darktable-ai

Depends on #20322
Fixes #12295

@TurboGit
Member

Thanks for this new implementation. I'll test soon.

@TurboGit
Member

@andriiryzhkov : I have created the darktable-org/darktable-ai repository. You should be able to clone this repository and create a PR to initialize it. If needed I can initialize it with the current content of your darktable-ai repo.

@TurboGit TurboGit added this to the 5.6 milestone Feb 22, 2026
@TurboGit TurboGit added labels Feb 22, 2026: priority: low (core features work as expected, only secondary/optional features don't), feature: new (new features to add), difficulty: hard (big changes across different parts of the code base), scope: image processing (correcting pixels)
@andriiryzhkov
Contributor Author

@TurboGit I will create a PR with the models that are ready; I also have some models that are just for further experiments, and I will keep those separate.

@TurboGit
Member

Sounds good to me.

@andriiryzhkov
Contributor Author

@TurboGit can you initialize the darktable-org/darktable-ai repository with some empty file, so that I will be able to fork it?

@andriiryzhkov
Contributor Author

After testing, I can confirm SegNext runs a little bit slower than SAM2.1, though I was able to optimize model loading and inference to make it reasonable.

Mask quality is lower than SAM2.1. I had to tweak mask post-processing — primarily to cut off small fragments around the mask edges.

That said, SegNext works reasonably well as an interactive tool: providing both foreground (positive) and background (negative) points produces decent results.

One additional limitation: SegNext does not support box prompts, which is a minor disadvantage compared to SAM.

@andriiryzhkov
Contributor Author

Update

The AI object mask tool now uses a brush stroke for initial object selection instead of a single click. This was the best way to improve selection quality with the SegNext model: a single click didn't always produce good results, and this model doesn't support box prompts. The brush stroke is resampled into evenly spaced foreground points via arc-length parameterization, providing a much richer prompt for the segmentation decoder. From my experience, the results are significantly better.

How it works

The tool now has two stages:

Stage 1 - Brush selection:

  • Drag over the object to paint a selection stroke
  • A short click also counts as a completed stroke
  • Scroll to adjust brush size
  • Ctrl+scroll to adjust opacity

Stage 2 - Point refinement:

  • Click to add foreground points (+)
  • Shift+click to add background points (-)
  • Alt+click to clear and start over
  • Right-click to apply the final mask

All points (brush and refinement) are kept throughout the session and sent to the decoder together, so each refinement click builds on the full context.
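
A minimal sketch of how such a session-wide prompt store might look (hypothetical structure and names, not darktable's actual code); the labels follow the convention from the PR description, 1 for foreground and 0 for background:

```c
#include <stdlib.h>

/* Hypothetical prompt store: SAM-style point prompts are (x, y, label),
   label 1 = foreground, 0 = background. All points accumulate across the
   session, so each decoder call sees the full context. */
typedef struct prompt_points_t
{
  float *x, *y;
  int *label;
  size_t count, capacity;
} prompt_points_t;

static void prompt_add(prompt_points_t *p, float x, float y, int label)
{
  if(p->count == p->capacity)
  {
    p->capacity = p->capacity ? p->capacity * 2 : 16;
    p->x = realloc(p->x, p->capacity * sizeof(float));
    p->y = realloc(p->y, p->capacity * sizeof(float));
    p->label = realloc(p->label, p->capacity * sizeof(int));
  }
  p->x[p->count] = x;
  p->y[p->count] = y;
  p->label[p->count] = label; /* 1 = foreground click, 0 = background */
  p->count++;
}

/* Alt+click: clear everything and start over */
static void prompt_clear(prompt_points_t *p)
{
  free(p->x); free(p->y); free(p->label);
  p->x = p->y = NULL; p->label = NULL;
  p->count = p->capacity = 0;
}
```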

Performance note

SegNext is a larger model than SAM, so on slower computers it may be less responsive. I had to turn off CoreML and DirectML acceleration because model load and conversion took longer than CPU inference for this model.

Some samples (all with SegNext default model)

[Sample mask images: mask_man, mask_sunflower, mask_monkey, mask_bird]

Object selection is not always perfect and you don't always get a good result on the first try. But overall it already feels quite capable. And this is with a model that meets open-source AI criteria.

@TurboGit
Member

@andriiryzhkov : Where is the model to test? From dt it cannot be downloaded (not found) and I don't see it in the repo.

@andriiryzhkov
Contributor Author

It is still served from my repository:

plugins/ai/repository=andriiryzhkov/darktable-ai

To download models from https://github.com/darktable-org/darktable-ai we first need to create a release there with a version like 5.5.0.x. This will trigger a GitHub Action that converts, packs, and attaches the model packages as assets to the release.

@TurboGit
Member

@andriiryzhkov : Maybe something messed up on my side then.

I do have mask-object-segnext-b2hq:

$ ls -d ~/.local/share/darktable/models/mask-ob*
/home/obry/.local/share/darktable/models/mask-object-segnext-b2hq

But in the UI it has the "not downloaded" status:

[screenshot]

And if I try to download I get:

[screenshot]

And if I try the object mask button:

[screenshot]

So I'm stuck :) Any idea?

@andriiryzhkov
Contributor Author

andriiryzhkov commented Feb 26, 2026

@TurboGit: I should have highlighted this change better: in order to improve model discoverability I decided to limit the models location to the config folder ~/.config/darktable/models/ (instead of ~/.local/share/darktable/models/).

So you just need to move your models:

mv ~/.local/share/darktable/models/mask-object-* ~/.config/darktable/models/

For the download option to work, make sure the parameter in darktablerc looks like this:

plugins/ai/repository=andriiryzhkov/darktable-ai

That is the current default value in this PR.

@TurboGit
Member

~/.config/darktable/models/

Not a good change, because ~/.config/darktable is also used to store the DB, and people can have multiple databases for different work (the --configdir option); we don't want people to have to download the AI models for each one.

The ~/.local/share/darktable/models/ (as the name implies) is a shared directory.

@andriiryzhkov
Contributor Author

Not a good change, because ~/.config/darktable is also used to store the DB, and people can have multiple databases for different work (the --configdir option); we don't want people to have to download the AI models for each one.

I agree. I ran into a problem on Windows with the equivalent of ~/.local/share/darktable/models/, so I made this quick fix. Anyway, I need to research how to improve this.

@andriiryzhkov
Contributor Author

@TurboGit: Fixed in #20322, merged here. Models now use g_get_user_data_dir() instead of config dir.
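
For reference, on Linux g_get_user_data_dir() follows the XDG Base Directory convention: $XDG_DATA_HOME if set, otherwise $HOME/.local/share. A rough plain-C approximation of the resulting models path (illustrative only; `models_dir` is a made-up helper, and the real code should simply call the GLib function, which also resolves the platform-specific folders on Windows and macOS):

```c
#include <stdio.h>
#include <stdlib.h>

/* Approximate g_get_user_data_dir() + "/darktable/models" on Linux:
   use $XDG_DATA_HOME if set and non-empty, else $HOME/.local/share.
   Returns the formatted length, or -1 if neither variable is usable. */
static int models_dir(char *buf, size_t buflen)
{
  const char *xdg = getenv("XDG_DATA_HOME");
  if(xdg && *xdg)
    return snprintf(buf, buflen, "%s/darktable/models", xdg);

  const char *home = getenv("HOME");
  if(!home) return -1;
  return snprintf(buf, buflen, "%s/.local/share/darktable/models", home);
}
```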

@TurboGit
Member

Just tested a bit; the model is indeed quite a bit slower, and I have some issues with the refinement.

Using the integration test image mire1.cr2, brush the bottle on the right. It misses some part on the left of the bottle; clicking to add this area in fact makes the whole area shrink. See the video:

Capture.video.du.2026-02-26.18-28-06.mp4

I first selected the bottle, then clicked twice on the missed area. I was able to reproduce this on 2 of the 3 images tested.

BTW, when the masks are computed we need a busy cursor.

@andriiryzhkov
Contributor Author

SegNext and SAM 2.1 are different models with different design goals, so a direct quality comparison isn't entirely fair. In the context of AI object masking specifically, SegNext is a bit slower and less accurate than SAM 2.1 – though the difference is not huge.

In terms of model choice, we still have two options: proceed with SAM 2.1 (small), which is faster and produces better quality masks but carries Meta's data provenance concerns, or go with SegNext (base), which is cleaner in terms of training data transparency.

No model produces a perfect mask in every situation – and that's fine. After many years working with AI professionally, the question I always come back to is: does this save the user a significant amount of time to achieve the desired result? If yes, it's worth the trade-offs – even if the model isn't technically flawless. With AI object masking, the output is always a path mask, which means any inaccuracies can be refined manually. The user retains full control; the AI just removes the tedious baseline work.

Regarding the bottle selection in the demo – the missing part of the mask is near the edge where two dark bottles are close together. The model simply can't distinguish the boundary in that region. In such situations, no amount of refinement clicks will help: if the model can't perceive the edge, it won't mask it correctly. This is a known limitation of contrast-dependent segmentation, not a bug.

Personally, I would lean towards SAM 2.1 – but I don't want this choice to become a divisive topic for the community. I'm happy to go with whatever direction we reach consensus on.

BTW, when the masks are computed we need a busy cursor.

Noted.

@TurboGit
Member

SegNext and SAM 2.1 are different models with different design goals, so a direct quality comparison isn't entirely fair.

I agree with that, but here I'm not so much comparing as trying to see what could be best for Darktable. I'll continue testing. At the moment SegNext is not compelling.

the question I always come back to is: does this save the user a significant amount of time to achieve the desired result? If yes, it's worth the trade-offs – even if the model isn't technically flawless.

Exactly, and for me at the moment I'm not sure SegNext qualifies.

In terms of model choice, we still have two options: proceed with SAM 2.1 (small), which is faster and produces better quality masks but carries Meta's data provenance concerns

Agreed too; another alternative is to provide both and let the user make the choice.

Personally, I would lean towards SAM 2.1

My feeling at the moment too.

Let me do some more tests. Again thanks for the hard work put on this.

@TurboGit
Member

TurboGit commented Mar 6, 2026

Tested again, and to me SegNext is of poor quality. I would like to also be able to offer SAM 2.1 small from the AI preferences (not as the default); at least this would give a high-quality AI masking option.

Having only SegNext will make Darktable look really bad compared to the alternatives, and in the end this option will certainly not be adopted by most people. This is in the spirit of what I want for Darktable: the freedom for users to choose what they find acceptable for their work, depending on their sensitivity about AI and models.

@andriiryzhkov
Contributor Author

I will prepare a PR to add the SAM 2.1 models to the https://github.com/darktable-org/darktable-ai repository.


Development

Successfully merging this pull request may close these issues.

AI Masks
