Demosaic fixes/regressions#20428
Demosaic fixes/regressions#20428jenshannoschwalm wants to merge 13 commits intodarktable-org:masterfrom
Conversation
|
When doing the diff-dump with complete the first image showing the diff is EDIT: just checked,
Before going into this i will check all the other demosaicers ... |
44761f1 to
a4d248a
Compare
|
@TurboGit a lot of tedious work, i think i have now fixed a large bunch of demosaicing related issues, many/most have been there for quite long. Hopefully a lot of the border issues are gone now. I would give these fixes a go for now. |
|
@jenshannoschwalm : Thanks, I have to analyze the following tests, it will take some time and will report back if I find something suspicious: |
|
I first check the simple rcd demosaicer, that already shows new artefacts due to clip&zoom happening here in demosaicer. |
|
EDIT: converted again to draft, there are still border issues at very strong blavk/white transitions for PPG and RCD (they share that part) you have to wait a bit longer :-) |
|
Ok, anyway here is the log of the integration tests run: |
516178d to
3838996
Compare
|
@TurboGit finally this all is ok to me. Heavy stuff for reviewing i guess :-) A lot of fixes done, notably for all images doing VNG, PPG, LMMSE, RCD there will be differences also for CPU code as the border calculations have all been fixed so that a) differences between CPU and GPU are marginal and b) choosing a better algo (only borders). All xtrans CPU code for example got the better algo also for markje, that was simply missing since 2016 i think. Differences bewtween CPU & OpenCL have been largely reduced for green equalizing and color smoothing. I tested all this with pipe differences in demosaic with hq processing on to see what happens in demosaic and what happens while scrop&scale in demosaic. there are differences which i couldn't track down yet. BTW a) for all interpolators and b) also after ensuring demosaic output stays above zero. How to proceed? the crop&scale is the obvious next important topic to check, don't know is oncl OpenCL code is bad or CPU or even both. Do you have any idea about this? For sure we can't update the integration expected images now as we have to understand the issue before doing so. At least discrepances while scaling seem to be pretty heavy and are the cause of most differences. Tested on AMD RustiCL and ROCm btw. |
3838996 to
9ba5df1
Compare
|
I think i got the interpolator bugs :-) |
9ba5df1 to
166e8e9
Compare
|
Before you go into review&testing i'll do a cleanup, hold on |
|
Understood, thanks. |
166e8e9 to
089aefc
Compare
1. The internal tiling for RCD CPU code requires a border of 10 to avoid subtle instabilities at the overlapping parts. Fixed comments accordingly. 2. The PPG demosaicer definitely requires data to be unclipped in internal algorithms thus output has been improved both for CPU and OpenCL code, almost no detectable differences any more. 3. As RCD used a slightly modified variant of PPG (we don't need to calculate data outside the border), code could be refactored for CPU and OpenCL. 4. In RCD and LMMSE we normalize data so we down/upscale to ensure this. We now use safer scalers reducing artifacts and possibly don't loose precision in CPU/OpenCL code. For RCD the normalizing is not strictly required ... 5. Demosaicers don't have to clip output to non-negatives 6. Introduce and first use of read_imagef helper inline functions
1. Fix OpenCL VNG full interpolation, we must not touch the outermost 2 pixels, they have been interpolated already in the linear phase. 2. Ensure identical results for VNG initial borders. Checked results for two possible algorithms, average vs ppg-like original. There are subtle differences, but overall the ppg-like seems to be better and we now use it for both code paths. 3. Fix VNG final green mixing for bayer4 sensors. Do this at the end of processing for both CPU and GPU code. 4. We don't avoid negative input for slightly better results.
Don't touch outermost one-pixels, they are used for next/previous row/column, smoothing over-the-edge leads to clear artifacts as data are mirrored from the opposite border.
1. Use doubles to accumulate for full image green averaging in OpenCL sampling, the second reduce kernel uses kahan sum for increased precision for less CPU vs OpenCL difference, both don't fully fix differences but less diffs. 2. In OpenCL local greens averaging we correct the same lines as we do in cpu code.
Don't ensure non-negatives for this preliminary demosaicing code, the demosaicers will have to handle this.
Pass icoeffs as a float4 instead of cl_mem, use inlines for reading.
1. The CPU code deserves the better border calculation as we do for OpenCL since very long. There is a small performance penalty but with clearly better results. I was likely just overseen for many years. 2. Some minor fixes for initalizing data for far less differences between CPU OpenCL. 3. The VNG linear interpolation code needed some modifications to handle borders. Reminder: The markesteijn algorithm does very heavy maths with a lot of intermediate results leading to more-than-usual OpenCL vs CPU differences even without special 'native' instructions.
When doing - dt_interpolation_resample() dt_interpolation_resample_roi() - dt_interpolation_resample_cl() dt_interpolation_resample_roi_cl() - dt_interpolation_resample_1c() we don't attempt to keep data above zero, if this is required we must correct outside these interpolators. Notes: - the cpu code did badly on intermediate results - the fast copy modes behave accordingly as keeping data as they are
Demosaicer input has negatives, also it's output can have negs due to internal algorithms even if input didn't. All OpenCL demosaicers read data safely (leaving -FLT_MAX) even if the image was not correctly initialized, we had such issues on AMD systems with bad things following, nvidia/intel seemed to return zero in such cases. If things in the pixelpipe work fine all should be good even with negs fed into the pipe, following code in color maths must handle that. Remember there are colorsoaces with neg values so the interpolaters should not try to keep data non-negative. We can possibly add code prohibiting negative demosaicing data by uncommenting DT_DEMOSAIC_POSITIVE
1. Calculation of Y0 mask is done minimally different leading to a slightly better stability in regions with any channal below zero. 2. The CPU code now works exactly as OpenCL with subtle p3erf gains as we multiply instead of devide depending on compiler. qwerv
76f74bf to
68fc6a7
Compare
|
@TurboGit let me describe my current situation.
|
|
@jenshannoschwalm : If you are able to get the images I suppose the issue is with the Let's check this: For 0001-exposure as you have output.png, just do that: What's the output on your side? You also need
Again, what is the output on your side for this command? |
|
|
Ah so This tool come from |
|
And just upgraded to 7.1.2.16 and still working fine. |
|
Here it it 7.1.1.47, the build has png16 included. |
|
And with :
|
|
For hazeremoval it's I supected some number-interpretation error in python code and commented the line 256 |
|
Looks like the 2 numbers on your case are inverted (the number of diff pixels are in parenthesis, on my side it is the first number). Let me rework this to not use |
|
Maybe there is some formatting option? |
|
I haven't found one. Here are two scripts, just copy them without the .txt into the integration test root directory. You need numpy for the count-diff-pixels script. (I have renamed run.sh to just run) |
|
after making both executable this works fine! |
|
@TurboGit @kofa73 i checked this back to 69077b9 (some earlier commits i couldn't compile any more) and all versions showed the large discrepancies ... i tested and logged as included You will find the bash script i used to test the relevant sets, all logs are named 2056526 Can you confirm this finding (the bug being that old? |
|
I may only have time during the weekend, but sure, I'll try to help if I can by checking. |
|
Just confirming that we have the regression since so long would be good. |
|
@jenshannoschwalm : Yes, I'm pretty sure the diff between CPU & GPU is there since a long time. |
|
For CE there is a lot of matrix math happening i'll .check again. Also i spotted some trivial box filter. Lot of pending work ... |


Fix RCD
overlapping parts.
We has these issues shown in diff-dumps

Fix VNG
@TurboGit these fix the most annyoing problems with CPU vs GPU diffs.
There is also a PPG issue i am still investigation following in another PR.
We will need integration updates as for almost all xtrans, vng and rcd related tests there will be desired differences :-)