Automatic duplicate removal #16

Merged
patrickbr merged 23 commits into master from duplicate-removal
Mar 6, 2026

patrickbr commented Feb 13, 2026

With this PR, spatialjoin automatically detects duplicate geometry parts whose size (number of anchor points) is above a threshold. This works across multi-geometries. Each duplicate is replaced by a reference to the original geometry. Duplicates are found in a single pass over the event list (sorted by left x coordinate), checking for duplicates within blocks of events that share the same x coordinate.
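
A minimal sketch of the block-wise check described above, not the actual spatialjoin code. The `Part` type, the `findDuplicates` function, and the `minPoints` parameter are hypothetical names chosen for illustration, and exact equality of the point sequences is assumed as the duplicate criterion; the real implementation may compare geometries differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a geometry part: its anchor points as (x, y).
using Part = std::vector<std::pair<double, double>>;

// Smallest x coordinate of a part, used as the sort key.
double leftX(const Part& part) {
  assert(!part.empty());
  double x = part.front().first;
  for (const auto& pt : part) x = std::min(x, pt.first);
  return x;
}

// Returns ref, where ref[i] == i for originals and ref[i] == j if part i
// is a duplicate of the earlier original j. Parts with fewer than
// minPoints anchor points are never turned into references (assumption).
std::vector<std::size_t> findDuplicates(const std::vector<Part>& parts,
                                        std::size_t minPoints) {
  const std::size_t n = parts.size();
  std::vector<double> lx(n);
  std::vector<std::size_t> order(n), ref(n);
  for (std::size_t i = 0; i < n; ++i) {
    lx[i] = leftX(parts[i]);
    order[i] = ref[i] = i;
  }
  // Sort by left x; ties are broken by index to keep the order deterministic.
  std::sort(order.begin(), order.end(), [&lx](std::size_t a, std::size_t b) {
    return lx[a] != lx[b] ? lx[a] < lx[b] : a < b;
  });

  // Single pass over the sorted list, one block of equal left x at a time.
  for (std::size_t begin = 0; begin < n;) {
    std::size_t end = begin + 1;
    while (end < n && lx[order[end]] == lx[order[begin]]) ++end;
    for (std::size_t i = begin; i < end; ++i) {
      const std::size_t cand = order[i];
      if (parts[cand].size() < minPoints) continue;
      // Compare only against earlier originals within the same block.
      for (std::size_t j = begin; j < i; ++j) {
        const std::size_t orig = order[j];
        if (ref[orig] == orig && parts[orig] == parts[cand]) {
          ref[cand] = orig;
          break;
        }
      }
    }
    begin = end;
  }
  return ref;
}
```

Two identical parts necessarily share the same left x coordinate, so restricting comparisons to a block of equal x values does not miss any duplicates under the exact-equality assumption.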

Also changed along the way: previously, if at least one reference geometry was present, every geometry was first compared to itself as a duplicate to resolve potential references. This added a small but measurable time overhead. It is now replaced by special "self-check" events in the event list, which are added for geometries that are referenced somewhere and which trigger such a self-check.
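
To illustrate the self-check idea, here is a rough sketch under several assumptions: the `Event`, `EventType`, `Extent`, and `buildEvents` names are hypothetical, and placing the self-check event at the geometry's left x coordinate is a guess rather than something taken from the spatialjoin sources. The point it shows is that the extra event is emitted for a geometry that some other geometry refers to, instead of running a self-comparison for every geometry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class EventType { In, Out, SelfCheck };

// Hypothetical sweep event: x position, kind, and geometry index.
struct Event {
  double x;
  EventType type;
  std::size_t geom;
};

// Hypothetical x extent of a geometry's bounding box.
struct Extent {
  double leftX;
  double rightX;
};

// ref[i] == i marks an original; ref[i] == j marks a reference to j.
std::vector<Event> buildEvents(const std::vector<Extent>& extents,
                               const std::vector<std::size_t>& ref) {
  assert(ref.size() == extents.size());

  // Mark every geometry that is the target of at least one reference.
  std::vector<bool> referenced(extents.size(), false);
  for (std::size_t i = 0; i < ref.size(); ++i) {
    if (ref[i] != i) referenced[ref[i]] = true;
  }

  std::vector<Event> events;
  for (std::size_t i = 0; i < extents.size(); ++i) {
    events.push_back({extents[i].leftX, EventType::In, i});
    events.push_back({extents[i].rightX, EventType::Out, i});
    // Self-check events are added for referenced geometries, not for all.
    if (referenced[i]) {
      events.push_back({extents[i].leftX, EventType::SelfCheck, i});
    }
  }

  // Sort by x, then by type and geometry index for a deterministic order.
  std::sort(events.begin(), events.end(), [](const Event& a, const Event& b) {
    if (a.x != b.x) return a.x < b.x;
    if (a.type != b.type) return a.type < b.type;
    return a.geom < b.geom;
  });
  return events;
}
```

With this layout, an input without any references produces no self-check events at all, so the sweep does no extra work in that case.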

patrickbr commented Mar 5, 2026

As references are now added automatically for duplicates, reference support for the DE9IM computation and --within-distance was required before this could be merged into master. This is now finished.

I also fixed a very subtle bug in the duplicate removal process that only manifested on Mac machines: it was triggered by a very specific sort order of events with the same x coordinate in the sweep event queue and was impossible to reproduce on non-Mac machines.

patrickbr merged commit 7063cf2 into master on Mar 6, 2026
4 checks passed
patrickbr deleted the duplicate-removal branch on March 6, 2026 15:29