Fix SWO issue in ICL core templates#57
Conversation
|
Re-reading #54 I see that it tackles the same problem in a more subtle, technical way (sidestepping SWO through heterogeneous lookups). I honestly had no idea that there exist lookups that are different in nature. You're right about the right_open_interval, that is something I overlooked. Maybe a the best solution is to combine the approaches and use heterogeneous point lookups. For me those point queries are easier to grasp... |
|
This commit fixes the problem @joaquintides raised (right_open_interval). Two points remain:
For the performance evaluation my expectation would be as follows: The descent complexity stays O(log(N)), but the kiss correction adds a fixed 2 comparisons at the end. So when only counting comparisons the #54 is slightly cheaper. I think the reason libc++ switched to single root-to-leaf descent is higher performance (both lookup and insert/erase) and lower memory consumption. What performance evaluations (discrete or dynamic/static and continuous, what compilers, library versions) did you have in mind? Honestly I think they are very, very close in practice. |
…guards numeric_minimum relied solely on is_numeric<T>::value to decide whether to query std::numeric_limits<T>. For types where is_numeric is true but numeric_limits is not specialized (e.g. boost::rational), min()/max() silently fall back to a default-constructed T(), producing wrong bound checks (boostorg#58). It also used min() for the lower extreme, which for non-integral types is the smallest *positive* value rather than the most negative one. Changes: - Guard all bound checks on std::numeric_limits<T>::is_specialized; unspecialized numeric domains now impose no constraint instead of comparing against a bogus default-constructed bound. - Use lowest() (not min()) for the lower extreme. - Add a numeric_maximum trait mirroring numeric_minimum to detect domain_next overflow, and assert it wherever an interval boundary is built via domain_next (singleton, unit_trail, hull) and in first(), making first()/last() symmetric. - Add regression tests for the bound traits (int, double, rational) and for first()/last() across all bound flavors. Fixes boostorg#58
The previous commit added the SWO-safe lower/upper_bound wrappers but left several call sites still using the raw container directly: find(), the BOOST_ASSERT in gap_insert, and the collision-range detection in the add/insert loops. Also adds the regression tests from PR boostorg#53 for broader coverage of the fixed accessors.
This commit fixes the problem that @joaquintides raised (right_open_interval). Continuous domains have no singleton to anchor on, so on >=C++14 the accessors now anchors on the domain point via heterogeneous lookup, with a single forward/back correction for an open-bound "kiss". To enable this, exclusive_less_than gains is_transparent plus interval-vs-point and point-vs-interval operator() overloads. One open question is wheter one includes a C++11 fallback with boost::container. The breakage occurs with libc++22 which implies an existing C++14 toolchain. interval_base_map and interval_base_set now return end() instead of m.lower_bound/upper_bound(interval) for empty queries.
Passing intervals overlapping stored elements violates
the comparator's SWO precondition (formally UB). In practice every standard
library tested tolerates it: a unique insert's single descent lands on the
overlapping run and rejects {pos,false}, the collision signal add() relies on,
and the hinted overload falls back to the same search. No tested library
misplaces or corrupts, at any hint position.
So this commit is hardening only, not a bug fix.
- Add _swo_insert (un-hinted + hinted) to interval_base_set/_map: probe via
the SWO-safe member lower_bound and insert only a disjoint key with a
correct hint, so the container only ever compares against disjoint
neighbours regardless of its algorithm; on collision report the colliding
position without mutating.
- Route add/insert_at/gap_insert through it. The residual/gap inserts in
add_front/add_segment/add_rear stay raw (disjoint-by-construction).
- Guard the C++11 singleton-anchored lower/upper_bound against the domain
min/max, where singleton() is undefined (left_open/open) or silently wraps
(right_open/open): fall back to begin()/end(), and back-correct with the
interval-vs-point comparator so it is valid at the extreme.
- Document the SWO container invariant atop interval_base_set.
- Add regression tests for overlapping un-hinted and hinted adds on
interval_set and interval_map.
|
Last commit hardens ICL inserts against SWO comparator violations. Rebased on PR #59. This is required as the hardening triggers a bug with boost::rational and numeric_minimum. |
The point-anchored lower_bound/upper_bound/equal_range accessors and the _swo_insert hardening were duplicated, near-identically, in interval_base_set and interval_base_map. Extract them into a single home, detail/interval_lookup.hpp, parameterized by a key projection (identity_key for sets, the value's interval key for maps). Both containers now forward their lookup and insert members to the shared helper, and the full strict-weak-ordering rationale lives in one place instead of being copied across two files. No behavioral change.
|
The SWO-safe interval lookup has been refactored and is now centralized. Two points remain: |
Overview
LLVM 22 libc++ switched from the classic search to an improved single root-to-leaf descent for std::set/std::map. The change is standard conforming and on a balanced tree of height H it almost halves the number of comparisons to find an element. This requires the std::set/std::map comparators to satisfy Strict Weak Ordering (SWO).
Boost.ICL stores intervals in a std::set/std::map keyed by exclusive_less_than (let's call it$\prec$ ). For two intervals A and B the relation $A \prec B \iff A$ lies entirely, exclusively, to the left of $B$ without a shared point. The relation it induces $A \approx B \iff \lnot (A \prec B) \land \lnot (B \prec A)$ means "A and B overlap or touch inclusively".
A SWO requires that the relation$\approx$ to be an equivalence relation (particularly transitive). But overlap isn't transitive:
$A=[0,3]$ , $B=[1,5]$ and $C=[4,5]$ we have $A \approx B$ and $B \approx C$ but $A \not \approx B$ . Transitivity fails, so $\prec$ is not a SWO. This is intrinsic to any "overlap" comparator and not a fixable corner case.
This PR fixes the SWO issue from a mathematical point of view focussing on minimal changes in the ICL library.
Why LLVM 22 breaks Boost.ICL
A red-black tree's invariant (every nodes key sorts after its left subtree, before its right subtree) only needs the comparator to be consistent to the keys actually stored. ICL stores pairwise-disjoint intervals (and on this pairwise disjoint set$\prec$ is a perfectly good total order - disjoint intervals are linearly arranged on the line. This means that the tree itself is always well-formed and the problem isn't in the stored intervals. The danger lies in querying the red-black tree with a non-stored key Q. The correctness of the lower_bound(Q)/upper_bound(Q) descend in the tree comparing the query Q against stored nodes. The correctness of the descent depends on transitivity property between the query and the stored keys. Q can be $\approx$ to two stored intervals that are themselves $\prec$ ordered.
TLDR: the container is fine, the interval-keyed query is the non-SWO operation and the optimized descent of LLVM 22 exposes that.
Fix
A single domain point p, compared against a set of pairwise-disjoint intervals, induces a relation that is a strict weak ordering. p lies in at most one of them, every other stored interval is strictly left of p or strictly right of p. This means left/right of a point is a total order consistent with the intervals own order. The fix is therefore only ask the tree point questions.
For lower_bound(Q) we anchor on lower(Q), for upper_bound(Q) we anchor on upper(Q). We have to be careful if our query's own open/closed bound at the endpoint.
lower_bound: It can happen that we need to step RIGHT, at most once. The closed anchor point can land on an interval that the query interval only kisses at an open lower bound e.g. stored [...,2] and we query (2,...]. This means the intervals don't overlap and we need to do one step forward to find genuinely overlapping interval.
upper_bound: It can happen that we need to step LEFT (back), at most once. The upper_point(point) returns the first element strictly right of upper(Q). The same reasoning as for the lower_bound applies and we need to exclude a kiss at an open upper bound.
TLDR: Get a correct starting position using a point search. Afterwards fix for kissing open bounds.
Fixes #51