Forward Solvers: Split local unknowns by digests #1971
Forward Solvers: Split local unknowns by digests #1971jerhard wants to merge 26 commits intofwd_constrsys_newfrom
Conversation
This breaks comparing constraint systems for the backward constraint systems (for now).
…, context, original- and current digest.
Handle man.split with the digest system for the foward-propagating constraint system. Handling of procedure calls still needs some work. When defaulting to "bu" as the default solver this commit increases the number of failing sanity tests to 38.
…hin called function.
This introduces makes the domain for function return nodes a mapping from unknowns to abstract values. The unknowns that are used as keys may differ by digest, in particular.
This will allow to have the globals defined by the analyses, globals for returns split by current digest, and globals for returns joined over current digests.
…gests at global return unknowns.
…ction is applicable works.
…ues from dead digests not being pruned
…on call pollute results. Also a problem without digests.
In fwdConstraints, in case that the callee does not return, introduce a return path with bottom for the longjmpLifter to do its work.
…viely. This implementation using man.split should work when using digests and when using pathsensitivity. The event list passed to man.split is empty, since there is nothing further to do.
…onal path-sensitivity with option solvers.fwd.digests.
|
I just realized that this isn't merging into master. That's why there's no forward solver in the diff here. |
There was a problem hiding this comment.
Pull request overview
This PR introduces digest-based splitting of local unknowns in the forward constraint system to realize path-sensitivity via analysis-spec P digests (configurable via solvers.fwd.digests). It also adapts result aggregation and the XSLT report output to present results grouped by (context, calling digest, current digest, abstract state).
Changes:
- Add digest-aware local variables and forward constraint generation that propagates/updates digests and supports splitting via
man.split. - Extend the framework with
Spec'/LVarSetsupport and a new 3-way lifted global lattice (Lift3/GVar3) to represent spec globals + per-return-node globals + return-node sets. - Update XSLT/XML result output and add new regression tests under
tests/regression/90-digests.
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
xslt/node.xsl |
Render node results as grouped <tuple> entries with optional digest sections. |
tests/regression/90-digests/01-function-call.c |
New regression test for digest-sensitive behavior across function calls. |
tests/regression/90-digests/02-mutex-lock-split.c |
New regression test for digest-sensitive mutex locking. |
tests/regression/90-digests/03-function-call-split.c |
New regression test for digest changes during a function call. |
tests/regression/90-digests/04-garbage-digest.c |
New regression test for digest “garbage collection” expectations. |
tests/regression/90-digests/05-garbage-call.c |
New regression test involving calls/precision; currently has a C11 declaration-order concern. |
tests/regression/90-digests/06-longjmp-return.c |
New regression test for longjmp/setjmp interactions with the solver. |
src/solver/td_simplified_ref_improved.ml |
Formatting-only adjustments. |
src/lifters/noDigestLifter.ml |
New lifter to expose unit digests when solvers.fwd.digests is disabled. |
src/lifters/longjmpLifter.ml |
Adjust setjmp handling to use man.split for separating normal vs longjmp return flows. |
src/framework/fwdControl.ml |
Switch forward analysis to Spec', integrate digest-aware result type, and wire in the new comparator. |
src/framework/fwdConstraints.ml |
Core change: digest-aware locals (VarDigestF), digest propagation/update, and return handling via return-node sets. |
src/framework/fwdCompareConstraints.ml |
New forward comparator adapted to digest-aware locals and the new forward globals. |
src/framework/control.ml |
Switch to Spec' in the (non-forward) control pipeline for shared infrastructure. |
src/framework/compareConstraints.ml |
Refactor comparator to work against SpecSys and adjust local-joining logic structure. |
src/framework/analysisResult.ml |
Change XML printing to emit <tuple> entries and add a digest-aware result entry type. |
src/framework/analyses.ml |
Introduce VarDigestF, GVarFCNW return variants, GVar3, Spec', and Spec2Spec'. |
src/domain/lattice.ml |
Add Lift3 lattice combinator to support 3-way lifted globals. |
src/config/options.schema.json |
Add new option solvers.fwd.digests (default false). |
src/common/domains/printable.ml |
Add Lift3Conf printable and Prod4 printable used by new result/global types. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/framework/fwdConstraints.ml
Outdated
| let s = Seq.map return_value return_nodes in | ||
| if Seq.is_empty s then | ||
| (* In case the calle does not return, create one bottom return path for LongjmpLifter to do its work *) | ||
| [S.D.bot ()] | ||
| else | ||
| List.of_seq s |
There was a problem hiding this comment.
Seq.is_empty s consumes the first element of the sequence s, so the subsequent List.of_seq s will miss that element (and may even become empty). This can drop return paths and lead to incorrect call/return semantics. Materialize the sequence once (e.g., convert to a list and test that list), or use Seq.uncons/equivalent to check emptiness without losing elements.
| let s = Seq.map return_value return_nodes in | |
| if Seq.is_empty s then | |
| (* In case the calle does not return, create one bottom return path for LongjmpLifter to do its work *) | |
| [S.D.bot ()] | |
| else | |
| List.of_seq s | |
| let return_nodes_list = List.of_seq return_nodes in | |
| if return_nodes_list = [] then | |
| (* In case the calle does not return, create one bottom return path for LongjmpLifter to do its work *) | |
| [S.D.bot ()] | |
| else | |
| List.map return_value return_nodes_list |
There was a problem hiding this comment.
I don't think this a problem, the sequence should be persistent, or?
| include S | ||
| module P = struct | ||
| include Printable.Unit | ||
| let of_elt _ = () |
There was a problem hiding this comment.
NoDigestLifter uses Printable.Unit for P, which will render digests as "()" in XML/HTML. Since fwdControl always outputs <original_digest>/<current_digest>, this makes the UI show digest sections even when solvers.fwd.digests is disabled. Consider overriding P.printXml/P.show to output an empty value (so the XSLT normalize-space(...) checks hide it), or conditionally omit digest tags in the result XML when digests are disabled.
| let of_elt _ = () | |
| let of_elt _ = () | |
| let show _ = "" | |
| let printXml _formatter _t = () |
| if get_bool "dbg.trace.context" then (* Print context and digest *) | ||
| dprintf "(%a, %a, %a) on %a" Node.pretty_trace x.node C.pretty x.context P.pretty x.original_digest CilType.Location.pretty (getLocation x) |
There was a problem hiding this comment.
VarDigestF.pretty_trace only prints original_digest, not current_digest. Since the local variable identity includes both digests, traces can become ambiguous once current_digest starts changing. Include both digests (or at least print current_digest when it differs) to make debugging reliable.
| if get_bool "dbg.trace.context" then (* Print context and digest *) | |
| dprintf "(%a, %a, %a) on %a" Node.pretty_trace x.node C.pretty x.context P.pretty x.original_digest CilType.Location.pretty (getLocation x) | |
| if get_bool "dbg.trace.context" then (* Print context and digests *) | |
| dprintf "(%a, %a, %a, %a) on %a" Node.pretty_trace x.node C.pretty x.context P.pretty x.original_digest P.pretty x.current_digest CilType.Location.pretty (getLocation x) |
|
|
||
| module V = | ||
| struct | ||
| (* TODO: Consider splitting unknowns by digest for constraint system in [FwdConstraints] *) |
There was a problem hiding this comment.
The TODO comment about "splitting unknowns by digest" looks outdated now that digests are implemented in FwdConstraints. Consider removing or updating it to reflect the current mechanism (e.g., that this lifter uses man.split, which is implemented via digests when solvers.fwd.digests is enabled).
| (* TODO: Consider splitting unknowns by digest for constraint system in [FwdConstraints] *) | |
| (* This lifter uses [man.split] for splitting unknowns; when [solvers.fwd.digests] is enabled, this is implemented via digests in [FwdConstraints]. *) |
| man.split normal_return []; | ||
| man.split longjmped []; | ||
| D.bot () | ||
| ) | ||
| | Longjmp {env; value} -> |
There was a problem hiding this comment.
With the new forward solver semantics, man.split paths are propagated separately (not just joined), and the transfer function return value is also propagated as its own path. Returning D.bot () here therefore introduces an extra bottom path/state in addition to normal_return and longjmped, which can create spurious digest entries and extra solver work. Prefer returning one of the real states (e.g., normal_return) and only man.split the other(s) to avoid propagating a standalone bottom path.
| man.split normal_return []; | |
| man.split longjmped []; | |
| D.bot () | |
| ) | |
| | Longjmp {env; value} -> | |
| man.split longjmped []; | |
| normal_return | |
| ) | |
| | Longjmp {env; value} -> | |
| | Longjmp {env; value} -> |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
michael-schwarz
left a comment
There was a problem hiding this comment.
Nice! I was wondering if now that you have understood all the details may be a good idea to leave some more comments in these modules. I think if someone comes back to this in a few years, it will all be very hard to understand.
Also: Maybe establishing some kind of terminology for the return variable where occuring digests are accumulated may be helpful?
| include Std | ||
|
|
||
| let show (x,y,z,w) = | ||
| (* TODO: remove ref *) |
There was a problem hiding this comment.
What is needed to remove the TODO here?
There was a problem hiding this comment.
It's a copy-paste leftover from other modules in the file. It should be enough to just inline them.
| end | ||
|
|
||
| (** Functor for locals with digests. *) | ||
| module VarDigestF (C: Printable.S) (P : Printable.S) = |
There was a problem hiding this comment.
Can one not reuse VarF with an appropriate LD here?
| (* For return vars given by function, context, original digest *) | ||
| module ReturnVars = Printable.Prod3 (CilType.Fundec) (C) (P) | ||
| (* For return vars given by function, context, original digest and current digest *) | ||
| module SingleReturnVars = VarDigestF (C) (P) |
There was a problem hiding this comment.
When do we need the first one?
There was a problem hiding this comment.
Are these the set constructions? Might be worth commenting on this.
| ; control_context = (fun () -> Obj.magic var.context) (** TODO: Just for testing *) | ||
| ; context = (fun () -> Obj.magic var.context) |
There was a problem hiding this comment.
Why are these Obj.magic now instead of Obj.obj?
| M.info ~category:Analyzer "Using special for defined function %s" f.vname; | ||
| tf_special_call man f | ||
| | fd -> | ||
| (* TODO: Handle this properly by handling splitting also here. *) |
There was a problem hiding this comment.
What still needs to be done here? Is this about enter or combine splitting?
| let return_unknown d = | ||
| let target_unknown = target_unknown d in | ||
| (* GVar.single_return *) | ||
| target_unknown |
There was a problem hiding this comment.
Also, what is the advantage of this over calling target_unknown x directly?
| in | ||
|
|
||
| List.iter sideg_target_unkonwn r; | ||
| (* TODO: Remove need to also propagate to locals for returns *) |
There was a problem hiding this comment.
What is this TODO about?
| let set = S.LVarSet.bot () in | ||
| let add_entry set d = | ||
| let return_unknown = return_unknown d in | ||
| S.LVarSet.add return_unknown set | ||
| in | ||
| let g = GVar.return (fd, x.context, x.original_digest) in | ||
| let contrib = List.fold add_entry set r |> G.create_return in | ||
| sideg g contrib |
There was a problem hiding this comment.
This could use a comment: It creates a set of those digests appearing for the return point and side-effects them to the appropriate helper unknown?
| let sidel_target_unknowns ds = | ||
| List.iter sidel_target_unkonwn ds |
There was a problem hiding this comment.
Maybe propagate or something like that is the better name here?
|
|
||
| module V = | ||
| struct | ||
| (* TODO: Consider splitting unknowns by digest for constraint system in [FwdConstraints] *) |
There was a problem hiding this comment.
Does this mean that here we still have the set construction? Which is fine, I'm just worried whether this will work together nicely.
|
@arkocal Can you have a look if the test failures as you would expect them to be? |
This PR splits the local unknowns by digests for the forward constraint systems. It uses the module
Pof the analysis specifications that was previously used for pathsensitivity.Local unknowns now are four-tuples, consisting of the node, the calling context, the calling digest (called original_digest in the implementation) and the current digest.
After the application of a transfer function, when the new abstract state is computed, the new current digest is derived from the abstract state via the function
P.of_elt. In case a transfer function should introduce some splitting, it can useman.split, which is the same mechanism as for pathsensitivity.The pathsensitivity (without splitting of unknowns) is used when
solvers.fwd.digestsisfalse, and the digests are used otherwise.Additionally, there are changes to the XSLT-output, now grouping result entries consisting of context, calling digest, current digest and abstract state together.