-
Notifications
You must be signed in to change notification settings - Fork 476
Experiment: Reactive analysis with skip-lite CMT cache #8092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
cristianoc
wants to merge
34
commits into
master
Choose a base branch
from
reactive-analysis-experiment
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Collaborator
Author
|
On real project kindly provided: |
Vendor skip-lite library and integrate reactive analysis capabilities: - Vendor skip-lite marshal_cache and reactive_file_collection modules - Modify C++ code to handle ReScript CMT file format (CMI+CMT headers) - Add CmtCache module for mmap-based CMT file reading - Add ReactiveAnalysis module for incremental file processing - Add CLI flags: -cmt-cache, -reactive, -runs - Add README.md with usage and benchmark instructions Benchmark results (~5000 files): - Standard: CMT processing 0.78s, Total 1.01s - Reactive (warm): CMT processing 0.01s, Total 0.20s - Speedup: 74x for CMT processing, 5x total The reactive mode caches processed file_data and uses read_cmt_if_changed to skip unchanged files entirely on subsequent runs.
- Change --create-sourcedirs to default to true (always create .sourcedirs.json) - Hide the flag from help since it's now always enabled - Add deprecation warning when flag is explicitly used - Fix package name mismatches in test projects: - deadcode rescript.json: sample-typescript-app -> @tests/reanalyze-deadcode - rescript-react package.json: @tests/rescript-react -> @rescript/react
Simplify the reactive_file_collection implementation by making 'v t be the concrete record type used at runtime, removing the unused phantom fields and internal wrapper type. This eliminates warning 69 about unused record fields and relies directly on a single record with its process function stored as an Obj.t-based callback.
1b2651a to
9265b64
Compare
rescript
@rescript/darwin-arm64
@rescript/darwin-x64
@rescript/linux-arm64
@rescript/linux-x64
@rescript/runtime
@rescript/win32-x64
commit: |
- CmtCache: rewritten using Unix.stat for file change detection (mtime, size, inode) instead of C++ mmap cache - ReactiveFileCollection: new pure OCaml module for reactive file collections with delta-based updates - ReactiveAnalysis: refactored to use ReactiveFileCollection, collection passed as parameter (no global mutable state) - Timing: only show parallel merge timing when applicable - Deleted skip-lite vendor directory (C++ code no longer needed) This eliminates the Linux/musl C++ compilation issue while maintaining the same incremental analysis performance: - Cold run: ~1.0s - Warm run: ~0.01s (90x faster, skips unchanged files)
de52abf to
210862e
Compare
- Create new analysis/reactive library with: - Reactive.ml: Core combinators (delta type, flatMap with merge) - ReactiveFileCollection.ml: Generic file collection with change detection - Comprehensive tests including multi-stage composition - Remove CmtCache module (logic absorbed into ReactiveFileCollection) - ReactiveAnalysis now uses generic ReactiveFileCollection with: - read_file: Cmt_format.read_cmt - process: CMT analysis function - Test composition: files -> word_counts (with merge) -> frequent_words Demonstrates delta propagation across multiple flatMap stages
- Add ReactiveMerge module for reactive merge of per-file DCE data - Add extraction functions (builder_to_list, create_*) to data modules - Expose types needed for reactive merge (CrossFileItems.t fields, etc.) - ReactiveAnalysis: add iter_file_data, collect_exception_results - runAnalysis: use ReactiveMerge for decls/annotations/cross_file when reactive mode enabled Note: refs and file_deps still use O(n) iteration because they need post-processing (type-label deps, exception refs). Next step is to make these reactive via indexed lookups.
- Add Reactive.lookup: single-key subscription from a collection - Add Reactive.join: reactive hash join between two collections - Add ReactiveTypeDeps: type-label dependencies via reactive join - Add ReactiveExceptionRefs: exception ref resolution via reactive join - Update ARCHITECTURE.md with generated SVG diagrams - Add diagram sources (.mmd) for batch pipeline, reactive pipeline, delta propagation The reactive modules express cross-file dependency resolution declaratively: - ReactiveTypeDeps uses flatMap to index decls by path, then join to connect impl<->intf - ReactiveExceptionRefs uses join to resolve exception paths to declaration locations
Bug fix: - ReactiveFileCollection.process now receives (path, raw) instead of just (raw) - ReactiveAnalysis passes cmtFilePath to DceFileProcessing.process_cmt_file - This fixes @genType annotations being incorrectly collected when .cmti exists Previously, reactive mode passed cmtFilePath:"" which made the .cmti existence check always return false, causing @genType annotations to be collected even for files with interface files (where they should be ignored). Architecture updates: - Updated reactive-pipeline.mmd with accurate node names - Added legend table explaining all diagram symbols - Diagram now shows: VR/TR (per-file refs), all ReactiveTypeDeps fields, ReactiveExceptionRefs flow, and combined output Helper functions added for debugging (kept as useful): - Declarations.length, FileAnnotations.length/iter - References.value_refs_length/type_refs_length - FileDeps.files_count/deps_count
Introduce five new store modules that wrap reactive collections directly, eliminating O(N) freeze/copy operations in the merge phase: - AnnotationStore: wraps FileAnnotations reactive collection - DeclarationStore: wraps Declarations reactive collection - ReferenceStore: combines 7 reactive sources (value_refs, type_refs, type_deps.*, exception_refs) without copying - FileDepsStore: wraps file deps reactive collections - CrossFileItemsStore: iterates reactive collection directly without intermediate allocation Performance improvement on 4900-file benchmark: - Merge time: 37ms → 0.002ms (eliminated) - Warm run total: 165ms → 129ms (22% faster) The stores provide a unified interface that works with both frozen (non-reactive) and reactive data, dispatching to the appropriate implementation at runtime. Signed-off-by: Cristiano Calcagno <[email protected]>
This is a significant simplification of the dead code analysis solver: **Removed:** - Backward recursive solver (resolveRecursiveRefs, solveDeadBackward) - FileDepsStore module (was only needed for file ordering in backward solver) - Unused find_value_targets/find_type_targets functions **Added:** - Liveness module: forward fixpoint liveness computation - ReactiveDeclRefs: maps declarations to their outgoing references - ReactiveLiveness: reactive liveness via Reactive.fixpoint combinator - Reactive.union: combine two collections with optional merge - Reactive.fixpoint: transitive closure combinator (init + edges → reachable) **Changed:** - References now stores both directions: refs_to (for reporting) and refs_from (for liveness) - ReactiveMerge exposes value_refs_from and type_refs_from - ReactiveTypeDeps produces all_type_refs_from - ReactiveExceptionRefs produces refs_from direction - Updated ARCHITECTURE.md and reactive pipeline diagram The forward algorithm is simpler and more suitable for reactive updates: 1. Identify roots (@live/@genType annotations or externally referenced) 2. Build index mapping declarations to outgoing references 3. Propagate liveness from roots through references 4. Return set of all live positions Both reactive and non-reactive modes correctly report 380 issues on test suite. Signed-Off-By: Cristiano Calcagno <[email protected]>
Removes refs_to direction from core data structures. Now only refs_from is stored, which is what the forward liveness algorithm needs. **Removed from storage:** - References.builder/t: removed value_refs_to, type_refs_to hashtables - ReactiveMerge.t: removed value_refs, type_refs fields - ReferenceStore: removed find_value_refs, find_type_refs functions **Added lazy computation:** - DeadCommon.RefsToLazy: computes refs_to by inverting refs_from - Only computed when debug OR transitive mode is enabled - Zero cost in the common case (no storage, no computation) Cost analysis: - Common case (no debug, no transitive): 0 storage, 0 computation - Debug or transitive enabled: O(refs) one-time cost at reporting This simplifies the reactive architecture - it now only deals with refs_from direction, matching the forward liveness algorithm. Signed-Off-By: Cristiano Calcagno <[email protected]>
- Add rank-based incremental fixpoint algorithm to Reactive module - BFS expansion when init/edges grow - Well-founded cascade contraction when init/edges shrink - Inverse index for efficient edge removal handling - Add comprehensive fixpoint tests (14 new test cases) - Cycles, diamond graphs, alternative support - Adding/removing base elements and edges - Delta emission verification - Use Lazy.t for ReactiveLiveness creation - Created on first force (after files processed) - Cached and reused across runs - Fix external refs detection in ReactiveLiveness - Use flatMap with synchronous decls.get lookup - Correctly identifies refs from non-declaration positions - Fix DceCommand.ml missing reactive_liveness parameter Performance: Run 1 ~11ms, Run 2+ ~1ms (incremental fixpoint) Signed-Off-By: Cristiano Calcagno <[email protected]>
- Change external_value_refs and external_type_refs from flatMap to join in ReactiveLiveness.ml, making the dependency on decls explicit - Update ReactiveLiveness to return a record with live, edges, roots fields for better debugging and introspection - Add trace_edges debug flag in Reactive.Fixpoint for future debugging - Document order-dependence issue in BUG-reactive-order-dependence.md The reactive pipeline has a delta ordering issue where refs arriving before decls causes spurious external refs that aren't properly cleaned up. Using Lazy.t as workaround until properly fixed.
- Remove RefsToLazy module (no longer builds inverse refs index) - Add shared make_hasRefBelow function used by both reactive and non-reactive modes - Non-reactive mode: iterates through References.t directly - Reactive mode: iterates through Reactive.t directly (no freezing needed) - Remove unused posHashFindSet and posHashAddSet helpers - Remove deadcode-minimal test directory (not useful) This eliminates the need for frozen refs in reactive mode when transitive=true, and shares the same O(total_refs) per-dead-decl algorithm between both modes.
- Clarify current status: collect_issues iterates all dead_decls + live_decls - Document TODO items: isInsideReportedValue, hasRefBelow, module marking - Document missing issues: 18 optional args (6 Redundant + 12 Unused) - Document correct issues: 362 dead code + 1 incorrect @dead
- Add iter_live_decls to ReactiveSolver for iterating live declarations - Compute optional_args_state using reactive cross_file_items - Use Decl.isLive for liveness check (matches non-reactive behavior) - All 380 issues now match between reactive and non-reactive modes Signed-Off-By: Cristiano Calcagno <[email protected]>
Signed-Off-By: Cristiano Calcagno <[email protected]>
- Add reactive dead_modules collection (anti-join of modules_with_dead - modules_with_live) - Remove DeadModules.markDead/markLive calls in collect_issues - Add ?checkModuleDead callback parameter to DeadCommon.reportDeclaration - Use reactive dead_modules lookup instead of mutable DeadModules table Timing improvement on cache hits: - iter_dead: 9.94ms → 5.73ms (-4.2ms) - iter_live: 9.32ms → 5.69ms (-3.6ms) All 380 issues match between reactive and non-reactive modes.
- Add is_pos_live function to ReactiveSolver that uses reactive live collection - Store decls and live in ReactiveSolver.t for direct reactive lookup - Update Reanalyze.ml to use is_pos_live instead of Decl.isLive - Remove resolvedDead mutations from collect_issues Optional args now use reactive liveness check instead of mutable field. All 380 issues still match between reactive and non-reactive modes.
isInsideReportedValue only checks within same file, so files are independent. Now group dead declarations by file and process each file with: - Sort within file only (not global sort) - Fresh ReportingContext per file Timing improvement: - group: 4-5ms (was sort: 7-8ms for global sort) - report: 10-11ms (files processed independently) All 380/19000 issues still match between reactive and non-reactive modes.
- Add reactive dead_decls_by_file collection (flatMap with merge) - Add shouldReport callback to reportDeclaration (replaces report field mutation) - Issue generation now uses reactive per-file grouping - No more decl.report mutation in reactive path All 380/19000 issues still match between reactive and non-reactive modes.
- issues_by_file is now a reactive flatMap from dead_decls_by_file - Per-file issue generation with shouldReport callback (no mutations) - Module issues generated separately from dead_modules + modules_with_reported_values - Correctly handles flatMap not tracking reads from other collections Timing improvement on benchmark (cache hit): - iter_issues: ~1.5ms (down from ~32ms) - iter_modules: ~5-7ms (new, iterates dead_modules) - Total dead_code: ~12-14ms (down from ~38-40ms) All 380/19000 issues still match between reactive and non-reactive modes.
hasRefBelow only checks refs from the same file as the declaration. Now we group refs by source file (value_refs_from_by_file) and only scan refs from the declaration's file. Complexity per dead decl: O(file_refs) instead of O(total_refs) All 380/19000 issues still match between reactive and non-reactive modes.
Add incorrect_dead_decls = reactive join of live_decls + annotations where annotation = Dead. Now we only iterate the few incorrect decls instead of scanning all ~17,500 live declarations. Timing improvement (benchmark): - incorrect_dead: 0.03ms (was ~5ms when scanning all live_decls) - dead_code total: ~7-10ms (was ~10-11ms) All 380/19000 issues still match between reactive and non-reactive modes.
Add modules_with_reported = flatMap of issues_by_file to collect modules with reported values. Add dead_module_issues = reactive join of dead_modules + modules_with_reported. Now collect_issues only iterates pre-computed reactive collections: - incorrect_dead_decls (few) - issues_by_file (pre-computed per-file issues) - dead_module_issues (pre-computed module issues) Timing improvement (benchmark, cache hit): - iter_modules: 0.06-0.07ms (was ~5-6ms when iterating all dead_modules) - dead_code total: 0.66-1.3ms (was ~7-10ms) All 380/19000 issues still match between reactive and non-reactive modes.
- Update ARCHITECTURE.md with current ReactiveSolver collections - Add table documenting all reactive collections in ReactiveSolver - Update pipeline stages to show full reactive flow - Regenerate reactive-pipeline.svg with detailed solver internals - Show timing breakdown (~0.7ms on cache hit for dead_code solving)
Track updates_received and updates_emitted for each reactive collection. Add stats field to Reactive.t type. Add print_stats to ReactiveSolver to show all collection statistics. Example output (benchmark, cold start): dead_decls: recv=142575 emit=40682 len=18050 live_decls: recv=142575 emit=27674 len=17500 dead_modules: recv=68356 emit=11856 len=1700 dead_decls_by_file: recv=40682 emit=40682 len=2900 issues_by_file: recv=40682 emit=40682 len=2900 incorrect_dead_decls: recv=38474 emit=50 len=50 dead_module_issues: recv=56563 emit=21416 len=1550 On cache hit (second run), counters don't increase - zero new updates.
Print roots, edges, and live (fixpoint) stats in timing mode. Example output (benchmark, cold start): ReactiveLiveness collection stats: roots: recv=156200 emit=156200 len=16203 edges: recv=445363 emit=445363 len=35550 live (fixpoint): recv=601563 emit=107025 len=31903 Key insights: - fixpoint receives 601,563 updates (roots + edges combined) - fixpoint emits 107,025 updates (BFS expansion deltas) - 31,903 positions are live (17,500 decls + type positions) - On cache hit (run 2), counters stay same = zero new work
Add a new Batch case to the delta type that allows sending multiple
updates at once. This enables more efficient processing during bulk
loading (e.g., when processing CMT files).
Key changes:
- Add Batch of ('k * 'v option) list to delta type
- Add set/remove helper functions for batch entries
- Update all combinators (flatMap, lookup, join, union, fixpoint)
to handle Batch deltas
- Process batch entries, deduplicate affected keys, emit as batch
- Add tests for batch processing
Batch processing uses hashtbl internally for deduplication but
passes batches as lists between combinators. This matches the
existing pattern in combinators and provides O(1) dedup.
Example usage:
emit (Batch [
Reactive.set "a" 1;
Reactive.set "b" 2;
Reactive.remove "c";
])
Process all files as a single Batch delta instead of individual deltas. This dramatically reduces the number of updates flowing through reactive combinators during bulk loading. Key changes: - Add process_files_batch to ReactiveFileCollection (emits single Batch) - Add process_file_silent helper (processes without emitting) - Update ReactiveAnalysis.process_files to use batch processing Performance improvement on deadcode-benchmark: - Before: fixpoint recv=601,563 updates - After: fixpoint recv=13 updates (batches) This means downstream combinators process all file data together instead of handling 600K+ individual updates one at a time.
Instead of calling apply_init_delta (which runs BFS) for each entry in a batch, collect all new roots first and run a single BFS expansion. Before: Batch of N roots → N separate BFS operations After: Batch of N roots → 1 BFS with N-element frontier This matches how the fixpoint initialization already works. Note: The main CMT processing overhead (~2.5s) comes from other combinators (flatMaps, joins), not the fixpoint. The fixpoint only receives 13 batch updates vs 600K+ individual deltas before batching.
This reverts commit 42d0efe.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Vendor skip-lite library and integrate reactive analysis capabilities:
Benchmark results (~5000 files):
The reactive mode caches processed file_data and uses read_cmt_if_changed to skip unchanged files entirely on subsequent runs.