Skip to content

Conversation

@cristianoc
Copy link
Collaborator

@cristianoc cristianoc commented Dec 15, 2025

Vendor skip-lite library and integrate reactive analysis capabilities:

  • Vendor skip-lite marshal_cache and reactive_file_collection modules
  • Add ReactiveAnalysis module for incremental file processing
  • Add CLI flags: -cmt-cache, -reactive, -runs
  • Add README.md with usage and benchmark instructions

Benchmark results (~5000 files):

  • Standard: CMT processing 0.78s, Total 1.01s
  • Reactive (warm): CMT processing 0.01s, Total 0.20s
  • Speedup: 74x for CMT processing, 5x total

The reactive mode caches processed file_data and uses read_cmt_if_changed to skip unchanged files entirely on subsequent runs.

@cristianoc
Copy link
Collaborator Author

On real project kindly provided:

=== Run 1/3 ===
Reactive: 1888 files processed, 0 from cache

=== Timing ===
  CMT processing:     2.795s (90.3%)
    - Wall clock:        2.795s
    - Result collection: 0.000ms (aggregate)
  Analysis:           0.299s (9.7%)
    - Merging:          147.725ms
    - Solving:          151.455ms
  Reporting:          0.000s (0.0%)
  Total:              3.094s

=== Run 2/3 ===
Reactive: 0 files processed, 1888 from cache

=== Timing ===
  CMT processing:     0.009s (3.7%)
    - Wall clock:        0.009s
    - Result collection: 0.000ms (aggregate)
  Analysis:           0.231s (96.3%)
    - Merging:          126.980ms
    - Solving:          103.579ms
  Reporting:          0.000s (0.0%)
  Total:              0.239s

=== Run 3/3 ===
Reactive: 0 files processed, 1888 from cache

=== Timing ===
  CMT processing:     0.008s (3.3%)
    - Wall clock:        0.008s
    - Result collection: 0.000ms (aggregate)
  Analysis:           0.248s (96.7%)
    - Merging:          139.188ms
    - Solving:          108.941ms
  Reporting:          0.000s (0.0%)
  Total:              0.257s

Base automatically changed from reanalyze-reduce-global-state-ast to master December 15, 2025 14:52
Vendor skip-lite library and integrate reactive analysis capabilities:

- Vendor skip-lite marshal_cache and reactive_file_collection modules
- Modify C++ code to handle ReScript CMT file format (CMI+CMT headers)
- Add CmtCache module for mmap-based CMT file reading
- Add ReactiveAnalysis module for incremental file processing
- Add CLI flags: -cmt-cache, -reactive, -runs
- Add README.md with usage and benchmark instructions

Benchmark results (~5000 files):
- Standard: CMT processing 0.78s, Total 1.01s
- Reactive (warm): CMT processing 0.01s, Total 0.20s
- Speedup: 74x for CMT processing, 5x total

The reactive mode caches processed file_data and uses read_cmt_if_changed
to skip unchanged files entirely on subsequent runs.
- Change --create-sourcedirs to default to true (always create .sourcedirs.json)
- Hide the flag from help since it's now always enabled
- Add deprecation warning when flag is explicitly used
- Fix package name mismatches in test projects:
  - deadcode rescript.json: sample-typescript-app -> @tests/reanalyze-deadcode
  - rescript-react package.json: @tests/rescript-react -> @rescript/react
Simplify the reactive_file_collection implementation by making 'v t
be the concrete record type used at runtime, removing the unused
phantom fields and internal wrapper type. This eliminates warning 69
about unused record fields and relies directly on a single record with
its process function stored as an Obj.t-based callback.
@cristianoc cristianoc force-pushed the reactive-analysis-experiment branch from 1b2651a to 9265b64 Compare December 15, 2025 15:00
@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 16, 2025

Open in StackBlitz

rescript

npm i https://pkg.pr.new/rescript-lang/rescript@8092

@rescript/darwin-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-arm64@8092

@rescript/darwin-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/darwin-x64@8092

@rescript/linux-arm64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-arm64@8092

@rescript/linux-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/linux-x64@8092

@rescript/runtime

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/runtime@8092

@rescript/win32-x64

npm i https://pkg.pr.new/rescript-lang/rescript/@rescript/win32-x64@8092

commit: aa1941e

- CmtCache: rewritten using Unix.stat for file change detection
  (mtime, size, inode) instead of C++ mmap cache
- ReactiveFileCollection: new pure OCaml module for reactive file
  collections with delta-based updates
- ReactiveAnalysis: refactored to use ReactiveFileCollection,
  collection passed as parameter (no global mutable state)
- Timing: only show parallel merge timing when applicable
- Deleted skip-lite vendor directory (C++ code no longer needed)

This eliminates the Linux/musl C++ compilation issue while
maintaining the same incremental analysis performance:
- Cold run: ~1.0s
- Warm run: ~0.01s (90x faster, skips unchanged files)
@cristianoc cristianoc force-pushed the reactive-analysis-experiment branch from de52abf to 210862e Compare December 16, 2025 03:43
- Create new analysis/reactive library with:
  - Reactive.ml: Core combinators (delta type, flatMap with merge)
  - ReactiveFileCollection.ml: Generic file collection with change detection
  - Comprehensive tests including multi-stage composition

- Remove CmtCache module (logic absorbed into ReactiveFileCollection)

- ReactiveAnalysis now uses generic ReactiveFileCollection with:
  - read_file: Cmt_format.read_cmt
  - process: CMT analysis function

- Test composition: files -> word_counts (with merge) -> frequent_words
  Demonstrates delta propagation across multiple flatMap stages
- Add ReactiveMerge module for reactive merge of per-file DCE data
- Add extraction functions (builder_to_list, create_*) to data modules
- Expose types needed for reactive merge (CrossFileItems.t fields, etc.)
- ReactiveAnalysis: add iter_file_data, collect_exception_results
- runAnalysis: use ReactiveMerge for decls/annotations/cross_file
  when reactive mode enabled

Note: refs and file_deps still use O(n) iteration because they need
post-processing (type-label deps, exception refs). Next step is to
make these reactive via indexed lookups.
- Add Reactive.lookup: single-key subscription from a collection
- Add Reactive.join: reactive hash join between two collections
- Add ReactiveTypeDeps: type-label dependencies via reactive join
- Add ReactiveExceptionRefs: exception ref resolution via reactive join
- Update ARCHITECTURE.md with generated SVG diagrams
- Add diagram sources (.mmd) for batch pipeline, reactive pipeline, delta propagation

The reactive modules express cross-file dependency resolution declaratively:
- ReactiveTypeDeps uses flatMap to index decls by path, then join to connect impl<->intf
- ReactiveExceptionRefs uses join to resolve exception paths to declaration locations
Bug fix:
- ReactiveFileCollection.process now receives (path, raw) instead of just (raw)
- ReactiveAnalysis passes cmtFilePath to DceFileProcessing.process_cmt_file
- This fixes @genType annotations being incorrectly collected when .cmti exists

Previously, reactive mode passed cmtFilePath:"" which made the .cmti existence
check always return false, causing @genType annotations to be collected even
for files with interface files (where they should be ignored).

Architecture updates:
- Updated reactive-pipeline.mmd with accurate node names
- Added legend table explaining all diagram symbols
- Diagram now shows: VR/TR (per-file refs), all ReactiveTypeDeps fields,
  ReactiveExceptionRefs flow, and combined output

Helper functions added for debugging (kept as useful):
- Declarations.length, FileAnnotations.length/iter
- References.value_refs_length/type_refs_length
- FileDeps.files_count/deps_count
Introduce five new store modules that wrap reactive collections directly,
eliminating O(N) freeze/copy operations in the merge phase:

- AnnotationStore: wraps FileAnnotations reactive collection
- DeclarationStore: wraps Declarations reactive collection
- ReferenceStore: combines 7 reactive sources (value_refs, type_refs,
  type_deps.*, exception_refs) without copying
- FileDepsStore: wraps file deps reactive collections
- CrossFileItemsStore: iterates reactive collection directly without
  intermediate allocation

Performance improvement on 4900-file benchmark:
- Merge time: 37ms → 0.002ms (eliminated)
- Warm run total: 165ms → 129ms (22% faster)

The stores provide a unified interface that works with both frozen
(non-reactive) and reactive data, dispatching to the appropriate
implementation at runtime.

Signed-off-by: Cristiano Calcagno <[email protected]>
This is a significant simplification of the dead code analysis solver:

**Removed:**
- Backward recursive solver (resolveRecursiveRefs, solveDeadBackward)
- FileDepsStore module (was only needed for file ordering in backward solver)
- Unused find_value_targets/find_type_targets functions

**Added:**
- Liveness module: forward fixpoint liveness computation
- ReactiveDeclRefs: maps declarations to their outgoing references
- ReactiveLiveness: reactive liveness via Reactive.fixpoint combinator
- Reactive.union: combine two collections with optional merge
- Reactive.fixpoint: transitive closure combinator (init + edges → reachable)

**Changed:**
- References now stores both directions: refs_to (for reporting) and refs_from (for liveness)
- ReactiveMerge exposes value_refs_from and type_refs_from
- ReactiveTypeDeps produces all_type_refs_from
- ReactiveExceptionRefs produces refs_from direction
- Updated ARCHITECTURE.md and reactive pipeline diagram

The forward algorithm is simpler and more suitable for reactive updates:
1. Identify roots (@live/@genType annotations or externally referenced)
2. Build index mapping declarations to outgoing references
3. Propagate liveness from roots through references
4. Return set of all live positions

Both reactive and non-reactive modes correctly report 380 issues on test suite.

Signed-Off-By: Cristiano Calcagno <[email protected]>
Removes refs_to direction from core data structures. Now only refs_from
is stored, which is what the forward liveness algorithm needs.

**Removed from storage:**
- References.builder/t: removed value_refs_to, type_refs_to hashtables
- ReactiveMerge.t: removed value_refs, type_refs fields
- ReferenceStore: removed find_value_refs, find_type_refs functions

**Added lazy computation:**
- DeadCommon.RefsToLazy: computes refs_to by inverting refs_from
- Only computed when debug OR transitive mode is enabled
- Zero cost in the common case (no storage, no computation)

Cost analysis:
- Common case (no debug, no transitive): 0 storage, 0 computation
- Debug or transitive enabled: O(refs) one-time cost at reporting

This simplifies the reactive architecture - it now only deals with
refs_from direction, matching the forward liveness algorithm.

Signed-Off-By: Cristiano Calcagno <[email protected]>
- Add rank-based incremental fixpoint algorithm to Reactive module
  - BFS expansion when init/edges grow
  - Well-founded cascade contraction when init/edges shrink
  - Inverse index for efficient edge removal handling

- Add comprehensive fixpoint tests (14 new test cases)
  - Cycles, diamond graphs, alternative support
  - Adding/removing base elements and edges
  - Delta emission verification

- Use Lazy.t for ReactiveLiveness creation
  - Created on first force (after files processed)
  - Cached and reused across runs

- Fix external refs detection in ReactiveLiveness
  - Use flatMap with synchronous decls.get lookup
  - Correctly identifies refs from non-declaration positions

- Fix DceCommand.ml missing reactive_liveness parameter

Performance: Run 1 ~11ms, Run 2+ ~1ms (incremental fixpoint)

Signed-Off-By: Cristiano Calcagno <[email protected]>
- Change external_value_refs and external_type_refs from flatMap to join
  in ReactiveLiveness.ml, making the dependency on decls explicit

- Update ReactiveLiveness to return a record with live, edges, roots fields
  for better debugging and introspection

- Add trace_edges debug flag in Reactive.Fixpoint for future debugging

- Document order-dependence issue in BUG-reactive-order-dependence.md
  The reactive pipeline has a delta ordering issue where refs arriving
  before decls causes spurious external refs that aren't properly
  cleaned up. Using Lazy.t as workaround until properly fixed.
- Remove RefsToLazy module (no longer builds inverse refs index)
- Add shared make_hasRefBelow function used by both reactive and non-reactive modes
- Non-reactive mode: iterates through References.t directly
- Reactive mode: iterates through Reactive.t directly (no freezing needed)
- Remove unused posHashFindSet and posHashAddSet helpers
- Remove deadcode-minimal test directory (not useful)

This eliminates the need for frozen refs in reactive mode when transitive=true,
and shares the same O(total_refs) per-dead-decl algorithm between both modes.
- Clarify current status: collect_issues iterates all dead_decls + live_decls
- Document TODO items: isInsideReportedValue, hasRefBelow, module marking
- Document missing issues: 18 optional args (6 Redundant + 12 Unused)
- Document correct issues: 362 dead code + 1 incorrect @dead
- Add iter_live_decls to ReactiveSolver for iterating live declarations
- Compute optional_args_state using reactive cross_file_items
- Use Decl.isLive for liveness check (matches non-reactive behavior)
- All 380 issues now match between reactive and non-reactive modes

Signed-Off-By: Cristiano Calcagno <[email protected]>
Signed-Off-By: Cristiano Calcagno <[email protected]>
- Add reactive dead_modules collection (anti-join of modules_with_dead - modules_with_live)
- Remove DeadModules.markDead/markLive calls in collect_issues
- Add ?checkModuleDead callback parameter to DeadCommon.reportDeclaration
- Use reactive dead_modules lookup instead of mutable DeadModules table

Timing improvement on cache hits:
- iter_dead: 9.94ms → 5.73ms (-4.2ms)
- iter_live: 9.32ms → 5.69ms (-3.6ms)

All 380 issues match between reactive and non-reactive modes.
- Add is_pos_live function to ReactiveSolver that uses reactive live collection
- Store decls and live in ReactiveSolver.t for direct reactive lookup
- Update Reanalyze.ml to use is_pos_live instead of Decl.isLive
- Remove resolvedDead mutations from collect_issues

Optional args now use reactive liveness check instead of mutable field.
All 380 issues still match between reactive and non-reactive modes.
isInsideReportedValue only checks within same file, so files are independent.
Now group dead declarations by file and process each file with:
- Sort within file only (not global sort)
- Fresh ReportingContext per file

Timing improvement:
- group: 4-5ms (was sort: 7-8ms for global sort)
- report: 10-11ms (files processed independently)

All 380/19000 issues still match between reactive and non-reactive modes.
- Add reactive dead_decls_by_file collection (flatMap with merge)
- Add shouldReport callback to reportDeclaration (replaces report field mutation)
- Issue generation now uses reactive per-file grouping
- No more decl.report mutation in reactive path

All 380/19000 issues still match between reactive and non-reactive modes.
- issues_by_file is now a reactive flatMap from dead_decls_by_file
- Per-file issue generation with shouldReport callback (no mutations)
- Module issues generated separately from dead_modules + modules_with_reported_values
- Correctly handles flatMap not tracking reads from other collections

Timing improvement on benchmark (cache hit):
- iter_issues: ~1.5ms (down from ~32ms)
- iter_modules: ~5-7ms (new, iterates dead_modules)
- Total dead_code: ~12-14ms (down from ~38-40ms)

All 380/19000 issues still match between reactive and non-reactive modes.
hasRefBelow only checks refs from the same file as the declaration.
Now we group refs by source file (value_refs_from_by_file) and only
scan refs from the declaration's file.

Complexity per dead decl: O(file_refs) instead of O(total_refs)

All 380/19000 issues still match between reactive and non-reactive modes.
Add incorrect_dead_decls = reactive join of live_decls + annotations
where annotation = Dead. Now we only iterate the few incorrect decls
instead of scanning all ~17,500 live declarations.

Timing improvement (benchmark):
- incorrect_dead: 0.03ms (was ~5ms when scanning all live_decls)
- dead_code total: ~7-10ms (was ~10-11ms)

All 380/19000 issues still match between reactive and non-reactive modes.
Add modules_with_reported = flatMap of issues_by_file to collect modules with reported values.
Add dead_module_issues = reactive join of dead_modules + modules_with_reported.

Now collect_issues only iterates pre-computed reactive collections:
- incorrect_dead_decls (few)
- issues_by_file (pre-computed per-file issues)
- dead_module_issues (pre-computed module issues)

Timing improvement (benchmark, cache hit):
- iter_modules: 0.06-0.07ms (was ~5-6ms when iterating all dead_modules)
- dead_code total: 0.66-1.3ms (was ~7-10ms)

All 380/19000 issues still match between reactive and non-reactive modes.
- Update ARCHITECTURE.md with current ReactiveSolver collections
- Add table documenting all reactive collections in ReactiveSolver
- Update pipeline stages to show full reactive flow
- Regenerate reactive-pipeline.svg with detailed solver internals
- Show timing breakdown (~0.7ms on cache hit for dead_code solving)
Track updates_received and updates_emitted for each reactive collection.
Add stats field to Reactive.t type.
Add print_stats to ReactiveSolver to show all collection statistics.

Example output (benchmark, cold start):
  dead_decls: recv=142575 emit=40682 len=18050
  live_decls: recv=142575 emit=27674 len=17500
  dead_modules: recv=68356 emit=11856 len=1700
  dead_decls_by_file: recv=40682 emit=40682 len=2900
  issues_by_file: recv=40682 emit=40682 len=2900
  incorrect_dead_decls: recv=38474 emit=50 len=50
  dead_module_issues: recv=56563 emit=21416 len=1550

On cache hit (second run), counters don't increase - zero new updates.
Print roots, edges, and live (fixpoint) stats in timing mode.

Example output (benchmark, cold start):
ReactiveLiveness collection stats:
  roots: recv=156200 emit=156200 len=16203
  edges: recv=445363 emit=445363 len=35550
  live (fixpoint): recv=601563 emit=107025 len=31903

Key insights:
- fixpoint receives 601,563 updates (roots + edges combined)
- fixpoint emits 107,025 updates (BFS expansion deltas)
- 31,903 positions are live (17,500 decls + type positions)
- On cache hit (run 2), counters stay same = zero new work
Add a new Batch case to the delta type that allows sending multiple
updates at once. This enables more efficient processing during bulk
loading (e.g., when processing CMT files).

Key changes:
- Add Batch of ('k * 'v option) list to delta type
- Add set/remove helper functions for batch entries
- Update all combinators (flatMap, lookup, join, union, fixpoint)
  to handle Batch deltas
- Process batch entries, deduplicate affected keys, emit as batch
- Add tests for batch processing

Batch processing uses hashtbl internally for deduplication but
passes batches as lists between combinators. This matches the
existing pattern in combinators and provides O(1) dedup.

Example usage:
  emit (Batch [
    Reactive.set "a" 1;
    Reactive.set "b" 2;
    Reactive.remove "c";
  ])
Process all files as a single Batch delta instead of individual deltas.
This dramatically reduces the number of updates flowing through reactive
combinators during bulk loading.

Key changes:
- Add process_files_batch to ReactiveFileCollection (emits single Batch)
- Add process_file_silent helper (processes without emitting)
- Update ReactiveAnalysis.process_files to use batch processing

Performance improvement on deadcode-benchmark:
- Before: fixpoint recv=601,563 updates
- After:  fixpoint recv=13 updates (batches)

This means downstream combinators process all file data together
instead of handling 600K+ individual updates one at a time.
Instead of calling apply_init_delta (which runs BFS) for each entry
in a batch, collect all new roots first and run a single BFS expansion.

Before: Batch of N roots → N separate BFS operations
After:  Batch of N roots → 1 BFS with N-element frontier

This matches how the fixpoint initialization already works.

Note: The main CMT processing overhead (~2.5s) comes from other
combinators (flatMaps, joins), not the fixpoint. The fixpoint only
receives 13 batch updates vs 600K+ individual deltas before batching.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants