Skip to content

Design doc for #2158: Flag-Gated Code Coverage for Bosatsu Test Execution#2159

Open
johnynek wants to merge 3 commits into
mainfrom
agent/design/2158-codecoverage-for-bosatsu
Open

Design doc for #2158: Flag-Gated Code Coverage for Bosatsu Test Execution#2159
johnynek wants to merge 3 commits into
mainfrom
agent/design/2158-codecoverage-for-bosatsu

Conversation

@johnynek

Copy link
Copy Markdown
Owner

Design doc.

Refs #2158

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive design document outlining the implementation of optional, flag-gated code coverage for Bosatsu's test execution. The proposal details how to instrument code during lowering, collect execution hits at runtime, and generate CI-friendly reports in LCOV and Cobertura formats, all while minimizing performance impact and ensuring compatibility with existing workflows.

Highlights

  • Feature Introduction: Introduced a design for flag-gated code coverage functionality for the Bosatsu tool test command, allowing optional collection of execution data.
  • Instrumentation Strategy: Proposed a method to record source-linked probes during the Matchless lowering phase, avoiding invasive changes to the language semantics.
  • Runtime Hit Collection: Designed an extension to the MatchlessToValue evaluator to increment counters for executed probes, ensuring minimal overhead in default operation.
  • Reporting Formats: Planned for the generation of industry-standard LCOV and Cobertura reports, with an internal data model compatible with JaCoCo for future expansion.
  • Performance and Fidelity: Addressed concerns about runtime overhead and source fidelity by making coverage optional and suggesting the use of NoOptimize during coverage runs.
Changelog
  • docs/design/2158-codecoverage-for-bosatsu.md
    • Added a new design document for flag-gated code coverage in Bosatsu test execution.
Activity
  • The pull request currently contains a design document for the proposed feature, serving as an initial blueprint for discussion and implementation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a design document for adding flag-gated code coverage to Bosatsu tests. The design is comprehensive, covering the architecture, implementation plan, testing strategy, and risks. It proposes instrumenting code during the Matchless lowering phase, collecting hit counts in the interpreter, and generating reports in LCOV and Cobertura formats. My main feedback concerns the interaction between probe registration and Matchless.Expr optimizations, which could affect the stability of probe identifiers.

Comment on lines +89 to +95
### 2) Static Probe Registration During Matchless Lowering
Instrument at lowering time, not by changing Bosatsu language semantics:
1. Extend `Matchless.fromLet` with optional coverage recorder callbacks.
2. For each compiled typed-expression root that has usable source region metadata, register a probe id against the produced Matchless expression object identity.
3. For each typed `Match`, additionally register a branch decision group with one alternative per branch body.

This avoids invasive AST-wide payload changes while still mapping execution to source regions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The proposal to register probes against Matchless.Expr object identity is a key part of the design. However, there's a potential issue with Matchless.Expr optimizations.

The Matchless.fromLet function, where instrumentation is proposed, applies several optimizations at the end of its execution (e.g., reuseConstructors, hoistInvariantLoopLets). These optimizations rewrite the Matchless.Expr tree, which will invalidate any object identities registered before they run. This could lead to probes being associated with stale objects and result in incorrect coverage data.

To address this, the design should clarify the ordering. Either:

  1. Probe registration should occur after all Matchless.Expr optimizations have been applied.
  2. These Matchless.Expr-level optimizations should be disabled when coverage collection is enabled, similar to how NoOptimize is used for TypedExpr optimizations.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment.

One idea could be to aggregrate a RegionSet for Expr values, so when we optimize two different source positions into a single bit of code, we can take that single bit with counting for executing two regions. Although, that's a hand waving argument.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. I adopted the RegionSet direction in Static Probe Registration During Matchless Lowering: coverage metadata is now (sourceHash, RegionSet), and rewrites/merges explicitly union RegionSets so one optimized expression can count for multiple source regions.

@codecov

codecov Bot commented Mar 13, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.39%. Comparing base (0f3bea7) to head (521d33e).

⚠️ Current head 521d33e differs from pull request most recent head 6be5139

Please upload reports for the commit 6be5139 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2159      +/-   ##
==========================================
+ Coverage   85.03%   85.39%   +0.35%     
==========================================
  Files         190      184       -6     
  Lines       46891    38710    -8181     
  Branches    11805     9731    -2074     
==========================================
- Hits        39876    33057    -6819     
+ Misses       7015     5653    -1362     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

For each executed probe:
1. Source package name.
2. Source region (`start`, `end` offsets).
3. Probe kind (`line` probe, `branch alternative` probe).

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't this implied by keeping track of the source regions executed? For instance if you execute the branch expr for a match then you have taken that branch. How is it different from the source region?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I updated Coverage Data We Must Retain to make this explicit: executed regions do imply a branch was taken, and the extra decision-group metadata is only to preserve the denominator/untaken alternatives required by branch-report formats.

Comment on lines +89 to +95
### 2) Static Probe Registration During Matchless Lowering
Instrument at lowering time, not by changing Bosatsu language semantics:
1. Extend `Matchless.fromLet` with optional coverage recorder callbacks.
2. For each compiled typed-expression root that has usable source region metadata, register a probe id against the produced Matchless expression object identity.
3. For each typed `Match`, additionally register a branch decision group with one alternative per branch body.

This avoids invasive AST-wide payload changes while still mapping execution to source regions.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comment.

One idea could be to aggregrate a RegionSet for Expr values, so when we optimize two different source positions into a single bit of code, we can take that single bit with counting for executing two regions. Although, that's a hand waving argument.


This avoids invasive AST-wide payload changes while still mapping execution to source regions.

### 3) Runtime Hit Collection In Evaluator

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not slow down the normal MatchlessToValue to do this. It should be purely opt-in. So, when we aren't running coverage, there is no hit on performance.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. The Runtime Hit Collection In Evaluator section now states a strict zero-cost normal path: non-coverage runs keep the existing MatchlessToValue path unchanged, and coverage uses a separate evaluator path only when --coverage is enabled.


### 4) Source Mapping And Filtering
Coverage should only emit files we can map back to source paths:
1. Use `(Path, PackageName)` mapping already returned by compilation (`RuntimeCommandSupport.packMap`) to map package to source file path.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think hashing the source with blake3 and keeping that as the key in Matchless source information could be a good idea. So HashValue[Algo.Blake3] and the RegionSet should be enough.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integrated this suggestion. The doc now uses HashValue[Algo.Blake3] as the source key and describes hash -> (path, LocationMap) resolution for reporting.

2. Update `RuntimeCommandSupport.packMap` to accept compile options from caller.
3. Add coverage model classes (`CoverageModel`, `CoverageAggregator`) for static probes and runtime counters.
4. Extend `Matchless.fromLet` and `MatchlessFromTypedExpr.compile` to register probes and branch groups during lowering.
5. Extend `MatchlessToValue` with optional expression-entry callback.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, I think we must be clear that we should have a zero cost impact on the normal path.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified again in both architecture and plan/acceptance sections: normal runs must not pay coverage overhead, and coverage logic is isolated to the explicit opt-in path.

@johnynek

Copy link
Copy Markdown
Owner Author

Updated docs/design/2158-codecoverage-for-bosatsu.md to address all review comments (branch-vs-region clarification, RegionSet merge model, Blake3 source keying, and explicit zero-cost non-coverage path). Ran required pre-push check: scripts/test_basic.sh passed (Passed: Total 64, Failed 0, Errors 0).

@johnynek

Copy link
Copy Markdown
Owner Author

Merged origin/main into the PR branch. Post-merge verification passed with scripts/test_basic.sh (Passed: Total 67, Failed 0, Errors 0).

@johnynek

Copy link
Copy Markdown
Owner Author

MergeXO feedback automation is blocked because the agent returned commit_message but no new staged changes or local commits were detected.

Action: request concrete file edits (or explicit git_ops), then reset blocked feedback state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant