docs(flaky-tests): document AI Investigation tab #539
Draft
samgutentag wants to merge 1 commit into main from sam-gutentag/flaky-tests-ai-investigation-tab
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,93 @@ | ||||||
| --- | ||||||
| description: >- | ||||||
| Use the Analysis tab on a test detail page to view AI-generated investigation | ||||||
| results, trigger new investigations, and get fix recommendations. | ||||||
| --- | ||||||
|
|
||||||
| # AI Investigation | ||||||
|
|
||||||
| Trunk Flaky Tests can automatically investigate a flaky test and surface root cause findings directly in the Trunk web app. The **Analysis** tab on a test detail page shows the latest investigation results and lets you trigger a new investigation, apply a fix, or browse past investigations. | ||||||
|
|
||||||
| {% hint style="info" %} | ||||||
| The Analysis tab requires a GitHub app installation for the repository. If your repo does not have the Trunk GitHub app installed, the Analyze button will be disabled. | ||||||
| {% endhint %} | ||||||
|
|
||||||
| ## Open the Analysis tab | ||||||
|
|
||||||
| 1. Navigate to **Flaky Tests** in the Trunk web app. | ||||||
| 2. Click a test case to open the test detail page. | ||||||
| 3. Select the **Analysis** tab. | ||||||
|
|
||||||
| If no investigation has run for this test yet, the tab shows an empty state with an **Analyze** button. | ||||||
|
|
||||||
| ## Understanding investigation results | ||||||
|
|
||||||
| When an investigation is available, the Analysis tab shows: | ||||||
|
|
||||||
| ### Latest Analysis header | ||||||
|
|
||||||
| At the top, you will see: | ||||||
|
|
||||||
| - The **overall confidence score** (color-coded green for 80%+, yellow for 50%+, orange below 50%) | ||||||
| - A relative timestamp for when the investigation ran | ||||||
| - An **Analyze** button to trigger a new investigation | ||||||
| - A **History** button to view past investigations | ||||||
| - An **Apply Fix** button if the investigation produced actionable findings | ||||||
|
|
||||||
| ### Key Findings | ||||||
|
|
||||||
| The Key Findings section lists the top three findings ordered by impact. Each finding shows: | ||||||
|
|
||||||
| | Field | Description | | ||||||
| |---|---| | ||||||
| | Fact type badge | The analysis source that produced the finding | | ||||||
| | Confidence percentage | How confident the AI is in this specific finding | | ||||||
| | Finding content | A summary with links to relevant code, CI logs, or commits | | ||||||
|
|
||||||
| A collapsible **Other Findings** section holds any additional findings beyond the top three. | ||||||
|
|
||||||
| ### Fact types | ||||||
|
|
||||||
| Each finding is labeled with the analysis source used to produce it: | ||||||
|
|
||||||
| | Fact type | What it analyzes | | ||||||
| |---|---| | ||||||
| | **CI Logs** | Supplements test failure outputs with CI workflow logs | | ||||||
|
Contributor
The phrasing "Supplements test failure outputs with CI workflow logs" reads as though the fact type is augmenting the outputs rather than analyzing the logs themselves. Suggest clarifying the direction:
Suggested change
|
||||||
| | **Git Blame** | Recent code changes that may have introduced flakiness | | ||||||
| | **Failure Mode** | Patterns in error outputs and failure metadata | | ||||||
| | **Test Purpose** | What the test covers and how it has drifted from its intent | | ||||||
| | **Environment** | Environmental factors such as resource constraints or timing | | ||||||
| | **Co-failure** | Other tests that fail at the same time, pointing to shared causes | | ||||||
| | **File Co-change** | Related files that have changed alongside the failing test | | ||||||
|
|
||||||
| ## Trigger a new investigation | ||||||
|
|
||||||
| Click **Analyze** to open the Trigger Analysis modal. Click **Run Analysis** to kick off a fresh investigation. Trunk will analyze the test using the latest CI logs, git history, and failure data. | ||||||
|
|
||||||
| Investigations run automatically when a test first becomes flaky (if the GitHub app is installed). You can trigger additional investigations manually at any time. | ||||||
|
|
||||||
| ## Apply a fix | ||||||
|
|
||||||
| Click **Apply Fix** to open the Apply Fix modal. Trunk surfaces the fix options available for the current investigation: | ||||||
|
|
||||||
| - **Copy Prompt** — copies a prompt you can paste into an AI coding assistant to guide it toward the fix | ||||||
| - **Fix with MCP** — connects directly to the Trunk MCP server to apply the fix (requires the MCP server to be configured) | ||||||
| - **Automate with Webhooks** — links to the webhooks configuration to automate fix workflows | ||||||
|
|
||||||
| See [Use MCP Server](use-mcp-server/README.md) for information on setting up the MCP integration. | ||||||
|
|
||||||
| ## View investigation history | ||||||
|
|
||||||
| Click **History** to open the Analysis History modal. This shows past investigations for the test case, each with: | ||||||
|
|
||||||
| - A confidence badge | ||||||
| - A findings count | ||||||
| - A timestamp | ||||||
|
|
||||||
| Click any past investigation to expand its details. | ||||||
|
|
||||||
| ## Related | ||||||
|
|
||||||
| - [Get root cause analysis (MCP)](use-mcp-server/mcp-tool-reference/get-root-cause-analysis.md) | ||||||
| - [Webhooks](webhooks/README.md) | ||||||
| - [Managing detected flaky tests](managing-detected-flaky-tests.md) | ||||||
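The Key Findings behavior described in the diff (the top three findings ordered by impact, with the remainder under Other Findings) can be illustrated with a short sketch. This is not Trunk's implementation: the `Finding` class, its `impact` field, and the `split_findings` helper are hypothetical names introduced only for illustration, and the doc does not say how impact is actually computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one finding; field names are assumptions."""
    fact_type: str    # e.g. "CI Logs", "Git Blame"
    confidence: int   # confidence percentage for this finding
    impact: float     # assumed numeric ranking key, not documented by Trunk
    content: str      # summary text


def split_findings(findings, top_n=3):
    """Return (key_findings, other_findings), ranked by impact, highest first."""
    ranked = sorted(findings, key=lambda f: f.impact, reverse=True)
    return ranked[:top_n], ranked[top_n:]


findings = [
    Finding("CI Logs", 70, 0.4, "Timeout while waiting for a service"),
    Finding("Git Blame", 85, 0.9, "Recent change to retry logic"),
    Finding("Environment", 60, 0.2, "Runner memory pressure"),
    Finding("Co-failure", 75, 0.7, "Fails together with two other tests"),
    Finding("Failure Mode", 65, 0.5, "Intermittent assertion error"),
]

key_findings, other_findings = split_findings(findings)
```

With five findings, `key_findings` holds the three with the highest assumed impact and `other_findings` holds the remaining two, matching the collapsible section described in the doc.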
The confidence score ranges overlap: "yellow for 50%+" technically includes the 80%+ range already described as green. The three bands should be mutually exclusive.
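One way to make the bands mutually exclusive is to check the thresholds from highest to lowest, so that each score lands in exactly one band. The following is a minimal sketch, assuming scores are percentages from 0 to 100. The function name `confidence_color` is hypothetical and not taken from the Trunk codebase; only the 80% and 50% thresholds come from the doc.

```python
def confidence_color(score: float) -> str:
    """Map a confidence percentage (0-100) to exactly one color band.

    Thresholds follow the doc under review: green at 80 and above,
    yellow from 50 up to (but not including) 80, orange below 50.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "green"
    if score >= 50:
        return "yellow"
    return "orange"


# Boundary values show that no score falls into two bands.
for value in (100, 80, 79, 50, 49, 0):
    print(value, confidence_color(value))
```

Under this reading, the doc line could say "green for 80% and above, yellow for 50% up to (but not including) 80%, orange below 50%", which removes the overlap.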