docs(test-insights): Add Test Insights documentation section #10990
Closed
remyduthu wants to merge 1 commit into main from devs/remyduthu/docs/test-insights/Ifd13a32440391e4ca28d83b0bc7952f0bb4d8b95
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| --- | ||
| title: Detection | ||
| description: Identify and prioritize unhealthy tests across your repositories. | ||
| --- | ||
|
|
||
| Even with prevention in place, tests can degrade over time. Detection surfaces | ||
| all unhealthy tests (flaky and broken) across your repositories, so you can | ||
| see the full picture and prioritize what to fix. | ||
|
|
||
| ## How tests are classified | ||
|
|
||
| Mergify classifies tests based on their results across multiple CI runs, | ||
| with recent results weighted more heavily: | ||
|
|
||
| - **Flaky**: The test produces inconsistent results on the same commit. It | ||
| passes on some runs and fails on others, without any code changes. | ||
|
|
||
| - **Broken**: The test fails consistently. Because recent runs carry more | ||
| weight, a test that started failing recently is classified as broken even | ||
| if it passed in earlier runs. | ||
|
|
||
| Only unhealthy tests (flaky or broken) appear in Detection. Healthy tests | ||
| are not listed. | ||
|
|
||
| ## Understanding confidence | ||
|
|
||
| Confidence indicates how much data is available to assess a test's health. | ||
|
|
||
| - **High confidence**: Enough runs have been collected to make a reliable | ||
| assessment. The health status is unlikely to change significantly. | ||
|
|
||
| - **Low confidence**: Limited data is available. The health status could | ||
| still shift as more runs are collected. Treat low-confidence results as | ||
| preliminary. | ||
|
|
||
| Confidence increases as more CI runs are collected for a given test. | ||
|
|
||
| ## Prioritizing with impact | ||
|
|
||
| The impact metric reflects how many failed executions a test causes. A | ||
| high-impact flaky test wastes more CI time and disrupts more workflows than | ||
| a low-impact one. | ||
|
|
||
| Use impact to decide which tests to fix first: high-impact tests give you | ||
| the most return on investment when fixed. | ||
|
|
||
| ## Practical workflows | ||
|
|
||
| ### Finding your worst tests | ||
|
|
||
| Sort by impact to surface the tests causing the most CI disruption. These | ||
| are the best candidates for immediate attention. | ||
|
|
||
| ### Narrowing scope | ||
|
|
||
| Use filters to focus on specific areas: | ||
|
|
||
| - **Test name**: Search for a specific test or pattern | ||
| - **Job name**: Focus on tests within a particular CI job | ||
| - **Pipeline name**: Narrow to a specific CI pipeline | ||
|
|
||
| ### Checking quarantine status | ||
|
|
||
| Tests that have already been quarantined are indicated in the health status. | ||
| This helps you avoid spending time investigating tests that are already being | ||
| managed through [Mitigation](/test-insights/mitigation). | ||
|
|
||
| ## Setup | ||
|
|
||
| Detection requires test metrics collection through repeated CI runs. See the | ||
| CI setup guides for your platform: | ||
|
|
||
| - [GitHub Actions setup](/ci-insights/setup/github-actions) | ||
| - [Jenkins setup](/ci-insights/setup/jenkins) |
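
The classification, confidence, and impact ideas described in this file can be illustrated with a small sketch. This is not Mergify's actual algorithm, which the documentation does not specify. The `RunHistory` type, the exponential `decay` weighting, the `broken_threshold` and `min_runs` values, and the failure-count definition of impact are all assumptions chosen for illustration. As a simplification, the sketch does not track commits: any mix of passes and failures below the threshold is treated as flaky.

```python
from dataclasses import dataclass


@dataclass
class RunHistory:
    """Hypothetical record of one test's results across CI runs."""
    name: str
    outcomes: list[bool]  # oldest to newest; True means the run passed


def weighted_failure_rate(outcomes: list[bool], decay: float = 0.5) -> float:
    """Failure rate in which each older run counts `decay` times the next newer one."""
    if not outcomes:
        return 0.0
    newest = len(outcomes) - 1
    weights = [decay ** (newest - i) for i in range(len(outcomes))]
    failed = sum(w for w, passed in zip(weights, outcomes) if not passed)
    return failed / sum(weights)


def classify(
    history: RunHistory,
    broken_threshold: float = 0.9,  # assumed cut-off, not a Mergify value
    min_runs: int = 10,             # assumed run count for high confidence
) -> tuple[str, str]:
    """Return (health status, confidence) for one test."""
    rate = weighted_failure_rate(history.outcomes)
    if rate == 0.0:
        status = "healthy"
    elif rate >= broken_threshold:
        status = "broken"
    else:
        status = "flaky"
    confidence = "high" if len(history.outcomes) >= min_runs else "low"
    return status, confidence


def impact(history: RunHistory) -> int:
    """Assumed impact measure: the number of failed executions."""
    return history.outcomes.count(False)


def worst_first(histories: list[RunHistory]) -> list[RunHistory]:
    """Keep only unhealthy tests and sort them by impact, highest first."""
    unhealthy = [h for h in histories if classify(h)[0] != "healthy"]
    return sorted(unhealthy, key=impact, reverse=True)
```

With these assumed defaults, a test that passed five times and then failed five times is classified as broken, because the recent failures dominate the weighted rate, while a test that alternates between passing and failing is classified as flaky.
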
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| --- | ||
| title: Mitigation | ||
| description: Quarantine problematic tests to unblock CI without losing visibility. | ||
| --- | ||
|
|
||
| When a flaky or broken test blocks CI, teams face a tough choice: fix it | ||
| immediately, delete it, or ignore it. Quarantine offers a better option. The | ||
| test keeps running, but its failures no longer block merges. You maintain full | ||
| visibility without disruption. | ||
|
|
||
| ## How quarantine works | ||
|
|
||
| A quarantined test still executes in your CI pipeline and its results are | ||
| still collected by Mergify. The difference is that failures are ignored for | ||
| merge decisions. | ||
|
|
||
| This means: | ||
|
|
||
| - Your CI stays green while you work on a fix | ||
|
|
||
| - Historical data is preserved, so you can track whether the test improves | ||
| or worsens over time | ||
|
|
||
| - Other team members can see the test is quarantined and why | ||
|
|
||
| Quarantine works on any branch, not just the default branch. | ||
|
|
||
| :::note | ||
| Quarantined tests must still be uploaded through one of the supported CI | ||
| integrations. See the | ||
| [test framework configuration](/ci-insights#test-framework-configuration) | ||
| for setup details. | ||
| ::: | ||
|
|
||
| ## Manual quarantine | ||
|
|
||
| You can manually add or remove specific tests from quarantine through the | ||
| Mergify dashboard. This is useful when you've identified a problematic test | ||
| through [Detection](/test-insights/detection) and want to stop it from | ||
| blocking your team while you investigate. | ||
|
|
||
| For technical details on how quarantine integrates with your CI pipeline, | ||
| see the [Quarantine documentation](/ci-insights/quarantine). | ||
|
|
||
| ## Auto-quarantine | ||
|
|
||
| Auto-quarantine lets Mergify automatically quarantine broken tests. When | ||
| enabled, tests that are classified as broken are quarantined without manual | ||
| intervention. | ||
|
|
||
| This is useful for teams that want hands-off management of consistently | ||
| failing tests. You can enable or disable auto-quarantine per repository | ||
| from the Mitigation page in the dashboard. | ||
|
|
||
| ## Practical workflows | ||
|
|
||
| ### Quarantining a test from Detection | ||
|
|
||
| When you identify a high-impact flaky or broken test in | ||
| [Detection](/test-insights/detection), you can quarantine it directly to | ||
| stop it from blocking merges while you work on a fix. | ||
|
|
||
| ### Reviewing quarantined tests | ||
|
|
||
| Periodically check the Mitigation page to review quarantined tests. Look | ||
| for tests whose health status has improved; these may be ready to be | ||
| removed from quarantine. | ||
|
|
||
| ### Enabling auto-quarantine | ||
|
|
||
| For repositories where broken tests frequently block CI, enable | ||
| auto-quarantine to let Mergify handle it automatically. This reduces manual | ||
| overhead and keeps your CI pipeline moving. | ||
|
|
||
| ## Setup | ||
|
|
||
| Mitigation uses the same CI integration as Detection. To ensure quarantine | ||
| works correctly, your CI must be configured to check quarantine status. See | ||
| the [Quarantine documentation](/ci-insights/quarantine) for technical setup | ||
| details. |
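
The quarantine behavior described in this file (failures of quarantined tests are ignored for merge decisions, and auto-quarantine adds broken tests without manual action) can be sketched as plain set logic. The function names and the data shapes below are hypothetical and only illustrate the rule; they are not part of any Mergify API.

```python
def blocks_merge(failed_tests: set[str], quarantined: set[str]) -> bool:
    """A merge is blocked only if some failing test is NOT quarantined."""
    return bool(failed_tests - quarantined)


def auto_quarantine(
    statuses: dict[str, str],  # test name -> "healthy" | "flaky" | "broken"
    quarantined: set[str],
    enabled: bool,
) -> set[str]:
    """Return the new quarantine set; only broken tests are added automatically."""
    if not enabled:
        return set(quarantined)
    broken = {name for name, status in statuses.items() if status == "broken"}
    return quarantined | broken
```

Note that the quarantined test still appears in `failed_tests`: it ran and its result was recorded, which is what preserves the historical data mentioned above.
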
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| --- | ||
| title: Prevention | ||
| description: Catch flaky and broken tests on pull requests before they reach your codebase. | ||
| --- | ||
|
|
||
| Prevention monitors tests introduced or modified in pull requests. By | ||
| rerunning tests on PRs, it detects flaky behavior before code merges, keeping | ||
| your codebase reliable. | ||
|
|
||
| ## How it works | ||
|
|
||
| When a pull request runs tests, Mergify reruns them to check for consistency. | ||
| Tests that produce different results on the same commit are flagged as flaky. | ||
| This happens transparently as part of your existing CI pipeline, with no changes | ||
| to your test code needed. | ||
|
|
||
| Flagging flaky tests on a PR keeps them from silently degrading your test | ||
| suite. You can review their health status before deciding to merge. | ||
|
|
||
| ## What you can track | ||
|
|
||
| Prevention provides key metrics to help you understand test reliability | ||
| on pull requests: | ||
|
|
||
| ### Caught flaky tests | ||
|
|
||
| The number of flaky tests detected during PR reruns. This is the core value | ||
| of Prevention: every caught test is a reliability problem that didn't make it | ||
| into your codebase. | ||
|
|
||
| ### New tests | ||
|
|
||
| Tests being introduced on PRs, along with their health status. Each new test | ||
| is classified as healthy, flaky, or broken based on its rerun results. This | ||
| helps you spot unreliable tests before they're merged. | ||
|
|
||
| ### CI budget spent | ||
|
|
||
| The total CI time spent on reruns. This metric helps teams understand the | ||
| cost of flaky test prevention and make informed trade-offs between | ||
| thoroughness and CI budget. | ||
|
|
||
| ## Practical workflows | ||
|
|
||
| ### Reviewing tests before merging | ||
|
|
||
| When a PR introduces or modifies tests, check the Prevention page to see | ||
| their health status. Tests with a flaky or broken status should be | ||
| investigated before merging. | ||
|
|
||
| ### Filtering by pull request state | ||
|
|
||
| Use the pull request state filter to focus on specific PRs: | ||
|
|
||
| - **Open**: Tests on PRs still in review | ||
| - **Merged**: Tests on PRs that have already been merged | ||
| - **Closed**: Tests on PRs that were closed without merging | ||
|
|
||
| ### Understanding confidence on new tests | ||
|
|
||
| New tests have limited run data, so their confidence level may be low. Low | ||
| confidence means the health status could change as more data is collected. | ||
| Consider waiting for more runs before drawing conclusions about a test's | ||
| reliability. | ||
|
|
||
| ## Setup | ||
|
|
||
| Prevention requires test framework plugins that instrument test runs to track | ||
| flakiness on pull requests. | ||
|
|
||
| See the [test framework configuration](/ci-insights#test-framework-configuration) | ||
| for setup instructions specific to your framework (pytest-mergify, | ||
| rspec-mergify, etc.). |
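
The rerun idea behind Prevention can be sketched in a few lines: run the same test several times on the same code and compare the results. This is only an illustration of the principle, not how pytest-mergify or the other plugins are implemented. The `rerun_status` helper and its `reruns` parameter are hypothetical.

```python
from typing import Callable


def rerun_status(test: Callable[[], None], reruns: int = 5) -> str:
    """Run `test` several times on unchanged code and classify the outcome."""
    if reruns < 1:
        raise ValueError("reruns must be at least 1")
    outcomes = []
    for _ in range(reruns):
        try:
            test()
            outcomes.append(True)
        except Exception:  # any raised exception counts as a failed run
            outcomes.append(False)
    if all(outcomes):
        return "healthy"
    if not any(outcomes):
        return "broken"
    return "flaky"  # results differ even though the code did not change
```

More reruns make flaky behavior more likely to show up but cost more CI time, which is the trade-off the "CI budget spent" metric is meant to expose.
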
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| --- | ||
| title: Test Insights | ||
| description: Monitor, detect, and manage unreliable tests across your repositories. | ||
| --- | ||
|
|
||
| Test Insights helps you manage test reliability across the full lifecycle. | ||
| It catches flaky tests on pull requests before they merge, surfaces unhealthy | ||
| tests across your repositories, and lets you quarantine problematic tests so | ||
| they don't block your CI pipeline. | ||
|
|
||
| ## How it works | ||
|
|
||
| Test Insights is organized into three phases that follow the natural lifecycle | ||
| of a test reliability problem: | ||
|
|
||
| 1. **[Prevention](/test-insights/prevention)**: Catch flaky and broken tests | ||
| on pull requests before they reach your codebase. Mergify reruns tests on | ||
| PRs to detect inconsistent behavior early. | ||
|
|
||
| 2. **[Detection](/test-insights/detection)**: Identify and prioritize | ||
| unhealthy tests across your repositories. See which tests are flaky or | ||
| broken, and focus on the ones with the most impact. | ||
|
|
||
| 3. **[Mitigation](/test-insights/mitigation)**: Quarantine problematic tests | ||
| to unblock CI without removing them. Tests keep running, but their failures | ||
| no longer block merges. | ||
|
|
||
| ## Key concepts | ||
|
|
||
| - **Flaky test**: A test that produces different results on the same commit, | ||
| for example passing on one run and failing on the next with identical code. | ||
|
|
||
| - **Broken test**: A test that fails consistently, with recent runs weighted | ||
| more heavily. | ||
|
|
||
| - **Health status**: A test's reliability classification (healthy, flaky, or | ||
| broken), based on results from multiple CI runs. | ||
|
|
||
| - **Confidence**: How much data is available to assess a test's health. Low | ||
| confidence means the status could still change significantly as more runs | ||
| are collected. | ||
|
|
||
| - **Quarantine**: Isolating a test so its failures are ignored for merge | ||
| decisions. The test still runs and results are still collected, preserving | ||
| full visibility. | ||
|
|
||
| ## Setup | ||
|
|
||
| Test Insights is powered by the same CI integration as | ||
| [CI Insights](/ci-insights). To get started, configure your CI system and test | ||
| framework: | ||
|
|
||
| - [GitHub Actions setup](/ci-insights/setup/github-actions) | ||
| - [Jenkins setup](/ci-insights/setup/jenkins) | ||
| - [Test framework configuration](/ci-insights#test-framework-configuration) | ||