Skip to content

ci(pr): classify PR test failures as new vs. known and post sticky comment#1194

Closed
ramakrishnap-nv wants to merge 21 commits into
release/26.06from
ci/pr-test-classification-comment
Closed

ci(pr): classify PR test failures as new vs. known and post sticky comment#1194
ramakrishnap-nv wants to merge 21 commits into
release/26.06from
ci/pr-test-classification-comment

Conversation

@ramakrishnap-nv

@ramakrishnap-nv ramakrishnap-nv commented May 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

This is to handle flaky CI where failures might not be part of the PR and it will categorized as known failures so developers can admin merge PRs in urgent situations.

On every PR run, classify each test failure against the target branch's nightly history and post a single sticky comment that splits failures into:

  • CRASHES (signal-death) — surfaced in a red [!CAUTION] callout with the full diagnostic.
  • NEW — failures this PR likely introduced.
  • KNOWN — recurring on nightly, known flaky on nightly, or flaked in this run.

…mment

Cross-reference each PR test failure against the target branch's nightly
failure history and post a single sticky comment that splits failures
into NEW (introduced by this PR) and KNOWN (recurring on nightly, known
flaky on nightly, or flaked in this run via pytest-rerunfailures).

- Add `--mode pr` to nightly_report.py: read history without writing
  back, annotate each failure with `pr_classification`.
- Branch nightly_report_helper.sh on RAPIDS_BUILD_TYPE so PR runs read
  the target branch's history and write per-matrix summaries to a
  PR-scoped S3 prefix (`ci_test_reports/pr/run-{run_id}/`).
- Extract shared aggregator helpers into aggregate_common.py
  (download/load/aggregate/escape) so the PR aggregator can reuse them
  without behavioural drift on nightly.
- Add aggregate_pr.py to build the Markdown comment body and ci/pr_summary.sh
  to post or update the sticky comment via the GitHub API.
- Wire a new `pr-test-summary` job into pr.yaml: gated on `if: always()`,
  permissions `pull-requests: write`, runs after every PR test job.
  Not added to pr-builder needs — the comment is informational and must
  not block merging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@copy-pr-bot

copy-pr-bot Bot commented May 8, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ramakrishnap-nv ramakrishnap-nv self-assigned this May 8, 2026
@ramakrishnap-nv ramakrishnap-nv added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels May 8, 2026
ramakrishnap-nv and others added 2 commits May 8, 2026 15:47
Mirror the nightly-summary pattern: keep job-level wiring (needs/if) in
pr.yaml and move the implementation into its own reusable workflow.
Use explicit secret pass-through (matching how test.yaml calls
nightly-summary.yaml) instead of `secrets: inherit`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shell script had three blocks of inline Python plus a bash-specific
${VAR@Q} quoting trick.  Move all GitHub API interactions into
ci/utils/pr_comment_helper.py with two CLI subcommands:

  - base-ref: resolve the PR's target branch
  - post:     post or update the sticky comment by hidden marker

Stdlib only (urllib + json) so it runs in slim CI containers.  The
hidden marker now lives in pr_comment_helper.COMMENT_MARKER as the
single source of truth, imported by aggregate_pr.py.

ci/pr_summary.sh shrinks from ~120 lines of mixed shell+Python to ~70
lines of straight orchestration.  pr_test_summary.yaml's base-ref
lookup goes from a curl+inline-Python pipe to a one-liner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test c4876d2

ramakrishnap-nv and others added 2 commits May 11, 2026 10:42
rapids-check-pr-job-dependencies (run by the shared checks workflow)
fails any PR job not listed in pr-builder.needs.  pr-test-summary is
informational and intentionally kept out of pr-builder's needs, so add
it to ignored_pr_jobs.

Without this the dependency check fails, the subsequent changed-files
step is skipped, and the next step throws "Error reading JToken from
JsonReader" trying to fromJSON the empty output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
python:3.14-slim ships with dash as /bin/sh and GitHub Actions picks it
for `run:` steps in this container.  Dash doesn't understand `[[ ... ]]`
or `set -o pipefail`, so the inline script failed with
`/__w/_temp/X.sh: 4: [[: not found`.

Add `defaults.run.shell: bash` at the workflow level, matching the
existing convention in self_hosted_service_test.yaml and
build_test_publish_images.yaml.  Bash is already available in the
container — only the shell selection needed forcing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test 5b02581

DO NOT MERGE.  Exists only to confirm that pr-test-summary picks up a
new failure end to end and posts it under "NEW failures" in the sticky
PR comment.  Remove this file before merging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test 49da483

@github-actions

github-actions Bot commented May 11, 2026

Copy link
Copy Markdown

PR Test Classification

1 CRASH(es)

Compared against nightly history for main · PR head: fc35b3688f83 · Run date: 2026-05-14 · Workflow run

Per-matrix status
Test type Matrix Status Passed Failed Flaky Skipped
cpp cuda12.2.2-aarch64 PASS 533 0 0 3
cpp cuda12.2.2-x86_64 PASS 533 0 0 3
cpp cuda12.9.1-x86_64 PASS 533 0 0 3
cpp cuda13.0.2-aarch64 PASS 533 0 0 3
cpp cuda13.0.2-x86_64 PASS 533 0 0 3
cpp cuda13.1.1-aarch64 PASS 533 0 0 3
cpp cuda13.1.1-x86_64 PASS 533 0 0 3
python cuda12.2.2-py3.11-x86_64 PASS 208 0 0 10
python cuda12.2.2-py3.12-x86_64 PASS 208 0 0 10
python cuda12.9.1-py3.14-x86_64 PASS 208 0 0 10
python cuda13.0.2-py3.12-aarch64 PASS 207 0 0 11
python cuda13.0.2-py3.12-x86_64 PASS 208 0 0 10
python cuda13.1.1-py3.13-x86_64 PASS 208 0 0 10
python cuda13.1.1-py3.14-aarch64 PASS 207 0 0 11
python cuda13.1.1-py3.14-x86_64 PASS 208 0 0 10
wheel-python cuda12.2.2-py3.11-aarch64 NEW FAIL 0 1 0 0
wheel-python cuda12.9.1-py3.11-x86_64 PASS 114 0 0 3
wheel-python cuda12.9.1-py3.14-x86_64 PASS 114 0 0 3
wheel-python cuda13.0.2-py3.12-aarch64 PASS 114 0 0 3
wheel-python cuda13.0.2-py3.12-x86_64 PASS 114 0 0 3
wheel-python cuda13.1.1-py3.13-x86_64 PASS 114 0 0 3
wheel-python cuda13.1.1-py3.14-aarch64 PASS 114 0 0 3
wheel-python cuda13.1.1-py3.14-x86_64 PASS 114 0 0 3
wheel-server cuda12.2.2-py3.11-aarch64 PASS 93 0 0 8
wheel-server cuda12.9.1-py3.11-x86_64 PASS 94 0 0 7
wheel-server cuda12.9.1-py3.14-x86_64 PASS 94 0 0 7
wheel-server cuda13.0.2-py3.12-aarch64 PASS 93 0 0 8
wheel-server cuda13.0.2-py3.12-x86_64 PASS 94 0 0 7
wheel-server cuda13.1.1-py3.13-x86_64 PASS 94 0 0 7
wheel-server cuda13.1.1-py3.14-aarch64 PASS 93 0 0 8
wheel-server cuda13.1.1-py3.14-x86_64 PASS 94 0 0 7

Caution

CRASHES detected — a test process was terminated by a signal mid-run.
These need urgent investigation. The JUnit XML was not finalized, so the specific test that triggered the crash may not be identified; check the workflow run log for the last test invoked before the signal.

1 crash — click to expand details

pytest-cuoptPROCESS_CRASH [wheel-python / cuda12.2.2-py3.11-aarch64] — NEW

pytest process terminated by SIGABRT mid-run. The JUnit XML was not finalized; the test that triggered the crash is unknown — inspect the run log for the last test invoked.

Classification compares each failure against the most recent nightly history for the target branch. Tests passed on retry via pytest-rerunfailures are reported as flaky.

@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test 0f8275a

@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test b622786

1 similar comment
@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test b622786

A reusable workflow can only request permissions the caller already has.
The repo's default GITHUB_TOKEN permissions don't include
pull-requests:write, so the caller in pr.yaml must grant it explicitly
for the comment poster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test 062e063

The assert-failure test already verifies the basic NEW-failure flow.
Add a second test that sends SIGSEGV to pytest mid-run so the
crash-marker path (write_pytest_crash_marker, added in #1191) writes
a PROCESS_CRASH entry that the classifier surfaces under NEW failures
with a SIGSEGV message.

Named test_zz_* so it runs after the assert test in definition order
and the assertion failure still lands in the partial JUnit XML before
pytest is killed.

DO NOT MERGE.  Remove both smoke tests before merging PR #1194.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test c4dfdc8

Crashes (PROCESS_CRASH entries written by PR #1191's
write_pytest_crash_marker, or any failure whose message matches
"crashed with SIG*") now render in a dedicated GitHub `[!CAUTION]`
alert block at the top of the comment.  The full crash diagnostic
goes in a fenced code block — no more half-sentence truncation when
the message gets cut at 140 chars.

Bumps the table-cell short_msg cap from 140 to 300 chars so ordinary
failure messages also stop ending mid-sentence.

Crashes are split out of the NEW and KNOWN buckets before those
sections render, so the CAUTION block is the single source of truth
for crash entries (no duplication).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test 6984bc1

Multi-line docstring closing quotes must be on their own line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test d0e0957

ramakrishnap-nv and others added 2 commits May 13, 2026 10:30
Verify the PR comment shows the CAUTION block on exactly one matrix
entry (cuda*-py3.11-x86_64) while other Python versions stay green.
Uses pytest.mark.skipif on sys.version_info so non-3.11 runs skip
the crash without affecting the assertion-failure smoke test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep the CAUTION block as the headline; move per-crash heading and
the full diagnostic into a <details> tab labelled "N crash(es) —
click to expand details".  Same content, click-to-reveal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

/ok to test 868f6b8

…r bug

Bug fix in classify_pr_against_history (nightly_report.py):
Check `is_flaky` before `status == "active"` for hard failures.  A test
record marked both active and flaky was previously bucketed only as
`known_recurring`, never surfacing `known_flaky_nightly`.  The
flaky-in-PR-run loop already had this precedence; align the hard-fail
loop to match.

Documentation/type-hint review fixes:
- aggregate_common.py: type hints + Google-style Args/Returns/Raises
  docstrings on download_summaries, load_local_summaries,
  aggregate_summaries, html_escape.
- aggregate_pr.py: same on build_comment_body and main().
- nightly_report.py: same on classify_pr_against_history.
- pr_comment_helper.py: same on resolve_base_ref,
  find_existing_comment_id, post_or_update_comment, main().

All files now `from __future__ import annotations` so the modern
``list[dict[str, Any]]`` / ``int | None`` syntax works without
needing typing.List/Dict/Optional.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci/utils/pr_comment_helper.py`:
- Around line 188-190: The CLI currently accepts free-form --repo and
non-positive --pr; update the _add_common_args helper to validate inputs up
front: enforce that --repo matches the owner/name pattern (e.g., two non-empty
segments separated by a single slash using a simple regex) and enforce that --pr
is an integer > 0 (use an argparse type or validator that raises
argparse.ArgumentTypeError on invalid input). Modify the argument definitions
for "--repo" and "--pr" in _add_common_args to call these validators so the
parser fails fast with clear errors rather than constructing malformed GitHub
API URLs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9ad776a2-52d3-4c3e-a3fd-2ef64b2dfa12

📥 Commits

Reviewing files that changed from the base of the PR and between fc35b36 and af327d4.

📒 Files selected for processing (4)
  • ci/utils/aggregate_common.py
  • ci/utils/aggregate_pr.py
  • ci/utils/nightly_report.py
  • ci/utils/pr_comment_helper.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • ci/utils/aggregate_common.py
  • ci/utils/aggregate_pr.py
  • ci/utils/nightly_report.py

Comment thread ci/utils/pr_comment_helper.py Outdated
Comment on lines +188 to +190
def _add_common_args(sp):
sp.add_argument("--repo", required=True, help="owner/name")
sp.add_argument("--pr", required=True, type=int, help="PR number")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate CLI boundary inputs before constructing GitHub API URLs.

--repo is currently free-form and --pr allows non-positive values. Add strict owner/name validation and enforce PR > 0 to fail fast with clear errors instead of malformed API calls.

Suggested patch
@@
 import argparse
 import json
 import os
+import re
 import sys
 from urllib import error, request
@@
 COMMENT_MARKER = "<!-- pr-test-classification -->"
+REPO_SLUG_RE = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")
@@
+def _positive_int(value: str) -> int:
+    pr_number = int(value)
+    if pr_number <= 0:
+        raise argparse.ArgumentTypeError("PR number must be > 0")
+    return pr_number
+
+
 def _add_common_args(sp):
     sp.add_argument("--repo", required=True, help="owner/name")
-    sp.add_argument("--pr", required=True, type=int, help="PR number")
+    sp.add_argument("--pr", required=True, type=_positive_int, help="PR number (> 0)")
@@
     args = p.parse_args()
+    if not REPO_SLUG_RE.fullmatch(args.repo):
+        print("ERROR: --repo must be in 'owner/name' format.", file=sys.stderr)
+        return 2
     if not args.token:
         print("ERROR: --token or $GITHUB_TOKEN must be set.", file=sys.stderr)
         return 2

As per coding guidelines, **/*.{cpp,cuh,cu,hpp,py}: Flag missing input validation at library and server boundaries.

Also applies to: 233-235

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci/utils/pr_comment_helper.py` around lines 188 - 190, The CLI currently
accepts free-form --repo and non-positive --pr; update the _add_common_args
helper to validate inputs up front: enforce that --repo matches the owner/name
pattern (e.g., two non-empty segments separated by a single slash using a simple
regex) and enforce that --pr is an integer > 0 (use an argparse type or
validator that raises argparse.ArgumentTypeError on invalid input). Modify the
argument definitions for "--repo" and "--pr" in _add_common_args to call these
validators so the parser fails fast with clear errors rather than constructing
malformed GitHub API URLs.

ramakrishnap-nv and others added 4 commits May 14, 2026 11:45
Address jameslamb's review on PR #1194:

- pr.yaml: drop the now-self-explanatory comment block above the
  pr-test-summary `permissions:` block.
- pr_test_summary.yaml:
    * remove `defaults.run.shell: bash` — GHA's default already is
      bash -e {0}, so the override was redundant.
    * split `apt-get update && apt-get install` onto two lines.
    * drop the unused GITHUB_SERVER_URL env (now that pr_summary.sh
      hard-codes https://github.com).
- pr_summary.sh: rewrite to be strict.
    * all required env vars validated up front via `: "${VAR:?}"`
      assertions — including GITHUB_TOKEN, which now fails the job
      loudly before aggregate_pr.py runs instead of after.
    * remove every `:-default` fallback (CUOPT_AWS_*, GITHUB_BASE_REF,
      GITHUB_SHA, GITHUB_SERVER_URL).
    * hard-code https://github.com instead of templating
      GITHUB_SERVER_URL — this code is GitHub-specific and there is
      no real use case for overriding the server URL.
    * remove the silent "WARNING; exit 0" fallbacks for missing
      CUOPT_S3_URI / GITHUB_RUN_ID — the strict assertions cover them
      and the job should fail loudly when misconfigured.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continue the simplification:

aggregate_pr.py:
- `--target-branch`, `--sha`, `--github-run-url` are now required.  No
  more env-var fallbacks in argparse defaults — the shell script passes
  every value explicitly, and the env-var fallback was a second hidden
  source of configuration.
- `--s3-pr-summaries-prefix` / `--local-summaries-dir` go into an
  argparse mutually-exclusive group with `required=True`, removing the
  manual ``return 2`` validation block.
- Keep static defaults that are not env-fallbacks: `--output-dir`,
  `--run-date` (defaults to today UTC).

pr_comment_helper.py:
- Drop the `--token` CLI flag entirely; read `GITHUB_TOKEN` from env
  directly with a strict early-fail when missing.  Removes the
  dual-source pattern (CLI default + env fallback).
- Drop the `--marker` CLI flag.  `COMMENT_MARKER` is the only valid
  value — make it the single source of truth.  Public functions still
  accept a `marker` kwarg (default `COMMENT_MARKER`) for unit testing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove section-divider banners and what-not-why comments now that the
function docstrings and well-named identifiers carry the same
information.  Keep all WHY comments (precedence reasoning in the
classifier, constant-value rationale, the CUOPT_AWS_*-to-AWS_*
mapping note).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fallback exists because GHA leaves `github.base_ref` empty for the
`push` events the PR workflow triggers on, and the shared rapidsai
test workflows don't propagate a target branch into the test
container.  Inline comment + a follow-up note in the PR description
to centralize PR classification in pr-test-summary (which already
resolves the base ref via the API) so the fallback can be removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@jameslamb jameslamb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes and for re-writing the description, it's much clearer to me now what the goal is.

Now that I can see what it is, I don't support this.

I'm not convinced that this can work well (for example... we run different test matrices in nightlies vs. PRs, won't that confuse the "known issue" logic?), or that it's worth the costs it imposes which include:

  • non-trivial new complexity and maintenance burden
  • expanded GitHub Actions workflow permissions
  • new cloud costs (API calls + storage in S3)
  • new risk of hitting GitHub API limits (which could block all other operations in the repo, like artifact uploading / downloading)
  • a huge amount of new CI code (more code = larger changesets = easier for malicious code to sneak past human review)

HOWEVER... I'm leaving a non-blocking review. I am not an active maintainer of cuOpt. If you weigh all of those costs and still think the benefits outweigh them, and get another cuOpt maintainer in cuopt-ci-codeowners like @tmckayus to approve, then I won't stand in the way.

Comment thread ci/utils/aggregate_pr.py
flaked_in_pr_run = []

for entry in recurring:
cls = entry.get("pr_classification", "")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really struggling to follow this code. Using dataclasses to represent the JSON data would be clearer and stricter.

@jameslamb jameslamb dismissed their stale review May 14, 2026 19:57

dismissing my previous blocking review.

@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

Thanks for the changes and for re-writing the description, it's much clearer to me now what the goal is.

Now that I can see what it is, I don't support this.

I'm not convinced that this can work well (for example... we run different test matrices in nightlies vs. PRs, won't that confuse the "known issue" logic?), or that it's worth the costs it imposes which include:

  • non-trivial new complexity and maintenance burden
  • expanded GitHub Actions workflow permissions
  • new cloud costs (API calls + storage in S3)
  • new risk of hitting GitHub API limits (which could block all other operations in the repo, like artifact uploading / downloading)
  • a huge amount of new CI code (more code = larger changesets = easier for malicious code to sneak past human review)

HOWEVER... I'm leaving a non-blocking review. I am not an active maintainer of cuOpt. If you weigh all of those costs and still think the benefits outweigh them, and get another cuOpt maintainer in cuopt-ci-codeowners like @tmckayus to approve, then I won't stand in the way.

Hmm, I agree with nightly vs PR test suite testing on different matrices, but most of the times, the tests are failing similarly in both, so the corner case seems acceptable.

But I my major concern would be the new failure points getting added due to these which might trigger other issues.

Currently we have nightly dashboard which showcases all the failures and may be users can manually cross reference when needed.

@mlubin @chris-maes @rg20 what are your thoughts ?

@gforsyth

Copy link
Copy Markdown
Contributor

I would be concerned about the ability to distinguish between "this common failure is because this test is flaky" and "this seemingly common failure is because of a bug that's just been introduced".

Perfectly deterministic tests are hard, and I can imagine optimization problems make them even harder -- but if there are tests that become routine to ignore, then those tests no longer provide a useful signal, or they potentially mask a useful signal.

I would suggest that if a test flakes out more than 50% of the time, that is a broken test -- it should be fixed, or made more flexible (by, e.g. loosening comparison tolerances), or removed entirely.

@mlubin

mlubin commented May 14, 2026

Copy link
Copy Markdown
Contributor

I'd like to put aside the workflow/procedural discussion of if or when we should decide to merge PRs despite test failures, that's not being changed by this PR and is probably better to cover on a call.

The team is excited about the reporting aspect because test failures are common, and having to dig through the CI logs on each occasion is a waste of developer time. The tie in with historical data on if this is a new or pre-existing or flaky failure is also useful and saves developer time. However, maybe this is too much for a single PR and we could split this into more digestible chunks.

@jameslamb how many of your concerns would still apply to a bot that narrowly reads the PR's CI artifacts and produces a comment summarizing which tests failed/crashed?

@jameslamb

Copy link
Copy Markdown
Member

how many of your concerns would still apply to a bot that narrowly reads the PR's CI artifacts and produces a comment summarizing which tests failed/crashed?

That would eliminate the need for S3, which'd remove the concerns about cost and remove the need for this new workflow to have access to AWS credentials.

The other concerns would remain.

I would not support something like this RAPIDS-wide, I think the costs/risks outweigh the benefits and I agree with @gforsyth 's comments.

You all (cuOpt developers) can decide how you want to proceed here... I am not blocking the PR and you don't have to convince me. I'm going to remove myself from this PR.

@jameslamb jameslamb requested review from jameslamb and tmckayus and removed request for jameslamb May 15, 2026 14:16
rapids-bot Bot pushed a commit that referenced this pull request May 15, 2026
…1224)

Updates `skills/cuopt-developer` to capture the reviewer feedback from #1194:

- Expand the PR-description rule with an explicit "don't include" list (how-it-works walkthroughs, file tables, exhaustive test-plan checklists).
- Add a new "Editing CI scripts and workflows" section: prefer extending existing scripts, don't restate framework defaults, no fallback values for required inputs, hard-code GitHub URLs, validate early, split chained bash commands.

Goal is to keep future agent-authored PRs that touch `ci/` or `.github/workflows/` from generating the same review round-trips.

Authors:
  - Ramakrishnap (https://github.com/rgsl888prabhu)

Approvers:
  - Trevor McKay (https://github.com/tmckayus)
  - Miles Lubin (https://github.com/mlubin)

URL: #1224
chris-maes pushed a commit to chris-maes/cuopt that referenced this pull request May 18, 2026
…VIDIA#1224)

Updates `skills/cuopt-developer` to capture the reviewer feedback from NVIDIA#1194:

- Expand the PR-description rule with an explicit "don't include" list (how-it-works walkthroughs, file tables, exhaustive test-plan checklists).
- Add a new "Editing CI scripts and workflows" section: prefer extending existing scripts, don't restate framework defaults, no fallback values for required inputs, hard-code GitHub URLs, validate early, split chained bash commands.

Goal is to keep future agent-authored PRs that touch `ci/` or `.github/workflows/` from generating the same review round-trips.

Authors:
  - Ramakrishnap (https://github.com/rgsl888prabhu)

Approvers:
  - Trevor McKay (https://github.com/tmckayus)
  - Miles Lubin (https://github.com/mlubin)

URL: NVIDIA#1224
@anandhkb anandhkb added this to the 26.08 milestone May 19, 2026
@ramakrishnap-nv ramakrishnap-nv changed the base branch from main to release/26.06 May 20, 2026 17:43
@tmckayus

Copy link
Copy Markdown
Contributor

I like the idea of running a bot outside of github to pull the logs/artifacts and evaluate, and even post back to the PR with conclusions against a database somewhere. Such a bot could even be triggered by a webhook. The nightly runs are already logging information about tests, so that could remain. We could grow the capabilities of an external bot over time, and it remains credential issues etc from github.

As for ignoring flakes, that's a current and different problem. We have some flaky tests now and many times, under time pressure, we just re-run til we get a pass. We need to be more intentional about fixing or scrapping those tests, regardless of the tool.

@github-actions

Copy link
Copy Markdown

🔔 Hi @anandhkb @rgsl888prabhu, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

1 similar comment
@github-actions

github-actions Bot commented Jun 6, 2026

Copy link
Copy Markdown

🔔 Hi @anandhkb @rgsl888prabhu, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

@github-actions

Copy link
Copy Markdown

🔔 Hi @anandhkb @rgsl888prabhu, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

1 similar comment
@github-actions

Copy link
Copy Markdown

🔔 Hi @anandhkb @rgsl888prabhu, this pull request has had no activity for 7 days. Please update or let us know if it can be closed. Thank you!

If this is an "epic" issue, then please add the "epic" label to this issue.
If it is a PR and not ready for review, then please convert this to draft.
If you just want to switch off this notification, then use the "skip inactivity reminder" label.

@ramakrishnap-nv

Copy link
Copy Markdown
Collaborator Author

closing this on behalf of #1412

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants