chore: Add workflow to add area labels to PRs #511
Signed-off-by: Kendrick Boyd <kendrickb@nvidia.com>
Walkthrough
This PR adds automated PR labeling to the repository using GitHub Actions. It introduces a labeler configuration file that maps area labels to changed-file patterns, a workflow that triggers the labeler on PR events, and updates documentation to explain the auto-labeling behavior.
Changes
PR Labeling Automation
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Greptile Summary
This PR adds a GitHub Actions workflow that automatically applies
Confidence Score: 4/5
Safe to merge; the workflow is correctly scoped and the only concern is an unpinned action tag.
The `pull_request_target` workflow is well-constructed: no untrusted code checkout, minimal permissions, and additive-only labels. The one thing worth fixing before or shortly after merge is that `actions/labeler@v6` is referenced by a mutable tag rather than a commit SHA, which leaves the workflow open to supply-chain tag movements.
`.github/workflows/labeler.yml`: pin the labeler action to a commit SHA.
| Filename | Overview |
|---|---|
| .github/labeler.yml | New labeler config using actions/labeler@v6 syntax; glob mappings cover all documented product and infrastructure areas; no issues found. |
| .github/workflows/labeler.yml | New workflow using pull_request_target safely (no code checkout, minimal permissions); action pinned to a mutable v6 tag rather than a commit SHA. |
| CONTRIBUTING.md | Docs updated to reflect auto-labeling behavior; clear, accurate, and consistent with the new workflow. |
Sequence Diagram
sequenceDiagram
participant Dev as PR Author
participant GH as GitHub
participant WT as Workflow (pull_request_target)
participant LA as actions/labeler@v6
participant Cfg as .github/labeler.yml
Dev->>GH: Open / push to PR
GH->>WT: Trigger pull_request_target event
WT->>Cfg: Read labeler config (base branch)
WT->>LA: Run labeler action
LA->>GH: Evaluate changed files against glob rules
GH-->>LA: List of matched area labels
LA->>GH: Apply matched labels (additive only)
GH-->>Dev: "PR updated with area:* labels"
Reviews (1): Last reviewed commit: "chore: Add workflow to add area labels t..."
name: Apply area labels
runs-on: ubuntu-latest
steps:
  - uses: actions/labeler@v6
The `actions/labeler` action is referenced by a mutable tag (`v6`). If the tag is ever force-pushed, whether by the upstream maintainer or through a supply-chain compromise, the new code runs with `pull-requests: write` access on every PR event. Pinning to an immutable commit SHA guarantees you always run exactly the code you reviewed.
- - uses: actions/labeler@v6
+ - uses: actions/labeler@ac9175f8a1f3729cd2a7092fd2cf5f7156284e6c # v6.3.1
@binaryaaron @mckornfield Seems like a good idea. Should we generally be pinning actions to SHAs?
I'm also confused about where Greptile found v6.3.1; https://github.com/actions/labeler/releases only goes up to 6.1.0.
Agent suggests pinning all the actions in one go, as having one action pinned but a bunch of others unpinned isn't that helpful.
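One way to check which commit a given tag actually points to is to read the remote's tag list directly rather than the releases page. The sketch below is a minimal, hypothetical helper (the name `sha_for_tag` is not from this PR) that filters `git ls-remote --tags` output for one tag and prints its commit SHA, preferring the peeled `^{}` entry that annotated tags carry.

```shell
#!/bin/bash
# Hypothetical helper: print the commit SHA a tag points to, reading
# `git ls-remote --tags` output on stdin. The peeled entry
# (refs/tags/<tag>^{}) is preferred because it is the commit behind an
# annotated tag; otherwise the direct ref is used.
sha_for_tag() {
  local tag="$1"
  awk -v ref="refs/tags/${tag}" '
    $2 == (ref "^{}") { peeled = $1 }
    $2 == ref         { direct = $1 }
    END {
      if (peeled != "")      print peeled
      else if (direct != "") print direct
      else                   exit 1   # tag not found
    }
  '
}

# Usage (needs network access):
#   git ls-remote --tags https://github.com/actions/labeler | sha_for_tag v6
```

The printed SHA is what would go after the `@` in a pinned `uses:` line, with the tag kept as a trailing comment for readability.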
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2156e8b5-5d45-4225-b4c0-e28013290fc5
📒 Files selected for processing (3)
.github/labeler.yml
.github/workflows/labeler.yml
CONTRIBUTING.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: conventional-commit / semantic-pull-request
- GitHub Check: conventional-commit / semantic-pull-request
- GitHub Check: Unit Tests (3.11)
- GitHub Check: Unit Tests (3.13)
- GitHub Check: Unit Tests (3.12)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{md,markdown,py}
📄 CodeRabbit inference engine (.cursor/rules/agent-markdown-style.mdc)
**/*.{md,markdown,py}: Avoid decorative bold (**text**) in list items, body text, and docstrings; use structural cues (headers, list markers, colons, backticks) for emphasis instead
Use backticks for code identifiers, paths, and CLI commands in markdown and docstrings
Files:
CONTRIBUTING.md
**/*.{md,markdown}
📄 CodeRabbit inference engine (.cursor/rules/agent-markdown-style.mdc)
**/*.{md,markdown}: Bold is acceptable only in markdown tables where it's the conventional way to mark header-like cells in the body
Use `##` headers to segment markdown sections instead of bold text
Use `--` (em-dash) instead of `-` (hyphen) for asides in markdown
Files:
CONTRIBUTING.md
**/*.md
📄 CodeRabbit inference engine (STYLE_GUIDE.md)
**/*.md: No decorative `**bold**` in body text, list items, or docstrings. Use headers, list markers, colons, and backticks for structure.
Use `--` (em-dash) for asides, not `-` (hyphen).
Use single backticks for code identifiers, paths, and CLI commands in Markdown.
Use Mermaid diagrams with no spaces in node IDs, quote labels with special characters, no explicit colors or styles.
Include SPDX copyright header in Markdown files using HTML comments: `<!-- SPDX-FileCopyrightText: ... -->` and `<!-- SPDX-License-Identifier: Apache-2.0 -->`. Exception: for `.md` files with YAML frontmatter, include hash-comment headers inside the frontmatter block.
Files:
CONTRIBUTING.md
**/*
📄 CodeRabbit inference engine (STYLE_GUIDE.md)
**/*: Include a newline at the end of all files, never trailing whitespace. This is enforced by `pre-commit`.
Use line length of 120 characters for code, comments, and docstrings (configured in `ruff.toml`).
Files:
CONTRIBUTING.md
⚙️ CodeRabbit configuration file
**/*: Review as a senior maintainer for NeMo Safe Synthesizer. Prioritize issues that can change behavior, break user workflows, weaken privacy guarantees, hide failures, make tests unreliable, or create maintenance risk. Avoid generic style commentary unless it points to a concrete project convention that automated tools will not catch.
Comment only when the finding is actionable and tied to changed code. For each finding, state the impact, the condition that triggers it, and the smallest practical fix. Prefer one precise comment over broad advice. Do not ask for refactors outside the PR scope unless the changed code creates the problem.
Review type guidance:
- Potential issue: use for correctness bugs, data loss, privacy leaks, security risks, broken public APIs, invalid config behavior, missing validation, hidden failures, nondeterministic tests, or CI breakage.
- Refactor suggestion: use for local maintainability problems introduced by the diff when they have clear future cost, such as duplicated setup, unclear boundaries, over-mocking, avoidable complexity, or opaque test helpers.
- Nitpick: avoid in chill mode. Do not emit formatting, import-order, wording, or style-only comments unless automated tools cannot catch the issue and it affects maintainability.
Severity guidance:
- Critical: security/privacy leaks, data loss, training/test/holdout contamination, or broken release/package/core pipeline execution.
- Major: incorrect generation/training/evaluation behavior, broken CLI/SDK public API, invalid config defaults or validators, or GPU/vLLM cleanup and process-isolation bugs likely to fail CI or production runs.
- Minor: localized bugs, missing focused tests for changed behavior, or bad test patterns that weaken regression coverage.
- Trivial: small cleanup with no behavior impact. Usually suppress in chill mode.
- Info: context only. Avoid unless it helps reviewers understand risk.
Safe-Synthesizer-specific review focus: - Data ...
Files:
CONTRIBUTING.md
**/*.{py,sh,yaml,yml,md}
📄 CodeRabbit inference engine (CONTRIBUTING.md)
All source files (.py, .sh, .yaml, .yml, .md) must include SPDX copyright headers
Files:
CONTRIBUTING.md
.github/**
⚙️ CodeRabbit configuration file
Review GitHub configuration for branch protection expectations, CODEOWNERS alignment, least privilege permissions, pinned actions where practical, and consistency with CONTRIBUTING.md.
Files:
.github/workflows/labeler.yml
.github/labeler.yml
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer
Timestamp: 2026-05-22T23:02:30.612Z
Learning: All commits must follow the Conventional Commits specification with format <type>(<scope>): <description>, where type must be one of: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer
Timestamp: 2026-05-22T23:02:30.612Z
Learning: All branches except main must follow the naming pattern: <author>/<description>, <author>/<issue-id>-<description>, <author>/<type>/<description>, or <author>/<type>/<issue-id>-<description>, where author is lowercase alphanumeric with hyphens, and type is one of: feature, bugfix, hotfix, release, docs, chore, test
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer
Timestamp: 2026-05-22T23:02:30.612Z
Learning: All commits must be cryptographically signed (GPG or SSH) and include DCO sign-off via git commit -s
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer
Timestamp: 2026-05-22T23:02:30.612Z
Learning: Pull request titles must follow Conventional Commits format as they become the squash-merged commit message
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer
Timestamp: 2026-05-22T23:02:30.612Z
Learning: Release versions must follow PEP440 format with major.minor.patch numbers; use rcN suffix for release candidates (no dash); GitHub tags must start with 'v' prefix
🪛 LanguageTool
CONTRIBUTING.md
[uncategorized] ~380-~380: The official name of this software platform is spelled with a capital “H”.
Context: ...n the files changed, using the rules in .github/labeler.yml. Lab...
(GITHUB)
🪛 zizmor (1.25.2)
.github/workflows/labeler.yml
[error] 45-45: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
🔇 Additional comments (2)
.github/labeler.yml (1)
1-139: LGTM!
CONTRIBUTING.md (1)
378-380: LGTM!
# Copyright (c) 2024-2026, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Add SPDX headers required for .yml source files.
This workflow file is missing SPDX header lines in the required format. Smallest fix: add SPDX copyright and license identifiers at the top.
Suggested patch
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
# Copyright (c) 2024-2026, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
As per coding guidelines: "All source files (.py, .sh, .yaml, .yml, .md) must include SPDX copyright headers."
📝 Committable suggestion
-# Copyright (c) 2024-2026, NVIDIA CORPORATION.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Copyright (c) 2024-2026, NVIDIA CORPORATION.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
@binaryaaron Other workflows/*yml files don't have the SPDX header and make format && make check passes. So is this not relevant? Or do we need to update our copyright header check?
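To see how widespread the gap is before deciding, one could list the workflow files that lack an SPDX identifier. This is a minimal sketch assuming plain `grep`; the function name `missing_spdx` is hypothetical, and it is not the repository's actual copyright header check.

```shell
#!/bin/bash
# Hypothetical check: print every file passed as an argument that has no
# SPDX-License-Identifier line. Returns non-zero if any file is missing it.
missing_spdx() {
  local status=0 f
  for f in "$@"; do
    if ! grep -q 'SPDX-License-Identifier' "$f"; then
      printf '%s\n' "$f"
      status=1
    fi
  done
  return "$status"
}

# Usage from the repository root:
#   missing_spdx .github/workflows/*.yml .github/*.yml
```

If every existing workflow shows up in the output, the question is about the header check's coverage rather than about this one file.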
name: Apply area labels
runs-on: ubuntu-latest
steps:
  - uses: actions/labeler@v6
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify all workflow actions are pinned to commit SHAs (40 hex chars).
rg -nP '^\s*-\s*uses:\s*[^@]+@(?!(?:[a-f0-9]{40})$).+$' .github/workflows
Repository: NVIDIA-NeMo/Safe-Synthesizer
Length of output: 430
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== First 30 lines: .github/workflows/labeler.yml ==="
sed -n '1,30p' .github/workflows/labeler.yml
echo
echo "=== SPDX occurrences in .github/workflows/*.yml ==="
rg -n "SPDX|SPDX-License-Identifier" .github/workflows -S || true
echo
echo "=== First 30 lines: another workflow for comparison (.github/workflows/release.yml) ==="
sed -n '1,30p' .github/workflows/release.yml
Repository: NVIDIA-NeMo/Safe-Synthesizer
Length of output: 2860
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Ensure actions/labeler is pinned to a 40-hex commit SHA
rg -nP '^\s*-\s*uses:\s*actions/labeler@(?!(?:[a-f0-9]{40})$)[^ ]+' .github/workflows/labeler.yml || true
# 2) Ensure SPDX identifier exists
rg -n "SPDX-License-Identifier" .github/workflows/labeler.yml || true
Repository: NVIDIA-NeMo/Safe-Synthesizer
Length of output: 108
Pin actions/labeler to a full commit SHA in .github/workflows/labeler.yml (line 45).
actions/labeler@v6 is a mutable tag, creating supply-chain drift risk in a pull_request_target workflow.
Add an SPDX-License-Identifier header to .github/workflows/labeler.yml.
The file has the Apache license block but no SPDX-License-Identifier line.
Suggested patch
- - uses: actions/labeler@v6
+ - uses: actions/labeler@<full-commit-sha-for-v6>
🧰 Tools
🪛 zizmor (1.25.2)
[error] 45-45: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
Where in the codebase the work lands. Optional on issues.

On PRs, `area:*` labels are auto-applied based on the files changed, using the rules in [`.github/labeler.yml`](.github/labeler.yml). Labels are additive only -- maintainers can add, remove, or override them manually, and the action will not undo manual changes on subsequent pushes. Update the labeler rules in the same PR that introduces new modules or area labels.
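The additive behavior described above can be sketched roughly as a set union: labels derived from the changed paths are merged into whatever labels the PR already has, and nothing is removed. The patterns and label names below are hypothetical, and shell `case` globs only approximate the matching that `actions/labeler` performs on its real config.

```shell
#!/bin/bash
# Hypothetical rules: map one changed path to an area label (names invented).
label_for_path() {
  case "$1" in
    docs/*)    echo "area:docs" ;;
    .github/*) echo "area:ci" ;;
    *)         return 0 ;;   # no rule matched: contribute no label
  esac
}

# Merge labels for the changed files into the existing label set.
# $1: newline-separated labels already on the PR; remaining args: changed paths.
# Existing labels are always kept, so the result is additive only.
merge_labels() {
  local existing="$1" p
  shift
  {
    printf '%s\n' "$existing"
    for p in "$@"; do
      label_for_path "$p"
    done
  } | sed '/^$/d' | sort -u
}

# Usage:
#   merge_labels "area:manual" docs/index.md .github/workflows/labeler.yml
```

A manually added label such as the invented `area:manual` survives every run because the function never subtracts from the existing set.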
Add SPDX HTML headers to this Markdown file.
This changed .md file currently has no SPDX header comments, which can fail repository compliance checks. Smallest fix: add the two required HTML comments at the top of CONTRIBUTING.md.
Suggested patch
+<!-- SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
# Contributing to NeMo Safe Synthesizer
As per coding guidelines: "All source files (.py, .sh, .yaml, .yml, .md) must include SPDX copyright headers."
🧰 Tools
🪛 LanguageTool
[uncategorized] ~380-~380: The official name of this software platform is spelled with a capital “H”.
Context: ...n the files changed, using the rules in .github/labeler.yml. Lab...
(GITHUB)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Summary
Follow up on #503 to automate adding area:* labels to PRs.
Pre-Review Checklist
GitHub workflow-only changes; Python tests are not relevant.
Ensure that the following pass:
- `make format && make check` or via prek validation.
- `make test` passes locally
- `make test-e2e` passes locally
- `make test-ci-container` passes locally (recommended)
- `/sync` on this PR to trigger a run (auto-triggers on ready-for-review)
Pre-Merge Checklist
Other Notes
Summary by CodeRabbit
Release Notes