chore: Add workflow to add area labels to PRs #511

Open
kendrickb-nvidia wants to merge 1 commit into main from kendrickb-nvidia/auto-label-areas

Collaborator

kendrickb-nvidia commented May 22, 2026

Summary

Follow-up to #503 that automates adding area:* labels to PRs.

Pre-Review Checklist

GitHub workflow-only changes; Python tests are not relevant.

Ensure that the following pass:

  • make format && make check or via prek validation.
  • make test passes locally
  • make test-e2e passes locally
  • make test-ci-container passes locally (recommended)
  • GPU CI status check passes -- comment /sync on this PR to trigger a run (auto-triggers on ready-for-review)

Pre-Merge Checklist

  • New or updated tests for any fix or new behavior
  • Updated documentation for new features and behaviors, including docstrings for API docs.

Other Notes

Summary by CodeRabbit

Release Notes

  • Chores
    • Implemented automated PR labeling system that categorizes pull requests by area (SDK/CLI, config, data processing, evaluation, generation, training, PII, privacy, LLM, observability, tests, docs, CI, dev experience, and build) based on modified files.
    • Updated contributor documentation to clarify that area labels are automatically applied based on file changes.

Signed-off-by: Kendrick Boyd <kendrickb@nvidia.com>
kendrickb-nvidia requested a review from a team as a code owner May 22, 2026 23:02

coderabbitai Bot commented May 22, 2026

Walkthrough

This PR adds automated PR labeling to the repository using GitHub Actions. It introduces a labeler configuration file that maps area labels to changed-file patterns, a workflow that triggers the labeler on PR events, and updates documentation to explain the auto-labeling behavior.

Changes

PR Labeling Automation

| Layer / File(s) | Summary |
| --- | --- |
| Label rules and configuration: .github/labeler.yml | Labeler configuration defining area labels for ten code domains (area:sdk-cli, area:config, area:data-processing, area:evaluation, area:generation, area:training, area:pii, area:privacy, area:llm, area:observability) mapped to src/nemo_safe_synthesizer/** subtrees, and five infrastructure labels (area:tests, area:docs, area:ci, area:dev-ex, area:build-dist) mapped to test/docs/workflow/build paths. |
| Labeler workflow automation: .github/workflows/labeler.yml | GitHub Actions workflow that invokes actions/labeler@v6 on PR opened/synchronize/reopened events with read repository and write pull-request permissions, configured to apply labels additively without overriding manual changes. |
| Labeling documentation: CONTRIBUTING.md | Documentation update clarifying that area:* labels are automatically applied based on changed files via .github/labeler.yml and explaining that label changes are additive and preserve manual overrides across subsequent pushes. |

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly Related PRs

  • NVIDIA-NeMo/Safe-Synthesizer#503: Both PRs update repository label documentation in CONTRIBUTING.md to define and explain the label system, making them directly aligned around the same area-label infrastructure.

Suggested Labels

chore, docs

Suggested Reviewers

  • binaryaaron
  • mckornfield
  • alexahaushalter
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding a GitHub Actions workflow to automatically apply area labels to pull requests based on changed files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai Bot added the docs (Documentation-only change) and chore (Maintenance not tied to a user-visible change) labels May 22, 2026

greptile-apps Bot commented May 22, 2026

Greptile Summary

This PR adds a GitHub Actions workflow that automatically applies area:* labels to pull requests based on changed files, using actions/labeler@v6 driven by a new .github/labeler.yml config. CONTRIBUTING.md is updated to document the auto-labeling behavior and instruct contributors to update labeler rules alongside new modules.

  • .github/labeler.yml: Defines glob rules for all product areas (area:sdk-cli, area:config, area:generation, etc.) and infrastructure areas (area:ci, area:tests, area:docs, area:dev-ex, area:build-dist), mirroring the module map in AGENTS.md.
  • .github/workflows/labeler.yml: Runs on pull_request_target (safe — no code checkout) with contents: read / pull-requests: write permissions; labels are additive only (sync-labels: false) so manual overrides are preserved.
  • CONTRIBUTING.md: Area-label section updated to reflect that area:* labels are now auto-applied rather than manually set on PRs.
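
To make the "additive only" behaviour concrete, here is a rough shell sketch of the idea. It is not how actions/labeler is implemented, and the three path patterns and the names `labels_for_path` and `merge_labels` are made up for illustration; the real rules live in .github/labeler.yml, which is not shown on this page.

```shell
#!/bin/sh
# Hypothetical path-to-label rules; the real ones are in .github/labeler.yml.
labels_for_path() {
  case "$1" in
    docs/*|*.md)         echo "area:docs" ;;
    tests/*)             echo "area:tests" ;;
    .github/workflows/*) echo "area:ci" ;;
  esac
}

# Additive merge: every existing label is kept, matched labels are added,
# and nothing is ever removed (the effect described for sync-labels: false).
# $1 = newline-separated existing labels, remaining args = changed paths.
merge_labels() {
  existing="$1"
  shift
  {
    if [ -n "$existing" ]; then printf '%s\n' "$existing"; fi
    for changed in "$@"; do labels_for_path "$changed"; done
  } | LC_ALL=C sort -u
}

# A manually added "chore" label survives; matched area labels are added.
merge_labels "chore" "docs/index.md" ".github/workflows/labeler.yml"
```

Because the merge is a set union, a label that a maintainer added by hand is never dropped by a later push. The flip side is that a label applied by an earlier push is not dropped either if the matching files later leave the PR.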

Confidence Score: 4/5

Safe to merge; the workflow is correctly scoped and the only concern is an unpinned action tag.

The pull_request_target workflow is well-constructed — no untrusted code checkout, minimal permissions, and additive-only labels. The one thing worth fixing before or shortly after merge is that actions/labeler@v6 is referenced by a mutable tag rather than a commit SHA, which leaves the workflow open to supply-chain tag movements.

.github/workflows/labeler.yml — pin the labeler action to a commit SHA.

Security Review

  • Supply-chain risk (actions/labeler@v6) in .github/workflows/labeler.yml: the action is pinned to a mutable tag; a force-push to that tag would silently replace the running code with arbitrary content that has pull-requests: write access. Pinning to a full commit SHA closes this. No other security issues were identified — pull_request_target is used without a code checkout step, and permissions are scoped to the minimum required.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/labeler.yml | New labeler config using actions/labeler@v6 syntax; glob mappings cover all documented product and infrastructure areas; no issues found. |
| .github/workflows/labeler.yml | New workflow using pull_request_target safely (no code checkout, minimal permissions); action pinned to a mutable v6 tag rather than a commit SHA. |
| CONTRIBUTING.md | Docs updated to reflect auto-labeling behavior; clear, accurate, and consistent with the new workflow. |

Sequence Diagram

sequenceDiagram
    participant Dev as PR Author
    participant GH as GitHub
    participant WT as Workflow (pull_request_target)
    participant LA as actions/labeler@v6
    participant Cfg as .github/labeler.yml

    Dev->>GH: Open / push to PR
    GH->>WT: Trigger pull_request_target event
    WT->>Cfg: Read labeler config (base branch)
    WT->>LA: Run labeler action
    LA->>GH: Evaluate changed files against glob rules
    GH-->>LA: List of matched area labels
    LA->>GH: Apply matched labels (additive only)
    GH-->>Dev: "PR updated with area:* labels"

Reviews (1): Last reviewed commit: "chore: Add workflow to add area labels t..."

    name: Apply area labels
    runs-on: ubuntu-latest
    steps:
      - uses: actions/labeler@v6

P2 (security): The actions/labeler action is referenced by a mutable tag (v6). If the tag is ever force-pushed, whether by the upstream maintainer or through a supply-chain compromise, the new code runs with pull-requests: write access on every PR event. Pinning to an immutable commit SHA guarantees that you always run exactly the code you reviewed.

Suggested change
-      - uses: actions/labeler@v6
+      - uses: actions/labeler@ac9175f8a1f3729cd2a7092fd2cf5f7156284e6c # v6.3.1

Collaborator Author

@binaryaaron @mckornfield Seems like a good idea. Should we generally be pinning actions to SHAs?

I'm also confused about where greptile found v6.3.1; https://github.com/actions/labeler/releases only goes up to 6.1.0.

The agent suggests pinning all the actions in one go, since having one action pinned while a bunch of others stay unpinned isn't that helpful.
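
One way to gauge how much work pinning everything in one go would be is to list every `uses:` reference that is not a full 40-character commit SHA. The sketch below is a plain grep approximation and not a replacement for a dedicated linter; the function name `find_unpinned_actions` is made up, and edge cases such as docker:// references or quoted values are not handled.

```shell
#!/bin/sh
# Print "file:line:text" for every `uses:` reference that is not pinned to a
# 40-character hex commit SHA. A trailing comment such as "# v6" is allowed.
# Local actions (uses: ./path) contain no "@" and are skipped.
find_unpinned_actions() {
  dir="${1:-.github/workflows}"
  grep -rnE '^[[:space:]]*(-[[:space:]]*)?uses:[[:space:]]*[^@[:space:]]+@' "$dir" |
    grep -vE 'uses:[[:space:]]*[^@[:space:]]+@[0-9a-f]{40}([[:space:]]|$)'
}

# Run it only when a workflows directory is present in the current checkout.
if [ -d .github/workflows ]; then
  find_unpinned_actions .github/workflows || echo "no unpinned action references found"
fi
```

The function exits non-zero when nothing is left after filtering, so it can also serve as a crude gate: an empty result means every reference it recognised is pinned.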

coderabbitai Bot left a comment

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2156e8b5-5d45-4225-b4c0-e28013290fc5

📥 Commits

Reviewing files that changed from the base of the PR and between 3feda50 and 635727c.

📒 Files selected for processing (3)
  • .github/labeler.yml
  • .github/workflows/labeler.yml
  • CONTRIBUTING.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: conventional-commit / semantic-pull-request
  • GitHub Check: conventional-commit / semantic-pull-request
  • GitHub Check: Unit Tests (3.11)
  • GitHub Check: Unit Tests (3.13)
  • GitHub Check: Unit Tests (3.12)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{md,markdown,py}

📄 CodeRabbit inference engine (.cursor/rules/agent-markdown-style.mdc)

**/*.{md,markdown,py}: Avoid decorative bold (**text**) in list items, body text, and docstrings; use structural cues (headers, list markers, colons, backticks) for emphasis instead
Use backticks for code identifiers, paths, and CLI commands in markdown and docstrings

Files:

  • CONTRIBUTING.md
**/*.{md,markdown}

📄 CodeRabbit inference engine (.cursor/rules/agent-markdown-style.mdc)

**/*.{md,markdown}: Bold is acceptable only in markdown tables where it's the conventional way to mark header-like cells in the body
Use ## headers to segment markdown sections instead of bold text
Use -- (em-dash) instead of - (hyphen) for asides in markdown

Files:

  • CONTRIBUTING.md
**/*.md

📄 CodeRabbit inference engine (STYLE_GUIDE.md)

**/*.md: No decorative **bold** in body text, list items, or docstrings. Use headers, list markers, colons, and backticks for structure.
Use -- (em-dash) for asides, not - (hyphen).
Use single backticks for code identifiers, paths, and CLI commands in Markdown.
Use Mermaid diagrams with no spaces in node IDs, quote labels with special characters, no explicit colors or styles.
Include SPDX copyright header in Markdown files using HTML comments: <!-- SPDX-FileCopyrightText: ... --> and <!-- SPDX-License-Identifier: Apache-2.0 -->. Exception: for .md files with YAML frontmatter, include hash-comment headers inside the frontmatter block.

Files:

  • CONTRIBUTING.md
**/*

📄 CodeRabbit inference engine (STYLE_GUIDE.md)

**/*: Include a newline at the end of all files, never trailing whitespace. This is enforced by pre-commit.
Use line length of 120 characters for code, comments, and docstrings (configured in ruff.toml).

Files:

  • CONTRIBUTING.md

⚙️ CodeRabbit configuration file

**/*: Review as a senior maintainer for NeMo Safe Synthesizer. Prioritize issues that can change behavior, break user workflows, weaken privacy guarantees, hide failures, make tests unreliable, or create maintenance risk. Avoid generic style commentary unless it points to a concrete project convention that automated tools will not catch.
Comment only when the finding is actionable and tied to changed code. For each finding, state the impact, the condition that triggers it, and the smallest practical fix. Prefer one precise comment over broad advice. Do not ask for refactors outside the PR scope unless the changed code creates the problem.
Review type guidance: - Potential issue: use for correctness bugs, data loss, privacy leaks,
security risks, broken public APIs, invalid config behavior, missing
validation, hidden failures, nondeterministic tests, or CI breakage.

  • Refactor suggestion: use for local maintainability problems introduced
    by the diff when they have clear future cost, such as duplicated setup,
    unclear boundaries, over-mocking, avoidable complexity, or opaque test
    helpers.
  • Nitpick: avoid in chill mode. Do not emit formatting, import-order,
    wording, or style-only comments unless automated tools cannot catch the
    issue and it affects maintainability.

Severity guidance: - Critical: security/privacy leaks, data loss, training/test/holdout
contamination, or broken release/package/core pipeline execution.

  • Major: incorrect generation/training/evaluation behavior, broken
    CLI/SDK public API, invalid config defaults or validators, or GPU/vLLM
    cleanup and process-isolation bugs likely to fail CI or production
    runs.
  • Minor: localized bugs, missing focused tests for changed behavior, or
    bad test patterns that weaken regression coverage.
  • Trivial: small cleanup with no behavior impact. Usually suppress in
    chill mode.
  • Info: context only. Avoid unless it helps reviewers understand risk.
    Safe-Synthesizer-specific review focus: - Data ...

Files:

  • CONTRIBUTING.md
**/*.{py,sh,yaml,yml,md}

📄 CodeRabbit inference engine (CONTRIBUTING.md)

All source files (.py, .sh, .yaml, .yml, .md) must include SPDX copyright headers

Files:

  • CONTRIBUTING.md
.github/**

⚙️ CodeRabbit configuration file

Review GitHub configuration for branch protection expectations, CODEOWNERS alignment, least privilege permissions, pinned actions where practical, and consistency with CONTRIBUTING.md.

Files:

  • .github/workflows/labeler.yml
  • .github/labeler.yml
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer

Timestamp: 2026-05-22T23:02:30.612Z
Learning: All commits must follow the Conventional Commits specification with format <type>(<scope>): <description>, where type must be one of: feat, fix, docs, style, refactor, perf, test, build, ci, chore, revert
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer

Timestamp: 2026-05-22T23:02:30.612Z
Learning: All branches except main must follow the naming pattern: <author>/<description>, <author>/<issue-id>-<description>, <author>/<type>/<description>, or <author>/<type>/<issue-id>-<description>, where author is lowercase alphanumeric with hyphens, and type is one of: feature, bugfix, hotfix, release, docs, chore, test
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer

Timestamp: 2026-05-22T23:02:30.612Z
Learning: All commits must be cryptographically signed (GPG or SSH) and include DCO sign-off via git commit -s
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer

Timestamp: 2026-05-22T23:02:30.612Z
Learning: Pull request titles must follow Conventional Commits format as they become the squash-merged commit message
Learnt from: CR
Repo: NVIDIA-NeMo/Safe-Synthesizer

Timestamp: 2026-05-22T23:02:30.612Z
Learning: Release versions must follow PEP440 format with major.minor.patch numbers; use rcN suffix for release candidates (no dash); GitHub tags must start with 'v' prefix
🪛 LanguageTool
CONTRIBUTING.md

[uncategorized] ~380-~380: The official name of this software platform is spelled with a capital “H”.
Context: ...n the files changed, using the rules in .github/labeler.yml. Lab...

(GITHUB)

🪛 zizmor (1.25.2)
.github/workflows/labeler.yml

[error] 45-45: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

(unpinned-uses)

🔇 Additional comments (2)
.github/labeler.yml (1)

1-139: LGTM!

CONTRIBUTING.md (1)

378-380: LGTM!

Comment on lines +1 to +14
# Copyright (c) 2024-2026, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

coderabbitai Bot commented May 22, 2026

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add SPDX headers required for .yml source files.

This workflow file is missing SPDX header lines in the required format. Smallest fix: add SPDX copyright and license identifiers at the top.

Suggested patch
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
 # Copyright (c) 2024-2026, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");

As per coding guidelines: "All source files (.py, .sh, .yaml, .yml, .md) must include SPDX copyright headers."

Suggested change
+# SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
 # Copyright (c) 2024-2026, NVIDIA CORPORATION.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.

Collaborator Author

@binaryaaron Other workflows/*yml files don't have the SPDX header and make format && make check passes. So is this not relevant? Or do we need to update our copyright header check?
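
A quick way to check the first claim is to list which workflow files actually lack an SPDX line. The sketch below is a stand-in written for this discussion, not the repository's copyright check behind make check (which is not shown here); the function name `find_missing_spdx` and the 20-line window are arbitrary choices.

```shell
#!/bin/sh
# Print every .yml/.yaml file under a directory that has no
# "SPDX-License-Identifier" line within its first 20 lines.
find_missing_spdx() {
  dir="${1:-.github/workflows}"
  find "$dir" -type f \( -name '*.yml' -o -name '*.yaml' \) | LC_ALL=C sort |
    while IFS= read -r file; do
      if ! awk 'NR <= 20 && /SPDX-License-Identifier/ { found = 1 } END { exit !found }' "$file"; then
        printf '%s\n' "$file"
      fi
    done
}

# Run it only when a workflows directory is present in the current checkout.
if [ -d .github/workflows ]; then
  find_missing_spdx .github/workflows
fi
```

If this prints most of the existing workflows, the open question is really about the header check's scope rather than about this PR's new file.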

    name: Apply area labels
    runs-on: ubuntu-latest
    steps:
      - uses: actions/labeler@v6

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify all workflow actions are pinned to commit SHAs (40 hex chars).
rg -nP '^\s*-\s*uses:\s*[^@]+@(?!(?:[a-f0-9]{40})$).+$' .github/workflows

Repository: NVIDIA-NeMo/Safe-Synthesizer

Length of output: 430


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== First 30 lines: .github/workflows/labeler.yml ==="
sed -n '1,30p' .github/workflows/labeler.yml

echo
echo "=== SPDX occurrences in .github/workflows/*.yml ==="
rg -n "SPDX|SPDX-License-Identifier" .github/workflows -S || true

echo
echo "=== First 30 lines: another workflow for comparison (.github/workflows/release.yml) ==="
sed -n '1,30p' .github/workflows/release.yml

Repository: NVIDIA-NeMo/Safe-Synthesizer

Length of output: 2860


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Ensure actions/labeler is pinned to a 40-hex commit SHA
rg -nP '^\s*-\s*uses:\s*actions/labeler@(?!(?:[a-f0-9]{40})$)[^ ]+' .github/workflows/labeler.yml || true

# 2) Ensure SPDX identifier exists
rg -n "SPDX-License-Identifier" .github/workflows/labeler.yml || true

Repository: NVIDIA-NeMo/Safe-Synthesizer

Length of output: 108


Pin actions/labeler to a full commit SHA in .github/workflows/labeler.yml (line 45).
actions/labeler@v6 is a mutable tag, creating supply-chain drift risk in a pull_request_target workflow.

Add an SPDX-License-Identifier header to .github/workflows/labeler.yml.
The file has the Apache license block but no SPDX-License-Identifier line.

Suggested patch
-      - uses: actions/labeler@v6
+      - uses: actions/labeler@<full-commit-sha-for-v6>
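
The `<full-commit-sha-for-v6>` placeholder has to be filled in from the upstream repository's tags, which `git ls-remote --tags` can list. The helper below is a small sketch (the name `sha_for_tag` is made up) that picks the commit for one tag out of that listing. For annotated tags it prefers the peeled `^{}` entry, which is the commit rather than the tag object.

```shell
#!/bin/sh
# Read `git ls-remote --tags <repo-url>` output on stdin and print the commit
# SHA for the tag named in $1. Exits non-zero if the tag is not listed.
sha_for_tag() {
  awk -v ref="refs/tags/$1" '
    $2 == (ref "^{}") { peeled = $1 }   # annotated tag: peeled commit
    $2 == ref         { direct = $1 }   # lightweight tag, or the tag object
    END {
      if (peeled != "")      print peeled
      else if (direct != "") print direct
      else                   exit 1
    }'
}

# Example (needs network access; TAG is whichever release you reviewed):
#   git ls-remote --tags https://github.com/actions/labeler | sha_for_tag "$TAG"
```

Resolving the SHA this way also gives a direct check on any version comment placed next to a pinned SHA: if the tag is not in the listing, the helper fails instead of printing something.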

Comment thread CONTRIBUTING.md
Comment on lines +378 to +380
Where in the codebase the work lands. Optional on issues.

On PRs, `area:*` labels are auto-applied based on the files changed, using the rules in [`.github/labeler.yml`](.github/labeler.yml). Labels are additive only -- maintainers can add, remove, or override them manually, and the action will not undo manual changes on subsequent pushes. Update the labeler rules in the same PR that introduces new modules or area labels.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add SPDX HTML headers to this Markdown file.

This changed .md file currently has no SPDX header comments, which can fail repository compliance checks. Smallest fix: add the two required HTML comments at the top of CONTRIBUTING.md.

Suggested patch
+<!-- SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
 # Contributing to NeMo Safe Synthesizer

As per coding guidelines: "All source files (.py, .sh, .yaml, .yml, .md) must include SPDX copyright headers."

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Labels

  • chore: Maintenance not tied to a user-visible change
  • docs: Documentation-only change