diff --git a/README.md b/README.md index a7edabae..40d53baa 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,7 @@ correctness of the action. - [ASF Infrastructure Pelican Action](/pelican/README.md): Generate and publish project websites with GitHub Actions - [Stash Action](/stash/README.md): Manage large build caches + - [ASF Allowlist Check](/allowlist-check/README.md): Verify workflow action refs are on the ASF allowlist ## Management of Organization-wide GitHub Actions Allow List diff --git a/allowlist-check/README.md b/allowlist-check/README.md new file mode 100644 index 00000000..62e40458 --- /dev/null +++ b/allowlist-check/README.md @@ -0,0 +1,115 @@ + + +# ASF Allowlist Check + +A composite GitHub Action that verifies all `uses:` refs in a project's workflow files are on the ASF Infrastructure [approved allowlist](../approved_patterns.yml). Catches violations **before merge**, preventing the silent CI failures that occur when an action is not on the org-level allowlist (see [#574](https://github.com/apache/infrastructure-actions/issues/574)). + +## Why + +When a GitHub Actions workflow references an action that isn't on the ASF org-level allowlist, the CI job silently fails with "Startup failure" — no logs, no notifications, and the PR may appear green because no checks ran. This action catches those problems at PR time with a clear error message. + +## Usage + +Add a workflow file to your project (e.g., `.github/workflows/asf-allowlist-check.yml`): + +```yaml +name: "ASF Allowlist Check" + +on: + pull_request: + paths: + - ".github/**" + push: + branches: + - main + paths: + - ".github/**" + +permissions: + contents: read + +jobs: + asf-allowlist-check: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + persist-credentials: false + - uses: apache/infrastructure-actions/allowlist-check@main +``` + +That's it — two steps. 
The `actions/checkout` step checks out your repo so `.github/` is available to scan, then the allowlist check runs against those files. + + ## Inputs + + | Input | Required | Default | Description | + |---|---|---|---| + | `scan-glob` | No | `.github/**/*.yml` | Glob pattern for YAML files to scan for action refs. The default matches only `.yml`; use e.g. `.github/**/*.y*ml` to include `.yaml` files as well | + + ### Custom scan glob + + To scan only workflow files (excluding other YAML under `.github/`): + + ```yaml + - uses: apache/infrastructure-actions/allowlist-check@main + with: + scan-glob: ".github/workflows/*.yml" + ``` + + ## What it checks + + The action scans all matching YAML files for `uses:` keys and validates each action ref against the [approved_patterns.yml](../approved_patterns.yml) allowlist. + + ### Automatically allowed + + Actions from these GitHub organizations are implicitly trusted and don't need to be in the allowlist: + - `actions/*` — GitHub's official actions + - `github/*` — GitHub's own actions + - `apache/*` — ASF's own actions + + ### Skipped + + - **Local refs** (`./`) — paths within the same repo are not subject to the org allowlist + - **Docker refs** (`docker://`) — container actions pulled directly from a registry + - **Empty YAML files** — contain no refs to check + - **Malformed YAML files** are the exception: they are not skipped, and the action fails with a parse error + + ### Violation output + + When violations are found, the action fails with exit code 1 and prints annotations like these, followed by a link to the allowlist instructions: + + ``` + ::error::Found 2 action ref(s) not on the ASF allowlist: + ::error file=.github/workflows/ci.yml::some-org/some-action@v1 is not on the ASF allowlist + ::error file=.github/workflows/release.yml::other-org/other-action@abc123 is not on the ASF allowlist + ``` + + To resolve a violation, open a PR in this repo to [add the action](../README.md#adding-a-new-action-to-the-allow-list) or [add a new version](../README.md#adding-a-new-version-to-the-allow-list) to the allowlist.
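Before opening such a PR, you may want to check locally whether a ref would pass. The matching rule that `check_asf_allowlist.py` applies can be approximated in a few lines of standalone Python. This is a sketch that mirrors its `is_allowed` function, not the script itself. The `ALLOWLIST` entries are illustrative examples of the three pattern styles and are not the current contents of `approved_patterns.yml`. `would_pass` is a hypothetical helper name.

```python
from fnmatch import fnmatch

# Mirrors TRUSTED_OWNERS in check_asf_allowlist.py: these owners bypass the allowlist.
TRUSTED_OWNERS = {"actions", "github", "apache"}

# Illustrative entries in the three pattern styles used by approved_patterns.yml;
# these are examples, not the real file's contents.
ALLOWLIST = [
    "astral-sh/setup-uv@681c641aba71e4a1c380be3ab5e12ad51f415867",  # exact SHA only
    "codecov/codecov-action@*",  # any ref of one action
    "golangci/*@*",  # any action under one owner, any ref
]


def would_pass(action_ref: str, allowlist: list[str]) -> bool:
    """Return True if the ref's owner is trusted or the ref matches a pattern."""
    owner = action_ref.split("/")[0]
    if owner in TRUSTED_OWNERS:
        return True
    # fnmatch's "*" also matches "/", so "golangci/*@*" covers sub-paths too.
    return any(fnmatch(action_ref, pattern) for pattern in allowlist)


for ref in (
    "actions/checkout@v4",
    "codecov/codecov-action@v5",
    "golangci/golangci-lint-action@v6",
    "astral-sh/setup-uv@v3",
):
    print(ref, "->", "allowed" if would_pass(ref, ALLOWLIST) else "NOT allowed")
```

The last ref is rejected because `setup-uv` is allowlisted only at one pinned SHA, and a tag such as `v3` does not match that pattern. In that case the fix is to request the specific version, not the action as a whole.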
+ +When all refs pass: + +``` +All 15 unique action refs are on the ASF allowlist +``` + +## Dependencies + +- Python 3 (pre-installed on GitHub-hosted runners) +- ruyaml (installed automatically by the action) diff --git a/allowlist-check/action.yml b/allowlist-check/action.yml new file mode 100644 index 00000000..2a6213ac --- /dev/null +++ b/allowlist-check/action.yml @@ -0,0 +1,41 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +name: "ASF Allowlist Check" +description: > + Verify that all GitHub Actions uses: refs in the caller's workflow files + are on the ASF Infrastructure approved allowlist. Fails with a clear error + listing any refs that are not allowlisted. 
+author: kevinjqliu + + inputs: + scan-glob: + description: "Glob pattern for YAML files to scan for action refs" + required: false + default: ".github/**/*.yml" + + runs: + using: composite + steps: + - name: Install ruyaml + shell: bash + run: pip install ruyaml + - name: Verify all action refs are allowlisted + shell: bash + run: python3 "${{ github.action_path }}/check_asf_allowlist.py" "${{ github.action_path }}/../approved_patterns.yml" + env: + GITHUB_YAML_GLOB: ${{ inputs.scan-glob }} diff --git a/allowlist-check/check_asf_allowlist.py b/allowlist-check/check_asf_allowlist.py new file mode 100644 index 00000000..f0c02c9a --- /dev/null +++ b/allowlist-check/check_asf_allowlist.py @@ -0,0 +1,184 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Check that all GitHub Actions uses: refs are on the ASF allowlist. + +Usage: + python3 check_asf_allowlist.py + +The allowlist path is passed as the only argument; action.yml points it at +../approved_patterns.yml, the allowlist at the root of this repository. + +The glob pattern for YAML files to scan can be overridden via the +GITHUB_YAML_GLOB environment variable (default: .github/**/*.yml). + +Exits with code 1 on any violation or YAML parse error, 2 on bad usage.
+""" + + import fnmatch + import glob + import os + import sys + from typing import Any, Generator + + import ruyaml + + # actions/*, github/*, apache/* are implicitly trusted by GitHub/ASF + # See ../README.md ("Management of Organization-wide GitHub Actions Allow List") + TRUSTED_OWNERS = {"actions", "github", "apache"} + + # Default glob pattern for YAML files to scan for action refs + DEFAULT_GITHUB_YAML_GLOB = ".github/**/*.yml" + + # Prefixes that indicate local or non-GitHub refs (not subject to allowlist) + # ./ — local composite actions within the same repo + # docker:// — container actions pulled directly from a registry + SKIPPED_PREFIXES = ("./", "docker://") + + # YAML key that references a GitHub Action + USES_KEY = "uses" + + + def find_action_refs(node: Any) -> Generator[str, None, None]: + """Recursively find all `uses:` values from a parsed YAML tree. + + Args: + node: A parsed YAML node (any type returned by ruyaml) + + Yields: + str: Each `uses:` string value found in the tree + """ + if isinstance(node, dict): + for key, value in node.items(): + if key == USES_KEY and isinstance(value, str): + yield value + else: + yield from find_action_refs(value) + elif isinstance(node, list): + for item in node: + yield from find_action_refs(item) + + + def collect_action_refs( + scan_glob: str = DEFAULT_GITHUB_YAML_GLOB, + ) -> dict[str, list[str]]: + """Collect all non-local action refs (trusted owners included) from YAML files. + + Skips local (./) and Docker (docker://) refs, as these are not + subject to the org-level allowlist. + + Args: + scan_glob: Glob pattern for files to scan. + + Returns: + dict: Mapping of each action ref to the list of file paths that use it.
+ """ + + action_refs = {} + for filepath in sorted(glob.glob(scan_glob, recursive=True)): + try: + yaml = ruyaml.YAML() + with open(filepath) as f: + content = yaml.load(f) + except ruyaml.YAMLError as exc: + print(f"::error file={filepath}::Failed to parse YAML: {exc}") + sys.exit(1) + if not content: + continue + for ref in find_action_refs(content): + if ref.startswith(SKIPPED_PREFIXES): + continue + action_refs.setdefault(ref, []).append(filepath) + return action_refs + + +def load_allowlist(allowlist_path: str) -> list[str]: + """Load the ASF approved_patterns.yml file. + + The file is a flat YAML list of entries like: + - owner/action@ (exact SHA match) + - owner/action@* (any ref allowed) + - golangci/*@* (any repo under owner, any ref) + + Python's fnmatch.fnmatch matches "/" with "*" (unlike shell globs), + so these patterns work directly without transformation. + + Args: + allowlist_path: Path to the approved_patterns.yml file + + Returns: + list[str]: List of allowlist patterns (empty list if file is empty) + """ + yaml = ruyaml.YAML() + with open(allowlist_path) as f: + result = yaml.load(f) + return result if result else [] + + +def is_allowed(action_ref: str, allowlist: list[str]) -> bool: + """Check whether a single action ref is allowed. + + An action ref is allowed if its owner is in TRUSTED_OWNERS or it + matches any pattern in the allowlist via fnmatch. 
+ + Args: + action_ref: The action reference string (e.g., "owner/action@ref") + allowlist: List of allowlist patterns to match against + + Returns: + bool: True if the action ref is allowed + """ + owner = action_ref.split("/")[0] + if owner in TRUSTED_OWNERS: + return True + return any(fnmatch.fnmatch(action_ref, pattern) for pattern in allowlist) + + +def main(): + if len(sys.argv) != 2: + print(f"Usage: {sys.argv[0]} ", file=sys.stderr) + sys.exit(2) + + allowlist_path = sys.argv[1] + allowlist = load_allowlist(allowlist_path) + scan_glob = os.environ.get("GITHUB_YAML_GLOB", DEFAULT_GITHUB_YAML_GLOB) + action_refs = collect_action_refs(scan_glob) + + violations = [] + for action_ref, filepaths in sorted(action_refs.items()): + if not is_allowed(action_ref, allowlist): + for filepath in filepaths: + violations.append((filepath, action_ref)) + + if violations: + print( + f"::error::Found {len(violations)} action ref(s) not on the ASF allowlist:" + ) + for filepath, action_ref in violations: + print(f"::error file={filepath}::{action_ref} is not on the ASF allowlist") + print( + "::error::To resolve, open a PR in apache/infrastructure-actions to add" + " the action or version to the allowlist:" + " https://github.com/apache/infrastructure-actions#adding-a-new-action-to-the-allow-list" + ) + sys.exit(1) + else: + print(f"All {len(action_refs)} unique action refs are on the ASF allowlist") + + +if __name__ == "__main__": + main() diff --git a/allowlist-check/test_check_asf_allowlist.py b/allowlist-check/test_check_asf_allowlist.py new file mode 100644 index 00000000..03754133 --- /dev/null +++ b/allowlist-check/test_check_asf_allowlist.py @@ -0,0 +1,307 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +import os +import shutil +import tempfile +import textwrap +import unittest + +from check_asf_allowlist import ( + collect_action_refs, + find_action_refs, + is_allowed, + load_allowlist, +) + + +class TestFindActionRefs(unittest.TestCase): + """Tests for recursive uses: extraction from parsed YAML trees.""" + + def test_simple_step(self): + tree = {"jobs": {"build": {"steps": [{"uses": "actions/checkout@v4"}]}}} + self.assertEqual(list(find_action_refs(tree)), ["actions/checkout@v4"]) + + def test_multiple_steps(self): + tree = { + "jobs": { + "build": { + "steps": [ + {"uses": "actions/checkout@v4"}, + {"run": "echo hello"}, + {"uses": "actions/setup-python@v5"}, + ] + } + } + } + refs = list(find_action_refs(tree)) + self.assertEqual(refs, ["actions/checkout@v4", "actions/setup-python@v5"]) + + def test_multiple_jobs(self): + tree = { + "jobs": { + "build": {"steps": [{"uses": "actions/checkout@v4"}]}, + "test": {"steps": [{"uses": "actions/setup-java@v4"}]}, + } + } + refs = list(find_action_refs(tree)) + self.assertIn("actions/checkout@v4", refs) + self.assertIn("actions/setup-java@v4", refs) + + def test_no_uses(self): + tree = {"jobs": {"build": {"steps": [{"run": "echo hello"}]}}} + self.assertEqual(list(find_action_refs(tree)), []) + + def test_empty_tree(self): + self.assertEqual(list(find_action_refs({})), []) + self.assertEqual(list(find_action_refs([])), []) + 
self.assertEqual(list(find_action_refs(None)), []) + + def test_reusable_workflow(self): + tree = { + "jobs": { + "call-workflow": { + "uses": "org/repo/.github/workflows/reusable.yml@main" + } + } + } + refs = list(find_action_refs(tree)) + self.assertEqual(refs, ["org/repo/.github/workflows/reusable.yml@main"]) + + def test_deeply_nested(self): + tree = {"a": {"b": {"c": {"d": [{"uses": "deep/action@v1"}]}}}} + self.assertEqual(list(find_action_refs(tree)), ["deep/action@v1"]) + + def test_uses_non_string_ignored(self): + """uses: with a non-string value (e.g., int) should be ignored.""" + tree = {"jobs": {"build": {"steps": [{"uses": 42}]}}} + self.assertEqual(list(find_action_refs(tree)), []) + + +class TestIsAllowed(unittest.TestCase): + """Tests for allowlist matching logic.""" + + def setUp(self): + self.allowlist = [ + "astral-sh/setup-uv@681c641aba71e4a1c380be3ab5e12ad51f415867", + "codecov/codecov-action@*", + "golangci/*@*", + ] + + def test_trusted_owner_actions(self): + self.assertTrue(is_allowed("actions/checkout@v4", self.allowlist)) + + def test_trusted_owner_github(self): + self.assertTrue(is_allowed("github/codeql-action/init@v3", self.allowlist)) + + def test_trusted_owner_apache(self): + self.assertTrue( + is_allowed("apache/infrastructure-actions/stash@main", self.allowlist) + ) + + def test_exact_sha_match(self): + self.assertTrue( + is_allowed( + "astral-sh/setup-uv@681c641aba71e4a1c380be3ab5e12ad51f415867", + self.allowlist, + ) + ) + + def test_exact_sha_no_match(self): + self.assertFalse( + is_allowed( + "astral-sh/setup-uv@0000000000000000000000000000000000000000", + self.allowlist, + ) + ) + + def test_wildcard_ref(self): + self.assertTrue( + is_allowed("codecov/codecov-action@v4", self.allowlist) + ) + self.assertTrue( + is_allowed( + "codecov/codecov-action@abc123def456", + self.allowlist, + ) + ) + + def test_wildcard_repo_and_ref(self): + self.assertTrue( + is_allowed("golangci/golangci-lint-action@abc123", self.allowlist) + ) + 
self.assertTrue( + is_allowed("golangci/some-other-action@v1", self.allowlist) + ) + + def test_not_allowed(self): + self.assertFalse( + is_allowed("evil-org/evil-action@v1", self.allowlist) + ) + + def test_empty_allowlist(self): + self.assertFalse(is_allowed("some/action@v1", [])) + + def test_owner_only_no_slash(self): + """A bare owner name with no slash or ref (edge case) is rejected without error.""" + self.assertFalse(is_allowed("random", self.allowlist)) + + + class TestLoadAllowlist(unittest.TestCase): + """Tests for loading allowlist from a YAML file.""" + + def test_load_valid_file(self): + with tempfile.NamedTemporaryFile(mode="w", suffix=".yml", delete=False) as f: + f.write("- owner/action@abc123\n- other/action@*\n") + f.flush() + result = load_allowlist(f.name) + os.unlink(f.name) + self.assertEqual(result, ["owner/action@abc123", "other/action@*"]) + + def test_load_empty_file(self): + with tempfile.NamedTemporaryFile(mode="w", suffix=".yml", delete=False) as f: + f.write("") + f.flush() + result = load_allowlist(f.name) + os.unlink(f.name) + self.assertEqual(result, []) + + + class TestCollectActionRefs(unittest.TestCase): + """Tests for collecting action refs from workflow files.""" + + def setUp(self): + self.tmpdir = tempfile.mkdtemp() + self.github_dir = os.path.join(self.tmpdir, ".github", "workflows") + os.makedirs(self.github_dir) + + def tearDown(self): + shutil.rmtree(self.tmpdir) + + def _write_workflow(self, filename, content): + filepath = os.path.join(self.github_dir, filename) + with open(filepath, "w") as f: + f.write(textwrap.dedent(content)) + return filepath + + def test_collects_refs(self): + self._write_workflow( + "ci.yml", + """\ + name: CI + on: push + jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: codecov/codecov-action@v4 + """, + ) + scan_glob = os.path.join(self.tmpdir, ".github/**/*.yml") + refs = collect_action_refs(scan_glob) + self.assertIn("actions/checkout@v4", refs)
self.assertIn("codecov/codecov-action@v4", refs) + + def test_skips_local_refs(self): + self._write_workflow( + "ci.yml", + """\ + name: CI + on: push + jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: ./local-action + - uses: actions/checkout@v4 + """, + ) + scan_glob = os.path.join(self.tmpdir, ".github/**/*.yml") + refs = collect_action_refs(scan_glob) + self.assertNotIn("./local-action", refs) + self.assertIn("actions/checkout@v4", refs) + + def test_skips_docker_refs(self): + self._write_workflow( + "ci.yml", + """\ + name: CI + on: push + jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: docker://alpine:3.18 + - uses: actions/checkout@v4 + """, + ) + scan_glob = os.path.join(self.tmpdir, ".github/**/*.yml") + refs = collect_action_refs(scan_glob) + self.assertNotIn("docker://alpine:3.18", refs) + self.assertIn("actions/checkout@v4", refs) + + def test_empty_yaml(self): + self._write_workflow("empty.yml", "") + scan_glob = os.path.join(self.tmpdir, ".github/**/*.yml") + refs = collect_action_refs(scan_glob) + self.assertEqual(refs, {}) + + def test_invalid_yaml_errors(self): + self._write_workflow("bad.yml", ":\n - :\n invalid: [") + scan_glob = os.path.join(self.tmpdir, ".github/**/*.yml") + with self.assertRaises(SystemExit): + collect_action_refs(scan_glob) + + def test_tracks_multiple_files(self): + self._write_workflow( + "ci.yml", + """\ + name: CI + on: push + jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + """, + ) + self._write_workflow( + "release.yml", + """\ + name: Release + on: push + jobs: + release: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + """, + ) + scan_glob = os.path.join(self.tmpdir, ".github/**/*.yml") + refs = collect_action_refs(scan_glob) + self.assertEqual(len(refs["actions/checkout@v4"]), 2) + + def test_no_matching_files(self): + scan_glob = os.path.join(self.tmpdir, ".github/**/*.yml") + # no files written — github_dir exists but is empty + refs = 
collect_action_refs(scan_glob) + self.assertEqual(refs, {}) + + +if __name__ == "__main__": + unittest.main()
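The default scan pattern (`DEFAULT_GITHUB_YAML_GLOB`, and the `scan-glob` input's default) is passed straight to Python's `glob.glob(..., recursive=True)`. That function has no brace expansion, so a workflow saved with a `.yaml` extension is never scanned. The sketch below builds a throwaway directory with two hypothetical workflows, `ci.yml` and `release.yaml`, to show the difference. It compares the default pattern with `.github/**/*.y*ml`, which matches both extensions. `scanned_files` is an illustrative helper, not part of `check_asf_allowlist.py`.

```python
import glob
import os
import tempfile


def scanned_files(root: str, pattern: str) -> list[str]:
    """List files under root that pattern matches, as paths relative to root."""
    matches = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(path, root) for path in matches)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one workflow file per extension.
    workflows = os.path.join(root, ".github", "workflows")
    os.makedirs(workflows)
    for name in ("ci.yml", "release.yaml"):
        with open(os.path.join(workflows, name), "w") as f:
            f.write("on: push\n")

    default_hits = scanned_files(root, ".github/**/*.yml")  # the default scan-glob
    widened_hits = scanned_files(root, ".github/**/*.y*ml")  # .yml and .yaml

print(default_hits)
print(widened_hits)
```

The first list contains only `ci.yml`, while the widened pattern also picks up `release.yaml`. A repository that mixes extensions would therefore need a custom `scan-glob` for its `.yaml` workflows to be checked.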