Merged
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
docs/case-studies/issue-31/*.log -diff
48 changes: 0 additions & 48 deletions .github/workflows/python.yml
@@ -244,30 +244,6 @@ jobs:
verbose: true
skip-existing: true

- name: Diagnose PyPI publish failure
if: failure() && steps.pypi_publish.outcome == 'failure'
run: |
echo "::error title=PyPI publish failed::pypa/gh-action-pypi-publish exited with an error."
echo ""
echo "Most common causes & fixes:"
echo " 1. 'Trusted publisher … not configured' / 'invalid-publisher':"
echo " PyPI does not yet trust this repository+workflow. Configure it at:"
echo " https://pypi.org/manage/project/${{ steps.version_check.outputs.package_name }}/settings/publishing/"
echo " Match: owner=${{ github.repository_owner }}, repo=$(basename ${{ github.repository }}),"
echo " workflow=python.yml, environment=(blank unless you set one)."
echo ""
echo " 2. Token-based auth in use but PYPI_API_TOKEN missing/expired:"
echo " Generate at https://pypi.org/manage/account/token/ and store as a repo secret."
echo ""
echo " 3. id-token: write missing on the job (this workflow already has it)."
echo ""
echo " 4. Version already published: skip-existing is true so PyPI conflicts are tolerated;"
echo " a hard failure here means something else is wrong."
echo ""
echo "Docs: https://docs.pypi.org/trusted-publishers/"
echo " docs/case-studies/issue-29/README.md"
exit 1

- name: Verify package on PyPI
if: steps.version_check.outputs.should_release == 'true' && steps.pypi_publish.outcome == 'success'
run: |
@@ -378,30 +354,6 @@ jobs:
verbose: true
skip-existing: true

- name: Diagnose PyPI publish failure
if: failure() && steps.pypi_publish.outcome == 'failure'
run: |
echo "::error title=PyPI publish failed::pypa/gh-action-pypi-publish exited with an error."
echo ""
echo "Most common causes & fixes:"
echo " 1. 'Trusted publisher … not configured' / 'invalid-publisher':"
echo " PyPI does not yet trust this repository+workflow. Configure it at:"
echo " https://pypi.org/manage/project/${{ steps.pkg.outputs.package_name }}/settings/publishing/"
echo " Match: owner=${{ github.repository_owner }}, repo=$(basename ${{ github.repository }}),"
echo " workflow=python.yml, environment=(blank unless you set one)."
echo ""
echo " 2. Token-based auth in use but PYPI_API_TOKEN missing/expired:"
echo " Generate at https://pypi.org/manage/account/token/ and store as a repo secret."
echo ""
echo " 3. id-token: write missing on the job (this workflow already has it)."
echo ""
echo " 4. Version already published: skip-existing is true so PyPI conflicts are tolerated;"
echo " a hard failure here means something else is wrong."
echo ""
echo "Docs: https://docs.pypi.org/trusted-publishers/"
echo " docs/case-studies/issue-29/README.md"
exit 1

- name: Verify package on PyPI
if: (steps.version.outputs.version_committed == 'true' || steps.version.outputs.already_released == 'true') && steps.pypi_publish.outcome == 'success'
run: |
43 changes: 15 additions & 28 deletions .github/workflows/rust.yml
@@ -175,6 +175,10 @@ jobs:
working-directory: ./rust
run: node scripts/check-file-size.mjs

- name: Run CI script tests
working-directory: ./rust
run: node --test scripts/*.test.mjs

# === TEST ===
# Test runs independently of changelog check
test:
@@ -284,53 +288,36 @@ jobs:
working-directory: ./rust
run: node scripts/get-bump-type.mjs

- name: Check if version already released
id: version_check
- name: Check if release is needed
id: release_check
working-directory: ./rust
run: |
# Get current version from Cargo.toml
CURRENT_VERSION=$(grep -Po '(?<=^version = ")[^"]*' Cargo.toml)
PACKAGE_NAME=$(grep -Po '(?<=^name = ")[^"]*' Cargo.toml | head -1)
echo "current_version=$CURRENT_VERSION" >> $GITHUB_OUTPUT
echo "package_name=$PACKAGE_NAME" >> $GITHUB_OUTPUT

# Decide based on the registry, not on the local git tag.
# A 200 from crates.io means the package@version is already published; otherwise we should
# publish. This makes the pipeline self-healing if a previous publish was silently skipped
# (see docs/case-studies/issue-25/README.md).
STATUS=$(curl -sS -o /dev/null -w '%{http_code}' "https://crates.io/api/v1/crates/${PACKAGE_NAME}/${CURRENT_VERSION}")
echo "crates.io HTTP status for ${PACKAGE_NAME}@${CURRENT_VERSION}: ${STATUS}"
if [ "$STATUS" = "200" ]; then
echo "Version ${PACKAGE_NAME}@${CURRENT_VERSION} already on crates.io, skipping publish"
echo "should_release=false" >> $GITHUB_OUTPUT
else
echo "Version ${PACKAGE_NAME}@${CURRENT_VERSION} is NOT on crates.io, will publish"
echo "should_release=true" >> $GITHUB_OUTPUT
fi
env:
HAS_FRAGMENTS: ${{ steps.bump_type.outputs.has_fragments }}
run: node scripts/check-release-needed.mjs

- name: Collect changelog and bump version
id: version
if: steps.version_check.outputs.should_release == 'true' && steps.bump_type.outputs.has_fragments == 'true'
if: steps.release_check.outputs.should_release == 'true' && steps.release_check.outputs.skip_bump != 'true'
working-directory: ./rust
run: |
node scripts/version-and-commit.mjs \
--bump-type "${{ steps.bump_type.outputs.bump_type }}"

- name: Get current version
id: current_version
if: steps.version_check.outputs.should_release == 'true'
if: steps.release_check.outputs.should_release == 'true'
working-directory: ./rust
run: |
VERSION=$(grep -Po '(?<=^version = ")[^"]*' Cargo.toml)
echo "version=$VERSION" >> $GITHUB_OUTPUT

- name: Build release
if: steps.version_check.outputs.should_release == 'true'
if: steps.release_check.outputs.should_release == 'true'
working-directory: ./rust
run: cargo build --release

- name: Publish to crates.io
if: steps.version_check.outputs.should_release == 'true'
if: steps.release_check.outputs.should_release == 'true'
working-directory: ./rust
env:
CARGO_REGISTRY_TOKEN: ${{ secrets.CARGO_REGISTRY_TOKEN || secrets.CARGO_TOKEN }}
@@ -351,7 +338,7 @@ jobs:
exit 1
fi
# `cargo publish` exits non-zero on retry if the version already exists; we tolerate that
# because the registry probe in version_check already proved the version is missing, so a
# because the registry probe in release_check already proved the version is missing, so a
# late "already exists" error means a parallel run won the race -- treat as success.
if ! OUT=$(cargo publish 2>&1); then
echo "$OUT"
@@ -376,7 +363,7 @@ jobs:
echo "$OUT"

- name: Create GitHub Release
if: steps.version_check.outputs.should_release == 'true'
if: steps.release_check.outputs.should_release == 'true'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
working-directory: ./rust
78 changes: 78 additions & 0 deletions docs/case-studies/issue-31/README.md
@@ -0,0 +1,78 @@
# Issue 31 Case Study: CI/CD Release Failures

Issue: https://github.com/link-foundation/lino-objects-codec/issues/31

PR: https://github.com/link-foundation/lino-objects-codec/pull/32

## Preserved Evidence

The failing and suspicious runs were downloaded before changes were made:

| Language | Run | Created | Conclusion | Evidence |
| ---------- | ----------- | -------------------- | ---------- | ----------------------------------------------------------- |
| Rust | 25286485825 | 2026-05-03T17:55:55Z | success | `run-25286485825-rust.json`, `run-25286485825-rust.log` |
| JavaScript | 25286485840 | 2026-05-03T17:55:55Z | failure | `run-25286485840-js.json`, `run-25286485840-js.log` |
| Python | 25286485829 | 2026-05-03T17:55:55Z | failure | `run-25286485829-python.json`, `run-25286485829-python.log` |

Copies of the raw logs were also saved under `ci-logs/` at the repository root during the investigation.

## Findings

### Rust

The Rust workflow had two false-success paths.

First, the release probe treated every non-200 crates.io response as "not published". In run `25286485825`, crates.io returned HTTP 403 for `lino-objects-codec@0.2.0` at `run-25286485825-rust.log:2853`. The workflow then tried to publish a version that already existed, and `cargo publish` reported `crate lino-objects-codec@0.2.0 already exists` at `run-25286485825-rust.log:2935`. The workflow downgraded that error to a warning at `run-25286485825-rust.log:2936`, so the step still passed.
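
A minimal sketch of a stricter probe classification, to illustrate the difference between "missing" and "unknown". The function name `classifyCratesProbe` is hypothetical and is not taken from the repository's scripts. It also assumes that crates.io answers 404 for a version that does not exist.

```javascript
// Hypothetical helper (not the repository's code): map the HTTP status of the
// crates.io version probe to a verdict.
// Assumption: 200 means the version exists and 404 means it does not.
// Anything else (403, 429, 5xx, ...) says nothing about publication state.
function classifyCratesProbe(status) {
  if (status === 200) return 'published';
  if (status === 404) return 'missing';
  return 'ambiguous';
}

// The 403 seen in run 25286485825 is no longer read as "not published".
console.log(classifyCratesProbe(403)); // ambiguous
```

A caller would fail the job on `ambiguous` instead of continuing to `cargo publish`.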

Second, GitHub release creation did not inspect the `gh api` exit code. The log shows `Validation Failed (HTTP 422)` and `already_exists` at `run-25286485825-rust.log:2951`, followed immediately by a success message at `run-25286485825-rust.log:2952`.
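
The idempotent handling this calls for can be sketched as a pure function over the command's exit code and captured output. The name `interpretReleaseResult` and its input shape are assumptions made for illustration. Only the `HTTP 422` and `already_exists` markers come from the log quoted above.

```javascript
// Hypothetical helper (not the repository's code): decide what a finished
// `gh api` release-creation call means.
function interpretReleaseResult({ exitCode, output }) {
  // A zero exit code is the only case that counts as a new release.
  if (exitCode === 0) return 'created';

  // The release already exists: safe to treat as idempotent, but it should
  // be reported as such rather than as a fresh success.
  if (/HTTP 422/.test(output) && /already_exists/.test(output)) {
    return 'already_exists';
  }

  // Every other failure must fail the step.
  throw new Error(`gh api failed with exit code ${exitCode}: ${output}`);
}

console.log(
  interpretReleaseResult({
    exitCode: 1,
    output: 'Validation Failed (HTTP 422) already_exists',
  })
); // already_exists
```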

The Rust workflow also referenced `steps.bump_type.outputs.has_fragments`, but `rust/scripts/get-bump-type.mjs` never emitted that output. That made automatic releases unable to distinguish "new changelog fragments need a bump" from "current version should be republished because it is missing from crates.io".
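
The two cases can be separated with a small decision table. The sketch below is an illustration under assumptions: the name `decideRelease`, its parameters, and the returned field names are hypothetical, and the actual logic in `check-release-needed.mjs` may differ.

```javascript
// Hypothetical decision table (not the repository's code).
// publishedOnRegistry: the current Cargo.toml version is already on crates.io.
// hasFragments: unreleased changelog fragments exist.
function decideRelease({ publishedOnRegistry, hasFragments }) {
  if (hasFragments) {
    // New changes are pending: bump the version, then release.
    return { shouldRelease: true, skipBump: false };
  }
  if (!publishedOnRegistry) {
    // Nothing new, but the current version never reached the registry:
    // republish it as-is without bumping.
    return { shouldRelease: true, skipBump: true };
  }
  // Already published and nothing new: no release.
  return { shouldRelease: false, skipBump: false };
}

console.log(decideRelease({ publishedOnRegistry: false, hasFragments: false }));
```

Without a reliable `hasFragments` input, the first two branches collapse into one, which is the ambiguity described above.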

### JavaScript

The JavaScript release failed because npm setup silently continued after npm failed to upgrade. In run `25286485840`, npm started at `10.9.7` (`run-25286485840-js.log:4091`), `npm install -g` failed with `Cannot find module 'promise-retry'` (`run-25286485840-js.log:4092-4093`), and setup still printed `Updated npm version: 10.9.7` (`run-25286485840-js.log:4108`).

The publish step then failed with npm E404/access symptoms and Changesets' `packages failed to publish` marker (`run-25286485840-js.log:4187`, `run-25286485840-js.log:4226`). The existing publish wrapper correctly exhausted retries and failed the workflow at `run-25286485840-js.log:4396`, but its operator guidance did not cover E404 PUT responses that usually mean package access or trusted-publisher configuration needs attention.
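
A rough sketch of how such output could be recognized. The function name `looksLikeE404Put` is hypothetical, and the sample text only illustrates the usual shape of npm's error lines; it is not copied from the preserved log.

```javascript
// Hypothetical detector (not the repository's code): true when publish output
// contains an E404 error code and a 404 line for a PUT request.
function looksLikeE404Put(log) {
  const hasE404 = /\bE404\b/.test(log);
  const hasPut404 = /404[^\n]*\bPUT\b/.test(log);
  return hasE404 && hasPut404;
}

// Illustrative sample only.
const sample = [
  'npm error code E404',
  'npm error 404 Not Found - PUT https://registry.npmjs.org/lino-objects-codec - Not found',
].join('\n');

console.log(looksLikeE404Put(sample)); // true
```

Matching on the PUT verb matters because a 404 on a GET usually means a dependency could not be fetched, which calls for different guidance.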

npm's trusted publishing documentation currently requires npm CLI 11.5.1 or later and Node.js 22.14.0 or later:

https://docs.npmjs.com/trusted-publishers
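
These minimums can be enforced with a numeric version comparison. The sketch below is a simplified illustration: `parseVersion` and `isAtLeast` are hypothetical names, and pre-release suffixes are ignored. It is not the code in `setup-npm.mjs`.

```javascript
// Hypothetical helpers (not the repository's code).
// Parse "v22.14.0" or "11.5.1" into [major, minor, patch] numbers.
function parseVersion(version) {
  return version
    .replace(/^v/, '')
    .split('.')
    .map((part) => parseInt(part, 10) || 0);
}

// Compare component by component as numbers, so that 22.3.0 < 22.14.0.
function isAtLeast(actual, minimum) {
  const a = parseVersion(actual);
  const m = parseVersion(minimum);
  for (let i = 0; i < 3; i += 1) {
    const diff = (a[i] || 0) - (m[i] || 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // equal versions satisfy the minimum
}

// The npm version left behind in run 25286485840 does not qualify.
console.log(isAtLeast('10.9.7', '11.5.1')); // false
```

A setup script would exit non-zero when either check fails instead of printing the unchanged version as if the upgrade had worked.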

### Python

The PyPI publish failure was a trusted-publisher configuration problem. The `pypa/gh-action-pypi-publish` action already emitted the useful root cause: `invalid-publisher` at `run-25286485829-python.log:1714`, plus PyPI's troubleshooting URL at `run-25286485829-python.log:1734`.

The workflow then ran a separate diagnostic step starting at `run-25286485829-python.log:1737`. That duplicated the action's own troubleshooting output and created a second red step, which is exactly what issue 31 asked to remove.

PyPI documents `invalid-publisher` as an OIDC claim mismatch or missing trusted-publisher configuration:

https://docs.pypi.org/trusted-publishers/troubleshooting/

## Template Comparison

The Rust template already has the right shape: `get-bump-type` emits `has_fragments`, and `check-release-needed` makes the release decision from registry state plus changelog-fragment state.

The Python template does not add a separate diagnostic-only step after `pypa/gh-action-pypi-publish`; it relies on the publishing action's own error output.

The JavaScript template has the same class of npm setup issue. The current template documents the `promise-retry` runner image failure, but its fallback can still install npm 11.4.2, and its success check accepts any npm 11.x, including versions below the 11.5.1 minimum. I filed the upstream template issue here:

https://github.com/link-foundation/js-ai-driven-development-pipeline-template/issues/48

## Changes Made

- Added `rust/scripts/check-release-needed.mjs` and `rust/scripts/crates-release-helpers.mjs`.
- Added `rust/scripts/crates-release-helpers.test.mjs` and `experiments/issue-31/test-rust-release-helpers.mjs`.
- Made `rust/scripts/get-bump-type.mjs` emit `fragment_count` and `has_fragments`.
- Updated the Rust release workflow to fail on ambiguous crates.io probe responses and to skip the version bump only when the current Cargo.toml version is missing from crates.io and no changelog fragments exist.
- Made `rust/scripts/create-github-release.mjs` check `gh api` output and treat an already-existing release as idempotent instead of printing a false success.
- Hardened `js/scripts/setup-npm.mjs` so it enforces Node.js >= 22.14.0 and npm >= 11.5.1, dynamically selects a supported npm 11 tarball fallback, and exits if the final npm version is still too old.
- Extended npm publish analysis so E404 PUT failures print trusted-publisher and package-access guidance.
- Removed the separate Python PyPI diagnostic steps.

## Remaining Operator Configuration

The code can make the workflows honest, but registry settings still need to match the repository:

- npm: configure the `lino-objects-codec` package trusted publisher for this repository and `.github/workflows/js.yml`, or fix package access if the package is not owned by the publishing account.
- PyPI: configure the project trusted publisher for owner `link-foundation`, repository `lino-objects-codec`, workflow `python.yml`, and the configured environment value.
- crates.io: keep `CARGO_REGISTRY_TOKEN` or `CARGO_TOKEN` valid with publish/update permissions for the crate.