Package-versions existence check produces false positives when CodeArtifact has a local-version build #523

@sethfitz

Description

Summary

reusable-check-python-package-versions.yaml uses pip download "pkg==X" to test whether version X is already published in CodeArtifact. Under PEP 440, when an == specifier carries no local-version label, any local label on a candidate version is ignored during matching, so a published build with a local segment (e.g. 0.1.1.dev1+spark.codegen) satisfies the specifier ==0.1.1.dev1. The workflow then concludes that the version "already exists" and fails the PR, even though the bare 0.1.1.dev1 is unpublished and would not collide on upload.

Observed failure

  • packages/overture-schema-system/src/overture/schema/system/__about__.py declares __version__ = "0.1.1.dev1".

  • CodeArtifact contains overture-schema-system 0.1.1.dev1+spark.codegen (published from the spark-codegen branch).

  • The "Fail if any of the new versions already exist in the repo" step runs:

    pip download "overture-schema-system==0.1.1.dev1" --index-url $INDEX_URL --no-deps ...

    pip finds 0.1.1.dev1+spark.codegen, strips the local segment, treats it as 0.1.1.dev1, and exits 0. The shell guard interprets that success as "version exists" and fails the workflow.

Root cause

PEP 440 specifier matching, confirmed with the packaging library:

spec: ==0.1.1.dev1
  '0.1.1.dev1'                -> True
  '0.1.1.dev1+spark.codegen'  -> True   <-- collision
  '0.1.1.dev3+spark.codegen'  -> False

spec: ===0.1.1.dev1
  '0.1.1.dev1'                -> True
  '0.1.1.dev1+spark.codegen'  -> False
  '0.1.1.dev3+spark.codegen'  -> False

== is documented to ignore local-version labels on candidates when the specifier itself carries none. The check needs literal-string equality against the version we are about to publish, not PEP 440 equality.

Recommended actions

1. Switch the existence check to === (primary fix)

.github/workflows/reusable-check-python-package-versions.yaml:158:

-output=$(uv run pip download "${package}==${after}" ...)
+output=$(uv run pip download "${package}===${after}" ...)

=== is PEP 440 arbitrary equality (literal-string match). pip supports it, emits the same "Could not find a version" message on miss, and correctly distinguishes 0.1.1.dev1 from 0.1.1.dev1+spark.codegen. metadata.version() returns the canonical PEP 440 form, so a literal compare against the registry is sound.

2. Fix plural typo in the error-message guard

.github/workflows/reusable-check-python-package-versions.yaml:161:

-  "${output,,}" != *"no matching distributions"*
+  "${output,,}" != *"no matching distribution"*

pip emits "No matching distribution" (singular). The plural form never matches today; the workflow only passes the miss case because the sibling "could not find a version" check happens to fire first. Because the plural pattern never matches, the guard effectively depends on the "could not find a version" substring alone, and a future pip wording change to that message would silently break it.
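
To make the singular/plural difference concrete, here is a minimal bash sketch of the miss check. The helper name pip_reported_miss and the sample message are illustrative assumptions, not code or logs from the workflow; the sample follows pip's usual "No matching distribution found for ..." wording.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the workflow): succeeds when pip's output
# says the requested version was not found in the index.
pip_reported_miss() {
  local lowered="${1,,}"   # lowercase once (bash 4+), so the patterns stay lowercase
  [[ "$lowered" == *"could not find a version"* ]] ||
    [[ "$lowered" == *"no matching distribution"* ]]
}

# Illustrative message in pip's singular wording (assumed sample, not a captured log).
miss="ERROR: No matching distribution found for overture-schema-system===0.1.1.dev1"

pip_reported_miss "$miss" && echo "miss detected"

# The plural pattern from the current guard does not match the same message.
[[ "${miss,,}" == *"no matching distributions"* ]] || echo "plural pattern: no match"
```

With the singular pattern in place, either substring is enough on its own to recognize a miss, so the guard no longer relies on a single pip message.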

3. Skip entries where after is null

When a package is removed between commits, compare() emits {"package": "X", "before": "...", "after": null} and the shell loop runs pip download "X===null". pip rejects the spec and prints "could not find a version", so the loop passes, but for the wrong reason. Guard the loop so that deletions are not treated as new-version publications:

+ if [[ "$after" == "null" ]]; then
+   echo "Package ${package} was removed; skipping existence check."
+   continue
+ fi
  exit_code=0
  output=$(uv run pip download ...)
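
The null guard and the === specifier can be exercised together without touching the registry. The sketch below is only an illustration: the helper name existence_spec and the two sample entries are assumptions, the entries are fed as plain "package after" pairs rather than through the workflow's actual JSON handling, and pip is never invoked.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the workflow): prints the requirement string to
# probe, or returns 1 when the package was removed and the probe should be skipped.
existence_spec() {
  local package="$1" after="$2"
  if [[ "$after" == "null" ]]; then
    echo "Package ${package} was removed; skipping existence check." >&2
    return 1
  fi
  # === asks for a literal-string match, so local-version builds do not collide.
  printf '%s===%s\n' "$package" "$after"
}

# Made-up sample entries standing in for the compare() output.
while read -r package after; do
  spec=$(existence_spec "$package" "$after") || continue
  echo "would run: pip download \"${spec}\" --no-deps"
done <<'EOF'
overture-schema-system 0.1.1.dev1
removed-package null
EOF
```

Only the first entry produces a probe; the removed package is reported on stderr and skipped before any requirement string is built.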
