[TEST] CI: cross-repo PR matrix vs dandi-archive#2784 (#1851)
Draft
yarikoptic wants to merge 16 commits into master from
Conversation
…_upload->singlepart_items
The split of a batch into "items above the 5GB threshold" and "items at or below" is really a split between items routed through the multipart upload endpoint and items uploaded with a single signed PUT. Naming them after the upload mechanism rather than the size makes the subsequent capability/gating logic read more naturally. Pure rename, no behavior change.
Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dandi-archive#2784 (still open) introduces the ``/uploads/zarr/initialize/`` endpoint required to upload zarr chunks larger than 5 GiB. Until that lands and is deployed, posting to the endpoint returns 404, and the multi-part code path here would surface that as an opaque ``HTTP404Error`` from inside ``_multipart_upload``.

Add a lazy ``DandiAPIClient.supports_zarr_multipart_upload`` cached property that probes the endpoint once with an empty POST body: 404 means the server lacks the feature; anything else (400 for the bogus payload, 401/403 for auth) means it is present.

In ``ZarrAsset.iter_upload``, right after the batch is split into ``multipart_items`` and ``singlepart_items``, raise an ``UploadError`` when there are multipart items and the server does not support them. The error reports the count and aggregate size of the oversized chunks plus the path of the largest one, mirroring the wording proposed in #1827.

Marked with a TODO: the check (and the cached property) can be removed once all supported servers ship dandi-archive#2784 (> 0.23.0).

Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
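A minimal sketch of the capability probe described above. The `post` callable is a stand-in for the real HTTP session (the actual `DandiAPIClient` wiring differs); only the 404-vs-anything-else decision and the `cached_property` laziness are the point.

```python
from functools import cached_property
from typing import Callable


class DandiAPIClient:
    """Illustrative sketch, not the real client: `post` takes a URL,
    performs an empty-body POST, and returns the HTTP status code."""

    def __init__(self, api_url: str, post: Callable[[str], int]) -> None:
        self.api_url = api_url.rstrip("/")
        self._post = post

    @cached_property
    def supports_zarr_multipart_upload(self) -> bool:
        # One probe per client instance: 404 -> endpoint absent;
        # any other status (400 for the bogus empty payload,
        # 401/403 for auth) -> endpoint present.
        status = self._post(f"{self.api_url}/uploads/zarr/initialize/")
        return status != 404


# A server without dandi-archive#2784 answers 404:
old = DandiAPIClient("https://api.example.org/api", post=lambda url: 404)
assert old.supports_zarr_multipart_upload is False

# A server that ships it rejects the empty payload with 400:
new = DandiAPIClient("https://api.example.org/api", post=lambda url: 400)
assert new.supports_zarr_multipart_upload is True
```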
…umulative
``_multipart_upload`` (used here to upload zarr chunks larger than 5 GiB) yields progress dicts whose ``current`` is the number of bytes uploaded *within the chunk currently being transferred*. The surrounding ``ZarrAsset.iter_upload`` loop reports ``current`` as the number of bytes uploaded *across the whole zarr*. Yielded as-is, the inner per-file values caused ``current`` to jump backwards each time a new multipart chunk started, breaking progress bars driven from that field.

Wrap the inner generator: when an "uploading" status carries a ``current`` field, translate it by the cumulative bytes uploaded so far before re-yielding; pass other status dicts through unchanged. The trailing post-loop yield is dropped — the last per-part translated yield already reports the full cumulative size for that chunk.

Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
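The wrapping described here can be sketched as a small generator (illustrative names; the real code lives inline in `ZarrAsset.iter_upload`):

```python
def translate_progress(inner, bytes_already_uploaded):
    """Re-yield status dicts from an inner upload generator, shifting
    per-file 'current' values by the bytes uploaded in earlier chunks
    so 'current' is cumulative across the whole zarr."""
    for status in inner:
        if status.get("status") == "uploading" and "current" in status:
            # Translate the per-file byte count to a whole-zarr count.
            status = {
                **status,
                "current": status["current"] + bytes_already_uploaded,
            }
        yield status


# 10_000 bytes were uploaded before this chunk started; the inner
# generator reports per-file progress of 1_000 then 5_000 bytes.
inner = iter([
    {"status": "uploading", "current": 1_000},
    {"status": "uploading", "current": 5_000},
    {"status": "done"},
])
out = list(translate_progress(inner, 10_000))
assert [s.get("current") for s in out] == [11_000, 15_000, None]
```

Without the translation, a progress bar fed from `current` would drop from 10 000 back to 1 000 when the new chunk began.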
The two ``ai_generated`` tests added in 9bc2976 patch ``ZARR_LARGE_CHUNK_THRESHOLD`` down so chunks are routed through ``_multipart_upload`` against whatever server backs ``new_dandiset``. The local docker dandi-archive image does not yet ship dandi-archive#2784, so the new ``supports_zarr_multipart_upload`` capability check raises ``UploadError`` before ``_multipart_upload`` is ever called, and the tests fail without exercising what they claim to.

Probe the live client for capability and ``pytest.skip`` only when the endpoint is missing — keeps the tests effective on setups whose server already carries dandi-archive#2784.

Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
force-pushed 5594b9d to be2b5d3
force-pushed be2b5d3 to e5625f7
Member · Author
woohoo -- testing failed, and this smells like it's coming from the server, so it indicates that we did pick up the server changes from the PR!
edit/note: that PR in dandi-archive was already merged! so we would get errors even if testing against the master version of the image
Add a standalone GitHub Actions workflow,
`.github/workflows/cross-repo-pr-tests.yml`, that runs an extra
integration matrix when a dandi-cli PR description includes lines such
as
dandi-archive PR: dandi/dandi-archive#2784
dandi-schema PR: #321
The workflow has three jobs:
* `discover-cross-repo-prs` parses the PR body and resolves each marker
to repo+sha (handles forks via `gh api`).
* `build-archive-image` checks out the dandi-archive PR head (or master
if only a schema PR is given), optionally rewrites the pinned
`dandischema` line in `pyproject.toml` to a git URL pointing at the
schema PR head, builds with `dev/django-public.Dockerfile`, and
exports the image as a workflow artifact.
* `test-cross-repo` loads that image, force-installs `dandischema`
from the PR head into the runner venv, sets
`DANDI_TESTS_PULL_DOCKER_COMPOSE=0` so docker compose uses the
loaded image instead of pulling Docker Hub, and runs the
`--dandi-api` test suite.
Triggered on `pull_request: types: [opened, edited, synchronize,
reopened]` so adding or removing a marker re-fires *only* this
workflow, not the full test matrix in `run-tests.yml`. A per-PR
`concurrency:` group cancels older in-flight runs when the body or
commits change.
A PR template advertises the marker convention; the design notes
under `.specify/specs/cross-repo-pr-matrix.md` cover the rationale
and edge cases (forks, `pull_request` vs `pull_request_target`,
schema-version pin coordination, future generalization to an
organization-level reusable workflow).
Pattern adapted from bids-standard/bids-examples
`.github/workflows/validate_datasets.yml`.
Co-Authored-By: Claude Code 2.1.114 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
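The marker convention above could be expressed as the following hypothetical Python equivalent of the workflow's parsing step (the regex and `discover_markers` name are assumptions; the actual `discover-cross-repo-prs` job parses the body in shell and resolves fork heads via `gh api`):

```python
import re

# Accepts "owner/repo#N", a full pull URL, or a bare "#N" after the
# "<project> PR:" prefix, one marker per line.
MARKER = re.compile(
    r"^(?P<proj>dandi-archive|dandi-schema) PR:\s*"
    r"(?:(?P<repo>[\w.-]+/[\w.-]+)#"
    r"|https://github\.com/(?P<url_repo>[\w.-]+/[\w.-]+)/pull/"
    r"|#)"
    r"(?P<num>\d+)\s*$",
    re.MULTILINE,
)


def discover_markers(body: str) -> dict:
    """Map project name -> (explicit repo or None, PR number)."""
    found = {}
    for m in MARKER.finditer(body):
        found[m.group("proj")] = (
            m.group("repo") or m.group("url_repo"),
            int(m.group("num")),
        )
    return found


body = "dandi-archive PR: dandi/dandi-archive#2784\ndandi-schema PR: #321"
assert discover_markers(body) == {
    "dandi-archive": ("dandi/dandi-archive", 2784),
    "dandi-schema": (None, 321),
}
```

A bare `#N` marker (repo `None`) would default to the project's canonical repo when the job resolves it to a repo+sha.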
… jobs
Set `if: false` on the top-level job of `run-tests.yml`, `lint.yml`, `typing.yml`, `codeql.yml`, and `docs.yml` so we don't burn CI minutes on the full PR check suite while iterating on the new `cross-repo-pr-tests.yml` workflow. The cross-repo workflow itself is unaffected. This commit is intended to be dropped before this PR (or its successor) lands on master.
Co-Authored-By: Claude Code 2.1.114 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
force-pushed e5625f7 to 39f51d9
Summary
Trial run of the new cross-repo PR matrix CI feature (see commit
"CI: matrix runs against cross-repo dandi-archive / dandi-schema PRs").
The marker line above should be parsed by the new
`discover-cross-repo-prs` job; the `build-archive-image` job should
then build the dandi-archive Docker image from the head of
dandi/dandi-archive#2784, and the `test-cross-repo` job should run the
`--dandi-api` test suite against it.
This PR also stacks on top of #1839 (zarr multipart upload), which is
the change we ultimately want to validate against #2784 once the
cross-repo plumbing is confirmed working.
Why a separate PR
To exercise the new workflow against a real upstream change before we
merge and trust it to gate real PRs.
TODOs before anything lands on master
- Review the `discover-cross-repo-prs`, `build-archive-image`, and
  `test-cross-repo` job logs.
- Reduce this branch to the `.github/` changes (drop the zarr-multipart
  commits from "Use multi-part upload on zarr chunks larger than 5GB"
  #1839) and re-target master as the real "land the cross-repo feature"
  PR. Or: cherry-pick the two CI commits onto a fresh branch off master
  and open that as the merge candidate.
- Drop the `if: false` commit (342fa181, "CI [drop-before-merge]:
  temporarily disable the existing test matrix") before merging, so the
  regular test matrix runs again.
- Add the marker `dandi-archive PR:
  https://github.com/dandi/dandi-archive/pull/2784` to "Use multi-part
  upload on zarr chunks larger than 5GB" #1839 so that PR itself gets
  cross-repo coverage against #2784.
Design notes
See `.specify/specs/cross-repo-pr-matrix.md` for the full design,
including how the dandi-archive Docker image override works (build
locally from `dev/django-public.Dockerfile`, `docker image save`,
upload as artifact, `docker image load` in the test job, with
`DANDI_TESTS_PULL_DOCKER_COMPOSE=0` so `docker compose pull` doesn't
clobber it) and how dandischema is overridden in both the runner venv
and inside the dandi-archive container build.
Test plan
- `discover-cross-repo-prs` parses `dandi-archive PR: …` from this
  body and outputs `archive_repo=dandi/dandi-archive`,
  `archive_ref=<sha of #2784 head>`, `should_run=true`.
- `build-archive-image` succeeds and uploads the
  `dandiarchive-api-cross-repo` artifact.
- `test-cross-repo` loads the artifact, shows the right image SHA
  under "Show cross-repo overrides", and runs `pytest --dandi-api`
  against it.
- The regular `test` job is skipped (`if: false`); without markers the
  downstream jobs would be skipped (`should_run=false` short-circuits
  the gating).