Use multi-part upload on zarr chunks larger than 5GB#1839

Open
jjnesbitt wants to merge 18 commits into master from zarr-multipart-upload
Conversation

@jjnesbitt
Member

Depends on and adds support for dandi/dandi-archive#2784

Comment thread dandi/files/zarr.py Fixed
@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 82.25806% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.27%. Comparing base (43f02a3) to head (f1f48da).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
dandi/files/bases.py 83.63% 9 Missing ⚠️
dandi/dandiapi.py 50.00% 5 Missing ⚠️
dandi/files/zarr.py 78.26% 5 Missing ⚠️
dandi/consts.py 0.00% 1 Missing ⚠️
dandi/support/digests.py 80.00% 1 Missing ⚠️
dandi/tests/test_files.py 96.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1839      +/-   ##
==========================================
+ Coverage   76.26%   76.27%   +0.01%     
==========================================
  Files          87       87              
  Lines       12512    12575      +63     
==========================================
+ Hits         9542     9592      +50     
- Misses       2970     2983      +13     
Flag Coverage Δ
unittests 76.27% <82.25%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@jjnesbitt jjnesbitt force-pushed the zarr-multipart-upload branch from b427dc7 to 0e505bd on April 23, 2026 21:10
Comment thread dandi/files/zarr.py Fixed
Comment thread dandi/files/zarr.py Outdated
Comment thread dandi/files/zarr.py Dismissed
@jjnesbitt jjnesbitt force-pushed the zarr-multipart-upload branch from c1843cd to 367bd24 on April 24, 2026 16:20
@jjnesbitt jjnesbitt marked this pull request as ready for review April 24, 2026 21:04
@jjnesbitt jjnesbitt requested a review from yarikoptic April 24, 2026 21:05
@jjnesbitt
Member Author

jjnesbitt commented Apr 24, 2026

@yarikoptic I believe this is in a good state.

Regarding the test failures, it's worth noting that tests against the API are almost guaranteed to fail, as they blindly pull the dandiarchive/dandiarchive-api image from Docker Hub, and the required changes to make this work are contained in dandi/dandi-archive#2784. There is also a new release of RabbitMQ that breaks celery, which means that any tests depending on the API updating the status of things behind the scenes via celery (a good amount) are broken. We're looking into it.

@yarikoptic
Member

Thank you @jjnesbitt! I meanwhile submitted

to mitigate the CI failures. Claude identified the same "cause" and provided some of its "aiplanation", but it could be slop, so beware.

@yarikoptic yarikoptic added the minor (Increment the minor version when merged) and zarr labels Apr 25, 2026
@yarikoptic yarikoptic force-pushed the zarr-multipart-upload branch from 367bd24 to 83db4f4 on April 25, 2026 12:03
@yarikoptic
Member

@jjnesbitt FYI - I have rebased on the now-green master

@jjnesbitt jjnesbitt force-pushed the zarr-multipart-upload branch from 83db4f4 to 7121c31 on April 27, 2026 17:16
@jjnesbitt
Member Author

@yarikoptic It would be good to get your review/approval on this PR. The only remaining test failure is the following:

FAILED dandi/tests/test_files.py::test_upload_zarr_large_chunks - dandi.exceptions.HTTP404Error: Error 404 while sending POST request to http://localhost:8000/api/uploads/zarr/initialize/: <!DOCTYPE html>

This is due to the fact that the version of the API that the tests are being run against doesn't contain the new URLs required for this change.

@yarikoptic
Member

re tests:

FAILED dandi/tests/test_files.py::test_upload_zarr_large_chunks - dandi.exceptions.HTTP404Error: Error 404 while sending POST request to http://localhost:8000/api/uploads/zarr/initialize/: <!DOCTYPE html>

This is due to the fact that the version of the API that the tests are being run against doesn't contain the new URLs required for this change.

So let's make such test(s) conditional and skip when run against a released/older version of the archive. Is there anything in /server-info which would indicate that this feature is present?

@jjnesbitt
Member Author

So let's make such test(s) conditional and skip when run against a released/older version of the archive. Is there anything in /server-info which would indicate that this feature is present?

We could do that, but it would achieve very little: as soon as the feature is merged to master and the new image is built, the tests will immediately be re-enabled. There's nothing at the /server-info endpoint at the moment to indicate the change.

Since the API change is backwards compatible, I suggest we simply merge the API change once it's ready (hopefully soon), at which point the tests will use the new image, and things are back to normal.

@yarikoptic
Member

re server-capability conditioning -- ideally the client would be aware of that at run time too (not only in the tests), so it could refuse/error when trying to upload 5TB files, as I implemented a while back in #1827 and will be merging shortly, instead of hitting a non-existent endpoint at EMBER and LINC (if they keep it afloat). I will leave it to you to decide how the client could discover the server capability here:

  • could be just version-based, if you are positive that the next minor release would have it
  • I thought I filed an issue proposing to add some 'features: [ ]' listing to /api/info, but I fail to find it
  • could probe whether the expected endpoint exists (once per instance) and, if not, assume the feature is absent.

@yarikoptic
Member

Sorry @jjnesbitt, I will not be merging a PR with totally red CI; it is out of the question.

@yarikoptic
Member

Ideally it should actually test that the server works, with at least some TEMP commit which would somehow provision the container with that new mighty version of the server from the PR. But I guess that is for the future, to do something like I did for bids-examples, where in the PR description we can point to prospective PRs for bids-specification and bids-validator to test against: https://github.com/bids-standard/bids-examples/blob/master/.github/workflows/validate_datasets.yml#L35

@jjnesbitt
Member Author

Sorry @jjnesbitt , I will not be merging a PR with totally Red CI, it is out of question.

That is not what I was suggesting. Once the API PR is merged, this CI will become green, because it will be using the latest image of dandiarchive/dandiarchive-api.

@yarikoptic
Member

ok, let's wait then.

Member

@yarikoptic yarikoptic left a comment


Some minor ones about imports, but also a question to clarify why not to use a smaller threshold.

Comment thread dandi/tests/test_files.py Outdated
Comment thread dandi/files/zarr.py
@jjnesbitt jjnesbitt force-pushed the zarr-multipart-upload branch from 9f79117 to 3be1ccd on April 27, 2026 21:44
@jjnesbitt jjnesbitt changed the title Use multi-part upload on zarr chunks larger than 5GB Use multi-part upload on zarr chunks larger than 100MB Apr 28, 2026
@jjnesbitt
Member Author

To avoid complicated conditioning of this feature on the archive's capability (which downstream archives may quickly gain anyway), we could do the following:

  1. Merge and release this change
  2. In the prod instance, set cli-minimal-version to the newly released version
  3. In the downstream archives, add the newly released version to cli-bad-versions, so that anyone with a CLI version that is newer than what is supported gets a CLI error.

This removes any conditional logic based on archive capabilities in the CLI (which should be discouraged anyway) and puts the compatibility check solely on the archive side. Once LINC/EMBER merge in the supported change, the CLI version can be removed from cli-bad-versions, and things are back to normal.

@yarikoptic Do you have objections to this?

Comment thread dandi/support/digests.py Outdated
@yarikoptic
Member

yes, I have objections, as

  • I think the conditioning should not be complicated; let me know if I should take a stab at it,
  • again -- it seems this would require red CI to be merged/released, as the current CI is red (didn't check details)?
  • we do have other instances running, so I do not want to break new client release functioning with them ...

But before we jump either way, I think we still need to elaborate on the overall details of the change in behavior! Thanks for quickly addressing the question of diverging checksum/etag'ing -- I left a comment there, as I think the multipart upload etag computation must be consistent with how we compute it for regular blobs, shouldn't it?

And I (at least) still do not have a complete picture how it

  • would all be consistent with existing zarrs in the archive (which already have >100MB single-part uploads),
  • zarr-checksum library behavior (it does not do any multipart splitting, and we do not want the library and the uploader/archive producing different checksums), etc.

Potential "resolutions" to consider:

  • switch back to the 5GB limit for deciding to go multipart with zarr blobs -- even though I was the one who asked/reminded about the AWS recommendation on when to switch to multipart uploads.
    • pros: existing zarrs whose large single-part uploads succeeded stay consistent; huge parts can still be finalized via multipart; if we fix up the zarr-checksum library to behave similarly etc., we have a consistent state
    • cons: single-part uploads are more fragile etc.
  • switch to this new logic of a 100MB limit for going multipart with zarrs
    • pros: consistent with blobs
    • cons: we need to figure out how to make it consistent with the library etc., e.g.
      • it could be some new form of checksum (e.g. zarr-checksum-v2, or zarr-checksum-mp100 where mp100 signals multipart past 100M, or zarr-dandietag-checksum to signal that individual blobs follow dandietag); we would then improve the digester to understand both, keeping the old zarr-checksum for existing zarrs and zarr-checksum-v2 for new uploads. This will require the zarr-checksum library to get support for both, I guess
      • ... ???...

or, once again my question -- am I overcomplicating?
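For concreteness, the etag divergence under discussion comes down to how S3-style multipart ETags differ from a plain MD5. A minimal sketch (the function name is hypothetical and this is an illustration of the S3 scheme, not the actual dandi-etag code, although dandi-etag for blobs follows this construction):

```python
import hashlib

def multipart_etag(data: bytes, part_size: int) -> str:
    """S3-style ETag: for a multipart upload it is the MD5 of the
    concatenated *binary* MD5 digests of each part, suffixed with the
    part count; for a single part it is the plain MD5 of the content."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    if len(parts) <= 1:
        return hashlib.md5(data).hexdigest()
    combined = hashlib.md5(b"".join(hashlib.md5(p).digest() for p in parts))
    return f"{combined.hexdigest()}-{len(parts)}"
```

So the same bytes get a different ETag depending on whether they were uploaded in one part or several, which is why the multipart threshold choice leaks into the zarr checksum.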

@jjnesbitt
Member Author

  • again -- it seems would require red CI to be merged /released as current CI is red (didn't check details)?

And I'll reiterate again: the archive PR is not yet merged, which is why the CI is red. Once that happens, it will be green.

  • we do have other instances running so I do not want to break new client release functioning with them ...

What other clients besides LINC/EMBER? Why is simply excluding the new CLI version not sufficient to support these archives?

  • I think conditioning should not be complicated, let me know if I should take stab at that,

The CLI code for zarr upload is extremely convoluted as it is, adding this conditional logic further exacerbates that. However, if you're really determined to implement this conditional logic then I'll leave it to you.

But before we jump any ways, I think we still need to elaborate on overall details of the change in behaviors! Thanks for quickly addressing the question of diverging checksum/etag'ing -- I left a comment there as I think multipart upload etag compute must be consistent with how we compute for regular blobs, shouldn'it?

And I (at least) still do not have a complete picture how it

  • would all be consistent with existing zarrs in the archive (already having >100MB single part uploads),
  • zarr-checksum library behavior (does not do any multipart splitting, and we do not want different checksums by the library and uploader/archive ) etc.

The zarr-checksum library doesn't need to know specifics of how the digests are computed. As it stands now, we simply pass it the items containing the path and digest, so we control what digests it uses. There should be no issue here.

Potential "resolutions" to consider:

  • switch back to 5GB limit for deciding going multipart with zarr blobs -- even though I was the one who asked/reminded about AWS recommendation on when to switch to multipart uploads.

I think short term this is probably the best solution, as it immediately unblocks people trying to upload zarrs with parts over this limit. The underlying issue should still be re-addressed, but we can hold off until there's not a funding gap looming.

Comment thread dandi/support/digests.py Outdated
@yarikoptic
Member

  • again -- it seems would require red CI to be merged /released as current CI is red (didn't check details)?

And I'll again reiterate, the archive PR is not yet merged, which is why the CI is red. Once that happens, it will be green.

great -- when it gets green we can merge! I feel like I'm in Groundhog Day ;) I guess we are missing each other's point on this one.

  • we do have other instances running so I do not want to break new client release functioning with them ...

What other clients besides LINC/EMBER? Why is simply excluding the new CLI version not sufficient to support these archives?

because it would require THEM to exclude it whenever WE introduce breakage, so I would prefer US to avoid breakage that relies on clients to mitigate, and then re-mitigate whenever we update or release yet another version.

Long term -- it is the client/server couple's job, imho, to agree on acceptable interfaces to be used.

  • I think conditioning should not be complicated, let me know if I should take stab at that,

The CLI code for zarr upload is extremely convoluted as it is, adding this conditional logic further exacerbates that. However, if you're really determined to implement this conditional logic then I'll leave it to you.

gotcha, I will look into that... tomorrow morning.

My only remaining question is about the absent testing against the WiP archive, I guess. Did you have a chance to test it against some instance? What was its /info output? I wonder what the best indicator would be that an instance supports this -- probing the new interface?

But before we jump any ways, I think we still need to elaborate on overall details of the change in behaviors! Thanks for quickly addressing the question of diverging checksum/etag'ing -- I left a comment there as I think multipart upload etag compute must be consistent with how we compute for regular blobs, shouldn'it?
And I (at least) still do not have a complete picture how it

  • would all be consistent with existing zarrs in the archive (already having >100MB single part uploads),
  • zarr-checksum library behavior (does not do any multipart splitting, and we do not want different checksums by the library and uploader/archive ) etc.

The zarr-checksum library doesn't need to know specifics of how the digests are computed. As it stands now, we simply pass it the items containing the path and digest, so we control what digests it uses. There should be no issue here.

doesn't it provide the checksum computation across the file tree?:
https://github.com/dandi/zarr_checksum/blob/73bfdf56e822065623b282049e7c70783d87072c/zarr_checksum/generators.py#L108

Potential "resolutions" to consider:

  • switch back to 5GB limit for deciding going multipart with zarr blobs -- even though I was the one who asked/reminded about AWS recommendation on when to switch to multipart uploads.

I think short term this is probably the best solution, as it immediately unblocks people trying to upload zarrs with parts over this limit. The underlying issue should still be re-addressed, but we can hold off until there's not a funding gap looming.

agreed! let's stick to the 5GB limit as the decision point for now

@jjnesbitt jjnesbitt force-pushed the zarr-multipart-upload branch from c349a4c to 6d99670 on April 29, 2026 15:30
@jjnesbitt jjnesbitt changed the title Use multi-part upload on zarr chunks larger than 100MB Use multi-part upload on zarr chunks larger than 5GB Apr 29, 2026
@jjnesbitt
Member Author

My only remaining question is about the absent testing against the WiP archive, I guess. Did you have a chance to test it against some instance? What was its /info output? I wonder what the best indicator would be that an instance supports this -- probing the new interface?

I've done this both by running the integration tests with this branch of the CLI code (a successful run), and by running the corresponding branch of the API locally and pointing the tests at it, instead of using the docker-compose pytest fixture.

As far as what test can be done to check for server compatibility, the existence of the /uploads/zarr/initialize/ endpoint is the simplest answer.
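That probe-once idea can be sketched as follows (the endpoint path constant and helper names are assumptions for illustration, not the actual DandiAPIClient API): POST an empty body once; a 404 means the route, and hence the feature, is absent, while any other status (400 for the bogus payload, 401/403 for auth) means it exists.

```python
from urllib import error, request

ZARR_MP_INITIALIZE = "/api/uploads/zarr/initialize/"  # assumed path

def status_implies_support(status: int) -> bool:
    # Only 404 signals a missing route; 400/401/403 mean it is routed.
    return status != 404

def supports_zarr_multipart_upload(api_url: str) -> bool:
    """Probe the endpoint once with an empty POST body."""
    req = request.Request(
        api_url.rstrip("/") + ZARR_MP_INITIALIZE, data=b"", method="POST"
    )
    try:
        with request.urlopen(req) as resp:
            status = resp.status
    except error.HTTPError as e:
        status = e.code
    return status_implies_support(status)
```

In practice the result would be cached per client instance so the probe runs at most once.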

doesn't it provide the checksum computation across the file tree?: dandi/zarr_checksum@73bfdf5/zarr_checksum/generators.py#L108

What you've linked is one of the "default" file generators included in the zarr-checksum package. We do not use that generator; instead we directly provide the files and digests used to compute the zarr checksum.

agreeed! let's stick to 5GB limit as decision point for now

I've switched this back.

yarikoptic and others added 4 commits April 29, 2026 19:51
…_upload->singlepart_items

The split of a batch into "items above the 5GB threshold" and "items at
or below" is really a split between items routed through the multipart
upload endpoint and items uploaded with a single signed PUT.  Naming
them after the upload mechanism rather than the size makes the
subsequent capability/gating logic read more naturally.

Pure rename, no behavior change.

Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dandi-archive#2784 (still open) introduces the
``/uploads/zarr/initialize/`` endpoint required to upload zarr chunks
larger than 5 GiB.  Until that lands and is deployed, posting to the
endpoint returns 404 and the multi-part code path here would surface
that as an opaque ``HTTP404Error`` from inside ``_multipart_upload``.

Add a lazy ``DandiAPIClient.supports_zarr_multipart_upload`` cached
property that probes the endpoint once with an empty POST body: 404
means the server lacks the feature, anything else (400 for the bogus
payload, 401/403 for auth) means it is present.

In ``ZarrAsset.iter_upload``, right after the batch is split into
``multipart_items`` and ``singlepart_items``, raise an ``UploadError``
when there are multipart items and the server does not support them.
The error reports the count and aggregate size of the oversized chunks
plus the path of the largest one, mirroring the wording proposed in
#1827.

Marked with a TODO: the check (and the cached property) can be removed
once all supported servers ship dandi-archive#2784 (> 0.23.0).

Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…umulative

``_multipart_upload`` (used here to upload zarr chunks larger than 5
GiB) yields progress dicts whose ``current`` is the number of bytes
uploaded *within the chunk currently being transferred*.  The
surrounding ``ZarrAsset.iter_upload`` loop reports ``current`` as the
number of bytes uploaded *across the whole zarr*.  Yielded as-is, the
inner per-file values caused ``current`` to jump backwards each time a
new multipart chunk started, breaking progress bars driven from that
field.

Wrap the inner generator: when an "uploading" status carries a
``current`` field, translate it by the cumulative bytes uploaded so
far before re-yielding; pass other status dicts through unchanged.
The trailing post-loop yield is dropped — the last per-part
translated yield already reports the full cumulative size for that
chunk.

Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
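The translation that commit describes can be sketched as a small wrapper around the inner generator (the status-dict field names follow the commit message; this is an illustration, not the actual iter_upload code):

```python
from typing import Iterator

def translate_progress(
    inner: Iterator[dict], bytes_already_uploaded: int
) -> Iterator[dict]:
    """Re-yield status dicts from a per-file multipart upload, shifting
    'current' (bytes uploaded within this file) by the bytes uploaded
    across the whole zarr before this file started, so the reported
    progress never jumps backwards when a new chunk begins."""
    for status in inner:
        if status.get("status") == "uploading" and "current" in status:
            status = {**status, "current": status["current"] + bytes_already_uploaded}
        yield status
```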
The two ``ai_generated`` tests added in 9bc2976 patch
``ZARR_LARGE_CHUNK_THRESHOLD`` down so chunks are routed through
``_multipart_upload`` against whatever server backs ``new_dandiset``.
The local docker dandi-archive image does not yet ship
dandi-archive#2784, so the new ``supports_zarr_multipart_upload``
capability check raises ``UploadError`` before ``_multipart_upload``
is ever called and the tests fail without exercising what they claim
to.

Probe the live client for capability and ``pytest.skip`` only when
the endpoint is missing — keeps the tests effective on setups whose
server already carries dandi-archive#2784.

Co-Authored-By: Claude Code 2.1.123 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
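The capability-gated skip could be factored as a tiny helper (the attribute name mirrors the cached property described above, but helper and message are assumptions):

```python
import pytest

def require_zarr_multipart(client) -> None:
    """Skip -- rather than fail -- a test whose backing server does not
    ship the multipart zarr endpoint (defaults to unsupported when the
    client predates the capability attribute)."""
    if not getattr(client, "supports_zarr_multipart_upload", False):
        pytest.skip("server lacks /uploads/zarr/initialize/ (dandi-archive#2784)")
```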
@yarikoptic
Member

Zarr multipart upload tune-ups: condition on server capability, fix progress reporting, harmonize variable names