RFC-8: Collections #343
this is looking really cool! |
|
Looks nice! As a quick initial comment, it would be super helpful to have a minimal example that demonstrates the new metadata structure being proposed - the webknossos examples are nice, but I'm struggling to distinguish what's required and what's optional in those files because there are lots of extra (I think?) attributes. |
propose interface between rfc5 and rfc8
| }, { | ||
| "name": "..", | ||
| "type": "collection", | ||
| "path": "./nested_collection.json" |
The collection should be a directory that contains a zarr.json, right?
e.g. "path": "./nested_collection.zarr"
Ah, now I see that this standalone JSON file is proposed as part of this RFC. But that isn't covered until much later, under Examples, in "Where is this collection metadata stored?". Maybe that should be moved up above this point?
If an implementation is using e.g. zarr-python or another zarr library to retrieve zarr metadata, then it may be kinda painful to also support fetching of vanilla file.json files using a different mechanism? Don't know about other libs.
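To make the concern above concrete, here is a minimal sketch of how a reader might resolve a node's `path` and decide which metadata document to fetch. The function names and the rule "a path ending in `.json` is a standalone collection file, anything else is a Zarr group" are assumptions for illustration only; the RFC text does not define them here.

```python
from urllib.parse import urljoin


def resolve_node_path(collection_url: str, node_path: str) -> str:
    """Resolve a node's relative path against the URL of the collection file."""
    return urljoin(collection_url, node_path)


def metadata_url(node_url: str) -> str:
    """Return the URL of the metadata document for a resolved node.

    Assumption (not from the RFC): a path ending in ".json" points at a
    standalone collection file; any other path is treated as a Zarr v3
    group or array, whose metadata lives in "zarr.json".
    """
    if node_url.endswith(".json"):
        return node_url
    return node_url.rstrip("/") + "/zarr.json"


# Hypothetical URLs, used only to show the two cases side by side.
base = "https://example.org/data/collection.json"
print(metadata_url(resolve_node_path(base, "./nested_collection.json")))
print(metadata_url(resolve_node_path(base, "./nested_collection.zarr")))
```

Under these assumptions the first case needs a plain HTTP or file read, while the second can go through a Zarr library, which is the two-mechanism burden the comment describes.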
|
I started a basic implementation of Collections spec for the validator at ome/ome-ngff-validator#62. |
TODO: This post is a bit stream-of-consciousness-y - I hope I manage to express the bump I am stumbling over with the current state of transforms in here. In the version of this RFC, when
In ome/ngff-spec#117, this was made more explicit, so that these
"input": {
"path": "./scale0",
"node": "node_name",
"name": "coordinate_system_name"
}
And I think porting over this formalism is important, because instances of This has implications. In RFC8, the transforms for
{
"ome": {
"version": "0.x",
"type": "collection",
"name": "example",
"attributes": {
"coordinateSystems": [
{
"id": "world",
"name": "world",
"axes": [...]
}
]
},
"nodes": [{
"name": "raw",
"type": "multiscale",
"nodes": [{
"id": "raw_0",
"type": "singlescale",
"path": {
"type": "zarr",
"path": "./raw/0"
},
"attributes": {
"coordinateTransformations": [
{
"type": "scale",
"scale": [1, 1, 1],
"input": {
"path": "raw_0"
},
"output": {
"name": "world",
"node": "raw_0"
}
}
]
}
}, ...]
}, ... ]
}
}
The question I'm stuck with now: If the Singlescale is not inlined - where does the
I don't have a good idea about which to prefer, though. |
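As a rough illustration of the inlined case discussed above, the sketch below walks a collection shaped like the example and reports transforms whose `output.name` does not match a coordinate system declared at the collection root. The helper names are hypothetical, and the key layout is taken from the example rather than from a finalized spec. It only covers inlined nodes, so it does not answer the open question about non-inlined Singlescales.

```python
from typing import Any, Iterator


def declared_coordinate_systems(collection: dict[str, Any]) -> set[str]:
    """Names of coordinate systems declared in the root collection attributes."""
    attrs = collection["ome"].get("attributes", {})
    return {cs["name"] for cs in attrs.get("coordinateSystems", [])}


def iter_transforms(node: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (node identifier, transform) for a node and all inlined children."""
    # The example uses "id" on some nodes and "name" on others; accept both.
    node_id = node.get("id") or node.get("name")
    for transform in node.get("attributes", {}).get("coordinateTransformations", []):
        yield node_id, transform
    for child in node.get("nodes", []):
        yield from iter_transforms(child)


def unresolved_outputs(collection: dict[str, Any]) -> list[tuple[str, str]]:
    """List (node identifier, output name) pairs that match no declared system."""
    known = declared_coordinate_systems(collection)
    missing = []
    for node_id, transform in iter_transforms(collection["ome"]):
        target = transform.get("output", {}).get("name")
        if target not in known:
            missing.append((node_id, target))
    return missing


# A trimmed copy of the example above, with the elided parts left out.
example = {
    "ome": {
        "version": "0.x",
        "type": "collection",
        "name": "example",
        "attributes": {"coordinateSystems": [{"id": "world", "name": "world"}]},
        "nodes": [{
            "name": "raw",
            "type": "multiscale",
            "nodes": [{
                "id": "raw_0",
                "type": "singlescale",
                "path": {"type": "zarr", "path": "./raw/0"},
                "attributes": {"coordinateTransformations": [{
                    "type": "scale",
                    "scale": [1, 1, 1],
                    "input": {"path": "raw_0"},
                    "output": {"name": "world", "node": "raw_0"},
                }]},
            }],
        }],
    }
}

print(unresolved_outputs(example))
```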
lubianat
left a comment
(sorry, the approval was a misclick on GH mobile when hastily ok'ing ome/ngff-spec#128)
will-moore
left a comment
Just adding comments, but seems I have to create a review...
|
Seems that adding comments to the changes page isn't working for me at the moment. So I'll add some here
Many of the collections I would like to represent with this spec contain images of different OME-Zarr versions. E.g. the figure at https://ome.github.io/omero-figure/?file=https://gist.githubusercontent.com/will-moore/75a7f0de5be0f7b4202d5f0229cadcc9/raw/ngff_images_figure.json or the list of samples at https://idr.github.io/ome-ngff-samples/ so this would be a blocker for many use-cases.
I'm not sure what the motivation is for |
|
Thanks @will-moore!
Collections will likely be a feature of OME-Zarr 1.0. I don't think it is reasonable to referentially include all previous versions of the spec in the 1.0 release because of the burden that would put on implementations.
The motivation for
Multiscales are now collections of Singlescales. The field
Multiscales with a single Singlescale are still allowed, but no longer required. Users can just create Singlescales as Zarr arrays without the need for enclosing Zarr groups. |
Could you define the term "image" to mean "a Zarr array", and "multiscale image" to mean "a collection of images at different levels of detail"? Starting with the more basic thing (a single array) and defining the collection in terms of that seems better than starting with the collection (multiscales) and defining the more basic thing in terms of it. |
|
It feels like we have been working on RFC-5 for a long time and have finally reached a consensus on transforms, scenes, etc. But even before v0.6 is released we are proposing to re-work all that again (and other core concepts like Multiscales.datasets that have been around since v0.1). Are we saying that OME-Zarr data v0.6 and earlier are not expected to be supported by tools that read v1.0 because they are too different? That would discourage adoption of OME-Zarr v0.6 because it's sunset even before it's released. My first impression of RFC-8 was that it's a way of grouping existing Multiscales images into Collections. But this proposal looks like starting from scratch and ditching previous work and support for existing data? I'm not even sure I fully understand @jo-mueller's question above, except that it shows all the hard RFC-5 discussions are going to need to be revisited again? |
|
@will-moore thanks for the feedback. About my comment above, I think discussing intents and structure last week in Düsseldorf helped to structure my ideas for RFC8. I opened normanrz#4 with some suggestions that address some of my concerns. |
I appreciate the design work that has gone into RFC-5 and I think RFC-8 is building on top of that. I'll review with @jo-mueller next week whether to bring back the scene metadata.
I think it is important to look at RFC-8 as part of the long-term vision of the 1.0 release. This probably warrants its own RFC, but in my view 1.0 is supposed to be a long-term release that carries us through the next decade without breaking changes. Up until now every release of OME-Zarr has been breaking and I think that needs to stop to foster serious adoption. That also means this is the last opportunity in a while to break things in order to make the OME-Zarr spec more consistent and extensible. Basically, take all the learnings from the 0.x releases and make a great long-term 1.0 release.
I definitely think that tools should be considered compliant with the 1.0-spec if they only support v1.0 and no previous versions. This is already the case with 0.x versions. Only very few tools understand 0.1-0.3 and some tools only understand 0.5 and not 0.4 anymore. I think that is totally fine, because they are 0.x releases. That being said, I think the extension mechanism could be used to include 0.x OME-Zarrs in 1.0 Collections. Just define an extension node type that references 0.x multiscales. Tools could voluntarily support that, if they find it useful. I want to add that 0.5 -> 0.6 -> 1.0 are metadata-only changes. I don't think it is unreasonable for users to consider migrating the metadata. This will be less of a lift than the 2024 NGFF challenge, where we actually converted the data. |
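The "voluntary support" idea above could look roughly like the sketch below: a reader handles the node types it knows and skips the rest. The type name `x-legacy-multiscale` and the helper `split_supported` are made up for illustration; RFC-8 as quoted here does not define an extension node type for 0.x data.

```python
from typing import Any


def split_supported(
    nodes: list[dict[str, Any]], supported_types: set[str]
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition nodes into those a tool can read and those it skips."""
    supported, skipped = [], []
    for node in nodes:
        if node.get("type") in supported_types:
            supported.append(node)
        else:
            skipped.append(node)
    return supported, skipped


# Hypothetical nodes: one core 1.0 type and one made-up extension type
# that would reference a 0.x multiscale.
nodes = [
    {"name": "raw", "type": "multiscale"},
    {"name": "old_raw", "type": "x-legacy-multiscale", "path": "./old_raw.zarr"},
]

# A tool that only implements the core types ignores the extension node.
core_only, ignored = split_supported(nodes, {"collection", "multiscale", "singlescale"})
print([n["name"] for n in core_only], [n["name"] for n in ignored])
```

A tool that chooses to support 0.x data would add the extension type to its supported set. A tool that does not would still be able to read the rest of the collection.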
|
seconding norman's POV. And a broader point about churn: churn during development is valuable if it buys a better released product. This churn affects devs for months, but users will interact with 1.0 for years. It would be unfortunate if they had to tolerate a deficient product because devs settled too early. Now is the time to fix stuff. It only gets harder later. |
|
I think this is a super-useful discussion here. If anything, it will help RFC8 authors to get a feeling for which direction to expect feedback from, or which direction to sharpen RFC8 towards. I think there are two separate things to take from this discussion: Minimally, I think the relationship between coordinate systems and nodes needs to be clarified. To a degree, this already happened in 0.6.dev3 -> 0.6.dev4. The important thing to note here is that coordinate systems and transformations define their own graph-like structure that can be independent of the collection/node layout. Since a The other thing is the following:
I'm not so sure about that. In 0.x, the smallest interpretable, indivisible aggregation of data and metadata is the The introduction of the Don't get me wrong, I'm not opposed to renaming What I propose in normanrz#4 is simply a stratification and clarification of where metadata sits and what collections are expected to collect:
This is currently not necessarily the case with the Imho, making this restriction doesn't take from the expressiveness and elegance of RFC8, but adds to the integrity and reliability of images - aka multiscales - as an essential concept in the spec. |
Co-authored-by: Norman Rzepka <code@normanrz.com>
RFC8 elementary multiscales & scene
This is the work-in-progress draft for RFC-8.
cc @jluethi @lorenzocerrone @tischi @perlman @matthewh-ebi