ome / ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.

Home Page: https://ngff.openmicroscopy.org

License: Other

bioimaging spec file-formats cloud data-science

ngff's Introduction

ome-ngff

Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.

Editing

Specifications are written in Markdown -- technically, in Bikeshed: Markdown with special extensions understood by the bikeshed tool. The bikeshed tool is run on commit via the spec-prod GitHub action, generating the familiar "spec-looking" ReSpec output. ReSpec is just HTML with a JavaScript ReSpec library.

Specification files end with the .bs file extension. The GitHub action runs on commit and automatically converts them to ReSpec/HTML via bikeshed.

Learn more about bikeshed

Reviewing

Commits on GitHub can be viewed using web services from the W3C.

New version

  • Make new changes to latest/index.bs
  • Update changelog at the bottom of latest/index.bs
  • Find references to the previous version and, in most cases, bump them to the current version.

JSON schemas

For each top-level metadata key of the OME-NGFF specification, JSON schemas are maintained for each version of the specification and stored under $VERSION/schemas/ or latest/schemas/. Tests validating these schemas must follow the principles of the JSON schema test suite and be stored under $VERSION/tests/ or latest/tests/ so that they can be executed on each CI build.

All official example snippets must also be extracted and managed as separate JSON files under $VERSION/examples/ or latest/examples/. Each example is validated against the appropriate schema by adding a .config.json file specifying the JSON schema to use, and is included in the specification document using the include-code directive.
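
For orientation, a hypothetical layout (file and schema names below are examples only, not prescribed here) could look like:

latest/
├── schemas/
│   └── image.schema                 # JSON schema for one top-level metadata key
├── tests/
│   └── image/                       # JSON-schema-test-suite style test cases
└── examples/
    ├── image_example.json           # snippet included in the spec via include-code
    └── image_example.config.json    # names the JSON schema used to validate the example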

The official OME-NGFF JSON schemas are published under https://ngff.openmicroscopy.org//schemas/<schema_name>.schema using the spec-prod GitHub action. When a new JSON schema is introduced, this action needs to be reviewed so that the deployment script is updated and the new schema can be published.

Release process

  • copy latest/index.bs to $VERSION/index.bs
  • copy latest/copyright.include to $VERSION/copyright.include
  • update the head matter in the $VERSIONed file
    • Use: Status: w3c/CG-FINAL
    • Update URL:
    • Use the following Status Text: "This is the $VERSION release of this specification. Migration scripts will be provided between numbered versions. Data written with the latest version (an "editor's draft") will not necessarily be supported."
  • update the footer matter in the $VERSIONed file
    • Version in the citation block including release date
  • Update https://github.com/ome/spec-prod for the new version

Citing

Please see https://ngff.openmicroscopy.org/latest#citing for the latest citation.

ngff's People

Contributors

alanmwatson, bugraoezdemir, constantinpape, d-v-b, dstansby, dzenanz, giovp, glyg, ivirshup, joshmoore, julianhn, jwindhager, kevinyamauchi, matthewh-ebi, melissalinkert, melonora, normanrz, sbesson, tcompa, virginiascarlett, will-moore, yarikoptic

ngff's Issues

z_downsampling metadata

It's possible that we may generate multiple "datasets" for a given OME-Zarr Image, e.g. with and without down-sampling in Z. It could be useful to have a way to identify these.

See hms-dbmi/vizarr#71 (comment)
"will there be any way to distinguish between the downsampled-in-z version of a dataset and the non-downsampled version? The shape of the zarr array, I presume? But will there be any metadata about it? Just curious........."

@joshmoore "One thing we could do at the moment (pre-lightsheet changes) would be to have a naming convention for the multiscales themselves. Another option would be to use the metadata for the method of the downsampling, but we’d need something well-defined to say “in-z".

Metadata for color and contrast limits

@joshmoore @will-moore

It would be great to add metadata specifications for colour and contrastLimits (aka displayRange).

Below an example of how I am opening the data with BigDataViewer and how I am setting the colours, thus showing the kind of information I would need:

OMEZarrS3Reader reader = new OMEZarrS3Reader( "https://s3.embassy.ebi.ac.uk", "us-west-2", "idr" );
SpimData image = reader.read( "zarr/v0.1/6001237.zarr" );
List< BdvStackSource< ? > > sources = BdvFunctions.show( image );
sources.get( 0 ).setColor( new ARGBType( ARGBType.rgba( 0,0,255,255 ) ) );
sources.get( 0 ).setDisplayRange( 0, 3000 );
sources.get( 1 ).setColor( new ARGBType( ARGBType.rgba( 0,255,0,255 ) ) );
sources.get( 1 ).setDisplayRange( 0, 3000 );
sources.get( 2 ).setColor( new ARGBType( ARGBType.rgba( 255,0,0,255 ) ) );
sources.get( 2 ).setDisplayRange( 0, 3000 );
sources.get( 3 ).setColor( new ARGBType( ARGBType.rgba( 255,255,255,255 ) ) );
sources.get( 3 ).setDisplayRange( 0, 3000 );
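
For reference, one possible shape for such metadata, loosely modelled on OMERO's rendering settings and on the values set in the snippet above (key names here are illustrative, not a settled proposal):

"omero": {
    "channels": [
        {"color": "0000FF", "window": {"start": 0, "end": 3000}},
        {"color": "00FF00", "window": {"start": 0, "end": 3000}},
        {"color": "FF0000", "window": {"start": 0, "end": 3000}},
        {"color": "FFFFFF", "window": {"start": 0, "end": 3000}}
    ]
}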

Hierarchical chunk storage

Currently, the ome zarr specification demands that the chunks are stored in a single directory.
This causes issues on the file system for large images / volumes with many chunks.

N5 stores chunks in a hierarchy natively and zarr also supports this with the NestedDirectoryStore (although this is currently not working according to @joshmoore).
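
For illustration, chunk keys for a single 5D array would differ roughly as follows (indices made up):

flat layout (current spec):   image.zarr/0/0.0.0.0.0, image.zarr/0/0.0.0.0.1
nested layout (N5-style):     image.zarr/0/0/0/0/0/0, image.zarr/0/0/0/0/0/1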

A few misc questions

Let me know if this is the wrong forum and I can move this post.
We are considering making a big move to use ome-zarr. I have some miscellaneous questions/issues on the state of things.

  1. Context: We have lots of on-prem storage but need to move all of it to the cloud. We will then need to keep large images accessible to compute resources that are more distant from the data. A possible expected scenario is:
    a. microscope-->proprietary file format
    b. upload and immediate conversion to open format using aicsimageio
    c. scientists do compute and vis on chunked remote open format using aicsimageio

  2. Is it possible to store multiscale zarr groups on different storage categories? For example can we say we want the full resolution level on cold storage but downsampled levels on cloudfront/a more "hot" service?

  3. Is there an assumption in ome-ngff that multiscale resolutions are necessarily halved in x,y at each level? Or can I write any downsampling I want at each level (I have some calculations that force it to fit in a certain memory footprint, for example)? If so, the key question is: how do I get the data shape at each level? (See the sketch after this list.)

  4. The current ome-ngff document here https://ngff.openmicroscopy.org/latest/#omero-md refers me to https://docs.openmicroscopy.org/omero/5.6.1/developers/Web/WebGateway.html#imgdata. Does that mean the spec is really the full omero spec contained at the latter link? That latter spec provides for physical pixel dimensions and shape information in top level metadata but it is not shown in the example in the ngff doc page.

  5. We capture a lot of large "multi-scene" files (the dreaded 6th dimension). Let's assume they are not separate wells. In ome-zarr, are we supposed to put them in separate root-level groups in the same store? Does ome provide some recommendation for this apart from just treating them as "different" images?
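
One note on question 3: each resolution level is its own Zarr array, so its shape is always recorded in that level's .zarray metadata, whatever downsampling factors were used. Illustrative values:

0/.zarray   ->   {"shape": [1, 2, 236, 275, 271], ...}
1/.zarray   ->   {"shape": [1, 2, 236, 135, 137], ...}
2/.zarray   ->   {"shape": [1, 2, 236, 68, 67], ...}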

HCS acquisition metadata

Currently, acquisitions are implemented in ome-zarr as a level in the group hierarchy:

plate/acquisition/row/column/field

e.g. see ome/omero-cli-zarr#43

However, this hierarchy doesn't exist in OMERO: acquisitions are just an orthogonal grouping of fields, but the webclient etc. show it as a hierarchy.
We should consider alternatives for storing acquisitions in ome-zarr.

Option 1:
Store acquisition metadata at the plate level (with an ID), and each Well can refer to the acquisition it is linked to via ID:
eg.

"plate": {
  "acquisitions": [
    {
      "id": 1,
      "name": "run 1",
      "description": "foo",
      "starttime": <timestamp>
      "endtime": <timestamp>
      "maximumfieldcount": <integer>
    }
  ],
...}

Then in each 'well', the images list could link to acquisition (optional):

"well": {
    "images": [
        {"path": "0", "acquisition": 1},
        {"path": "1", "acquisition": 2}
    ]
}

An alternative is to put the 'acquisition' in the 'image' metadata, but it is probably more useful to have this together at the Well level and not 'contaminate' the image metadata with it.

This Option would make it easier to view ALL the fields for a Well, across the different acquisitions.
But it makes it harder to mimic the current OMERO client behaviour of viewing a whole Plate with the fields from a particular acquisition.

Other Options or feature requests to consider?

Napari plugin for ome-zarr

We are currently working on generating reference data for ome.zarr v0.3 (see #46) and providing a Fiji plugin for opening it through MoBIE, see mobie/mobie.github.io#52.

I think that it would be good to have a plugin in napari as well to have a second reference implementation for opening it.
Is there some "official" napari plugin for ome.zarr already that this could be integrated into?
If not, I could write something for 0.3, but I would need a pointer on the latest instructions for writing napari plugins.

cc @jni @sofroniewn

HCS Well metadata

Currently, we have Plate metadata like:

"plate": {
        "column_names": [
            1, 2, 3
        ],
        "columns": 3,
        "row_names": [
            "A", "B"
        ],
        "rows": 2
        "images": [
            {"path": "0/A/3/Field_1"},
            {"path": "0/B/2/Field_1"},
            {"path": "0/A/1/Field_1"},
            {"path": "0/B/3/Field_1"},
            {"path": "0/A/2/Field_1"},
            {"path": "0/B/1/Field_1"}
        ],
        "name": "plate1_1_013",
        "plateAcquisitions": [
            {"path": "0"}
        ]
    }

The presence of the "plate":{} in url/.zattrs is currently used by vizarr and napari to know that the URL is a plate.
For an image the url/.zattrs contains "omero":{} and "multiscales":{}.
For labels, the url/.zattrs contains "labels": ["0", "1"] list.
NB: Maybe this should be "labels": [{"path": "0"}, {"path": "1"}]

But, for a Well we don't have url/.zattrs, so we need something to identify a Well.
This could be:

"well": {
    "images": [
        {"path": "0/A/1/Field_1"}, {"path": "0/A/1/Field_2"},
    ],
}

But this is redundant with the "images" in the plate metadata.

Option 1

Store "images" on Well - not on Plate

  • Pros:
    • When loading Well, we have direct access to children - consistent with e.g. multiscales, labels etc.
  • Cons:
    • When we are loading "Plate", we don't know max number of fields (do we need to?) - Propose to add "fields": 2 to the plate metadata.
    • We also don't know which Wells are populated vs. empty. This could be useful in sparsely-populated Plates: we don't want to try to load a large number of empty Wells. Maybe Plate should have "wells": [{"path":"A/1"}, {"path":"A/2"}] (see the sketch below).
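
Put together, Option 1 with both of those additions might look something like this (purely illustrative):

"plate": {
    "name": "plate1_1_013",
    "rows": 2,
    "columns": 3,
    "fields": 2,
    "wells": [
        {"path": "A/1"},
        {"path": "A/2"}
    ]
}

and, in each Well group:

"well": {
    "images": [
        {"path": "Field_1"},
        {"path": "Field_2"}
    ]
}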

Option 2

Store "images" on Plate - not on Well

  • Pros:
    • We have ALL the images info when loading Plate - know e.g. number of fields for each Well.
  • Cons:
    • For big plates (e.g. idr0033) we have 384 wells x 9 Fields, so "images" list will be 3456 items long. Is that too long?
    • When loading a Well, we need to make an extra call to load child fields from the parent Plate.
    • Consistency: generally best if we stick to the rule that "containers should list their children". If we don't put this in "well":{} then we just have an empty dict!

cc @sbesson @joshmoore

Update label properties to include `example:` prefix

see: #3

Currently the "class" field is used as an example in the label properties specification. Pending discussion, we may want to prefix this with something like "example:" to make it clear that it's not a well-known. field.

Storing tables

An outcome of this hackathon was that we would like to store tabular data in ome-zarr.

I wanted to ask whether we should consider that whatever we store can be easily mapped onto a csv file. Meaning that fromCSV and toCSV should work smoothly such that other software that can work with tables can be interoperable with the tabular content of the ome-zarr.

What do you think?

affines and dimensions

if i may make one comment: that we allow affines back as a transformation, but optional, and if included then translation and scale should not be used. for some of our use cases it would be really nice to be able to include the affine in the dataset itself rather than separately. we can wait till the next iteration of the spec, but if there are no objections that affines will also be axisIndices or dimension compliant, i don't see a reason why including it would be difficult.

let me first start with dimensions. at present it is time, channel, shape, shape, shape. while time can cover many aspects, we may want to include something like slice, channel, shape, shape, shape for 3d imaging, where we take a 3d tissue and slice it into thinner slices and then put them together as a single dataset.

this has a few additional requirements and this is where the affine part comes in. although a computed transform, it may be helpful to align these slices in 3D space. also we may have missing slices or uneven spacing. thus having affines be associated with each slice could be helpful.

in our practical use case, we have stains, not channels. so our dimensions would be:

slice, stain_index, shape, shape, shape, chunk

where chunks are partially overlapping acquisitions. chunks would be dropped if the images were prestitched.

in this use case, the affines could be associated with different slices and potentially chunks. for simplicity let's leave chunks aside.

then our transformations would be a list of affines for each slice [ aff1, aff2, ....] where the axisIndices would be [2,3,4] and the aff* would be [4x4] since axisIndices is of length 3.
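
to make that concrete, one possible shape for a single such entry, using the names above (none of these keys are part of the current spec):

{
    "type": "affine",
    "axisIndices": [2, 3, 4],
    "affine": [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]
    ]
}

one such entry per slice, collected in the list [ aff1, aff2, .... ] described above.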

thus there are a few requests here.

  1. that we allow arbitrary dimensions
  2. list of affines as transformations. since each slice could have been arbitrarily rotated and flipped (front to back) having a computed affine could really help place all the pieces together
  3. additional metadata to situate the coordinate system better w.r.t. some physical property of the sample by defining a global origin.

this figure may help visualize the situation: https://scalablebrainatlas.incf.org/human/BIGB13

cc: @thewtex

Specification for extensions

It would be nice to have a field to specify custom extension attributes that may be specific to certain reader tools.

And a comment regarding extensions. I don't have much experience on how to do this properly, but I was just simply thinking of another metadata field like extension with keys referring to a tool or library, e.g. tensorstore, and therein an arbitrary dictionary that the tool/library knows how to parse. e.g.

{
    "extension": {
        "tensorstore": {
            "voxel_offset": [0, 42, 84]
        }
    }
}

Data dimensionality and axes metadata

In last week's meeting, the question of data dimensionality came up again (in the morning it was raised by @jni, and I think it came up in the afternoon as well).
Currently, the spec demands that all data is 5 dimensional (I think with axis order TCXYZ, but I am not quite sure).

Do we want to lift the restriction and allow data of lower dimensionality? In this case, we would add metadata in multiscales to describe the axes (e.g. "axes": ["x", "y", "z"]).

Note that this is also important for the transformation spec #28, where we need to clarify which axes a transformation applies to.

Independent of the decisions, we should add a field that describes the physical units of the axes, e.g. "units": ["micrometer", "micrometer", "micrometer"].
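
A minimal sketch of what this could look like inside "multiscales" (illustrative only; the exact key names and ordering are what this issue should settle):

"multiscales": [
    {
        "axes": ["z", "y", "x"],
        "units": ["micrometer", "micrometer", "micrometer"],
        "datasets": [
            {"path": "0"},
            {"path": "1"}
        ]
    }
]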

"Compound" datasets

@axtimwalde @constantinpape @joshmoore

Based on the latest posts of @glyg in #28 I was wondering about the following.

Let's say we have, e.g. a FLIM data set.

I think it could be useful to store it as a 5D data set with these dimensions:

x
y
z
t
c (intensity, lifetime)

Where, in this case, my feeling is that the c dimension is qualitatively different from the other dimensions. Because "moving along the c-axis" will change the unit of the output value (which is not the case for any of the other dimensions).

unit(data[0,0,0,0,0]) = grayValue
unit(data[0,0,0,0,1]) = nanoseconds

Are you guys having any thoughts on this? I mean, should we treat such dimensions that change the unit of the output value differently than other dimensions?

OME-XML equivalent data

Hi all, first, thanks for starting this project - we are considering NGFF / Zarr 3.0 for large 3D multichannel datasets of light-sheet microscopy brain data.

I'm wondering if there is any plan to capture OME-XML data in the Zarr attribute hierarchy, in particular the microscopy data such as Instrument. I'd be happy to participate in the discussion and formulation of this extension.

--Lee

Spec development process

I am a bit confused by the different versions of the spec that are stored in this repository:
We have the specs for the individual versions, right now 0.1 and 0.2 and we have latest.
Currently, 0.2 is further ahead than latest, because it contains the changes to the chunk layout from #40, which are not in latest yet.

In the future, I would propose a more structured dev process:
All changes are made against latest, and only when we draft a new release do we copy latest to the folder for the corresponding version, adding it in a separate commit (which only copies the spec and does not introduce any further changes).

cc @joshmoore

LUT specification

Opening issue about whether specifying multi-color LUTs would be within scope and how to support it.

Quoting @will-moore from #23

Also in OMERO, we support LUTs e.g. "lut": "cool.lut", but that relies on the server knowing what LUT that refers to and is probably outside the scope of this issue.

For MoBIE we also need support for LUTs like viridis or glasbey.

Multi-scaled image labels

During/despite the summer break, work has progressed on a specification for labeled images (i.e. segmentations). The issue is a working draft for an upcoming post to image.sc. Change suggestions (and PRs of course) welcome.


Summary:

This specification defines a convention for storing multiscale integer-valued labels, commonly used for storing a segmentation. Multiple such labeled images can be associated with a single image:

image/
│
├── 0-4                # Resolution levels of the primary image
│
└── labels             # Group container for the labeled images.
    │
    ├── original       # One or more groups which each represent
    ├── ..             # a multi-scale pyramid of integer values
    └── recalculated   # representing e.g. detected objects.

Use of this draft for the specification of IDR images in S3 is available at https://github.com/ome/omero-ms-zarr/blob/master/spec.md

"labels" group

In order to enable discovery, the well-known group "labels" within each image directory functions as a registry of all known image labels.

See color-key below

-{
-    "labels": [
!        "original",
!        "recalculated"
-    ]
-}

"image-label" group

Each image-label group is itself a multiscale image and should contain the "multiscales" metadata key. However, the presence of the "image-label" key identifies a group as a labeled image. In order to enable discovery, each such group should be registered with the "labels" group above. Additionally, labeled images should list their source image in order to enable a bidirectional link.

The primary additional metadata that is currently specified is in the "colors" metadata key. Each label value can be registered in an array

See color-key below

-{
-    "image-label": {
!        "version": "0.1",
!        "source": {
!            "image": "../.."
!        },
+        "colors": [
!            {
-             "label-value": 1,
-             "rgba": [128, 128, 128, 128]
!            }          
-        ]
-    },
-}

Example workflow

If you would like to experiment with this specification, you can install the ome-zarr library via:

pip install ome-zarr==0.0.13

The library provides a napari plugin, which can optionally be activated via:

pip install ome-zarr[napari]==0.0.13

Sample data is available under the test-data subdirectory of the S3 bucket:

$ ome_zarr info https://s3.embassy.ebi.ac.uk/idr/zarr/test-data/labels-0.1.3-dev4/6001247.zarr/
https://s3.embassy.ebi.ac.uk/idr/zarr/test-data/labels-0.1.3-dev4/6001247.zarr/ [zgroup]
 - metadata
   - Multiscales
   - OMERO
 - data
   - (1, 2, 257, 210, 253)
https://s3.embassy.ebi.ac.uk/idr/zarr/test-data/labels-0.1.3-dev4/6001247.zarr/labels/ [zgroup] (hidden)
 - metadata
   - Labels
 - data
https://s3.embassy.ebi.ac.uk/idr/zarr/test-data/labels-0.1.3-dev4/6001247.zarr/labels/1/ [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
 - data
   - (1, 1, 257, 210, 253)
   - (1, 1, 257, 126, 105)
   - (1, 1, 257, 52, 63)
   - (1, 1, 257, 31, 26)
   - (1, 1, 257, 13, 15)

If you have existing masks in OMERO, you can export your image and masks using omero-cli-zarr:

$ pip install omero-cli-zarr
$ omero zarr export Image:6001240
$ omero zarr masks Image:6001240
$ ome_zarr info 6001240.zarr/
/tmp/6001240.zarr [zgroup]
 - metadata
   - Multiscales
   - OMERO
 - data
   - (1, 2, 236, 275, 271)
/opt/data/6001240.zarr/labels [zgroup] (hidden)
 - metadata
   - Labels
 - data
/tmp/6001240.zarr/labels/0 [zgroup] (hidden)
 - metadata
   - Label
   - Multiscales
 - data
   - (1, 1, 236, 275, 271)
   - (1, 1, 236, 135, 137)
   - (1, 1, 236, 68, 67)

Design trade-offs

Two additional layouts were considered for the labeled data itself. The first was a split representation in which each label was a separate bitmask. This representation is still possible by using multiple labeled images. The other was a 6-dimensional bitmask structure. The benefit of both of these was the support of overlaps. The downside was that many implementations do not natively support a compact representation of bit arrays.

For the metadata, a number of different configurations were also considered for describing metadata about each label value. The primary choice was between an array representation and two sparse representations: a dictionary, with the downside of requiring string keys, and a list of dictionaries, with the downside of possible redundancy. More details on this discussion are available under "Revamp color metadata" (#62).

Current limitations

Specification

  • Multi-channel labeled images are currently not supported. The colors metadata specification would need to be updated to do so.
  • The current assumption is that for every multiscale level in the image data a layer of equal size will be present in the labeled image.
  • Currently missing metadata:
    • label value for overlaps
    • source array (as opposed to group) of the segmentation

Implementation

Color key

(according to https://www.ietf.org/rfc/rfc2119.txt):

- MUST     : If these values are not present, the multiscale series will not be detected.
! SHOULD   : Missing values may cause issues in future versions.
+ MAY      : Optional values which can be readily omitted.
# UNPARSED : When updating between versions, no transformation will be performed on these values.
Revision history:
  • 0.1.0 (@joshmoore, 2020.09.16): Initial version on GitHub

Progressive specification

To elaborate on @joshmoore's summary of my comment here:

https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-feb-23/48386/9

Minimal, or intermediate, specification: @DragaDoncila and @jni pointed out that some of the specifications are more general purpose and make the current format well-suited for non-microscopy data. It is likely worth keeping the specification "approachable" for outsiders, with the most general-purpose parts coming first, to engage with as many communities as possible.

We should organise the spec progressively, so that you can, say, minimally claim to be an ome-ngff image with just a zarr array and a .zattrs containing {ome-ngff: v0.2} or whatever.

Next, if we want voxel spacing, you can optionally add {'dimensions': ['t', 'c', 'y', 'x']} (note lack of z 😉) and {'spacing': {'t': 5, 'y': 0.5, 'x': 0.5}}. (Happy to argue about whether spacing should be a dict or a list, I'm on the fence about it.) Optionally: {'units': {'t': 's', 'y': 'um', 'x': 'um'}}.
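
Pulled together, a minimal .zattrs under this proposal might read (illustrative only):

{
    "ome-ngff": "v0.2",
    "dimensions": ["t", "c", "y", "x"],
    "spacing": {"t": 5, "y": 0.5, "x": 0.5},
    "units": {"t": "s", "y": "um", "x": "um"}
}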

I think strict requirements should be as minimal as possible, and in terms of the ordering of elements within the spec, we should front load those according to how many communities it affects. So, things like "electron beam intensity" should be far down the list because it only affects people who image with electrons, while voxel spacing affects ~everybody.

One question I have is whether there is scope for this format to be more general than the M in OME implies. From @DragaDoncila's and my work, it seems many more communities could make use of this format, so it would be pretty nice if we can keep it generic enough so that they can use it, even if non-microscopy uses are lower in the list of priorities.

5D or List< 4D >

@joshmoore

Citing @will-moore about opening ome.zarr in Napari

...since each image channel is split into a separate 4D layer, and then the labels are another 4D layer ...

I am doing the same for opening it in BDV as there the data model is List< 4D >.

This makes me wonder about the considerations that made you go for 5D rather than a list of 4D arrays (x,y,z,t)?!

I am not saying that I think List< 4D > is better, to me 5D feels quite natural and effective, I am just wondering!

Collections Specification

What is an image collection?
A collection of images is a semantic grouping of two or more associated ome-ngff images and/or image-labels.

This definition could include

  • Images which do not share a physical coordinate space e.g. training dataset of images containing bees
  • Images which share a physical coordinate space and whose storage specification must support sufficient metadata to determine this positioning e.g. high-content screening plates and wells
  • A hierarchy of image groups of arbitrary depth which may or may not share physical coordinates
  • Other things…?

What workflows should it support?
The specification should support implementations being able to traverse the image collection and, where relevant, map the associated metadata to the physical coordinate space for loading these images.

Ideally, the specification should provide sufficient information at each level of a hierarchical grouping to allow for the loading of both the entire collection, and the loading of an arbitrary level of the hierarchy. This can be important when wanting to share/view partial datasets or update only small parts of the entire collection.

Where labels or other related data is provided (e.g. meshes, points…), the specification should support being able to associate any member of the image collection with its associated labels, regardless of the level in the hierarchy.

The OME-NGFF spec is close to supporting this functionality with the HCS specification, which allows the positioning of wells into the rows and columns of a plate. The main drawbacks of this specification are

  • It is too specific to be easily used for images which ARE physically associated but are not HCS acquisitions
  • It may be difficult to understand for researchers who are not working with HCS images but nevertheless wish to store their collection in OME-NGFF format
  • It does not support an arbitrary depth of groupings
  • It does not support collections which are not physically associated

What should it be called?

  • Dataset - this term is already used in various places so may not be the best choice
  • Collection - a general enough term which is currently mostly unused
  • Hierarchical definition - there is a case for this specification being a hierarchy of specifications, with each one defining a more tightly bound collection e.g.
    • Bag - associated images with no metadata
    • Stack - associated images which overlap in physical space
    • Panorama - associated images which stitch together in physical space

Ideally, the names used in the base specification would be general enough to support a broad variety of use cases and tailored use cases could be demonstrated using examples in the documentation.

Reference specifications
BDV XML Files
SVG
TrakEM2
Napari Plugin for image-label collections
mobie grid view of many sources

Related
Image.SC discussion on collections
Live notes from latest community call
HCS Specification

What next?
I think we should first decide on whether we want to support arbitrary levels in the hierarchy and whether we want a general spec which we can “inherit” from for more detailed specs, or whether we want one spec to rule them all.

My vote is that we define the most generic collection (a “bag” of images) which works with arbitrary levels of grouping (it’s collections all the way down), and then work to add to it for more complex collections. I will be working on this over the coming week and will post here once I have something working, but of course would love to hear what everyone’s thoughts are on the best way forward.

Decoupling version numbers of different specifications

While working on #46 I noticed that we now need to bump the version number for the HCS spec, although nothing in it has changed. In the future with more specifications for different OME-NGFF formats, this can become pretty confusing, because any change in one of the specifications will require a "global" version bump.

I am not sure if there is a better solution, because having diverging version numbering in the same repository could introduce its own set of problems. But I still wanted to bring this up for discussion.

Spec naming style

When looking to add new keys to the spec (e.g. position_x), I wonder what the naming style should be, e.g.

  • position_x
  • positionX
  • position-x
  • positionx

Looking at the current spec, we have all of these styles there already, reflecting the origin of various terms (mea culpa):

  • image-label
  • defaultT - Rendering definition (omero). to be replaced by rendering spec?
  • maximumfieldcount - HCS, to be replaced by the Collections spec
  • field_count - Also HCS.

So, which of these do we prefer? Python style, Java style or something else?

HCS group layout

Following the support for a multiscales and masks, the focus is now shifting to trying to represent HCS data in the NGFF spec. An initial prototype of plate layout had already been implemented in the context of the OME Community Meeting 2020 - https://github.com/ome/omero-guide-cellprofiler/blob/3a441e5594b80e8e95e5e473baa8da140db03656/notebooks/idr0002_zarr.ipynb.

Overall it feels like the HCS specification should primarily revolve around:

  • the specification of extra group(s) above the multiscales modelling the HCS concept
  • the specification of the metadata conventions associated with each group

The effective levels currently supported by the OME model and the various HCS datasets produced by the community are: Plate, Plate Acquisition (also called Plate Run), Well, Well Sample (also called Field of View). The first question is how flat vs. deep the Zarr folder hierarchy should be to represent these concepts. The two layouts below are put forward for discussion.

All names, layout and content are still up for discussion at this stage.

Option 1: single group

This is the closest to the implementation mentioned above where a series of multiscale images aka Zarr groups (potentially with labels) are collected within a plate Zarr group. Each multiscale image represents a field of view within a well within a plate acquisition with its metadata specified in a dedicated well sample specification.

└── plate.zarr                # Plate
    ├── .zgroup
    ├── .zattrs               # Implements "plate"
    ├── 0                     # First field of view 
    │   ├── .zgroup
    │   ├── .zattrs           # Implements "multiscale", "omero", "well sample" 
    │   ├── 0
    │   │   ...               # Resolution levels
    │   ├── n
    │   └── labels
    ├── ...                   # Field of views
    └── n                    

Pros:

  • this keeps the addition fairly minimal and does not create a large nested specialized structure
  • outside the HCS use case, this simple layout representation could potentially be generalized or at least concepts could be re-used for representing multi-position acquisitions i.e. a group of images related via some spatial context

Cons:

  • some of the classical HCS look-ups ("find all fields of view within a well") involve traversing all elements and iterating over the attributes
  • many HCS datasets will easily exceed 10K images per plate acquisition these days - typical numbers of wells range between 96 and 1536, and some acquisition systems will image ~300 fields of view per well. From the classical experience of HCS file formats (which can easily create 10K-100K binary files under a single folder), we know that large numbers of folders can lead to performance issues on file systems.

Option 2: plate/acquisition/well/well sample

In this proposal, three groups are inserted above the image group: plate, plate acquisition and well. Each multiscale image represents a field of view within a well within a plate acquisition. The full HCS metadata is distributed across the plate acquisition, well and well sample specifications.

└── plate.zarr                    # Plate
    ├── .zgroup
    ├── .zattrs                   # Implements "plate"
    │
    ├── 0                         # First plate acquisition
    │   │
    │   ├── .zgroup
    │   ├── .zattrs               # Implements "plate acquisition"
    │   │
    │   ├── 0                     # First well
    │   │   ├── .zgroup
    │   │   ├── .zattrs           # Implements "well"
    │   │   ├── 0                 # First field of view
    │   │   │   ├── .zgroup
    │   │   │   ├── .zattrs       # Implements "multiscale", "omero", "well sample"
    │   │   │   ├── 0
    │   │   │   │   ...             # Resolution levels
    │   │   │   ├── n
    │   │   │   └── labels
    │   │   ├── ...               # Field of views
    │   │   └── n
    │   ├── ...                   # Wells
    │   └── m
    ├── ...                       # Plate acquisitions
    └── l

Pros:

  • for most HCS datasets, this should limit the maximal number of sub-groups to be of the order of 1K
  • the structure is more amenable to the classical ways to store and introspect HCS elements

Cons:

  • in many cases, there is only one acquisition per plate and the second group will be a singleton
  • this requires four new specifications as opposed to two to describe the various levels. The added complexity might be a barrier to adoption

Option 3: plate/acquisition/row/column/well sample

See https://github.com/ome/omero-ms-zarr/issues/73#issuecomment-706770955

Group names

In both examples above, 0, 1,...n are used as the generic group names. Using more explicit informative names reflecting the acquisition, e.g. A1, A2, ... or A1 Field 1, A1 Field 2..., is definitely a possibility. Given the number of variants found in the ecosystem, I would avoid trying to enforce these names and/or rely on them. Instead the corresponding metadata (typically row, column, index) should be unambiguously specified within the .zattrs of the relevant group(s).

Support for multi-channel labels

@will-moore @joshmoore @constantinpape

What is the meaning of the channel dimension for label images?

I could imagine:

  1. It must be a singleton dimension, where only channel 0 exists
  2. If the intensity image has multiple channels, each channel could have its own segmentation (label-image), and the channel dimension of the label image corresponds to the channel dimension of the intensity image

Is there already a spec for this?

json-ld, "omero" metadata and labels paths

A couple of outstanding issues with the current spec from https://github.com/ome/omero-ms-zarr/issues/76#issuecomment-722279765 that we may want to fix before "v0.2" since they are breaking changes.

  • Plate has "plate":{} and a Well has "well":{} but an Image has "omero":{}, "multiscales":{}, so maybe that omero key should be renamed to image to be consistent?
  • Labels should have an object with 'path', e.g. "labels": [{"path": "0"}, {"path": "1"}] to be consistent with elsewhere, instead of ["0", "1"]

Mesh specification

As discussed in the feb. 2021 ngff community call, and following this image.sc thread

The idea is to follow the PLY specification to store meshes in ome-zarr. A PLY file is organised into:

  • a header (translated into the .attrs of the zarr group)
  • the points (a zarr.Array)
  • for each element (e.g. faces), the list of vertices. Those are grouped by number of sides, with one zarr.Array per polygon size.
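
A hypothetical group layout along these lines (names purely illustrative):

mesh.zarr/
├── .zattrs        # PLY-style header: element names, counts, datatypes
├── points         # zarr.Array of vertex coordinates
└── faces/
    ├── 3          # vertex indices for all triangles
    └── 4          # vertex indices for all quads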

There is a draft implementation here: https://github.com/centuri-engineering/ply-zarr

Some questions:

  • I don't know how to deal with binary formats
  • PLY allows for datatype specification at the column level, how to deal with that in zarr?

Transformation Specification

This is a feature request towards v0.2 of the multi-scale image spec:
Support (affine) transformations for images to apply these on the fly (e.g. to map images to a shared coordinate system).

So far, I have come across two different implementations that contain a proposal for this:

Open Organelle

{
"datasets": [
      {
        "path": "s0",
        "transform": {
          "axes": ["z", "y", "x"],
          "scale": [1, 1, 1],
          "translate": [0, 0, 0],
          "units": ["nm", "nm", "nm"]
         }
       }
   ]
} 

The transformation metadata is also duplicated in the individual datasets.

I2K Tutorial

Same as the Open Organelle proposal, but the transformation is defined on the dataset level, not for each individual scale level:

{
"datasets": [...],
"transform": {...}
}

The transformation metadata is not duplicated.

(I have left out the nested multiscale etc. in both cases to make this more readable.)

I would be happy to develop this into a PR to the spec, but it would be good to have some feedback first:

  • Is there a competing approach we should take into account? Any general critique?
  • Should the transformation be defined per dataset or per scale level? (I think per scale level actually makes more sense.)
  • Do we add rotation and shear to transformation, so that a full affine trafo can be specified?

@joshmoore @frauzufall @kephale @axtimwalde @d-v-b @tischi @jni
(please tag anyone else who might be interested in this)


Original text from #12:

Originally discussed as part of the multiscales specification, scale metadata related to each volume in a multiscale image needs storing for the proper interpretation of the data.

see also ome/omero-iviewer#359 in that each level should be strictly smaller than the last.

json terminology

Do we have a fixed convention for how to use json terminology in the spec?
For object ({}) I think we are currently using dictionary. And for array ([]) we are using list.
I.e. we are using the names of the equivalent python data structures. I think this is fine (esp. since we don't want to use array due to the overloaded meaning in the zarr spec), but it should be stated somewhere in the preamble of the spec.

Multiscales questions

@joshmoore @will-moore
Multiscales is an array: Multiscale[] multiscales = n5.getAttribute( "", "multiscales", Multiscale[].class );
I don't understand in which scenarios this array could have more than one item?

Initial review

Please add comments/TODOs here for cleaning up this repository after its migration from https://github.com/joshmoore/ngff.

You can see a published version of the main branch under: https://ngff.openmicroscopy.org/

  • use ORCIDs
  • Add other editors
  • Review copyright line (see original w3c example below)
  • Review title (should be added to a "how to cite" section)
  • Proof introduction section
  • Tag 0.1
  • Migrate content for labels, etc. and re-tag

Original W3C copyright

Copyright © [YEAR] W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.

Add scale factors used for MIP to multiscales metadata

The multiscales metadata currently does not contain any information about the downscaling factors used for generating the different scale datasets (MIP). This information is necessary for visualisation in a couple of tools, e.g. bigdataviewer. See the bdv.n5 format for an example on how to store it. As a workaround, we have been computing the factors on the fly based on the extent of the scale datasets/arrays.
However, there are some corner cases, e.g. if the extents are not divisible by the scale factors. So I think it would be good to add this as an (optional) field to the metadata.

We could either add it in a similar fashion to BDV and add a field downsamplingFactors = [[1, 1, 1], [2, 2, 2], [4, 4, 4], ...].
Or we could add it to the datasets field: "datasets": [{"path": "0", "downsamplingFactor": [1, 1, 1]}, {"path": "1", "downsamplingFactor": [2, 2, 2]}, ...].

NGFF dataset validator

A tool to validate whether a dataset follows the NGFF spec, with per-version validation. It should generate a visual and programmatic summary of required and optional features and any errors related to types, etc.

Clarify that "image-label" and "multiscales" are siblings

I found the spec slightly confusing, based on this wording:

3.4. "image-label" metadata
Groups containing the image-label dictionary represent an image segmentation in which each unique pixel value represents a separate segmented object. image-label groups MUST also contain multiscales metadata

Since the "multiscales" object is not shown in the example, I could imagine that image/labels/0/.zattrs could contain "image-label" that MUST contain "multiscales" like:

{
    "image-label": {
        "multiscales": [],
        "colors": [],
    }
}

The 'tree' outline doesn't clarify this either - it doesn't mention that the .zattrs here MUST contain "multiscales":

├── .zattrs   # Metadata of the related image and as well as display information under the "image-label" key.

In fact, image/labels/0/.zattrs should be:

{
    "multiscales": [],
    "image-label": {
        "colors": []
    }
}

cc @sbesson

Managing image segmentation data (mutability of ome.zarr)

@joshmoore @constantinpape

We (ping @cgirardot) have been thinking a bit about data management with ome.ngff and had a question/concern.

Let's say you start with an ome.zarr container that only contains the raw data and then you compute a segmentation (label mask image).

If you add this label mask image into the original ome.zarr, you sort of mutate its identity, because its content is changing, which may not be ideal from a data management perspective.

If you instead were to create a new ome.zarr containing both the raw data and the segmentation, you would have to copy the raw data, which may be prohibitive.

So we were wondering if the idea is to create a new ome.zarr container that only contains the label mask data and a link to the raw data, such that viewers would still open it as if it contained both the raw and the segmentation data.

Any thoughts on this?

(Nonlinear) Transformation specification

Start of discussion. @tischi @LeeKamentsky

In my view, there are so many ways to store / transform images that trying to cover all cases is premature right now. E.g. are we storing BSpline coefficients? Are we storing ThinPlateSpline knots and weights? Something else?

The one exception is that the use of displacement fields is a pretty standard representation across tools. And my hope is that storing a displacement field will be "easy" once other aspects of the standard are worked out.

Specifically

  1. Where are grid pixels stored in physical space
  2. Which axis indexes the "vector" dimension.
  3. Maybe more broadly "how do I interpret this data"
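
Purely as a strawman for those three points (no key name here is proposed spec language), the metadata accompanying a displacement-field array might record something like:

"displacement_field": {
    "path": "registration/dfield",
    "vector_axis": 0,
    "origin": [0.0, 0.0, 0.0],
    "spacing": [4.0, 4.0, 4.0],
    "unit": "micrometer"
}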

There's something we started working on here, but it's not especially general (and not meant to be).

Remote links

At various locations in the current specification, a path is stored to another group or array within the same fileset:

  • The datasets of multiscales link to the individual arrays comprising the pyramid.
  • Labels link to the individual labeled images.
  • Plates link to the wells that are contained.

So far, these paths are constrained to being relative to the same fileset and in some cases even to being within the same group. A generic mechanism for linking between filesets should be developed which can then be used across other NGFF specifications.

Compatibility with xarray

With the aspiration for OME-Zarr to be The One Imaging Format to Rule them All 💍 , I would like to propose compatibility with xarray. For the most part, the needs of the:

  • bioimaging
  • geospatial imaging
  • medical imaging
  • many other scientific imaging domains

overlap. A common, well-supported standard will facilitate integration and cross-pollination across communities, and avoid those I/O headaches 🤯 .

In summary, we could extend the current OME-Zarr spec to be compatible with the result of xarray.Dataset.to_zarr, in a way that adds spatial metadata (addressing #28 and #12) through the xarray-encoded coords, using the scientific imaging dimensions standard in OME-Zarr (x, y, z, c, t) for the xarray array dimensions and making their name and order explicit (#35).

Resulting consolidated metadata from idr0094
{
    "metadata": {
        ".zattrs": {
            "multiscales": [
                {
                    "datasets": [
                        {
                            "path": "0/idr0094"
                        },
                        {
                            "path": "1/idr0094"
                        },
                        {
                            "path": "2/idr0094"
                        },
                        {
                            "path": "3/idr0094"
                        },
                        {
                            "path": "4/idr0094"
                        },
                        {
                            "path": "5/idr0094"
                        }
                    ],
                    "name": "idr0094",
                    "version": "0.1"
                }
            ]
        },
        ".zgroup": {
            "zarr_format": 2
        },
        "0/.zattrs": {},
        "0/.zgroup": {
            "zarr_format": 2
        },
        "0/c/.zarray": {
            "chunks": [
                3
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<u4",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                3
            ],
            "zarr_format": 2
        },
        "0/c/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "c"
            ]
        },
        "0/idr0094/.zarray": {
            "chunks": [
                270,
                540,
                2
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "zstd",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "|u1",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                1080,
                1080,
                3
            ],
            "zarr_format": 2
        },
        "0/idr0094/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y",
                "x",
                "c"
            ],
            "direction": [
                [
                    1.0,
                    0.0
                ],
                [
                    0.0,
                    1.0
                ]
            ],
            "ranges": [
                [
                    0.0,
                    255.0
                ],
                [
                    0.0,
                    255.0
                ],
                [
                    0.0,
                    255.0
                ]
            ]
        },
        "0/x/.zarray": {
            "chunks": [
                1080
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                1080
            ],
            "zarr_format": 2
        },
        "0/x/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "x"
            ]
        },
        "0/y/.zarray": {
            "chunks": [
                1080
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                1080
            ],
            "zarr_format": 2
        },
        "0/y/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y"
            ]
        },
        "1/.zattrs": {},
        "1/.zgroup": {
            "zarr_format": 2
        },
        "1/c/.zarray": {
            "chunks": [
                3
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<u4",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                3
            ],
            "zarr_format": 2
        },
        "1/c/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "c"
            ]
        },
        "1/idr0094/.zarray": {
            "chunks": [
                64,
                64,
                64
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "zstd",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "|u1",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                540,
                540,
                3
            ],
            "zarr_format": 2
        },
        "1/idr0094/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y",
                "x",
                "c"
            ],
            "direction": [
                [
                    1.0,
                    0.0
                ],
                [
                    0.0,
                    1.0
                ]
            ],
            "ranges": [
                [
                    0.0,
                    255.0
                ],
                [
                    0.0,
                    255.0
                ],
                [
                    0.0,
                    255.0
                ]
            ]
        },
        "1/x/.zarray": {
            "chunks": [
                540
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                540
            ],
            "zarr_format": 2
        },
        "1/x/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "x"
            ]
        },
        "1/y/.zarray": {
            "chunks": [
                540
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                540
            ],
            "zarr_format": 2
        },
        "1/y/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y"
            ]
        },
        "2/.zattrs": {},
        "2/.zgroup": {
            "zarr_format": 2
        },
        "2/c/.zarray": {
            "chunks": [
                3
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<u4",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                3
            ],
            "zarr_format": 2
        },
        "2/c/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "c"
            ]
        },
        "2/idr0094/.zarray": {
            "chunks": [
                64,
                64,
                64
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "zstd",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "|u1",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                270,
                270,
                3
            ],
            "zarr_format": 2
        },
        "2/idr0094/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y",
                "x",
                "c"
            ],
            "direction": [
                [
                    1.0,
                    0.0
                ],
                [
                    0.0,
                    1.0
                ]
            ],
            "ranges": [
                [
                    0.0,
                    255.0
                ],
                [
                    0.0,
                    255.0
                ],
                [
                    0.0,
                    255.0
                ]
            ]
        },
        "2/x/.zarray": {
            "chunks": [
                270
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                270
            ],
            "zarr_format": 2
        },
        "2/x/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "x"
            ]
        },
        "2/y/.zarray": {
            "chunks": [
                270
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                270
            ],
            "zarr_format": 2
        },
        "2/y/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y"
            ]
        },
        "3/.zattrs": {},
        "3/.zgroup": {
            "zarr_format": 2
        },
        "3/c/.zarray": {
            "chunks": [
                3
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<u4",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                3
            ],
            "zarr_format": 2
        },
        "3/c/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "c"
            ]
        },
        "3/idr0094/.zarray": {
            "chunks": [
                64,
                64,
                64
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "zstd",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "|u1",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                135,
                135,
                3
            ],
            "zarr_format": 2
        },
        "3/idr0094/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y",
                "x",
                "c"
            ],
            "direction": [
                [
                    1.0,
                    0.0
                ],
                [
                    0.0,
                    1.0
                ]
            ],
            "ranges": [
                [
                    0.0,
                    252.0
                ],
                [
                    0.0,
                    252.0
                ],
                [
                    0.0,
                    252.0
                ]
            ]
        },
        "3/x/.zarray": {
            "chunks": [
                135
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                135
            ],
            "zarr_format": 2
        },
        "3/x/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "x"
            ]
        },
        "3/y/.zarray": {
            "chunks": [
                135
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                135
            ],
            "zarr_format": 2
        },
        "3/y/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y"
            ]
        },
        "4/.zattrs": {},
        "4/.zgroup": {
            "zarr_format": 2
        },
        "4/c/.zarray": {
            "chunks": [
                3
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<u4",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                3
            ],
            "zarr_format": 2
        },
        "4/c/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "c"
            ]
        },
        "4/idr0094/.zarray": {
            "chunks": [
                64,
                64,
                64
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "zstd",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "|u1",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                67,
                67,
                3
            ],
            "zarr_format": 2
        },
        "4/idr0094/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y",
                "x",
                "c"
            ],
            "direction": [
                [
                    1.0,
                    0.0
                ],
                [
                    0.0,
                    1.0
                ]
            ],
            "ranges": [
                [
                    0.0,
                    182.0
                ],
                [
                    0.0,
                    182.0
                ],
                [
                    0.0,
                    182.0
                ]
            ]
        },
        "4/x/.zarray": {
            "chunks": [
                67
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                67
            ],
            "zarr_format": 2
        },
        "4/x/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "x"
            ]
        },
        "4/y/.zarray": {
            "chunks": [
                67
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                67
            ],
            "zarr_format": 2
        },
        "4/y/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y"
            ]
        },
        "5/.zattrs": {},
        "5/.zgroup": {
            "zarr_format": 2
        },
        "5/c/.zarray": {
            "chunks": [
                3
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<u4",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                3
            ],
            "zarr_format": 2
        },
        "5/c/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "c"
            ]
        },
        "5/idr0094/.zarray": {
            "chunks": [
                64,
                64,
                64
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "zstd",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "|u1",
            "fill_value": null,
            "filters": null,
            "order": "C",
            "shape": [
                33,
                33,
                3
            ],
            "zarr_format": 2
        },
        "5/idr0094/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y",
                "x",
                "c"
            ],
            "direction": [
                [
                    1.0,
                    0.0
                ],
                [
                    0.0,
                    1.0
                ]
            ],
            "ranges": [
                [
                    0.0,
                    116.0
                ],
                [
                    0.0,
                    116.0
                ],
                [
                    0.0,
                    116.0
                ]
            ]
        },
        "5/x/.zarray": {
            "chunks": [
                33
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                33
            ],
            "zarr_format": 2
        },
        "5/x/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "x"
            ]
        },
        "5/y/.zarray": {
            "chunks": [
                33
            ],
            "compressor": {
                "blocksize": 0,
                "clevel": 5,
                "cname": "lz4",
                "id": "blosc",
                "shuffle": 1
            },
            "dtype": "<f8",
            "fill_value": "NaN",
            "filters": null,
            "order": "C",
            "shape": [
                33
            ],
            "zarr_format": 2
        },
        "5/y/.zattrs": {
            "_ARRAY_DIMENSIONS": [
                "y"
            ]
        }
    },
    "zarr_consolidated_format": 1
}

Created with this script.

In this example, the array dimensions are y, x, c, i.e. not all 5 dimensions in the current standard, and in a different order. These differences could, however, be removed (see the sketch below).
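As a rough illustration of how the axis-order difference could be handled on read, here is a minimal sketch (the store path is hypothetical, and it assumes xarray and zarr are installed) that opens one scale of the consolidated store above and reorders the array toward the t, c, z, y, x convention:

```python
import xarray as xr

# Hypothetical path to the consolidated store whose .zmetadata is shown above.
ds = xr.open_zarr("idr0094.zarr", group="0", consolidated=True)

# In this example the image array is named "idr0094" with dims (y, x, c).
da = ds["idr0094"]

# Add singleton t and z dims and transpose; this is one way the
# "different order" relative to the current standard could be removed.
da_tczyx = da.expand_dims({"t": 1, "z": 1}).transpose("t", "c", "z", "y", "x")
print(da_tczyx.dims, da_tczyx.shape)
```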

After attempting a few variations and putting this into practice, it seems to work well.

Each scale can be used independently. Initially, I tried to avoid coords and instead use the more concise spatial-dimension-rank spacing / scale and origin / translation. However, in an array-based computing environment like scientific Python, where slicing is a bread-and-butter operation, 1D coords that remain valid after slicing turn out to be helpful. They are also quite handy when developing visualization tools, since they avoid on-demand generation of coordinates.
The logic for transforming the spatial metadata is here.
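To illustrate why the 1D coords are convenient, a minimal sketch (continuing from the snippet above; the slice bounds are hypothetical) showing that label-based selection and positional slicing both carry correctly sliced coords along:

```python
# Select a physical region using the 1D x/y coordinate arrays; the
# resulting sub-array keeps matching, sliced coords.
crop = da.sel(x=slice(100.0, 200.0), y=slice(100.0, 200.0))

# Plain positional slicing also keeps the coords aligned, which is what
# makes downstream visualization straightforward.
tile = da[0:64, 0:64, :]
print(crop.coords["x"].values[:5], tile.coords["y"].size)
```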

Additionally, there is a growing xarray community, and compatibility helps everyone. A direction / orientation matrix, which is important in medical imaging, is added as an attr.
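For the direction / orientation attr, a minimal sketch of the usual index-to-physical mapping (the convention shown here is an assumption, not something prescribed by the example; spacing and origin values are hypothetical, since in this layout they are implied by the 1D x/y coordinate arrays):

```python
import numpy as np

# The 2x2 identity direction from the attrs above means the axes are
# neither rotated nor flipped.
direction = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
spacing = np.array([0.5, 0.5])   # hypothetical pixel spacing (y, x)
origin = np.array([0.0, 0.0])    # hypothetical origin (y, x)

def index_to_physical(index_yx):
    """Map a (y, x) index to a physical (y, x) position, ITK-style."""
    return origin + direction @ (spacing * np.asarray(index_yx, dtype=float))

print(index_to_physical((10, 20)))
```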

I am interested in everyone's thoughts. I am grossly behind on GitHub notifications, but I will check in with the discussion on this issue every day or two.

CC @joshmoore @lassoan @rabernat @constantinpape @danielballan @forman

document client feature support

As the spec evolves, it's increasingly likely that parts of it will not be supported by all clients, either because of technical limitations or because the work just hasn't been done or released yet.

It would be really nice for new and existing users to be able to see what is, and what is NOT, supported by each client (without having to search the issues of each repo). This would also be a nice way to show which clients are available to view OME-NGFF data.

I'm thinking of something like a support table, similar to e.g. https://caniuse.com/flexbox (although in that case it's a table for a single feature, with different versions of each tool). I'd suggest we limit it to documenting only the latest release of each tool.

| viewer | Z-downsampling | omero info | multiscales factor not=2 | URL (not s3) | v0.3 axes | 3D view | labels | HCS plate |
| ------ | -------------- | ---------- | ------------------------ | ------------ | --------- | ------- | ------ | --------- |
| napari | y | y | y | y | y | y (1) | y | y |
| vizarr | n | y | n | y | y | n | n | y |
| MoBIE  | y | n | y | n | n | y | y | n |

  1. napari 3D view only supported for the lowest level of the multiscales pyramid

If this looks like a popular idea, where would this doc/table live? Include it in the spec page, or a doc page on this repo?
