There's an alpha release available now that can be installed using <code class="notran

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

I'll try again when the next alpha comes out <p dir="au

For reference: <div class="snippet-clipboard-content notranslate position-relative

Here's the branch where I've been experimenting with Zarr v3: <a href="https://github.

Run tests against Zarr 3 about bio2zarr HOT 13 OPEN

tomwhite commented on September 13, 2024

Run tests against Zarr 3

from bio2zarr.

Comments (13)

tomwhite commented on September 13, 2024 2

Hi @d-v-b, thanks for commenting! We'd actually like to encode variable-length strings, which use an object dtype and a numcodecs.vlen.VLenUTF8 codec.

I've added a comment explaining our use case to the discussion at zarr-developers/zarr-specs#83

from bio2zarr.

tomwhite commented on September 13, 2024 2

I filed the following issues to improve Zarr Python v3 API compatibility here:

from bio2zarr.

tomwhite commented on September 13, 2024 1

I'll try again when the next alpha comes out

There is no next alpha according to zarr-developers/zarr-python#1777.

from bio2zarr.

d-v-b commented on September 13, 2024 1

fwiw I would love to chart a path to getting more dtypes into zarr v3 (since i happen to be working on the v3 fill value normalization right now). as you noted, there's a dtype extension mechanism built into the spec but we haven't exercised it yet. Could you share or link to a description of how you are using string arrays, either here or in a discussion over in zarr specs? That might help kick-start things.

from bio2zarr.

tomwhite commented on September 13, 2024 1

I updated https://github.com/tomwhite/bio2zarr/tree/zarr-v3 to use the code from zarr-developers/zarr-python#2036 and the test still passes.

from bio2zarr.

jeromekelleher commented on September 13, 2024

I had a quick look at this and it seems like quite a bit of stuff is broken. I hit a wall at finding a way to specify a name for a new array in a Group. This seems pretty basic and I don't have time to chase down the details, so I'll try again when the next alpha comes out.

from bio2zarr.

jeromekelleher commented on September 13, 2024

I just had another go with the tip of the v3 branch, and still hitting walls with array creation. The support for creating v2 arrays seems to be pretty thin, and it's not clear at all to me how we're supposed to go about it. I don't really know where to start tbh.

from bio2zarr.

jeromekelleher commented on September 13, 2024

For reference:

python3 -m pip install pip install git+https://zarr-developers/zarr-python

from bio2zarr.

tomwhite commented on September 13, 2024

I'll take a look

from bio2zarr.

tomwhite commented on September 13, 2024

Here's the branch where I've been experimenting with Zarr v3: https://github.com/tomwhite/bio2zarr/tree/zarr-v3

After making changes to adapt to the different v3 API, it's now failing because Zarr v3 doesn't support string dtypes:

_____________________________________________________________ TestVcfZarrWriterExample.test_encode_partition[0] _____________________________________________________________

self = <tests.test_vcz.TestVcfZarrWriterExample object at 0x7f9dc9a75b40>
icf_path = PosixPath('/private/var/folders/9j/h1v35g4166z6zt816fq7wymc0000gn/T/pytest-of-tom/pytest-1771/data11/example.exploded')
tmp_path = PosixPath('/private/var/folders/9j/h1v35g4166z6zt816fq7wymc0000gn/T/pytest-of-tom/pytest-1771/test_encode_partition_0_0'), partition = 0

    @pytest.mark.parametrize("partition", [0, 1, 2])
    def test_encode_partition(self, icf_path, tmp_path, partition):
        zarr_path = tmp_path / "x.zarr"
        vcf2zarr.encode_init(icf_path, zarr_path, 3, variants_chunk_size=3)
        partition_path = zarr_path / "wip" / "partitions" / f"p{partition}"
        assert not partition_path.exists()
>       vcf2zarr.encode_partition(zarr_path, partition)

tests/test_vcz.py:508: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
bio2zarr/vcf2zarr/vcz.py:1064: in encode_partition
    writer.encode_partition(partition)
bio2zarr/vcf2zarr/vcz.py:683: in encode_partition
    self.encode_id_partition(partition_index)
bio2zarr/vcf2zarr/vcz.py:801: in encode_id_partition
    vid.flush()
bio2zarr/core.py:145: in flush
    sync_flush_1d_array(
bio2zarr/core.py:163: in sync_flush_1d_array
    zarr_array[offset : offset + np_buffer.shape[0]] = np_buffer
../zarr-python/src/zarr/array.py:984: in __setitem__
    self.set_basic_selection(cast(BasicSelection, pure_selection), value, fields=fields)
../zarr-python/src/zarr/array.py:1198: in set_basic_selection
    sync(self._async_array._set_selection(indexer, value, fields=fields, prototype=prototype))
../zarr-python/src/zarr/sync.py:92: in sync
    raise return_result
../zarr-python/src/zarr/sync.py:51: in _runner
    return await coro
../zarr-python/src/zarr/array.py:518: in _set_selection
    value_buffer = prototype.nd_buffer.from_ndarray_like(value)
../zarr-python/src/zarr/buffer.py:339: in from_ndarray_like
    return cls(ndarray_like)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[AttributeError("'NDBuffer' object has no attribute '_data'") raised in repr()] NDBuffer object at 0x7f9ddd467250>
array = array(['.', '.', 'rs6054257'], dtype=object)

    def __init__(self, array: NDArrayLike):
        # assert array.ndim > 0
>       assert array.dtype != object
E       AssertionError

../zarr-python/src/zarr/buffer.py:286: AssertionError

This is a major limitation for us. The Zarr v3 core spec does not cover strings.

They are likely to be a future extension:

The set of data types specified in v3 is less than in v2. Additional data types will be defined via extensions.

from bio2zarr.

jeromekelleher commented on September 13, 2024

Thanks Tom!

Yeesh, the lack of strings is scary. Looks like we'll be on v2 for a long time then.

from bio2zarr.

tomwhite commented on September 13, 2024

As an experiment I tried creating a VLenUTF8Codec which uses numcodecs.vlen.VLenUTF8. I can successfully run a test that writes a VCF Zarr file and then validates it:

pytest 'tests/test_vcf_examples.py::test_by_validating[sample.vcf.gz]'

from bio2zarr.

jeromekelleher commented on September 13, 2024

Amazing! 🎉

from bio2zarr.

Run tests against Zarr 3 about bio2zarr HOT 13 OPEN

Comments (13)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent