
Comments (16)

ErlendHaa commented on June 24, 2024

The error originates in fsspec. It seems like something that should be handled gracefully by zarr.

Traceback (most recent call last):
  File "/home/erlend/scripts/zarr/mdiotest.py", line 22, in <module>
    il_mask, il_headers, il_data = mdio[180,100:1300,:1000]
  File "/home/erlend/.local/lib/python3.9/site-packages/mdio/api/accessor.py", line 390, in __getitem__
    self._traces[item],
  File "/home/erlend/.local/lib/python3.9/site-packages/zarr/core.py", line 788, in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
  File "/home/erlend/.local/lib/python3.9/site-packages/zarr/core.py", line 914, in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out,
  File "/home/erlend/.local/lib/python3.9/site-packages/zarr/core.py", line 957, in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
  File "/home/erlend/.local/lib/python3.9/site-packages/zarr/core.py", line 1252, in _get_selection
    self._chunk_getitems(lchunk_coords, lchunk_selection, out, lout_selection,
  File "/home/erlend/.local/lib/python3.9/site-packages/zarr/core.py", line 1985, in _chunk_getitems
    cdatas = self.chunk_store.getitems(ckeys, on_error="omit")
  File "/home/erlend/.local/lib/python3.9/site-packages/zarr/storage.py", line 1361, in getitems
    results = self.map.getitems(keys_transformed, on_error="omit")
  File "/home/erlend/.local/lib/python3.9/site-packages/fsspec/mapping.py", line 101, in getitems
    return {
  File "/home/erlend/.local/lib/python3.9/site-packages/fsspec/mapping.py", line 104, in <dictcomp>
    if on_error == "return" or not isinstance(out[k2], BaseException)

from mdio-python.

tasansal commented on June 24, 2024

Hi @ErlendHaa!

Thanks for reporting this.

Your interpretation of the missing chunk treatment is correct. When we ingest we don't write empty chunk keys to the store, and Zarr normally understands this and gracefully returns the fill value when a chunk key doesn't exist.

The behavior you're seeing is therefore not expected, and we have never seen it before (maybe an adlfs issue; we haven't used Azure much). Can you please share the following versions so we can diagnose further?

Python
MDIO
Zarr
Fsspec
Adlfs

Also, are these conda or pip installed?
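
For readers unfamiliar with the missing-chunk treatment described above, here is a toy sketch of the idea. It is not Zarr's implementation; the names `chunk_store`, `read_array`, `FILL_VALUE`, `CHUNK_SIZE`, and `ARRAY_LENGTH` are made up for illustration. A chunk key that was never written is simply read back as fill values:

```python
# Toy model of a chunked 1-D array: NOT Zarr's implementation, only the idea.
FILL_VALUE = 0
CHUNK_SIZE = 4
ARRAY_LENGTH = 8

# Only chunk 0 was written at ingestion; chunk 1 was empty, so it has no key.
chunk_store = {"0": [5, 6, 7, 8]}


def read_array(store):
    """Assemble the full array, substituting the fill value for absent chunks."""
    values = []
    for chunk_index in range(ARRAY_LENGTH // CHUNK_SIZE):
        chunk = store.get(str(chunk_index))
        if chunk is None:
            # A missing key is not an error: the chunk is treated as all fill values.
            chunk = [FILL_VALUE] * CHUNK_SIZE
        values.extend(chunk)
    return values


print(read_array(chunk_store))
```

The point is only that an absent key is a normal condition for the reader, which is why a KeyError on a missing chunk is surprising.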


ErlendHaa commented on June 24, 2024

Sure thing, my environment:

erlend:~$ python3.9 --version
Python 3.9.12

erlend:~$ python3.9 -m pip --version
pip 22.2.2 from /home/erlend/.local/lib/python3.9/site-packages/pip (python 3.9)

erlend:~$ python3.9 -c "import mdio; print(mdio.__version__)"
0.2.0

erlend:~$ python3.9 -c "import zarr; print(zarr.__version__)"
2.12.0

erlend:~$ python3.9 -c "import fsspec; print(fsspec.__version__)"
2022.8.2

erlend:~$ python3.9 -c "import adlfs; print(adlfs.__version__)"
2022.0
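
The same information could be gathered in one go with a small helper. This is only a convenience sketch: `report_versions` is a hypothetical name, and it assumes each package exposes `__version__`, as the commands above do.

```python
import importlib
import platform


def report_versions(module_names):
    """Return {name: version string} for Python and each requested module."""
    versions = {"python": platform.python_version()}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
            continue
        # Fall back to "unknown" if the module does not define __version__.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in report_versions(["mdio", "zarr", "fsspec", "adlfs"]).items():
    print(f"{name}: {version}")
```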


ErlendHaa commented on June 24, 2024

To narrow it down a bit, I stored the same .mdio file to disk. Then it reads just fine.


tasansal commented on June 24, 2024

To narrow it down a bit, I stored the same .mdio file to disk. Then it reads just fine.

This is very helpful, thank you.
It is starting to feel like it is an adlfs and Zarr integration issue, I will drill down a little and report it if that is the case.


tasansal commented on June 24, 2024

@ErlendHaa, I was able to reproduce your issue.

It worked fine on GCP and AWS but I will double check again in case an update broke something.
If it works on other clouds, we can bring this up with Zarr developers and they'll fix it upstream.

The demo file is also zero-padded; just like your file, it will have "empty" chunks that are not on the object store.

Steps:

  1. Created Azure Storage Account defaultmdio with default settings.
  2. Created container mdio-test.
  3. Grabbed the account key from Azure Portal.
  4. Ran the MDIO Quickstart with the following syntax changes:
from mdio import segy_to_mdio

default_storage_options={'account_name': "defaultmdio", 'account_key': "..."}

segy_to_mdio(
    segy_path="filt_mig.sgy",
    mdio_path_or_buffer="az://mdio-test/filt-mig.mdio",
    index_bytes=(181, 185),
    index_names=("inline", "crossline"),
    storage_options=default_storage_options,
    chunksize=(16, 16, 1024),  # to get the empty chunks because file is small
)
  5. Can't query the file because of a KeyError.


ErlendHaa commented on June 24, 2024

Great! How should we proceed? Do you want me to open an issue upstream with zarr?


ErlendHaa commented on June 24, 2024

I tracked down the root cause to the adlfs.AzureBlobFileSystem._expand_path method [1], more specifically to this continue [2], which strips out paths to non-existing blobs. This method is called by adlfs.AzureBlobFileSystem.cat [3], which in turn is called by fsspec.FSMap.getitems [4]. The continue basically undermines the "omit" option in getitems and cat by stripping the path list anyway. As a result, getitems raises a KeyError when it tries to index one of the stripped-out paths.

I guess we can close this issue now, as the bug is clearly unrelated to mdio. I'll make an upstream issue for it. Thanks for the help!

[1] https://github.com/fsspec/adlfs/blob/591485b9d77448cd6e791b49bda8942ef03507bf/adlfs/spec.py#L1672
[2] https://github.com/fsspec/adlfs/blob/591485b9d77448cd6e791b49bda8942ef03507bf/adlfs/spec.py#L1725
[3] https://github.com/fsspec/adlfs/blob/591485b9d77448cd6e791b49bda8942ef03507bf/adlfs/spec.py#L1610
[4] https://github.com/fsspec/filesystem_spec/blob/bb9989ce5bf0ed0c0a5f7d3540c3a59581d259ce/fsspec/mapping.py#L69
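
The failure mode described above can be sketched without any cloud access. The code below is a simplified stand-in, not the actual adlfs or fsspec source: `MissingBlob`, `cat_keeps_missing`, `cat_drops_missing`, and `getitems_omit` are hypothetical names. It only illustrates why a backend that silently drops missing paths breaks a caller that expects every requested key in the result:

```python
class MissingBlob(Exception):
    """Stand-in for the error a backend reports for a nonexistent object."""


def cat_keeps_missing(blobs, paths):
    """Backend read that reports a missing path as an exception value."""
    return {p: blobs[p] if p in blobs else MissingBlob(p) for p in paths}


def cat_drops_missing(blobs, paths):
    """Backend read that silently filters out missing paths first (the reported bug)."""
    return {p: blobs[p] for p in paths if p in blobs}


def getitems_omit(cat, blobs, keys):
    """Simplified mapping lookup with 'omit' semantics: skip keys that failed."""
    out = cat(blobs, keys)
    # out[k] assumes every requested key is present in the backend's result.
    return {k: out[k] for k in keys if not isinstance(out[k], BaseException)}


blobs = {"0.0.0": b"chunk-data"}
keys = ["0.0.0", "0.0.1"]  # the second chunk was never written

# Missing key is reported as an exception value, so it is omitted cleanly.
print(getitems_omit(cat_keeps_missing, blobs, keys))

# Missing key was dropped by the backend, so the lookup itself fails.
try:
    getitems_omit(cat_drops_missing, blobs, keys)
except KeyError as err:
    print("KeyError:", err)
```

In this sketch the first call returns only the existing chunk, while the second raises a KeyError on the dropped path, matching the last frame of the traceback in the original report.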


tasansal commented on June 24, 2024

@ErlendHaa thanks a lot for all the debugging! You can go ahead and open an issue with Zarr. I'll run a couple more tests on other clouds and if it works ok there I'll close this issue.

Again, thanks a lot!


tasansal commented on June 24, 2024

@ErlendHaa, I opened an issue with a minimal reproducible example.

I tested, and this does NOT happen on Google Cloud or S3.

Thanks for finding this!


ErlendHaa commented on June 24, 2024

Sorry, forgot to reply back! The bug lies with adlfs, not zarr. I submitted a patch for it, which addresses the root cause of the KeyError. Hopefully they'll accept it.


tasansal commented on June 24, 2024

Here is a reference to the PR by @ErlendHaa:

fsspec/adlfs#350


tasansal commented on June 24, 2024

@ErlendHaa, is this issue resolved with the latest adlfs release (2022.10.0), which came out a couple of weeks ago?

If it works as expected, we can close this issue. Thanks again!


ErlendHaa commented on June 24, 2024

Sadly, no! I'm not sure what happened there, tbh. They seemed to approve my PR, but did not include the fix in their linted version of it.


ErlendHaa commented on June 24, 2024

But as the cause definitely lies with adlfs, I think we can close this one.


tasansal commented on June 24, 2024

Sounds good. This is a big problem since many enterprise users use Azure. I will follow up with adlfs.

Opened an issue with adlfs to redo that PR: fsspec/adlfs#358

