Giter Site home page Giter Site logo

Comments (19)

trhallam avatar trhallam commented on June 26, 2024

@musiczhzhao I'm sorry but this isn't a priority for me right now. I wasn't actually aware anyone was using the ZGY functionality in Segysak.

I am happy to accept pull requests if you would like to add the functionality. The idea is to minimise memory usage by streaming data from one format to the other.

Does the current functionality not work for your large files?

from segysak.

da-wad avatar da-wad commented on June 26, 2024

@musiczhzhao If you're looking for a pip-installable Python package to look at a small subset of data from ZGY files may I recommend taking a look at pyZGY: https://github.com/equinor/pyzgy

Actually, @trhallam, this package would probably make ZGY support easier to implement for SEGY-SAK as the installation is simple and the syntax mirrors segyio :-)

from segysak.

trhallam avatar trhallam commented on June 26, 2024

@da-wad , that's cool! I didn't know about this package. Definitely happy to devolve zgy handling to pyzgy at some point. Probably just needs some drop in changes to the main part of segysak.segy.

from segysak.

da-wad avatar da-wad commented on June 26, 2024

With a brief glance at _segy_loader.py it looks like the only thing missing from the pyZGY segyio-mimicing interface which you use there is mmap() which could be skipped for ZGY, or implemented in pyZGY as a function returning False if you wanted to keep it strictly drop-in...

The implementation of mirroring segyio.trace does include some caching so that you're not re-reading the same bricks over and over, but it's not designed for sequential reading of the whole file, so to make it performant there would also need to be an option to increase this cache size: https://github.com/equinor/pyzgy/blob/master/pyzgy/loader.py#L30

from segysak.

trhallam avatar trhallam commented on June 26, 2024

Sounds good.

Just a thought, mmap is meant to improve performance in segyio when reading large datasets. Maybe this function could be used to set the cache size with a sensible default keyword argument? Perhaps a bit obscure in terms of the name but it would then be consistent with segyio.

from segysak.

musiczhzhao avatar musiczhzhao commented on June 26, 2024

@musiczhzhao I'm sorry but this isn't a priority for me right now. I wasn't actually aware anyone was using the ZGY functionality in Segysak.

I am happy to accept pull requests if you would like to add the functionality. The idea is to minimise memory usage by streaming data from one format to the other.

Does the current functionality not work for your large files?

Hi @trhallam! Sorry for late response. The current functionality works fine when the whole segy can fit in the memory, but will encounter issue when the size became large.

I am happy to implement this feature, if I can have more information regarding the overall design of the package. Currently as I can see the functionalities of the segy and zgy are separated and data were converted to an internal format as a middle step. If we implement this new function in the _openzgy.py, is there a preferred approach?

Hi @da-wad - I haven't tried the pyZGY package yet but from the "About" looks like it is designed to only do the reading instead of both reading and writing? I will take a closer look.

Zhao

from segysak.

da-wad avatar da-wad commented on June 26, 2024

Hi @musiczhzhao ... that is correct, although the openzgy package which pyZGY adapts to segyio-like syntax does support writing. So it would be possible to build this functionality too, and I'm open to PRs :-)

from segysak.

trhallam avatar trhallam commented on June 26, 2024

Hi @musiczhzhao

ZGY was a bit of an after thought so it is in it's own area at the moment. The fact that pyzgy mirrors segyio though is an opportunity to leverage much of the existing segysak.segy code. We'll want to preserve the current structure but see if we can create some shared code under the hood.

The current model has some dispatch functions at the top level like segy_loader which call the underlying functions to perform different tasks, like loading 3D or 2D data, or doing file conversion. It is the underlying functions that we will need to modify to call either segyio or pyzgy. Then a zgy_loader function would need to be written that models segy_loader. I think preserving the old module structure is preferable, because it would maintain backwards compatibility.

For your work where the ZGY is too big for memory, porting to the segy_loader will offer a few benefits:

  • The ability to load subsets of the data.
  • Ability to convert (stream) the data to HDF5 which facilitates lazy loading.
  • @da-wad has ZFP made it fully into numcodecs yet? I know it was on a dev branch some time back. But it might be worth adding ZARR support as well if all this is happening.

I think the way forward is to do the following.

  • Try to drop pyzgy for segyio in the segysak.segy part of the software and see what breaks.
  • Address breakages either on segysak side or in pyzgy.
  • If OK, refactor the src so that zgy and segy can share the same core machinery but preserve the zgy and segy import paths for backwards compatibility. (This will likely be the biggest job).
  • Port segy tests to zgy.
  • Add ZGY functionality to the CLI tools. (Lower priority)

Are you happy to take this on? I can help with questions if you have any.

from segysak.

da-wad avatar da-wad commented on June 26, 2024

ZFP is still nearly in numcodecs. The blocker is lack of macOS wheels, which I can understand their need for. However, the old Travis CI/CD setup for zfpy with multibuild made it difficult and fragile to implement. Now https://github.com/LLNL/zfpy-wheels is using cibuildwheel through Github Actions it should be trivial™ to start building those and numcodecs can get on with their magic! (See: zarr-developers/numcodecs#303 (comment))

from segysak.

musiczhzhao avatar musiczhzhao commented on June 26, 2024

Thanks @trhallam and @da-wad for the details! Sorry didn't respond earlier as I was on something different.

Is there any suggestions on the easiest approach to drop pyzgy for segyio in the segysak.segy part of the software and see what breaks? I saw that segyio has been imported & used in many places and not sure a straight replacement of segyio with pysegy is a proper way. Also I am assuming no API changes?

Aside from this. I noticed that in segysak there is a requirement that the segy data loaded must be "regular" to be treated as 3D and thus can be further convert to zgy. This is preventing us to convert irregular segy data to zgy. May I quickly ask if this check is required to operate the zgy conversion? Is there a preferred way to improve this or if switching to use pysegy will have this issue resolved? I'm interest in having this part improved.

Some snapshots of the code:

segy/_segy_headers.py
image

segy/_segy_loader.py
image

openzgy/_openzgy.py
image

Best,
Zhao

from segysak.

trhallam avatar trhallam commented on June 26, 2024

Hi Zhao,

I would simply start by replacing statements that say

import segyio

with

import pyzgy as segyio

and see what works and what breaks, are things missing from pyzgy that mirror the segyio behaviour.

You can load irregular data from SEGY using the segy_freeloader method. This gives you more flexibility. The limitation on the zgy writer is because only methods for 3D have been implemented. I think, most people working in Petrel use SEGY for 2D data anyway.

Can you explain what is irregular about your data? If it just has missing trace segysak can handle this, the general principle is that the byte locations describing the trace grid must be unique and orthogonal, e.g cdp or xline/iline or xline/iline/offset for example.

I'm also a little confused by your last comment. It sounds like you are trying to load SEGY and convert it to ZGY. If you are using ZGY then I'd assume you are using Schlumberger software. Why not just convert to ZGY using that? segysak is primarily designed to read SEGY into Python, it is not particularly performant for file conversion.

If this is your aim, you'll need to write quite a bit of code, because the segy_converter code which streams data from SEGY only outputs to Netcdf at the moment.

from segysak.

trhallam avatar trhallam commented on June 26, 2024

@da-wad does pyzgy support writing to ZGY at this time? I couldn't see any write methods in the source.

from segysak.

musiczhzhao avatar musiczhzhao commented on June 26, 2024

Thanks @trhallam for the brilliant idea! Will definitely give a shot.

By irregular I mean different subline have different number of crosslines. "Missing traces" is a better way to understand probably. I do try to use this tool to do some experimental data conversion and I personally don't have convenient access to Schlumberger software unfortunately.

Can you help me to better understand the current limitation of segysak in converting data with missing traces from segy to zgy? Dose the intermediate NetCDF4 impose any constrains? Or we just need to add the ability to convert non 3d data in zgy_writter() method? (I assume the limitation is not from the OSDU python code.)

Best,
Zhao

from segysak.

trhallam avatar trhallam commented on June 26, 2024

Ok, I think there was some confusion here with the original question, the comment in the code is to write a converter from zgy->xarray/netcdf, not to go from segy->zgy.

If you are having trouble loading data with missing traces it is likely you don't have the right byte locations specified on the loader.

Zgy as I understand it doesn't support dead traces because the regular geometry is fundamental to the underlying storage implementation. There may be some internal optimisations for size but the file is still defined by the ranges of iline and xline present in your data.

I think the best way forward here is to first get your data or a subset of it loading with the normal segy_loader, then try to write that subset with the current zgy_writer.

Once that is working, you can make a simpler version of the segy_converter function which loads your segy in smaller pieces before writing to a single Zgy.

The key here, is that you will need to pre-scrape the segy headers to build the full geometry for your volume. Most file formats don't allow dynamic expansion and Zgy is no different. The steps in your converter function should be:

  • scrape the segy headers using segy_header_scrape
  • define the full cube geometry for your ZGY file
  • use segy_loader to iterate loading sub-sets of your data into memory before writing them into your ZGY file

Long term if the package is to support conversion between many file formats it probably needs an interchange API between readers and writers but for the moment I think you could just write a function with the above steps that will meet your needs.

from segysak.

da-wad avatar da-wad commented on June 26, 2024

@trhallam correct, pyzgy does not currently support writing ZGY files.

Regarding irregular 3D SEG-Y files, these could be converted to ZGY format if padded with zeroed traces so that they become regular.

I think it would be easier to give good advice to @musiczhzhao if he explains what he wants to achieve and why?

from segysak.

trhallam avatar trhallam commented on June 26, 2024

@trhallam correct, pyzgy does not currently support writing ZGY files.

Regarding irregular 3D SEG-Y files, these could be converted to ZGY format if padded with zeroed traces so that they become regular.

I think it would be easier to give good advice to @musiczhzhao if he explains what he wants to achieve and why?

segysak already does this padding with zeros as long as the iline/xline byte locations specified can be used to create the full 3D volume grid.

from segysak.

trhallam avatar trhallam commented on June 26, 2024

@musiczhzhao

Not sure if you are still interested in this, but I got around to incorporating pyzgy. It's written in using the new xarray backend API which means lazy loading of ZGY is support for large files.

You can try it out by installing the development branch
https://github.com/trhallam/segysak/tree/xarray_backend

Load ZGY files using the open_dataset method.

import xarray as xr
zgy = xr.open_dataset('file.zgy')

from segysak.

musiczhzhao avatar musiczhzhao commented on June 26, 2024

from segysak.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.