
trhallam / segysak

Stars: 83 · Watchers: 9 · Forks: 30 · Size: 63.52 MB

SEGY Swiss Army Knife for Seismic Data

Home Page: https://trhallam.github.io/segysak/

License: GNU General Public License v3.0

Python 100.00%
swung-t20 segy geophysics seismic swung-t21

segysak's People

Contributors

aadm · dabiged · fabioaco · richardscottoz · stevejpurves · trhallam


segysak's Issues

0.3.1 segy_writer bug

Errors out with: 'FrozenDict' object does not support item assignment. Version 0.3.0 shows no similar error.

dependencies import error - more_itertools

Please can you help me? I want to read the segy header using your code, but

from segysak.segy import get_segy_texthead

it does not work and gives me

ImportError: cannot import name 'split_when' from 'more_itertools' (/Users/karimcherouana/opt/anaconda3/lib/python3.7/site-packages/more_itertools/__init__.py)

Originally posted by @ckarim2 in #68 (comment)

Account for lag time in header

Sometimes the header field 'LagTimeA' is defined and each trace has a lag. There is also a field in the binary header for the first sample which applies a bulk offset. LagTimeA could be used for a bulk offset if it is constant, but should probably raise an error if not (a sketch of that check follows). We might then need an option to ignore the lag time if we include it on loading.
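A minimal sketch of the constancy check, assuming segy_header_scrape returns a pandas DataFrame with segyio's 'LagTimeA' column naming (file name hypothetical):

from segysak.segy import segy_header_scrape

scraped = segy_header_scrape("volume.sgy")
lags = scraped["LagTimeA"]
if lags.nunique() == 1:
    bulk_offset = int(lags.iloc[0])  # safe to apply as a constant vertical shift
else:
    raise ValueError("LagTimeA varies per trace; cannot treat it as a bulk offset")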

Investigate adding unit support for dimensions/arrays.

matt [Agile] 3 minutes ago
Right, makes sense... Might be worth a chat with @geo_leeman about how they attach units of measure, defined in pint I think, to arrays in MetPy. It's incredibly convenient, because it means instant ft<->m, ft/s<->m/s<->km/s, etc, etc. So it can be done.
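For context, a minimal sketch with pint (an assumption about the approach, not segysak's API; MetPy's actual mechanism may differ) showing how units attach to plain arrays:

import numpy as np
import pint

ureg = pint.UnitRegistry()
depth = np.arange(0, 1000, 5) * ureg.metre           # attach units to an array
depth_ft = depth.to(ureg.foot)                        # instant m <-> ft conversion
velocity = (depth / (1.5 * ureg.second)).to(ureg.kilometre / ureg.second)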

Fix code for return path from xarray to segy

This is currently failing due to changes in the way the seismic is stored; it also needs implementations for

  • 2D

  • 2D Pre-stack

  • 3D Pre-stack

  • Tests which round-trip the segyio test SEG-Y files.

Typo in segy_loader Args

In segy_loader's docstring, this line:

xline (int, optional): Cross-line byte location, usally 193

has a typo in "usally".

Need to write tests for functions in _seismic_dataset.py

These tests relate to the creation of blank seismic datasets and should be able to run multiple scenarios. Might be a good reason to use hypothesis.

Examples would be

import numpy as np
from segysak._seismic_dataset import create_seismic_dataset, create3d_dataset, create2d_dataset

# Create an empty seismic cube by specifying dimensions as arrays of values.
d1 = create_seismic_dataset(twt=np.arange(1001), d2=5, cdp=np.r_[1, 2, 3], offset=np.r_[100, 200, 351], d5=np.linspace(10, 100, 10))
# This should fail:
# d2 = create_seismic_dataset(twt=np.arange(1001), iline=5, cdp=np.r_[1, 2, 3], offset=np.r_[100, 200, 351], d5=np.linspace(10, 100, 10))
d3 = create_seismic_dataset(depth=3 * np.arange(1001), iline=5, xline=np.r_[10, 20, 30], offset=np.r_[100, 200, 351])
d4 = create3d_dataset((10, 15, 20), sample_rate=2, first_iline=10, first_xline=12, xline_step=2)
d5 = create2d_dataset((10, 15, 20), sample_rate=2, first_cdp=10, first_offset=5, offset_step=10)

fail to convert a 2D sgy file into dataframe

Thanks for the great work.

I have a 77.9 MB sgy file and was trying to open it as the Vectorisation doc shows.

But it seems that Python was killed due to a memory issue.

Here are the lines:

import pathlib
from segysak.segy import segy_loader, well_known_byte_locs, segy_writer

volve_3d_path = pathlib.Path("./2dshot.sgy")

volve_3d = segy_loader(volve_3d_path, **well_known_byte_locs("petrel_3d"))

100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 24.0k/24.0k [00:02<00:00, 8.30k traces/s]
Loading as 3D
Fast direction is CDP
Converting SEGY: 100%|███████████████████████████████████████████████████████████████████████████████████| 24.0k/24.0k [00:12<00:00, 1.93k traces/s]

volve_3d_df = volve_3d.to_dataframe()
[1] 40544 killed python

Improve segy writer

The current segy writer function ncdf2segy is pretty limited and needs improvement.

  • Add support for more than just 2D
  • Add support for 2D gathers
  • Add support for 3D gathers
  • Add option to only export live traces (Avoids NaN error on coords).
  • Ensure segy export even if header fields are empty.
  • Fix exported default text header to include references to segysak
  • Add extra information to default text header if no text header is supplied.
  • Use segyio group for performance enhancements?

Start sphinx based documentation

Start laying out a pattern for documentation using sphinx and get something up on Read the Docs.

  • setup sphinx
  • publish "something" to readthedocs
  • get a link / badge in the readme

Rate of data being read/load

A rate is reported as the data is scraped. The current unit is 'it/s', which stands for iterations per second.
Could this be a more meaningful rate? For instance (see the tqdm sketch below):

  • traces/second

  • bytes/second
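If the progress bars are tqdm-based (segysak imports tqdm elsewhere in the codebase), the unit keyword is the relevant knob; a minimal sketch:

from tqdm import tqdm

# The rate now displays as "traces/s" instead of the default "it/s".
for trace in tqdm(range(24000), unit="traces"):
    pass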

Example: loading and visualising a 3D seismic volume

  • read textual header to understand file
  • (optional/when ready) do header scan to check byte locations are populated
  • use segy_loader with byte location dicts
  • visualise data footprint in X/Y
  • visualise an inline / xline / timeslice (this is now done elsewhere?)
  • visualise a trace
  • compute and visualise trace-trace mean shift over the dataset

F3 dataset get_text_header output is bad

This looks like some type of encoding issue with the segy file, yet it's a well-known file and a good example of what is out in the wild.

get_segy_texthead(mini_segy_file)

'��������/�����`��!��>������������������[�%�\x16��������\x16���������������������������\n��\x16�������`��\x1b�����_?���/��\x1b����������������������������������������������������\n������������������������������������������������������� ... etc 😛 
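Output like this usually means the 3200-byte text header was decoded with the wrong codec. A minimal sketch checking both common encodings (file name hypothetical):

# SEG-Y text headers are either EBCDIC (Python codec "cp500") or plain ASCII.
with open("f3.sgy", "rb") as fh:
    raw = fh.read(3200)
print(raw.decode("cp500"))                    # try EBCDIC first
print(raw.decode("ascii", errors="replace"))  # fall back to ASCII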

Publish to pypi

Publish to PyPI for Linux and Windows using GitHub Actions. Python 3.6+.

docs failing

I think that, due to the size of the auto-built notebooks, RTD times out during the build.

I like the fact that notebooks can be built during the doc build because it checks them for inconsistencies with changes in the codebase.

Alternative options at the moment would be GitHub Pages, or pre-building the notebooks with GitHub Actions as part of the testing so they can be published as artifacts, a la https://github.com/dfm/rtds-action.

Keen for someone to pick this up and do some testing if they have time.

Get all tests in `tests` to pass

The pytest tests are not all running. If we can get them updated and green, then we can hook them up to a GitHub Action and keep them green on PRs.

Example: Rich Header Statistics

Use the output of segy_header_scrape, which is a pandas DataFrame, to create a set of rich statistics (a sketch follows the list) which help users:

  • Know which byte fields have meaningful information.
  • Plot their header information to show trace/iline/xline number directions.
  • Identify gaps in their data
  • ...
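A minimal sketch of the first bullet, assuming one row per trace and one column per header field in the scraped DataFrame (file name hypothetical):

from segysak.segy import segy_header_scrape

scraped = segy_header_scrape("volume.sgy")
# Fields that never vary across traces carry little information.
informative = scraped.loc[:, scraped.nunique() > 1]
print(informative.describe())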

Cropping via the CLI command breaks

I get a TypeError: '<=' not supported between instances of 'numpy.ndarray' and 'str' from pandas, originating from segy_loader, when running: segysak convert examples/data/volve10r12-full-twt-sub3d.sgy --crop 10090 10100 2150 2160

Arb line coordinates

We need a function that creates the sampling points for an arbline from a set of coordinates.

If you have some points A, B & C that don't follow the grid, you will not get nicely divisible distances between them. The question is how we should sample the line: divide by N and have different sampling along each segment (no), or try to use the average dx/dy from the grid (maybe), though this will create odd-sized intervals at the end of each segment.

This function should return a list of coordinates that can be used by seis.xysel() (a sketch follows).
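A minimal sketch of the average-spacing option (helper name hypothetical; spacing would come from the grid's average dx/dy):

import numpy as np

def arbline_points(points, spacing):
    """Sample xy points along the polyline, approximately `spacing` apart."""
    samples = []
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        seg_len = np.hypot(x1 - x0, y1 - y0)
        n = max(int(round(seg_len / spacing)), 1)  # whole intervals per segment
        t = np.linspace(0, 1, n, endpoint=False)
        samples.append(np.column_stack([x0 + t * (x1 - x0), y0 + t * (y1 - y0)]))
    samples.append(np.array([points[-1]]))  # include the final vertex
    return np.concatenate(samples)

xy = arbline_points([(0, 0), (100, 40), (180, 180)], spacing=12.5)
# xy can then be passed to seis.xysel().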

Example: Matplotlib with Xarray

  • Plot seismic iline and xline.
  • Plot seismic slice
  • Plot gathers
  • Plot cdpx and cdpy locations
  • Create a polygon box for the dataset
  • Create a live trace polygon box for the dataset
  • Plot a seismic slice in cdpx and cdpy, not in iline/xline
  • Plot multiple 2d geometries on a map

Add more!
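A minimal sketch of the first bullet, assuming a seisnc dataset with iline/xline/twt dimensions and a data variable per segysak's conventions (file name and inline number hypothetical):

import matplotlib.pyplot as plt
from segysak import open_seisnc

cube = open_seisnc("volume.seisnc")
cube.data.sel(iline=10100).plot(yincrease=False, cmap="Greys")  # inline section
plt.show()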

`fill_cdpna()` causes crash in output

With input data read from netcdf via dask, applying fill_cdpna() causes an error when writing the file back to netcdf.

One solution is to 'realize' the data and then output it to disk, e.g.:

cube.compute().seisio.to_netcdf('cube_crop.seisnc')

But this is not applicable if the data is very large (it would crash simply because there is not enough RAM to hold the data in memory).

Example:

from segysak import open_seisnc

cube = open_seisnc(local_data + 'cube.seisnc', chunks={'xline': 100, 'iline': 100})
cube.seis.fill_cdpna()
cube = cube.sel(twt=slice(3800,5500))
cube.seisio.to_netcdf('cube_crop.seisnc')

Resulting error log copied from jupyter console:

In [7]: cube.seisio.to_netcdf('cube_crop.seisnc')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-0f071262ca7b> in <module>
----> 1 near.seisio.to_netcdf(local_data+'md4_near_cond_crop.seisnc')
~/miniconda3/envs/wrk/lib/python3.7/site-packages/segysak/_accessor.py in to_netcdf(self, seisnc, **kwargs)
     46         kwargs["engine"] = "h5netcdf"
     47 
---> 48         self._obj.to_netcdf(seisnc, **kwargs)
     49 
     50 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1566             unlimited_dims=unlimited_dims,
   1567             compute=compute,
-> 1568             invalid_netcdf=invalid_netcdf,
   1569         )
   1570 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1080         # to be parallelized with dask
   1081         dump_to_store(
-> 1082             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1083         )
   1084         if autoclose:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1126         variables, attrs = encoder(variables, attrs)
   1127 
-> 1128     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1129 
   1130 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    296         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    297         self.set_variables(
--> 298             variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    299         )
    300 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    334             check = vn in check_encoding_set
    335             target, source = self.prepare_variable(
--> 336                 name, v, check, unlimited_dims=unlimited_dims
    337             )
    338 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    287                 dimensions=variable.dims,
    288                 fillvalue=fillvalue,
--> 289                 **kwargs,
    290             )
    291         else:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in create_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
    499             group = group._require_child_group(k)
    500         return group._create_child_variable(keys[-1], dimensions, dtype, data,
--> 501                                             fillvalue, **kwargs)
    502 
    503     def _get_child(self, key):
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _create_child_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
    474             h5ds = self._h5group[name]
    475             if _netcdf_dimension_but_not_variable(h5ds):
--> 476                 self._detach_dim_scale(name)
    477                 del self._h5group[name]
    478 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _detach_dim_scale(self, name)
    572             for n, dim in enumerate(var.dimensions):
    573                 if dim == name:
--> 574                     var._h5ds.dims[n].detach_scale(self._all_h5groups[dim])
    575 
    576         for subgroup in self.groups.values():
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5py/_hl/dims.py in detach_scale(self, dset)
    101         """
    102         with phil:
--> 103             h5ds.detach_scale(self._id, dset.id, self._dimension)
    104 
    105     def items(self):
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5ds.pyx in h5py.h5ds.detach_scale()
h5py/defs.pyx in h5py.defs.H5DSdetach_scale()
RuntimeError: Unspecified error in H5DSdetach_scale (return value <0)

surface_from_points with 2D data

Is there a way to adjust the interpolation distance of points to a 2D line?
Say I have two parallel 2D lines in separate surveys A & B. I have a horizon interpreted on B. When I load survey A in segysak and use surface_from_points to plot the horizon, segysak will project the horizon drawn on B onto A.

It would be nice if we could set the maximum distance when projecting points onto 2D seismic.

Thanks

Investigate Loading data as float32 not float64 (default).

Currently data seems to be loaded as float64 for xarray; this may be unnecessary for segy because the data will only have 32-bit precision. This might explain why seisnc files are coming out larger in some cases than the equivalent segy (a downcasting sketch follows).
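A minimal sketch of downcasting after load, assuming the seisnc "data" variable convention (file name hypothetical):

from segysak.segy import segy_loader

volume = segy_loader("volume.sgy")
volume["data"] = volume.data.astype("float32")  # halve the in-memory footprint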

Example: General manipulations using xarray.

  • Create an example where the seismic cube/2d/gathers are cropped by sub-selection.
  • Collapse gathers to a stack using xarray.sum
  • Rename dimensions using xarray.rename_dims
  • Examples of selecting over dimensions and dimension ranges using .sel and .isel
  • Filtering data using xarray.where
  • Apply numpy functions to a dataset
  • How to apply filters to an xarray efficiently using ndimage
  • How to apply a function along a trace, like an adaptive gain filter.

Add some more!
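A minimal sketch covering a few of the bullets, assuming a seisnc dataset named cube with iline/xline/twt (and, for the stack, offset) dimensions:

import numpy as np

crop = cube.sel(iline=slice(10100, 10200), twt=slice(400, 800))  # sub-selection
quiet = crop.where(np.abs(crop.data) > 0.01)                     # filter small amplitudes
envelope_like = np.abs(crop.data)                                # numpy ufuncs apply directly
stack = crop.data.sum(dim="offset")                              # collapse gathers to a stack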

Add isModality() functions for downstream tasks

Not sure if modality is the correct term here, but it would be easy and useful to add some functions like (sketched after the list):

  • is_2d()
  • is_2d_gathers()
  • is_3d()
  • is_3d_gathers()

so that the logic for working that out is encapsulated in the class
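A minimal sketch of what two of these could look like (hypothetical function bodies; dimension names follow the seisnc conventions):

def is_3d(ds):
    return {"iline", "xline"} <= set(ds.dims) and "offset" not in ds.dims

def is_3d_gathers(ds):
    return {"iline", "xline", "offset"} <= set(ds.dims)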

catch bad segyio Enum prior to loading/scanning

If the Enum is not valid for segyio then it should be caught and an error raised prior to scanning, to catch user input errors (a validation sketch follows).

For example, here 194 is a typo and should be 193

segy.segy_loader(template, iline=189, xline=194)
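A minimal sketch, assuming segyio.tracefield.keys (the name-to-byte mapping segyio ships) is the source of truth:

import segyio

VALID_BYTE_LOCS = set(segyio.tracefield.keys.values())

def check_byte_loc(byte_loc):
    # 194 fails here; 193 (the standard xline location) passes.
    if byte_loc not in VALID_BYTE_LOCS:
        raise ValueError(f"{byte_loc} is not a valid segyio trace header byte location")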

Notebooks committed with merge conflicts

<<<<<<< HEAD
"/home/aadm/GOOGLEDRIVE/GITHUB/segysak/segysak/segy/_segy_headers.py:9: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)\n",
" from tqdm.autonotebook import tqdm\n"
=======
"/Users/stevejpurves/dev/swung/segysak/segysak/_accessor.py:110: AccessorRegistrationWarning: registration of accessor <class 'segysak._accessor.SeisArbLine'> under name 'seis' for type <class 'xarray.core.dataset.Dataset'> is overriding a preexisting attribute with the same name.\n",
" @xr.register_dataset_accessor(\"seis\")\n"
>>>>>>> eb016c77556cc469c6c413d39442c5072b2b7539
]
}
],
"source": [
"from segysak.segy import segy_loader, get_segy_texthead, segy_header_scan, segy_header_scrape"
]
},
{
"cell_type": "code",
"execution_count": 4,

(and more in this file...)

Was hoping to use segysak to answer this question over in PyVista: pyvista/pyvista-support#320

Volume with nans returns nan with percentiles

If the segy volume has dead traces or NaN values in it, percentiles do not work.

NaN values will need to be filtered out before the percentiles are computed (see the sketch below).

Could also include an attribute to count the number of real values in a cube.
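A minimal sketch using numpy's NaN-aware functions (assumes the seisnc "data" variable convention and a dataset named cube):

import numpy as np

p10, p90 = np.nanpercentile(cube.data.values, [10, 90])          # ignore NaNs
live_count = int(np.count_nonzero(~np.isnan(cube.data.values)))  # real-value count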

Convert CLI from argparse to click.

Subcommands:

  • ascii: ascii output of headers/ebcdic

  • scan: scan headers/ebcdic

  • nc: process netcdf hdf5 input files

  • segy: process segy input files

  • Improve README docs to reflect changes.

conda environment

Should we be providing a conda environment file alongside or in place of the requirements.txt?

direct segy interaction

After performing a full header scan it is possible to interact directly with segy. Perhaps there is scope to explore a new class in segysak that makes some of the functionality of xarray or pandas available to the user, to access segy in a more straightforward way. This would remove the need for hdf5 conversion in simple use cases and potentially help users quickly explore or modify their segy.
