
trhallam / segysak

Stars: 83 · Watchers: 9 · Forks: 30 · Size: 63.52 MB

SEGY Swiss Army Knife for Seismic Data

Home Page: https://trhallam.github.io/segysak/

License: GNU General Public License v3.0

Python 100.00%
swung-t20 segy geophysics seismic swung-t21

segysak's People

Contributors

aadm · dabiged · fabioaco · richardscottoz · stevejpurves · trhallam


segysak's Issues

0.3.1 segy_writer bug

Errors out with: 'FrozenDict' object does not support item assignment. Version 0.3.0 shows no similar error.

dependencies import error - more_itertools

Please can you help me? I want to read the segy header using your code, but

from segysak.segy import get_segy_texthead

it does not work and gives me

ImportError: cannot import name 'split_when' from 'more_itertools' (/Users/karimcherouana/opt/anaconda3/lib/python3.7/site-packages/more_itertools/__init__.py)

Originally posted by @ckarim2 in #68 (comment)

Account for lag time in header

Sometimes the header field 'LagTimeA' is defined and each trace has a lag. There is also a field in the binary header for the first sample which applies a bulk offset. LagTimeA could be used for a bulk offset if it is constant, but should probably raise an error if not (a sketch of that check follows). We might then need an option to ignore the lag time if we include it on loading.
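A minimal sketch of the constancy check, assuming segy_header_scrape returns a pandas DataFrame with segyio's 'LagTimeA' column naming (file name hypothetical):

from segysak.segy import segy_header_scrape

scraped = segy_header_scrape("volume.sgy")
lags = scraped["LagTimeA"]
if lags.nunique() == 1:
    bulk_offset = int(lags.iloc[0])  # safe to apply as a constant vertical shift
else:
    raise ValueError("LagTimeA varies per trace; cannot treat it as a bulk offset")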

Investigate adding unit support for dimensions/arrays.

matt [Agile] 3 minutes ago
Right, makes sense... Might be worth a chat with @geo_leeman about how they attach units of measure, defined in pint I think, to arrays in MetPy. It's incredibly convenient, because it means instant ft<->m, ft/s<->m/s<->km/s, etc, etc. So it can be done.
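For context, a minimal sketch with pint (an assumption about the approach, not segysak's API; MetPy's actual mechanism may differ) showing how units attach to plain arrays:

import numpy as np
import pint

ureg = pint.UnitRegistry()
depth = np.arange(0, 1000, 5) * ureg.metre           # attach units to an array
depth_ft = depth.to(ureg.foot)                        # instant m <-> ft conversion
velocity = (depth / (1.5 * ureg.second)).to(ureg.kilometre / ureg.second)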

Fix code for return path from xarray to segy

This is currently failing due to changes in the way the seismic is stored; it also needs implementations for

  • 2D

  • 2D Pre-stack

  • 3D Pre-stack

  • Tests which round-trip the segyio test SEG-Y files.

Typo in segy_loader Args

In segy_loader's docstring, this line:

xline (int, optional): Cross-line byte location, usally 193

has a typo in "usally".

Need to write tests for functions in _seismic_dataset.py

These tests relate to the creation of blank seismic datasets and should be able to run multiple scenarios. Might be a good reason to use hypothesis.

Examples would be

import numpy as np
from segysak._seismic_dataset import create_seismic_dataset, create3d_dataset, create2d_dataset

# Create an empty seismic cube by specifying dimensions as arrays of values.
d1 = create_seismic_dataset(twt=np.arange(1001), d2=5, cdp=np.r_[1, 2, 3], offset=np.r_[100, 200, 351], d5=np.linspace(10, 100, 10))
# This should fail:
# d2 = create_seismic_dataset(twt=np.arange(1001), iline=5, cdp=np.r_[1, 2, 3], offset=np.r_[100, 200, 351], d5=np.linspace(10, 100, 10))
d3 = create_seismic_dataset(depth=3 * np.arange(1001), iline=5, xline=np.r_[10, 20, 30], offset=np.r_[100, 200, 351])
d4 = create3d_dataset((10, 15, 20), sample_rate=2, first_iline=10, first_xline=12, xline_step=2)
d5 = create2d_dataset((10, 15, 20), sample_rate=2, first_cdp=10, first_offset=5, offset_step=10)

fail to convert a 2D sgy file into dataframe

Thanks for the great work.

I have a 77.9 MB sgy file and was trying to open it as the Vectorisation doc shows.

But it seems that Python was killed due to a memory issue.

Here are the lines:

import pathlib
from segysak.segy import segy_loader, well_known_byte_locs, segy_writer

volve_3d_path = pathlib.Path("./2dshot.sgy")

volve_3d = segy_loader(volve_3d_path, **well_known_byte_locs("petrel_3d"))

100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 24.0k/24.0k [00:02<00:00, 8.30k traces/s]
Loading as 3D
Fast direction is CDP
Converting SEGY: 100%|███████████████████████████████████████████████████████████████████████████████████| 24.0k/24.0k [00:12<00:00, 1.93k traces/s]

volve_3d_df = volve_3d.to_dataframe()
[1] 40544 killed python

Improve segy writer

The current segy writer function ncdf2segy is pretty limited and needs improvement.

  • Add support for more than just 2D
  • Add support for 2D gathers
  • Add support for 3D gathers
  • Add option to only export live traces (Avoids NaN error on coords).
  • Ensure segy export even if header fields are empty.
  • Fix exported default text header to include references to segysak
  • Add extra information to default text header if no text header is supplied.
  • Use segyio group for performance enhancements?

Start sphinx based documentation

Start laying out a pattern for documentation using sphinx and get something up on Read the Docs.

  • setup sphinx
  • publish "something" to readthedocs
  • get a link / badge in the readme

Rate of data being read/load

A rate is reported as the data is scraped. The current unit is 'it/s', which stands for iterations per second.
Could this be a more meaningful rate? For instance (see the tqdm sketch below):

  • traces/second

  • bytes/second
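If the progress bars are tqdm-based (segysak imports tqdm elsewhere in the codebase), the unit keyword is the relevant knob; a minimal sketch:

from tqdm import tqdm

# The rate now displays as "traces/s" instead of the default "it/s".
for trace in tqdm(range(24000), unit="traces"):
    pass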

Example: loading and visualising a 3D seismic volume

  • read textual header to understand file
  • (optional/when ready) do header scan to check byte locations are populated
  • use segy_loader with byte location dicts
  • visualise data footprint in X/Y
  • visualise an inline / xline / timeslice (this is now done elsewhere?)
  • visualise a trace
  • compute and visualise trace-trace mean shift over the dataset

F3 dataset get_text_header output is bad

This looks like some type of encoding issue with the segy file, yet it's a well-known file and a good example of what is out in the wild.

get_segy_texthead(mini_segy_file)

'��������/�����`��!��>������������������[�%�\x16��������\x16���������������������������\n��\x16�������`��\x1b�����_?���/��\x1b����������������������������������������������������\n������������������������������������������������������� ... etc 😛 
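Output like this usually means the 3200-byte text header was decoded with the wrong codec. A minimal sketch checking both common encodings (file name hypothetical):

# SEG-Y text headers are either EBCDIC (Python codec "cp500") or plain ASCII.
with open("f3.sgy", "rb") as fh:
    raw = fh.read(3200)
print(raw.decode("cp500"))                    # try EBCDIC first
print(raw.decode("ascii", errors="replace"))  # fall back to ASCII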

Publish to pypi

Publish to PyPI for Linux and Windows using GitHub Actions. Python 3.6+.

docs failing

I think that, due to the size of the auto-built notebooks, RTD times out during the build.

I like the fact that notebooks can be built during the doc build because it checks them for inconsistencies with changes in the codebase.

Alternative options at the moment would be GitHub Pages, or pre-building the notebooks with GitHub Actions as part of the testing so they can be published as artifacts, a la https://github.com/dfm/rtds-action.

Keen for someone to pick this up and do some testing if they have time.

Get all tests in `tests` to pass

The pytest tests are not all running. If we can get them updated and green, then we can hook them up to a GitHub Action and keep them green on PRs.

Example: Rich Header Statistics

Use the output of segy_header_scrape, which is a pandas DataFrame, to create a set of rich statistics (a sketch follows the list) which help users:

  • Know which byte fields have meaningful information.
  • Plot their header information to show trace/iline/xline number directions.
  • Identify gaps in their data
  • ...
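A minimal sketch of the first bullet, assuming one row per trace and one column per header field in the scraped DataFrame (file name hypothetical):

from segysak.segy import segy_header_scrape

scraped = segy_header_scrape("volume.sgy")
# Fields that never vary across traces carry little information.
informative = scraped.loc[:, scraped.nunique() > 1]
print(informative.describe())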

Cropping via the CLI command breaks

I get a TypeError: '<=' not supported between instances of 'numpy.ndarray' and 'str' from pandas, originating from segy_loader, when running: segysak convert examples/data/volve10r12-full-twt-sub3d.sgy --crop 10090 10100 2150 2160

Arb line coordinates

We need a function that creates the sampling points for an arbline from a set of coordinates.

If you have some points A, B & C that don't follow the grid, you will not get nicely divisible distances between them. The question is how we should sample the line: divide by N and have different sampling along each segment (no), or try to use the average dx/dy from the grid (maybe), though this will create odd-sized intervals at the end of each segment.

This function should return a list of coordinates that can be used by seis.xysel() (a sketch follows).
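A minimal sketch of the average-spacing option (helper name hypothetical; spacing would come from the grid's average dx/dy):

import numpy as np

def arbline_points(points, spacing):
    """Sample xy points along the polyline, approximately `spacing` apart."""
    samples = []
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        seg_len = np.hypot(x1 - x0, y1 - y0)
        n = max(int(round(seg_len / spacing)), 1)  # whole intervals per segment
        t = np.linspace(0, 1, n, endpoint=False)
        samples.append(np.column_stack([x0 + t * (x1 - x0), y0 + t * (y1 - y0)]))
    samples.append(np.array([points[-1]]))  # include the final vertex
    return np.concatenate(samples)

xy = arbline_points([(0, 0), (100, 40), (180, 180)], spacing=12.5)
# xy can then be passed to seis.xysel().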

Example: Matplotlib with Xarray

  • Plot seismic iline and xline.
  • Plot seismic slice
  • Plot gathers
  • Plot cdpx and cdpy locations
  • Create a polygon box for the dataset
  • Create a live trace polygon box for the dataset
  • Plot a seismic slice in cdpx and cdpy, not in iline/xline
  • Plot multiple 2d geometries on a map

Add more!
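A minimal sketch of the first bullet, assuming a seisnc dataset with iline/xline/twt dimensions and a data variable per segysak's conventions (file name and inline number hypothetical):

import matplotlib.pyplot as plt
from segysak import open_seisnc

cube = open_seisnc("volume.seisnc")
cube.data.sel(iline=10100).plot(yincrease=False, cmap="Greys")  # inline section
plt.show()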

`fill_cdpna()` causes crash in output

With input data read from netcdf via dask, applying fill_cdpna() causes an error when writing the file back to netcdf.

One solution is to 'realize' the data and then output it to disk, e.g.:

cube.compute().seisio.to_netcdf('cube_crop.seisnc')

But this is not applicable if the data is very large (it would crash simply because there is not enough RAM to hold the data in memory).

Example:

from segysak import open_seisnc

cube = open_seisnc(local_data + 'cube.seisnc', chunks={'xline': 100, 'iline': 100})
cube.seis.fill_cdpna()
cube = cube.sel(twt=slice(3800,5500))
cube.seisio.to_netcdf('cube_crop.seisnc')

Resulting error log copied from jupyter console:

In [7]: cube.seisio.to_netcdf('cube_crop.seisnc')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-0f071262ca7b> in <module>
----> 1 near.seisio.to_netcdf(local_data+'md4_near_cond_crop.seisnc')
~/miniconda3/envs/wrk/lib/python3.7/site-packages/segysak/_accessor.py in to_netcdf(self, seisnc, **kwargs)
     46         kwargs["engine"] = "h5netcdf"
     47 
---> 48         self._obj.to_netcdf(seisnc, **kwargs)
     49 
     50 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1566             unlimited_dims=unlimited_dims,
   1567             compute=compute,
-> 1568             invalid_netcdf=invalid_netcdf,
   1569         )
   1570 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1080         # to be parallelized with dask
   1081         dump_to_store(
-> 1082             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1083         )
   1084         if autoclose:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1126         variables, attrs = encoder(variables, attrs)
   1127 
-> 1128     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1129 
   1130 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    296         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    297         self.set_variables(
--> 298             variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    299         )
    300 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    334             check = vn in check_encoding_set
    335             target, source = self.prepare_variable(
--> 336                 name, v, check, unlimited_dims=unlimited_dims
    337             )
    338 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    287                 dimensions=variable.dims,
    288                 fillvalue=fillvalue,
--> 289                 **kwargs,
    290             )
    291         else:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in create_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
    499             group = group._require_child_group(k)
    500         return group._create_child_variable(keys[-1], dimensions, dtype, data,
--> 501                                             fillvalue, **kwargs)
    502 
    503     def _get_child(self, key):
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _create_child_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
    474             h5ds = self._h5group[name]
    475             if _netcdf_dimension_but_not_variable(h5ds):
--> 476                 self._detach_dim_scale(name)
    477                 del self._h5group[name]
    478 
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _detach_dim_scale(self, name)
    572             for n, dim in enumerate(var.dimensions):
    573                 if dim == name:
--> 574                     var._h5ds.dims[n].detach_scale(self._all_h5groups[dim])
    575 
    576         for subgroup in self.groups.values():
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5py/_hl/dims.py in detach_scale(self, dset)
    101         """
    102         with phil:
--> 103             h5ds.detach_scale(self._id, dset.id, self._dimension)
    104 
    105     def items(self):
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5ds.pyx in h5py.h5ds.detach_scale()
h5py/defs.pyx in h5py.defs.H5DSdetach_scale()
RuntimeError: Unspecified error in H5DSdetach_scale (return value <0)

surface_from_points with 2D data

Is there a way to adjust the interpolation distance of points to a 2D line?
Say I have two parallel 2D lines in separate surveys A & B. I have a horizon interpreted on B. When I load survey A in segysak and use surface_from_points to plot the horizon, segysak will project the horizon drawn on B onto A.

It would be nice if we could set the maximum distance when projecting points onto 2D seismic.

Thanks

Investigate Loading data as float32 not float64 (default).

Currently data seems to be loaded as float64 for xarray; this may be unnecessary for segy because the data will only have 32-bit precision. This might explain why seisnc files are coming out larger in some cases than the equivalent segy (a downcasting sketch follows).
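A minimal sketch of downcasting after load, assuming the seisnc "data" variable convention (file name hypothetical):

from segysak.segy import segy_loader

volume = segy_loader("volume.sgy")
volume["data"] = volume.data.astype("float32")  # halve the in-memory footprint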

Example: General manipulations using xarray.

  • Create an example where the seismic cube/2d/gathers are cropped by sub-selection.
  • Collapse gathers to a stack using xarray.sum
  • Rename dimensions using xarray.rename_dims
  • Examples of selecting over dimensions and dimension ranges using .sel and .isel
  • Filtering data using xarray.where
  • Apply numpy functions to a dataset
  • How to apply filters to an xarray efficiently using ndimage
  • How to apply a function along a trace, like an adaptive gain filter.

Add some more!
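A minimal sketch covering a few of the bullets, assuming a seisnc dataset named cube with iline/xline/twt (and, for the stack, offset) dimensions:

import numpy as np

crop = cube.sel(iline=slice(10100, 10200), twt=slice(400, 800))  # sub-selection
quiet = crop.where(np.abs(crop.data) > 0.01)                     # filter small amplitudes
envelope_like = np.abs(crop.data)                                # numpy ufuncs apply directly
stack = crop.data.sum(dim="offset")                              # collapse gathers to a stack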

Add isModality() functions for downstream tasks

Not sure if modality is the correct term here, but it would be easy and useful to add some functions like (sketched after the list):

  • is_2d()
  • is_2d_gathers()
  • is_3d()
  • is_3d_gathers()

so that the logic for working that out is encapsulated in the class
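A minimal sketch of what two of these could look like (hypothetical function bodies; dimension names follow the seisnc conventions):

def is_3d(ds):
    return {"iline", "xline"} <= set(ds.dims) and "offset" not in ds.dims

def is_3d_gathers(ds):
    return {"iline", "xline", "offset"} <= set(ds.dims)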

catch bad segyio Enum prior to loading/scanning

If the Enum is not valid for segyio then it should be caught and an error raised prior to scanning, to catch user input errors (a validation sketch follows).

For example, here 194 is a typo and should be 193

segy.segy_loader(template, iline=189, xline=194)
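A minimal sketch, assuming segyio.tracefield.keys (the name-to-byte mapping segyio ships) is the source of truth:

import segyio

VALID_BYTE_LOCS = set(segyio.tracefield.keys.values())

def check_byte_loc(byte_loc):
    # 194 fails here; 193 (the standard xline location) passes.
    if byte_loc not in VALID_BYTE_LOCS:
        raise ValueError(f"{byte_loc} is not a valid segyio trace header byte location")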

Notebooks committed with merge conflicts

<<<<<<< HEAD
"/home/aadm/GOOGLEDRIVE/GITHUB/segysak/segysak/segy/_segy_headers.py:9: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)\n",
" from tqdm.autonotebook import tqdm\n"
=======
"/Users/stevejpurves/dev/swung/segysak/segysak/_accessor.py:110: AccessorRegistrationWarning: registration of accessor <class 'segysak._accessor.SeisArbLine'> under name 'seis' for type <class 'xarray.core.dataset.Dataset'> is overriding a preexisting attribute with the same name.\n",
" @xr.register_dataset_accessor(\"seis\")\n"
>>>>>>> eb016c77556cc469c6c413d39442c5072b2b7539
]
}
],
"source": [
"from segysak.segy import segy_loader, get_segy_texthead, segy_header_scan, segy_header_scrape"
]
},
{
"cell_type": "code",
"execution_count": 4,

(and more in this file...)

Was hoping to use segysak to answer this question over in PyVista: pyvista/pyvista-support#320

Volume with nans returns nan with percentiles

If the segy volume has dead traces or NaN values in it, percentiles do not work.

NaN values will need to be filtered out before the percentiles are computed (see the sketch below).

Could also include an attribute to count the number of real values in a cube.
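A minimal sketch using numpy's NaN-aware functions (assumes the seisnc "data" variable convention and a dataset named cube):

import numpy as np

p10, p90 = np.nanpercentile(cube.data.values, [10, 90])          # ignore NaNs
live_count = int(np.count_nonzero(~np.isnan(cube.data.values)))  # real-value count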

Convert CLI from argparse to click.

Subcommands:

  • ascii: ascii output of headers/ebcdic

  • scan: scan headers/ebcdic

  • nc: process netcdf hdf5 input files

  • segy: process segy input files

  • Improve README docs to reflect changes.

conda environment

Should we be providing a conda environment file alongside or in place of the requirements.txt?

direct segy interaction

After performing a full header scan it is possible to interact directly with segy. Perhaps there is scope to explore a new class in segysak that makes some of the functionality of xarray or pandas available to the user, to access segy in a more straightforward way. This would remove the need for hdf5 conversion in simple use cases and potentially help users quickly explore or modify their segy.
