trhallam / segysak
SEGY Swiss Army Knife for Seismic Data
Home Page: https://trhallam.github.io/segysak/
License: GNU General Public License v3.0
segyio added a file accessor called Groups which can be used to access traces with common header key values.
with segyio.open('file.segy') as f:
    g = f.group([keys...])
    for h in g[[keys...]].header:
        print(h)
This errors out with: 'FrozenDict' object does not support item assignment. 0.3.0 shows no similar error.
This requires someone to think about dynamic ranges and heuristic views of the data: what questions and logic can software use to work this out?
Please can you help me? I want to read the SEGY header using your code, but
from segysak.segy import get_segy_texthead
does not work and it gives me
ImportError: cannot import name 'split_when' from 'more_itertools' (/Users/karimcherouana/opt/anaconda3/lib/python3.7/site-packages/more_itertools/__init__.py)
Originally posted by @ckarim2 in #68 (comment)
Sometimes the trace header field 'LagTimeA' is defined and each trace has a lag. There is also a field in the binary header for the first sample which applies a bulk offset. LagTimeA could be used for the bulk offset if it is constant, but should probably raise an error if not. If we include it on loading, we might then need an option to ignore lag time.
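A minimal sketch of the constancy check described above, using segyio directly; the filename is a placeholder:

import segyio

# Only treat LagTimeA as a bulk offset when it is constant across all
# traces; otherwise raise, as proposed above.
with segyio.open("file.segy", ignore_geometry=True) as f:
    lags = {h[segyio.TraceField.LagTimeA] for h in f.header}

if len(lags) > 1:
    raise ValueError(f"LagTimeA varies across traces: {sorted(lags)}")
bulk_offset = lags.pop()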
matt [Agile] 3 minutes ago
Right, makes sense... Might be worth a chat with @geo_leeman about how they attach units of measure, defined in pint I think, to arrays in MetPy. It's incredibly convenient, because it means instant ft<->m, ft/s<->m/s<->km/s, etc, etc. So it can be done.
This is currently failing due to changes in the way the seismic is stored; it also needs implementation for:
2D
2D Pre-stack
3D Pre-stack
Tests which convert back to SEGY using the segyio test files.
In segy_loader's docstring, this line:
xline (int, optional): Cross-line byte location, usally 193
has a typo in "usally".
These tests relate to the creation of blank seismic datasets and should be able to run multiple scenarios. Might be a good reason to use hypothesis.
Examples would be:
import numpy as np
from segysak import create_seismic_dataset, create3d_dataset, create2d_dataset

# Create an empty seismic cube by specifying dimensions as arrays of values.
d1 = create_seismic_dataset(twt=np.arange(1001), d2=5, cdp=np.r_[1, 2, 3], offset=np.r_[100, 200, 351], d5=np.linspace(10, 100, 10))
# This should fail:
# d2 = create_seismic_dataset(twt=np.arange(1001), iline=5, cdp=np.r_[1, 2, 3], offset=np.r_[100, 200, 351], d5=np.linspace(10, 100, 10))
d3 = create_seismic_dataset(depth=3*np.arange(1001), iline=5, xline=np.r_[10, 20, 30], offset=np.r_[100, 200, 351])
d4 = create3d_dataset((10, 15, 20), sample_rate=2, first_iline=10, first_xline=12, xline_step=2)
d5 = create2d_dataset((10, 15, 20), sample_rate=2, first_cdp=10, first_offset=5, offset_step=10)
The function segy_header_scan currently scans the first N traces and returns information about the values in the header fields. This needs to be beautified and documented so people understand how to use it.
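For reference, a minimal usage sketch (the filename is a placeholder, and the shape of the returned table is an assumption based on the current docstring):

from segysak.segy import segy_header_scan

# Scan the leading trace headers and print the per-field summary statistics.
scan = segy_header_scan("file.segy")
print(scan)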
The logo doesn't show up on https://pypi.org/project/segysak/.
Thanks for the bravo job.
I got a 77.9 MB sgy file and I was trying to open it as the Vectorisation doc goes.
But it seemed that python was killed due to a memory issue.
Here are the lines:
import pathlib
from segysak.segy import segy_loader, well_known_byte_locs, segy_writer
volve_3d_path = pathlib.Path("./2dshot.sgy")
volve_3d = segy_loader(volve_3d_path, **well_known_byte_locs("petrel_3d"))
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 24.0k/24.0k [00:02<00:00, 8.30k traces/s]
Loading as 3D
Fast direction is CDP
Converting SEGY: 100%|███████████████████████████████████████████████████████████████████████████████████| 24.0k/24.0k [00:12<00:00, 1.93k traces/s]
volve_3d_df = volve_3d.to_dataframe()
[1] 40544 killed python
The current SEGY writer function ncdf2segy is pretty limited and needs improvement.
Start laying out a pattern for documentation using Sphinx and get something up on Read the Docs.
This function returns the textual header in the form of a single string, but it would be good to have this formatted in a readable way.
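One option, as a minimal sketch (the helper name is hypothetical): wrap the 3200-character header into the conventional 40 rows of 80 columns.

def format_texthead(text: str, width: int = 80) -> str:
    # Split the flat string into fixed-width rows, one per header card.
    return "\n".join(text[i : i + width] for i in range(0, len(text), width))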
This looks like some type of encoding issue with the SEGY file, yet it's a well-known file and a good example of what is out in the wild.
get_segy_texthead(mini_segy_file)
'��������/�����`��!��>������������������[�%�\x16��������\x16���������������������������\n��\x16�������`��\x1b�����_?���/��\x1b����������������������������������������������������\n������������������������������������������������������� ... etc 😛
Publish to PyPI for Linux and Windows using GitHub Actions. Python 3.6+.
Values being exported to the SEGY header must be converted to int before they can be written; if they cannot be converted, an error is raised.
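A sketch of what that conversion could look like (the helper name is hypothetical):

def to_header_int(value):
    # Coerce a header value to int, refusing lossy casts.
    ivalue = int(value)
    if ivalue != value:
        raise ValueError(f"Header value {value!r} cannot be written as int")
    return ivalue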
It looks like this is a great fit for the exact approach described in the xarray docs for advanced interpolation: http://xarray.pydata.org/en/stable/interpolation.html#advanced-interpolation
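For illustration, a sketch of that approach, assuming cube is a loaded seisnc dataset with iline and xline dimensions:

import numpy as np
import xarray as xr

# Pointwise interpolation: indexers that share a new 'line' dimension are
# broadcast together, sampling the cube along an arbitrary path.
ilines = xr.DataArray(np.linspace(10.0, 50.0, 25), dims="line")
xlines = xr.DataArray(np.linspace(100.0, 140.0, 25), dims="line")
arb_line = cube.interp(iline=ilines, xline=xlines)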
I think that, due to the size of the auto-built notebooks, RTD times out during the build.
I like the fact that notebooks can be built during the doc build because it checks them for inconsistencies with changes in the codebase.
Alternative options at the moment would be GitHub Pages, perhaps, or pre-building the notebooks with GitHub Actions as part of the testing so they can be published as artifacts, a la https://github.com/dfm/rtds-action.
Keen for someone to pick this up and do some testing if they have time.
Hi Tony,
How do I display/plot VSP data using segysak?
Cheers
Kush
The pytest tests are not all running. If we can get them updated and green, then we can hook them up to a GitHub Action and keep them green on PRs.
Use the output of segy_header_scrape, which is a pandas DataFrame, to create a set of rich statistics which help users:
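As a starting point, a sketch of the kind of statistics this could produce (the filename is a placeholder):

from segysak.segy import segy_header_scrape

# Scrape every trace header into a DataFrame, then summarise each field.
scrape = segy_header_scrape("file.segy")
stats = scrape.describe().T
print(stats[["min", "max", "mean", "std"]])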
seismic = segy_loader(mini_segy_file)
File: https://s3.us-east-2.amazonaws.com/seismic.euclidity.com/F3/f3_seismic.sgy
I get a TypeError: '<=' not supported between instances of 'numpy.ndarray' and 'str' from pandas, originating from the segy_loader, when running segysak convert examples/data/volve10r12-full-twt-sub3d.sgy --crop 10090 10100 2150 2160
We need a function that creates the sampling points for an arbline from a set of coordinates.
If you have some points A, B & C that don't follow the grid, then you will not get nicely divisible distances between them. The question is how we should sample the line: divide by N and have different sampling along each segment (no), or try to use the average dx/dy from the grid (maybe), though this will create oddly sized intervals at the end of each segment.
This function should return a list of coordinates that can be used by seis.xysel()
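A rough sketch of the middle option, resampling each segment at approximately a target spacing (all names here are hypothetical):

import numpy as np

def arbline_points(points, spacing):
    """points: (N, 2) vertices of the arbitrary line; spacing: target step."""
    points = np.asarray(points, dtype=float)
    out = [points[0]]
    for a, b in zip(points[:-1], points[1:]):
        dist = np.hypot(*(b - a))
        # Round to a whole number of steps so intervals stay near-constant
        # within each segment, at the cost of slightly varying step sizes.
        n = max(int(round(dist / spacing)), 1)
        t = np.linspace(0.0, 1.0, n + 1)[1:]  # skip segment start, already added
        out.extend(a + t[:, None] * (b - a))
    return np.vstack(out)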
Converting these tickets to issues for more detailed discussion and tracking.
Add more!
Borrowed from the Verde project as a draft, but needs to be modified for this project.
With input data read from netcdf via dask, applying fill_cdpna() causes an error when writing the file back to netcdf.
One solution is to 'realize' the data and then output it to disk, e.g.:
cube.compute().seisio.to_netcdf('cube_crop.seisnc')
But this is not applicable if the data is very large (it would crash simply because there is not enough RAM to hold the data in memory).
Example:
from segysak import open_seisnc

cube = open_seisnc(local_data + 'cube.seisnc', chunks={'xline': 100, 'iline': 100})
cube.seis.fill_cdpna()
cube = cube.sel(twt=slice(3800, 5500))
cube.seisio.to_netcdf('cube_crop.seisnc')
Resulting error log copied from jupyter console:
In [7]: cube.seisio.to_netcdf('cube_crop.seisnc')
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-7-0f071262ca7b> in <module>
----> 1 near.seisio.to_netcdf(local_data+'md4_near_cond_crop.seisnc')
~/miniconda3/envs/wrk/lib/python3.7/site-packages/segysak/_accessor.py in to_netcdf(self, seisnc, **kwargs)
46 kwargs["engine"] = "h5netcdf"
47
---> 48 self._obj.to_netcdf(seisnc, **kwargs)
49
50
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
1566 unlimited_dims=unlimited_dims,
1567 compute=compute,
-> 1568 invalid_netcdf=invalid_netcdf,
1569 )
1570
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1080 # to be parallelized with dask
1081 dump_to_store(
-> 1082 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1083 )
1084 if autoclose:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1126 variables, attrs = encoder(variables, attrs)
1127
-> 1128 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
1129
1130
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
296 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
297 self.set_variables(
--> 298 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
299 )
300
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
334 check = vn in check_encoding_set
335 target, source = self.prepare_variable(
--> 336 name, v, check, unlimited_dims=unlimited_dims
337 )
338
~/miniconda3/envs/wrk/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
287 dimensions=variable.dims,
288 fillvalue=fillvalue,
--> 289 **kwargs,
290 )
291 else:
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in create_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
499 group = group._require_child_group(k)
500 return group._create_child_variable(keys[-1], dimensions, dtype, data,
--> 501 fillvalue, **kwargs)
502
503 def _get_child(self, key):
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _create_child_variable(self, name, dimensions, dtype, data, fillvalue, **kwargs)
474 h5ds = self._h5group[name]
475 if _netcdf_dimension_but_not_variable(h5ds):
--> 476 self._detach_dim_scale(name)
477 del self._h5group[name]
478
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5netcdf/core.py in _detach_dim_scale(self, name)
572 for n, dim in enumerate(var.dimensions):
573 if dim == name:
--> 574 var._h5ds.dims[n].detach_scale(self._all_h5groups[dim])
575
576 for subgroup in self.groups.values():
~/miniconda3/envs/wrk/lib/python3.7/site-packages/h5py/_hl/dims.py in detach_scale(self, dset)
101 """
102 with phil:
--> 103 h5ds.detach_scale(self._id, dset.id, self._dimension)
104
105 def items(self):
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5ds.pyx in h5py.h5ds.detach_scale()
h5py/defs.pyx in h5py.defs.H5DSdetach_scale()
RuntimeError: Unspecified error in H5DSdetach_scale (return value <0)
Is there a way to adjust the interpolation distance of points to a 2D line?
Say I have 2 parallel 2D lines, in separate surveys A & B. I have a horizon interpreted on B. When I load survey A in segysak and use surface_from_points to plot the horizon, segysak will project the horizon drawn on B onto A.
It would be nice if we could set a maximum distance when projecting points onto 2D seismic.
Thanks
Maybe once things settle a bit this should move to a folder and subfile with an appropriate __init__.py setup for ipywidgets initialisation and maintaining imports.
Add dask and tqdm as dependencies.
Currently data seems to be loaded as float64 for xarray; this may be unnecessary from SEGY because the data will only have 32-bit precision. This might explain why seisnc files are coming out larger in some cases than the equivalent SEGY.
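A sketch of the downcast, assuming seismic is a loaded dataset with its samples in a 'data' variable:

# Downcast to float32 before writing; SEGY samples carry at most 32 bits.
seismic["data"] = seismic["data"].astype("float32")
seismic.seisio.to_netcdf("seismic.seisnc")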
Using a naive call to
seismic = segy_loader(full_segy_file)
in a notebook causes the kernel to die after the header scan.
The file is available here: https://s3.us-east-2.amazonaws.com/seismic.euclidity.com/F3/f3_seismic_full.sgy
Use http://xarray.pydata.org/en/stable/generated/xarray.register_dataset_accessor.html to allow access to non-dimension coordinates, e.g. CDP_X, CDP_Y.
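A minimal sketch of such an accessor; the accessor name 'seis_coords' is hypothetical:

import xarray as xr

@xr.register_dataset_accessor("seis_coords")
class SeisCoordsAccessor:
    """Expose non-dimension coordinates of a seismic dataset as properties."""

    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    @property
    def cdp_x(self):
        return self._obj.coords["CDP_X"]

    @property
    def cdp_y(self):
        return self._obj.coords["CDP_Y"]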
Moving issue from tutorial page to actual repository.
trhallam/segysak-t21-tutorial#3 (comment)
Hi,
We noticed that there is a TODO item in _openzgy.py to create a converter which does just a few inlines at a time for large files. Do we have an estimate of when this can be done?
Best,
Zhao
Add some more guys!
Not sure if modality is the correct term here, but it would be easy and useful to add some functions so that the logic for working that out is encapsulated in the class.
If the Enum is not valid for segyio then it should be caught and an error raised prior to scanning, to catch user input errors.
For example, here 194 is a typo and should be 193:
segy.segy_loader(template, iline=189, xline=194)
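A sketch of the pre-scan check, assuming a recent segyio where TraceField is an IntEnum (the helper name is hypothetical):

import segyio

def validate_byte_loc(byte_loc: int) -> None:
    # IntEnum lookup raises ValueError for byte locations segyio doesn't know.
    try:
        segyio.TraceField(byte_loc)
    except ValueError:
        raise ValueError(
            f"{byte_loc} is not a valid SEGY trace-header byte location"
        )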
clip = abs(np.percentile(data, 0.999))
The clip value can be used for vmin/vmax in imshow and saves us having to calculate it on the fly later, which might be expensive for large volumes.
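Usage sketch, assuming data is a 2D amplitude slice:

import matplotlib.pyplot as plt

# Symmetric colour limits from the precomputed clip value.
plt.imshow(data.T, vmin=-clip, vmax=clip, cmap="seismic")
plt.show()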
Props to @aadm
(and more in this file...)
Was hoping to use segysak to answer this question over in PyVista: pyvista/pyvista-support#320
If the SEGY volume has dead traces or NaN values in it, percentiles do not work.
NaN values will need to be filtered out before calculating percentiles.
Could also include an attribute to count the number of real values in a cube.
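A NaN-aware sketch of both ideas, assuming data holds the cube's amplitudes:

import numpy as np

# nanpercentile ignores NaNs, so dead traces no longer break the clip value.
clip = abs(np.nanpercentile(data, 0.999))
# Count of real (non-NaN) samples, a candidate cube attribute.
n_real = int(np.count_nonzero(~np.isnan(data)))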
Subcommands:
ascii: ASCII output of headers/EBCDIC
scan: scan headers/EBCDIC
nc: process netcdf/hdf5 input files
segy: process segy input files
Improve README docs to reflect changes.
Should we be providing a conda environment file alongside or in place of the requirements.txt?
After performing a full header scan it is possible to interact directly with SEGY. Perhaps there is scope to explore a new class in segysak that makes some of the functionality of xarray or pandas available to the user to access SEGY in a more straightforward way. This would remove the need for hdf5 conversion in simple use cases and potentially help users quickly explore or modify their SEGY.
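A hypothetical sketch of what such a class might look like, pairing a full header scrape with direct trace access via segyio (all names here are illustrative, not an existing API):

import pandas as pd
import segyio

class SegyFrame:
    """pandas-style trace selection directly on a SEGY file."""

    def __init__(self, segyfile: str, headers: pd.DataFrame):
        self.segyfile = segyfile
        self.headers = headers  # e.g. the output of segy_header_scrape

    def traces_where(self, mask):
        # Yield trace arrays for header rows matching a boolean mask.
        with segyio.open(self.segyfile, ignore_geometry=True) as f:
            for tracen in self.headers.index[mask]:
                yield f.trace[tracen]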
See if HDF support can also be extended to zarr and zfp via xarray.