tgsai / mdio-python
Cloud native, scalable storage engine for various types of energy data.
Home Page: https://mdio.dev/
License: Apache License 2.0
We need CLI options to just look at:
MDIO data has a "create" field in its "zattrs" file.
The proposed feature would add time zone information to this datetime in some canonical way.
In a scenario where time zone information is unavailable, the metadata should record that it does not have that information, and not default to something like UTC.
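A minimal sketch of what the canonical form could look like (the key names are hypothetical, not the actual MDIO schema): store an ISO-8601 timestamp with an explicit offset, plus a flag saying whether the time zone was actually known rather than silently assumed.

```python
from datetime import datetime, timezone

# Hypothetical attrs layout: "created" carries an explicit UTC offset, and
# "tz_known" records whether the time zone was real information or unknown.
created = datetime.now(timezone.utc).isoformat()
attrs = {"created": created, "tz_known": True}
print(attrs["created"].endswith("+00:00"))  # True
```

When the source has no time zone, the sketch would store a naive timestamp with `"tz_known": False` instead of defaulting to UTC.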
Sometimes users need the ability to run very sparse grids.
Disable the sparsity check/warning with an extra option.
In some cases, there can be duplicate 0s in a SEG-Y that cause an error/warning to be displayed, and the MDIO ingestion fails.
We could provide the user with a warning but still ingest the SEG-Y to MDIO.
Do you have a C++ API for MDIO?
If there is one, where can I find it?
Thanks
Initial work on shot ingestions flexibility was addressed in PR #180 and contains a detailed description of the problem.
Although #180 contains a solution, it requires a priori knowledge of the header configuration to ingest correctly.
Add an option to automatically detect files with a structure similar to Type B and ingest them with wrapped channels (Type A):
segy_to_mdio(
    segy_path="prefix/shot_file.segy",
    mdio_path_or_buffer="s3://bucket/shot_file.mdio",
    index_bytes=(17, 137, 13),
    index_lengths=(4, 2, 4),
    index_names=("shot", "cable", "channel"),
    chunksize=(8, 2, 128, 1024),
    grid_overrides={"AutoChannelWrap": True},
)
Currently, we can only convert a SEG-Y file to MDIO when the SEG-Y file is on a filesystem, as we make use of segyio.
We need to support converting a SEG-Y file without downloading the entirety of it.
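One possible direction, sketched here with fsspec (which MDIO already depends on): fsspec exposes the same file interface for local, `memory://`, `s3://`, etc., so only the byte ranges actually needed (e.g. the 3200-byte textual header) would be fetched. The in-memory file below is a stand-in for a remote object.

```python
import fsspec

# The memory:// filesystem stands in for a remote object store here.
fs = fsspec.filesystem("memory")
with fs.open("/demo.segy", "wb") as f:
    f.write(b"\x40" * 3200 + b"\x00" * 400)  # fake text + binary header

# Range-read: only the first 3200 bytes (the textual header) are consumed,
# not the whole file. Remote protocols translate this into ranged GETs.
with fs.open("/demo.segy", "rb") as f:
    text_header = f.read(3200)
print(len(text_header))  # 3200
```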
Hi,
I know that google's tensorstore is rather fresh and not officially supported by google (hopefully it will be), but I thought I'd make an issue / future feature request on it anyway, as it offers significant performance improvements over the other backends.
I did some measurements that might be of interest.
Cube dimensions: 6400x3200x1500
Environment: Cloud 2 cloud - client running in same datacenter (Azure) as where the data is stored.
Fetching a depth slice:
[mdio - zarr] Read slice of size 77.51423645019531 MB in 231.33055152895395 s
[mdio - dask] Read slice of size 77.51423645019531 MB in 16.46115321794059 s
[tensorstore] Read slice of size 77.51423645019531 MB in 5.266401566041168 s
For reference, here is the tensorstore script that I used to read the underlying zarr-array from a mdio-file:
import time

import tensorstore as ts


def fetch():
    dataset = ts.open({
        'driver': 'zarr',
        'kvstore': {
            'driver': 'http',
            'base_url': 'https://account.blob.core.windows.net?<sas>',
            'path': 'somefile.mdio/data/chunked_012',
        },
    }).result()
    start = time.perf_counter()
    zslice = dataset[:, :, 200]
    data = zslice.read().result()
    print(f'[tensorstore] Read slice of size {data.nbytes / (1024 * 1024)} MB in {time.perf_counter() - start} s')


if __name__ == '__main__':
    fetch()
The current mdio project toml limits the Python version:
python = ">=3.8,<3.11"
Now that Python 3.11 has been released, the new feature would add support for it.
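Presumably the change just relaxes the upper bound in pyproject.toml, along the lines of the fragment below (the exact table path is assumed from the Poetry layout):

```toml
[tool.poetry.dependencies]
python = ">=3.8,<3.12"
```

The real work is making sure every pinned dependency (numba in particular) has wheels for 3.11.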
Not a trivial task. It would require the following, but it needs to be done.
We enable caching of numba functions so we don't compile them every time. However, this is not "zip safe", and Docker happens to install packages as zip files.
We get this error:
RuntimeError: cannot cache function 'ieee2ibm': no locator available for file '/usr/local/lib/python3.9/site-packages/mdio/segy/ibm_float.py'
Solution: wait for numba 0.57. See numba/numba#4908.
Band-aid solution: set the environment variable NUMBA_CACHE_DIR=/tmp.
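The band-aid can be applied in a Docker entrypoint or shell profile, e.g.:

```shell
# Band-aid: point numba's on-disk cache at a writable, non-zipped location
# so the cache locator can resolve files inside the container.
export NUMBA_CACHE_DIR=/tmp
echo "numba cache dir: $NUMBA_CACHE_DIR"
```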
OBN data could potentially be ingested as 6D:
Streamer data could be ingested as:
When you have a chance, can you please write some tests for #248:
Currently, these tests exist for 3D stack; you can extend them or make new ones :) Thanks a lot!
We need tests to ensure backward compatibility primarily for MDIOReader and MDIOWriter.
Improve logging to let users know the reason for a sparse grid, so they can take corrective action on whether to ingest with autoindex or autochannelwrap.
This has become an issue for streamer data, where ingestion with autochannelwrap usually works. In the unusual case of unwrapped channels with duplicate traces, segy2mdio ingestion will fail with a sparse index error. In this case the user would like to see what the first repeated channel was, to aid diagnosis, along with a suggestion to use autoindex if the job fails.
Is there a way to avoid compression (Blosc) when we run segy_to_mdio? Would this work?
In src/mdio/segy/blocked_io.py
85 elif lossless is None:
86 trace_compressor = header_compressor = None
If yes, could you please support this in MDIO? That way, people who do not want any compression can just pass lossless=None explicitly to segy_to_mdio. Thanks.
We should perhaps consider having a GPU Direct Storage backend for MDIO. kvikio.zarr.GDSStore enables it and is available through kvikio.
Some relevant links here:
zarr-developers/zarr-python#934
https://zarr.readthedocs.io/en/stable/release.html#release-2-13-0
It would be nice to have geometry abstractions for a standard interface and custom / geometry-specific exception handling.
This has the following pros:
Per-geometry methods such as unwrap_channels, and we still have a base implementation that should follow the extensible interface. Long shot, but cython or numba jit versions may be even better. xarray can also be used to handle named dimensions etc.
Would inherit from a base class like
# mdio/segy/geometry.py
from abc import ABC, abstractmethod


class SeismicGeometry(ABC):
    def __init__(self, *args, **kwargs):
        ...

    @abstractmethod
    def __iter__(self):
        ...

    @abstractmethod
    def __getitem__(self, item):
        ...

    @abstractmethod
    def xy_to_grid(self, x, y, method="nearest"):
        ...

    @property
    @abstractmethod
    def num_traces(self):
        ...
Then we would have 3D as something like
from mdio.segy.geometry import SeismicGeometry


class SeismicStack3d(SeismicGeometry):
    def __init__(self, inlines, crosslines, samples):
        ...  # set attributes, initialize grid etc.

    def __iter__(self):
        ...  # logic to iterate traces on spatial il/xl grid

    def xy_to_grid(self, x, y, method="nearest"):
        ...  # logic to convert CDP-X CDP-Y to inline and crossline

    @property
    def num_traces(self):
        return self.get_size("inline") * self.get_size("crossline")
Or 3D shots like
from mdio.segy.geometry import SeismicGeometry


class SeismicShot3d(SeismicGeometry):
    def __init__(self, shots, cables, channels, samples):
        ...  # set attributes, initialize grid etc.

    def __iter__(self):
        ...  # logic to iterate traces on shot grid

    def xy_to_grid(self, x, y, method="nearest"):
        ...  # logic to convert SHOT-X SHOT-Y to shot number

    @property
    def num_traces(self):
        return self.get_size("shot") * self.get_size("cable") * self.get_size("channel")

    def unwrap_channels(self, channels_per_streamer: int):
        return self.channel % channels_per_streamer + 1
and so on.
I think I can install skbuild manually and fix this, but I was wondering if this is going to be an issue for others?
└──> python3 -m pip install multidimio
Collecting multidimio
Downloading multidimio-0.2.0-py3-none-any.whl (60 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.2/60.2 kB 1.0 MB/s eta 0:00:00
Requirement already satisfied: click>=8.1.3 in /opt/homebrew/lib/python3.10/site-packages (from multidimio) (8.1.3)
Collecting zarr<3.0.0,>=2.12.0
Downloading zarr-2.12.0-py3-none-any.whl (185 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 185.8/185.8 kB 3.0 MB/s eta 0:00:00
Collecting click-params<0.4.0,>=0.3.0
Downloading click_params-0.3.0-py3-none-any.whl (12 kB)
Collecting numba<0.56.0,>=0.55.2
Downloading numba-0.55.2-cp310-cp310-macosx_11_0_arm64.whl (2.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 15.1 MB/s eta 0:00:00
Collecting segyio<2.0.0,>=1.9.3
Downloading segyio-1.9.6.tar.gz (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 17.2 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/private/var/folders/tm/v4szwz995hg06nhvxjlvv7th0000gn/T/pip-install-q00euj7_/segyio_cbb75c3d7b1a4aaeb6b361e8b4253d68/setup.py", line 3, in <module>
import skbuild
ModuleNotFoundError: No module named 'skbuild'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
I think it is a good idea to have a Colab and/or Binder badge for the tutorial notebooks.
Here is an action on the GitHub marketplace.
Currently, MDIOWriter does not have the ability to control synchronization; it is extraneous to the MDIOWriter class.
I think MDIOWriter would be more useful if it exposed synchronization to users. I don't know what form it takes, but I currently have a use case where I have to calculate it to align with the chunk boundaries. This can perhaps be made simple enough for users?
What do you think?
cc @tasansal
Currently, MDIOReader and MDIOWriter print the default Python object representation. It would be useful to have a nice printable representation.
class InfoReporter:
    def __init__(self, obj):
        self.obj = obj

    def __repr__(self):
        items = self.obj.info_items()
        return info_text_report(items)

    def _repr_html_(self):
        items = self.obj.info_items()
        return info_html_report(items)
The InfoReporter class from here is a good model to follow.
A dev container will improve the developer experience by providing a quick way to get started and a more consistent environment.
Add the following capability:
mdio cp -i input_file -o output_file
Some types of SEG-Y data do not have a discrete integer bin on input. E.g., 2D CDP data might be regular in CDP and time, but the offset dimension may be represented by a floating point offset.
To handle cases like this, the proposal is to have an ingestion header "index" which will just use a counter starting at 1 and append traces to the "index" header.
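A minimal sketch of the proposed counter (num_traces is a made-up value standing in for the scanned trace count):

```python
# Hypothetical "index" header: a 1-based trace counter, appended per trace,
# used in place of a non-integer dimension like floating point offset.
num_traces = 5  # stand-in for the trace count from the SEG-Y scan
index = list(range(1, num_traces + 1))
print(index)  # [1, 2, 3, 4, 5]
```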
Segy2mdio conversion works properly with lossless, but when the lossy library is added it says a file is not found.
Installed using pip install multidimio[lossy]
It is looking for a file "/metadata/live_mask/0/0/0"
OS: Windows 10; Python 3.9; MDIO v0.1.6
SEG-Y files generated by Petrel crash with an error related to the EBCDIC header.
UnicodeDecodeError Traceback (most recent call last)
Cell In [14], line 1
----> 1 segy_to_mdio(
2 segy_path=f,
3 mdio_path_or_buffer="tst3d_cube.mdio",
4 index_bytes=(5, 9),
5 index_names=("inline", "crossline"),
6 )
File C:\appl\python\env\mdio_tst\lib\site-packages\mdio\converters\segy.py:173, in segy_to_mdio(segy_path, mdio_path_or_buffer, index_bytes, index_names, index_lengths, chunksize, endian, lossless, compression_tolerance, storage_options, overwrite)
169 # Read file specific metadata, build grid, and live trace mask.
170 with segyio.open(
171 filename=segy_path, mode="r", ignore_geometry=True, endian=endian
172 ) as segy_handle:
--> 173 text_header = parse_text_header(segy_handle)
174 binary_header = parse_binary_header(segy_handle)
175 num_traces = segy_handle.tracecount
File C:\appl\python\env\mdio_tst\lib\site-packages\mdio\segy\parsers.py:66, in parse_text_header(segy_handle)
54 def parse_text_header(segy_handle: segyio.SegyFile) -> list[str]:
55     """Parse text header from bytearray to python list of str per line.
56
57     The segyio library returns the text header as a bytearray instance.
(...)
64     Parsed text header in list with lines as elements.
65     """
---> 66 text_header = segy_handle.text[0].decode()
67 text_header = [
68 text_header[char_idx : char_idx + 80]
69 for char_idx in range(0, len(text_header), 80)
70 ]
71 return text_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 155: invalid continuation byte
Petrel EBCDIC headers appear to require encoding='ISO-8859-1' rather than the default UTF-8.
Using segy_handle.text[0].decode(errors='ignore') should get around this issue.
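A self-contained illustration of the two workarounds (the byte string below is a fabricated header fragment containing the problematic 0xd5 byte, not real Petrel output):

```python
raw = b"C 1 CLIENT \xd5"  # fabricated header bytes; 0xd5 is invalid UTF-8

# The default decode raises, reproducing the reported failure.
try:
    raw.decode()
except UnicodeDecodeError as err:
    print("utf-8 failed:", err.reason)

# ISO-8859-1 maps every byte value, so it can never raise; errors="ignore"
# on the default codec would also work, at the cost of dropping bytes.
text = raw.decode("ISO-8859-1")
print(len(text))  # same length as the input: no bytes lost
```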
The newly added rechunking functionality needs tests.
Reads 'mappin' instead of 'mapping'.
A request has been made for the threshold for the GridTraceSparsityError to be configurable. This makes sense for some types of pre-stack data (such as land and OBN).
The current configuration throws an error when the proposed grid is 10 times the number of SEG-Y traces.
@tasansal, what is your preferred implementation for this? Command-line or environment variable?
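If the environment-variable flavor were chosen, it could look roughly like this (the variable name and the 10x fallback are illustrative assumptions, not an existing MDIO option):

```python
import os

# Hypothetical override: read the sparsity ratio threshold from the
# environment, falling back to the current 10x default.
ratio = float(os.environ.get("MDIO_SPARSITY_RATIO", "10"))
print(ratio)
```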
Hi!
Are there any plans to publish and version the underlying format, similar to the zarr specification [1]? Great work on this project btw, it looks really promising!
In the MDIO documentation, in the CLI Reference explanation about chunks (below), maybe improve the documentation (e.g. put a period "." between "2-bytes" and "Chunks")?
3D Seismic Shot Data (Byte Locations Vary): Let’s assume streamer number is at byte 213 as 2-bytes Chunks: 8 shots x 2 cables x 256 channels x 512 samples –header-locations 9,213,13 –header-names shot,cable,chan –header-lengths 4,2,4,4 –chunk-size 8,2,256,512
Hi!
I'm playing around with mdio and came across this error on one of my test files
mdio = MDIOReader(
    "abfs://somefile.mdio",
    return_metadata=True,
    storage_options={ ... },
)
_ = mdio[:, :, :100]  # KeyError: 'somefile.mdio/data/chunked_012/5/3/24'
The file is stored in an Azure blob store. The error is correct in the sense that the chunk does not exist. The survey is irregular, and the SEG-Y is padded around the edges to make a nice regular grid. The fill value is 0, both in zarr and the original SEG-Y. Hence I wonder if the missing chunks are a zarr optimisation:
There is no need for all chunks to be present within an array store. If a chunk is not present then it is considered to be in an uninitialized state. An uninitialized chunk MUST be treated as if it was uniformly filled with the value of the "fill_value" field in the array metadata. If the "fill_value" field is null then the contents of the chunk are undefined [1].
But I haven't investigated further. It might also be user error. Any suggestions for the root cause or a solution for this?
Thanks,
[1] https://zarr.readthedocs.io/en/stable/spec/v2.html#chunks
We have a use case where we need multiple chunking schemes for the same file. It needs to take a list of strings with different chunking patterns (the access_pattern keyword) instead of a single string.
mdio_to_segy requires a significant amount of memory for large datasets. This seems related to dask/dask#10535
Test dask release: 2023.10.0
@markspec thinks that it might be useful to have the ability to pass a Client to MDIOReader.
Currently ingesting on:
leads to a sparse representation of the data if gun is added as an indexing key. This could be handled by performing modulo division of the ffid by the number of guns in most standard towed-streamer acquisitions.
Enabling ingestion with gun will lead to improved support for seismic processing of pre-stack data (as many pre-stack operations are on common-channel gathers), ML training for common-channel processes, and data visualization and QC on common-channel gathers.
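A sketch of the proposed modulo decoding for a hypothetical dual-source flip-flop survey (num_guns and the FFID values are made up for illustration):

```python
# In standard flip-flop acquisition the gun number cycles with FFID,
# so modulo division recovers a 1-based gun index.
num_guns = 2
ffids = [101, 102, 103, 104]
guns = [ffid % num_guns + 1 for ffid in ffids]
print(guns)  # [2, 1, 2, 1]
```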
The copy_mdio function does not honor the separator when a file is copied.
Most of our files are ingested with the separator being "/", but when a file gets copied, the separator becomes ".".
In segyio, the buf is ALWAYS big endian regardless of the SEG-Y byte order. Thus, def header_scan_worker() will produce incorrect header values for the snippet below from src/mdio/segy/_workers.py:
formats = [type_.numpy_dtype.newbyteorder(endian) for type_ in byte_types]
An issue was raised with segyio (issue #559) and confirmed in discussion on Slack.
An example of the behavior can be found at anthonytorlucci/segyio_header_buf.
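A small numpy demonstration of why following the file's declared byte order is wrong here: the same big-endian buffer decodes differently under each dtype (the buffer is fabricated for illustration):

```python
import numpy as np

buf = b"\x00\x00\x00\x01"  # big-endian int32 with value 1, as segyio yields

# Correct: treat the buffer as big endian, matching segyio's buf.
as_big = np.frombuffer(buf, dtype=np.dtype("int32").newbyteorder(">"))[0]

# Bug reproduction: honoring a little-endian file declaration misreads it.
as_little = np.frombuffer(buf, dtype=np.dtype("int32").newbyteorder("<"))[0]

print(as_big, as_little)  # 1 16777216
```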
When user error leads to incorrect values being read for headers, the live mask can become extremely large, leading to an OOM error. MDIO should QC the dimensions before generating the live mask and raise an exception with meaningful warnings if a clear error has been made.
The following code will trigger a (near) endless recursion of subprocesses, causing all the conversions to be attempted in parallel, failing all of them:
(I am not using Dask)
import os

from mdio import segy_to_mdio

input_segy = 'C:\\Demo\\Data\\ST10010ZC11_PZ_PSDM_KIRCH_FAR_D.MIG_FIN.POST_STACK.3D.JS-017536.segy'

for level in [0, 0.01, 0.02, 1.0, 10.0]:
    compressed_mdio_file = os.path.join(os.path.splitext(input_segy)[0] + '_' + str(level) + '.mdio')
    segy_to_mdio(
        segy_path=input_segy,
        mdio_path_or_buffer=compressed_mdio_file,
        index_bytes=(189, 193),
        index_names=('inline', 'crossline'),
        compression_tolerance=level,
        lossless=(level == 0),
    )
    print(f'Converted {input_segy} @ {level} tolerance, to {compressed_mdio_file}')

print('Complete')
What I see is a constant looping of the following output:
C:\sources\mdio_test (main)
λ python mdio_test.py
Scanning SEG-Y for geometry attributes: 0%| | 0/6 [00:00<?, ?block/s]
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\spawn.py", line 125, in _main
prepare(preparation_data)
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\sources\nDBenchmark\mdio_test.py", line 9, in <module>
segy_to_mdio(segy_path=input_segy,
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\mdio\converters\segy.py", line 177, in segy_to_mdio
dimensions, index_headers = get_grid_plan(
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\mdio\segy\utilities.py", line 53, in get_grid_plan
index_headers = parse_trace_headers(
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\site-packages\mdio\segy\parsers.py", line 120, in parse_trace_headers
with Pool(num_workers) as pool:
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\context.py", line 119, in Pool
return Pool(processes, initializer, initargs, maxtasksperchild,
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\pool.py", line 212, in __init__
self._repopulate_pool()
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\pool.py", line 303, in _repopulate_pool
return self._repopulate_pool_static(self._ctx, self.Process,
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\pool.py", line 326, in _repopulate_pool_static
w.start()
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\mstormo\.pyenv\pyenv-win\versions\3.8.9\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
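The fix the RuntimeError itself hints at is the standard spawn-safe entry-point guard; a minimal sketch with a stand-in worker function (square and main are illustrative, not mdio code):

```python
from multiprocessing import Pool


def square(x):
    return x * x


def main():
    # On Windows (and macOS with the default spawn start method), children
    # re-import the main module. Without the __main__ guard below, top-level
    # calls like segy_to_mdio would re-execute in every worker, re-spawning
    # pools endlessly, which is what the traceback above shows.
    with Pool(2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    print(main())  # [1, 4, 9]
```

Moving the user's conversion loop under `if __name__ == '__main__':` should stop the recursive spawning.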
In the MDIO 'get started in 10 minutes' guide, in the example below, it says dimensions="inline", but the plot seems to be for a crossline. Should we use dimensions="crossline" here instead?
crossline_index = int(mdio.coord_to_index(100, dimensions="inline"))
xl_mask, xl_headers, xl_data = mdio[:, crossline_index]
Hey guys
Nice work with the SEG-Y loader! At our team, we use our own library to interact with SEG-Y data, so I've decided to give MDIO a try and compare the results of multiple approaches and libraries.
For my tests, I've used a ~21GB sized SEG-Y with IEEE float32 values (no IBM float shenanigans here).
The cube is post-stack, so it is a cube for seismic interpretation: therefore, it has a meaningful regular 3D structure.
Using the same function you provided in the tutorials, get_size, I've got the following results:
SEG-Y: 21504.17 MB
MDIO: 3943.37 MB
MDIO LOSSY+: 1723.78 MB
SEG-Y QUANTIZED: 5995.96 MB
The LOSSY+ MDIO file was created by using compression_tolerance=(std of amplitude values).
The SEG-Y QUANTIZED file was created by quantizing the data and writing SEG-Y file (according to the standard) with int8 values.
The system is using Intel(R) Xeon(R) Gold 6242R CPU, just in case that may be of interest.
Multiple formats are tested against the tasks of loading slices (2D arrays) across three dimensions: INLINE, CROSSLINE, SAMPLES.
Also the ability to load sub-volumes (3D arrays) is tested.
For more advanced usage I have tests for loading batches of data: more on that later.
For the tests, I use the following engines:
- segyio -- public functions from this great library
- segfast -- our in-house library for loading any SEG-Y cubes
- segfast with segyio engine -- essentially, a better cooked segyio where we use their private methods
- seismiqb -- our library for seismic interpretation (optimized for post-stack cubes only)
- seismiqb HDF5 -- converts the data to HDF5 (very similar to the zarr you use)
- segfast quantized -- automatically quantized (optimally in some information sense) SEG-Y data written with int8 dtype
To this slew of engines, I've added the MDIO loader, which looks very simple:
slide_i = mdio_file[idx, :, :][2]
slide_x = mdio_file[:, idx, :][2]
slide_d = mdio_file[:, :, idx][2]
I also used mdio_file._traces[idx, :, :] but have not noticed significant differences.
An image is better than a thousand words, so a bar-plot of timings for loading INLINE slices:
The situation does not get better on CROSSLINE / SAMPLES axes either:
Note that even naive segyio, which takes a full sweep across the file to get a depth slice, has the same speed.
Some of the reasons for this slowness are apparent: during the conversion process, the default chunk size for zarr is 64x64x64. Therefore, loading 2D slices is not the forte of this exact chunking.
Unfortunately, even when it comes to 3D sub-volumes, the situation is not much better:
Even with this being the best (and only) scenario for chunked storage, it is still not as fast as plain SEG-Y storage, even with no quantization.
This leaves a few questions: one could use slice-oriented chunking (1x64x64 or somewhat like that), but that would be a mess of lots of files. I hope you can help me with those questions!
This is regarding regularized CDP offset gathers and is pre-stack.
We need to define good default chunk sizes that satisfy multiple use cases (see access patterns below).
3D acquisition:
2D acquisition:
+ Time or depth for both.
The distributed workers flatten the chunks along the first dimension to write to SEG-Y.
Huge files >2TB use a lot of memory during export.
The output sharding strategy needs to be optimized:
mdio-python/src/mdio/converters/mdio.py
Lines 177 to 241 in 03b9e4f
and
mdio-python/src/mdio/segy/creation.py
Line 111 in 03b9e4f
Add a few examples/tutorials showing how mdio can be used with ML/DL libraries.
I'm trying to convert a sample SEG-Y file to an MDIO file.
Python version 3.11.7 on Mac M2 seems to be having issues (or perhaps I'm doing something wrong).
(I also tried 3.9.18 and got a similar error.)
Hopefully this helps in debugging:
$ python3 -m pip install multidimio
Requirement already satisfied: multidimio in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (0.6.0)
Requirement already satisfied: click<9.0.0,>=8.1.7 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (8.1.7)
Requirement already satisfied: click-params<0.6.0,>=0.5.0 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (0.5.0)
Requirement already satisfied: dask>=2023.10.0 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (2024.1.1)
Requirement already satisfied: fsspec>=2023.9.1 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (2024.2.0)
Requirement already satisfied: numba<0.60.0,>=0.59.0rc1 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (0.59.0)
Requirement already satisfied: psutil<6.0.0,>=5.9.5 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (5.9.8)
Requirement already satisfied: segyio<2.0.0,>=1.9.3 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (1.9.12)
Requirement already satisfied: tqdm<5.0.0,>=4.66.1 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (4.66.1)
Requirement already satisfied: urllib3<2.0.0,>=1.26.18 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (1.26.18)
Requirement already satisfied: zarr<3.0.0,>=2.16.1 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from multidimio) (2.16.1)
Requirement already satisfied: deprecated<2.0.0,>=1.2.14 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from click-params<0.6.0,>=0.5.0->multidimio) (1.2.14)
Requirement already satisfied: validators<0.23,>=0.22 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from click-params<0.6.0,>=0.5.0->multidimio) (0.22.0)
Requirement already satisfied: cloudpickle>=1.5.0 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from dask>=2023.10.0->multidimio) (3.0.0)
Requirement already satisfied: packaging>=20.0 in /Users/macuser/.local/lib/python3.11/site-packages (from dask>=2023.10.0->multidimio) (23.2)
Requirement already satisfied: partd>=1.2.0 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from dask>=2023.10.0->multidimio) (1.4.1)
Requirement already satisfied: pyyaml>=5.3.1 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from dask>=2023.10.0->multidimio) (6.0.1)
Requirement already satisfied: toolz>=0.10.0 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from dask>=2023.10.0->multidimio) (0.12.1)
Requirement already satisfied: importlib-metadata>=4.13.0 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from dask>=2023.10.0->multidimio) (7.0.1)
Requirement already satisfied: llvmlite<0.43,>=0.42.0dev0 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from numba<0.60.0,>=0.59.0rc1->multidimio) (0.42.0)
Requirement already satisfied: numpy<1.27,>=1.22 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from numba<0.60.0,>=0.59.0rc1->multidimio) (1.26.3)
Requirement already satisfied: asciitree in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from zarr<3.0.0,>=2.16.1->multidimio) (0.3.3)
Requirement already satisfied: fasteners in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from zarr<3.0.0,>=2.16.1->multidimio) (0.19)
Requirement already satisfied: numcodecs>=0.10.0 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from zarr<3.0.0,>=2.16.1->multidimio) (0.12.1)
Requirement already satisfied: wrapt<2,>=1.10 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from deprecated<2.0.0,>=1.2.14->click-params<0.6.0,>=0.5.0->multidimio) (1.16.0)
Requirement already satisfied: zipp>=0.5 in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from importlib-metadata>=4.13.0->dask>=2023.10.0->multidimio) (3.17.0)
Requirement already satisfied: locket in /Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages (from partd>=1.2.0->dask>=2023.10.0->multidimio) (1.0.0)
$ cat test.py
from mdio import segy_to_mdio
segy_to_mdio(
    segy_path="volve.segy",
    mdio_path_or_buffer="volve1.mdio",
    index_bytes=(189, 193),
    index_names=("inline", "crossline"),
    lossless=True,
)
$ python3 test.py
Scanning SEG-Y for geometry attributes:   0%|          | 0/6 [00:00<?, ?block/s]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
    prepare(preparation_data)
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/macuser/Exp/cname/mdio-python/macuser/test.py", line 3, in <module>
    segy_to_mdio(
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages/mdio/converters/segy.py", line 352, in segy_to_mdio
    dimensions, chunksize, index_headers = get_grid_plan(
                                           ^^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages/mdio/segy/utilities.py", line 66, in get_grid_plan
    index_headers = parse_trace_headers(
                    ^^^^^^^^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages/mdio/segy/parsers.py", line 119, in parse_trace_headers
    lazy_work = executor.map(
                ^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/process.py", line 837, in map
    results = super().map(partial(_process_chunk, fn),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/_base.py", line 608, in map
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/_base.py", line 608, in <listcomp>
    fs = [self.submit(fn, *args) for args in zip(*iterables)]
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/process.py", line 808, in submit
    self._adjust_process_count()
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/process.py", line 767, in _adjust_process_count
    self._spawn_process()
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/process.py", line 785, in _spawn_process
    p.start()
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/spawn.py", line 164, in get_preparation_data
    _check_not_importing_main()
  File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/multiprocessing/spawn.py", line 140, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
Scanning SEG-Y for geometry attributes: 0%| | 0/6 [00:00<?, ?block/s]
Traceback (most recent call last):
File "/Users/macuser/Exp/cname/mdio-python/macuser/test.py", line 3, in <module>
segy_to_mdio(
File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages/mdio/converters/segy.py", line 352, in segy_to_mdio
dimensions, chunksize, index_headers = get_grid_plan(
^^^^^^^^^^^^^^
File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages/mdio/segy/utilities.py", line 66, in get_grid_plan
index_headers = parse_trace_headers(
^^^^^^^^^^^^^^^^^^^^
File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages/mdio/segy/parsers.py", line 139, in parse_trace_headers
headers = list(lazy_work)
^^^^^^^^^^^^^^^
File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/site-packages/tqdm/std.py", line 1182, in __iter__
for obj in iterable:
File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/process.py", line 620, in _chain_from_iterable_of_lists
for element in iterable:
File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/_base.py", line 619, in result_iterator
yield _result_or_cancel(fs.pop())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/_base.py", line 317, in _result_or_cancel
return fut.result(timeout)
^^^^^^^^^^^^^^^^^^^
File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/_base.py", line 456, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/Users/macuser/.pyenv/versions/3.11.7/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
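The RuntimeError in the worker tracebacks points directly at the fix: under the `spawn` start method (the default on macOS and Windows), each worker re-imports the main module, so the top-level `segy_to_mdio(...)` call spawns workers recursively. Guarding the call should fix the repro (a sketch using the same paths and parameters as the script above; it still requires the `mdio` package and the local `volve.segy` file):

```python
from mdio import segy_to_mdio

def main():
    segy_to_mdio(
        segy_path="volve.segy",
        mdio_path_or_buffer="volve1.mdio",
        index_bytes=(189, 193),
        index_names=("inline", "crossline"),
        lossless=True,
    )

if __name__ == "__main__":
    # Workers spawned during ingestion re-import this module; the guard
    # keeps them from re-running the conversion themselves.
    main()
```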
=============================================================
On Python 3.8.18 on Ubuntu 22.04.3 LTS, I get the following error:
$ python3 test.py
Scanning SEG-Y for geometry attributes: 100%|██████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 7.85block/s]
Traceback (most recent call last):
File "test.py", line 3, in <module>
segy_to_mdio(
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/site-packages/mdio/converters/segy.py", line 329, in segy_to_mdio
write_attribute(name="created", zarr_group=zarr_root, attribute=iso_datetime)
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/site-packages/mdio/core/utils_write.py", line 17, in write_attribute
zarr_group.attrs[name] = attribute
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/site-packages/zarr/attrs.py", line 89, in __setitem__
self._write_op(self._setitem_nosync, item, value)
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/site-packages/zarr/attrs.py", line 83, in _write_op
return f(*args, **kwargs)
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/site-packages/zarr/attrs.py", line 94, in _setitem_nosync
d = self._get_nosync()
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/site-packages/zarr/attrs.py", line 47, in _get_nosync
d = self.store._metadata_class.parse_metadata(data)
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/site-packages/zarr/meta.py", line 104, in parse_metadata
meta = json_loads(s)
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/site-packages/zarr/util.py", line 75, in json_loads
return json.loads(ensure_text(s, "utf-8"))
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/ubuntu_user/.pyenv/versions/3.8.18/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
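The decoder is failing on the very first character of a `.zattrs` file, which usually means an empty or truncated file rather than malformed JSON. A small diagnostic sketch (the helper name is hypothetical) to find which `.zattrs` under the store fails to parse:

```python
import json
import pathlib

def check_zattrs(store_path):
    """Return (path, status) pairs for every .zattrs file under a Zarr/MDIO store."""
    results = []
    for p in sorted(pathlib.Path(store_path).rglob(".zattrs")):
        raw = p.read_bytes()
        try:
            json.loads(raw.decode("utf-8"))
            status = "ok"
        except (json.JSONDecodeError, UnicodeDecodeError):
            # An empty file fails exactly like the report above:
            # "Expecting value: line 1 column 1 (char 0)".
            status = f"BROKEN ({len(raw)} bytes)"
        results.append((str(p), status))
    return results
```

Running it against the partially written `volve1.mdio` output should point at the group whose attributes file was left empty.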
Here is a simple repro:
$ conda create --prefix <path_to_your_env> python=3.9
$ conda activate <path_to_your_env>
$ pip install multidimio
$ mdio --version
raises a RuntimeError with the following message:
RuntimeError: 'mdio' is not installed. Try passing 'package_name' instead.
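The underlying mismatch: the import package is `mdio`, but the PyPI distribution is `multidimio`, and version metadata is keyed by distribution name, so a lookup on the import name raises `PackageNotFoundError` (which Click surfaces as the RuntimeError above). A minimal sketch of a distribution-aware lookup (the helper name is hypothetical):

```python
from importlib import metadata

def resolve_version(dist_name):
    # Version metadata is keyed by the *distribution* name ("multidimio"),
    # not the import name ("mdio"); querying the wrong one raises
    # PackageNotFoundError instead of returning a version string.
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

In the repro environment, `resolve_version("multidimio")` would return the installed version while `resolve_version("mdio")` returns `None`.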
The Read the Docs build is failing.
Investigation shows this is the issue:
Furo needs to be updated to a newer version once this is merged.
The mdio copy function is very slow: it starts the copy, but then nothing happens on the CPU for a long time. It needs to be profiled.
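As a first profiling pass, wrapping the slow call with `cProfile` will show whether the stall is in metadata or setup work before any chunk I/O begins. A generic sketch (the copy entry point itself is whatever `mdio copy` dispatches to; substitute the real callable):

```python
import cProfile
import pstats

def profile_call(fn, *args, **kwargs):
    # Run fn under the profiler and print the 20 most expensive calls
    # by cumulative time; a long idle setup phase shows up at the top.
    prof = cProfile.Profile()
    result = prof.runcall(fn, *args, **kwargs)
    pstats.Stats(prof).sort_stats("cumulative").print_stats(20)
    return result
```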