Comments (4)
I found a solution that works nicely without xarray. It requires adding two packages as dependencies for kamodo-ccmc, but removes two other packages. The two packages needed are s3fs and h5netcdf, both installable via pip. h5py and netCDF4 are no longer needed.
The previous method of accessing nc files with netCDF4.Dataset can be replaced with the method below, which works for files stored in s3 buckets and in normal storage.
from h5netcdf.legacyapi import Dataset as Dataset_leg
import s3fs

def Dataset(filename, access='r'):
    if filename[:2] == 's3':
        # s3 objects are opened through s3fs and passed on as a file object
        s3 = s3fs.S3FileSystem(anon=False)
        fgrab = s3.open(filename, access+'b')
        return Dataset_leg(fgrab)
    else:
        # local/efs files are passed to h5netcdf by name
        return Dataset_leg(filename, access)
Notice that the new definition of 'Dataset' automatically performs the correct operation based on whether the file is stored in an s3 bucket or in normal storage. This should be tested on WACCM-X files because the h0 files are produced with the 'NETCDF3_64BIT_OFFSET' option due to the large file sizes generated. This code can go into the reader_utilities.py script and be imported from there so that only the import statements in the affected readers need to be changed.
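One thing to watch with the filename[:2] == 's3' test is that it also matches a local path that merely begins with the letters "s3". A minimal sketch of a stricter check is below. The name is_s3_path is hypothetical and is not part of reader_utilities.py; it assumes that every bucket path is written with the full 's3://' prefix.

```python
def is_s3_path(filename):
    """Return True only for paths written with an explicit 's3://' prefix.

    Hypothetical helper for illustration; not part of kamodo-ccmc.
    """
    return str(filename).startswith('s3://')


# A bucket URL is recognized ...
bucket_file = 's3://my-bucket/run1/output.nc'   # made-up example path
# ... while a local directory that happens to start with 's3' is not.
local_file = 's3_runs/output.nc'                # made-up example path

print(is_s3_path(bucket_file), is_s3_path(local_file))
```

If such a helper were adopted, the same predicate could be shared by every wrapper in reader_utilities.py so the s3 decision is made in one place.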
The normal file search method using glob will need to be replaced by the code below, which also automatically performs the correct operation based on the filename. This code would go nicely in the reader_utilities.py script, from which glob should be imported for all uses. Then, only the import statement in the readers will need to be changed.
from glob import glob as glob_leg
import s3fs

def glob(file_pattern):
    if file_pattern[:2] == 's3':
        s3 = s3fs.S3FileSystem(anon=False)
        s3_files = sorted(s3.glob(file_pattern))
        # put the 's3://' prefix back so the results can be opened later
        return ['s3://'+f for f in s3_files]
    else:
        return glob_leg(file_pattern)
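The s3 branch above adds 's3://' back onto each result, which suggests that the filesystem's glob returns bucket/key strings without the protocol. The sketch below illustrates that round trip without needing AWS access. FakeS3 is a hypothetical stand-in written for this example, not the real s3fs.S3FileSystem, and s3_glob is likewise an illustrative name.

```python
import fnmatch


class FakeS3:
    """Hypothetical stand-in for an s3 filesystem object.

    Its glob() returns 'bucket/key' strings with no 's3://' prefix,
    which is the behavior the re-prefixing in the wrapper assumes.
    """

    def __init__(self, keys):
        self._keys = list(keys)

    def glob(self, pattern):
        # Drop the protocol before matching against the stored keys.
        if pattern.startswith('s3://'):
            pattern = pattern[len('s3://'):]
        return [k for k in self._keys if fnmatch.fnmatch(k, pattern)]


def s3_glob(file_pattern, fs):
    """Return sorted matches with the 's3://' prefix restored."""
    return ['s3://' + f for f in sorted(fs.glob(file_pattern))]


fs = FakeS3(['bucket/run/b.nc', 'bucket/run/a.nc', 'bucket/run/notes.txt'])
files = s3_glob('s3://bucket/run/*.nc', fs)
print(files)
```

With the prefix restored, each returned name can be handed straight back to a wrapper that decides between s3 and local storage from the start of the filename.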
The code to replace calls to h5py is
import h5netcdf as h5py  # works for s3 and efs
import s3fs

def convert(filename, access='r'):
    # returns the positional arguments to pass to h5py.File
    if filename[:2] == 's3':
        s3 = s3fs.S3FileSystem(anon=False)
        fgrab = s3.open(filename, access+'b')
        return [fgrab]
    else:
        return [filename, access]

h5_data = h5py.File(*convert(filename))
where convert should be stored in the reader_utilities.py script. This remains to be tested in the relevant readers. h5netcdf and h5netcdf.legacyapi.Dataset both break for the normal/efs case if the file object is given instead of the filename. Note that this does NOT enable writing netcdf/h5 files to s3, so file conversions on the cloud will not be supported.
Since all of the file formats after file conversions are either .h5, .nc or .out files, this reduces the remaining file access problem to reading the two text files produced by each reader and the general I/O in SF_output.py. The open statements in the read_timelist function in reader_utilities.py should be replaced with a call to the function below, which should offer the same resulting behavior for text files on local/efs or s3 storage. This has not been tested.
import s3fs

def _open(filename):
    if filename[:2] == 's3':
        s3 = s3fs.S3FileSystem(anon=False)
        # mode 'r' requests text, matching the default of the built-in open()
        return s3.open(filename, 'r')
    else:
        return open(filename)
Reading the csv and ascii files from s3 may be as simple as replacing line 149 in SFcsv_reader with a call to the function above. The behavior of this function with the csv package has not been tested. Writing csv and ascii files directly to s3 buckets may also be possible with this function, but that has not been tested either. A related issue on xarray's GitHub may be useful if others are interested in writing files to s3.
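Since the csv case is untested, one detail worth checking is the mode of the handle: Python's csv module works on text, so a handle that arrives in binary mode has to be wrapped first. The sketch below shows that wrapping using only the standard library. io.BytesIO stands in for a binary handle from s3 or disk, and csv_rows is a hypothetical helper name, not something in SFcsv_reader.

```python
import csv
import io


def csv_rows(binary_handle, encoding='utf-8'):
    """Read all CSV rows from a binary file-like object.

    Hypothetical helper for illustration only.
    """
    # The csv documentation recommends newline='' for files given to csv.reader.
    text = io.TextIOWrapper(binary_handle, encoding=encoding, newline='')
    return list(csv.reader(text))


# io.BytesIO plays the role of a file opened in binary mode.
rows = csv_rows(io.BytesIO(b"time,value\n0,1.5\n1,2.5\n"))
print(rows)
```

If the handle is already in text mode, it can be given to csv.reader directly and no wrapper is needed.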
from kamodo.
Does _open() not work as file argument to IdlFile()?
No, because spacepy performs the open command. The change has to happen on the spacepy side for the s3 issue.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 MW.Variable_Search('magnetic', model, file_dir)
File ~/efs/raringuette/Kamodo/kamodo_ccmc/flythrough/model_wrapper.py:279, in Variable_Search(search_string, model, file_dir, return_dict)
277 return new_dict
278 elif file_dir != '' and model != '':
--> 279 ko_var_dict = Model_Variables(model, file_dir, return_dict=True)
280 new_dict = {name: [value[0], value[-4]+'-'+value[-3],
281 value[-2], value[-1]] for name, value in
282 ko_var_dict.items() if search_string in value[0].lower()}
283 if new_dict == {}:
File ~/efs/raringuette/Kamodo/kamodo_ccmc/flythrough/model_wrapper.py:184, in Model_Variables(model, file_dir, return_dict)
182 else:
183 reader = Model_Reader(model)
--> 184 ko = reader(file_dir, variables_requested='all')
186 # either return or print nested_dictionary
187 if return_dict:
File ~/efs/raringuette/Kamodo/kamodo_ccmc/readers/swmfgm_4D.py:128, in MODEL.<locals>.MODEL.__init__(self, file_dir, variables_requested, filetime, verbose, gridded_int, printfiles, **kwargs)
126 patterns = unique([basename(f)[:10] for f in files])
127 # get time grid from files
--> 128 dt = sp.IdlFile(RU._open(files[0]),
129 sort_unstructured=False).attrs['time']
130 if dt is not None: # filedate given not always at midnight
131 self.filedate = datetime.strptime(
132 dt.isoformat()[:10], '%Y-%m-%d').replace(
133 tzinfo=timezone.utc)
File ~/users_conda_envs/PyHCs3/lib/python3.10/site-packages/spacepy/pybats/__init__.py:1220, in IdlFile.__init__(self, filename, iframe, header, keep_case, sort_unstructured, *args, **kwargs)
1216 super(IdlFile, self).__init__(*args, **kwargs) # Init as PbData.
1218 # Gather information about the file: format, endianess (if necessary),
1219 # number of picts/frames, etc.:
-> 1220 fmt, endchar, inttype, floattype = _probe_idlfile(filename)
1221 self.attrs['file'] = filename # Save file name.
1222 self.attrs['format'] = fmt # Save file format.
File ~/users_conda_envs/PyHCs3/lib/python3.10/site-packages/spacepy/pybats/__init__.py:807, in _probe_idlfile(filename)
804 inttype = np.dtype(np.int32)
805 floattype = np.dtype(np.float32)
--> 807 with open(filename, 'rb') as f:
808 # On the first try, we may fail because of wrong-endianess.
809 # If that is the case, swap that endian and try again.
810 inttype.newbyteorder(endian)
812 try:
813 # Try to parse with little endian byte ordering:
TypeError: expected str, bytes or os.PathLike object, not TextIOWrapper
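The last frame of the traceback shows the underlying cause: the library calls open() on whatever it receives, so an already-open file object is rejected. The sketch below reproduces that pattern with the standard library only. The probe function is a hypothetical imitation of the call pattern in the traceback, not spacepy code.

```python
import os
import tempfile


def probe(filename):
    """Imitate a reader that insists on opening its argument itself."""
    with open(filename, 'rb') as f:
        return f.read(4)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.out')
    with open(path, 'wb') as f:
        f.write(b'\x00\x01\x02\x03')

    # Passing the path works because probe() can open it.
    ok = probe(path)

    # Passing an open handle fails inside probe(), as in the traceback.
    with open(path) as handle:
        try:
            probe(handle)
            failed = False
        except TypeError:
            failed = True

print(ok, failed)
```

This is why a wrapper such as _open cannot help here: the reader needs a name it can open on its own.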
This issue is solved in pull request #131, both for netCDF4 and netCDF3 files (and for h5 files, too), with the exceptions noted in this issue.