lesommer / oocgcm

oocgcm is a Python library for the analysis of large gridded geophysical datasets.
Home Page: http://oocgcm.rtfd.io
License: Apache License 2.0
Following a discussion with @toddringler:
It would be nice to have an internal representation of the geometric features used for defining spatial integrals (horizontal averages, fluxes across sections).
This internal representation could be defined from / exported to GeoJSON files.
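As a hedged sketch of what such a feature could look like (the Section class and its fields are hypothetical, not oocgcm API), a geometric feature could be a thin Python object serializable to a GeoJSON Feature:

import json

class Section:
    # hypothetical container for a hydrographic section; not oocgcm API
    def __init__(self, name, lons, lats):
        self.name = name
        self.lons, self.lats = lons, lats

    def to_geojson(self):
        # GeoJSON LineString uses (lon, lat) coordinate order
        return json.dumps({
            "type": "Feature",
            "properties": {"name": self.name},
            "geometry": {"type": "LineString",
                         "coordinates": [list(p) for p in zip(self.lons, self.lats)]},
        })

    @classmethod
    def from_geojson(cls, text):
        feature = json.loads(text)
        lons, lats = zip(*feature["geometry"]["coordinates"])
        return cls(feature["properties"]["name"], lons, lats)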
I am not sure whether I am using the CF standard names 'projection_x_coordinate' and 'projection_y_coordinate' appropriately.
The current version of oocgcm assumes that projection coordinates refer to lat/lon and have no unit. Is this correct?
A slice of a grid object defined with chunks appears to be dysfunctional.
For example:
from oocgcm.oceanmodels.nemo import grids
_grd = grids.nemo_2d_grid(...,chunks=xr_chunks)
grd = _grd[500:1500,500:1500]
This works:
gssh = grd.horizontal_gradient(buoyancy)
but this raises an exception:
ghb = grd.norm_of_vectorfield(grd.horizontal_gradient(buoyancy))
see PyDom.eos
In practice this means that one needs to use coordinates from grid objects in order to plot fields, and cannot rely on xarray plotting methods.
Is this an intended choice?
If yes: see this list of XML parsers and this library.
The functionality available in GriddedData.py for coarsening a 2-D field, or for reorganizing an array to build statistics from values in a box, would be a neat feature for oocgcm. But the current method works only for 2-D NumPy arrays. How should we proceed?
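For reference, a hedged NumPy sketch of the block-coarsening idea (names are illustrative); generalizing this to xarray/dask arrays is precisely the open question:

import numpy as np

def coarsen_2d(field, n):
    # trim so that both dimensions are divisible by the coarsening factor
    ny, nx = (s - s % n for s in field.shape)
    field = field[:ny, :nx]
    # average over n x n blocks
    return field.reshape(ny // n, n, nx // n, n).mean(axis=(1, 3))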
The COMODO norm aims at simplifying the comparison of output from different ocean models. It is a priori a norm for file formats.
We could think of using the COMODO standard for naming grid metric arrays in oocgcm.modelgrid.
I have created a Window object that inherits from DataArray. I had some issues with recursive calls of the constructor inside the print and plot methods. I managed to fix it for the print method by looking at the DataArray code, but I had to override the plot method for the moment. Has anyone tried to create an object that inherits from DataArray?
I think this is an important issue that needs to be addressed for future development. I am not sure that the structure of xarray objects makes inheritance easy.
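For what it's worth, a hedged sketch of the accessor pattern that xarray documents as an alternative to subclassing, which sidesteps the constructor recursion entirely (the 'window' name and the boxcar method are illustrative, not a proposed API):

import xarray as xr

@xr.register_dataarray_accessor('window')
class WindowAccessor:
    def __init__(self, dataarray):
        self._obj = dataarray   # wrap the DataArray instead of inheriting

    def boxcar(self, dim='x', width=5):
        # rolling mean as a stand-in for a real window kernel
        return self._obj.rolling({dim: width}, center=True).mean()

# usage: da.window.boxcar(dim='x', width=11)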
We have decided to automatically set the unit of the quantities computed with modelgrids (e.g. derivatives). Should we use an external package for computing the new units? Which one?
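pint is one candidate (an assumption, not a decision from this thread); it composes units through ordinary arithmetic, which is what derivative operators need:

import pint

ureg = pint.UnitRegistry()
u = 1.0 * ureg('m/s')    # units of a velocity field
dx = 1.0 * ureg('m')     # units of the grid spacing
print((u / dx).units)    # 1 / second: the unit of the derivative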
In the long term, it might be more pythonic for methods related to e.g. a scalar field (its gradient, smoothing, etc.) to be attributes of the DataArray itself.
grd = oocgcm.oceanmodels.nemo.grids.nemo_2d_grid(...)
ssh = oocgcm.oceanmodels.nemo.io.GriddedScalarArray('filessh.nc',grid=grd,grid_location='t')
ssh[0].gradient().divergence().plot() # plots the laplacian of ssh
Hello,
Here is my example: I am reading a u,v field, on the U-grid and V-grid respectively, each with its own vertical coordinate: depthu and depthv. I want to compute the curl, which will be centered on the f-grid. Note that the issue is similar for the vector norm (centered on the T-grid).
vec_xr = gridsc.VectorField2d(vi_xr, vj_xr,
                              x_component_grid_location='u',
                              y_component_grid_location='v')
vcurl_xr = grd.vertical_component_of_curl(vec_xr)
print(vi_xr)
print(vj_xr)
print(vec_xr)
print(vcurl_xr)
Here is the output:
<xarray.DataArray (t: 5, depthu: 19, y: 3454, x: 5422)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422), dtype=float64, chunksize=(1, 1, 1727, 2711)>
Coordinates:
nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ...
nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ...
* depthu (depthu) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
Attributes:
grid_location: u
<xarray.DataArray (t: 5, depthv: 19, y: 3454, x: 5422)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422), dtype=float64, chunksize=(1, 1, 1727, 2711)>
Coordinates:
nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ...
nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ...
* depthv (depthv) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
Attributes:
grid_location: v
VectorField2d(x_component=<xarray.DataArray (t: 5, depthu: 19, y: 3454, x: 5422)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422), dtype=float64, chunksize=(1, 1, 1727, 2711)>
Coordinates:
nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ...
nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ...
* depthu (depthu) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
Attributes:
grid_location: u, y_component=<xarray.DataArray (t: 5, depthv: 19, y: 3454, x: 5422)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422), dtype=float64, chunksize=(1, 1, 1727, 2711)>
Coordinates:
nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ...
nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ...
* depthv (depthv) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
Attributes:
grid_location: v, x_component_grid_location='u', y_component_grid_location='v')
<xarray.DataArray 'vertical component of the curl' (t: 5, depthv: 19, y: 3454, x: 5422, depthu: 19)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422, 19), dtype=float64, chunksize=(1, 1, 1727, 2711, 1)>
Coordinates:
* depthv (depthv) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
* depthu (depthu) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
Attributes:
grid_location: f
The original coordinates of the u and v components are broadcast through the operation (here, the curl), and the result is a variable carrying both depth coordinates.
This can be very costly in memory and in writing operations, because it creates a 4-D variable (depthu, depthv, y, x) that is N times larger than the expected 3-D variable, where N is the number of vertical levels.
Best regards,
Simon
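Not an answer from the thread, but a common xarray workaround sketch: if depthu and depthv carry the same levels (as the output above suggests), renaming both to a shared dimension before building the vector field keeps the result 3-D instead of broadcasting to 4-D. This assumes the depth values actually match:

# hedged workaround; reuses vi_xr, vj_xr and gridsc from the script above
vi_shared = vi_xr.rename({'depthu': 'depth'})
vj_shared = vj_xr.rename({'depthv': 'depth'})
vec_shared = gridsc.VectorField2d(vi_shared, vj_shared,
                                  x_component_grid_location='u',
                                  y_component_grid_location='v')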
Have you already thought about how you wanted to do that, Julien?
I just discovered this SGRID convention:
https://publicwiki.deltares.nl/display/NETCDF/Deltares+proposal+for+Staggered+Grid+data+model+(SGRID)
It has a Python implementation:
https://github.com/sgrid/pysgrid
This could potentially be an alternative to pycomodo, which is not always 100% satisfactory.
Eventually, 3-D grids should come with methods for projecting 3-D fields onto 2-D isosurfaces (e.g. temperature on an isopycnal).
This issue gathers information / ideas on how to proceed.
A random list of questions / ideas:
probably using masking with where, followed by a reduction along the dimension considered; see the sketch below.
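A minimal sketch of that where + reduction idea (function and variable names are illustrative):

import numpy as np

def on_isosurface(field, criterion, target, dim='depth', tol=0.01):
    # keep cells where the criterion is close to the target isosurface,
    # then reduce along the vertical dimension
    mask = np.abs(criterion - target) < tol
    return field.where(mask).mean(dim=dim)

# e.g. temperature on the sigma0 = 27.0 isopycnal:
# t_iso = on_isosurface(temperature, sigma0, 27.0)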
When a first minimal version is ready for dissemination, we should prepare an automatic procedure for distributing oocgcm through PyPI and conda.
Hi all,
My problem is the following: I want to compute simple statistics using the norm of a vector (u/v fields).
When using norm_of_vectorfield, memory usage is much higher than when writing the norm manually (as if the grids were collocated). The fatal outcome is that norm_of_vectorfield may exhaust memory, whereas memory stays well bounded with a manual xarray expression for the norm.
Here is a simple script showing (watching the monitor) that memory usage is higher with norm_of_vectorfield.
import sys, os
import xarray as xr
from xarray import ufuncs as uf
from contextlib import contextmanager
import time
from oocgcm.oceanmodels.nemo import grids
from oocgcm.core import grids as gridsc
#-----------------------------------------------------------------------------------------------------------
@contextmanager
def timeit_context(name):
    startTime = time.time()
    yield
    elapsedTime = time.time() - startTime
    print('{} takes {} s'.format(name, int(elapsedTime)))
#-----------------------------------------------------------------------------------------------------------
chunks = (1727, 2711)
xr_chunks = {'x': chunks[-1], 'y': chunks[-2]}
xr_chunks_u = {'x': chunks[-1], 'y': chunks[-2], 'time_counter': 1, 'depthu': 1}
xr_chunks_v = {'x': chunks[-1], 'y': chunks[-2], 'time_counter': 1, 'depthv': 1}
# - Parameters
natl60_path = '/home7/pharos/othr/NATL60/'
coordfile = natl60_path + 'NATL60-I/NATL60_coordinates_v4.nc'
maskfile = natl60_path + 'NATL60-I/NATL60_v4.1_cdf_byte_mask.nc'
grd = grids.nemo_2d_grid(nemo_coordinate_file=coordfile, nemo_byte_mask_file=maskfile, chunks=xr_chunks)
# 30 times, 2 fields, 300 vertical levels
filenameu = natl60_path + 'NATL60-MJM155-S/5d/2008/NATL60-MJM155_y2008m0*d*.5d_gridU.nc'
filenamev = natl60_path + 'NATL60-MJM155-S/5d/2008/NATL60-MJM155_y2008m0*d*.5d_gridV.nc'
# open u and v components over 10 levels
vi_xr = xr.open_mfdataset(filenameu, chunks=xr_chunks_u)['vozocrtx'].isel(depthu=slice(0, 10))
vj_xr = xr.open_mfdataset(filenamev, chunks=xr_chunks_v)['vomecrty'].isel(depthv=slice(0, 10))
# vector object
vec_xr = gridsc.VectorField2d(vi_xr,
                              vj_xr,
                              x_component_grid_location='u',
                              y_component_grid_location='v')
# 1 - manual vector norm
with timeit_context(' #1'):
    print('# Euclidean norm')
    vnorm_xr = uf.sqrt(uf.square(vec_xr[0]) + uf.square(vec_xr[1]))
    print(vnorm_xr.mean(dim='time_counter').isel(depthv=9, depthu=9).values)
# 2 - oocgcm vector norm
with timeit_context(' #2'):
    print('# Euclidean norm 2')
    vnorm2_xr = grd.norm_of_vectorfield(vec_xr)
    print(vnorm2_xr.mean(dim='time_counter').isel(depthv=9, depthu=9).values)
print('end')
Thank you very much for any advice,
Simon
Two different questions have motivated some profiling tests; see:
I now think that oocgcm should work on xarray only.
Without import statements for the subpackages, it's harder to play around with / experiment with oocgcm as a new user, since you can't access any of the subpackages:
In [1]: import oocgcm
In [2]: dir(oocgcm)
Out[2]:
['__all__',
'__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__path__',
'__spec__',
'__version__',
'version']
In [3]: oocgcm.core
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-4aa17fd422c5> in <module>()
----> 1 oocgcm.core
AttributeError: module 'oocgcm' has no attribute 'core'
In [4]: import oocgcm.core
In [5]: oocgcm.core
Out[5]: <module 'oocgcm.core' from '/Users/spencer/miniconda3/lib/python3.5/site-packages/oocgcm-0.1.0-py3.5.egg/oocgcm/core/__init__.py'>
So simply adding e.g. from . import core etc. to the top-level __init__.py would help. I'm happy to do this as a PR... are there any of the subpackages that you don't want in the public API? Otherwise I can just add them all.
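A sketch of the proposed change to oocgcm/__init__.py (the exact subpackage list is an assumption based on the imports seen elsewhere on this page):

# oocgcm/__init__.py
from . import core
from . import oceanmodels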
We could think of using something similar to the mpas xarray wrapper.
For equation-of-state related quantities, we need to test the compatibility of the chunks and grid_location of xarray DataArrays before applying a function that works on NumPy arrays. This can be done with a decorator; a sketch follows.
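A hedged sketch of such a decorator; grid_location is the attribute oocgcm already attaches to DataArrays, the remaining names are illustrative:

import functools

def requires_grid_location(expected):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(dataarray, *args, **kwargs):
            # check the grid_location attribute before running the function
            loc = dataarray.attrs.get('grid_location')
            if loc != expected:
                raise ValueError('expected grid_location %r, got %r'
                                 % (expected, loc))
            return func(dataarray, *args, **kwargs)
        return wrapper
    return decorator

# usage: decorate a function that expects t-point arrays
# @requires_grid_location('t')
# def compute_density(temperature, ...): ...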
Data objects (scalars, vectors, ...) should probably be DataArrays augmented with grid information (as done with grid_location), and should live in an independent part of the code.
Vector fields (horizontal velocity, gradient of a scalar field) are not defined on the same grid.
My current understanding of xarray Datasets is that their variables are defined on the same grid. We can therefore think of providing a new (minimal) data structure for vector fields, or actually two of them (2-D / 3-D vector fields); see the sketch below.
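A minimal data structure consistent with the VectorField2d repr shown earlier on this page (a sketch; the actual oocgcm definition may differ):

from collections import namedtuple

# fields match those visible in the printed repr above
VectorField2d = namedtuple('VectorField2d',
                           ['x_component', 'y_component',
                            'x_component_grid_location',
                            'y_component_grid_location'])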
When a first minimal version of the code is ready for dissemination, we should prepare the procedure for publishing the documentation on readthedocs.
We could think of a thin wrapper around xarray.plot and iris. See:
xarray plots
Hello,
My problem is as follows: I want to read a self-made netCDF file with io.return_xarray_mfdataset.
The netCDF header gives:
group: netcdf4 {
dimensions:
y = 3454 ;
x = 5422 ;
t = 1 ;
variables:
float nav_lat(y, x) ;
nav_lat:axis = "Y" ;
nav_lat:standard_name = "latitude" ;
nav_lat:long_name = "Latitude" ;
nav_lat:units = "degrees_north" ;
nav_lat:nav_model = "grid_T" ;
float nav_lon(y, x) ;
nav_lon:axis = "X" ;
nav_lon:standard_name = "longitude" ;
nav_lon:long_name = "Longitude" ;
nav_lon:units = "degrees_east" ;
nav_lon:nav_model = "grid_T" ;
double time_centered(t) ;
time_centered:standard_name = "time" ;
time_centered:long_name = "Time axis" ;
time_centered:title = "Time" ;
time_centered:time_origin = "1958-01-01 00:00:00" ;
time_centered:bounds = "time_centered_bounds" ;
time_centered:units = "seconds since 1958-01-01" ;
time_centered:calendar = "gregorian" ;
double t(t) ;
t:axis = "T" ;
t:standard_name = "time" ;
t:long_name = "Time axis" ;
t:title = "Time" ;
t:time_origin = "1958-01-01 00:00:00" ;
t:bounds = "time_counter_bounds" ;
t:units = "seconds since 1958-01-01" ;
t:calendar = "gregorian" ;
float sossheig(t, y, x) ;
sossheig:_FillValue = 0.f ;
sossheig:long_name = "sea surface height" ;
sossheig:units = "m" ;
sossheig:online_operation = "average" ;
sossheig:interval_operation = "40s" ;
sossheig:interval_write = "5d" ;
sossheig:coordinates = "nav_lon nav_lat time_centered" ;
} // group netcdf4
My code is:
from oocgcm.core import io
chunks = (1727, 2711)
xr_chunks_tmean = {'y': chunks[-2], 'x': chunks[-1], 't': 1}
vmean_xrt = io.return_xarray_mfdataset(filemean, chunks=xr_chunks_tmean)[vdict[vkey]['vname']][:]
I get the following error:
ValueError: some chunks keys are not dimensions on this object: ['y', 'x', 't']
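One plausible cause (an inference, not confirmed in the thread): the variables live inside a netCDF4 group named netcdf4, so the root group that xarray opens by default exposes none of the y/x/t dimensions, and the chunk keys cannot match. A hedged workaround sketch, opening the group directly with xarray (whether return_xarray_mfdataset forwards this keyword is an assumption):

import xarray as xr

# open the 'netcdf4' group explicitly so its dimensions are visible
ds = xr.open_mfdataset(filemean, chunks=xr_chunks_tmean, group='netcdf4')
vmean_xrt = ds[vdict[vkey]['vname']]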
numpy.gradient computes the spatial derivatives with a second-order scheme, except near boundaries where the derivatives are evaluated with a first-order scheme. oocgcm.core.grids._horizontal_gradient wraps numpy.gradient for xarray.DataArray with dask.
In order for the result not to depend on the chunk size, we need to set a depth and use dask.array.map_overlap.
But it is not clear how to specify that the calculation should be first order near the boundary when a chunk is close to the boundary: numpy.gradient sees the ghost cells and therefore applies the second-order derivative...
The current implementation explicitly sets a np.nan boundary condition to avoid errors.
This could be a question for the dask mailing list.
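For reference, a minimal sketch of the map_overlap approach on a bare dask array (a sketch of the technique, not the oocgcm wrapper, which adds grid metrics on top of this):

import numpy as np
import dask.array as dsa

x = dsa.random.random((1000, 1000), chunks=(250, 250))
# one ghost cell per chunk edge along the last axis; nan at the outer
# boundary, mirroring the np.nan boundary condition described above
gx = x.map_overlap(lambda a: np.gradient(a, axis=-1),
                   depth={1: 1}, boundary=np.nan)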
I have worked on adding the grid of the MITgcm; it requires a few hacks to make it work with the current API.
I have successfully computed some gradients, but the Laplacian does not work. I think there is some work to do to test whether oocgcm applies well to different models.
I have pushed a branch to keep working on it. @rabernat, you might be interested.