lesommer / oocgcm

oocgcm is a Python library for the analysis of large gridded geophysical datasets.
Home Page: http://oocgcm.rtfd.io
License: Apache License 2.0
Following a discussion with @toddringler:
It would be nice to have an internal representation of the geometric features used for defining spatial integrals (horizontal averages, fluxes across sections).
This internal representation could be defined from / exported to GeoJSON files.
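As a hedged sketch of what such a feature could look like (the Section class and its fields are hypothetical, not oocgcm API), a geometric feature could be a thin Python object serializable to a GeoJSON Feature:

import json

class Section:
    # hypothetical container for a hydrographic section; not oocgcm API
    def __init__(self, name, lons, lats):
        self.name = name
        self.lons, self.lats = lons, lats

    def to_geojson(self):
        # GeoJSON LineString uses (lon, lat) coordinate order
        return json.dumps({
            "type": "Feature",
            "properties": {"name": self.name},
            "geometry": {"type": "LineString",
                         "coordinates": [list(p) for p in zip(self.lons, self.lats)]},
        })

    @classmethod
    def from_geojson(cls, text):
        feature = json.loads(text)
        lons, lats = zip(*feature["geometry"]["coordinates"])
        return cls(feature["properties"]["name"], lons, lats)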
I am not sure whether I am using the CF standard names 'projection_x_coordinate' and 'projection_y_coordinate' appropriately.
The current version of oocgcm assumes that projection coordinates refer to lat/lon and have no unit. Is this correct?
A slice of a grid object defined with chunks appears to be dysfunctional.
For example:
from oocgcm.oceanmodels.nemo import grids
_grd = grids.nemo_2d_grid(...,chunks=xr_chunks)
grd = _grd[500:1500,500:1500]
This works:
gssh = grd.horizontal_gradient(buoyancy)
but this raises an exception:
ghb = grd.norm_of_vectorfield(grd.horizontal_gradient(buoyancy))
see PyDom.eos
In practice this means that one needs to use coordinates from grid objects in order to plot fields, and cannot rely on xarray plotting methods.
Is this an intended choice?
If yes: see this list of XML parsers and this library.
The functionality available in GriddedData.py for coarsening a 2-D field, or for reorganizing an array to build statistics from values in a box, would be a neat feature for oocgcm. But the current method works only for 2-D NumPy arrays. How should we proceed?
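For reference, a hedged NumPy sketch of the block-coarsening idea (names are illustrative); generalizing this to xarray/dask arrays is precisely the open question:

import numpy as np

def coarsen_2d(field, n):
    # trim so that both dimensions are divisible by the coarsening factor
    ny, nx = (s - s % n for s in field.shape)
    field = field[:ny, :nx]
    # average over n x n blocks
    return field.reshape(ny // n, n, nx // n, n).mean(axis=(1, 3))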
The COMODO norm aims at simplifying the comparison of output from different ocean models. It is a priori a norm for file formats.
We could think of using the COMODO standard for naming grid metric arrays in oocgcm.modelgrid.
I have created a Window object that inherits from DataArray. I had some issues with recursive calls of the constructor inside the print and plot methods. I managed to fix it for the print method by looking at the DataArray code, but I had to override the plot method for the moment. Has anyone tried to create an object that inherits from DataArray?
I think this is an important issue that needs to be addressed for future development. I am not sure that the structure of xarray objects makes inheritance easy.
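For what it's worth, a hedged sketch of the accessor pattern that xarray documents as an alternative to subclassing, which sidesteps the constructor recursion entirely (the 'window' name and the boxcar method are illustrative, not a proposed API):

import xarray as xr

@xr.register_dataarray_accessor('window')
class WindowAccessor:
    def __init__(self, dataarray):
        self._obj = dataarray   # wrap the DataArray instead of inheriting

    def boxcar(self, dim='x', width=5):
        # rolling mean as a stand-in for a real window kernel
        return self._obj.rolling({dim: width}, center=True).mean()

# usage: da.window.boxcar(dim='x', width=11)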
We have decided to automatically set the unit of the quantities computed with modelgrids (e.g. derivatives). Should we use an external package for computing the new units? Which one?
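pint is one candidate (an assumption, not a decision from this thread); it composes units through ordinary arithmetic, which is what derivative operators need:

import pint

ureg = pint.UnitRegistry()
u = 1.0 * ureg('m/s')    # units of a velocity field
dx = 1.0 * ureg('m')     # units of the grid spacing
print((u / dx).units)    # 1 / second: the unit of the derivative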
In the long term, it might be more pythonic for methods related to e.g. a scalar field (its gradient, smoothing, etc.) to be attributes of the DataArray itself.
grd = oocgcm.oceanmodels.nemo.grids.nemo_2d_grid(...)
ssh = oocgcm.oceanmodels.nemo.io.GriddedScalarArray('filessh.nc',grid=grd,grid_location='t')
ssh[0].gradient().divergence().plot() # plots the laplacian of ssh
Hello,
Here is my example: I am reading a u,v field, on the U-grid and V-grid respectively, each with its own vertical coordinate: depthu and depthv. I want to compute the curl, which will be centered on the f-grid. Note that the issue is similar for the vector norm (centered on the T-grid).
vec_xr = gridsc.VectorField2d(vi_xr, vj_xr,
                              x_component_grid_location='u',
                              y_component_grid_location='v')
vcurl_xr = grd.vertical_component_of_curl(vec_xr)
print(vi_xr)
print(vj_xr)
print(vec_xr)
print(vcurl_xr)
Here is the output:
<xarray.DataArray (t: 5, depthu: 19, y: 3454, x: 5422)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422), dtype=float64, chunksize=(1, 1, 1727, 2711)>
Coordinates:
nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ...
nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ...
* depthu (depthu) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
Attributes:
grid_location: u
<xarray.DataArray (t: 5, depthv: 19, y: 3454, x: 5422)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422), dtype=float64, chunksize=(1, 1, 1727, 2711)>
Coordinates:
nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ...
nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ...
* depthv (depthv) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
Attributes:
grid_location: v
VectorField2d(x_component=<xarray.DataArray (t: 5, depthu: 19, y: 3454, x: 5422)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422), dtype=float64, chunksize=(1, 1, 1727, 2711)>
Coordinates:
nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ...
nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ...
* depthu (depthu) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
Attributes:
grid_location: u, y_component=<xarray.DataArray (t: 5, depthv: 19, y: 3454, x: 5422)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422), dtype=float64, chunksize=(1, 1, 1727, 2711)>
Coordinates:
nav_lat (y, x) float32 26.5648 26.5648 26.5648 26.5648 26.5648 ...
nav_lon (y, x) float32 -81.4512 -81.4346 -81.4179 -81.4012 ...
* depthv (depthv) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
Attributes:
grid_location: v, x_component_grid_location='u', y_component_grid_location='v')
<xarray.DataArray 'vertical component of the curl' (t: 5, depthv: 19, y: 3454, x: 5422, depthu: 19)>
dask.array<elemwis..., shape=(5, 19, 3454, 5422, 19), dtype=float64, chunksize=(1, 1, 1727, 2711, 1)>
Coordinates:
* depthv (depthv) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
* x (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* y (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
* t (t) datetime64[ns] 2008-01-07T12:00:00 ...
time_centered (t) datetime64[ns] 2008-01-07T12:00:00 ...
* depthu (depthu) float32 0.480455 1.55879 2.79421 4.18731 5.73867 ...
Attributes:
grid_location: f
The original coordinates of the u and v components are broadcast through the operation (here, the curl), and the result is a variable carrying both depth coordinates.
This can be very costly in memory and in writing operations, because it creates a 4-D variable (depthu, depthv, y, x) that is N times larger than the expected 3-D variable, where N is the number of vertical levels.
Best regards,
Simon
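Not an answer from the thread, but a common xarray workaround sketch: if depthu and depthv carry the same levels (as the output above suggests), renaming both to a shared dimension before building the vector field keeps the result 3-D instead of broadcasting to 4-D. This assumes the depth values actually match:

# hedged workaround; reuses vi_xr, vj_xr and gridsc from the script above
vi_shared = vi_xr.rename({'depthu': 'depth'})
vj_shared = vj_xr.rename({'depthv': 'depth'})
vec_shared = gridsc.VectorField2d(vi_shared, vj_shared,
                                  x_component_grid_location='u',
                                  y_component_grid_location='v')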
Have you already thought about how you wanted to do that, Julien?
I just discovered this SGRID convention:
https://publicwiki.deltares.nl/display/NETCDF/Deltares+proposal+for+Staggered+Grid+data+model+(SGRID)
It has a Python implementation:
https://github.com/sgrid/pysgrid
This could potentially be an alternative to pycomodo, which is not always 100% satisfactory.
Eventually, 3-D grids should come with methods for projecting 3-D fields onto 2-D isosurfaces (e.g. temperature on an isopycnal).
This issue gathers information / ideas on how to proceed.
A random list of questions / ideas:
probably using masking with where, followed by a reduction along the dimension considered; see the sketch below.
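A minimal sketch of that where + reduction idea (function and variable names are illustrative):

import numpy as np

def on_isosurface(field, criterion, target, dim='depth', tol=0.01):
    # keep cells where the criterion is close to the target isosurface,
    # then reduce along the vertical dimension
    mask = np.abs(criterion - target) < tol
    return field.where(mask).mean(dim=dim)

# e.g. temperature on the sigma0 = 27.0 isopycnal:
# t_iso = on_isosurface(temperature, sigma0, 27.0)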
When a first minimal version is ready for dissemination, we should prepare an automatic procedure for distributing oocgcm through PyPI and conda.
Hi all,
My problem is the following: I want to compute simple statistics using the norm of a vector (u/v fields).
When using norm_of_vectorfield, memory usage is much higher than when writing the norm manually (as if the grids were collocated). The fatal outcome is that norm_of_vectorfield may exhaust memory, whereas memory stays well bounded with a manual xarray expression for the norm.
Here is a simple script showing (watching the monitor) that memory usage is higher with norm_of_vectorfield.
import sys, os
import xarray as xr
from xarray import ufuncs as uf
from contextlib import contextmanager
import time
from oocgcm.oceanmodels.nemo import grids
from oocgcm.core import grids as gridsc
#-----------------------------------------------------------------------------------------------------------
@contextmanager
def timeit_context(name):
    startTime = time.time()
    yield
    elapsedTime = time.time() - startTime
    print('{} takes {} s'.format(name, int(elapsedTime)))
#-----------------------------------------------------------------------------------------------------------
chunks = (1727, 2711)
xr_chunks = {'x': chunks[-1], 'y': chunks[-2]}
xr_chunks_u = {'x': chunks[-1], 'y': chunks[-2], 'time_counter': 1, 'depthu': 1}
xr_chunks_v = {'x': chunks[-1], 'y': chunks[-2], 'time_counter': 1, 'depthv': 1}
# - Parameters
natl60_path = '/home7/pharos/othr/NATL60/'
coordfile = natl60_path + 'NATL60-I/NATL60_coordinates_v4.nc'
maskfile = natl60_path + 'NATL60-I/NATL60_v4.1_cdf_byte_mask.nc'
grd = grids.nemo_2d_grid(nemo_coordinate_file=coordfile, nemo_byte_mask_file=maskfile, chunks=xr_chunks)
# 30 times, 2 fields, 300 vertical levels
filenameu = natl60_path + 'NATL60-MJM155-S/5d/2008/NATL60-MJM155_y2008m0*d*.5d_gridU.nc'
filenamev = natl60_path + 'NATL60-MJM155-S/5d/2008/NATL60-MJM155_y2008m0*d*.5d_gridV.nc'
# open u and v components over 10 levels
vi_xr = xr.open_mfdataset(filenameu, chunks=xr_chunks_u)['vozocrtx'].isel(depthu=slice(0, 10))
vj_xr = xr.open_mfdataset(filenamev, chunks=xr_chunks_v)['vomecrty'].isel(depthv=slice(0, 10))
# vector object
vec_xr = gridsc.VectorField2d(vi_xr,
                              vj_xr,
                              x_component_grid_location='u',
                              y_component_grid_location='v')
# 1 - manual vector norm
with timeit_context(' #1'):
    print('# Euclidean norm')
    vnorm_xr = uf.sqrt(uf.square(vec_xr[0]) + uf.square(vec_xr[1]))
    print(vnorm_xr.mean(dim='time_counter').isel(depthv=9, depthu=9).values)
# 2 - oocgcm vector norm
with timeit_context(' #2'):
    print('# Euclidean norm 2')
    vnorm2_xr = grd.norm_of_vectorfield(vec_xr)
    print(vnorm2_xr.mean(dim='time_counter').isel(depthv=9, depthu=9).values)
print('end')
Thank you very much for any advice,
Simon
Two different questions have motivated some profiling tests; see:
I now think that oocgcm should work on xarray only.
Without import statements for the subpackages, it's harder to play around with / experiment with oocgcm as a new user, since you can't access any of the subpackages:
In [1]: import oocgcm
In [2]: dir(oocgcm)
Out[2]:
['__all__',
'__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__path__',
'__spec__',
'__version__',
'version']
In [3]: oocgcm.core
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-4aa17fd422c5> in <module>()
----> 1 oocgcm.core
AttributeError: module 'oocgcm' has no attribute 'core'
In [4]: import oocgcm.core
In [5]: oocgcm.core
Out[5]: <module 'oocgcm.core' from '/Users/spencer/miniconda3/lib/python3.5/site-packages/oocgcm-0.1.0-py3.5.egg/oocgcm/core/__init__.py'>
So simply adding e.g. from . import core etc. to the top-level __init__.py would help. I'm happy to do this as a PR... are there any of the subpackages that you don't want in the public API? Otherwise I can just add them all.
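A sketch of the proposed change to oocgcm/__init__.py (the exact subpackage list is an assumption based on the imports seen elsewhere on this page):

# oocgcm/__init__.py
from . import core
from . import oceanmodels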
We could think of using something similar to the mpas xarray wrapper.
For equation-of-state related quantities, we need to test the compatibility of the chunks and grid_location of xarray DataArrays before applying a function that works on NumPy arrays. This can be done with a decorator; a sketch follows.
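A hedged sketch of such a decorator; grid_location is the attribute oocgcm already attaches to DataArrays, the remaining names are illustrative:

import functools

def requires_grid_location(expected):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(dataarray, *args, **kwargs):
            # check the grid_location attribute before running the function
            loc = dataarray.attrs.get('grid_location')
            if loc != expected:
                raise ValueError('expected grid_location %r, got %r'
                                 % (expected, loc))
            return func(dataarray, *args, **kwargs)
        return wrapper
    return decorator

# usage: decorate a function that expects t-point arrays
# @requires_grid_location('t')
# def compute_density(temperature, ...): ...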
Data objects (scalars, vectors, ...) should probably be DataArrays augmented with grid information (as done with grid_location), and should live in an independent part of the code.
Vector fields (horizontal velocity, gradient of a scalar field) are not defined on the same grid.
My current understanding of xarray Datasets is that their variables are defined on the same grid. We can therefore think of providing a new (minimal) data structure for vector fields, or actually two of them (2-D / 3-D vector fields); see the sketch below.
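A minimal data structure consistent with the VectorField2d repr shown earlier on this page (a sketch; the actual oocgcm definition may differ):

from collections import namedtuple

# fields match those visible in the printed repr above
VectorField2d = namedtuple('VectorField2d',
                           ['x_component', 'y_component',
                            'x_component_grid_location',
                            'y_component_grid_location'])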
When a first minimal version of the code is ready for dissemination, we should prepare the procedure for publishing the documentation on readthedocs.
We could think of a thin wrapper around xarray.plot and iris. See:
xarray plots
Hello,
My problem is as follows: I want to read a self-made netCDF file with io.return_xarray_mfdataset.
The netCDF header gives:
group: netcdf4 {
dimensions:
y = 3454 ;
x = 5422 ;
t = 1 ;
variables:
float nav_lat(y, x) ;
nav_lat:axis = "Y" ;
nav_lat:standard_name = "latitude" ;
nav_lat:long_name = "Latitude" ;
nav_lat:units = "degrees_north" ;
nav_lat:nav_model = "grid_T" ;
float nav_lon(y, x) ;
nav_lon:axis = "X" ;
nav_lon:standard_name = "longitude" ;
nav_lon:long_name = "Longitude" ;
nav_lon:units = "degrees_east" ;
nav_lon:nav_model = "grid_T" ;
double time_centered(t) ;
time_centered:standard_name = "time" ;
time_centered:long_name = "Time axis" ;
time_centered:title = "Time" ;
time_centered:time_origin = "1958-01-01 00:00:00" ;
time_centered:bounds = "time_centered_bounds" ;
time_centered:units = "seconds since 1958-01-01" ;
time_centered:calendar = "gregorian" ;
double t(t) ;
t:axis = "T" ;
t:standard_name = "time" ;
t:long_name = "Time axis" ;
t:title = "Time" ;
t:time_origin = "1958-01-01 00:00:00" ;
t:bounds = "time_counter_bounds" ;
t:units = "seconds since 1958-01-01" ;
t:calendar = "gregorian" ;
float sossheig(t, y, x) ;
sossheig:_FillValue = 0.f ;
sossheig:long_name = "sea surface height" ;
sossheig:units = "m" ;
sossheig:online_operation = "average" ;
sossheig:interval_operation = "40s" ;
sossheig:interval_write = "5d" ;
sossheig:coordinates = "nav_lon nav_lat time_centered" ;
} // group netcdf4
My code is:
from oocgcm.core import io
chunks = (1727, 2711)
xr_chunks_tmean = {'y': chunks[-2], 'x': chunks[-1], 't': 1}
vmean_xrt = io.return_xarray_mfdataset(filemean, chunks=xr_chunks_tmean)[vdict[vkey]['vname']][:]
I get the following error:
ValueError: some chunks keys are not dimensions on this object: ['y', 'x', 't']
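One plausible cause (an inference, not confirmed in the thread): the variables live inside a netCDF4 group named netcdf4, so the root group that xarray opens by default exposes none of the y/x/t dimensions, and the chunk keys cannot match. A hedged workaround sketch, opening the group directly with xarray (whether return_xarray_mfdataset forwards this keyword is an assumption):

import xarray as xr

# open the 'netcdf4' group explicitly so its dimensions are visible
ds = xr.open_mfdataset(filemean, chunks=xr_chunks_tmean, group='netcdf4')
vmean_xrt = ds[vdict[vkey]['vname']]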
numpy.gradient computes the spatial derivatives with a second-order scheme, except near boundaries where the derivatives are evaluated with a first-order scheme. oocgcm.core.grids._horizontal_gradient wraps numpy.gradient for xarray.DataArray with dask.
In order for the result not to depend on the chunk size, we need to set a depth and use dask.array.map_overlap.
But it is not clear how to specify that the calculation should be first order near the boundary when a chunk is close to the boundary: numpy.gradient sees the ghost cells and therefore applies the second-order derivative...
The current implementation explicitly sets a np.nan boundary condition to avoid errors.
This could be a question for the dask mailing list.
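For reference, a minimal sketch of the map_overlap approach on a bare dask array (a sketch of the technique, not the oocgcm wrapper, which adds grid metrics on top of this):

import numpy as np
import dask.array as dsa

x = dsa.random.random((1000, 1000), chunks=(250, 250))
# one ghost cell per chunk edge along the last axis; nan at the outer
# boundary, mirroring the np.nan boundary condition described above
gx = x.map_overlap(lambda a: np.gradient(a, axis=-1),
                   depth={1: 1}, boundary=np.nan)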
I have worked on adding the grid of the MITgcm; it requires a few hacks to make it work with the current API.
I have successfully computed some gradients, but the Laplacian does not work. I think there is some work to do to test whether oocgcm applies well to different models.
I have pushed a branch to keep working on it. @rabernat, you might be interested.