ncas-cms / cfdm
A Python reference implementation of the CF data model
Home Page: http://ncas-cms.github.io/cfdm
License: MIT License
On the latest master, the ordered method of the Constructs class appears to return successfully only for cell methods, raising a ValueError for a Constructs object containing any other valid constructs of a single type, e.g.:
>>> import cfdm
>>> f = cfdm.example_field(6)
>>> c = f.constructs()
>>> c
<Constructs: auxiliary_coordinate(4), coordinate_reference(1), dimension_coordinate(1), domain_axis(2)>
>>> a = c.filter_by_type('auxiliary_coordinate')
>>> a
<Constructs: auxiliary_coordinate(4)>
>>> a.ordered()
{'cell_method'} {'auxiliary_coordinate'}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sadie/cfdm/cfdm/core/constructs.py", line 1240, in ordered
raise ValueError(
ValueError: Can't order un-orderable construct type: <Constructs: auxiliary_coordinate(4)>
>>> b = c.filter_by_type('domain_axis')
>>> b.ordered()
{'cell_method'} {'domain_axis'}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sadie/cfdm/cfdm/core/constructs.py", line 1240, in ordered
raise ValueError(
ValueError: Can't order un-orderable construct type: <Constructs: domain_axis(2)>
where the output lines before the tracebacks appear because I added the following print call to the method for debugging:
diff --git a/cfdm/core/constructs.py b/cfdm/core/constructs.py
index 575ffd342..f09490f8d 100644
--- a/cfdm/core/constructs.py
+++ b/cfdm/core/constructs.py
@@ -1237,6 +1235,7 @@ class Constructs(abstract.Container):
"Can't order multiple construct types: {!r}".format(self)
)
+ print(self._ordered_constructs, set(self._constructs))
if self._ordered_constructs != set(self._constructs):
raise ValueError(
"Can't order un-orderable construct type: {!r}".format(self)
and the first printed item is always {'cell_method'}, demonstrating that only cell method constructs seem to be orderable by the method. But that doesn't seem right, especially as this is a generic construct method, as implied by the docstring.
It appears that, once same-type constructs are input, the method reaches final logic that only deals with cell method constructs, because only they get added to the _ordered_constructs instance attribute dict in the line:
Line 248 in e908dc8
Note that this behaviour gets passed downstream and also manifests in cf-python (the initial code snippet above has the same results when cf is substituted for cfdm).
@davidhassell it would be useful to hear your thoughts: should other types of constructs be processed and, if so, is it the case, as I suspect and as implied by this line in the docstring:
Lines 1205 to 1206 in e908dc8
that cell methods should be treated as a special case such that some logic is missing to handle the ordering of all other constructs? Thanks.
Follow-on from #31. Now the infrastructure for logging is in place, we can make good use of it with some extensions [feel free to add to this listing, anyone!]:
#31 did not add new logging calls; it simply replaced the print calls existing at that point, which only existed in equals methods and the read and write functions. We should:
verbose kwarg to any function that ends up with a significant number of log calls.
Improved display in log calls of objects for readability. For example, using pprint.pformat to print dictionary or list structures with one item per line, making it easy to pick out a particular item, especially where the structure has many items. I have already changed a few messages in read_write.netcdf.netcdfread in this way, e.g.:
cfdm/cfdm/read_write/netcdf/netcdfread.py
Lines 670 to 673 in 11fd079
Currently log messages go, as the equivalent print() statements did, to STDOUT as pure messages, i.e. no extra metadata such as datetime stamps is included, just the log level, logger name, & the message itself. However, I think it would be beneficial to have at least one new handler that provides (in addition to all logging messages) (date)time stamps, calls made by the user, and exceptions raised outside of the logging system.
I think a file handler is best, such that the user can specify a path where a named, dedicated log file gets written out (& rolled if there is the potential for it to get large enough) with every possible detail (i.e. set cfdm.LOG_LEVEL('DEBUG') for that handler). This would be great for user support & debugging purposes.
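As a rough illustration of the file handler idea, the following sketch uses only the standard library's logging.handlers.RotatingFileHandler. The logger name, file path, size limit and format string are all assumptions made for the example; none of them reflect cfdm's actual configuration.

```python
import logging
import logging.handlers
import os
import tempfile


def add_debug_file_handler(logger, path, max_bytes=1_000_000, backups=3):
    """Attach a rotating file handler that records every level with timestamps."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    # The file receives everything, whatever the console level is
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return handler


# Hypothetical usage: the logger name and path are illustrative only
log_path = os.path.join(tempfile.mkdtemp(), "cfdm_debug.log")
logger = logging.getLogger("cfdm_sketch")
logger.setLevel(logging.DEBUG)
handler = add_debug_file_handler(logger, log_path)
logger.debug("reading file")
handler.flush()
```

The file is rolled over once it approaches max_bytes, keeping at most backups old copies, which addresses the concern about the log growing too large.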
Instead of linking just to the first line of the relevant method from a '[source]' link in the API reference of the documentation, it would be better to link to the full method. In other words, the link would go to a page for the relevant module in the codebase and have highlighted multiple lines covering the extent of the method, rather than just the one with the def declaring it.
Thanks to @sadielbartholomew for fixing this in cf-python - the solution will be the same here.
For cfdm, as for cf-python (defined in the equivalent issue NCAS-CMS/cf-python#83): the cfdm codebase is now PEP8-compliant under the interpretation & scope of the pycodestyle library, with the exception of several rules I have explicitly excluded (where, sadly, there is no easy way to exclude on a per-case or per-line basis instead). We should review these exclusions, which are:
Lines 22 to 31 in 7319bf8
& decide whether pycodestyle is the right tool, among many options, for our linting requirements.
When converting the data type of an output array with the datatype keyword of cfdm.write, the _FillValue is not converted, leading to a netCDF error:
>>> import cfdm
>>> import numpy
>>> f = cfdm.example_field(1)
>>> f.set_property('_FillValue', 45.)
>>> cfdm.write(f, 'delme.nc', datatype={numpy.dtype('float64'): numpy.dtype('float32')})
<snip>
AttributeError: NetCDF: Not a valid data type or _FillValue type mismatch
The solution is simply to make sure that the _FillValue and missing_value attributes are converted if required.
Equivalent to NCAS-CMS/cf-python#70.
As per the description in NCAS-CMS/cfunits#15, though in this case I have not yet tried to add 3.9 jobs to the workflow to run the test suite (if setting up the environment for cfunits fails at the moment, it certainly will do the same for cfdm).
In short: in late 2020 or early 2021 once 3.9 is more established and probably supported by at least some dependencies, we should check whether our dependencies allow us to support 3.9, and document (perhaps package) whether or not 3.9 is supported.
The (benign) warning was UserWarning: Warning: converting a masked element to nan.
This resulted from the masked points being converted to NaNs.
Implement simple geometries, as introduced in CF-1.8. See:
cf-convention/cf-conventions#155
cf-convention/cf-conventions#156
In cfdm.write, allow dimension coordinate constructs' netCDF names to be added to the coordinates netCDF attribute, if the user desires. Currently they are always omitted. Either way is CF-compliant.
Implement groups for CF-1.8. See cf-convention/cf-conventions#144 and cf-convention/cf-conventions#203
When setting datum and coordinate conversion parameters via the coordinate reference construct attributes, the new settings do not appear in the parent coordinate reference construct:
In [1]: import cfdm
In [2]: cr = cfdm.CoordinateReference()
In [3]: cr.datum.set_parameter('test', 123)
In [4]: cr.dump()
Coordinate Reference:
This is very low on the prioritisation list, but I've noticed there are several cases of logic to either print or return a constructed string depending on the value of a Boolean keyword parameter display, essentially (in the most common case) the following:
def <method name>(<args, other kwargs>, display=True):
    <...
    construct 'string' var
    ....>
    string = '\n'.join(string)
    if display:
        print(string)
    else:
        return string
and this would be a great candidate for logic to apply via a decorator. It should be fairly easy to implement such a decorator, too.
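A minimal sketch of what such a decorator could look like, assuming the decorated method simply returns the constructed string. The names display_or_return and Example are hypothetical and chosen only for this illustration.

```python
import functools


def display_or_return(method):
    """Print the string built by method when display=True, else return it."""

    @functools.wraps(method)
    def wrapper(*args, display=True, **kwargs):
        # The wrapped method only builds the string; it never sees 'display'
        string = method(*args, **kwargs)
        if display:
            print(string)
            return None
        return string

    return wrapper


class Example:
    """Hypothetical class standing in for a cfdm construct."""

    @display_or_return
    def dump(self):
        lines = ["Example:", "    size = 3"]
        return "\n".join(lines)


text = Example().dump(display=False)
```

With this arrangement each method body shrinks to the string construction, and the print-or-return branching lives in one place.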
cfdm does not use any custom exceptions, other than DeprecationError in mixin.netcdf. We could likely make some classes a bit cleaner & improve user feedback on errors etc. by creating & applying some custom exception classes to which some error handling can be delegated.
(Copied, minus noise & with a little tidying, from #64 (comment))
I can flesh this out further e.g. with some potential inheritance structure as we think about it & work out what might be useful, but firstly to record some potential candidates for useful exceptions:
warnings lib warnings that the user can opt to enable or disable by interfacing with the logging framework);
The presence of the valid_min, valid_max or valid_range attributes causes data to be masked for its "out-of-range" values. This is sometimes surprising, particularly if the data has been modified between the read and write operations.
Proposal:
Add a warn_valid keyword parameter, default True, to cfdm.read and cfdm.write that warns when such properties are present. The cfdm.write case will only warn if the data contains out-of-range values. Since data is not actually read by cfdm.read (lazy loading), it is not possible to check for out-of-range data during the read, so the mere presence of any of these properties will trigger the warning.
If warn_valid=False then warnings are suppressed.
>>> import cfdm
>>> cfdm.environment(paths=False)
Platform: Linux-5.4.0-53-generic-x86_64-with-debian-bullseye-sid
HDF5 library: 1.10.5
netcdf library: 4.6.3
python: 3.7.0
netCDF4: 1.5.4
cftime: 1.3.0
numpy: 1.18.4
netcdf_flattener: 1.2.0
cfdm: 1.8.7.0
>>> import cfdm; f = cfdm.read('~/thetao_Omon_MIROC6_abrupt-4xCO2_r1i1p1f1_gn_344001-344912.cdl')[0]
>>> cfdm.write(f, 'delme.nc')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-e145e8c98469> in <module>()
----> 1 cfdm.write(f, 'delme.nc')
~/cfdm/cfdm/read_write/write.py in write(fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, string, verbose, warn_valid, group, coordinates, _implementation)
460 string=string, verbose=verbose,
461 warn_valid=warn_valid, group=group,
--> 462 coordinates=coordinates, extra_write_vars=None)
~/cfdm/cfdm/decorators.py in verbose_override_wrapper(self, *args, **kwargs)
184 # enabling
185 try:
--> 186 return method_with_verbose_kwarg(self, *args, **kwargs)
187 except Exception:
188 raise
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in write(self, fields, filename, fmt, overwrite, global_attributes, variable_attributes, file_descriptors, external, Conventions, datatype, least_significant_digit, endian, compress, fletcher32, shuffle, scalar, string, extra_write_vars, verbose, warn_valid, group, coordinates)
4466 # ------------------------------------------------------------
4467 for f in fields:
-> 4468 self._write_field(f)
4469
4470 # ------------------------------------------------------------
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_field(self, f, add_to_seen, allow_data_insert_dimension)
3390 for key, anc in sorted(
3391 self.implementation.get_domain_ancillaries(f).items()):
-> 3392 self._write_domain_ancillary(f, key, anc)
3393
3394 # ------------------------------------------------------------
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_domain_ancillary(self, f, key, anc)
2192
2193 # Create a new domain ancillary variable
-> 2194 self._write_netcdf_variable(ncvar, ncdimensions, anc)
2195
2196 g['key_to_ncvar'][key] = ncvar
~/cfdm/cfdm/read_write/netcdf/netcdfwrite.py in _write_netcdf_variable(self, ncvar, ncdimensions, cfvar, omit, extra, fill, data_variable)
2533 if g['group']:
2534 groups = self._groups(ncvar)
-> 2535 for ncdim in ncdimensions:
2536 ncdim_groups = self._groups(ncdim)
2537 if not groups.startswith(ncdim_groups):
TypeError: 'NoneType' object is not iterable
This is so we can stop doing things like:
>>> old = cfdm.log_level('DEBUG')
>>> <execute some code>
>>> cfdm.log_level(old)
and start doing things like:
>>> with cfdm.log_level('DEBUG'):
... <execute some code>
...
>>>
The getter/setter function for each constant will have to return a Constant object, rather than the constant itself, where Constant defines the __enter__ and __exit__ methods. The existing functions will accept as input either their current value type (e.g. str) or a new Constant instance, i.e. all existing code will still work the same.
PR to follow.
The pair of strings input to _add_message throughout NetCDFRead must exist as keys in the _code0 and _code1 class variable dicts respectively:
cfdm/cfdm/read_write/netcdf/netcdfread.py
Lines 38 to 90 in 6f508f2
else a KeyError will be raised, obscuring the true message we want to provide to the end user via that method:
cfdm/cfdm/read_write/netcdf/netcdfread.py
Lines 3023 to 3024 in 6f508f2
On more than one occasion now the strings were not present as keys in those dictionaries when they should have been & this has led to bugs for reading netCDF, e.g. as fixed in 201ba62 (for which I checked every message component is present as it should be, & will do a double check shortly, but we should consider the scenario where new messages will likely be added in development).
Moreover there are some secondary issues:
_code{0,1} are currently arbitrary (I believe), awaiting decisions on potential standardisation as error/warning codes under the CF Conventions;
'is not in file': 3 & 'does not exist in file': 11 are perhaps interchangeable.
As discussed recently, for the above reasons we might want to reconsider how to implement the messaging. I think we want to preserve the two-component standardisation encapsulated by a combination of _code0 and _code1 keys, as we have at present, but in a way that also:
_add_message.
Initial work towards doctesting has implied, and after further investigation I can confirm, that the copy method for the ABC Container (i.e. cfdm.core.Container.copy), which is documented as being a deep copying operation, in fact only displays the behaviour of a shallow copy.
For example, note how setting a component of the _custom dict of g is also reflected in f when it is an item within a container, appearing to be a reference rather than a copy of that item, but is not reflected in f when it is a simple object:
>>> # Setup
>>> import cfdm
>>> f = cfdm.core.abstract.container.Container()
>>> f._custom
{}
>>> f._custom['feature'] = ['f']
>>> f._custom
{'feature': ['f']}
# Apply the copy, expecting it to be deep
>>> g = f.copy()
>>> g._custom['feature'][0] = 'g'
>>> g._custom
{'feature': ['g']}
# ...but note how the change is also reflected in f:
>>> f._custom
{'feature': ['g']}
# ...though changing the top-level value for g does not influence f:
>>> g._custom['feature'] = 'gee whiz'
>>> g._custom
{'feature': 'gee whiz'}
>>> f._custom
{'feature': ['g']}
>>> cfdm.environment(paths=False)
Platform: Linux-4.15.0-54-generic-x86_64-with-glibc2.10
HDF5 library: 1.10.6
netcdf library: 4.7.4
Python: 3.8.5
netCDF4: 1.5.4
numpy: 1.19.4
cfdm.core: 1.8.8.1
cftime: 1.3.0
netcdf_flattener: 1.2.0
cfdm: 1.8.8.1
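The behaviour reported above matches the standard distinction between copy.copy and copy.deepcopy, which can be reproduced with a plain dict and no cfdm at all. This is only an illustration of the two semantics, not a statement about how Container.copy is implemented.

```python
import copy

custom = {"feature": ["f"]}

shallow = copy.copy(custom)    # new dict, but the inner list is shared
deep = copy.deepcopy(custom)   # new dict and a new inner list

# Mutating the inner list through the shallow copy also changes the original
shallow["feature"][0] = "g"

# Rebinding a top-level key on the shallow copy leaves the original alone
shallow["other"] = "gee whiz"
```

After this runs, custom["feature"] is ['g'] while deep["feature"] is still ['f'], and custom has no 'other' key.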
Equivalent to NCAS-CMS/cf-python#37. We have decided to add logging here (to cfdm) first, as cf-python builds on top of it, so as a base it is a more natural starting point, & there is less code to cover to fully implement the logging as a POC for possible adjustment.
We have also decided the logging should be promoted as a user feature for configurable feedback, not just as a developer aid.
We will go with Python's standard logging module as it is excellent & certainly sufficient for the requirements here.
The navigation menu (left pane) of the documentation seems to have gotten a little longer with some new sections, & with the screen size of my (work) laptop has become just long enough to extend off the screen vertically:
Nothing important is cut off at the moment with my screen view as an example, but whilst I know that is close to the end of the menu, users may not & may be frustrated at not being able to access the later parts of the menu without difficulty (zoom out etc.). It is not possible to scroll as there is no scrollbar (which is a style choice I agree with) & even forcing it by highlighting the text as works with the readthedocs theme doesn't work in this theme.
To reduce the potential for such problems, I suggest moving some of the top-level headings under a more general page. 'Subclassing', 'Philosophy', 'Performance' & 'Versioning' are good candidates as they are all small pages & aren't especially standard docs sections. Maybe the first three could live under an umbrella 'Implementation' heading placed after the 'CF Data Model' section in the listing?
Overall there are various approaches that would solve this, but I personally would rather not add a scrollbar to the nav menu, or change to a nested tree type of menu, as the menu looks very clean & clear as it is (just a little too long).
This arises from NCAS-CMS/cf-python#78
I've just started using cfdm and am still finding my way around the many classes. I'm trying to implement a vertical coordinate reference system for an atmosphere_hybrid_height_coordinate, using the "more complete" example in the tutorial (https://ncas-cms.github.io/cfdm/1.7.1/tutorial.html), which shows all the structures I need. In my code I get an error message (copied below) about an unexpected argument. I can reproduce the error if I take your script from the tutorial (which works as it is) and comment out the line "tas.set_construct(horizontal_crs)". Then, as in the script I want to create, you only have a vertical coordinate reference. The script still executes fine. tas.dump() also works as expected, but cfdm.write( "tas.nc", tas ) produces the following:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/cfdm/read_write/write.py", line 357, in write
verbose=verbose)
File "/usr/local/lib/python3.5/dist-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 3402, in write
self._write_field(f)
File "/usr/local/lib/python3.5/dist-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 2699, in _write_field
self._create_vertical_datum(ref, owning_coord_key)
File "/usr/local/lib/python3.5/dist-packages/cfdm/read_write/netcdf/netcdfwrite.py", line 2853, in _create_vertical_datum
datum=self.implementation.get_datum(ref))
TypeError: initialise_CoordinateReference() got an unexpected keyword argument 'coordinates'
The new CF domain variable (cf-convention/cf-conventions#301) requires a CF data model domain construct. In the existing data model, the domain is represented by an abstract Domain concept, but the new CF-netCDF domain variable "promotes" the domain to construct status, on a similar footing to the Field class.
Implementation notes:
NCAS-CMS/cf-python#197 has highlighted that cfdm as-is is not able to interpret as definite field(s) some cases of CDL input which provide only schema and/or coordinate information, e.g. as produced by ncdump -h or ncdump -c. We realised it may not in fact be possible, given the nature of the CF data model, to unambiguously map CDL with metadata but no data arrays onto fields, though we aren't certain.
The direct problem resulting from this is that such missing data is not accounted for, so errors may emerge when such CDL is read in.
Ultimately we should:
For the forthcoming release, to address NCAS-CMS/cf-python#197, I am catching the MaskErrors and re-raising them as a ValueError with a user-friendly message stating that the CDL metadata is insufficient for conversion to field constructs, i.e. assuming case (b). This is sufficient for the release but should be re-evaluated in the longer term.
Even if we end up going with this approach after the review, I would like to create a custom error class to raise for related errors, rather than using a Python built-in Exception.
The file cm4twc_dump_file.nc contains subgroups and has an unlimited dimension, currently of size 13. However, when read it gives the unlimited dimension as size 0:
>>> cfdm.read('cm4twc_dump_file.nc')[0]
<Field: transfer_i(time(0), altitude(1), latitude(4), longitude(3)) 1>
And downstream errors occur, e.g. when trying to subspace the size zero dimension.
This is a bug in cfdm.read (well, cfdm.read_write.netcdf.NetCDFRead to be exact), which takes the dimension sizes from the flattened version of the file, but the flattened file does not know the unlimited dimension size, because the flattened file contains no arrays.
This is easily fixed by getting the dimension size from the original grouped file instead. PR to follow.
>>> cfdm.environment(paths=False)
Platform: Linux-5.4.0-62-generic-x86_64-with-debian-bullseye-sid
HDF5 library: 1.10.5
netcdf library: 4.6.3
Python: 3.7.0
netCDF4: 1.5.4
numpy: 1.18.4
cfdm.core: 1.8.8.0
cftime: 1.3.0
netcdf_flattener: 1.2.0
cfdm: 1.8.8.0
Most of the docstring code examples within cfdm.core modules suggest there will be cfdm-like user-friendly outputs, whereas running the code in fact returns the default Python object representation with the class name and CPython object id, like <cfdm.core.data.data.Data object at 0x7f80e9c7a3d0> or <cfdm.core.interiorring.InteriorRing object at 0x7f80e9636910>.
For example, compare:
>>> import cfdm.core
>>> d = cfdm.core.Data(range(10))
>>> c = cfdm.core.DimensionCoordinate()
>>> c.set_data(d)
>>> d
<cfdm.core.data.data.Data object at 0x7f833e063430>
>>> c
<cfdm.core.dimensioncoordinate.DimensionCoordinate object at 0x7f833cc90f70>
>>> c.get_data()
<cfdm.core.data.data.Data object at 0x7f833cc94130>
with the equivalent using the non-core cfdm classes:
>>> import cfdm
>>> d = cfdm.Data(range(10))
>>> c = cfdm.DimensionCoordinate()
>>> c.set_data(d)
>>> d
<Data(10): [0, ..., 9]>
>>> c
<DimensionCoordinate: (10) >
>>> c.get_data()
<Data(10): [0, ..., 9]>
where relevant docstring examples suggest the cfdm behaviour, e.g.:
cfdm/cfdm/core/dimensioncoordinate.py
Lines 93 to 100 in 4a04ebc
This should be improved as really we want accurate docstrings not just for (and distinguishing between) cfdm.core and cfdm but also for cf-python, which inherits many docstrings for methods it does not overload. Ideally they will all be doctest-able to ensure validity.
(I encountered this whilst reviewing the examples with the aim of incrementally ensuring they are all appropriate and functionally sound via doctest, for which this issue has particular relevance.)
We noted this was because cfdm.core does not have __repr__ methods defined by design, those being left for definition in cfdm.
We agreed there are at least two potential solutions, namely:
__repr__ methods defined in the cfdm modules to cfdm.core, so the only difference between docstrings in cfdm.core, cfdm and cf for them to work in all cases is the package name, which is handled by {{package}} docstring substitutions.
cfdm.core docstrings to show the true outputs, i.e. the default Python object representation. That would be complicated by the fact that those representations include a memory address which will usually change, but we could probably just replace those with an ellipsis, as both a user-facing and understandable marker and a means recognised by doctest for ignoring certain text and assuming whatever lies there is acceptable.
We want to evaluate the meaning and desired behaviour of external cell measures within the CF data model, for example towards a consistent and rational approach to upstream aggregation in cf-python.
Cannot write to 'NETCDF3_64BIT_OFFSET' and 'NETCDF3_64BIT_DATA' format files:
In [1]: import cfdm
In [2]: cfdm.environment()
Platform: Linux-4.15.0-72-generic-x86_64-with-debian-stretch-sid
python: 3.7.3
future: 0.17.1
HDF5 library: 1.10.2
netcdf library: 4.6.1
netCDF4: 1.4.2
numpy: 1.16.2
cfdm: 1.7.11
In [4]: f = cfdm.read('cfdm/test/test_file.nc')
In [6]: cfdm.write(f, 'out.nc', fmt='NETCDF3_64BIT_OFFSET')
<snip>
ValueError: Unknown output file format: NETCDF3_64BIT_OFFSET
In [7]: cfdm.write(f, 'out.nc', fmt='NETCDF3_64BIT_DATA')
<snip>
ValueError: Unknown output file format: NETCDF3_64BIT_DATA
This should be possible, as per the documentation in cfdm.write.
As stated in an email thread:
An overview of creation and writing near the top of the tutorial (where there is already a read overview) would be beneficial.
This is in response to a(n expert) user request:
a couple of tutorials on "how to create CF compliant data" would be really useful. We could tie them to useful examples, like: creating a multimodel ensemble dataset and "converting a grib file to cf compliant netcdf"
Hello David,
the script you gave me works fine now, and I've been able to modify it to adjust the names of variables in the output netcdf file by using either nc_set_variable or nc_set_dimension, with one exception: there is a grid_mapping variable which is generated with the name rotated_latitude_longitude and carries the datum: I can't find a construct which corresponds to this variable. Is there a way of changing the name-in-file of the grid_mapping variable?
It would be nice to have a function that could return one of a number of well-defined field constructs that are useful for exploring cfdm and can be used for docstring examples.
As detected by the "Run test suite" GH Actions workflow for the latest commit, ce9a5a4, a sub-test fails on Mac OS, but not on a Linux (Ubuntu) OS, in the setup stage on an in-place sed command:
======================================================================
ERROR: test_read_CDL (test_read_write.read_writeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/runner/runners/2.169.0/work/cfdm/cfdm/cfdm/test/test_read_write.py", line 216, in test_read_CDL
shell=True, check=True)
File "/Users/runner/miniconda3/envs/cfdm-latest/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'sed -i "1 i\ \ " /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmp27bjmfaj.cfdm_test' returned non-zero exit status 1.
This post explains that Mac OS treats in-place sed commands slightly differently, requiring an extension to be specified.
It will be a quick fix that I will put in now, but I am raising an intermediate GH Issue as a note, since it is likely we could run into this setup issue again.
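One way to avoid the GNU/BSD sed difference altogether is to do the edit in Python rather than shelling out. The sketch below is an alternative for illustration, not the fix applied in the repository; prepend_line is a hypothetical helper and the file contents are made up.

```python
import os
import tempfile


def prepend_line(path, line=""):
    """Insert line before the first line of the file at path."""
    with open(path) as fh:
        original = fh.read()
    with open(path, "w") as fh:
        fh.write(line + "\n" + original)


# Create a small temporary file to act on
fd, tmp = tempfile.mkstemp(suffix=".cfdm_test")
with os.fdopen(fd, "w") as fh:
    fh.write("netcdf test_file {\n}\n")

# Insert a whitespace-only first line, as the sed command was doing
prepend_line(tmp, "  ")
with open(tmp) as fh:
    result = fh.read()
os.remove(tmp)
```

Since no external command is involved, the behaviour is the same on Linux and Mac OS runners.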
This is related to the way in which geometry containers are parsed (in netcdfread.py and netcdfwrite.py), and is being investigated ....
Some methods are missing from the API reference, such that when (some) are cited elsewhere in the documentation they are linking not to the internal methods as intended but to the equivalently-named methods in the Python documentation, presumably via intersphinx. (If other methods were missing but not of same name as a built-in, they would not get linked, which is not misleading but still not ideal.)
I have noticed this for at least the max and min methods listed under the "see also" directive for certain classes, e.g. for Data.sum. This is despite Data.max being defined in the codebase and working just fine on some field's data.
Note I've looked into this briefly and I can see it's not an autodoc extension scoping issue, since we set the module correctly in the templates via (for the Data.sum case) the .. currentmodule:: cfdm in the method.rst template.
So we need to ensure all objects in the reference have all possible methods listed under one of the autosummary lists for their class under the class/ dir. I will do that now for max and min, since I have spotted them, but at release time we should find some means to check that all non-private methods in the codebase are cited in an autosummary and hence are generated when the docs are built. Ideally we can find a Sphinx tool that can check that for us, else write a small script to check.
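If it does come down to a small script, one possible shape is sketched below. It assumes that autosummary entries name methods as ClassName.method somewhere in the RST source; the Data class and the RST text here are hypothetical stand-ins, and a real check would need to read the files under the class/ directory.

```python
import inspect
import re


def undocumented_methods(cls, rst_text):
    """Return public methods of cls not cited as 'ClassName.method' in rst_text."""
    public = {
        name
        for name, member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    }
    pattern = re.compile(r"\b{}\.(\w+)".format(re.escape(cls.__name__)))
    documented = set(pattern.findall(rst_text))
    return sorted(public - documented)


class Data:
    """Hypothetical stand-in for cfdm.Data."""

    def sum(self):
        pass

    def max(self):
        pass

    def min(self):
        pass


rst = """
.. autosummary::

   ~cfdm.Data.sum
"""
missing = undocumented_methods(Data, rst)
```

Here missing lists max and min, the two methods that exist on the class but are absent from the autosummary text.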
We've noted we'd like a collapsible drop-down menu (in the sidebar, for example) for selecting and changing the version of the documentation being shown, where equivalent pages are mapped across versions, rather than pages at specific versions needing to be accessed via their index pages as a starting point.
Such a menu is provided for any docs hosted with ReadTheDocs (at least covering limited aliased versions e.g. 'latest', 'stable'), but for self-hosted docs not using the theme sphinx_rtd_theme, a bit of manual configuration & templating seems to be required. It's definitely possible without too much difficulty or code, as I have seen from some examples (see some listed below), but it is not easy to trace the parts of the docs source and config that result in the versioning in each case.
Note that the completion of this can and should go hand-in-hand with addressing #28 (once a structure to process versions is in, it becomes trivial to add some new text to all pages of a certain version, by templating, and the versioning extensions below provide this as a configuration option) and since it relates to documentation customisation, it would be good to tackle #50 simultaneously also.
Some helpful resources I've found after a little investigation:
We noted it would be useful, and particularly after discussion arising related to NCAS-CMS/cf-python#69, to have methods that can get, set, delete and check for the existence of trailing string length dimensions.
With naming in line with methods in the existing API, the intuitive case would be for a set of four methods named nc_{get, set, del, has}_string_length_dimension.
As already implemented in cf-python: NCAS-CMS/cf-python@b82f507 & the consecutive commits up to NCAS-CMS/cf-python@c45b7d9, plus NCAS-CMS/cf-python@441a2d6.
In this case, we might have to be careful to check that the downstream behaviour in cf-python is unaffected, since methods using the in-place decorator that are inherited by cf-python through subclassing & then used inside its own methods with similar decorators could do something fatal like recurse, leading to a RecursionError: maximum recursion depth exceeded, as I observed frequently during development of the equivalents in that library when I hadn't implemented super() cases correctly.
Profiling cfdm.read suggests that significant performance improvements can be found by removing calls to pprint.pformat and eliminating unnecessary deep copies. Other optimisations (e.g. f-strings) can also be applied.
Reading the file test_file.nc that is produced by the test suite gives a speed-up of 40% with the new code (from 0.08s to 0.048s).
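Where a formatted structure is still wanted in DEBUG output, one common way to avoid paying for pprint.pformat on every call is to guard it with the logger's level check. The sketch below shows that guard as an alternative to removing the calls outright; the logger name and the helper functions are assumptions for the example, and it is not taken from the cfdm changes.

```python
import logging
import pprint

logger = logging.getLogger("cfdm_sketch.read")
logger.setLevel(logging.WARNING)

# Records each formatting call so the saving can be observed
calls = []


def expensive_format(obj):
    """Wrapper around pprint.pformat that records each call."""
    calls.append(obj)
    return pprint.pformat(obj)


def log_structure(obj):
    # Only pay for the formatting when DEBUG messages will be emitted
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Structure:\n%s", expensive_format(obj))


log_structure({"a": 1, "b": [1, 2, 3]})
```

At the WARNING level used here the formatting function is never called, so the cost is incurred only when a user has asked for DEBUG output.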
The coordinates are subspaced, but not their bounds (if present).
The first result I see upon a Google search for "cfdm documentation" links to an old version (1.7.1) of the documentation. However, there is no indication that it is not the latest & greatest.
Since it is likely, as this illustrates, that users may end up viewing older versions of the documentation inadvertently, we should add a visible & explicit warning that the pages & table of contents on display belong to an old version. Other libraries often do this, for example, NumPy displays:
& similarly Python displays:
The simplest way to do this would be to inject an RST warning directive (.. warning::) at the top of all content pages for older versions, as that directive provides a ready-made red text box, which seems to be the UI design trend for this, as above, & which will draw the necessary attention.
If a (scalar or auxiliary) coordinate variable has bounds referenced from it (typically with the bounds netCDF attribute) but the referenced bounds variable is not in the file, then a KeyError occurs. What should happen is that the non-compliance is logged and the read continues, creating a coordinate construct without bounds.
It would be really useful to be able to read CDL files directly into cfdm, rather than having to first convert to binary netCDF files. Can this be added?
IPython supports 'rich' display within Jupyter Notebooks (or see here for a great blog post about it), such that we could implement a _repr_html_ method in appropriate classes to output a real HTML table rather than the 'makeshift' tables we are constrained to returning in standard interpreter scenarios.
In particular, this would be beneficial to implement for any non-minimal-detail inspection call with a construct, e.g. for a field print(f) & f.dump(), as they can output a lot of information & we want it to be as easy as possible for users to pick out what they are interested in.
As well as the obvious separation of components in the output, with HTML tables you get basic cell shading & lines & bold text to make the output easier to digest. If we really wanted to push the boat out, we could even implement something more sophisticated to make rows or groups of them collapsible, as per the xarray example in the blog post linked above.
As a demonstration, I've coded up a basic tabular output for the minimal detail inspection of a field (i.e. repr -> _repr_html_ for the field in notebooks). I used it simply to get a basic example to show; note that I think a table is overkill for this context in practice, & really I want to tabularise the str and dump representations similarly. The result (Out[3]) is produced by this example method inside the Field class:
def _repr_html_(self):
    """
    Outputs a HTML table representation within Jupyter notebooks.
    """
    # HTML tags to use to compose the table in HTML
    blank_table = '<table style="width:50%">{}</table>'
    blank_row_container = "<tr>{}</tr>"
    heading_row_content = "<th colspan='{}'>{}</th>"
    data_row_content = "<td>{}</td>"

    # Extract some info as processed otherwise into one_line_description
    x = [self._unique_domain_axis_identities()[axis] for axis in
         self.get_data_axes(default=())]
    axes_rows = [data_row_content.format(data) for data in x]

    # Construct and populate table
    type_of_construct = heading_row_content.format(
        1, str(self.__class__.__name__) + ":")
    identity_info = heading_row_content.format(
        len(axes_rows) - 1,
        "{} (units of {})".format(
            self.identity(''),
            self.get_property('units', None)
        )
    )
    heading_row = blank_row_container.format(
        type_of_construct + identity_info)

    return blank_table.format(heading_row + "".join(axes_rows))
If we think this is a good idea, we should consider:
_repr_html_
for;
As of version 1.8.5, cfdm no longer works with Python 2.7, due to API changes in the logging package (#35).
The implementation of netCDF groups (#13) will require the import of the netcdf-flattener library, which is Python 3 only.
Therefore Python 2.7 support will be formally withdrawn at the next release: 1.8.6.
The docstring description for the items, keys and values methods for Constructs instances implies they return dicts or lists, as they would have in Python 2, instead of the dict-like and list-like objects dict_items, dict_keys or dict_values that are returned in Python 3:
Lines 754 to 782 in 1f34db7
for example:
>>> a = cfdm.example_field(0)
>>> a.constructs.items()
dict_items([('dimensioncoordinate0', <DimensionCoordinate: latitude(5) degrees_north>), ('dimensioncoordinate1', <DimensionCoordinate: longitude(8) degrees_east>), ('dimensioncoordinate2', <DimensionCoordinate: time(1) days since 2018-12-01 >), ('domainaxis0', <DomainAxis: size(5)>), ('domainaxis1', <DomainAxis: size(8)>), ('domainaxis2', <DomainAxis: size(1)>), ('cellmethod0', <CellMethod: area: mean>)])
>>> a.constructs.keys()
dict_keys(['domainaxis0', 'domainaxis1', 'domainaxis2', 'dimensioncoordinate0', 'dimensioncoordinate1', 'dimensioncoordinate2', 'cellmethod0'])
>>> a.constructs.values()
dict_values([<DimensionCoordinate: latitude(5) degrees_north>, <DimensionCoordinate: longitude(8) degrees_east>, <DimensionCoordinate: time(1) days since 2018-12-01 >, <DomainAxis: size(5)>, <DomainAxis: size(8)>, <DomainAxis: size(1)>, <CellMethod: area: mean>])
Given this is very likely a change resulting from the Python 2 to 3 port, I want to raise this to check what we want to return before fleshing out the return type in the documentation for further clarity. Do we want to return these view objects, or dicts or lists as applicable, by converting via dict() or list() before returning?
I think the view objects are preferable because they are iterables that avoid making a copy, so they are more efficient for iteration, which seems to be by far the most common use case. But I see we also have an __iter__ method available to use if iterators are required, so maybe we do want to return the standard Python structures instead(?):
Lines 275 to 283 in 1f34db7
Unless there is a requirement to return the alternative, I suggest we stick with returning the view objects but in the docstrings clarify that dict (list, as appropriate) can be called to convert from the view objects if a dict (list) is strictly required.
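For reference, the behaviour of the view objects and the suggested conversions can be seen with a plain dict. The keys and values below are placeholder strings standing in for construct identifiers and constructs, not real cfdm objects.

```python
# Placeholder strings stand in for construct keys and constructs
constructs = {
    "domainaxis0": "<DomainAxis: size(5)>",
    "dimensioncoordinate0": "<DimensionCoordinate: latitude(5) degrees_north>",
}

keys_view = constructs.keys()
items_view = constructs.items()

# Views can be iterated over directly, without building a new list
for key in keys_view:
    print(key)

# Views are live: they reflect later changes to the underlying dict
constructs["cellmethod0"] = "<CellMethod: area: mean>"
assert "cellmethod0" in keys_view

# Convert explicitly when a real list or dict is strictly required
key_list = list(keys_view)
items_dict = dict(items_view)
```

Here key_list and items_dict are independent copies, so later changes to constructs do not affect them.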
>>> f.data.array
[[[-- -- -- -- -- -- -- -- --]
[9.0 10.0 11.0 12.0 13.0 14.0 15.0 16.0 17.0]]]
>>> cfdm.write(f, 'file.nc')
>>> g = cfdm.read('file.nc')[0]
>>> g.data.array
[[[9.96920997e+36 9.96920997e+36 9.96920997e+36 9.96920997e+36
9.96920997e+36 9.96920997e+36 9.96920997e+36 9.96920997e+36
9.96920997e+36]
[9.00000000e+00 1.00000000e+01 1.10000000e+01 1.20000000e+01
1.30000000e+01 1.40000000e+01 1.50000000e+01 1.60000000e+01
1.70000000e+01]]]
This only happens for netCDF4 formats. NetCDF3 formats are OK.
Implement string data-types for CF-1.8. See cf-convention/cf-conventions#139
Apply prompt toggling using the extension sphinx-toggleprompt as detailed in NCAS-CMS/cf-python#92.