esgf / esgf-pyclient
Search client for the ESGF Search API
Home Page: https://esgf-pyclient.readthedocs.io/en/latest/
License: BSD 3-Clause "New" or "Revised" License
I am trying to get started with some simple queries and I noticed that if I don't give a value for "facets" I get a 500 Server Error:
from pyesgf.search import SearchConnection
conn = SearchConnection('https://esgf-node.llnl.gov/esg-search/', distrib=True)
ctx = conn.new_context(variable='tas', time_frequency='mon')
ctx.hit_count
...
HTTPError: 500 Server Error: 500 for url: https://esgf-node.llnl.gov/esg-search/search?format=application%2Fsolr%2Bjson&limit=0&distrib=false&type=Dataset&variable=tas&time_frequency=mon&facets=%2A
But if I set a value for facets (e.g. ctx = conn.new_context(variable='tas', time_frequency='mon', facets='null')), the search returns successfully.
I think %2A, which appears to be the default value for facets, should be interpreted as a wildcard (*).
Is this expected behavior? Should I just specify some null value for facets (e.g., 0)?
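For what it's worth, the facets=%2A in the failing URL is just the percent-encoded wildcard. This stdlib-only sketch rebuilds the query string from the parameters shown in the URL above, confirming what the client sends by default:

```python
from urllib.parse import urlencode

# Rebuild the failing request URL from the issue; the facets='*'
# default is what encodes to facets=%2A in the query string.
params = {
    'format': 'application/solr+json',
    'limit': 0,
    'distrib': 'false',
    'type': 'Dataset',
    'variable': 'tas',
    'time_frequency': 'mon',
    'facets': '*',
}
url = 'https://esgf-node.llnl.gov/esg-search/search?' + urlencode(params)
print(url)
```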
When I run this script:
import logging

import pyesgf.search


def example():
    logging.basicConfig(format="%(asctime)s [%(process)d] %(levelname)-8s "
                               "%(name)s,%(lineno)s\t%(message)s")
    pyesgf.search.connection.log.setLevel(logging.DEBUG)
    conn = pyesgf.search.SearchConnection(
        url='http://esgf-node.llnl.gov/esg-search')
    ctx = conn.new_context(project='CMIP5')
    ctx.search(ignore_facet_check=True)


if __name__ == '__main__':
    example()
the code crashes with the following output:
DEBUG:pyesgf.search.connection:Query dict is MultiDict([('format', 'application/solr+json'), ('limit', 0), ('distrib', 'true'), ('type', 'Dataset'), ('project', 'CMIP5')])
DEBUG:pyesgf.search.connection:Query request is http://esgf-node.llnl.gov/esg-search/search?format=application%2Fsolr%2Bjson&limit=0&distrib=true&type=Dataset&project=CMIP5
Traceback (most recent call last):
File "/home/bandela/src/esmvalgroup/esmvalcore/try_filesearch.py", line 96, in <module>
example()
File "/home/bandela/src/esmvalgroup/esmvalcore/try_filesearch.py", line 92, in example
ctx.search(ignore_facet_check=True)
File "/home/bandela/conda/envs/esmvaltool/lib/python3.9/site-packages/pyesgf/search/context.py", line 126, in search
sc.__update_counts(ignore_facet_check=ignore_facet_check)
File "/home/bandela/conda/envs/esmvaltool/lib/python3.9/site-packages/pyesgf/search/context.py", line 207, in __update_counts
for facet, counts in (list(response['facet_counts']['facet_fields'].items())):
KeyError: 'facet_counts'
Hi,
I'm trying to use the module but every time I do a distributed search I bump into this error:
pyesgf.search.exceptions.EsgfSearchException: Shard spec esgf-node.jpl.nasa.gov/solr/datasets not recognised
SHARD_REXP = r'(?P<host>.*?):(?P<port>\d*)/solr(?P<suffix>.*)'
changing this to
SHARD_REXP = r'(?P<host>.*?)(?P<port>\d*)/solr(?P<suffix>.*)'
in consts.py fixes it
Thanks,
Paola
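For reference, here is a quick stdlib check of the change Paola describes. The named groups (host, port, suffix) are my reconstruction, since the issue tracker stripped the angle-bracket parts of the original patterns; the structural point is that the pattern no longer requires a ':' before the port, which is why a port-less shard spec now matches:

```python
import re

# Reconstructed patterns; group names are assumptions, the dropped
# mandatory ':' before the port is the actual fix.
OLD_SHARD_REXP = r'(?P<host>.*?):(?P<port>\d*)/solr(?P<suffix>.*)'
NEW_SHARD_REXP = r'(?P<host>.*?)(?P<port>\d*)/solr(?P<suffix>.*)'

shard = 'esgf-node.jpl.nasa.gov/solr/datasets'

print(re.match(OLD_SHARD_REXP, shard))  # no ':' in the spec, so no match
m = re.match(NEW_SHARD_REXP, shard)
print(m.group('host'), m.group('suffix'))
```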
Would it be possible to make a new release? @agstephens or @cehbrecht? Our users are complaining that search is slow ESMValGroup/ESMValCore#1495. I think that now that #75 has been fixed, we could speed up our search by almost a factor of 2.
@agstephens et al, here are two issues in one. The reason I chose to open a single issue instead of two is that these things are related; in actuality it's cause and effect:
The cryptography dependency from the Anaconda main channel is incompatible with noarch/esgf-pyclient=0.3.1 from conda-forge, and I am afraid that exact build gets pulled in when installing esgf-pyclient from conda-forge; the incompatibility throws an OpenSSL-related error from within cryptography:
from myproxy.client import MyProxyClient
results in
Traceback (most recent call last):
File "/home/valeriu/ESMValCore/testimp.py", line 1, in <module>
from myproxy.client import MyProxyClient
File "/home/valeriu/miniconda3/envs/experimental-all-conda/lib/python3.10/site-packages/myproxy/client/__init__.py", line 42, in <module>
from OpenSSL import crypto, SSL
File "/home/valeriu/miniconda3/envs/experimental-all-conda/lib/python3.10/site-packages/OpenSSL/__init__.py", line 8, in <module>
from OpenSSL import crypto, SSL
File "/home/valeriu/miniconda3/envs/experimental-all-conda/lib/python3.10/site-packages/OpenSSL/crypto.py", line 11, in <module>
from OpenSSL._util import (
File "/home/valeriu/miniconda3/envs/experimental-all-conda/lib/python3.10/site-packages/OpenSSL/_util.py", line 5, in <module>
from cryptography.hazmat.bindings.openssl.binding import Binding
File "/home/valeriu/miniconda3/envs/experimental-all-conda/lib/python3.10/site-packages/cryptography/hazmat/bindings/openssl/binding.py", line 14, in <module>
from cryptography.hazmat.bindings._openssl import ffi, lib
ImportError: libssl.so.1.1: cannot open shared object file: No such file or directory
Now, this is not your fault, since myproxyclient is the package that is at fault here, but this is a heads up; maybe you can take it up with the myproxyclient folks.
The import guard in pyesgf/logon.py is related to the main issue but masks it at import time:
try:
    from myproxy.client import MyProxyClient
    import OpenSSL
    _has_myproxy = True
except (ImportError, SyntaxError):
    _has_myproxy = False
Please catch the import exception and print it when _has_myproxy = False, since that way you let the user know what the actual offending import is!
Look into test failures and accept PR if we get it all working.
You can see from the PR, some of the unit tests are failing when running as GitHub Actions (i.e. continuous integration on github). You can click and review the details to see which tests are failing. It would be worth checking out master first and running the tests on that to check whether there is a difference in the two branches - or whether the tests are failing on both.
Hi @philipkershaw, another question for you.
We have a call to the Attribute Service as defined in these tests:
class TestATS(TestCase):

    @pytest.mark.xfail(reason='This test does not work anymore.')
    def test_ceda_ats(self):
        service = AttributeService(CEDA_NODE.ats_url, 'esgf-pyclient')
        fn, ln = 'Ag', 'Stephens'
        resp = service.send_request(OPENID, ['urn:esg:first:name',
                                             'urn:esg:last:name'])
        assert resp.get_subject() == OPENID
        attrs = resp.get_attributes()
        assert attrs['urn:esg:first:name'] == fn
        assert attrs['urn:esg:last:name'] == ln

    @pytest.mark.xfail(reason='This test does not work anymore.')
    def test_multi_attribute(self):
        service = AttributeService(CEDA_NODE.ats_url, 'esgf-pyclient')
        resp = service.send_request(OPENID, ['CMIP5 Research'])
        attrs = resp.get_attributes()
        assert list(sorted(attrs['CMIP5 Research'])) == ['default', 'user']
Both tests fail at present. Do you know if this is because something has changed in the attribute service or whether there is just a configuration problem? These tests live in:
https://github.com/ESGF/esgf-pyclient/blob/master/pyesgf/test/test_ats.py
See: #53
Hi Ag,
first of all, I should say that I'm a data "expert" for a climate science centre in Australia, and I've been using esgf-pyclient by embedding it into a Python interface to our local collection at NCI, so our users can compare what's available locally to what is online in fine detail. Claire Trenham from NCI forwarded me an e-mail conversation from the esgf-devel mailing list regarding the need for a "search and download" tool. Both synda and pyesgf were mentioned; I've never used synda, though I will have a go now. pyesgf was good for me because it gives a lot of detail, which I found necessary for comparing files when version information is missing.
I'm really interested in any progress on this discussion. In fact, when I chose pyesgf I assumed I could use it to download the files too. I was surprised because it takes care of the certificates with the logon function, and you can very easily extract the file download URLs and checksums, but then there's no download option. I actually tried to add one myself, but I didn't have time to do it properly and set it aside. Our Python module was developed to fill a hole in the services we have (or rather, don't have) available.
So if you decide to add this enhancement I'll be happy to be a tester. Though I'll be away for the next two months and back in mid-July.
Regards,
Paola
When using the logon_with_openid method to retrieve a certificate from a myproxy server, one might get an SSL verification error. It would be nice if we could make SSL verification optional in this case.
See:
https://github.com/ESGF/esgf-pyclient/blob/master/pyesgf/logon.py#L196
Could be replaced, for example, with requests:
response = requests.get(openid, verify=False)
xml = etree.parse(BytesIO(response.content))
Can be changed after merge of PR #14.
I'm attempting to use esgf-pyclient to help download some data, but am stuck logging on.
I have an OpenID account with CEDA. Which is https://ceda.ac.uk/openid/Thomas.Crocker
My username to login at CEDA is tcrocker
All my attempts to connect lead to: TimeoutError: [Errno 110] Connection timed out
I have tried:
$ OPENID = 'https://ceda.ac.uk/openid/Thomas.Crocker'
$ lm.logon_with_openid(openid=OPENID, password=None, bootstrap=True)
Enter myproxy username: tcrocker
Enter password for tcrocker:
and
$ proxyhost = 'esgf-index1.ceda.ac.uk'
$ lm.logon(hostname=proxyhost, interactive=True, bootstrap=True)
Enter myproxy username: tcrocker
Enter password for tcrocker:
and the same as above but with proxyhost
set to esgf.ceda.ac.uk
Can anyone advise how to get this to work? I am based at the UK Met Office so I wonder if the problem could be related to our network firewall in some way?
Hi,
I am trying to download cordex data sets. I have created an account on esg-dn1.nsc.liu.se data node.
My openID is:
'https://esg-dn1.nsc.liu.se/esgf-idp/openid/XXXXX'
I use pyclient to download a series of simulations for a location (lat, lon). My search is successful; however, when I try to download, I get an Access Failure message.
Could you please let me know what part of my script is wrong?
I successfully logon using my openid and password.
here is the script:
from pyesgf.search import SearchConnection

conn = SearchConnection('https://esg-dn1.nsc.liu.se/esg-search', distrib=True)
ctx = conn.new_context(
    project='CORDEX',
    variable=['pr'],
    time_frequency='3hr',
    domain='MNA-44',
    data_node='esg-dn1.nsc.liu.se'
)
ctx.hit_count
rslts = ctx.search()

urls = []  # get the urls here.
for r in rslts:
    files = r.file_context().search()
    for file in files:
        if file.opendap_url is not None:
            urls.append(file.opendap_url)

for url in urls:
    path, filename = os.path.split(url)
    print('downloading {}'.format(filename))
    lat_v = 29.639659
    lon_v = 52.569935
    ds = xr.open_dataset(url)
    data = ds['pr']  # <-- gives the Access Failure error
    da = data.sel(rlat=lat_v, rlon=lon_v, method='nearest')
    da.to_netcdf(filename)
    print('saved file {}'.format(filename))
ERROR MESSAGE:
OSError: [Errno -77] NetCDF: Access failure: b'http://esg-dn1.nsc.liu.se/thredds/dodsC/esg_dataroot3/cordexdata/cordex/output/MNA-44/SMHI/CNRM-CERFACS-CNRM-CM5/rcp85/r1i1p1/SMHI-RCA4/v1/3hr/pr/v20180109/pr_MNA-44_CNRM-CERFACS-CNRM-CM5_rcp85_r1i1p1_SMHI-RCA4_v1_3hr_200601010130-200612312230.nc'
Hi,
I'm wondering if there is a supported method for downloading files in batch numbers greater than 1000 using this tool. I'm running into an issue where if the script exceeds 1000, I cannot download the entire set. For example:
Warning! The total number of files was 3222 but this script will only process 1000.
Script created for 1000 file(s)
(The count won't match if you manually edit this file!)
I would like to know if there's a way of either increasing this limit or creating multiple wget scripts that can then be run in succession.
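One workaround is to split the downloads into batches yourself. This is only a sketch: it assumes you already have the list of file download URLs (e.g. from a pyesgf file_context().search()), and the chunk size of 1000 simply mirrors the wget script limit mentioned above; the URL list here is fabricated for illustration:

```python
def chunked(items, size=1000):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Hypothetical stand-in: in practice `urls` would come from an ESGF file search.
urls = ['file{}.nc'.format(i) for i in range(3222)]

batches = list(chunked(urls))
print(len(batches))      # 4 batches for 3222 files
print(len(batches[-1]))  # 222 files in the final batch
```

Each batch could then be fed to a separate wget script or download loop, run in succession.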
@agstephens and anyone else who sees this: can I get some help getting the LogonManager to work again? I'm getting
OpenSSL.SSL.Error: [('SSL routines', 'SSL3_GET_SERVER_CERTIFICATE', 'certificate verify failed')]
Here are the steps to recreate it:
cd esgf-pyclient
virtualenv env
source env/bin/activate
pip install MyProxyClient
python setup.py install
I removed my ~/.esg directory, then went to pcmdi9, logged in, searched, clicked a wget script, and ran it, which generated a new clean ~/.esg directory.
I wrote a simple example test-log-on.py from the example given
import pyesgf.logon
lm = pyesgf.logon.LogonManager()
lm.logoff()
lm.is_logged_on()
lm.logon_with_openid('https://pcmdi9.llnl.gov/esgf-idp/openid/mattben', 'PassWord')
lm.is_logged_on()
this is the output
(env)harris112@harris112ml1:[esgf-pyclient]:[master]:[15979]> python test-log-on.py
Traceback (most recent call last):
File "test-log-on.py", line 8, in <module>
lm.logon_with_openid('https://pcmdi9.llnl.gov/esgf-idp/openid/mattben', 'PassWord')
File "/Users/harris112/projects/ESGF/esgf-pyclient/pyesgf/logon.py", line 140, in logon_with_openid
interactive=interactive)
File "/Users/harris112/projects/ESGF/esgf-pyclient/pyesgf/logon.py", line 176, in logon
bootstrap=bootstrap, updateTrustRoots=update_trustroots)
File "/Users/harris112/projects/ESGF/esgf-pyclient/env/lib/python2.7/site-packages/myproxy/client.py", line 1412, in logon
**getTrustRootsKw)
File "/Users/harris112/projects/ESGF/esgf-pyclient/env/lib/python2.7/site-packages/myproxy/client.py", line 1564, in getTrustRoots
conn.write('0')
File "/Users/harris112/projects/ESGF/esgf-pyclient/env/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1271, in send
self._raise_ssl_error(self._ssl, result)
File "/Users/harris112/projects/ESGF/esgf-pyclient/env/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1187, in _raise_ssl_error
_raise_current_error()
File "/Users/harris112/projects/ESGF/esgf-pyclient/env/lib/python2.7/site-packages/OpenSSL/_util.py", line 48, in exception_from_error_queue
raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'SSL3_GET_SERVER_CERTIFICATE', 'certificate verify failed')]
Am I missing something? Any help would be appreciated.
Sorry for the spamming: @LucaCinquini @ncaripsl @prashanth Dwarakanath @philipkershaw @sashakames
I am running a download script using OpenID for CORDEX data. It is a script that I have been using successfully on different computers (Windows, macOS and Linux). I am trying to use it on other computers with first_time=True:
lm = LogonManager()
lm.logon_with_openid(openid=openid, password=password, bootstrap=first_time)
and the error:
File "/home/lloarca/climate_change/cordex/download.py", line 42, in search
lm.logon_with_openid(openid=openid, password=password, bootstrap=first_time)
File "/opt/anaconda3/lib/python3.7/site-packages/pyesgf/logon.py", line 149, in logon_with_openid
interactive=interactive)
File "/opt/anaconda3/lib/python3.7/site-packages/pyesgf/logon.py", line 185, in logon
updateTrustRoots=update_trustroots)
File "/opt/anaconda3/lib/python3.7/site-packages/myproxy/client.py", line 1448, in logon
**getTrustRootsKw)
File "/opt/anaconda3/lib/python3.7/site-packages/myproxy/client.py", line 1605, in getTrustRoots
conn.write(self.__class__.GLOBUS_INIT_MSG)
File "/opt/anaconda3/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1757, in send
self._raise_ssl_error(self._ssl, result)
File "/opt/anaconda3/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1671, in _raise_ssl_error
_raise_current_error()
File "/opt/anaconda3/lib/python3.7/site-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'ssl3_get_record', 'wrong version number')]
I have openssl version 1.0.1k on one of them and 1.1.0i on another one. On both I get the same error. Any thoughts?
I'm using esgf-pyclient to access data from ESGF. When a user who is not registered for CORDEX access tries to access CORDEX data, cookies are stored in the .dods_cookies file. On the next request, .dods_cookies then blocks data access even for registered users. I would like to know more about the .dods_cookies file; please advise.
Thank you.
I get an SSL error when I try to log in with OpenID. It seems that there is a mismatch between the OpenSSL versions in my environment and on the LLNL node. Could you tell me which version of pyOpenSSL ESGF pyclient is built against?
Different search URLs provide very different results; is there some way to search all data available on ESGF?
For example:
>>> pyesgf.search.SearchConnection(url='https://esgf-data.dkrz.de/esg-search', distrib=True).new_context().facet_counts['project']
{'wind': 1, 'uerra': 2, 'tracmip': 6767, 'reklies-index': 28792, 'obs4MIPs': 2, 'monthlyfc': 2710, 'input4mips': 5832, 'hiresireland': 66, 'TEST': 4, 'TAMIP': 192, 'PMIP3': 16, 'MiKlip': 5568, 'MPI-GE': 55111, 'LUCID': 112, 'CORDEX-Reklies': 7017, 'CORDEX-ESD': 1370, 'CORDEX': 67908, 'CMIP6': 874263, 'CMIP5': 53725}
>>> pyesgf.search.SearchConnection(url='http://esgf-index1.ceda.ac.uk/esg-search', distrib=True).new_context().facet_counts['project']
{'specs': 427949, 'obs4MIPs': 27, 'eucleia': 1921, 'clipc': 104, 'TAMIP': 640, 'PMIP3': 10, 'GeoMIP': 233, 'CORDEX': 5880, 'CMIP5': 48143}
>>> pyesgf.search.SearchConnection(url='http://esgf-node.llnl.gov/esg-search', distrib=True).new_context().facet_counts['project']
{'wind': 1, 'uerra': 2, 'tracmip': 6767, 'specs': 446693, 'reklies-index': 28792, 'psipps': 1, 'primavera': 6400, 'obs4MIPs': 218, 'ncpp2013': 17, 'monthlyfc': 2710, 'input4mips': 11492, 'input4MIPs': 201, 'hiresireland': 66, 'eucleia': 1921, 'e3sm-supplement': 53, 'e3sm': 813, 'cmip3': 71, 'clipc': 114, 'cc4e': 497, 'c3se': 184, 'c3s-cmip5-adjust': 188, 'ana4MIPs': 7, 'TEST': 7, 'TAMIP': 1536, 'PMIP3': 361, 'NEXGDDP': 3, 'NEX': 10, 'NARR_Hydrology': 85, 'MiKlip': 5568, 'MPI-GE': 55111, 'LUCID': 318, 'ISIMIP3b': 550, 'ISIMIP3a': 111, 'ISIMIP2b': 95963, 'ISIMIP2a': 13803, 'ISIMIP2 Phase a': 288, 'ISI-MIP Fast Track': 856, 'GeoMIP': 757, 'EUCLIPSE': 41, 'CREATE-IP': 110, 'CORDEX-Reklies': 7017, 'CORDEX-ESD': 1370, 'CORDEX-Adjust': 1221, 'CORDEX': 183980, 'CMIP6': 11174039, 'CMIP5': 206811, 'CMIP3': 29331, 'CDAT-sample': 1, 'BioClim': 2, 'ACME': 23}
The Search API documentation is not displaying on readthedocs: https://esgf-pyclient.readthedocs.io/en/latest/api.html#search-api.
It looks like this happens because the build fails to import the package's dependencies: when I create a conda environment from docs/environment.yml and run make html, the result is:
sphinx-build -b html -d build/doctrees source build/html
Running Sphinx v3.2.1
making output directory... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 13 source files that are out of date
updating environment: [new config] 13 added, 0 changed, 0 removed
reading sources... [100%] quickstart
WARNING: autodoc: failed to import module 'search' from module 'pyesgf'; the following exception was raised:
No module named 'requests_cache'
WARNING: autodoc: failed to import module 'search.connection' from module 'pyesgf'; the following exception was raised:
No module named 'requests_cache'
WARNING: autodoc: failed to import module 'search.context' from module 'pyesgf'; the following exception was raised:
No module named 'requests_cache'
WARNING: autodoc: failed to import module 'search.results' from module 'pyesgf'; the following exception was raised:
No module named 'requests_cache'
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] quickstart
generating indices... genindex py-modindex done
copying notebooks ... [100%] notebooks/examples/search.ipynb
highlighting module code... [100%] pyesgf.logon
writing additional pages... search done
copying static files... ... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 4 warnings.
The HTML pages are in build/html.
Build finished. The HTML pages are in build/html
Please correct the conda install part in the docs to use the conda-forge channel:
https://github.com/ESGF/esgf-pyclient/blob/master/docs/index.rst
Should be:
$ conda install -c conda-forge esgf-pyclient
Hello,
According to the documentation, we always need to supply facets to search_context(). If not, we get the following warning:
Warning - defaulting to search with facets=*
This behavior is kept for backward-compatibility, but ESGF indexes might not
successfully perform a distributed search when this option is used, so some
results may be missing. For full results, it is recommended to pass a list of
facets of interest when instantiating a context object. For example,
ctx = conn.new_context(facets='project,experiment_id')
Only the facets that you specify will be present in the facets_counts dictionary.
This warning is displayed when a distributed search is performed while using the
facets=* default, a maximum of once per context object. To suppress this warning,
set the environment variable ESGF_PYCLIENT_NO_FACETS_STAR_WARNING to any value
or explicitly use conn.new_context(facets='*')
-------------------------------------------------------------------------------
However, the problem is that this warning also appears for aggregation_context(), even though aggregation_context() does not take facets as a parameter. Even if I create a new_context() with facets and then create an aggregation_context() from the ctx.search() result, I still get the facets warning. For example, with this piece of code:
facets = "source_id"
ctx = conn.new_context(
    project='CMIP6',
    experiment_id="historical",
    facets=facets
)
result = ctx.search()[0]
agg_ctx = result.aggregation_context().search()
Is this a problem? Will this lead to an incomplete distributed search or is it something I do not need to worry about?
Thank you in advance!
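As a side note, the warning text quoted above mentions an escape hatch. A sketch of suppressing the warning by setting the environment variable before searching; the variable name comes straight from the warning message:

```python
import os

# Setting this variable (to any value) suppresses the facets=* warning,
# per the warning text quoted in the issue above.
os.environ['ESGF_PYCLIENT_NO_FACETS_STAR_WARNING'] = '1'
```

This silences the message but does not change the underlying distributed-search behaviour, so the original question about completeness still stands.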
Mark at the MO reported this error when running this code:
from pyesgf.search import SearchConnection
conn = SearchConnection('http://esgf-index1.ceda.ac.uk/esg-search', distrib=True)
ctx = conn.new_context(project='CMIP5', query='humidity')
ctx.hit_count
Error:
/usr/local/lib/python3.7/dist-packages/pyesgf/search/connection.py in open(self)
     96 def open(self):
     97     if (isinstance(self._passed_session, requests.Session)
                or isinstance(
---> 98             self._passed_session, requests_cache.core.CachedSession)):
     99         self.session = self._passed_session
    100     else:
AttributeError: module 'requests_cache' has no attribute 'core'
A quick search suggests that the API inside the package has changed.
An older version worked fine:
requests_cache-0.4.1
The new version that failed was:
requests_cache-0.6.4
Needs further investigation.
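A version-tolerant lookup along these lines could confirm the diagnosis. This is only a sketch, not the fix that went into pyesgf; the assumption (consistent with the working/failing versions above) is that newer requests-cache removed the requests_cache.core module while keeping CachedSession at the top level:

```python
try:
    import requests_cache
    # Newer requests-cache exposes CachedSession at the top level;
    # fall back to the legacy requests_cache.core location if needed.
    CachedSession = getattr(requests_cache, 'CachedSession', None)
    if CachedSession is None:
        CachedSession = requests_cache.core.CachedSession
except ImportError:
    CachedSession = None  # requests-cache not installed at all
```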
Some code parts are never used or mentioned in the docs:
pyesgf/node.py
pyesgf/manifest.py
pyesgf/security/ats.py
Should we remove them?
I don't know exactly what is wrong, but I am trying to use one of the examples and it does not work.
Here is the script that produces the result:
#!/usr/bin/env python3
from pyesgf.logon import LogonManager

lm = LogonManager()
lm.logoff()
lm.is_logged_on()

password = ''
openId = 'https://esgf-data.dkrz.de/esgf-idp/openid/**'
lm.logon_with_openid(openId, password, bootstrap=True)
lm.is_logged_on()

import xarray as xr

url = 'http://esgf2.dkrz.de/thredds/fileServer/lta_dataroot/cmip5/output1/MIROC/MIROC5/rcp45/mon/aerosol/aero/r1i1p1/v20120514/wetss/wetss_aero_MIROC5_rcp45_r1i1p1_200601-210012.nc'
ds = xr.open_dataset(url, chunks={'time': 120})
print(ds)
@philipkershaw: I've been testing the latest pyclient with Python 3 and I'm getting the following error from the test_logon.py tests:
    c = MyProxyClient(hostname=hostname, caCertDir=self.esgf_certs_dir)
    creds = c.logon(username, password,
                    bootstrap=bootstrap,
                    updateTrustRoots=update_trustroots)
    with open(self.esgf_credentials, 'w') as fh:
        for cred in creds:
>           fh.write(cred)
E           TypeError: write() argument must be str, not bytes
It looks like there has been a change in MyProxyClient: it now returns bytes rather than a string. I tried a simple fh.write(str(cred)) to fix it, but it didn't work. Any idea what might fix this?
NOTE: It seems to work fine with Python 2.7.
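One plausible fix is to decode the bytes explicitly rather than relying on str(), which would write the "b'...'" repr into the file and corrupt it. This is a sketch; the creds list below is fabricated stand-in data, since the real values come from MyProxyClient.logon():

```python
import os
import tempfile

# Fabricated stand-ins for the PEM chunks MyProxyClient.logon() returns as bytes:
creds = [b'-----BEGIN CERTIFICATE-----\n', b'-----BEGIN RSA PRIVATE KEY-----\n']

path = os.path.join(tempfile.gettempdir(), 'credentials.pem')
with open(path, 'w') as fh:
    for cred in creds:
        # str(b'x') gives "b'x'", which is why the str() attempt failed;
        # decode the bytes (or open the file in 'wb' mode) instead.
        if isinstance(cred, bytes):
            cred = cred.decode('ascii')
        fh.write(cred)
```

Opening the credentials file in binary mode ('wb') and writing the bytes unchanged would work equally well on both Python 2.7 and 3.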
Hi!
I pip installed pyesgf and also requests, but apparently the versions don't work well together: pyesgf (0.3.0) needs a different version of requests than the one currently available on pip.
I tried this:
myinstance = 'CMIP6.AerChemMIP.BCC.BCC-ESM1.ssp370SST-lowNTCF.r1i1p1f1.Lmon.tsl.gn.v20190612'
conn = SearchConnection(index_search_url, distrib=False)
ctx = conn.new_context(project="CMIP6", instance_id=myinstance)
dset = ctx.search()
files = dset.file_context().search()
i = 0
for file in files:
    i += 1
    print('%s : %s' % (i, file.json["instance_id"]))
And run into this error:
Traceback (most recent call last):
File "corr.py", line 21, in <module>
dset=ctx.search()
File "/home/.../venv3/lib/python3.6/site-packages/pyesgf/search/context.py", line 126, in search
sc.__update_counts(ignore_facet_check=ignore_facet_check)
File "/home/.../venv3/lib/python3.6/site-packages/pyesgf/search/context.py", line 206, in __update_counts
response = self.connection.send_search(query_dict, limit=0)
File "/home/.../venv3/lib/python3.6/site-packages/pyesgf/search/connection.py", line 156, in send_search
self.open()
File "/home/.../venv3/lib/python3.6/site-packages/pyesgf/search/connection.py", line 98, in open
self._passed_session, requests_cache.core.CachedSession)):
AttributeError: module 'requests_cache' has no attribute 'core'
Versions/environment:
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyesgf
>>> import requests
>>> requests.__version__
'2.26.0'
>>> pyesgf.__version__
'0.3.0'
>>>
If you need any more details, don't hesitate to contact me (or also @cehbrecht ). Thanks!
Hi there,
I'm currently trying to download subsets of some CMIP6 model data using esgf-pyclient, following the examples at https://esgf-pyclient.readthedocs.io/en/latest/notebooks/demo/subset-cmip6.html. It works mostly great; two minor adjustments I had to make were setting 'decode_cf' to 'False' while opening with xarray, and the spatial subsetting (da.sel() doesn't work, probably because my data has multidimensional coordinates, but I managed to find a workaround).
However, once I get to the point of extracting it to .nc, a simple da.to_netcdf('test.nc') returns an "AttributeError: NetCDF: String match to name in use" error. I then tried setting it to netcdf3_classic (as a test):
da.to_netcdf('teste.nc', 'w', 'NETCDF3_CLASSIC')
and it does initially run and creates a file, but it breaks down at some point, which doesn't surprise me much, as netcdf3_classic is not well prepared to handle files over 2GB. Then I get the errors:
...
RuntimeError: NetCDF: Operation not allowed in define mode... During handling of the above exception, another exception occurred: ..... RuntimeError: NetCDF: One or more variable sizes violate format constraints.
Opening the created file makes no sense, as the variable of interest comes up full of '--' (nothing but what look like '-' strings, yet it takes up over 2GB of no data). I've also tried with files I know are under 2GB, just as a test, and the error I get is "RuntimeError: NetCDF: Access failure" (with to_netcdf they are also created but make no sense).
I've looked through the data before setting up the extraction to nc. I'm still learning programming, Python and handling netCDF files, but I manage to understand a bit. Using dask and data access protocols such as OPeNDAP (pydap/netcdf4), however, is still a bit cloudy for me. I was able to access the reference variables' values (time, lat, lon, levels), but once I get to my variable of interest's values, it just breaks down. Examples below:
vo = da.variables["vo"][:,:,:,:].values # RuntimeError: NetCDF: Access failure
vo = subset.variables['vo'][1,1,:,:].values # this does work, but I'm then not able to access all the values of my variable to construct my whole file
I should note that if I don't set decode_cf to False, the error while trying to create the nc is "AttributeError: 'numpy.float64' object has no attribute 'year'" (just in case someone runs into it too).
Any thoughts?
It looks like the latest release (v0.2.2) is not available on PyPI and conda forge. Is there a reason for this or did you forget to upload it?
esgf-pyclient is used in ESMValTool. We normally install packages via Conda (see the environment.yml file here: https://github.com/ESMValGroup/ESMValTool/blob/REFACTORING_backend/environment.yml).
Would it be possible to create a Conda package for esgf-pyclient?
The default batch size of 50, set in search/consts.py (DEFAULT_BATCH_SIZE = 50), makes the response slow. Adjusting it to 5000 gives a much faster response.
Are there any plans to port this code to Python 3? If you want, I could do it, but it appears that the authentication might not be easily ported, since it relies on MyProxyClient, which is also fairly out of date.
I've found an example of a search using pyesgf where changing the batch size changes the number of results although the documentation says: "The batch_size argument does not affect the final result but may affect the speed of the response."
Here's a test that demonstrates the problem:
import unittest

from pyesgf.search import SearchConnection


class TestBatchSize(unittest.TestCase):

    def test_batch_size_has_no_impact_on_results(self):
        conn = SearchConnection(
            'https://esgf-index1.ceda.ac.uk/esg-search', distrib=True)

        ctx = conn.new_context(
            mip_era='CMIP6', institution_id='CCCma',
            experiment_id='pdSST-pdSIC', table_id='Amon', variable_id='ua')
        results = ctx.search(batch_size=50)
        ids_batch_size_50 = sorted(results, key=lambda x: x.dataset_id)

        ctx = conn.new_context(
            mip_era='CMIP6', institution_id='CCCma',
            experiment_id='pdSST-pdSIC', table_id='Amon', variable_id='ua')
        results = ctx.search(batch_size=100)
        ids_batch_size_100 = sorted(results, key=lambda x: x.dataset_id)

        self.assertEqual(len(ids_batch_size_50), len(ids_batch_size_100))


if __name__ == '__main__':
    unittest.main()
The following python code reports finding 38 files, but a query using the web interface finds a dataset with 86 files. Why isn't the python version finding all the files?
from pyesgf.search import SearchConnection
conn = SearchConnection(
"https://esgf.ceda.ac.uk/esg-search", distrib=True)
ctx = conn.new_context(
mip_era="CMIP6", source_id="EC-Earth3", experiment_id="ssp370",
member_id="r1i1p1f1", table_id="Amon", variable_id="pr",
latest=True)
results = ctx.search(batch_size=1000)
files = results[0].file_context().search()
print(len(files))
The DatasetResult.file_context function (see results.py) doesn't allow a facets keyword argument, but we might want to set the facets property of the FileSearchContext object that is returned (especially in order to avoid the default facets='*').
Currently we have to monkey-patch it:
fc = result.file_context()
fc.facets = 'project'
but it would be nice to be able to do:
fc = result.file_context(facets='project')
Should be a simple fix to just add the argument and pass it through.
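The suggested pass-through could look roughly like the following sketch; both classes here are stand-ins for the real pyesgf ones, just to illustrate the keyword forwarding:

```python
class FileSearchContext:
    """Stand-in for pyesgf's FileSearchContext, holding only the facets field."""
    def __init__(self, facets=None):
        self.facets = facets

class DatasetResult:
    """Stand-in showing the proposed facets keyword on file_context()."""
    def file_context(self, facets=None):
        # Forward the caller's facets instead of always using the default.
        return FileSearchContext(facets=facets)

fc = DatasetResult().file_context(facets='project')
print(fc.facets)  # project
```

The real change would forward the argument to the FileSearchContext constructor (or set the property just before returning), which is exactly the monkey-patch above, moved inside the library.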
Dear team,
I'm using the ESGF API to list files. I gave the parameters below as input:
Project: Cordex
Institute: MPI-CSI
Time Freq: day
Ensemble: r1i1p1
Domain: WAS-44i
Driving Model: MPI-M-MPI-ESM-LR
OpenID:
Password:
esgf-node: https://esgf-node.ipsl.upmc.fr/esg-search
I'm getting an error like this:
The error shows the ESGF node "http://esg-cccr.tropmet.res.in" and driving model "CCCma-CanESM2" instead of the data node and driving model I specified as input. A few days ago, it was working fine.
I would appreciate your help in solving this issue.
Thank you.
Can be done with pytest:
https://pypi.org/project/nbval/
See example in birdy.
The internal call in "context.py", "__update_counts()", will always add {"facets": "*"} to the query and will, behind the scenes, make a call to refresh the hit count and the available facets. This typically takes 2 seconds to complete.
If you are looping through lots of different contexts, this call will make things slow. Here are two example URLs to demonstrate...
With facet counts (slow):
Without facet counts (quick):
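The extra work is visible in the query string itself: the slow request carries facets=%2A (the URL-encoded wildcard), while the quick one omits the facets parameter entirely. A minimal sketch in plain Python, with placeholder constraint values:

```python
from urllib.parse import urlencode

# Placeholder query parameters, in the style of the pyesgf debug output.
base = {
    "format": "application/solr+json",
    "limit": 0,
    "type": "Dataset",
    "project": "CMIP5",  # example constraint
}

# Slow: asks the index to compute counts for every facet.
slow_query = urlencode({**base, "facets": "*"})
# Quick: no facet counting requested at all.
quick_query = urlencode(base)

print("facets=%2A" in slow_query)   # True
print("facets" in quick_query)      # False
```

The wildcard forces the index node to aggregate counts over every facet, which is where the extra seconds go.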
Currently, when I attempt to generate wget scripts using SearchContext.get_download_script(), setting limit has no effect. There was a comment in the module that said this was a planned feature.
Any reason why this hasn't been implemented yet? It would be useful since the default limit is too small for certain use cases. It seems like something that should be trivial at this point, so correct me if I am wrong.
... these notebooks can also be rendered in the Sphinx doc:
https://nbsphinx.readthedocs.io/en/0.4.2/
- esgf-pyclient into a sandbox environment
- requests-cache library interface as specified here: #71 (comment)
- requests-cache library to ensure they work with the old and new interface
- master
I am exploring using esgf-pyclient to get a list of all retracted CMIP6 datasets (for our automated maintenance of Pangeo CMIP6 cloud data).
I am trying the following:
from pyesgf.search import SearchConnection
conn = SearchConnection(
'https://esgf-node.llnl.gov/esg-search',
distrib=True,
)
ctx = conn.new_context(mip_era='CMIP6', retracted=True, replica=False, fields='id', facets=['doi'])
ctx.hit_count
And I get back a hit count of 691984.
But when I try to extract a list of instance_ids:
results = ctx.search(batch_size=10000)
retracted = [ds.dataset_id for ds in results]
len(retracted)
The list only has 240000 elements. That very round number makes me think that there is some internal limit I am hitting here?
Or did I miss something in the above code?
Any help on this would be greatly appreciated.
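For what it's worth, one workaround (my assumption, not a documented pyesgf feature) is to split a large query into several smaller ones, e.g. one per value of some facet, so that each stays below whatever server-side cap applies, then merge the pieces. Sketched here with a stand-in query function:

```python
def search_in_chunks(run_query, facet_values):
    """Run one query per facet value and merge results, dropping duplicates.

    run_query: callable taking one facet value and returning a list of ids
               (stands in for a ctx.search() restricted to that value)
    facet_values: e.g. the keys of ctx.facet_counts['activity_id']
    """
    seen = set()
    merged = []
    for value in facet_values:
        for dataset_id in run_query(value):
            if dataset_id not in seen:
                seen.add(dataset_id)
                merged.append(dataset_id)
    return merged

# Stand-in index so the sketch runs without network access:
fake_index = {"CMIP": ["a", "b"], "ScenarioMIP": ["b", "c"]}
result = search_in_chunks(lambda v: fake_index[v], ["CMIP", "ScenarioMIP"])
print(result)  # ['a', 'b', 'c']
```

If each per-facet query stays under the cap, the merged list should cover the full hit count.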
Hi everyone,
I have one more thing where I am a little lost. I can access ESGF URLs via pyesgf,
and it works fine for me with OPeNDAP. However, I regularly access CORDEX datasets, which require an ESGF logon for data access. It works fine with OPeNDAP and xarray if I log on and search like, e.g.,
import xarray as xr
import pyesgf
from pyesgf.logon import LogonManager
from pyesgf.search import SearchConnection
lm = LogonManager()
# logon
myproxy_host = 'esgf-data.dkrz.de'
lm.logon(hostname=myproxy_host, interactive=True, bootstrap=True)
print(lm.is_logged_on())
# search
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search', distrib=False)
ctx = conn.new_context(project='CORDEX', experiment='evaluation', time_frequency='mon',
variable='tas', driving_model="ECMWF-ERAINT", domain="EUR-11")
result = ctx.search()
print(f"length: {len(result)}")
res = result[0]
ctx = res.file_context()
#ctx.facet_counts
dataset = ctx.search()
download_url = dataset[0].download_url
opendap_url = dataset[0].opendap_url
ds = xr.open_dataset(opendap_url)
ds
I can't access the data via the download_url, e.g.,
import fsspec
with fsspec.open(download_url, ssl=True) as f:
ds = xr.open_dataset(f)
which gives a 401 Unauthorized error...
---------------------------------------------------------------------------
ClientResponseError Traceback (most recent call last)
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/fsspec/implementations/http.py:391, in HTTPFileSystem._info(self, url, **kwargs)
389 try:
390 info.update(
--> 391 await _file_info(
392 url,
393 size_policy=policy,
394 session=session,
395 **self.kwargs,
396 **kwargs,
397 )
398 )
399 if info.get("size") is not None:
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/fsspec/implementations/http.py:772, in _file_info(url, session, size_policy, **kwargs)
771 async with r:
--> 772 r.raise_for_status()
774 # TODO:
775 # recognise lack of 'Accept-Ranges',
776 # or 'Accept-Ranges': 'none' (not 'bytes')
777 # to mean streaming only, no random access => return None
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/aiohttp/client_reqrep.py:1004, in ClientResponse.raise_for_status(self)
1003 self.release()
-> 1004 raise ClientResponseError(
1005 self.request_info,
1006 self.history,
1007 status=self.status,
1008 message=self.reason,
1009 headers=self.headers,
1010 )
ClientResponseError: 401, message='401', url=URL('https://cordexesg.dmi.dk/esg-orp/home.htm?redirect=http://cordexesg.dmi.dk/thredds/fileServer/cordex_general/cordex/output/EUR-11/DMI/ECMWF-ERAINT/evaluation/r1i1p1/DMI-HIRHAM5/v1/mon/tas/v20140620/tas_EUR-11_ECMWF-ERAINT_evaluation_r1i1p1_DMI-HIRHAM5_v1_mon_198901-199012.nc')
The above exception was the direct cause of the following exception:
FileNotFoundError Traceback (most recent call last)
Input In [5], in <cell line: 2>()
1 import fsspec
----> 2 with fsspec.open(download_url, ssl=True) as f:
3 ds = xr.open_dataset(f)
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/fsspec/core.py:104, in OpenFile.__enter__(self)
101 def __enter__(self):
102 mode = self.mode.replace("t", "").replace("b", "") + "b"
--> 104 f = self.fs.open(self.path, mode=mode)
106 self.fobjects = [f]
108 if self.compression is not None:
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/fsspec/spec.py:1037, in AbstractFileSystem.open(self, path, mode, block_size, cache_options, compression, **kwargs)
1035 else:
1036 ac = kwargs.pop("autocommit", not self._intrans)
-> 1037 f = self._open(
1038 path,
1039 mode=mode,
1040 block_size=block_size,
1041 autocommit=ac,
1042 cache_options=cache_options,
1043 **kwargs,
1044 )
1045 if compression is not None:
1046 from fsspec.compression import compr
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/fsspec/implementations/http.py:340, in HTTPFileSystem._open(self, path, mode, block_size, autocommit, cache_type, cache_options, size, **kwargs)
338 kw["asynchronous"] = self.asynchronous
339 kw.update(kwargs)
--> 340 size = size or self.info(path, **kwargs)["size"]
341 session = sync(self.loop, self.set_session)
342 if block_size and size:
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/fsspec/asyn.py:86, in sync_wrapper.<locals>.wrapper(*args, **kwargs)
83 @functools.wraps(func)
84 def wrapper(*args, **kwargs):
85 self = obj or args[0]
---> 86 return sync(self.loop, func, *args, **kwargs)
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/fsspec/asyn.py:66, in sync(loop, func, timeout, *args, **kwargs)
64 raise FSTimeoutError from return_result
65 elif isinstance(return_result, BaseException):
---> 66 raise return_result
67 else:
68 return return_result
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/fsspec/asyn.py:26, in _runner(event, coro, result, timeout)
24 coro = asyncio.wait_for(coro, timeout=timeout)
25 try:
---> 26 result[0] = await coro
27 except Exception as ex:
28 result[0] = ex
File /opt/anaconda3/envs/pyesgf/lib/python3.10/site-packages/fsspec/implementations/http.py:404, in HTTPFileSystem._info(self, url, **kwargs)
401 except Exception as exc:
402 if policy == "get":
403 # If get failed, then raise a FileNotFoundError
--> 404 raise FileNotFoundError(url) from exc
405 logger.debug(str(exc))
407 return {"name": url, "size": None, **info, "type": "file"}
FileNotFoundError: http://cordexesg.dmi.dk/thredds/fileServer/cordex_general/cordex/output/EUR-11/DMI/ECMWF-ERAINT/evaluation/r1i1p1/DMI-HIRHAM5/v1/mon/tas/v20140620/tas_EUR-11_ECMWF-ERAINT_evaluation_r1i1p1_DMI-HIRHAM5_v1_mon_198901-199012.nc
I would be grateful for any idea of how I can access CORDEX HTTP URLs. If I simply click one of those HTTP URLs and log in (in the web portal), I can download the file from the browser. However, I have no experience with OpenID login in Python for HTTP access...
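One thing that may be worth trying: after lm.logon(), pyesgf stores an X.509 proxy credential (by default under ~/.esg/credentials.pem, though the path can differ per setup), and some ESGF data nodes accept that certificate for authenticated HTTPS downloads. A hedged sketch using only the standard library; whether a given node honours the certificate is not guaranteed:

```python
import os
import ssl
import urllib.request

def make_cert_opener(cert_path):
    """Build a urllib opener that presents an ESGF proxy certificate.

    cert_path is assumed to be the PEM file written by LogonManager
    (commonly ~/.esg/credentials.pem, but check your installation).
    """
    ctx = ssl.create_default_context()
    if os.path.exists(cert_path):
        # The proxy file holds both the certificate and the private key.
        ctx.load_cert_chain(certfile=cert_path)
    handler = urllib.request.HTTPSHandler(context=ctx)
    return urllib.request.build_opener(handler)

opener = make_cert_opener(os.path.expanduser("~/.esg/credentials.pem"))
# opener.open(download_url) would then attempt the authenticated download.
```

If the node only supports the browser-redirect (esg-orp) login flow rather than certificate authentication, this approach won't help and the OPeNDAP URL remains the more reliable route.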
Hi,
the esgf-pyclient really works great for me on CMIP5 and CMIP6 data. However, I have some problems accessing CORDEX data. I have CORDEX_Research data access rights and can successfully log in using the pyclient:
import netCDF4 as nc4
from pyesgf.logon import LogonManager
from pyesgf.search import SearchConnection
import pyesgf
print(nc4.__version__)
print(pyesgf.__version__)
lm = LogonManager()
myproxy_host = 'esgf-data.dkrz.de'
lm.logon(hostname=myproxy_host, interactive=True, bootstrap=True)
lm.is_logged_on()
1.5.3
0.3.0
Enter myproxy username:
g300046
Enter password for g300046: ········
True
# search CORDEX project for REMO2015 fx orog variables
conn = SearchConnection('http://esgf-data.dkrz.de/esg-search', distrib=False)
ctx = conn.new_context(project='CORDEX', experiment='evaluation', time_frequency='fx', rcm_name='REMO2015', variable='orog')
result = ctx.search()
orog_url = {}
# loop through search results of datasets
for res in result:
ctx = res.file_context()
domain = list(ctx.facet_counts['domain'].keys())[0]
print('domain: {}'.format(domain))
    # the dataset should contain only one file for fx variables
dataset = ctx.search()
filename = dataset[0].opendap_url
print('filename: {}'.format(filename))
orog_url[domain] = filename
domain: EUR-11
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EUR-11/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20180813/orog_EUR-11_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx.nc
domain: SAM-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/SAM-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_SAM-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: AFR-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/AFR-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_AFR-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: CAM-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/CAM-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_CAM-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: EAS-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EAS-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_EAS-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: EUR-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EUR-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_EUR-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: SEA-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/SEA-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_SEA-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: WAS-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/WAS-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_WAS-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: AUS-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/AUS-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_AUS-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
domain: CAS-22
filename: http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/CAS-22/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20191030/orog_CAS-22_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx_r0i0p0.nc
orog_url.keys()
dict_keys(['EUR-11', 'SAM-22', 'AFR-22', 'CAM-22', 'EAS-22', 'EUR-22', 'SEA-22', 'WAS-22', 'AUS-22', 'CAS-22'])
url = orog_url['EUR-11']
url
'http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EUR-11/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20180813/orog_EUR-11_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx.nc'
This works all fine until I actually want to access the data:
# netcdf4 engine
ds = nc4.Dataset(url)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-8-fbb4748a9677> in <module>()
1 # netcdf4 engine
----> 2 ds = nc4.Dataset(url)
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
OSError: [Errno -68] NetCDF: I/O failure: b'http://esgf1.dkrz.de/thredds/dodsC/cordex/cordex/output/EUR-11/GERICS/ECMWF-ERAINT/evaluation/r0i0p0/GERICS-REMO2015/v1/fx/orog/v20180813/orog_EUR-11_ECMWF-ERAINT_evaluation_r0i0p0_GERICS-REMO2015_v1_fx.nc'
With CMIP5 data everything works fine, e.g.:
# check with CMIP5 data, this works fine.
url = "http://esgf1.dkrz.de/thredds/dodsC/cmip5/cmip5/output1/MPI-M/MPI-ESM-LR/historical/fx/atmos/fx/r0i0p0/v20120315/orog/orog_fx_MPI-ESM-LR_historical_r0i0p0.nc"
ds = nc4.Dataset(url)
ds
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_CLASSIC data model, file format DAP2):
institution: Max Planck Institute for Meteorology
institute_id: MPI-M
experiment_id: historical
source: MPI-ESM-LR 2011; URL: http://svn.zmaw.de/svn/cosmos/branches/releases/mpi-esm-cmip5/src/mod; atmosphere: ECHAM6 (REV: 4603), T63L47; land: JSBACH (REV: 4603); ocean: MPIOM (REV: 4603), GR15L40; sea ice: 4603; marine bgc: HAMOCC (REV: 4603);
model_id: MPI-ESM-LR
forcing: GHG,Oz,SD,Sl,Vl,LU
parent_experiment_id: piControl
parent_experiment_rip: r1i1p1
branch_time: 10957.0
contact: [email protected]
history: Model raw output postprocessing with modelling environment (IMDI) at DKRZ: URL: http://svn-mad.zmaw.de/svn/mad/Model/IMDI/trunk, REV: 4201 2012-01-13T07:51:03Z CMOR rewrote data to comply with CF standards and CMIP5 requirements.
references: ECHAM6: n/a; JSBACH: Raddatz et al., 2007. Will the tropical land biosphere dominate the climate-carbon cycle feedback during the twenty first century? Climate Dynamics, 29, 565-574, doi 10.1007/s00382-007-0247-8; MPIOM: Marsland et al., 2003. The Max-Planck-Institute global ocean/sea ice model with orthogonal curvilinear coordinates. Ocean Modelling, 5, 91-127; HAMOCC: Technical Documentation, http://www.mpimet.mpg.de/fileadmin/models/MPIOM/HAMOCC5.1_TECHNICAL_REPORT.pdf;
initialization_method: 0
physics_version: 0
tracking_id: d9bbcbd4-c852-4bd0-a3b4-0fccb598f23c
product: output
experiment: historical
frequency: fx
creation_date: 2012-01-13T07:51:03Z
Conventions: CF-1.4
project_id: CMIP5
table_id: Table fx (26 July 2011) 491518982c8d8b607a58ba740689ea09
title: MPI-ESM-LR model output prepared for CMIP5 historical
parent_experiment: pre-industrial control
modeling_realm: atmos
realization: 0
cmor_version: 2.6.0
dimensions(sizes): bnds(2), lat(96), lon(192)
variables(dimensions): float64 lat(lat), float64 lat_bnds(lat,bnds), float64 lon(lon), float64 lon_bnds(lon,bnds), float32 orog(lat,lon)
groups:
I know that this is not an esgf-pyclient issue, but I wonder how the logon should work. I suspect it's a problem with how I log onto ESGF via Python (I can also log on via the ESGF web interface and download CORDEX data without a problem). It would be really nice to have access to the OPeNDAP URLs via Python, too. Thanks a lot!
The following issue occurs with OpenSSL 1.1.1e but goes away if I downgrade to 1.1.1d. It seems that other users of OpenSSL are reporting similar issues (e.g. openssl/openssl#11381).
As an interim measure, I suggest specifying a dependency on OpenSSL=1.1.1d.
from pyesgf.logon import LogonManager
lm = LogonManager()
# Error trace:
4 openid = "MY_OPENID"
5 password = "MY_PASSWORD"
----> 6 lm.logon_with_openid(openid=openid, password=password, bootstrap=True)
7 lm.is_logged_on()
~/.conda/envs/research/lib/python3.8/site-packages/pyesgf/logon.py in logon_with_openid(self, openid, password, bootstrap, update_trustroots, interactive)
144 """
145 username, myproxy = self._get_logon_details(openid)
--> 146 return self.logon(username, password, myproxy,
147 bootstrap=bootstrap,
148 update_trustroots=update_trustroots,
~/.conda/envs/research/lib/python3.8/site-packages/pyesgf/logon.py in logon(self, username, password, hostname, bootstrap, update_trustroots, interactive)
181 c = MyProxyClient(hostname=hostname, caCertDir=self.esgf_certs_dir)
182
--> 183 creds = c.logon(username, password,
184 bootstrap=bootstrap,
185 updateTrustRoots=update_trustroots)
~/.conda/envs/research/lib/python3.8/site-packages/myproxy/client/__init__.py in logon(self, username, passphrase, credname, lifetime, keyPair, certReq, nBitsForKey, bootstrap, updateTrustRoots, authnGetTrustRootsCall, sslCertFile, sslKeyFile, sslKeyFilePassphrase)
1451 getTrustRootsKw = {}
1452
-> 1453 self.getTrustRoots(writeToCACertDir=True,
1454 bootstrap=bootstrap,
1455 **getTrustRootsKw)
~/.conda/envs/research/lib/python3.8/site-packages/myproxy/client/__init__.py in getTrustRoots(self, username, passphrase, writeToCACertDir, bootstrap)
1622 try:
1623 for tries in range(self.MAX_RECV_TRIES):
-> 1624 dat += conn.recv(self.SERVER_RESP_BLK_SIZE)
1625 except SSL.SysCallError:
1626 # Expect this exception when response content exhausted
~/.conda/envs/research/lib/python3.8/site-packages/OpenSSL/SSL.py in recv(self, bufsiz, flags)
1807 else:
1808 result = _lib.SSL_read(self._ssl, buf, bufsiz)
-> 1809 self._raise_ssl_error(self._ssl, result)
1810 return _ffi.buffer(buf, result)[:]
1811 read = recv
~/.conda/envs/research/lib/python3.8/site-packages/OpenSSL/SSL.py in _raise_ssl_error(self, ssl, result)
1669 pass
1670 else:
-> 1671 _raise_current_error()
1672
1673 def get_context(self):
~/.conda/envs/research/lib/python3.8/site-packages/OpenSSL/_util.py in exception_from_error_queue(exception_type)
52 text(lib.ERR_reason_error_string(error))))
53
---> 54 raise exception_type(errors)
55
56
Error: [('SSL routines', 'ssl3_read_n', 'unexpected eof while reading')]
I am working on what I think is a fairly common workflow:
- LogonManager class
- SearchConnection class
- netcdf4-python or pydap
Here's an example workflow:
In [1]: openid = 'https://esgf-node.llnl.gov/esgf-idp/openid/SECRET'
...: password = 'SECRET'
...:
In [2]: from pyesgf.logon import LogonManager
...: from pyesgf.search import SearchConnection
...: import xarray as xr
...:
In [3]: # intialize the logon manager
...: lm = LogonManager(verify=True)
...: if not lm.is_logged_on():
...: lm.logon_with_openid(openid, password, 'pcmdi9.llnl.gov')
...: lm.is_logged_on()
...:
Out[3]: True
In [4]: def print_context_info(ctx):
...: print('Hits:', ctx.hit_count)
...: print('Realms:', ctx.facet_counts['experiment'])
...: print('Realms:', ctx.facet_counts['realm'])
...: print('Ensembles:', ctx.facet_counts['ensemble'])
...:
In [5]: # search for some data
   ...: conn = SearchConnection('http://pcmdi9.llnl.gov/esg-search', distrib=True)
   ...: ctx = conn.new_context(project='CMIP5', model='CCSM4', experiment='rcp85',
   ...:                        time_frequency='day')
...: ctx = ctx.constrain(realm='atmos', ensemble='r1i1p1')
...:
...: # print a summary of what we found
...: print_context_info(ctx)
...:
Hits: 4
Realms: {'rcp85': 4}
Realms: {'atmos': 4}
Ensembles: {'r1i1p1': 4}
In [6]: # aggregate results
...: result = ctx.search()[0]
...: agg_ctx = result.aggregation_context()
...:
...: # get a list of opendap urls
...: x = list(a.opendap_url for a in agg_ctx.search() if a.opendap_url)
...: x
...:
Out[6]:
['http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmin.20120705.aggregation.1',
'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tasmax.20120705.aggregation.1',
'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.prc.20120705.aggregation.1',
'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.psl.20120705.aggregation.1',
'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.tas.20120705.aggregation.1',
'http://aims3.llnl.gov/thredds/dodsC/cmip5.output1.NCAR.CCSM4.rcp85.day.atmos.day.r1i1p1.pr.20120705.aggregation.1']
In [7]: # try opening one of the opendap datasets
...: xr.open_dataset(x[0], engine='pydap')
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-7-90d39efb83f7> in <module>()
1 # try opening one of the opendap datasets
----> 2 xr.open_dataset(x[0], engine='pydap')
~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables)
302 autoclose=autoclose)
303 elif engine == 'pydap':
--> 304 store = backends.PydapDataStore.open(filename_or_obj)
305 elif engine == 'h5netcdf':
306 store = backends.H5NetCDFStore(filename_or_obj, group=group,
~/anaconda/envs/aist/lib/python3.6/site-packages/xarray/backends/pydap_.py in open(cls, url, session)
75 def open(cls, url, session=None):
76 import pydap.client
---> 77 ds = pydap.client.open_url(url, session=session)
78 return cls(ds)
79
~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/client.py in open_url(url, application, session, output_grid)
62 never retrieve coordinate axes.
63 """
---> 64 dataset = DAPHandler(url, application, session, output_grid).dataset
65
66 # attach server-side functions
~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid)
62
63 # build the dataset from the DDS and add attributes from the DAS
---> 64 self.dataset = build_dataset(dds)
65 add_attributes(self.dataset, parse_das(das))
66
~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in build_dataset(dds)
159 def build_dataset(dds):
160 """Return a dataset object from a DDS representation."""
--> 161 return DDSParser(dds).parse()
162
163
~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in parse(self)
47 dataset = DatasetType('nameless')
48
---> 49 self.consume('dataset')
50 self.consume('{')
51 while not self.peek('}'):
~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/dds.py in consume(self, regexp)
39 def consume(self, regexp):
40 """Consume and return a token."""
---> 41 token = super(DDSParser, self).consume(regexp)
42 self.buffer = self.buffer.lstrip()
43 return token
~/anaconda/envs/aist/lib/python3.6/site-packages/pydap/parsers/__init__.py in consume(self, regexp)
180 self.buffer = self.buffer[len(token):]
181 else:
--> 182 raise Exception("Unable to parse token: %s" % self.buffer[:10])
183 return token
Exception: Unable to parse token:
Questions:
Hi all,
Thanks for maintaining this library. Really useful in my day to day. I wanted to raise an issue since I don't think Python 3 is 100% compatible yet. When generating a wget script, I can only run it from my shell if my native Python is 2.7. This is a bit of a pain when working in conda and needing to create a Python 2.7 environment to run scripts generated using Python 3.
The error raised when running the generated wget scripts with Python 3 is as follows:
File "<stdin>", line 18
print "-s %s -p %s -l %s" % (host, port, username)
^
SyntaxError: invalid syntax
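For reference, the failing line inside the generated script uses Python 2 print-statement syntax; under Python 3 it needs parentheses. The host, port, and username values below are placeholders:

```python
# Placeholder values standing in for what the wget script computes.
host, port, username = "esgf-node.llnl.gov", 7512, "user"

# Python 2 (fails under Python 3 with SyntaxError):
#   print "-s %s -p %s -l %s" % (host, port, username)

# Python 3 equivalent:
print("-s %s -p %s -l %s" % (host, port, username))
```

This form is also valid Python 2, so fixing the template this way would keep the script runnable under both interpreters.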
Is this a known issue? Are there plans to address this issue? Is this user error? Please let me know.
Cheers,
Hi, the documentation indicates that this package should work with CMIP6. However, when I attempt the following:
conn = SearchConnection('https://esgf-node.llnl.gov/esg-search')
ctx = conn.new_context(project='CMIP6', experiment='past1000', variable='tas')
print('Hits: {}, Realms: {}, Ensembles: {}'.format(
ctx.hit_count,
ctx.facet_counts['realm'],
ctx.facet_counts['ensemble']))
print(ctx.get_facet_options())
I get different results than searching through the web GUI at https://esgf-node.llnl.gov/search/cmip6/. The CMIP6 data guide (https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html) directs me back to the RESTful API (https://esgf.github.io/esgf-user-support/user_guide.html#the-esgf-search-restful-api) which is providing incomplete results.
Does anyone know the source of this issue or what alternatives exist for bulk downloads of filtered CMIP6 data? The wget script generator has been inconsistent and I've found myself missing datasets. Downloading search results as JSON is limited to 100 results at a time. Is this potentially due to the THREDDS catalog being down? The CEDA node alternatively notes that CMIP6 data is still in the process of being added to its FTP catalog, so that is out too.
Sorry if I am missing something obvious. If anyone has leads it would be much appreciated!
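As a possible fallback for bulk listing, the Search RESTful API itself supports offset and limit query parameters, so results can be paged through in chunks larger than the web GUI's JSON export. A sketch of building the page URLs (the endpoint and constraint values are just examples):

```python
from urllib.parse import urlencode

def page_urls(base_url, constraints, total, page_size=100):
    """Yield search URLs that page through `total` results in chunks."""
    for offset in range(0, total, page_size):
        params = dict(constraints,
                      format="application/solr+json",
                      limit=page_size, offset=offset)
        yield base_url + "?" + urlencode(params)

urls = list(page_urls(
    "https://esgf-node.llnl.gov/esg-search/search",
    {"project": "CMIP6", "experiment_id": "past1000", "variable_id": "tas"},
    total=250))
print(len(urls))  # 3 pages: offsets 0, 100, 200
```

Each URL can then be fetched and its JSON response parsed; the first response's numFound field would give the real total to page over.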
I have a simple example; I log on to ESGF with my OpenID:
from pyesgf.logon import LogonManager

lm = LogonManager()
OPENID = 'https://esgf-data.dkrz.de/esgf-idp/openid/<user>'
lm.logon_with_openid(openid=OPENID, interactive=True, bootstrap=True)
lm.is_logged_on()
That works fine; however, I still get a 401 if I want to access a dataset on esgf.dwd.de:
import xarray as xr
xr.open_dataset("https://esgf.dwd.de/thredds/dodsC/esgf2_1/cordex/output/EUR-11/CLMcom/MIROC-MIROC5/rcp26/r1i1p1/CLMcom-CCLM4-8-17/v1/mon/tas/v20180707/tas_EUR-11_MIROC-MIROC5_rcp26_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_200601-201012.nc", engine="pydap")
HTTPError: 401 401
Shouldn't my OpenID grant general access to all ESGF servers? I also have problems accessing other servers; e.g., only esgf-data.dkrz.de seems to be stable. Actually, I can go to the web interface, log on, and access that URL, so that works fine. It also fails with the netcdf4 engine...