
Comments (4)

bouweandela commented on July 1, 2024

This appears to be a problem with the ESGF search results: if you specify facets=*, you appear to get search results only from the local node, whereas if you specify the facet you're interested in (facets=project), you get more complete results.

>>> import pyesgf.search
>>> pyesgf.search.SearchConnection(url='https://esgf-data.dkrz.de/esg-search', distrib=True).new_context().facet_counts['project']
{'wind': 1, 'uerra': 2, 'tracmip': 6767, 'reklies-index': 28792, 'obs4MIPs': 2, 'monthlyfc': 2710, 'input4mips': 5832, 'hiresireland': 66, 'TEST': 4, 'TAMIP': 192, 'PMIP3': 16, 'MiKlip': 5568, 'MPI-GE': 55111, 'LUCID': 112, 'CORDEX-Reklies': 7017, 'CORDEX-ESD': 1370, 'CORDEX': 67908, 'CMIP6': 882140, 'CMIP5': 53725}
>>> pyesgf.search.SearchConnection(url='https://esgf-data.dkrz.de/esg-search', distrib=True).new_context(facets=['project']).facet_counts['project']
{'wind': 1, 'uerra': 2, 'tracmip': 6767, 'specs': 446693, 'reklies-index': 28792, 'psipps': 1, 'primavera': 6947, 'obs4MIPs': 210, 'monthlyfc': 2710, 'input4mips': 11492, 'input4MIPs': 201, 'hiresireland': 66, 'eucleia': 1921, 'e3sm-supplement': 53, 'e3sm': 815, 'cmip3': 71, 'clipc': 114, 'cc4e': 497, 'c3se': 184, 'c3s-cmip5-adjust': 188, 'ana4MIPs': 14, 'TEST': 5, 'TAMIP': 1536, 'PMIP3': 361, 'NEXGDDP': 3, 'NEX': 10, 'NARR_Hydrology': 85, 'MiKlip': 5568, 'MPI-GE': 55111, 'LUCID': 318, 'ISIMIP3b': 550, 'ISIMIP3a': 111, 'ISIMIP2b': 95963, 'ISIMIP2a': 13803, 'ISIMIP2 Phase a': 288, 'ISI-MIP Fast Track': 856, 'GeoMIP': 757, 'EUCLIPSE': 41, 'CREATE-IP': 114, 'CORDEX-Reklies': 7017, 'CORDEX-ESD': 1370, 'CORDEX-Adjust': 1221, 'CORDEX': 183980, 'CMIP6': 11249567, 'CMIP5': 201129, 'CMIP3': 29331, 'CDAT-sample': 1, 'BioClim': 2, 'ACME': 23}

It might be nice to add some information about this behavior in the documentation, because this is not very intuitive.
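
To make the gap between the two outputs easier to see, here is a minimal sketch that compares two facet-count dictionaries. The helper name `compare_facet_counts` is hypothetical (it is not part of esgf-pyclient), and the dictionaries hold only a few values copied from the outputs above.

```python
def compare_facet_counts(partial, complete):
    """Map each facet value whose counts differ to (partial_count, complete_count).

    Values absent from `partial` are reported with a count of 0.
    """
    diffs = {}
    for value, full_count in complete.items():
        seen = partial.get(value, 0)
        if seen != full_count:
            diffs[value] = (seen, full_count)
    return diffs


# A handful of values copied from the two outputs above (not the full dictionaries).
with_star = {'CMIP6': 882140, 'CMIP5': 53725, 'CORDEX': 67908, 'MiKlip': 5568}
with_project = {'CMIP6': 11249567, 'CMIP5': 201129, 'CORDEX': 183980,
                'MiKlip': 5568, 'specs': 446693}

for value, (seen, full) in sorted(compare_facet_counts(with_star, with_project).items()):
    print(f"{value}: {seen} with facets=*, {full} with facets=project")
```

In a live session the two dictionaries would come from the two `facet_counts['project']` calls shown above.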

from esgf-pyclient.

alaniwi commented on July 1, 2024

This is essentially a server-side issue. Since CMIP6, the number of facet values has increased substantially, because a few of the facets can take a very large number of possible values (sorry, I forget exactly which ones). If you include facets=* in the search, the server does not seem to do a proper distributed search across all the shards, so results are missing. A workaround was implemented on one or two of the index nodes, including DKRZ, to exclude certain known problematic facets from facets=*, but this no longer seems to be in place. In any case, the point is that this has nothing to do with pyclient: exactly the same behaviour is observed using, for example, curl.

$ url_stem='https://esgf-data.dkrz.de/esg-search/search/?limit=0&type=Dataset&project=CMIP6'

$ curl -s "$url_stem&facets=project%2Cproduct%2Cdata_node" | grep numFound
<result name="response" numFound="11307457" start="0" maxScore="1.0">

$ curl -s "$url_stem&facets=*" | grep numFound
<result name="response" numFound="886069" start="0" maxScore="1.0">
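
For anyone who wants to repeat this comparison from Python without pyclient, a rough sketch using only the standard library might look like the following. The helper names `build_query` and `num_found` are hypothetical; the endpoint and parameters are the ones in the curl commands above, and the response line fed to `num_found` is copied from the output above rather than fetched live.

```python
import re
from urllib.parse import urlencode

SEARCH_URL = 'https://esgf-data.dkrz.de/esg-search/search/'


def build_query(facets, **constraints):
    """Build a search URL that returns no records (limit=0), only counts."""
    params = {'limit': 0, 'type': 'Dataset', **constraints,
              'facets': ','.join(facets)}
    return SEARCH_URL + '?' + urlencode(params)


def num_found(xml_text):
    """Extract the numFound attribute from a search response, or None if absent."""
    match = re.search(r'numFound="(\d+)"', xml_text)
    return int(match.group(1)) if match else None


explicit_url = build_query(['project', 'product', 'data_node'], project='CMIP6')
star_url = build_query(['*'], project='CMIP6')
print(explicit_url)
print(star_url)

# Response line copied from the curl output above; a live check would instead
# read the body with urllib.request.urlopen(explicit_url).
sample = '<result name="response" numFound="11307457" start="0" maxScore="1.0">'
print(num_found(sample))
```

Note that `urlencode` percent-encodes the `*`, which the server should decode back to the same query value.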

The current behaviour in esgf-pyclient is to default to facets=*, and changing that default would break backward compatibility. We can, however, issue a warning in the code.
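
To illustrate the idea, here is a minimal sketch of what such a warning could look like. It is not the code in the feature branch: the function name `normalise_facets` and the warning text are assumptions made for this example, and only the default of facets=* is taken from the discussion above.

```python
import warnings


def normalise_facets(facets=None, distrib=True):
    """Return the facets query value, warning when facets=* meets a distributed search.

    Hypothetical helper: keeps the existing default (facets=*) so that
    backward compatibility is preserved, and only adds a warning.
    """
    if facets is None:
        facets = '*'
    elif not isinstance(facets, str):
        facets = ','.join(facets)

    if distrib and facets == '*':
        warnings.warn(
            "Distributed search with facets=* may return incomplete results; "
            "consider listing the facets you need explicitly.",
            stacklevel=2,
        )
    return facets


print(normalise_facets(['project', 'product']))  # explicit list, no warning
print(normalise_facets())                        # default, emits a warning
```

Keeping the default and warning only in the distributed case means existing scripts keep working unchanged, while users are pointed towards passing explicit facets.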

I have pushed a feature branch with a commit that implements such a warning, and will open a pull request cross-referenced to this issue, but I do not want to merge to master myself. Somebody needs to review this and decide whether it is desirable.


bouweandela commented on July 1, 2024

Thanks for looking into this! If the warning in the code is not desirable, a warning in the documentation would already be great.


alaniwi commented on July 1, 2024

The add_facets_star_warnings branch described above now also contains docstring changes, which will be incorporated into the Sphinx documentation.

Hoping this can be merged soon, after I have sorted out a couple of CI issues.

