krassowski / easy-entrez

Retrieve PubMed articles, text-mining annotations, or molecular data from >35 Entrez databases via easy to use Python package - built on top of Entrez E-utilities API.

Home Page: https://easy-entrez.readthedocs.io/en/latest/

License: GNU Lesser General Public License v3.0

Python 84.37% Jupyter Notebook 15.60% Shell 0.02%
entrez pubmed literature-search literature-mining entrez-eutilities eutilities pubmed-central gene-annotations meta-analysis

easy-entrez's Issues

Docs request: possible tool names

The first parameter of EntrezAPI is a tool. Where can possible tool names be found?

For context, I am trying to check whether a DOI returned by an LLM actually exists.
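
As far as the E-utilities documentation describes it, `tool` is a free-form string (without spaces) naming the application that makes the call, not a value from a fixed list. For the DOI check itself, a minimal sketch follows. It assumes that `api.search` accepts `term`, `max_results` and `database` as used elsewhere on this page, and that the PubMed `[AID]` (Article Identifier) field matches DOIs. `doi_exists` is a hypothetical helper, not part of easy-entrez.

```python
def doi_exists(api, doi: str) -> bool:
    """Return True if PubMed reports at least one record for this DOI.

    `api` is anything with a `search` method shaped like EntrezAPI.search
    (an assumption based on the calls shown in these issues).
    """
    # [AID] is PubMed's Article Identifier field, which includes DOIs.
    term = f"{doi}[AID]"
    result = api.search(term=term, max_results=1, database="pubmed")
    # ESearch JSON reports the hit count as a string, so convert it.
    return int(result.data["esearchresult"]["count"]) > 0
```

A zero count only means that PubMed does not index the DOI; a DOI from outside the biomedical literature can still be valid.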

Request: `StrEnum` for all filters

So far, I am aware of a few filters:

  • Publisher ID: filter by DOI
  • Title: filter by title

It would be cool if easy-entrez provided a StrEnum listing all of the possible filters.
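
A rough sketch of what such an enum could look like is below. It is not part of easy-entrez: `PubMedField` and `apply` are hypothetical names, the members are a small hand-picked subset of PubMed search field tags, and `str, Enum` is used instead of `StrEnum` so that the sketch also runs on Python versions before 3.11.

```python
from enum import Enum


class PubMedField(str, Enum):
    """A few PubMed search field tags (illustrative subset, not exhaustive)."""

    TITLE = "Title"
    TITLE_ABSTRACT = "Title/Abstract"
    AUTHOR = "Author"
    JOURNAL = "Journal"
    MESH_TERMS = "MeSH Terms"
    ARTICLE_IDENTIFIER = "AID"  # includes DOIs

    def apply(self, value: str) -> str:
        """Build a quoted search term restricted to this field."""
        cleaned = value.replace('"', "")  # embedded quotes would break the phrase
        return f'"{cleaned}"[{self.value}]'


print(PubMedField.TITLE.apply("Genetic Variation"))
```

The resulting string can be passed as the `term` of a search call.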

Request: officially supporting Python 3.12

Hello team, love this package! Excited to start using it today, except, I use Python 3.12.

Any chance we can:

  • Add Python 3.12 to the test matrix
  • Add 3.12 to setup.py's classifiers

Also, it may be good to add python_requires=">=3.7" to the setup() call as well.
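
The two packaging changes could look roughly like this. The project's actual setup.py is not shown on this page, so `PACKAGING_KWARGS` is a hypothetical name and the way it is passed to `setup()` is an assumption; only the `python_requires` value and the 3.12 classifier come from the request above.

```python
# Keyword arguments that could be merged into the existing setup() call,
# e.g. setup(name=..., version=..., **PACKAGING_KWARGS)
PACKAGING_KWARGS = {
    # pip refuses to install the package on interpreters older than this
    "python_requires": ">=3.7",
    # one trove classifier per supported minor version, 3.7 through 3.12
    "classifiers": [
        f"Programming Language :: Python :: 3.{minor}" for minor in range(7, 13)
    ],
}

print(PACKAGING_KWARGS["classifiers"][-1])
```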

Problem: API ReadTimeout with easy_entrez

I used easy-entrez to get gene names from SNP IDs. I have a large dataset of 7 million SNPs. I tried with just 4000 of them (in a for loop, 1000 at a time), and it gave me an error in the last iteration.

HTTPSConnectionPool(host='eutils.ncbi.nlm.nih.gov', port=443): Read timed out. (read timeout=10)
How can I solve this problem?
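
Read timeouts against a remote service are often transient, so one common mitigation is to retry with a growing delay. The sketch below is generic and stdlib-only; `call_with_retries` is a hypothetical helper, not an easy-entrez feature. The traceback above looks like it comes from `requests`, in which case `retry_on=(requests.exceptions.ReadTimeout,)` would be a reasonable argument, but that is an assumption.

```python
import time


def call_with_retries(func, *args, attempts=4, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

Wrapping each batch request in such a helper keeps one slow response from aborting a long-running loop.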

Request: `async` support

It would be nice to support async usage. As of easy-entrez==0.3.7, it seems async isn't part of this package.
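
Without native support, a blocking call can still be moved off the event loop. The sketch below assumes Python 3.9+ for `asyncio.to_thread` and an `api.search` method shaped like the calls shown in these issues; `search_async` is a hypothetical wrapper, not part of easy-entrez.

```python
import asyncio


async def search_async(api, term, **kwargs):
    """Run the blocking api.search call in a worker thread and await it."""
    # asyncio.to_thread keeps the event loop free while the request is in flight.
    return await asyncio.to_thread(api.search, term, **kwargs)
```

This only prevents blocking the loop. Concurrent calls still go to the same service, so the NCBI rate limit applies as before.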

So the request is to either:

How to (1) search by title and (2) download abstract from matching paper

I am trying to figure out how to:

  1. Search for a matching title in PubMed
  2. Download the relevant abstract from there

Point 1 is failing: I am using the [Title] filter with the exact title, and it is not getting a match:

# SEE: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8021862/
from easy_entrez import EntrezAPI

TITLE_SUBSTRING = "Interpreting Genetic Variation in Human and Cancer Genomes"

api = EntrezAPI(tool="easy-entrez", email="[email protected]")
search_result = api.search(
    term=f'"{TITLE_SUBSTRING}"[Title]', max_results=1, database="pubmed"
)
result = search_result.data["esearchresult"]  # count is 0 here :/

Can you help me piece it together from here?
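
For step 2, once a search does return PMIDs (in `esearchresult["idlist"]`), the records can be fetched as XML and the abstract read from the standard PubMed elements. The sketch below covers only the extraction and assumes the fetched data is available as an `xml.etree.ElementTree` element; `extract_abstracts` is a hypothetical helper and the sample record is hand-written, not real PubMed data.

```python
import xml.etree.ElementTree as ET


def extract_abstracts(article_set):
    """Map each PMID in a PubmedArticleSet element to its abstract text."""
    abstracts = {}
    for article in article_set.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        # Structured abstracts have several AbstractText nodes; itertext()
        # also picks up text inside inline markup such as <i>.
        parts = [
            "".join(node.itertext()).strip()
            for node in article.iterfind(
                "MedlineCitation/Article/Abstract/AbstractText"
            )
        ]
        abstracts[pmid] = "\n".join(parts)
    return abstracts


# Hand-written toy record (not real PubMed data) showing the expected shape.
SAMPLE = """<PubmedArticleSet><PubmedArticle><MedlineCitation>
<PMID>1</PMID><Article><ArticleTitle>Toy title</ArticleTitle><Abstract>
<AbstractText Label="BACKGROUND">First <i>part</i>.</AbstractText>
<AbstractText Label="RESULTS">Second part.</AbstractText>
</Abstract></Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"""

print(extract_abstracts(ET.fromstring(SAMPLE)))
```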

Batch querying

Hi @krassowski thanks for the easy API!

I was wondering whether there is a way to query in batches. I have a list of 1000 coordinates for which I want to look up rsIDs. I would have done it in a for loop, but the API is limited to 3 queries per second, which makes that impractical.

My main question: is there a method I can use to query the 1000 coordinates for their rsIDs without a loop? I believe this would be faster and more efficient, as well as getting around the rate limit set by NCBI.
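
Batching does not bypass the rate limit, but it reduces the number of requests that count against it: 1000 items in chunks of 50 need 20 requests, which takes about 7 seconds at 3 requests per second. A generic sketch of the chunking is below; `chunked` and `or_query` are hypothetical helpers, and whether a combined OR query suits a given database and field is an assumption to verify.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def or_query(terms):
    """Combine several search terms into a single OR query."""
    return " OR ".join(f"({term})" for term in terms)


for batch in chunked(["a[X]", "b[X]", "c[X]"], 2):
    print(or_query(batch))
```

Each combined query returns the union of matches, so results still have to be mapped back to the individual inputs afterwards.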

Running pytest sometimes fails

Running pytest (in a venv) on commit 6cd14fb sometimes fails with the following error:

(.venv) jfreige@sl-akali-p-cs1:easy-entrez (main)$ pytest
=================================================================== test session starts ====================================================================
platform linux -- Python 3.10.12, pytest-7.4.2, pluggy-1.3.0
rootdir: /data/local/jfreige/geo-mining/easy-entrez
plugins: cov-4.1.0
collected 15 items

tests/test_api.py .F.                                                                                                                                [ 20%]
tests/test_parsing.py ....                                                                                                                           [ 46%]
tests/test_queries.py ........                                                                                                                       [100%]

========================================================================= FAILURES =========================================================================
_______________________________________________________________________ test_search ________________________________________________________________________

    def test_search():
        result = entrez_api.search('cancer AND human[organism]', max_results=1)
        assert is_response_for(result, SearchQuery)
        assert not is_response_for(result, FetchQuery)
>       assert result.data['esearchresult']['count'] != 0

tests/test_api.py:29:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
easy_entrez/api.py:45: in data
    if self.content_type == 'json':
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <EntrezResponse status=502 for SearchQuery 'cancer AND human[organism]' in pubmed>

    @property
    def content_type(self) -> ReturnType:
        declared_type = self.response.headers['Content-Type']
        if declared_type.startswith('application/json'):
            return 'json'
        if declared_type.startswith('text/xml'):
            return 'xml'
>       raise ValueError(f'Unknown content type: {declared_type}')
E       ValueError: Unknown content type: text/plain

easy_entrez/api.py:41: ValueError
===================================================================== warnings summary =====================================================================
tests/test_parsing.py:39
  /data/local/jfreige/geo-mining/easy-entrez/tests/test_parsing.py:39: PytestUnknownMarkWarning: Unknown pytest.mark.optional - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.optional

tests/test_parsing.py:78
  /data/local/jfreige/geo-mining/easy-entrez/tests/test_parsing.py:78: PytestUnknownMarkWarning: Unknown pytest.mark.optional - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.optional

tests/test_parsing.py:89
  /data/local/jfreige/geo-mining/easy-entrez/tests/test_parsing.py:89: PytestUnknownMarkWarning: Unknown pytest.mark.optional - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.optional

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================= short test summary info ==================================================================
FAILED tests/test_api.py::test_search - ValueError: Unknown content type: text/plain
======================================================== 1 failed, 14 passed, 3 warnings in 13.15s =========================================================
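
The traceback shows a 502 response with a text/plain body reaching the content-type check, so the reported ValueError hides what looks like a transient server error. One possible way to surface that is to check the status first. This is a standalone sketch, not a patch to easy_entrez/api.py; `detect_content_type` is a hypothetical function.

```python
def detect_content_type(status_code: int, declared_type: str) -> str:
    """Classify a response body, reporting HTTP errors before parsing."""
    if status_code >= 400:
        # e.g. a 502 from a gateway carries a plain-text error page
        raise ConnectionError(
            f"Entrez returned HTTP {status_code} ({declared_type})"
        )
    if declared_type.startswith("application/json"):
        return "json"
    if declared_type.startswith("text/xml"):
        return "xml"
    raise ValueError(f"Unknown content type: {declared_type}")
```

With a distinct exception type, a test or caller can decide to retry or skip on server errors while still failing on unexpected content types.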

Examples fail to run

Description

Running the example code from the documentation fails with this exception:
TypeError: batches_support_wrapper() missing 1 required positional argument: 'collection'

Reproduce

Run this code with easy_entrez 0.3.0:

from easy_entrez import EntrezAPI

entrez_api = EntrezAPI(
    'test',
    '[email protected]'
)

print(entrez_api.link(database=None, ids=[15718680, 157427902], database_from='protein', command='acheck'))

Expected behavior

It should return results instead of throwing this exception.

Context

OS: Ubuntu 18.04.6 LTS
Python: 3.6.9
easy_entrez 0.3.0 was installed in a virtualenv.

Suggestion: `pytest-vcr` for reliable testing

I believe the test suite is actually making requests to Entrez each time it's run.

To fix this, I suggest using pytest-vcr, a pytest plug-in for caching the response of requests in a subfolder of the test folder.

It's very easy to use, and may help with the seeming flakiness of CI at the moment.
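
With pytest-vcr, the usual pattern is to mark a test with `@pytest.mark.vcr` so that its HTTP traffic is recorded once and replayed afterwards. The stdlib-only sketch below is not pytest-vcr; it merely illustrates the record-and-replay idea, and `replayed` is a hypothetical helper.

```python
import json
from pathlib import Path


def replayed(cassette: Path, key: str, fetch):
    """Return the recorded value for `key`, calling fetch() only on first use.

    Values must be JSON-serialisable, since the cassette is a JSON file.
    """
    records = json.loads(cassette.read_text()) if cassette.exists() else {}
    if key not in records:
        records[key] = fetch()  # the only place a real request would happen
        cassette.write_text(json.dumps(records, indent=2))
    return records[key]
```

Once the cassette file is committed alongside the tests, later runs no longer depend on the remote service being reachable.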

`fetch` method with JSON response raises exception

Hi @krassowski ,

thanks for providing this cool package. :)

I just ran into an issue. When calling

result = eapi.fetch(['36999552', '36999549', '36999539'], max_results=3, return_type="xml")
result.data

everything is fine. However, when changing to JSON response:

result = eapi.fetch(['36999552', '36999549', '36999539'], max_results=3, return_type="json")
result.data

I get the exception: json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 9)

Can you reproduce that? Can you find out, why this happens?

Thanks a lot!

Cheers,

Adrian
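
One hypothesis consistent with the error position is that the server answered with a plain-text list of IDs, one per line, instead of JSON. This is an assumption, since the actual response body is not shown. The first line then parses as a JSON number and the second line counts as extra data, which the snippet below reproduces using the IDs from the report.

```python
import json

# Assumed body: the requested IDs as plain text, one per line.
body = "36999552\n36999549\n36999539\n"

try:
    json.loads(body)
except json.JSONDecodeError as error:
    message = str(error)

print(message)  # Extra data: line 2 column 1 (char 9)
```

If that is what happens, the parser is working as designed and the request for a JSON return type is simply not honoured for this call.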

Entrez search result limit

Hello!
I came to this project because the BioPython Entrez search has been failing me.
It used to return more than 9999 results, but now there is this cursed limit.
So I have several questions:

  1. Is the default search the same as the one in BioPython?
  2. Are the articles ordered by relevance? In BioPython they are, and the PMIDs of the first articles differ between the two.
  3. Most importantly, how can I get more than 9999 results? I tried 'in_batches_of' with the entrez_api.search function, but I still get only 9999 results.

I need the simplest possible use of these functions: I want to enter a term ('T cell', for example) and get a list of PMIDs for the 100k most relevant articles. That is the only thing standing in the way of my project.

Cheers
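
A commonly used workaround for a per-query cap is to split the search into publication-date windows that each stay below the limit, then merge the ID lists. The sketch below only builds the sub-queries. `yearly_terms` is a hypothetical helper, the `[dp]` (publication date) range syntax is PubMed's, and whether one year per window is narrow enough for a given term is an assumption to check against each window's count.

```python
def yearly_terms(term, first_year, last_year):
    """Yield one date-restricted query per publication year (inclusive)."""
    for year in range(first_year, last_year + 1):
        yield f"({term}) AND {year}/01/01:{year}/12/31[dp]"


for query in yearly_terms("T cell", 2020, 2021):
    print(query)
```

Relevance ranking applies within each window only, so the merged list is not a global "top 100k by relevance".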
