bebatut / enasearch
A Python library for interacting with ENA's API
Home Page: http://bebatut.fr/enasearch
License: MIT License
>>> enasearch.retrieve_run_report(accession="ERR1558694", fields="run_accession,instrument_platform,library_strategy,read_count,fastq_ftp")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/database/lib/python2.7/site-packages/enasearch/__init__.py", line 737, in retrieve_run_report
file=file)
File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/database/lib/python2.7/site-packages/enasearch/__init__.py", line 721, in retrieve_filereport
return request_url(url, "text", file)
File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/database/lib/python2.7/site-packages/enasearch/__init__.py", line 375, in request_url
r.raise_for_status()
File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/database/lib/python2.7/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://www.ebi.ac.uk/ena/portal/api/filereport
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
backports 1.0 py27_1 conda-forge
biopython 1.76 py27h516909a_0 conda-forge
blas 1.0 mkl
brotlipy 0.7.0 py27h516909a_1000 conda-forge
ca-certificates 2021.7.5 h06a4308_1
certifi 2020.6.20 pyhd3eb1b0_3
cffi 1.14.0 py27he30daa8_1
chardet 3.0.4 py27h8c360ce_1006 conda-forge
click 8.0.0 pyhd3eb1b0_0
configparser 4.0.2 py27_0
contextlib2 0.6.0.post1 py_0 conda-forge
cryptography 2.8 py27h2c19f6a_2 conda-forge
dicttoxml 1.7.4 py27_0 conda-forge
enasearch 0.2.2 py27_1 bioconda
enum34 1.1.10 py27h8c360ce_1 conda-forge
flake8 3.8.3 pyh9f0ad1d_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
importlib-metadata 1.5.0 py27h8c360ce_1 conda-forge
intel-openmp 2021.3.0 h06a4308_3350
ipaddress 1.0.23 py_0 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 11.1.0 hc902ee8_8 conda-forge
libgfortran-ng 7.5.0 h14aa051_19 conda-forge
libgfortran4 7.5.0 h14aa051_19 conda-forge
libgomp 11.1.0 hc902ee8_8 conda-forge
libstdcxx-ng 11.1.0 h56837e0_8 conda-forge
mccabe 0.6.1 py27_0 conda-forge
mkl 2020.2 256
mkl-service 2.3.0 py27he904b0f_0
mkl_fft 1.0.15 py27ha843d7b_0
mkl_random 1.1.0 py27hd6b4f25_0
more-itertools 5.0.0 py_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
numpy 1.16.6 py27hbc911f0_0
numpy-base 1.16.6 py27hde5b4d6_0
openssl 1.1.1k h7f98852_1 conda-forge
pathlib2 2.3.5 py27h8c360ce_1 conda-forge
pip 20.1.1 pyh9f0ad1d_0 conda-forge
pycodestyle 2.6.0 pyh9f0ad1d_0 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pyflakes 2.2.0 pyh9f0ad1d_0 conda-forge
pyopenssl 20.0.1 pyhd3eb1b0_1
pysocks 1.7.1 py27h8c360ce_1 conda-forge
python 2.7.18 ha1903f6_2
python_abi 2.7 1_cp27mu conda-forge
readline 8.1 h46c0cb4_0 conda-forge
requests 2.25.1 pyhd3deb0d_0 conda-forge
scandir 1.10.0 py27hdf8410d_1 conda-forge
setuptools 44.0.0 py27_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sqlite 3.36.0 h9cd32fc_0 conda-forge
tk 8.6.10 hed695b0_1 conda-forge
urllib3 1.26.6 pyhd8ed1ab_0 conda-forge
wheel 0.37.0 pyhd8ed1ab_1 conda-forge
xmltodict 0.12.0 py_0 conda-forge
zipp 1.0.0 py_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
Every command is listed in the documentation, but what each command does, what its output is, and what its options mean are not described.
For example, it would help to explain what an 'analysis field' is and what the [OPTIONS] are. What can I use a list of analysis fields for?
I am having trouble using the search_data() function to restrict the fields that are returned.
Following the tutorial, if I change the call to this:
data = enasearch.search_data(
free_text_search=True,
query="SMP1+homo",
result="sequence_release",
display="report", fields="accession,collection_date")
I still get all fields (including the sequence, etc.), not just the "accession" and "collection_date" fields I am interested in.
Am I missing something?
Thanks!
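Until the field selection is honoured by the call itself, one possible workaround is to filter the columns on the client side. The sketch below is only an illustration and does not call enasearch: it assumes the report comes back as tab-separated text with a header row, and `select_columns` and the sample record are hypothetical names and data made up for this example.

```python
import csv
import io


def select_columns(report_text, wanted):
    """Keep only the `wanted` columns of a tab-separated report, in that order."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    missing = [name for name in wanted if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("Columns not present in report: " + ", ".join(missing))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=wanted, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # extrasaction="ignore" drops every column that is not in `wanted`.
        writer.writerow(row)
    return out.getvalue()


# Made-up sample standing in for a report returned by search_data().
sample = (
    "accession\tdescription\tcollection_date\n"
    "X00001\texample record\t2001-01-01\n"
)
filtered = select_columns(sample, ["accession", "collection_date"])
```

This only hides the unwanted columns after download, so it does not reduce the amount of data transferred.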
Would it be easy to support the short option -h as an alias for --help too?
enasearch -h
Error: no such option: -h
When I try to install enasearch through conda, it asks me to downgrade my environment to Python 2
$ conda install -c bioconda enasearch
Solving environment: done
## Package Plan ##
environment location: /home/giordano-n/.miniconda3/envs/test_enasearch
added / updated specs:
- enasearch
The following NEW packages will be INSTALLED:
asn1crypto: 0.24.0-py27_0
biopython: 1.68-py27_0 bioconda
blas: 1.0-mkl
cffi: 1.11.5-py27he75722e_1
chardet: 3.0.4-py27_1
click: 7.0-py27_0
configparser: 3.5.0-py27_0
cryptography: 2.3.1-py27h1ba5d50_2
dicttoxml: 1.7.4-py27_0 conda-forge
enasearch: 0.2.2-py27_1 bioconda
enum34: 1.1.6-py27_1
flake8: 3.6.0-py27_0
freetype: 2.9.1-h8a8886c_1
idna: 2.7-py27_0
intel-openmp: 2019.0-118
ipaddress: 1.0.22-py27_0
jpeg: 9b-h024ee3a_2
libgfortran-ng: 7.3.0-hdf63c60_0
libpng: 1.6.35-hbc83047_0
libtiff: 4.0.9-he85c1e1_2
mccabe: 0.6.1-py27_1
mkl: 2019.0-118
mkl_fft: 1.0.6-py27h7dd41cf_0
mkl_random: 1.0.1-py27h4414c95_1
mmtf-python: 1.0.2-py27_0 bioconda
msgpack-python: 0.5.6-py27h6bb024c_1
numpy: 1.15.4-py27h1d66e8a_0
numpy-base: 1.15.4-py27h81de0dd_0
olefile: 0.46-py27_0
pillow: 5.3.0-py27h34e0f95_0
pycodestyle: 2.4.0-py27_0
pycparser: 2.19-py27_0
pyflakes: 2.0.0-py27_0
pyopenssl: 18.0.0-py27_0
pysocks: 1.6.8-py27_0
reportlab: 3.5.9-py27he686d34_0
requests: 2.20.0-py27_0
six: 1.11.0-py27_1
urllib3: 1.23-py27_0
xmltodict: 0.9.2-py27_0 bioconda
The following packages will be UPDATED:
certifi: 2018.10.15-py37_0 --> 2018.10.15-py27_0
pip: 18.1-py37_0 --> 18.1-py27_0
setuptools: 40.5.0-py37_0 --> 40.5.0-py27_0
wheel: 0.32.2-py37_0 --> 0.32.2-py27_0
The following packages will be DOWNGRADED:
python: 3.7.1-h0371630_3 --> 2.7.15-h9bab390_4
Proceed ([y]/n)?
Installing manually via python3 setup.py install seems to work fine.
Is it possible to handle non-existent field names in a nicer way?
The Python traceback is ugly.
% retrieve_run_report --accession PRJNA275974 --fields run_accession --fields wrong_name
Traceback (most recent call last):
File "/home/linuxbrew/.linuxbrew/bin/enasearch", line 11, in <module>
load_entry_point('enasearch==0.0.6', 'console_scripts', 'enasearch')()
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__main__.py", line 353, in retrieve_run_report
file=file)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 653, in retrieve_run_report
file=file)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 633, in retrieve_filereport
check_returnable_fields(fields.split(","), result)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 136, in check_returnable_fields
raise ValueError(err_str)
ValueError: The field wrong_name is not a returnable field for result read_run
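One way to get a shorter message is to catch the validation error at the command-line boundary and print a single line instead of the full traceback. The following is a minimal sketch under stated assumptions, not enasearch's code: `check_fields` is a hypothetical stand-in for the real field check, and `run_command` is an invented wrapper name.

```python
import sys


def check_fields(fields, returnable):
    """Hypothetical stand-in for a field check that raises ValueError."""
    for field in fields:
        if field not in returnable:
            raise ValueError("The field %s is not a returnable field" % field)


def run_command(command, *args, stream=None):
    """Run `command`; turn a ValueError into a one-line message and exit code 2."""
    stream = stream if stream is not None else sys.stderr
    try:
        command(*args)
    except ValueError as err:
        stream.write("Error: %s\n" % err)
        return 2
    return 0
```

A script entry point could then end with `sys.exit(run_command(...))`, so pipelines see a non-zero exit status without the stack trace.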
Needed for pipeline auditing:
% enasearch --version
enasearch 0.0.5
Retrieving the BioProject PRJNA256314 does not return any data:
enasearch retrieve_data --ids PRJNA256314 --display text
Entry: PRJNA256314 display type is either not supported or entry is not found.
I believe this is an umbrella project. Any ideas why I can't retrieve the data?
The current code causes a stack trace for some calls, e.g.:
enasearch retrieve_taxons --ids 'Elusimicrobia:phylum' --result 'coding_release' --display 'fasta' --download 'fasta'
(probably the call is missing something anyway .. I'm just starting to explore this tool)
click wants to call encode() on the exception message (when Python 2 is used, which is the case in Galaxy installations of enasearch), but an Exception object is passed as the parameter:
Line 20 in 7c1e280
Run:
$ enasearch retrieve_analysis_report --accession "ERZ009929" --file retrieve_analysis_report_1 --fields "analysis_accession"
Error:
File "/Users/bebatut/miniconda3/envs/enasearch/lib/python3.5/site-packages/enasearch/__init__.py", line 618, in retrieve_filereport
check_returnable_fields(fields.split(","), result)
AttributeError: 'tuple' object has no attribute 'split'
To have meaningful results
There is no description of what the software does. It does say it is an interface to the ENA API, but there are no examples or descriptions of what it would be used for or what can be accomplished with it.
If the user provides no fields, can you default to all the valid fields?
It took me a while to figure out that I had to use get_analysis_fields to find out which fields I could use.
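The requested default could look roughly like the sketch below. It is an assumption about the desired behaviour, not the library's implementation: `resolve_fields` is a hypothetical helper, and apart from `analysis_accession` the field names are placeholders (the real list would come from get_analysis_fields).

```python
def resolve_fields(requested, valid_fields):
    """Return the fields to request; fall back to every valid field if none given."""
    if not requested:
        return list(valid_fields)
    unknown = [f for f in requested if f not in valid_fields]
    if unknown:
        raise ValueError("Unknown field(s): " + ", ".join(unknown))
    return list(requested)


# Placeholder names for illustration only.
valid = ["analysis_accession", "placeholder_b", "placeholder_c"]
all_fields = resolve_fields(None, valid)
one_field = resolve_fields(["analysis_accession"], valid)
```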
Neither the repository nor the documentation provides an example of how the software can be used to solve a real-world analysis problem.
The repository does not contain a JOSS paper (either PDF or MD).
There are no statements of what problems the software aims to solve.
Below is a manual URL I use to get ENA SRA files from a given project in TSV format.
I have tried to replicate it with enasearch search_data or enasearch retrieve_data, but I have failed to get it working.
Any help would be appreciated!
% curl 'http://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=PRJNA275974&result=read_run&download=txt&fields=run_accession,fastq_ftp'
run_accession fastq_ftp
SRR1922792 ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/002/SRR1922792/SRR1922792_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/002/SRR1922792/SRR1922792_2.fastq.gz
SRR1922793 ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/003/SRR1922793/SRR1922793_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/003/SRR1922793/SRR1922793_2.fastq.gz
SRR1922794 ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/004/SRR1922794/SRR1922794_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/004/SRR1922794/SRR1922794_2.fastq.gz
SRR1922795 ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/005/SRR1922795/SRR1922795_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/005/SRR1922795/SRR1922795_2.fastq.gz
SRR1922796 ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/006/SRR1922796/SRR1922796_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/006/SRR1922796/SRR1922796_2.fastq.gz
SRR1922797 ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/007/SRR1922797/SRR1922797_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/007/SRR1922797/SRR1922797_2.fastq.gz
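For reference, the same request can be assembled with the Python standard library. This is a sketch that only reproduces the URL from the curl call above and splits a report into per-run file lists; it performs no network access, and `build_filereport_url` and `fastq_urls` are hypothetical helper names, not enasearch functions. It assumes the report is tab-separated and that multiple FASTQ paths are joined with ';', as in the output shown.

```python
from urllib.parse import urlencode

FILEREPORT_URL = "http://www.ebi.ac.uk/ena/data/warehouse/filereport"


def build_filereport_url(accession, fields, result="read_run", download="txt"):
    """Build the filereport URL used in the curl example."""
    params = {
        "accession": accession,
        "result": result,
        "download": download,
        "fields": ",".join(fields),
    }
    # safe="," keeps the commas between field names unescaped, as in the curl call.
    return FILEREPORT_URL + "?" + urlencode(params, safe=",")


def fastq_urls(report_text):
    """Map each run accession to its list of FASTQ paths."""
    lines = report_text.strip().splitlines()
    header = lines[0].split("\t")
    run_col = header.index("run_accession")
    ftp_col = header.index("fastq_ftp")
    runs = {}
    for line in lines[1:]:
        cells = line.split("\t")
        runs[cells[run_col]] = cells[ftp_col].split(";")
    return runs


url = build_filereport_url("PRJNA275974", ["run_accession", "fastq_ftp"])
```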
I am trying to search for taxa by names that appear at multiple levels, e.g., Elusimicrobia.
On https://www.ebi.ac.uk/ena/data/warehouse/search the autocomplete suggests Elusimicrobia <phylum> and Elusimicrobia <class>. I could not find out how to perform such a search with enasearch (command line, or Galaxy :)).
I am not sure what is wrong here:
% enasearch retrieve_data --ids ERR924889 --display report
Traceback (most recent call last):
File "/home/linuxbrew/.linuxbrew/bin/enasearch", line 11, in <module>
load_entry_point('enasearch==0.0.5', 'console_scripts', 'enasearch')()
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__main__.py", line 251, in retrieve_data
header=header)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 407, in retrieve_data
header=header)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 365, in build_retrieve_url
check_boolean(expanded)
File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 281, in check_boolean
raise ValueError(err_str)
ValueError: A boolean value must be only 'true' or 'false'
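The traceback does not show which value reached check_boolean, so the cause is unclear. If the problem is that an unset option or a Python bool arrives where the literal strings 'true' or 'false' are expected (an assumption), a more tolerant conversion might look like this sketch. `to_api_boolean` is a hypothetical helper, not part of enasearch.

```python
def to_api_boolean(value):
    """Map a Python bool or a 'true'/'false' string to the lowercase string form."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        return value.strip().lower()
    # Including the offending value makes the error easier to act on.
    raise ValueError(
        "A boolean value must be only 'true' or 'false', got %r" % (value,))
```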
The result on STDOUT appears to be printed in Python's internal stringified form.
I would expect STDOUT to behave the same as --result FILENAME.
This would be very convenient for Unix pipes.
% enasearch retrieve_run_report --accession PRJNA275974 --fields run_accession --fields fastq_ftp
# expected TSV but got __string__
('run_accession\tfastq_ftp\n'
'SRR1922792\t'
'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/002/SRR1922792/SRR1922792_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/002/SRR1922792/SRR1922792_2.fastq.gz\n'
'SRR1922793\t'
'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/003/SRR1922793/SRR1922793_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/003/SRR1922793/SRR1922793_2.fastq.gz\n'
'SRR1922794\t'
'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/004/SRR1922794/SRR1922794_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/004/SRR1922794/SRR1922794_2.fastq.gz\n'
'SRR1922795\t'
'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/005/SRR1922795/SRR1922795_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/005/SRR1922795/SRR1922795_2.fastq.gz\n'
'SRR1922796\t'
'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/006/SRR1922796/SRR1922796_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/006/SRR1922796/SRR1922796_2.fastq.gz\n'
'SRR1922797\t'
'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/007/SRR1922797/SRR1922797_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/007/SRR1922797/SRR1922797_2.fastq.gz\n')
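The quoted, parenthesised output above resembles what Python's pprint module produces for a long string; whether enasearch actually uses pprint at that point is an assumption. The sketch below contrasts the two behaviours: `emit` is a hypothetical helper that writes the text unchanged, and the report content is a shortened made-up sample.

```python
import pprint
import sys


def emit(text, stream=None):
    """Write report text unchanged so tabs and newlines reach the pipe intact."""
    stream = stream if stream is not None else sys.stdout
    stream.write(text)
    if not text.endswith("\n"):
        stream.write("\n")


report = "run_accession\tfastq_ftp\nSRR1922792\texample_1.fastq.gz\n"

# pformat() returns a quoted representation with escaped \t and \n,
# which is what ends up on screen when a string is pretty-printed.
stringified = pprint.pformat(report)
```

Writing the raw text lets tools such as cut or awk consume the output directly.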
From #6 I see that you need to use --fields multiple times, once per column you want.
This is very verbose. I would much rather use --fields f1,f2,f3 etc.
You could still support the old multi-field syntax but combine it with this one. The comma , is guaranteed not to appear in any ENA field name, so it would be backward compatible.
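Supporting both syntaxes at once can be done by splitting every supplied value on commas and flattening the result. This is a minimal sketch of that idea, with `normalize_fields` as a hypothetical helper name; it also accepts a tuple of values, which is what a repeatable option typically yields, so it avoids calling `.split` on a tuple as in the AttributeError reported earlier.

```python
def normalize_fields(values):
    """Flatten repeated and comma-separated --fields values into one ordered list."""
    if isinstance(values, str):
        values = [values]
    fields = []
    for value in values or ():
        for name in value.split(","):
            name = name.strip()
            # Skip empty pieces (e.g. trailing commas) and duplicates.
            if name and name not in fields:
                fields.append(name)
    return fields


repeated = normalize_fields(("run_accession", "fastq_ftp"))
combined = normalize_fields(("run_accession,fastq_ftp",))
```

Both calls yield the same list, so existing invocations with repeated --fields keep working.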