
enasearch's People

Contributors

bebatut


enasearch's Issues

Support comma separated --fields a,b,c

From #6 I see that you need to use --fields multiple times, once per column you want.

This is very verbose. I would much rather use --fields f1,f2,f3 etc.

You could still support the old multi-field syntax and combine it with this one. The comma is guaranteed not to appear in any ENA field name, so it would be backward compatible.
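To make the request concrete, here is a minimal sketch of how repeated and comma-separated values could be merged into one list. The helper name `merge_field_options` is hypothetical and not part of enasearch; it only assumes the option values arrive as a sequence of strings, one per --fields occurrence.

```python
def merge_field_options(values):
    """Flatten repeated and comma-separated option values into one ordered list.

    `values` is assumed to hold one string per --fields occurrence.
    """
    fields = []
    for value in values:
        for name in value.split(","):
            name = name.strip()
            # Skip empty pieces (e.g. a trailing comma) and duplicates.
            if name and name not in fields:
                fields.append(name)
    return fields


# Both styles mixed in one call: "--fields a,b --fields c"
print(merge_field_options(("run_accession,fastq_ftp", "read_count")))
```

With this approach the old one-flag-per-field syntax keeps working, because a value without a comma splits into a single name.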

Default to ALL fields in --fields?

If the user provides no fields, can you default to all the valid fields?

It took me a while to figure out I had to use get_analysis_fields to figure out what fields I could use.

No functional claims of the software

There is no description of what the software does. It does say it is an interface to the ENA API, but there are no examples or descriptions of what the use of this would be, or what can be accomplished with the software.

trouble filtering fields to return

I am having trouble using the search_data() function to filter fields that I return

Starting from the tutorial, if I change the call to this:
data = enasearch.search_data(
    free_text_search=True,
    query="SMP1+homo",
    result="sequence_release",
    display="report",
    fields="accession,collection_date")

I still get all fields (including sequence, etc.), not just the "accession" and "collection_date" fields I am interested in.

Am I missing something?

Thanks!

How do I replicate this URL via this module?

Below is a URL I use manually to get ENA SRA files from a given project in TSV format.

I have tried to replicate it with enasearch search_data or enasearch retrieve_data but I have failed to get it working.

Any help would be appreciated!

% curl 'http://www.ebi.ac.uk/ena/data/warehouse/filereport?accession=PRJNA275974&result=read_run&download=txt&fields=run_accession,fastq_ftp'

run_accession   fastq_ftp
SRR1922792      ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/002/SRR1922792/SRR1922792_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/002/SRR1922792/SRR1922792_2.fastq.gz
SRR1922793      ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/003/SRR1922793/SRR1922793_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/003/SRR1922793/SRR1922793_2.fastq.gz
SRR1922794      ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/004/SRR1922794/SRR1922794_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/004/SRR1922794/SRR1922794_2.fastq.gz
SRR1922795      ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/005/SRR1922795/SRR1922795_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/005/SRR1922795/SRR1922795_2.fastq.gz
SRR1922796      ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/006/SRR1922796/SRR1922796_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/006/SRR1922796/SRR1922796_2.fastq.gz
SRR1922797      ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/007/SRR1922797/SRR1922797_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/007/SRR1922797/SRR1922797_2.fastq.gz
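For reference, the same URL can be assembled with the Python standard library alone. This is only a sketch of the request being asked about, not a description of what enasearch does internally; `build_filereport_url` is a hypothetical helper, and the base URL is copied from the curl command above.

```python
from urllib.parse import urlencode

# Endpoint taken verbatim from the curl command in this issue.
BASE_URL = "http://www.ebi.ac.uk/ena/data/warehouse/filereport"


def build_filereport_url(accession, result, fields, download="txt"):
    """Build a filereport URL with the same parameters as the curl example."""
    params = {
        "accession": accession,
        "result": result,
        "download": download,
        "fields": ",".join(fields),
    }
    # safe="," keeps the commas between field names unescaped.
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_filereport_url("PRJNA275974", "read_run", ["run_accession", "fastq_ftp"])
print(url)
```

The printed URL can then be fetched with any HTTP client; no network call is made in the sketch itself.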

Friendly handling of illegal --fields

Is it possible to handle non-existent field names in a nicer way?
The python backtrace is ugly.

% retrieve_run_report --accession PRJNA275974 --fields run_accession --fields wrong_name

Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/enasearch", line 11, in <module>
    load_entry_point('enasearch==0.0.6', 'console_scripts', 'enasearch')()
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__main__.py", line 353, in retrieve_run_report
    file=file)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 653, in retrieve_run_report
    file=file)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 633, in retrieve_filereport
    check_returnable_fields(fields.split(","), result)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 136, in check_returnable_fields
    raise ValueError(err_str)
ValueError: The field wrong_name is not a returnable field for result read_run
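One possible shape for a friendlier message, sketched with the standard library only. The helper names and the list of allowed fields are assumptions made for illustration; enasearch's real validation lives in `check_returnable_fields`, as the traceback shows.

```python
import difflib
import sys


def explain_bad_fields(requested, allowed):
    """Return one readable message per unknown field, or an empty list."""
    messages = []
    for name in requested:
        if name in allowed:
            continue
        message = "Error: '%s' is not a returnable field" % name
        # Offer near matches, which helps with simple typos.
        hints = difflib.get_close_matches(name, allowed, n=3)
        if hints:
            message += " (did you mean: %s?)" % ", ".join(hints)
        messages.append(message)
    return messages


def report_bad_fields(requested, allowed):
    """Print problems to stderr and return a shell-style exit code."""
    problems = explain_bad_fields(requested, allowed)
    for line in problems:
        print(line, file=sys.stderr)
    return 1 if problems else 0


# Hypothetical subset of the fields valid for result read_run.
ALLOWED = ["run_accession", "fastq_ftp", "read_count"]
exit_code = report_bad_fields(["run_accession", "wrong_name"], ALLOWED)
```

The user then sees a single line on stderr and a non-zero exit status instead of a backtrace.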

click seems to need string message

The current code causes a stack trace for some calls, e.g.

enasearch retrieve_taxons --ids 'Elusimicrobia:phylum' --result 'coding_release' --display 'fasta' --download 'fasta'
(probably the call is missing something anyway .. I'm just starting to explore this tool)

click wants to call encode() on the exception message (if python2 is used which is the case in Galaxy installations of enasearch):

https://github.com/pallets/click/blob/132d66ac7d69a2b0e8f218a4cd39e50be3e0bcb9/click/exceptions.py#L15

but an Exception object is passed as the parameter:

raise click.ClickException(e)
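A likely fix is to pass the text of the exception rather than the exception itself, i.e. `click.ClickException(str(e))`. The sketch below shows the difference without importing click, so it runs on its own; `exception_to_message` is a hypothetical helper name.

```python
def exception_to_message(exc):
    """Return the text of an exception as a plain string."""
    return str(exc)


original = ValueError("The field wrong_name is not a returnable field for result read_run")

# An exception object has no encode() method, so code that expects text fails on it.
has_encode_before = hasattr(original, "encode")

# After conversion the value is an ordinary string, which does have encode().
message = exception_to_message(original)
has_encode_after = hasattr(message, "encode")

print(has_encode_before, has_encode_after)
```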

No example usage

Neither the repository nor the documentation provides an example of how the software can be used to solve a real-world analysis problem.

Support -h as well as --help

Would it be easy to support the short option -h as an alias for --help too?

enasearch -h
Error: no such option: -h

Write TSV to stdout when --result is not provided

The output seems to be printed as Python's internal string representation.
I would expect STDOUT to contain the same content that --result FILENAME writes.
This would be very convenient for Unix pipes.

% enasearch retrieve_run_report --accession PRJNA275974 --fields run_accession --fields fastq_ftp

# expected TSV but got __string__

('run_accession\tfastq_ftp\n'
 'SRR1922792\t'
 'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/002/SRR1922792/SRR1922792_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/002/SRR1922792/SRR1922792_2.fastq.gz\n'
 'SRR1922793\t'
 'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/003/SRR1922793/SRR1922793_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/003/SRR1922793/SRR1922793_2.fastq.gz\n'
 'SRR1922794\t'
 'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/004/SRR1922794/SRR1922794_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/004/SRR1922794/SRR1922794_2.fastq.gz\n'
 'SRR1922795\t'
 'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/005/SRR1922795/SRR1922795_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/005/SRR1922795/SRR1922795_2.fastq.gz\n'
 'SRR1922796\t'
 'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/006/SRR1922796/SRR1922796_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/006/SRR1922796/SRR1922796_2.fastq.gz\n'
 'SRR1922797\t'
 'ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/007/SRR1922797/SRR1922797_1.fastq.gz;ftp.sra.ebi.ac.uk/vol1/fastq/SRR192/007/SRR1922797/SRR1922797_2.fastq.gz\n')
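The output above looks like what `pprint` produces for a long string, although that is a guess from its shape rather than something confirmed in the code. Under that assumption, the sketch below contrasts the quoted representation with writing the text unchanged; `write_report` is a hypothetical helper and the sample rows are placeholders, not real ENA output.

```python
import io
import pprint
import sys


def write_report(text, stream=None):
    """Write the report text unchanged, so tabs and newlines stay real characters."""
    if stream is None:
        stream = sys.stdout
    stream.write(text)
    # Make sure the output ends with a newline for downstream tools.
    if text and not text.endswith("\n"):
        stream.write("\n")


# Shortened placeholder rows, not real ENA output.
report = "run_accession\tfastq_ftp\nRUN1\tfile_1.fastq.gz;file_2.fastq.gz\n"

# pformat() returns the quoted, escaped representation of the string.
quoted = pprint.pformat(report)

# Writing the text directly keeps it usable as TSV.
buffer = io.StringIO()
write_report(report, buffer)
plain = buffer.getvalue()
```

Written this way, the output can be piped straight into tools such as `cut` or `awk`.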

The functionality documentation is not clear

While every command is listed in the documentation, it does not describe what each command does, what its output is, or what its options are.

Example:
[screenshot, 2017-10-02 at 11:59:19]

Here it would be helpful to describe what an 'analysis field' is, and what the [OPTIONS] are. What can I use a list of analysis fields for?

requests.exceptions.HTTPError: 500 Server Error

>>> enasearch.retrieve_run_report(accession="ERR1558694", fields="run_accession,instrument_platform,library_strategy,read_count,fastq_ftp")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/database/lib/python2.7/site-packages/enasearch/__init__.py", line 737, in retrieve_run_report
    file=file)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/database/lib/python2.7/site-packages/enasearch/__init__.py", line 721, in retrieve_filereport
    return request_url(url, "text", file)
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/database/lib/python2.7/site-packages/enasearch/__init__.py", line 375, in request_url
    r.raise_for_status()
  File "/ebio/abt3_projects/software/dev/miniconda3_dev/envs/database/lib/python2.7/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://www.ebi.ac.uk/ena/portal/api/filereport

conda env

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
backports                 1.0                      py27_1    conda-forge
biopython                 1.76             py27h516909a_0    conda-forge
blas                      1.0                         mkl
brotlipy                  0.7.0           py27h516909a_1000    conda-forge
ca-certificates           2021.7.5             h06a4308_1
certifi                   2020.6.20          pyhd3eb1b0_3
cffi                      1.14.0           py27he30daa8_1
chardet                   3.0.4           py27h8c360ce_1006    conda-forge
click                     8.0.0              pyhd3eb1b0_0
configparser              4.0.2                    py27_0
contextlib2               0.6.0.post1                py_0    conda-forge
cryptography              2.8              py27h2c19f6a_2    conda-forge
dicttoxml                 1.7.4                    py27_0    conda-forge
enasearch                 0.2.2                    py27_1    bioconda
enum34                    1.1.10           py27h8c360ce_1    conda-forge
flake8                    3.8.3              pyh9f0ad1d_0    conda-forge
idna                      2.10               pyh9f0ad1d_0    conda-forge
importlib-metadata        1.5.0            py27h8c360ce_1    conda-forge
intel-openmp              2021.3.0          h06a4308_3350
ipaddress                 1.0.23                     py_0    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 11.1.0               hc902ee8_8    conda-forge
libgfortran-ng            7.5.0               h14aa051_19    conda-forge
libgfortran4              7.5.0               h14aa051_19    conda-forge
libgomp                   11.1.0               hc902ee8_8    conda-forge
libstdcxx-ng              11.1.0               h56837e0_8    conda-forge
mccabe                    0.6.1                    py27_0    conda-forge
mkl                       2020.2                      256
mkl-service               2.3.0            py27he904b0f_0
mkl_fft                   1.0.15           py27ha843d7b_0
mkl_random                1.1.0            py27hd6b4f25_0
more-itertools            5.0.0                      py_0    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
numpy                     1.16.6           py27hbc911f0_0
numpy-base                1.16.6           py27hde5b4d6_0
openssl                   1.1.1k               h7f98852_1    conda-forge
pathlib2                  2.3.5            py27h8c360ce_1    conda-forge
pip                       20.1.1             pyh9f0ad1d_0    conda-forge
pycodestyle               2.6.0              pyh9f0ad1d_0    conda-forge
pycparser                 2.20               pyh9f0ad1d_2    conda-forge
pyflakes                  2.2.0              pyh9f0ad1d_0    conda-forge
pyopenssl                 20.0.1             pyhd3eb1b0_1
pysocks                   1.7.1            py27h8c360ce_1    conda-forge
python                    2.7.18               ha1903f6_2
python_abi                2.7                    1_cp27mu    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
requests                  2.25.1             pyhd3deb0d_0    conda-forge
scandir                   1.10.0           py27hdf8410d_1    conda-forge
setuptools                44.0.0                   py27_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlite                    3.36.0               h9cd32fc_0    conda-forge
tk                        8.6.10               hed695b0_1    conda-forge
urllib3                   1.26.6             pyhd8ed1ab_0    conda-forge
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
xmltodict                 0.12.0                     py_0    conda-forge
zipp                      1.0.0                      py_0    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge

ValueError: A boolean value must be only 'true' or 'false'

I am not sure what is wrong here:

% enasearch retrieve_data --ids ERR924889  --display report

Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/bin/enasearch", line 11, in <module>
    load_entry_point('enasearch==0.0.5', 'console_scripts', 'enasearch')()
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__main__.py", line 251, in retrieve_data
    header=header)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 407, in retrieve_data
    header=header)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 365, in build_retrieve_url
    check_boolean(expanded)
  File "/home/linuxbrew/.linuxbrew/opt/python3/lib/python3.6/site-packages/enasearch/__init__.py", line 281, in check_boolean
    raise ValueError(err_str)
ValueError: A boolean value must be only 'true' or 'false'
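The traceback ends in `check_boolean(expanded)`, which suggests that the default value of `expanded` is not one of the accepted strings when the option is left out. That is an inference from the trace, not something verified in the source. Purely as an illustration, a more forgiving check could look like this; `normalise_boolean` is a hypothetical helper, not part of enasearch.

```python
def normalise_boolean(value):
    """Map common boolean inputs to the strings 'true' or 'false'.

    None is passed through so the caller can leave the parameter out.
    """
    if value is None:
        return None
    if isinstance(value, bool):
        return "true" if value else "false"
    text = str(value).strip().lower()
    if text in ("true", "false"):
        return text
    raise ValueError(
        "A boolean value must be only 'true' or 'false', got %r" % (value,)
    )


print(normalise_boolean(True), normalise_boolean("False"), normalise_boolean(None))
```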

Cannot pass only one field to 'retrieve_filereport'

Run:

$ enasearch retrieve_analysis_report --accession "ERZ009929" --file retrieve_analysis_report_1 --fields "analysis_accession"

Error:

File "/Users/bebatut/miniconda3/envs/enasearch/lib/python3.5/site-packages/enasearch/__init__.py", line 618, in retrieve_filereport
    check_returnable_fields(fields.split(","), result)
AttributeError: 'tuple' object has no attribute 'split'
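The error shows `fields` arriving as a tuple where a comma-separated string is expected. A small guard before the `.split(",")` call could accept both forms. This is a sketch for Python 3; `as_field_string` is a hypothetical name.

```python
def as_field_string(fields):
    """Return fields as one comma-separated string, whatever form they arrive in."""
    if isinstance(fields, str):
        return fields
    # Tuples or lists of names are joined into the string form.
    return ",".join(fields)


single = as_field_string(("analysis_accession",))
print(single.split(","))
```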

Conda installation only allows Python 2

When I try to install enasearch through conda, it asks me to downgrade my environment to Python 2:

$ conda install -c bioconda enasearch
Solving environment: done

## Package Plan ##

  environment location: /home/giordano-n/.miniconda3/envs/test_enasearch

  added / updated specs: 
    - enasearch


The following NEW packages will be INSTALLED:

    asn1crypto:     0.24.0-py27_0                    
    biopython:      1.68-py27_0           bioconda   
    blas:           1.0-mkl                          
    cffi:           1.11.5-py27he75722e_1            
    chardet:        3.0.4-py27_1                     
    click:          7.0-py27_0                       
    configparser:   3.5.0-py27_0                     
    cryptography:   2.3.1-py27h1ba5d50_2             
    dicttoxml:      1.7.4-py27_0          conda-forge
    enasearch:      0.2.2-py27_1          bioconda   
    enum34:         1.1.6-py27_1                     
    flake8:         3.6.0-py27_0                     
    freetype:       2.9.1-h8a8886c_1                 
    idna:           2.7-py27_0                       
    intel-openmp:   2019.0-118                       
    ipaddress:      1.0.22-py27_0                    
    jpeg:           9b-h024ee3a_2                    
    libgfortran-ng: 7.3.0-hdf63c60_0                 
    libpng:         1.6.35-hbc83047_0                
    libtiff:        4.0.9-he85c1e1_2                 
    mccabe:         0.6.1-py27_1                     
    mkl:            2019.0-118                       
    mkl_fft:        1.0.6-py27h7dd41cf_0             
    mkl_random:     1.0.1-py27h4414c95_1             
    mmtf-python:    1.0.2-py27_0          bioconda   
    msgpack-python: 0.5.6-py27h6bb024c_1             
    numpy:          1.15.4-py27h1d66e8a_0            
    numpy-base:     1.15.4-py27h81de0dd_0            
    olefile:        0.46-py27_0                      
    pillow:         5.3.0-py27h34e0f95_0             
    pycodestyle:    2.4.0-py27_0                     
    pycparser:      2.19-py27_0                      
    pyflakes:       2.0.0-py27_0                     
    pyopenssl:      18.0.0-py27_0                    
    pysocks:        1.6.8-py27_0                     
    reportlab:      3.5.9-py27he686d34_0             
    requests:       2.20.0-py27_0                    
    six:            1.11.0-py27_1                    
    urllib3:        1.23-py27_0                      
    xmltodict:      0.9.2-py27_0          bioconda   

The following packages will be UPDATED:

    certifi:        2018.10.15-py37_0                 --> 2018.10.15-py27_0
    pip:            18.1-py37_0                       --> 18.1-py27_0      
    setuptools:     40.5.0-py37_0                     --> 40.5.0-py27_0    
    wheel:          0.32.2-py37_0                     --> 0.32.2-py27_0    

The following packages will be DOWNGRADED:

    python:         3.7.1-h0371630_3                  --> 2.7.15-h9bab390_4

Proceed ([y]/n)? 

Installing manually via python3 setup.py install seems to work fine.
