
wwood / kingfisher-download


Easier download/extract of FASTA/Q read data and metadata from the ENA, NCBI, AWS or GCP.

Home Page: https://wwood.github.io/kingfisher-download

License: GNU General Public License v3.0


kingfisher-download's Introduction

Welcome

Kingfisher is a fast and flexible program for procurement of sequence files (and their annotations) from public data sources, including the European Nucleotide Archive (ENA), NCBI SRA, Amazon AWS and Google Cloud. Its input is one or more "Run" accessions, e.g. DRR001970, or BioProject accessions, e.g. PRJNA621514 or SRP260223.

For more documentation, see https://wwood.github.io/kingfisher-download/


kingfisher-download's People

Contributors

jamiecfreeman, jolespin, rhysnewell, shodgkins, wwood


kingfisher-download's Issues

Feature request: allow downloading from GSA

Since there are data deposited in GSA but not in SRA/ENA, it would be useful to add the option to download from GSA.
I was able to use Aspera for downloading from GSA using a command like:
./ascp -i [/path/to/the/key/file] -P33001 -QT -l100m -k1 [email protected]:/gsa3/<data set ID>/<run ID> [/path/to/your/local/directory/]

No FTP download URLs found

Hi,

Thank you for this wonderful tool, it has been really helpful. I have been using this tool for a while but over the past couple days, no accessions seem to be working. I have tried this with several accessions that have previously worked with no luck.

python3 ./scripts/ena-fast-download.py SRR5012117
08/09/2020 08:43:12 AM INFO: Using aspera ssh key file: $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh
08/09/2020 08:43:12 AM INFO: Querying ENA for FTP paths for SRR5012117..
08/09/2020 08:43:13 AM WARNING: No FTP download URLs found for run SRR5012117, cannot continue

I suspect this is not a bug in the tool but a problem on the ENA side, but I thought I should open an issue just in case.

Method ena-ascp ena-ftp failed

Hi
I would like to try the Kingfisher package on a batch of ENA and SRA files.
I installed Aspera Connect and Kingfisher. When I run it I get the following error:

python ~/bin/kingfisher get --run-identifiers-list acession_ids-ena.txt \
    --download-methods ena-ftp \
    --download-threads 30 \
    --extraction-threads 30 \

-> [OptionHandlerImpl.cc:184] errorCode=1 max-connection-per-server must be between 1 and 16.
Usage: -x, --max-connection-per-server=N Maximum number of connections to a single server per download.
Possible values: 1-16 Default: 1 Tags: #basic, #http, #ftp
04/27/2022 11:10:23 AM WARNING: Method ena-ftp failed, error was Command 'aria2c -x30 -o ERR1755873_1.fastq.gz 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR175/003/ERR1755873/ERR1755873_1.fastq.gz'' returned non-zero exit status 28.
04/27/2022 11:10:23 AM WARNING: Method ena-ftp failed

If I select ena-ascp I get the following error:

WARNING: Error downloading from ENA with ASCP: Command ascp -T -l 300m -P33001 -k 2 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh [email protected]:/vol1/fastq/ERR175/003/ERR1755873/ERR1755873_1.fastq.gz . returned non-zero exit status 127.
STDERR was: b'bash: ascp: \xe6\x9c\xaa\xe6\x89\xbe\xe5\x88\xb0\xe5\x91\xbd\xe4\xbb\xa4\n'STDOUT was: b''
04/27/2022 11:20:52 AM WARNING: Method ena-ascp failed
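
The STDERR in that log is a raw UTF-8 byte string, which makes the actual message hard to read. A minimal sketch for decoding it is below; the bytes are copied from the log, while the helper name `ascp_available` is only an illustration and not part of Kingfisher. The decoded text is the shell's "command not found" message in Chinese, which fits exit status 127 and suggests that `ascp` was not on the PATH.

```python
import shutil

# STDERR bytes copied verbatim from the log above.
stderr = b"bash: ascp: \xe6\x9c\xaa\xe6\x89\xbe\xe5\x88\xb0\xe5\x91\xbd\xe4\xbb\xa4\n"

# Decode as UTF-8 to recover the human-readable shell message.
message = stderr.decode("utf-8").strip()
print(message)


def ascp_available() -> bool:
    """Hypothetical helper: report whether an `ascp` executable is on the PATH."""
    return shutil.which("ascp") is not None


print("ascp found on PATH:", ascp_available())
```

If the second line prints False, adding the directory that contains `ascp` to PATH is the first thing to try.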

Also, how do I provide multiple download methods? I tried -m ena-ascp,ena-ftp,prefetch and it throws:

error: argument -m/--download_methods/--download-methods: invalid choice: error
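
The `kingfisher get -h` usage text quoted in another issue on this page lists `-m` as a choice followed by `[{...} ...]`, which is how argparse renders a space-separated list. The sketch below is not Kingfisher's parser; it is a stand-in that assumes `nargs="+"` with `choices`, and the names `build_parser` and `parse_methods` are hypothetical. Under that assumption it shows why a comma-joined value is rejected as an invalid choice while space-separated values are accepted.

```python
import argparse

# Method names copied from the `kingfisher get -h` usage text on this page.
METHODS = ["aws-http", "prefetch", "aws-cp", "gcp-cp", "ena-ascp", "ena-ftp"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in parser, assumed to mirror the -m option."""
    parser = argparse.ArgumentParser(prog="demo-get")
    parser.add_argument(
        "-m", "--download-methods", nargs="+", choices=METHODS, required=True
    )
    return parser


def parse_methods(argv):
    """Return the parsed method list, or None if argparse rejects the input."""
    try:
        return build_parser().parse_args(argv).download_methods
    except SystemExit:
        # argparse prints its error to stderr and exits on invalid input.
        return None
```

With this stand-in, `parse_methods(["-m", "ena-ascp", "ena-ftp", "prefetch"])` returns the three methods in order, while `parse_methods(["-m", "ena-ascp,ena-ftp,prefetch"])` returns None because the whole comma-joined string is checked as a single choice.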

ASCP FAILED

Recently (for the last two weeks), the Aspera server for ENA seems to have been shut down.
When using kingfisher with -m ena-ascp it always gives the error Session Stop (Error: Server aborted session: No such file or directory). I'm not sure whether this is a problem with my network or with ENA.

EDIT: when using -m ena-ftp it works, but I still don't know why Aspera does not.

No FTP download URLs found for run error

Do you know what could be causing this issue? I haven't updated my script since the last time I ran this, and it worked perfectly fine then. Could it be an issue with my Aspera key?

(genome_adaptation_env) -bash-4.1$ ena-fast-download.py ERR598955
09/01/2020 03:52:41 PM INFO: Using aspera ssh key file: $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh
09/01/2020 03:52:41 PM INFO: Querying ENA for FTP paths for ERR598955..
/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/genome_adaptation_env/bin/ena-fast-download.py:67: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
logging.warn("No FTP download URLs found for run {}, cannot continue".format(run_id))
09/01/2020 03:52:42 PM WARNING: No FTP download URLs found for run ERR598955, cannot continue

Help on getting this to work

Sorry for my lack of knowledge on this but I have had trouble getting an ssh key on my Linux server.

Can you give a step by step tutorial on how to get this to run on a remote Linux server?

I’ve installed the aspera client via conda but I’m not sure how to proceed. I would love to use this tool because fastq dump hasn’t been working for me, prefetch takes forever, and the ENA tutorial is extremely confusing for someone who mainly programs in Python.

Any help would be greatly appreciated.

Does not work with python3.9

In Python 3.9, getchildren() was removed. This causes the following error when running kingfisher annotate:

$ kingfisher annotate -r ERR1739691 --debug
08/10/2021 05:45:29 PM INFO: Kingfisher v0.0.1-dev
08/10/2021 05:45:29 PM INFO: Querying NCBI esearch for 1 distinct accessions e.g. ERR1739691
08/10/2021 05:45:29 PM DEBUG: Starting new HTTPS connection (1): eutils.ncbi.nlm.nih.gov:443
08/10/2021 05:45:29 PM DEBUG: https://eutils.ncbi.nlm.nih.gov:443 "GET /entrez/eutils/esearch.fcgi?db=sra&term=ERR1739691%5Baccn%5D&tool=kingfisher&email=kingfisher%40github.com&retmax=1000 HTTP/1.1" 200 None
Traceback (most recent call last):
  File "/opt/homebrew/bin/kingfisher", line 275, in <module>
    main()
  File "/opt/homebrew/bin/kingfisher", line 261, in main
    kingfisher.annotate(
  File "/opt/homebrew/Cellar/kingfisher-download/0.0.1-dev/bin/../kingfisher/__init__.py", line 438, in annotate
    metadata = SraMetadata().efetch_sra_from_accessions(run_identifiers)
  File "/opt/homebrew/Cellar/kingfisher-download/0.0.1-dev/bin/../kingfisher/sra_metadata.py", line 132, in efetch_sra_from_accessions
    ids = list(set([c.text for c in id_list_node.getchildren()]))
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'getchildren'

see: https://docs.python.org/3.8/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.getchildren
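
The failing line calls `getchildren()` on an ElementTree element. The replacement the Python documentation recommends is to iterate over the element directly, or to call `list(elem)`. A minimal sketch of that change follows; the XML string and its IDs are made-up placeholders, not a real esearch response.

```python
import xml.etree.ElementTree as ET

# Hypothetical esearch-style document, for illustration only.
xml_text = (
    "<eSearchResult><IdList>"
    "<Id>111</Id><Id>222</Id><Id>111</Id>"
    "</IdList></eSearchResult>"
)
id_list_node = ET.fromstring(xml_text).find("IdList")

# Iterating over an Element yields its direct children, so getchildren()
# is not needed. A set removes duplicate IDs, as the original line did.
ids = sorted({child.text for child in id_list_node})
print(ids)
```

Iterating over the element also works on older Python 3 versions, so the change does not need a version check.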

AttributeError: 'float' object has no attribute 'split'

Dear @wwood
I am downloading batches of SRA runs with Kingfisher,
but I always get an error like this:

03/09/2023 11:30:56 AM INFO: Kingfisher v0.1.2
03/09/2023 11:30:56 AM INFO: Attempting download method ena-ascp for run ERR1346134 ..
03/09/2023 11:30:56 AM INFO: Using aspera ssh key file: /home/data/user01/Software/mambaforge/envs/sra_download/lib/python3.11/site-packages/kingfisher/data/asperaweb_id_dsa.openssh
03/09/2023 11:30:56 AM INFO: Querying ENA for FTP paths for ERR1346134..
Traceback (most recent call last):
  File "/home/user01/WORKSPACE/Software/mambaforge/envs/sra_download/bin/kingfisher", line 299, in <module>
    main()
  File "/home/user01/WORKSPACE/Software/mambaforge/envs/sra_download/bin/kingfisher", line 244, in main
    kingfisher.download_and_extract(
  File "/home/user01/WORKSPACE/Software/mambaforge/envs/sra_download/lib/python3.11/site-packages/kingfisher/__init__.py", line 47, in download_and_extract
    download_and_extract_one_run(run, **kwargs)
  File "/home/user01/WORKSPACE/Software/mambaforge/envs/sra_download/lib/python3.11/site-packages/kingfisher/__init__.py", line 306, in download_and_extract_one_run
    result = EnaDownloader().download_with_aspera(run_identifier, '.',
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user01/WORKSPACE/Software/mambaforge/envs/sra_download/lib/python3.11/site-packages/kingfisher/ena.py", line 71, in download_with_aspera
    report = self.get_ftp_download_urls(run_id)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user01/WORKSPACE/Software/mambaforge/envs/sra_download/lib/python3.11/site-packages/kingfisher/ena.py", line 47, in get_ftp_download_urls
    ftp_urls = row['fastq_ftp'].split(';')
               ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'float' object has no attribute 'split'

It seems to be an issue with the ENA query API.
How can I solve it?

Best wishes,
Johnsonz
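
One plausible reading of this traceback is that the `fastq_ftp` field came back empty for this run and was loaded as a float NaN (as pandas does for empty cells) rather than as a string; that is an assumption, not something the log confirms. Under that assumption, a defensive split could look like the sketch below. The function name `split_ftp_urls` and the example URLs are hypothetical.

```python
import math


def split_ftp_urls(value):
    """Split a semicolon-separated 'fastq_ftp' field, tolerating missing values.

    Anything that is not a non-empty string (for example a float NaN)
    is treated as "no URLs available" instead of raising AttributeError.
    """
    if not isinstance(value, str) or not value.strip():
        return []
    return [url for url in value.split(";") if url]


# Placeholder URLs, for illustration only.
print(split_ftp_urls("ftp.example.org/a_1.fastq.gz;ftp.example.org/a_2.fastq.gz"))
print(split_ftp_urls(math.nan))
```

An empty result can then be reported as "No FTP download URLs found", so the caller can fall through to another download method instead of crashing.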

SRA API change

I believe the SRA API has changed in the past few weeks and am now getting the following error whenever I use ena-fast-download:

ena-fast-download.py:60: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
logging.warn("No FTP download URLs found for run {}, cannot continue".format(run_id))

No FTP download URLs found for run ERRXXXXXX, cannot continue

Any chance of an update?

ena-ascp fails with ascp v4.2

Hi,
I've been having trouble getting ena-ascp to run due to changes in version 4.2 of Aspera Connect (and maybe all 4.x versions). Basically, the required SSH key, asperaweb_id_dsa.openssh, is not bundled with v4.2 (tested on 3 different computers/servers). This causes ascp to fail:

$ ascp -T -l 300m -P33001 -k 2 -i /home/z3382651/.aspera/connect/etc/asperaweb_id_dsa.openssh [email protected]:/vol1/fastq/ERR191/005/ERR1916915/ERR1916915_1.fastq.gz .
ascp: Private key file not found at path /home/z3382651/.aspera/connect/etc/asperaweb_id_dsa.openssh, exiting.

Session Stop  (Error: Private key file not found at path /home/z3382651/.aspera/connect/etc/asperaweb_id_dsa.openssh)

These are the contents of ~/.aspera/connect/etc/ with the current 4.2 version:

$ ls -a
.   aspera.conf         asperadrive.sample.conf  aspera_tokenauth_id_rsa  aspera_web_key.pem
..  asperaconnect.path  aspera-license           aspera_web_cert.pem      curl-ca-bundle.crt

For comparison, these are the contents of the same folder with v3.9.8:

$ ls -a
.   aspera.conf         aspera-license           aspera_web_cert.pem       asperaweb_id_dsa.putty  curl-ca-bundle.crt
..  asperaconnect.path  aspera_tokenauth_id_rsa  asperaweb_id_dsa.openssh  aspera_web_key.pem

I was finally able to get an older version (3.9.8) thanks to a link in MakeTheBrainHappy's kingfisher-cloud: https://download.asperasoft.com/download/sw/connect/3.9.8/ibm-aspera-connect-3.9.8.176272-linux-g2.12-64.tar.gz.

I found no documentation whatsoever about the missing asperaweb_id_dsa.openssh, and I spent far more time than I expected trying to find any. I thought it would be a good idea to leave a note here in case anyone else runs into the same issue.

Cheers,

Add `extern` to PyPI dependencies

I installed kingfisher via pip but got the following error:

(VEBA-preprocess_env) [jespinoz@login01 Fastq]$ kingfisher get -h
Traceback (most recent call last):
  File "/expanse/projects/jcl110/anaconda3/envs/VEBA-preprocess_env/bin/kingfisher", line 17, in <module>
    import kingfisher
  File "/expanse/projects/jcl110/anaconda3/envs/VEBA-preprocess_env/lib/python3.7/site-packages/kingfisher/__init__.py", line 8, in <module>
    import extern
ModuleNotFoundError: No module named 'extern'

I then installed extern via pip and it worked:

(VEBA-preprocess_env) [jespinoz@login01 Fastq]$ python --version
Python 3.7.11
(VEBA-preprocess_env) [jespinoz@login01 Fastq]$ pip install extern
Collecting extern
  Downloading extern-0.4.1.tar.gz (6.3 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: extern
  Building wheel for extern (setup.py) ... done
  Created wheel for extern: filename=extern-0.4.1-py3-none-any.whl size=5954 sha256=ee3dafe4cbb68115ef9d60666dedc49fe7696886cb99dd7676e71a4e7015c330
  Stored in directory: /home/jespinoz/.cache/pip/wheels/7b/1c/5f/f408ede1f40464be5726fc8b2fad29923b6f301ea0b29aabdc
Successfully built extern
Installing collected packages: extern
Successfully installed extern-0.4.1
(VEBA-preprocess_env) [jespinoz@login01 Fastq]$ kingfisher get -h
usage: kingfisher get [-h] [--debug] [--version] [--quiet] --run-identifier
                      RUN_IDENTIFIER -m
                      {aws-http,prefetch,aws-cp,gcp-cp,ena-ascp,ena-ftp}
                      [{aws-http,prefetch,aws-cp,gcp-cp,ena-ascp,ena-ftp} ...]
                      [--download-threads DOWNLOAD_THREADS]
                      [-t EXTRACTION_THREADS]
                      [--output-format-possibilities {sra,fastq,fastq.gz,fasta,fasta.gz} [{sra,fastq,fastq.gz,fasta,fasta.gz} ...]]
                      [--force] [--unsorted] [--stdout]
                      [--gcp-project GCP_PROJECT]
                      [--gcp-user-key-file GCP_USER_KEY_FILE]
                      [--aws-user-key-id AWS_USER_KEY_ID]
                      [--aws-user-key-secret AWS_USER_KEY_SECRET]
                      [--allow-paid] [--allow-paid-from-gcp]
                      [--allow-paid-from-aws] [--ascp-ssh-key ASCP_SSH_KEY]
                      [--ascp-args ASCP_ARGS]

Feature Request: ignore error term

When running kingfisher get with --run-identifiers-list, the software exits as soon as one accession fails. Can you add a parameter to ignore such errors and continue downloading the remaining accessions?
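
Until such an option exists, one workaround is to call Kingfisher once per accession from a small wrapper and collect the failures. The sketch below is only that, a wrapper and not a Kingfisher feature. It uses the `kingfisher get -r <run> -m <method>` form that appears in other issues on this page; the function name `download_all` and the injectable `runner` parameter are hypothetical.

```python
import subprocess


def download_all(run_ids, methods=("ena-ftp",), runner=subprocess.run):
    """Run `kingfisher get` once per accession and keep going after failures.

    Returns the list of accessions whose download command failed.
    `runner` is injectable so the loop can be exercised without Kingfisher.
    """
    failed = []
    for run_id in run_ids:
        cmd = ["kingfisher", "get", "-r", run_id, "-m", *methods]
        try:
            runner(cmd, check=True)
        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            # Record the failure and continue with the next accession.
            print(f"{run_id} failed: {exc}")
            failed.append(run_id)
    return failed
```

The returned list can be written to a file and retried later, possibly with a different download method.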

Feature Request: support sralite format.

SRA data are available either with full base quality scores (SRA Normalized Format), or with simplified quality scores (SRA Lite), depending on user preference. Both formats can be streamed on demand to the same filetypes (fastq, sam, etc.), so they are both compatible with existing workflows and applications that expect quality scores. However, the SRA Lite format is much smaller, enabling a reduction in storage footprint and data transfer times, allowing dumps to complete faster. The SRA toolkit defaults to using the SRA Normalized Format that includes full, per-base quality scores, but users can opt to use simplified quality scores in their analysis by requesting the SRA Lite version to save time on their data transfers.

Can you please add download parameters to support the smaller and more concise sralite format?

Failed to authenticate

I got the error below.
How can I fix it?


python ena-fast-download.py ERR962744
03/12/2021 02:27:29 PM INFO: Using aspera ssh key file: $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh
03/12/2021 02:27:29 PM INFO: Querying ENA for FTP paths for ERR962744..
03/12/2021 02:28:13 PM INFO: Downloading 1 FTP read set(s): ftp.sra.ebi.ac.uk/vol1/fastq/ERR962/ERR962744/ERR962744.fastq.gz
03/12/2021 02:28:13 PM INFO: Running command: ascp -T -l 300m -P33001  -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh [email protected]:/vol1/fastq/ERR962/ERR962744/ERR962744.fastq.gz .
ascp: failed to authenticate, exiting.

Session Stop  (Error: failed to authenticate)
Traceback (most recent call last):
  File "ena-fast-download.py", line 115, in <module>
    subprocess.check_call(cmd,shell=True)
  File "/root/anaconda2/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'ascp -T -l 300m -P33001  -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh [email protected]:/vol1/fastq/ERR962/ERR962744/ERR962744.fastq.gz .' returned non-zero exit status 1

kingfisher executable file not found in $PATH in Singularity

Hi Ben,

I have been using it through conda and docker and had no problems but recently moved over to a cluster that predominately uses singularity. I am not too familiar with singularity so this is probably just a problem on my end but any insight would be super helpful.

module load singularity/3.11.4
singularity pull --name kingfisher:0.3.0.sif docker://wwood/kingfisher:0.3.0

singularity exec kingfisher:0.3.0.sif kingfisher
FATAL: "kingfisher": executable file not found in $PATH

I've replicated this with all versions of kingfisher available on DockerHub

Here is the output of singularity inspect:

singularity inspect kingfisher:0.3.0.sif
org.label-schema.build-arch: amd64
org.label-schema.build-date: Friday_22_September_2023_11:7:54_AWST
org.label-schema.schema-version: 1.0
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: wwood/kingfisher:0.3.0
org.label-schema.usage.singularity.version: 3.11.4
org.opencontainers.image.created: 2022-11-29T15:51:20.245Z
org.opencontainers.image.description: Rapid builds of small Conda-based containers using micromamba.
org.opencontainers.image.licenses: Apache-2.0
org.opencontainers.image.revision: 4427b199720b9962a7c135fb159fbce50e1ba7b8
org.opencontainers.image.source: https://github.com/mamba-org/micromamba-docker
org.opencontainers.image.title: micromamba-docker
org.opencontainers.image.url: https://github.com/mamba-org/micromamba-docker
org.opencontainers.image.version: latest

Thanks for the great tool.

ENA-FTP download aborted

Thanks for this great tool, I've been using it frequently to pull data from repositories.

However, as of today Kingfisher no longer resolves ena-ftp downloads correctly. I'm using Docker:

docker run -v 'pwd':/data wwood/kingfisher:0.3.1 get -p SRP098789 -m ena-ftp

This results in:

10/16/2023 08:55:26 AM INFO: Kingfisher v0.3.1
10/16/2023 08:55:28 AM INFO: Attempting download method ena-ftp for run SRR5350745 ..
10/16/2023 08:55:28 AM INFO: Querying ENA for FTP paths for SRR5350745..
10/16/2023 08:55:28 AM INFO: Downloading ftp.sra.ebi.ac.uk/vol1/fastq/SRR535/005/SRR5350745/SRR5350745.fastq.gz ..

10/16 08:55:28 [NOTICE] Downloading 1 item(s)

10/16 08:55:28 [ERROR] CUID#7 - Download aborted. URI=ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR535/005/SRR5350745/SRR5350745.fastq.gz
Exception: [AbstractCommand.cc:351] errorCode=3 URI=ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR535/005/SRR5350745/SRR5350745.fastq.gz
-> [FtpNegotiationCommand.cc:318] errorCode=3 Resource not found

10/16 08:55:28 [NOTICE] Download GID#89d113d914f82d74 not complete: /data/SRR5350745.fastq.gz

Download Results:
gid |stat|avg speed |path/URI
======+====+===========+=======================================================
89d113|ERR | 0B/s|/data/SRR5350745.fastq.gz

Status Legend:
(ERR):error occurred.

aria2 will resume download if the transfer is restarted.
If there are any errors, then see the log file. See '-l' option in help/man page for details.
10/16/2023 08:55:28 AM WARNING: Method ena-ftp failed, error was Command 'aria2c -x8 -o SRR5350745.fastq.gz 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR535/005/SRR5350745/SRR5350745.fastq.gz'' returned non-zero exit status 3.
10/16/2023 08:55:28 AM WARNING: Method ena-ftp failed
Traceback (most recent call last):
  File "/tmp/kingfisher-download/bin/kingfisher", line 309, in <module>
    main()
  File "/tmp/kingfisher-download/bin/kingfisher", line 254, in main
    kingfisher.download_and_extract(
  File "/tmp/kingfisher-download/bin/../kingfisher/__init__.py", line 52, in download_and_extract
    download_and_extract_one_run(run, **kwargs)
  File "/tmp/kingfisher-download/bin/../kingfisher/__init__.py", line 338, in download_and_extract_one_run
    raise Exception("No more specified download methods, cannot continue")
Exception: No more specified download methods, cannot continue

Ena-ascp download gives 'Deprecated peer license' error

It seems ascp downloads for ENA are not working anymore. At least at my end. I have tried with aspera connect version 4.1 and 4.2. This is what I get:

kingfisher get -m ena-ascp -r SRR616044
03/13/2024 03:53:13 PM INFO: Kingfisher v0.4.1
03/13/2024 03:53:13 PM INFO: Attempting download method ena-ascp for run SRR616044 ..
03/13/2024 03:53:13 PM INFO: Using aspera ssh key file: /usr/local/lib/python3.11/site-packages/kingfisher/data/asperaweb_id_dsa.openssh
03/13/2024 03:53:13 PM INFO: Querying ENA for FTP paths for SRR616044..
03/13/2024 03:53:13 PM INFO: Downloading 1 FTP read set(s): ftp.sra.ebi.ac.uk/vol1/fastq/SRR616/SRR616044/SRR616044.fastq.gz
03/13/2024 03:53:13 PM INFO: Running command: ascp -T -l 300m -P33001 -k 2 -i /usr/local/lib/python3.11/site-packages/kingfisher/data/asperaweb_id_dsa.openssh [email protected]:/vol1/fastq/SRR616/SRR616044/SRR616044.fastq.gz .
03/13/2024 03:53:14 PM WARNING: Error downloading from ENA with ASCP: Command ascp -T -l 300m -P33001 -k 2 -i /usr/local/lib/python3.11/site-packages/kingfisher/data/asperaweb_id_dsa.openssh [email protected]:/vol1/fastq/SRR616/SRR616044/SRR616044.fastq.gz . returned non-zero exit status 1.
STDERR was: b''STDOUT was: b'\r\nSession Stop  (Error: Failure processing peer license: Deprecated peer license)\n'
03/13/2024 03:53:14 PM WARNING: Method ena-ascp failed
Traceback (most recent call last):
  File "/usr/local/bin/kingfisher", line 323, in <module>
    main()
  File "/usr/local/bin/kingfisher", line 266, in main
    kingfisher.download_and_extract(
  File "/usr/local/lib/python3.11/site-packages/kingfisher/__init__.py", line 72, in download_and_extract
    download_and_extract_one_run(run, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/kingfisher/__init__.py", line 365, in download_and_extract_one_run
    raise Exception("No more specified download methods, cannot continue")
Exception: No more specified download methods, cannot continue

Is this a general error?

Get .sra files without conversion

Hi,
Kingfisher is a wonderful tool for downloading and it really helps me a lot.
I have a question about using it: what should I do if I just want the .sra files and not FASTQ?
I find that Kingfisher automatically converts .sra files into FASTQ when I use the following command:

kingfisher get -r SRR7692286 -m aws-http prefetch -t 16

Your tutorial mentions that the default value of --output-format-possibilities is fastq. Should I change it to another format?

Thank you in advance!

Unexpected behavior when downloading fastq using SRA identifier

https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&page_size=10&acc=SRR13615821&display=metadata

I ran kingfisher and it pulled 3 FASTQ files for 1 record: one single-end file and two paired-end files.

(base) [jespinoz@exp-15-28 split_reads]$ kingfisher --version
0.3.1

ID=SRR13615821
kingfisher get -r ${ID} -m aws-http -f fastq.gz

I thought that maybe one was interleaved but the read sizes didn't match up:

(base) [jespinoz@exp-15-28 Fastq]$ seqkit stats SRR13615821_1.fastq.gz SRR13615821_2.fastq.gz split_reads/SRR13615821.fastq.gz
processed files:  3 / 3 [======================================] ETA: 0s. done
file                              format  type   num_seqs        sum_len  min_len  avg_len  max_len
SRR13615821_1.fastq.gz            FASTQ   DNA     808,228    197,172,014       35      244      301
SRR13615821_2.fastq.gz            FASTQ   DNA     808,228    199,461,172       21    246.8      301
split_reads/SRR13615821.fastq.gz  FASTQ   DNA   5,860,790  1,438,979,322       35    245.5      301

The above files were what were downloaded by kingfisher.

Note: I moved SRR13615821.fastq.gz into a separate folder to split the reads but BBSuite said there were no pairs:

(base) [jespinoz@exp-15-28 split_reads]$ repair.sh in=SRR13615821.fastq.gz out1=SRR13615821_1.fastq.gz out2=SRR13615821_2.fastq.gz
java -ea -Xmx84979m -cp /expanse/projects/jcl110/miniconda3/opt/bbmap-39.01-1/current/ jgi.SplitPairsAndSingles rp in=SRR13615821.fastq.gz out1=SRR13615821_1.fastq.gz out2=SRR13615821_2.fastq.gz
Executing jgi.SplitPairsAndSingles [rp, in=SRR13615821.fastq.gz, out1=SRR13615821_1.fastq.gz, out2=SRR13615821_2.fastq.gz]

Set INTERLEAVED to false
Started output stream.

Input:                  	5860790 reads 		1438979322 bases.
Result:                 	5860790 reads (100.00%) 	1438979322 bases (100.00%)
Pairs:                  	0 reads (0.00%) 	0 bases (0.00%)
Singletons:             	5860790 reads (100.00%) 	1438979322 bases (100.00%)

Time:                         	36.897 seconds.
Reads Processed:       5860k 	158.84k reads/sec
Bases Processed:       1438m 	39.00m bases/sec

The above is me trying to split the reads manually.

Do you know what could be happening?

Install kingfisher on M1 MacBook Pro

When I run conda env create -n kingfisher -f kingfisher.yml on my M1 Mac, the following error occurs:
Solving environment: failed

ResolvePackageNotFound:

  • aria2[version='>=1.36.0']
  • sra-tools
  • sracat

I can't install these three packages using conda.

How to split files during the kingfisher get download process?

Thanks for your wonderful contribution.
I have a problem: when I download paired-end single-cell SRA data, Kingfisher automatically converts it into a single FASTQ file, whereas I need it split into three FASTQ files.
However, I cannot find a Kingfisher parameter corresponding to the --split-files option that fastq-dump provides. Could you help me solve this problem?

Sincerely

Installing with kingfisher.yaml leads to aria2c issue + fix

Installing with the provided environment YAML and then running a download leads to the following output:

12/16/2021 03:54:06 AM INFO: Kingfisher v0.0.1-dev
12/16/2021 03:54:06 AM INFO: Attempting download method aws-http for run SRR5036382 ..
12/16/2021 03:54:07 AM INFO: Found ODP link https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR5036382/SRR5036382
12/16/2021 03:54:07 AM INFO: Downloading .SRA file from AWS Open Data Program HTTP link using aria2c ..
aria2c: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

Running ldd on aria2c then shows the missing libraries:

ldd $(which aria2c) 
        linux-vdso.so.1 =>  (0x00007ffff3bdf000)
        libz.so.1 => .../kingfisher/bin/../lib/libz.so.1 (0x00007fd57e252000)
        libxml2.so.2 => .../kingfisher/bin/../lib/libxml2.so.2 (0x00007fd57e0e9000)
        libsqlite3.so.0 => .../kingfisher/bin/../lib/libsqlite3.so.0 (0x00007fd57df1d000)
        libssl.so.1.0.0 => not found
        libcrypto.so.1.0.0 => not found
        libssh2.so.1 => .../kingfisher/bin/../lib/libssh2.so.1 (0x00007fd57e08b000)
        libcares.so.2 => .../kingfisher/bin/../lib/libcares.so.2 (0x00007fd57e071000)
        libstdc++.so.6 => .../kingfisher/bin/../lib/libstdc++.so.6 (0x00007fd57dd72000)

The environment created from kingfisher.yaml had openssl==3.0, so the following conda correction solved it in my case:
conda install -c bioconda openssl=1.0

Posting in case anyone else has this issue come up.

Failed to Authenticate

I'm getting the following error when running the software on a particular ENA dataset:

  File "../../software/ena-fast-download/ena-fast-download.py", line 63, in <module>
    subprocess.check_call(cmd,shell=True)
  File "/home/ec2-user/miniconda3/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'ascp -QT -l 300m -P33001 -i $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh [email protected]:/vol1/fastq/SRR524/000/SRR5245600/SRR5245600.fastq.gz .' returned non-zero exit status 1.

Any ideas?

kingfisher error

Hello wwood,

I got an error when using kingfisher. Could you help me? Thank you.

kingfisher get -r ERR1739691 -m ena-ascp

07/17/2021 04:23:33 PM INFO: Attempting download method ena-ascp ..
07/17/2021 04:23:33 PM INFO: Using aspera ssh key file: $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh
07/17/2021 04:23:33 PM INFO: Querying ENA for FTP paths for ERR1739691..
Traceback (most recent call last):
  File "/public4/home/sc56340/miniconda3/bin/kingfisher", line 261, in <module>
    main()
  File "/public4/home/sc56340/miniconda3/bin/kingfisher", line 224, in main
    kingfisher.download_and_extract(
  File "/public4/home/sc56340/miniconda3/lib/python3.9/site-packages/kingfisher/__init__.py", line 213, in download_and_extract
    result = EnaDownloader().download_with_aspera(run_identifier, '.',
  File "/public4/home/sc56340/miniconda3/lib/python3.9/site-packages/kingfisher/ena.py", line 56, in download_with_aspera
    ftp_urls = self.get_ftp_download_urls(run_id)
  File "/public4/home/sc56340/miniconda3/lib/python3.9/site-packages/kingfisher/ena.py", line 20, in get_ftp_download_urls
    text = extern.run("curl --silent '{}'".format(query_url))
  File "/public4/home/sc56340/miniconda3/lib/python3.9/site-packages/extern/__init__.py", line 41, in run
    raise ExternCalledProcessError(process, command)
extern.ExternCalledProcessError: Command curl --silent 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=ERR1739691&result=read_run&fields=fastq_ftp' returned non-zero exit status 1.
STDERR was: b STDOUT was: b

Allow multiple bioproject ids in input

The description of the BioProject argument indicates that multiple inputs are supported, but only one is accepted:

BioProject IDs number(s) to download/extract from e.g. PRJNA621514 or SRP260223

kingfisher annotate error when accession has a blank attribute

E.g. ERR2178284 has nothing in "# of Spots"

Command:

kingfisher annotate -r ERR2178284 -f tsv > kingfisher_metadata.tsv

Error:

10/18/2022 01:51:23 PM INFO: Kingfisher v0.0.1-dev
10/18/2022 01:51:23 PM INFO: Querying NCBI esearch for 1 distinct accessions e.g. ERR2178284
10/18/2022 01:51:25 PM INFO: Querying NCBI efetch for 1 distinct IDs e.g. 5212983
Traceback (most recent call last):
  File "/mnt/hpccs01/work/microbiome/conda/envs/kingfisher/bin/kingfisher", line 290, in <module>
    main()
  File "/mnt/hpccs01/work/microbiome/conda/envs/kingfisher/bin/kingfisher", line 275, in main
    kingfisher.annotate(
  File "/mnt/hpccs01/work/microbiome/sw/kingfisher-download/bin/../kingfisher/__init__.py", line 554, in annotate
    metadata = SraMetadata().efetch_sra_from_accessions(run_identifiers)
  File "/mnt/hpccs01/work/microbiome/sw/kingfisher-download/bin/../kingfisher/sra_metadata.py", line 207, in efetch_sra_from_accessions
    metadata = self.efetch_metadata_from_ids(webenv, accessions, len(sra_ids))
  File "/mnt/hpccs01/work/microbiome/sw/kingfisher-download/bin/../kingfisher/sra_metadata.py", line 142, in efetch_metadata_from_ids
    d2['spots'] = try_get(lambda: int(run.attrib['total_spots']))
  File "/mnt/hpccs01/work/microbiome/sw/kingfisher-download/bin/../kingfisher/sra_metadata.py", line 79, in try_get
    return func()
  File "/mnt/hpccs01/work/microbiome/sw/kingfisher-download/bin/../kingfisher/sra_metadata.py", line 142, in <lambda>
    d2['spots'] = try_get(lambda: int(run.attrib['total_spots']))
KeyError: 'total_spots'
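
The traceback ends in a `KeyError` because the run element has no `total_spots` attribute. A minimal sketch of a tolerant lookup is shown below. It is not the project's `try_get` helper; the function name `optional_int` and the two XML snippets are hypothetical, and it relies only on the standard `Element.attrib` dictionary.

```python
import xml.etree.ElementTree as ET


def optional_int(element, name):
    """Return an attribute as int, or None when it is absent or blank."""
    raw = element.attrib.get(name)
    if raw is None or raw.strip() == "":
        return None
    return int(raw)


# Hypothetical RUN elements, one with and one without the attribute.
run_with = ET.fromstring('<RUN accession="X1" total_spots="42"/>')
run_without = ET.fromstring('<RUN accession="X2"/>')

print(optional_int(run_with, "total_spots"))
print(optional_int(run_without, "total_spots"))
```

Returning None lets the metadata table leave the cell empty instead of aborting the whole annotate run.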
