bionode / bionode-ncbi



Node.js module for working with the NCBI API (aka e-utils).

Home Page: bionode.io

License: MIT License

JavaScript 99.71% Dockerfile 0.29%

bionode tool api-client bioinformatics nodejs

bionode-ncbi's Issues

We need a smaller assembly from NCBI for testing

The Guillardia theta assembly file is around 50 MB. This wasn't a problem when the tests downloaded it from NCBI every time. However, we now mock the NCBI API with a cached copy of the file (so that tests don't break every week due to NCBI-side changes). The file is therefore stored in git-lfs, and with Travis CI testing we quickly exceed GitHub's free git-lfs bandwidth of 1 GB/month. Maybe there's a small virus, bacterium, or DNA region on NCBI with a much smaller assembly.

pagination

For example, http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=gds&term=GSE48968&field=ACCN&usehistory=y has 2428 results, but we can only fetch 1000 at a time. We should implement a batched fetcher that paginates through all of the result IDs.
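E-utilities pages through a result set with the `retstart` (offset) and `retmax` (page size) parameters. A minimal sketch of the batching step, assuming the total `count` is already known from a first esearch request; `buildPageUrls` is a hypothetical name and the page size of 1000 is taken from the issue.

```javascript
// Sketch: build one esearch URL per page of results.
// Assumes `total` comes from the `count` of an initial esearch request.
const EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

function buildPageUrls(query, total, pageSize = 1000) {
  const urls = [];
  for (let retstart = 0; retstart < total; retstart += pageSize) {
    const params = new URLSearchParams({
      ...query,
      retmode: 'json',
      retstart: String(retstart), // offset of the first ID in this page
      retmax: String(pageSize),   // maximum number of IDs in this page
    });
    urls.push(`${EUTILS}esearch.fcgi?${params}`);
  }
  return urls;
}

// Usage: the GSE48968 query above reports 2428 results.
const pages = buildPageUrls({ db: 'gds', term: 'GSE48968', field: 'ACCN' }, 2428);
console.log(pages.length); // 3
```

Each page could then be fetched in sequence and its `idlist` concatenated, which keeps every request within the per-request limit.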

Modularize lib structure.

I would like to modularize the library by splitting it into different modules instead of keeping it in one giant file. The idea is to keep context while developing, make it easier to implement new features without refactoring old logic, and reduce the amount of duplicated code.
This should also make it easier for new contributors to venture into adding new logic.

I would like to start by:

Split the modules by command.
Extract common logic to utils modules.

What do you think?
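A minimal sketch of what "extract common logic to utils modules" could look like, collapsed into one file so it runs as-is. The file paths in the comments and the function names are hypothetical; `usehistory`, `query_key`, and `WebEnv` are standard E-utilities parameters.

```javascript
// Hypothetical layout, shown in one file for brevity:
//   lib/utils/eutils.js     -> buildUrl (shared by every command)
//   lib/commands/search.js  -> searchUrl
//   lib/commands/summary.js -> summaryUrl
const EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

// lib/utils/eutils.js: one place for the base URL and query-string encoding.
function buildUrl(tool, params) {
  return `${EUTILS}${tool}.fcgi?${new URLSearchParams({ retmode: 'json', ...params })}`;
}

// lib/commands/search.js: usehistory=y keeps the result set on NCBI's side.
function searchUrl(db, term) {
  return buildUrl('esearch', { db, term, usehistory: 'y' });
}

// lib/commands/summary.js: reuses the history (query_key + WebEnv) of a search.
function summaryUrl(db, queryKey, webEnv) {
  return buildUrl('esummary', { db, query_key: queryKey, WebEnv: webEnv });
}
```

With this split, each command module only describes its own parameters, and a shared change (such as a new base URL) touches a single file.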

Implement efetch API

Currently, bionode-ncbi uses NCBI esearch and esummary, which allow it to fetch metadata and figure out where to download datasets. However, to fetch only specific sequences, such as genes, we need to implement efetch.
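A minimal sketch of how an efetch call might be built, assuming Node.js 18+ for the global `fetch`. `efetchUrl` and `fetchSequence` are hypothetical names; `db`, `id`, `rettype`, and `retmode` are standard E-utilities parameters, where `rettype=fasta` returns FASTA and `rettype=gb` a GenBank flat file for nucleotide records.

```javascript
// Sketch: retrieve a single sequence record with efetch (Node.js 18+ for fetch).
const EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

function efetchUrl(db, id, rettype = 'fasta') {
  const params = new URLSearchParams({ db, id, rettype, retmode: 'text' });
  return `${EUTILS}efetch.fcgi?${params}`;
}

async function fetchSequence(db, id, rettype) {
  const res = await fetch(efetchUrl(db, id, rettype));
  if (!res.ok) throw new Error(`efetch failed with HTTP ${res.status}`);
  return res.text(); // plain-text FASTA or GenBank record
}

// Usage (makes a network request):
// fetchSequence('nuccore', 'AF068820.2').then(console.log);
```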

TypeError: Cannot create property 'Run' on string ' '

I am running into the following error when searching the SRA for all Staphylococcus aureus sequences:

Command executed:

 bionode-ncbi search sra "Staphylococcus aureus" > results.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  events.js:165
        throw er; // Unhandled 'error' event
        ^

  TypeError: Cannot create property 'Run' on string '                                                                '
      at /home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/nested-property/index.js:54:27
      at Array.reduce (<anonymous>)
      at Object.setNestedProperty [as set] (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/nested-property/index.js:53:26)
      at Transform.cb [as _transform] (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:136:49)
      at Transform._read (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:184:10)
      at Transform._write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:172:12)
      at doWrite (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_writable.js:237:10)
      at writeOrBuffer (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_writable.js:227:5)
      at Transform.Writable.write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_writable.js:194:11)
      at Pumpify.Duplexify._write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/duplexify/index.js:214:22)
  Emitted 'error' event at:
      at Parser.exports.Parser.Parser.parseString (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/xml2js/lib/parser.js:326:16)
      at /home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/xml2js/lib/parser.js:5:59
      at parseXMLString (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:399:5)
      at parseXMLProperty (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:386:7)
      at /home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/async/lib/async.js:122:13
      at _each (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/async/lib/async.js:46:13)
      at Object.async.each (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/async/lib/async.js:121:9)
      at Transform.parser [as _transform] (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:379:11)
      at Transform._read (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:184:10)
      at Transform._write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:172:12)
  .command.stub: line 98: 106508 Terminated              nxf_trace "$pid" .command.trace
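The trace suggests that a nested property set on a path ending in `Run` reached an intermediate value that was a whitespace-only string, presumably text left between XML elements by the parser. In strict-mode JavaScript, assigning a property to a primitive string throws exactly this `TypeError`. A minimal sketch of a defensive setter, assuming that diagnosis; `setNested` is a hypothetical helper, not the nested-property or tool-stream API.

```javascript
// Hypothetical defensive setter for dotted paths such as 'Runs.Run'.
// Whitespace-only strings (XML formatting residue) are replaced by objects;
// any other primitive raises a descriptive error instead of the opaque one.
function setNested(target, path, value) {
  const keys = path.split('.');
  let node = target;
  for (const key of keys.slice(0, -1)) {
    const current = node[key];
    if (current === undefined || (typeof current === 'string' && current.trim() === '')) {
      node[key] = {};
    } else if (current === null || typeof current !== 'object') {
      throw new TypeError(`Cannot set "${path}": "${key}" holds a ${typeof current}`);
    }
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return target;
}
```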

Use progress bar on download

It would be better to use a progress bar when downloading archives.
All details should be printed to the console only when the --verbose flag is used.
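A minimal sketch of a single-line progress bar driven by download status objects like the ones bionode-ncbi prints (`status`, `path`, `progress`), assuming `progress` is a percentage from 0 to 100. `renderBar` and `report` are hypothetical names.

```javascript
// Render "[#####-----] 50%" style bars; `progress` is clamped to 0-100.
function renderBar(progress, width = 30) {
  const pct = Math.max(0, Math.min(100, progress));
  const filled = Math.round((pct / 100) * width);
  return `[${'#'.repeat(filled)}${'-'.repeat(width - filled)}] ${pct.toFixed(0)}%`;
}

function report(event, verbose, out = process.stderr) {
  if (verbose) {
    out.write(JSON.stringify(event) + '\n'); // full details only with --verbose
    return;
  }
  // '\r' returns to the start of the line so the bar is redrawn in place.
  out.write(`\r${event.path} ${renderBar(event.progress)}`);
  if (event.status === 'completed') out.write('\n');
}
```

Writing to stderr keeps stdout free for any NDJSON output that other tools might pipe.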

geo is not searchable

add mapping to gst

Download the GFF file of the Cycas taitungensis mitochondrion

Hi. I'd like to download the annotation in GFF format of the Cycas taitungensis mitochondrion, NC_010303.1. I started with bionode ncbi search genome cycas taitungensis, but got nada back. Any suggestions?

rna_from_genomic instead of genomic.fna

Some species that do this are

Salmonella enterica
Staphylococcus aureus

I'll provide more details later; just making a note for now.

Tests are broken because of NCBI side metadata changes (again)

Probably easy to fix (just update the test data), but right now I can't look into it.
As a long-term solution, we could use a mocking framework to replicate the currently expected NCBI behaviour instead of relying on any specific metadata.
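A minimal, dependency-free sketch of that idea, assuming the code under test receives its HTTP function as a parameter instead of calling the network directly. `fixtureKey` and `createFakeFetch` are hypothetical; an HTTP-mocking library such as nock covers similar ground by intercepting requests.

```javascript
// Normalise a URL so that query-parameter order does not affect matching.
function fixtureKey(url) {
  const u = new URL(url);
  u.searchParams.sort();
  return u.toString();
}

// Build a fake fetch that answers only from recorded NCBI responses.
function createFakeFetch(fixtures) {
  const table = new Map(
    Object.entries(fixtures).map(([url, body]) => [fixtureKey(url), body])
  );
  // Mimics only the part of the fetch Response interface the tests need.
  return async function fakeFetch(url) {
    const key = fixtureKey(url);
    if (!table.has(key)) throw new Error(`No recorded fixture for ${key}`);
    const body = table.get(key);
    return { ok: true, status: 200, json: async () => body, text: async () => JSON.stringify(body) };
  };
}
```

Recording fixtures once and failing loudly on unknown URLs means NCBI-side metadata changes no longer break the test suite.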

retmax

Are you planning to support an option for requesting a different number of results?

Thanks.

Get a sequence from a Genbank ID

Is it possible to get a single sequence given a Genbank ID?

I tried:

bionode-ncbi search Genbank AF068820.2

Which returns nothing.

If not, this would be useful 😸

Downloading one run downloads the whole Bioproject

Hi,

I'm using bionode-ncbi to download a large number of files from the SRA. It has been working very well so far except for 54 samples, all belonging to the same Bioproject.

Bioproject: PRJNA277291
Example sample: SRR1834189

if I do

bionode-ncbi download sra SRR1834189

it downloads all 54 samples related to the Bioproject instead of only SRR1834189, as it should.

I was quite dumbfounded at first, but after a bit of digging, it appears that all the offending samples link to the same Experiment ID (SRX900319) instead of having one experiment ID per sample like my other downloads, which confuses bionode-ncbi.

Indeed, when I do

bionode-ncbi search sra --pretty SRR1834189

it returns the whole experiment (SRX900319) instead of the run.

Thanks in advance for your help (and thank you for the good work at bionode, it's been a great resource so far!)

Hadrien.
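If a run accession resolves to an experiment record that lists many runs, the downloader could filter by the requested run before downloading. A minimal sketch; the record shape `{ uid, runs: [{ acc }] }` and `selectRuns` are assumptions for illustration, not bionode-ncbi's actual data model.

```javascript
// Keep only the requested run when an SRA experiment lists several runs.
// Run accessions start with SRR, ERR or DRR; other queries keep every run.
function selectRuns(experiment, accession) {
  if (!/^[SED]RR\d+$/.test(accession)) return experiment.runs;
  const matches = experiment.runs.filter(run => run.acc === accession);
  if (matches.length === 0) {
    throw new Error(`Run ${accession} not found in experiment ${experiment.uid}`);
  }
  return matches;
}
```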

Uncaught, unspecified "error" event

I'm running a script that starts:

#!/bin/bash -ue
bionode-ncbi download assembly GCA_000320565.2 > /dev/null
bionode-ncbi download gff GCA_000320565.2 > /dev/null

which exits with exit code 8:

events.js:74
        throw TypeError('Uncaught, unspecified "error" event.');
              ^
TypeError: Uncaught, unspecified "error" event.
    at TypeError (<anonymous>)
    at EventEmitter.emit (events.js:74:15)
    at Duplexify._destroy (/usr/local/lib/node_modules/bionode-ncbi/node_modules/pumpify/node_modules/duplexify/index.js:183:15)
    at /usr/local/lib/node_modules/bionode-ncbi/node_modules/pumpify/node_modules/duplexify/index.js:174:10
    at process._tickCallback (node.js:415:13)

Using:

[email protected]
[email protected]

Is there a good way to debug this error?
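In Node.js, an `EventEmitter` (every stream is one) that emits `'error'` with no listener attached throws, which crashes the process; old Node versions reported "Uncaught, unspecified "error" event" when the emitted value was not an `Error` object. One way to debug is to call the library from JavaScript and attach an `'error'` listener to the stream it returns, so the underlying cause is printed. A minimal sketch using only Node's built-in `events` module; `guard` is a hypothetical helper.

```javascript
const { EventEmitter } = require('events');

// Attach an 'error' listener so errors are reported instead of thrown.
function guard(emitter, label) {
  emitter.on('error', err => {
    // err can be undefined when a stream is destroyed without a reason.
    const message = err && err.message ? err.message : 'unknown error (no Error object given)';
    console.error(`${label}: ${message}`);
  });
  return emitter;
}

const stream = guard(new EventEmitter(), 'download assembly GCA_000320565.2');
stream.emit('error', new Error('socket hang up')); // logged, not thrown
```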

Get rid of git-lfs

Git Large File Storage is hard to deal with and has costs on GitHub. It was only used for a 50 MB assembly file used in the NCBI mock tests, yet we still easily exceed the free bandwidth limit of 1 GB/month. Until we find a smaller one (#24), we need to disable the relevant tests and get rid of git-lfs.

Error handling when NCBI connection is lost

I often lose my connection to the NCBI server (for some mysterious reason related to my router). Interestingly, when the connection is lost, bionode-ncbi prints an endless stream of error messages like the following:

try 21http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
try 22http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
try 23http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
...

In these instances, it would be better if bionode-ncbi returned a clearer error message, such as:

"cannot download assembly Stenotrophomonas ginsengisoli because it was unable to establish a connection to NCBI link (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y)"

The current error is not NDJSON, so other programs or scripts cannot even parse it. However, I'd like to discuss whether a parsable NDJSON error would be useful in any circumstance, because in my experience, if I cannot connect, I have to reset my router.
My idea is to convert the current string error message to NDJSON, but would that be useful? In which circumstances?

Alternatively, when we get errors related to the connection to NCBI, we could output a clearer error message to be more user-friendly.
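A minimal sketch of both ideas together: give up after a bounded number of attempts with a growing delay, then attach a single NDJSON line that scripts can parse. `fetchWithRetries` and `toErrorRecord` are hypothetical names; the global `fetch` requires Node.js 18+.

```javascript
// One machine-readable line describing why the request was abandoned.
function toErrorRecord(action, url, err, attempts) {
  return JSON.stringify({
    error: `cannot ${action}: unable to connect to NCBI after ${attempts} attempts`,
    url,
    cause: err ? err.message : null,
  });
}

async function fetchWithRetries(url, { action, maxAttempts = 5, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url);
      if (res.ok) return res;
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network failure, e.g. DNS lookup or connection reset
    }
    // Linear backoff between attempts; no wait after the final one.
    if (attempt < maxAttempts) await new Promise(r => setTimeout(r, delayMs * attempt));
  }
  const error = new Error(`unable to connect to NCBI after ${maxAttempts} attempts`);
  error.record = toErrorRecord(action, url, lastError, maxAttempts); // NDJSON for scripts
  throw error;
}
```

A CLI could print `error.message` for humans and `error.record` when NDJSON output is requested.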

Add Blast API

Hey there. I've ported the BLAST API as a rudimentary example here: ncbi-lib.
I noticed it is missing from this great NCBI library, so I'd be willing to provide some help and source code to add a more feature-rich version to bionode.

What do you guys think?
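For reference, NCBI's BLAST Common URL API follows three steps: submit with `CMD=Put` to receive a request ID (RID), poll with `CMD=Get&FORMAT_OBJECT=SearchInfo` until the status is `READY`, then retrieve results with `CMD=Get`. A minimal sketch of the request builders and response parsers; the function names are hypothetical, and NCBI's usage guidelines (such as polling a given RID no more than about once a minute) should be checked against the current documentation.

```javascript
const BLAST = 'https://blast.ncbi.nlm.nih.gov/Blast.cgi';

// 1. Submit: POST these form parameters; the response contains "RID = ...".
function submitParams(program, database, query) {
  return new URLSearchParams({ CMD: 'Put', PROGRAM: program, DATABASE: database, QUERY: query });
}

function parseRid(body) {
  const match = body.match(/RID = (\S+)/);
  return match ? match[1] : null;
}

// 2. Poll: the response reports Status=WAITING, READY, FAILED or UNKNOWN.
function statusUrl(rid) {
  return `${BLAST}?${new URLSearchParams({ CMD: 'Get', FORMAT_OBJECT: 'SearchInfo', RID: rid })}`;
}

function parseStatus(body) {
  const match = body.match(/Status=(\w+)/);
  return match ? match[1] : 'UNKNOWN';
}

// 3. Retrieve results once the status is READY.
function resultsUrl(rid, formatType = 'XML') {
  return `${BLAST}?${new URLSearchParams({ CMD: 'Get', FORMAT_TYPE: formatType, RID: rid })}`;
}
```

On Node.js 18+, submission could be `fetch(BLAST, { method: 'POST', body: submitParams('blastn', 'nt', sequence) })`, where a `URLSearchParams` body is sent as form data.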

Unexpected error when running 'bionode'

I did a global install of bionode, which installed /usr/bin/bionode in my $PATH.

When I run "bionode" I get this error:

Please check the documentation at http://doc.bionode.io
events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: spawn ENOENT
    at errnoException (child_process.js:1001:11)
    at Process.ChildProcess._handle.onexit (child_process.js:792:34)

Is this expected behaviour?

I notice it downloaded and installed a bunch of tools, and I can see them in /usr/lib/node_modules/bionode/node_modules/, including a subfolder bionode-ncbi, but why is there no bionode-ncbi tool I can run?

I've never really used npm/Node software before, so any help appreciated.

User-friendly CLI

As mentioned in bionode/bionode#25 and in other issues, we need to improve our CLI. Switching from minimist to yargs seems to be the solution.

Output some warning to terminal when there is no output

Currently, running bionode-ncbi search assembly someweirdcombinationofcharacters produces no output at all. It would be nice to print a warning in these cases.
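A minimal sketch of the warning, assuming results are available as an array; with a stream, the same zero-count check would go in its end or flush handler. `printResults` is a hypothetical name.

```javascript
// Print results as NDJSON on stdout; warn on stderr when there are none,
// so piped output stays clean while the user still sees the warning.
function printResults(results, query, out = process.stdout, err = process.stderr) {
  for (const item of results) out.write(JSON.stringify(item) + '\n');
  if (results.length === 0) {
    err.write(`warning: no results found for "${query}"\n`);
  }
  return results.length;
}
```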

Provide a user-friendly error when unable to access NCBI servers

Right now it's usually an error like:

TypeError: Cannot read property 'count' of undefined
    at DestroyableTransform.transform [as _transform] (/usr/local/lib/node_modules/bionode-ncbi/lib/bionode-ncbi.js:162:48)
[...]
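The `count` error suggests the code read `esearchresult.count` from a response that was not the expected esearch JSON (for example an HTML error page or an empty body). A minimal sketch of validating the body first; `parseSearchCount` is hypothetical, while `esearchresult.count` (reported as a string) is the field esearch uses in its JSON output.

```javascript
// Validate an esearch JSON response before reading esearchresult.count.
function parseSearchCount(body, url) {
  let data;
  try {
    data = typeof body === 'string' ? JSON.parse(body) : body;
  } catch (err) {
    throw new Error(`NCBI returned a non-JSON response for ${url} (${err.message})`);
  }
  const result = data && data.esearchresult;
  if (!result || result.count === undefined) {
    throw new Error(`Unexpected response from NCBI for ${url}; is the server reachable?`);
  }
  return Number(result.count); // count arrives as a string, e.g. "2428"
}
```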

Dependencies error: Prototype Pollution (lodash dependencies)

Hello, I'm trying to start a new project with bionode.js and build an API with TypeORM and TypeScript,
and when I tried to add bionode, it threw dependency errors.

According to npm, the solution is to update lodash to 4.17.11 (here).

command-line argument parsing error

$ bionode-ncbi help
/Users/rds45/.nvm/v0.10.24/lib/node_modules/bionode-ncbi/cli.js:44
var ncbiStream = options ? ncbi[command](options) : ncbi[command](arg1, arg2,
                                        ^
TypeError: Property 'undefined' of object #<Object> is not a function
    at Object.<anonymous> (/Users/rds45/.nvm/v0.10.24/lib/node_modules/bionode-ncbi/cli.js:44:41)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:902:3
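The trace shows `ncbi[command]` being called while `command` is `undefined`. A minimal sketch of checking the command before dispatching, so `help` or a typo prints usage instead of a `TypeError`; `dispatch` and `usage` are hypothetical, `commands` stands in for the `ncbi` object in cli.js, and the usage text is illustrative.

```javascript
// List only own, callable properties so inherited names like 'toString' are ignored.
function usage(commands) {
  const names = Object.keys(commands).filter(name => typeof commands[name] === 'function');
  return `Usage: bionode-ncbi <command> [arguments]\nCommands: ${names.join(', ')}`;
}

function dispatch(commands, command, args) {
  const known = Boolean(command) &&
    Object.prototype.hasOwnProperty.call(commands, command) &&
    typeof commands[command] === 'function';
  if (!known || command === 'help') {
    return { ok: false, message: usage(commands) };
  }
  return { ok: true, value: commands[command](...args) };
}
```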

How to look up metadata for an SRR file

Thanks to you, I am able to download all the SRA files associated with a project and convert them to FASTQ. My question is how I can get the metadata, say, for all the files under SRP011546. GEO has a metadata file, but all the identifiers are GEO-based, e.g. GSE36552 and the like, and there's no mapping of GEO to SRA IDs (e.g. SRR490990 in this project). What I'd like to get is the equivalent of Title: "Oocyte #1" from GEO or Library name: GSM922167: 8-cell embryo#2 -Cell#6 from DNA Nexus. Can the bionode-ncbi search function be used to find this information?

download in the examples printing too much info

I used the example code for the download option, and the download completes properly, but a lot of output is printed to the terminal, like this example:

{ uid: '244018',
  url: 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/188/075/GCF_000188075.1_Si_gnG/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  path: '244018/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  status: 'downloading',
  total: 116654441,
  progress: 100,
  speed: 2046569.1403508773 }

It would be better to show a progress bar, like the one in the examples, than to print every state change.
Also, after the download completes, the console freezes on the following output:

{ uid: '244018',
  url: 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/188/075/GCF_000188075.1_Si_gnG/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  path: '244018/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  status: 'completed',
  total: 116654441,
  progress: 100,
  speed: 'NA',
  size: '111 MB' }

Finally, to get out of this, I had to press Ctrl+C before I could enter new commands.
