bionode / bionode-ncbi
Node.js module for working with the NCBI API (aka e-utils).
Home Page: bionode.io
License: MIT License
The Guillardia theta assembly file is around 50 MB. This wasn't a problem when the tests were downloading it from NCBI every time. However, we are now mocking the NCBI API with a cached version of the file (so that tests don't break every week due to NCBI-side changes). The file is therefore stored in git-lfs, and with TravisCI testing we quickly reach the 1 GB/month free GitHub git-lfs bandwidth limit. Maybe there's a small virus, bacterium, or DNA region on NCBI with a much smaller assembly.
For example, http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=gds&term=GSE48968&field=ACCN&usehistory=y has 2428 results, but we can only fetch 1000 at a time. We should implement a batched fetcher that paginates through all of the result IDs.
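A minimal sketch of the batching arithmetic, not the library's implementation. It assumes the paging is done with the E-utilities retstart and retmax parameters; the helper names batchWindows and batchUrls are hypothetical, and the 1000-per-request limit is taken from the description above.

```javascript
// Hypothetical helper: split `total` results into { retstart, retmax } windows.
// retstart/retmax are E-utilities paging parameters; everything else is illustrative.
function batchWindows (total, batchSize) {
  if (!Number.isInteger(total) || total < 0) {
    throw new RangeError('total must be a non-negative integer')
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer')
  }
  const windows = []
  for (let retstart = 0; retstart < total; retstart += batchSize) {
    // The last window may be smaller than batchSize.
    windows.push({ retstart, retmax: Math.min(batchSize, total - retstart) })
  }
  return windows
}

// Hypothetical helper: one request URL per window, appended to a base query URL.
function batchUrls (baseUrl, total, batchSize) {
  return batchWindows(total, batchSize).map(({ retstart, retmax }) =>
    `${baseUrl}&retstart=${retstart}&retmax=${retmax}`)
}

// 2428 results in batches of 1000 gives three windows: 1000, 1000 and 428.
console.log(batchWindows(2428, 1000))
```

Each URL could then be requested in sequence, so that the results still arrive as one continuous stream.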
I would like to modularize the library, splitting it into different modules instead of having it in one giant file. The idea is to keep context while developing, allowing for easier implementation of new features without refactoring old logic, and to reduce the amount of duplicated code.
This should also make it easier for new contributors to venture into adding new logic.
I would like to start by:
What do you think?
I am running into the following error when searching the SRA for all Staphylococcus aureus sequences:
Command executed:
bionode-ncbi search sra "Staphylococcus aureus" > results.txt
Command exit status:
1
Command output:
(empty)
Command error:
events.js:165
throw er; // Unhandled 'error' event
^
TypeError: Cannot create property 'Run' on string ' '
at /home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/nested-property/index.js:54:27
at Array.reduce (<anonymous>)
at Object.setNestedProperty [as set] (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/nested-property/index.js:53:26)
at Transform.cb [as _transform] (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:136:49)
at Transform._read (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:184:10)
at Transform._write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:172:12)
at doWrite (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_writable.js:237:10)
at writeOrBuffer (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_writable.js:227:5)
at Transform.Writable.write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_writable.js:194:11)
at Pumpify.Duplexify._write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/duplexify/index.js:214:22)
Emitted 'error' event at:
at Parser.exports.Parser.Parser.parseString (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/xml2js/lib/parser.js:326:16)
at /home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/xml2js/lib/parser.js:5:59
at parseXMLString (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:399:5)
at parseXMLProperty (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:386:7)
at /home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/async/lib/async.js:122:13
at _each (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/async/lib/async.js:46:13)
at Object.async.each (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/async/lib/async.js:121:9)
at Transform.parser [as _transform] (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:379:11)
at Transform._read (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:184:10)
at Transform._write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:172:12)
.command.stub: line 98: 106508 Terminated nxf_trace "$pid" .command.trace
It would be better to use a progress bar when downloading archives.
Only when the --verbose flag is used should all details be printed to the console.
add mapping to gst
Hi. I'd like to download the annotation in GFF format of the Cycas taitungensis mitochondrion, NC_010303.1. I started with bionode ncbi search genome cycas taitungensis, but got nada back. Any suggestions?
Some species that do this are
I'll provide more details later; just making a note for now.
Probably easy to fix (just update the tests data) but right now I can't look into it.
As a long-term solution, we could maybe use some mock framework to replicate the currently expected NCBI behaviour and not rely on any specific metadata.
Are you planning to support passing an option for a different number of required results?
Thanks.
Is it possible to get a single sequence given a Genbank ID?
I tried:
bionode-ncbi search Genbank AF068820.2
Which returns nothing.
If not, this would be useful.
Hi,
I'm using bionode-ncbi to download a large number of files from the SRA. It has been working very well so far, except for 54 samples that all belong to the same BioProject.
If I do
bionode-ncbi download sra SRR1834189
it downloads the 54 samples related to the BioProject, instead of only SRR1834189 like it should.
I was quite dumbfounded at first, but after a bit of digging it appears that all the offending samples link to the same Experiment ID (SRX900319), instead of having one experiment ID per sample like my other downloads, which confuses bionode-ncbi.
Indeed, when I do
bionode-ncbi search sra --pretty SRR1834189
it returns the whole experiment (SRX900319) instead of the run.
Thanks in advance for your help (and thank you for the good work at bionode, it's been a great resource so far!)
Hadrien.
I'm running a script that starts:
#!/bin/bash -ue
bionode-ncbi download assembly GCA_000320565.2 > /dev/null
bionode-ncbi download gff GCA_000320565.2 > /dev/null
which exits with exitcode 8:
events.js:74
throw TypeError('Uncaught, unspecified "error" event.');
^
TypeError: Uncaught, unspecified "error" event.
at TypeError (<anonymous>)
at EventEmitter.emit (events.js:74:15)
at Duplexify._destroy (/usr/local/lib/node_modules/bionode-ncbi/node_modules/pumpify/node_modules/duplexify/index.js:183:15)
at /usr/local/lib/node_modules/bionode-ncbi/node_modules/pumpify/node_modules/duplexify/index.js:174:10
at process._tickCallback (node.js:415:13)
Using:
Is there a good way to debug this error?
Git Large File Storage (git-lfs) is hard to deal with and has costs on GitHub. It was only used for a 50 MB assembly file needed by the NCBI mock tests, yet we still easily go over the 1 GB/month free bandwidth limit. Until we find a smaller file (#24), we need to disable the relevant tests and get rid of git-lfs.
I often lose the connection to the NCBI server (for some mysterious reason related to my router). The interesting thing I found is that when the connection is lost, bionode-ncbi returns an endless error message like the following:
try 21http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
try 22http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
try 23http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
...
In these instances it would be better if bionode-ncbi returned a clearer error message, like
"cannot download assembly Stenotrophomonas ginsengisoli because it was unable to establish a connection to NCBI link (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y)"
The current error is not NDJSON, so it is not even parsable by other programs or scripts. However, I want to discuss whether you think a parsable NDJSON error would be useful in any circumstance, because in my experience, if I cannot connect, I have to reset my router.
My idea is that we could convert the current string error message to NDJSON, but would it be useful? In which circumstances?
Alternatively, if we get error messages related to the connection to NCBI, we could output a clearer error message in order to be more user-friendly.
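To make the NDJSON option concrete, here is a hedged sketch of what a single parsable error line could look like. The function name connectionErrorLine and every field name in the object are hypothetical; bionode-ncbi does not currently emit anything like this.

```javascript
// Hypothetical helper: serialise a connection failure as one NDJSON line.
// All field names (status, type, message, url, attempts) are illustrative only.
function connectionErrorLine (url, attempts) {
  return JSON.stringify({
    status: 'error',
    type: 'connection',
    message: `Unable to establish a connection to NCBI after ${attempts} attempts`,
    url,
    attempts
  })
}

const url = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y'

// One object per line, so downstream tools can parse errors like any other record.
console.log(connectionErrorLine(url, 23))
```

A line like this would replace the repeated "try N" output once a retry limit is reached, and a script consuming the stream could branch on the status field.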
I did a global install of bionode, which installed /usr/bin/bionode in my $PATH.
When I run "bionode" I get this error:
Please check the documentation at http://doc.bionode.io
events.js:72
throw er; // Unhandled 'error' event
^
Error: spawn ENOENT
at errnoException (child_process.js:1001:11)
at Process.ChildProcess._handle.onexit (child_process.js:792:34)
Is this expected behaviour?
I notice it downloaded and installed a bunch of tools, and I can see them in /usr/lib/node_modules/bionode/node_modules/, including a subfolder bionode-ncbi, but I don't have any tool called bionode-ncbi that I can run?
I've never really used npm/Node software before, so any help appreciated.
As mentioned in bionode/bionode#25 and in other issues, we need to improve our CLI. Switching from minimist to yargs seems to be the solution.
Now, if we run bionode-ncbi search assembly someweirdcombinationofcharacters, this should render no output at all. Maybe it would be nice to have a warning in these cases.
Right now it's usually an error like:
TypeError: Cannot read property 'count' of undefined
at DestroyableTransform.transform [as _transform] (/usr/local/lib/node_modules/bionode-ncbi/lib/bionode-ncbi.js:162:48)
[...]
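One way to avoid this kind of TypeError is to check the response before reading the count. The sketch below assumes the failing access is on an esearch JSON response shaped like { esearchresult: { count: "..." } }; the function name getResultCount is hypothetical, and this is not the code at lib/bionode-ncbi.js:162.

```javascript
// Hypothetical guard: return the result count as a number, or null when the
// response does not contain one (for example, for an unrecognised search term).
function getResultCount (response) {
  const result = response && response.esearchresult
  if (!result || result.count === undefined) return null
  const count = Number(result.count)
  return Number.isNaN(count) ? null : count
}

// A caller can then warn instead of crashing.
const count = getResultCount({})
if (count === null) {
  console.warn('Warning: NCBI returned no result count for this query')
}
```

Returning null rather than 0 keeps "no count in the response" distinguishable from "zero matches", so the two cases can produce different warnings.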
Hello, I'm trying to start a new project with bionode and build an API with TypeORM and TypeScript.
When I tried to add bionode, npm threw an error on its dependencies.
The solution npm suggests is to update lodash to 4.17.11 (here)
$ bionode-ncbi help
/Users/rds45/.nvm/v0.10.24/lib/node_modules/bionode-ncbi/cli.js:44
var ncbiStream = options ? ncbi[command](options) : ncbi[command](arg1, arg2,
^
TypeError: Property 'undefined' of object #<Object> is not a function
at Object.<anonymous> (/Users/rds45/.nvm/v0.10.24/lib/node_modules/bionode-ncbi/cli.js:44:41)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:902:3
Thanks to you, I am able to download all the SRA files associated with a project and convert them to FASTQ. My question is how I can get the metadata information, say for all the files under SRP011546. Under GEO, there is a metadata file, but all the identifiers are GEO-based, i.e. GSE36552 and the like, and there's no mapping of GEO to SRA IDs (e.g. SRR490990 in this project). What I'd like to get is the equivalent of Title: "Oocyte #1" from GEO or Library name: GSM922167: 8-cell embryo#2 -Cell#6 from DNA Nexus. Can the bionode-ncbi search function be used to find this information?
I used the example code for the DOWNLOAD option and the download completes properly, but a lot of output is printed to the terminal, like this example:
{ uid: '244018',
url: 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/188/075/GCF_000188075.1_Si_gnG/GCF_000188075.1_Si_gnG_genomic.fna.gz',
path: '244018/GCF_000188075.1_Si_gnG_genomic.fna.gz',
status: 'downloading',
total: 116654441,
progress: 100,
speed: 2046569.1403508773 }
It would be better to have some nice output with a progress bar, like the one shown in the examples, rather than a separate print for each state.
Also, after the download is completed, the console freezes at the following output:
{ uid: '244018',
url: 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/188/075/GCF_000188075.1_Si_gnG/GCF_000188075.1_Si_gnG_genomic.fna.gz',
path: '244018/GCF_000188075.1_Si_gnG_genomic.fna.gz',
status: 'completed',
total: 116654441,
progress: 100,
speed: 'NA',
size: '111 MB' }
Finally, to escape this, I had to press Ctrl+C in the console in order to be able to continue inputting new code.
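As an illustration of the progress bar suggested above, the status objects could be collapsed into a single line. This is a minimal sketch that only assumes the progress (0 to 100) and status fields shown in the output; the function name formatProgress and the bar style are hypothetical.

```javascript
// Hypothetical formatter: turn one status object into a one-line progress bar.
// Only the `progress` and `status` fields from the printed objects are used.
function formatProgress (update, width) {
  width = width || 20
  // Clamp to 0..100 so a malformed value cannot break the bar.
  const pct = Math.max(0, Math.min(100, Number(update.progress) || 0))
  const filled = Math.round((pct / 100) * width)
  const bar = '#'.repeat(filled) + '-'.repeat(width - filled)
  return `[${bar}] ${pct.toFixed(0)}% ${update.status}`
}

console.log(formatProgress({ progress: 50, status: 'downloading' }, 10))
// prints "[#####-----] 50% downloading"
```

Writing the line with a leading carriage return ('\r') to process.stderr would redraw it in place instead of printing one object per update.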