bionode / bionode-ncbi

Node.js module for working with the NCBI API (aka e-utils).

Home Page: bionode.io

License: MIT License

JavaScript 99.71% Dockerfile 0.29%
api-client bioinformatics bionode nodejs tool

bionode-ncbi's Introduction

bionode.io

bionode

Modular and universal bioinformatics

Install

You need to install the latest Node.js first. Check nodejs.org or do the following:

# Ubuntu
sudo apt-get install npm
# Mac
brew install node
# Both
npm install -g n
n stable

To use bionode as a command line tool, you can install it globally with -g.

npm install bionode -g

Or, if you want to use it as a JavaScript library, install it in your local project folder (inside the node_modules directory) by running the same command without -g.

npm i bionode # 'i' can be used as shortcut to 'install'

Documentation

Check our documentation at doc.bionode.io

Modules list

For a complete list of bionode modules, please check the repositories with the "tool" tag

Contributing

We welcome all kinds of contributions at all levels of experience. Please read CONTRIBUTING.md to get started!

Communication channels

Don't be shy! Come talk to us 😃

Who's using Bionode?

For a list of some projects or institutions that we know of, check the USERS.md file. If you think you should be on that list or know who should, let us know! :D

Acknowledgements

We would like to thank all the people and institutions listed below!

bionode-ncbi's People

Contributors

ayangromano, bmpvieira, istar-eldritch, katrinleinweber, max-mapper, stuntspt, terfilip, thejmazz, tiagofilipe12

bionode-ncbi's Issues

Get a sequence from a Genbank ID

Is it possible to get a single sequence given a Genbank ID?

I tried:

bionode-ncbi search Genbank AF068820.2

Which returns nothing.

If not, this would be useful 😸

Downloading one run downloads the whole Bioproject

Hi,

I'm using bionode-ncbi to download a large number of files from the SRA. It has been working very well so far except for 54 samples, all belonging to the same Bioproject.

  • Bioproject: PRJNA277291
  • Example sample: SRR1834189

If I do

bionode-ncbi download sra SRR1834189

it downloads the 54 samples related to the bioproject, instead of only SRR1834189 like it should.

I was quite dumbfounded at first, but after a bit of digging it appears that all the offending samples link to the same Experiment ID (SRX900319), instead of having one experiment ID per sample like my other downloads, which confuses bionode-ncbi.

Indeed, when I do

bionode-ncbi search sra --pretty SRR1834189

it returns the whole experiment (SRX900319) instead of the run.

Thanks in advance for your help (and thank you for the good work at bionode, it's been a great resource so far!)

Hadrien.

Uncaught, unspecified "error" event

I'm running a script that starts:

#!/bin/bash -ue
bionode-ncbi download assembly GCA_000320565.2 > /dev/null
bionode-ncbi download gff GCA_000320565.2 > /dev/null

which exits with exitcode 8:

events.js:74
        throw TypeError('Uncaught, unspecified "error" event.');
              ^
TypeError: Uncaught, unspecified "error" event.
    at TypeError (<anonymous>)
    at EventEmitter.emit (events.js:74:15)
    at Duplexify._destroy (/usr/local/lib/node_modules/bionode-ncbi/node_modules/pumpify/node_modules/duplexify/index.js:183:15)
    at /usr/local/lib/node_modules/bionode-ncbi/node_modules/pumpify/node_modules/duplexify/index.js:174:10
    at process._tickCallback (node.js:415:13)

Using:

Is there a good way to debug this error?
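One way to get at the underlying cause: in Node.js, an emitter or stream that emits 'error' with no listener attached throws, which matches the trace above. The sketch below applies when the module is used as a library rather than from the shell. `logStreamErrors` is a hypothetical helper, not part of bionode-ncbi, and the emitter in the example is a stand-in for whichever stream is failing.

```javascript
const { EventEmitter } = require('events')

// Hypothetical helper: attach an 'error' listener so the real cause is
// reported instead of Node throwing on an unhandled 'error' event.
function logStreamErrors (emitter, label, log = console.error) {
  emitter.on('error', (err) => {
    const message = err instanceof Error ? (err.stack || err.message) : String(err)
    log(`[${label}] ${message}`)
  })
  return emitter
}

// Stand-in emitter; in a real script this would be the stream in question.
const fake = logStreamErrors(new EventEmitter(), 'download')
fake.emit('error', new Error('connection reset'))
```

With the listener in place the process keeps running and the original error message and stack are printed, which is usually enough to see what failed.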

retmax

Are you planning to support an option for setting the number of results returned?

thanks.
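For context, the E-utilities esearch endpoint itself accepts `retmax` (20 by default) and `retstart`. A minimal sketch of how such an option could be passed through to the request URL; `buildEsearchUrl` and its option names are hypothetical and do not describe how bionode-ncbi builds its requests.

```javascript
// Sketch: build an esearch URL with an explicit result cap.
// `retmax` and `retstart` are standard E-utilities parameters; the helper
// name and its defaults are assumptions made for illustration.
const ESEARCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

function buildEsearchUrl ({ db, term, retmax = 20, retstart = 0 }) {
  if (!Number.isInteger(retmax) || retmax < 1) {
    throw new RangeError('retmax must be a positive integer')
  }
  const params = new URLSearchParams({
    db,
    term,
    retmode: 'json',
    retmax: String(retmax),
    retstart: String(retstart)
  })
  return `${ESEARCH}?${params}`
}

console.log(buildEsearchUrl({ db: 'sra', term: 'SRP011546', retmax: 50 }))
```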

Add Blast API

Hey there. I've ported the Blast API as a rudimentary example here, ncbi-lib.
I've seen it is missing in this great NCBI library so I'd be willing to provide some help and source code to add a more feature-rich version to bionode

What do you guys think?

Unexpected error when running 'bionode'

I did a global install of bionode, which installed /usr/bin/bionode in my $PATH.

When I run "bionode" I get this error:

Please check the documentation at http://doc.bionode.io
events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: spawn ENOENT
    at errnoException (child_process.js:1001:11)
    at Process.ChildProcess._handle.onexit (child_process.js:792:34)

Is this expected behaviour?

I notice it downloaded and installed a bunch of tools, and I can see them in /usr/lib/node_modules/bionode/node_modules/ including a subfolder bionode-ncbi but I don't have any tool called bionode-ncbi I can run?

I've never really used npm/Node software before, so any help appreciated.

Modularize lib structure.

I would like to modularize the library, splitting it into different modules instead of having it in one giant file. The idea is to keep context while developing, allowing easier implementation of new features without refactoring old logic, and to reduce the amount of duplicated code.
This should also make it easier for new contributors to venture into adding new logic.

I would like to start by:

  • Split the modules by command.
  • Extract common logic to utils modules.

What do you think?

How to look up metadata for an SRR file

Thanks to you, I am able to download all the SRA files associated with a project and convert them to FASTQ. My question is: how can I get the metadata, say for all the files under SRP011546? Under GEO there is a metadata file, but all the identifiers are GEO-based, i.e. GSE36552 and the like, and there's no mapping of GEO to SRA IDs (e.g. SRR490990 in this project). What I'd like to get is the equivalent of Title: "Oocyte #1" from GEO or Library name: GSM922167: 8-cell embryo#2 -Cell#6 from DNA Nexus. Can the bionode-ncbi search function be used to find this information?

We need a smaller assembly from NCBI for testing

The Guillardia theta assembly file is around 50 MB. This wasn't a problem when the tests were downloading it from NCBI every time. However, we are now mocking the NCBI API with a cached version of the file (so that tests don't break every week due to NCBI-side changes). The file is therefore stored in git-lfs, and with TravisCI testing we quickly reach the 1 GB/month free GitHub git-lfs bandwidth. Maybe there's a small virus, bacterium or DNA region on NCBI with a much smaller assembly.

Get rid of git-lfs

Git Large File Storage is hard to deal with and has costs on GitHub. It was only used for a 50 MB assembly file that was used in NCBI mock tests. Yet we still manage to easily go over the 1 GB/month free bandwidth limit. Until we find a smaller one (#24) we need to disable the relevant tests and get rid of git-lfs.

download in the examples printing too much info

I used the example code for the DOWNLOAD option and the download completes properly, but a lot of status objects are printed to the terminal, like this one:

{ uid: '244018',
  url: 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/188/075/GCF_000188075.1_Si_gnG/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  path: '244018/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  status: 'downloading',
  total: 116654441,
  progress: 100,
  speed: 2046569.1403508773 }

It would be better to have clean output with a progress bar, like the one shown in the examples, rather than a print for each state.
Also, after the download is completed the console freezes after printing the following:

{ uid: '244018',
  url: 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/188/075/GCF_000188075.1_Si_gnG/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  path: '244018/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  status: 'completed',
  total: 116654441,
  progress: 100,
  speed: 'NA',
  size: '111 MB' }

Finally, to escape this, I had to press "ctrl+c" in the console in order to be able to continue entering new commands.

Error handling when NCBI connection is lost

I often lose the connection to the NCBI server (for some mysterious reason related to my router), and I found that when the connection is lost, bionode-ncbi prints an endless stream of retry messages like the following:

try 21http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
try 22http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
try 23http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
...

In these instances it would be better if bionode-ncbi returned a clearer error message, like:

"cannot download assembly Stenotrophomonas ginsengisoli because it was unable to establish a connection to NCBI link (http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y)"

The current error is not NDJSON, so it is not even "parsable" by other programs or scripts. But I want to discuss whether you think a "parsable" NDJSON error would be useful in any circumstance, because in my experience, if I cannot connect, I have to reset my router.
My idea is that we can convert the current string error message to NDJSON, but would it be useful? In which circumstances?

Alternatively, if we get error messages related to the connection to NCBI, we could output a clearer error message in order to be more user-friendly.
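To make the two options concrete, here is a minimal sketch that caps the number of tries and then produces a single NDJSON error line carrying a readable message. The field names, the cap of 5 and both function names are assumptions for illustration; nothing here reflects bionode-ncbi's current retry logic.

```javascript
// Sketch: give up after a bounded number of tries and report the failure
// as one NDJSON line. Field names and the retry cap are assumptions.
const MAX_TRIES = 5

function connectionError ({ db, term, url, tries }) {
  return JSON.stringify({
    status: 'error',
    type: 'connection',
    message: `cannot reach NCBI for ${db} "${term}" after ${tries} tries`,
    db,
    term,
    url,
    tries
  })
}

// Decide what to do after a failed attempt: retry, or fail with an error line.
function nextStep (request, tries) {
  return tries < MAX_TRIES
    ? { action: 'retry', tries: tries + 1 }
    : { action: 'fail', line: connectionError({ ...request, tries }) }
}

const request = {
  db: 'assembly',
  term: 'Stenotrophomonas ginsengisoli',
  url: 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y'
}
console.log(nextStep(request, MAX_TRIES).line)
```

A line like this stays machine-readable for downstream tools, while the `message` field gives a person the plain explanation.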

Use progress bar on download

It would be better to use a progress bar when downloading archives.
All details should be printed to the console only when a --verbose flag is used.
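A rough sketch of what the non-verbose output could look like, using the `progress`, `speed` and `path` fields seen in the download status objects reported in the issue above. The function name and bar style are hypothetical, and treating `speed` as bytes per second is an assumption.

```javascript
// Sketch: render one download status object as a single progress line.
// Assumes `progress` is a percentage and `speed` is bytes per second.
function renderProgress (status, width = 20) {
  const pct = Math.max(0, Math.min(100, Number(status.progress) || 0))
  const filled = Math.round((pct / 100) * width)
  const bar = '#'.repeat(filled) + '-'.repeat(width - filled)
  const speed = typeof status.speed === 'number'
    ? ` ${(status.speed / 1e6).toFixed(1)} MB/s`
    : '' // e.g. speed: 'NA' once the download has completed
  return `[${bar}] ${pct}% ${status.path}${speed}`
}

console.log(renderProgress({
  path: '244018/GCF_000188075.1_Si_gnG_genomic.fna.gz',
  progress: 100,
  speed: 'NA'
}))
```

In a terminal, writing each line with a leading `\r` to `process.stderr` instead of `console.log` would redraw the bar in place and keep stdout free for the NDJSON results.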

Implement efetch API

Currently bionode-ncbi uses NCBI esearch and esummary, which allows it to fetch metadata and also figure out where to download datasets. However, to be able to fetch only specific sequences, like genes, we need to implement efetch.
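As a starting point, this sketch builds an efetch request URL for a single nucleotide record in FASTA format. `efetch.fcgi` and the `db`, `id`, `rettype` and `retmode` parameters are standard E-utilities names; `buildEfetchUrl` is a hypothetical helper and not part of bionode-ncbi. The accession is the one from the Genbank ID issue in this tracker.

```javascript
// Sketch: build an efetch URL for one record.
// The helper name and its defaults are assumptions made for illustration.
const EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

function buildEfetchUrl ({ db = 'nuccore', id, rettype = 'fasta', retmode = 'text' }) {
  if (!id) throw new TypeError('id (accession or UID) is required')
  const params = new URLSearchParams({ db, id, rettype, retmode })
  return `${EFETCH}?${params}`
}

console.log(buildEfetchUrl({ id: 'AF068820.2' }))
```

The resulting URL could then be requested with whatever HTTP client the module already uses and the response piped through as a stream.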

command-line argument parsing error

$ bionode-ncbi help
/Users/rds45/.nvm/v0.10.24/lib/node_modules/bionode-ncbi/cli.js:44
var ncbiStream = options ? ncbi[command](options) : ncbi[command](arg1, arg2,
                                        ^
TypeError: Property 'undefined' of object #<Object> is not a function
    at Object.<anonymous> (/Users/rds45/.nvm/v0.10.24/lib/node_modules/bionode-ncbi/cli.js:44:41)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:902:3
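
The trace suggests that `help` was looked up as a command on the module's exports and came back undefined. A minimal sketch of a guard that could run before the dispatch on line 44 of cli.js; `dispatch` and `fakeApi` are hypothetical stand-ins, not the real cli.js code or the real bionode-ncbi export.

```javascript
// Sketch: validate the command before dispatching, so `help` or a typo
// produces a usage message instead of a TypeError.
function dispatch (api, command, args) {
  const isCommand = Object.prototype.hasOwnProperty.call(api, command) &&
    typeof api[command] === 'function'
  if (!isCommand) {
    const known = Object.keys(api).filter((k) => typeof api[k] === 'function')
    throw new Error(`Unknown command "${command}". Available: ${known.join(', ')}`)
  }
  return api[command](...args)
}

// Stand-in for the module's exported commands.
const fakeApi = {
  search: (db, term) => `search ${db} ${term}`,
  download: (db, term) => `download ${db} ${term}`
}

console.log(dispatch(fakeApi, 'search', ['sra', 'SRR1834189']))
try {
  dispatch(fakeApi, 'help', [])
} catch (err) {
  console.error(err.message)
}
```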

TypeError: Cannot create property 'Run' on string ' '

I am running into the following error when searching the SRA for all Staphylococcus aureus sequences:

Command executed:

 bionode-ncbi search sra "Staphylococcus aureus" > results.txt

Command exit status:
  1

Command output:
  (empty)

Command error:
  events.js:165
        throw er; // Unhandled 'error' event
        ^

  TypeError: Cannot create property 'Run' on string '                                                                '
      at /home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/nested-property/index.js:54:27
      at Array.reduce (<anonymous>)
      at Object.setNestedProperty [as set] (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/nested-property/index.js:53:26)
      at Transform.cb [as _transform] (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:136:49)
      at Transform._read (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:184:10)
      at Transform._write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:172:12)
      at doWrite (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_writable.js:237:10)
      at writeOrBuffer (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_writable.js:227:5)
      at Transform.Writable.write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_writable.js:194:11)
      at Pumpify.Duplexify._write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/duplexify/index.js:214:22)
  Emitted 'error' event at:
      at Parser.exports.Parser.Parser.parseString (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/xml2js/lib/parser.js:326:16)
      at /home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/xml2js/lib/parser.js:5:59
      at parseXMLString (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:399:5)
      at parseXMLProperty (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:386:7)
      at /home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/async/lib/async.js:122:13
      at _each (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/async/lib/async.js:46:13)
      at Object.async.each (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/async/lib/async.js:121:9)
      at Transform.parser [as _transform] (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/lib/tool-stream.js:379:11)
      at Transform._read (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:184:10)
      at Transform._write (/home/esteinig/miniconda3/envs/nextflow/lib/node_modules/bionode-ncbi/node_modules/tool-stream/node_modules/readable-stream/lib/_stream_transform.js:172:12)
  .command.stub: line 98: 106508 Terminated              nxf_trace "$pid" .command.trace
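
The wording of this TypeError is what V8 produces when strict-mode code assigns a property on a primitive string, so one reading of the trace is that a field expected to hold an object held a whitespace-only string instead. The sketch below shows a guarded nested setter that replaces such a value with an object before descending. `setNested` is a hypothetical stand-in rather than the nested-property API, the `runs.Run` path is guessed from the error message, and whether overwriting the blank string is the right fix for bionode-ncbi remains an open question.

```javascript
// Sketch: set a dotted path on an object, replacing missing or primitive
// intermediates (such as a whitespace-only string) with a fresh object.
function setNested (target, path, value) {
  const keys = path.split('.')
  let node = target
  for (const key of keys.slice(0, -1)) {
    const child = node[key]
    if (child === null || typeof child !== 'object') node[key] = {}
    node = node[key]
  }
  node[keys[keys.length - 1]] = value
  return target
}

// A record whose `runs` field is a blank string instead of an object.
const record = { runs: '      ' }
setNested(record, 'runs.Run', [])
console.log(JSON.stringify(record))
```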
