
bionode's Introduction

bionode.io

bionode

Modular and universal bioinformatics


Install

You need to install the latest Node.js first. Please check nodejs.org or do the following:

# Ubuntu
sudo apt-get install npm
# Mac
brew install node
# Both
npm install -g n
n stable

To use bionode as a command line tool, you can install it globally with -g.

npm install bionode -g

Or, if you want to use it as a JavaScript library, install it locally in your project (it will be placed in the node_modules directory) by running the same command without -g.

npm i bionode # 'i' is a shortcut for 'install'

Documentation

Check our documentation at doc.bionode.io

Modules list

For a complete list of bionode modules, please check the repositories with the "tool" tag.

Contributing

We welcome all kinds of contributions at all levels of experience, please read the CONTRIBUTING.md to get started!

Communication channels

Don't be shy! Come talk to us 😃

Who's using Bionode?

For a list of some projects or institutions that we know of, check the USERS.md file. If you think you should be on that list or know who should, let us know! :D

Acknowledgements

We would like to thank all the people and institutions listed below!

bionode's People

Contributors

alanrice, bmpvieira, c089, katrinleinweber, max-mapper, stuntspt, thejmazz, yannickwurm


bionode's Issues

Autogenerate module template

From a name, generate a template skeleton with everything set up, including badges in README.md (e.g., bionode init)

Make project board public

The organization-wide project board (which gathers all the issues from all the modules) is only visible to members. This makes it hard to show new potential contributors the state of the project, and it's unclear if/when GitHub will allow such boards to be public. In the meantime, we could have a very simple HTML page that lists the cards by column and is autogenerated using data from GitHub's API:

https://developer.github.com/v3/projects/cards/

This could simply be hosted as a gh-page and rebuilt daily using Travis cron jobs. No need to build an Express or React app for this.
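A minimal sketch of the page-generation step: given column data (fetched elsewhere, e.g. from GitHub's projects/cards API), render static HTML for gh-pages. The data shape below is hypothetical, not GitHub's actual response format.

```javascript
// Render project-board columns into a static HTML fragment.
// The { name, cards: [{ note }] } shape is illustrative only.
function renderBoard (columns) {
  return columns.map(function (col) {
    var cards = col.cards
      .map(function (card) { return '<li>' + card.note + '</li>' })
      .join('')
    return '<h2>' + col.name + '</h2><ul>' + cards + '</ul>'
  }).join('\n')
}

// Example: two columns with one card each
var html = renderBoard([
  { name: 'To do', cards: [{ note: 'Write docs' }] },
  { name: 'Done', cards: [{ note: 'Publish module' }] }
])
```

A Travis cron job would only need to fetch the card data, run this, and commit the result to the gh-pages branch.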

Mozilla's Global Sprint (June 1st and 2nd 2017)

About

"Mozilla’s Global Sprint is a fun, fast-paced, two-day collaborative event to hack and build projects for a healthy Internet. A diverse network of scientists, educators, artists, engineers and others come together in person and online to innovate in the open. Get your tickets now!"

Communication channels

Come chat with us at http://gitter.im/bionode/bionode or IRC (freenode) #bionode!

Ways to contribute to Bionode during sprint:

Very little or no programming knowledge required:

Some JavaScript knowledge required:

Challenging issues:

User-friendly CLI

To avoid confusing users we should improve the way command line arguments are parsed. At least it should:

  • show usage information when invoked without arguments, or with -h or --help
  • show a helpful error message if any of the arguments are missing or of the wrong type

If this becomes too cumbersome to implement with the very simple minimist parser, it might be worth considering a full-featured alternative such as yargs.
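The desired behaviour can be sketched as follows. The parser here is a hand-rolled stand-in for illustration (a real implementation would use minimist or yargs), and the usage text and example command are made up:

```javascript
// Sketch of the wanted CLI behaviour: usage on no args or -h/--help,
// and a clear error message when a command's arguments are missing.
var USAGE = 'Usage: bionode <command> [arguments...]'

function parseArgs (argv) {
  if (argv.length === 0 || argv.indexOf('-h') !== -1 || argv.indexOf('--help') !== -1) {
    return { help: true, message: USAGE }
  }
  var command = argv[0]
  if (argv.length < 2) {
    // missing or wrong arguments -> helpful error instead of a crash
    return { error: true, message: 'Missing arguments for command: ' + command + '\n' + USAGE }
  }
  return { command: command, args: argv.slice(1) }
}
```

In a real CLI this would run against `process.argv.slice(2)`, printing `message` and exiting non-zero on error.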

Code of Conduct

Styleguide for tagging

For a more productive organisation, we should probably follow this simple styleguide for tagging issues and avoid creating new tags.

Here's what we should use

For clarity, use only one of these labels! If you feel that you need more than one, your issue might not be specific enough:

  • #EE3F46 Problems (high priority): bug security production
  • #5EBEFF Improvements (generally iteration on code): enhancement optimization refactor
  • #91CA55 Additions (generally new code): feature
  • #CC317C Feedback (some do not involve code): discussion question community
  • #FAD8C7 Testing (needs to be done or only occurs in): test staging
  • #FFC274 Experience (more artistic): design ux
  • #FEF2C0 Mindless (can be done during a break): chore legal documentation examples
  • #D2DAE1 Inactive (gets immediately closed): invalid wontfix duplicate deprecated

These can be attached to the ones above to give more context:

  • #BFD4F2 Platform (can exceptionally be customized to group issues): server browser
  • #FBCA04 Pending (stalled by difficulty or external): help wanted in progress epic on hold

We should use something like github-labels to do this automatically and integrate with #23.
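To automate this, the styleguide above could be expressed as a config consumed by a tool like github-labels. The exact schema such a tool expects may differ; this is just the styleguide data in a machine-readable form:

```javascript
// Label config derived from the styleguide above (subset shown).
// Each label carries the hex colour of its group.
var labels = [
  { name: 'bug', color: 'EE3F46' },        // Problems
  { name: 'security', color: 'EE3F46' },   // Problems
  { name: 'enhancement', color: '5EBEFF' },// Improvements
  { name: 'feature', color: '91CA55' },    // Additions
  { name: 'discussion', color: 'CC317C' }, // Feedback
  { name: 'test', color: 'FAD8C7' },       // Testing
  { name: 'design', color: 'FFC274' },     // Experience
  { name: 'chore', color: 'FEF2C0' },      // Mindless
  { name: 'wontfix', color: 'D2DAE1' }     // Inactive
]
```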

Old Roadmap

Roadmap

This is a WIP roadmap, started as a Mozilla Working Open Workshop 2017 activity.

Modular and universal bioinformatics

Bionode.io is a community that aims to build highly reusable tools and code for bioinformatics by leveraging the Node.js ecosystem and community.

Why

Genomic data is flooding the web, and we need tools that can scale to analyse it quickly and in real time, in order to:

  • Potentially save lives with real-time analysis in scenarios such as rapid response to bacteria or virus outbreaks (especially now with portable real-time DNA sequencers like Oxford Nanopore);
  • Make research advance faster (with quicker data analysis) and be more reusable (with modular tools);
  • Democratize science by reducing the computational resources required for some data analysis (because not everyone has access to terabytes of RAM and petabytes of space) and allowing things to run in a browser without complicated software installations.

Core features

  • Run everywhere: JavaScript/Node.js is the most "write once, run anywhere" programming language available. Bionode data analysis tools and pipelines should run on distributed high performance computing servers for big data, but also locally on user machines and in browser web applications (e.g., genome browsers).
  • Use Streams: The core architecture of the code should be based on Node.js Streams. Streams process data in real time and in chunks, self-regulating through backpressure (i.e., if one step is slow, the whole pipeline adjusts). For example, while you are downloading a data set you could already be analysing it before the download completes, without worrying about latency and timeout issues. In practice this means Bionode pipelines would use fewer computing resources (memory and disk space), do the data analysis in real time, and finish faster than other approaches.

Short term - what we're working on now

  • Funding for full-time development
  • Showcase data analysis pipeline

Medium term - what we're working on next!

  • CWL integration
  • Dat integration

Longer term items - working on this soon!

  • C++ integration
  • Workflow GUIs
  • BioJS integration

Achievements

  • Prototype code and tools
  • GSoC 2016
  • Google Campus Hackathon in London, UK
  • Workshop at the Mozilla Festival 2016 in London, UK
  • Workshop at the Bioinformatics Open Days in Braga, Portugal
  • Workshop at the Institute of Molecular Medicine in Lisbon, Portugal

Follow the JavaScript Standard style

The standard style is probably already widespread in Bionode (and Node.js) code, but it is currently not enforced in Bionode. Having the same style across all modules makes it easier for the team to review code and contribute.
The standard module should be added to the dependencies and to the tests of all bionode modules.


Issues with modules that require binaries (wrappers)

@yannickwurm: "Some major work is required for modules which require binaries. Currently using samtools (etc.) involves downloading & compiling... this is extremely prone to fail & a big maintenance effort. Instead, rely on a local Docker install, so each module relies on a single Dockerfile (thus each module would include a Dockerfile)."

So, either we:

  • Rely on Docker like Yannick says (issue: Docker becomes a dependency);
  • Don't provide the binaries at all (just the wrapper) and push the responsibility of installing onto the user (a poor experience, and versioning is hard to control);
  • Try to provide pre-compiled binaries (but then, we're doing package management...). @maxogden might have some ideas;
  • Turn some of those tools into native addons (hard to do and maintain in sync with upstream).

RequireBin FLAGRANT SYSTEM ERROR

There is a problem requiring bionode in RequireBin.

Should bionode work in RequireBin?

Bundling error:

---FLAGRANT SYSTEM ERROR---

--- error #0: ---

(logs uuid: 518f5c00-a3a1-11e5-ad6f-0d42355767a6 )

Error: "browserify exited with code 1"

code: 1
stderr: Error: Cannot find module 'browserify-fs' from '/tmp/bionode1151116-8415-q9lek3/node_modules/bionode/node_modules/bionode-fasta/lib'
at /home/admin/browserify-cdn/node_modules/browserify/node_modules/browser-resolve/node_modules/resolve/lib/async.js:50:17
at process (/home/admin/browserify-cdn/node_modules/browserify/node_modules/browser-resolve/node_modules/resolve/lib/async.js:119:43)
at /home/admin/browserify-cdn/node_modules/browserify/node_modules/browser-resolve/node_modules/resolve/lib/async.js:128:21
at load (/home/admin/browserify-cdn/node_modules/browserify/node_modules/browser-resolve/node_modules/resolve/lib/async.js:60:43)
at /home/admin/browserify-cdn/node_modules/browserify/node_modules/browser-resolve/node_modules/resolve/lib/async.js:66:22
at /home/admin/browserify-cdn/node_modules/browserify/node_modules/browser-resolve/node_modules/resolve/lib/async.js:21:47
at Object.oncomplete (fs.js:107:15)

dirPath: /tmp/bionode1151116-8415-q9lek3


This is probably an issue with the package, and not browserify-cdn itself.
If you feel differently, feel free to file a bug report at:

https://github.com/jfhbrook/browserify-cdn/issues

Include the ENTIRETY of the contents of this message, and the maintainer(s)
can try to help you out.

Have a nice day!

ES6 Discussion

Continuing from bionode/bionode-fasta#6

Goals:

  • backwards compatibility as much as possible (maybe legacy v0.12.x)
  • help engage new members by not being super bleeding edge
  • never require transpilation of Node code
  • browser code will need to be bundled anyway, so transpilation can be OK there (i.e., browserify with -t babelify, or webpack + babel-loader). Working examples of the browser setup of a module should always be included.
  • follow best streams implementation (probably latest, may be reason to break backwards compatibility, but with major bump as per semver)

ES6 Compatibility Table

ES6 use cases, arguments for, examples in bionode:

  • todo...

Collaboration with BioJS (same module system and package manager)

BioJS was initially a registry of browser components for biological visualization. Bionode is more oriented towards data manipulation (finding, parsing, analysing, etc.) and is more similar to other Bio* libraries like BioPython, BioRuby, etc. When possible, Bionode modules work client and server side, while BioJS worked only in the browser. Consequently, there was no overlap between the two projects. Now BioJS no longer wants to be just a repository, and also wants to work server side. This could be an opportunity for both projects to work together and avoid duplicated effort.

However, there is one major point where we disagree: Bionode uses Node.js CommonJS with Browserify, while the BioJS team wants to move their modules to AMD. BioJS argues that AMD is the only system that allows live module loading, while others require a build step. Bionode went for Browserify because it allows using Node.js core features (like Streams) in the browser. Browserify supports live reload with tools like watchify, gulp, beefy, etc.

The BioJS team suggests discussing the following possibilities for integration between both projects:

  1. use AMD on the server
    e.g. use RequireJS as Node.js module
  2. use a CommonJS bundler for the client
    load compiled modules in the browser (Browserify, RequireJS,...)
  3. define modules two-way: UMD (universal module definition) [4]
    specify them as AMD and CommonJS module (and global browser constants) in parallel
    e.g. commonjsStrictGlobal.js
  4. Bionode ideas
  5. Stop talking

We hope this issue gets some feedback from the bioinformaticians, Node.js and JavaScript communities.

Google Summer of Code 2015

Bionode submitted projects for GSoC 2015 together with BioJS. Unfortunately, BioJS wasn't accepted as an organisation this year (Google dropped many previously accepted organisations and accepted new ones, which is fair).
However, those projects can still be carried out by anyone interested in them, or by students looking for a project.
The following is a copy of the Bionode projects listed on the BioJS page. If you are interested, please reply here or at gitter.im/bionode/bionode. You can also send me an email at [email protected]

Bionode Pipeline Building GUI

Rationale & Approach

Making an easy-to-use graphical user interface for building interactive pipelines would lower the barrier to entry for non-bioinformaticians/programmers. This could be achieved through integration with projects like Galaxy; however, a more interactive/advanced interface such as Node-RED is what we are aiming for. Another good source of interface inspiration would be the NoFlo project. Node-RED or any other open source project can and should be used/adapted as much as possible instead of writing a new interface from scratch.

The resulting interface should output a descriptive text-file representation of the pipeline that can run on the command line without requiring the GUI. For example: Gasket, datscript, hackfile or Makefile.
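To make the idea concrete, here is a sketch of parsing such a text representation. The line-based format and the step names are made up for illustration (loosely in the spirit of datscript/gasket files); a real GUI would emit whatever format the chosen runner expects:

```javascript
// Parse a tiny line-based pipeline description into executable steps.
// Blank lines and '#' comments are ignored; each remaining line is
// "command arg1 arg2 ...".
function parsePipeline (text) {
  return text
    .split('\n')
    .map(function (line) { return line.trim() })
    .filter(function (line) { return line && line[0] !== '#' })
    .map(function (line) {
      var parts = line.split(/\s+/)
      return { command: parts[0], args: parts.slice(1) }
    })
}

// Hypothetical example pipeline
var pipeline = parsePipeline([
  '# download then analyse',
  'bionode-ncbi download sra 35526',
  'bionode-sam index'
].join('\n'))
```

A plain-text format like this versions cleanly in git, which is the point of the "easy versioning, distribution and collaboration" challenge below.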

Challenges

  • Integration between available interfaces and bionode pipeline
  • Producing a simple text format representation of those pipelines for easy versioning, distribution and collaboration.

Involved Tools / Libraries

  • Node-RED
  • NoFlo (for ideas)
  • Galaxy (for ideas)
  • Gasket, Datscript, Hackfiles, Makefiles (for text representation of pipeline).

Needed Skills

  • Backend JavaScript/Node.js
  • Frontend JavaScript
  • Bash
  • CoffeeScript (for NoFlo)

Mentors

Bionode team (contact: Bruno Vieira)

  • Boris Adryan: Scientist: @Flyjedi, genome gazer. Geek: Founder of @thingslearn, #IoT tinkerer
  • Bruno Vieira: Bioinformatics PhD student at Queen Mary University of London and Node.JS Web Developer. Working on population genomics, bionode.io and dat-data.com
  • Dave C-J: Node-RED developer
  • Karissa McKelvey: Programmer and idea jockey based in Oakland, CA. Former academic experienced in building interactive data visualization and collaboration tools
  • Mathias Buus: Programmer based in Copenhagen, Denmark. Co-creator of node-modules.com and co-founder of ge.tt. open mouth, open source
  • Max Ogden: Programmer based in Portland, OR. Max works on or has worked on things like CSVConf, Code for America, JavaScript for Cats, and Voxel.js
  • Nicholas O'Leary: IBM Emerging Technologies geek. All things MQTT and IoT. Creator of @nodered and one of the @BeardyDads
  • Steve Moos: Passionate Computational and Data Scientist specialising in Bioinformatics, DevOps and SysAdmin
  • Yannick Wurm: Population Genomics, Bioinformatics, Evolution of Social Insects. Senior Lecturer at Queen Mary University London

Bionode integration

Rationale & Approach

Bionode's focus is on modular pipelines for data manipulation and analysis, while BioJS's focus is on visualisation. It would be interesting to combine both tools to solve a biologically relevant problem while testing and fixing issues with the integration between the two projects.
For example, one interesting use case is to use Bionode to get transcriptomic data from the Sequence Read Archive (SRA) for any species/experiment and visualise the expression levels of genes with BioJS. During your project you should be able to work on at least three different use cases.
As the data might become large for specific files (e.g. SAM/BAM), one should be able to use streams to communicate with Bionode modules.

Challenges

  • Getting several modules from both projects to work together
  • Might require some architectural changes to those modules.

Involved Tools / Libraries

  • Bionode
  • BioJS

Needed Skills

  • Frontend JavaScript
  • Backend JavaScript/Node.js

Mentors

Bruno Vieira (Bionode) and Miguel Pignatelli (BioJS)

Bionode distribution on HPC Grid

Rationale & Approach

Bionode pipelines can currently only run on one machine, but we would like them to scale and be distributed across the nodes of a high performance computing (HPC) cluster. There are several ways to distribute Node apps across several CPUs/machines using native Node.js or libraries, but in a scenario where the user does not have administrative access to the cluster and must rely on established queuing tools (e.g., Sun Grid Engine), integrating/wrapping Bionode around those tools might be the best approach.

Challenges

  • Development will require access to a cluster of several machines or a simulated environment. We already have a Docker container that provides Sun Grid Engine.
  • If the student is interested in using Node.js queuing/distribution libraries, it will require a review of the existing options and adapting to bionode pipelines.
  • If the student has more interest or experience with other queuing systems, it will require wrapping those systems with bionode/node.js code.
  • We only expect the student to do one approach, but a very skilled student could do both.

Involved Tools / Libraries

  • Node queuing systems
  • Other queuing systems (i.e. SGE)

Needed Skills

  • Node.js/JavaScript
  • HPC experience
  • Docker (could be useful for development)

Mentors

Steve Moss and Max Ogden

Bionode modules

Rationale & Approach

There are several modules that would be useful for bionode that can be grouped in:

  • Data access (from web APIs)
  • Data parsing/wrangling
  • Tool wrappers

The student could work on improving an existing module or writing a requested module from scratch. If the student is interested in several small modules, improving their architecture and their integration with each other and with other UNIX tools could have a huge impact on the usability of the project.

Challenges

The challenges will depend on the module(s) the student is interested in, but there are enough options to adapt to a very diverse range and level of skills.

Involved Tools / Libraries

  • Depends on the module, but everything from web APIs (e.g., NCBI) to command line tools (e.g., SAMTOOLS).
  • Node.js/JavaScript

Needed Skills

JavaScript/Node.js

Mentors

Bionode team (contact: Bruno Vieira)

How do you define a bio-core?

Seb: Why do you have the sequence class as core?
Seb: Something that I imagine should go inside this core is a "GenericParser" which all parsers can inherit from. After all, they receive text and output an object. On the other hand, that might change too: arrays of binary data could be needed, or with the upcoming ES6 (Harmony), browsers might finally support more structures.

Bruno: I think bio core should have helpers or util functions that are reused by most of the other bio modules.
The Sequence methods fit that description, although the reason why I have them in core is more historical than anything else. The first bionode module started as a way to provide those functions client side to the Afra project, while also being available for server side usage.
They should probably be moved to a specific bionode-sequence module, but then the bionode module would be empty, since I don't have helper functions for now. In that case, the bionode module could become, instead of the "core" module, the "meta" module that links the other modules together in some kind of framework.

How do we manage to bundle the documentation of all plugins into a Biogems website?

Following Max's advice, I thought I'd just create an issue for every open question, so that it is easier to track things.

Bruno: I'm a fan of having literate code and using docco. I think it's a good practice to write the comments as docs when you're writing the code, and that way you don't have to figure out where the documentation is.
However, this isn't of course an alternative to a global API doc. As I mentioned, some projects do global APIs manually, like underscore, express, nodejs and socket.io. However, something automatic would be better. I'm currently exploring doxx as it seems to be able to generate a single doc for multiple modules. If doxx can solve most of the problem, we can then tweak it to generate an output with some of Biogems features.

Why restrict components to NPM?

Close this up if you feel like I am trolling (which I am) :) No offence taken and I apologize! Just thought that given the cutting edge nature of bioinformatics, you may want to help think of ways to push the boundaries of how we use Node.js :)

There is no easy alternative to NPM as of yet, but the idea is the main point. Why not just source a package from wherever you please? Instead of a "node_modules" folder at the root, offer a "modules" folder instead. NPM ends up limiting the names people can use and could be described as a cumbersome process. Just about every NPM package is a GitHub repo anyway, so why not cut out the middleman?

It would be amazing if there was a simple schema that allowed you to source packages from Git over ssh, HTTP servers and local file systems. To follow the UNIX philosophy well, a few separate projects would likely have to spin out of the idea. I just feel that eventually NPM has to go, or any body which claims that amount of authority over what stays and goes.

For inspiration there are these projects...

https://github.com/duojs/duo (this is really focused on the browser, but the core idea is key)
https://github.com/ismotgroup/bring (the concept of replacing "require" with an alternative is really the point here)

Thanks!

windows install

npm ERR! Windows_NT 6.1.7601
npm ERR! node v0.10.26
npm ERR! npm  v2.1.14
npm ERR! code ELIFECYCLE

npm ERR! [email protected] preinstall: `sam/build.sh`
npm ERR! Exit status 1

The bionode installer should mention that you need to compile samtools BEFORE installing bionode.

Personas and Pathways - plan for your project users and their participation

This was an activity started at the Mozilla Working Open Workshop in Montreal.

"This activity is designed to help you identify potential users and contributors, understand their goals and motivations, help them find a way into your project, and grow them into strong, committed community members."

It might be useful to finish this exercise so that we write down our target community.

If you want to contribute to this issue, please check the slides and the exercise.

