Bio4j is a graph-based bioinformatics database that includes most of the data available in UniProt KB (Swiss-Prot + TrEMBL), Gene Ontology (GO), UniRef (50, 90, 100), RefSeq, NCBI Taxonomy, and the ExPASy Enzyme DB.
Bio4j provides a completely new and powerful framework for querying and managing protein-related information. Because it relies on a high-performance graph engine, data is stored in a way that semantically represents its own structure. By contrast, traditional relational databases must flatten the data they represent into tables and create artificial ids to connect the different tuples, which can in some cases lead to domain models that have almost nothing to do with the actual structure of the data.
- First of all, Bio4j has an Abstract Domain Model, which allows you to use it without binding to a particular backend implementation.
- Next, it has an intermediate Blueprints layer, which allows us to provide a default implementation of the abstract interface using the TinkerPop Blueprints API while staying independent of the choice of database technology.
- And finally, there are technology-specific versions.
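The layering described above can be illustrated with a minimal sketch. It is not Bio4j's actual API: every interface, class, and method name below (`Protein`, `GoTerm`, `ProteinGraph`, `InMemoryProteinGraph`, `goIdsOf`) is hypothetical, and the in-memory backend merely stands in for a real database-backed implementation. The point is only that client code written against abstract interfaces does not change when the backend does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Abstract domain model: client code depends only on these interfaces.
// ASSUMPTION: all names here are hypothetical, not Bio4j's real types.
interface GoTerm {
    String id();
}

interface Protein {
    String accession();
    List<GoTerm> goAnnotations();
}

// Entry point that every backend has to provide.
interface ProteinGraph {
    Protein proteinByAccession(String accession);
}

// One concrete backend. A real one would delegate to a graph database;
// this stand-in keeps everything in memory.
final class InMemoryProteinGraph implements ProteinGraph {
    private final Map<String, List<GoTerm>> annotations = new HashMap<>();

    void annotate(String accession, String goId) {
        // GoTerm has a single abstract method, so a lambda can implement it.
        annotations.computeIfAbsent(accession, k -> new ArrayList<>()).add(() -> goId);
    }

    @Override
    public Protein proteinByAccession(String accession) {
        if (!annotations.containsKey(accession)) {
            return null; // unknown accession
        }
        return new Protein() {
            @Override public String accession() { return accession; }
            @Override public List<GoTerm> goAnnotations() {
                return Collections.unmodifiableList(annotations.get(accession));
            }
        };
    }
}

final class LayeringSketch {
    // Written against the abstract model only, so it works with any backend.
    static List<String> goIdsOf(ProteinGraph graph, String accession) {
        List<String> ids = new ArrayList<>();
        Protein protein = graph.proteinByAccession(accession);
        if (protein != null) {
            for (GoTerm term : protein.goAnnotations()) {
                ids.add(term.id());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        InMemoryProteinGraph graph = new InMemoryProteinGraph();
        graph.annotate("protein-A", "go-term-1"); // placeholder values
        System.out.println(goIdsOf(graph, "protein-A"));
    }
}
```

Swapping `InMemoryProteinGraph` for another `ProteinGraph` implementation would leave `goIdsOf` untouched, which is the property the abstract model is meant to give.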
Bio4j includes a few different data sources and you may not always be interested in having all of them together. That’s why the importing process is modular and customizable, allowing you to import just the data you are interested in.
Bio4j also has a Statika-based module system, which greatly simplifies building and deploying custom releases of Bio4j.
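To make the idea of a modular import concrete, here is a rough sketch of how a set of requested data sources could be expanded into an ordered import plan. It does not use Statika or any Bio4j importer: the `Source` enum simply mirrors the data sources listed earlier, and the prerequisite table `REQUIRES` is an invented assumption for illustration, not a statement about how Bio4j modules actually depend on each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ImportPlanSketch {
    enum Source { UNIPROT, GENE_ONTOLOGY, UNIREF, REFSEQ, NCBI_TAXONOMY, ENZYME_DB }

    // ASSUMPTION: made-up prerequisites, used only to show the mechanism.
    // The table must be acyclic, otherwise visit() would recurse forever.
    static final Map<Source, List<Source>> REQUIRES = new EnumMap<>(Source.class);
    static {
        REQUIRES.put(Source.UNIREF, Arrays.asList(Source.UNIPROT));
        REQUIRES.put(Source.UNIPROT, Arrays.asList(Source.NCBI_TAXONOMY));
    }

    // Returns the requested sources plus their prerequisites,
    // ordered so that every prerequisite comes before its dependent.
    static List<Source> plan(Set<Source> wanted) {
        LinkedHashSet<Source> ordered = new LinkedHashSet<>();
        for (Source source : wanted) {
            visit(source, ordered);
        }
        return new ArrayList<>(ordered);
    }

    // Depth-first, post-order walk: prerequisites are added first.
    private static void visit(Source source, LinkedHashSet<Source> ordered) {
        if (ordered.contains(source)) {
            return; // already planned
        }
        for (Source dep : REQUIRES.getOrDefault(source, Collections.emptyList())) {
            visit(dep, ordered);
        }
        ordered.add(source);
    }
}
```

Under these assumed prerequisites, asking only for `UNIREF` yields a plan that also contains `UNIPROT` and `NCBI_TAXONOMY`, while asking for `GENE_ONTOLOGY` alone imports nothing else.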
In Bio4j, data is organized in a way that is semantically equivalent to what it represents, thanks to the graph structure. As a result, queries that would be impractical or even impossible with a standard relational database can be feasible with Bio4j, with good performance.
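As a small illustration of why a graph layout suits this kind of query, the sketch below finds all proteins that share an annotation with a given protein by following edges directly, with no join table or artificial id. It is a plain in-memory toy and does not use Bio4j or any graph database; `Node`, `link`, and `sharingAnAnnotation` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

final class TraversalSketch {
    // Each node holds direct references to its neighbours, so a
    // relationship is followed by reference instead of being joined.
    static final class Node {
        final String label;
        final Set<Node> neighbours = new LinkedHashSet<>();
        Node(String label) { this.label = label; }
    }

    // Adds an undirected edge between two nodes.
    static void link(Node a, Node b) {
        a.neighbours.add(b);
        b.neighbours.add(a);
    }

    // Two-hop traversal: protein -> annotation -> other proteins
    // that carry the same annotation. Labels are returned sorted.
    static Set<String> sharingAnAnnotation(Node protein) {
        Set<String> result = new TreeSet<>();
        for (Node annotation : protein.neighbours) {
            for (Node other : annotation.neighbours) {
                if (other != protein) {
                    result.add(other.label);
                }
            }
        }
        return result;
    }
}
```

In a relational schema the same question would typically go through a protein-to-annotation link table twice; here each hop is a direct reference, and the cost depends on the number of neighbours visited rather than on table sizes.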
Bio4j is an open source platform released under AGPLv3.
- Getting started
- Domain model
- Bio4j modules
- Importing Bio4j
- Entry points and Indexing:
- FAQ
- API Docs: v0.11.0
- Examples
To use it in your sbt project, add this to build.sbt:
resolvers += "Era7 maven releases" at "http://releases.era7.com.s3.amazonaws.com"
libraryDependencies += "bio4j" % "bio4j" % "0.11.0"
There is a Google user group available for Bio4j, where you can post any question or general issue related to the Bio4j project.
The Bio4j Twitter account, @bio4j, is quite active; follow us if you want to stay up to date with new features and project versions.
There is also a Bio4j LinkedIn group.
You can check or open new issues in the Bio4j repository issue tracker.