Giter Site home page Giter Site logo

epaule / blobtools2 Goto Github PK

View Code? Open in Web Editor NEW

This project forked from blobtoolkit/blobtoolkit

0.0 1.0 0.0 437 KB

Redesigning BlobTools to support interactive data exploration

Home Page: http://blobtoolkit.genomehubs.org

License: MIT License

Python 99.33% Shell 0.67%

blobtools2's Introduction

BlobTools2

MIT License Python 3.6 DOI

A new implementation of BlobTools with support for interactive data exploration via the BlobToolKit viewer.

More information and tutorials are available at blobtoolkit.genomehubs.org/blobtools2/

BlobToolKit is described in our BlobToolKit paper:

BlobToolKit โ€“ Interactive quality assessment of genome assemblies Richard Challis, Edward Richards, Jeena Rajan, Guy Cochrane, Mark Blaxter G3: GENES, GENOMES, GENETICS April 1, 2020 vol. 10 no. 4 1361-1374; https://doi.org/10.1534/g3.119.400908

About

Similar to BlobTools v1, BlobTools2 is a command line tool designed to aid genome assembly QC and contaminant/cobiont detection and filtering. In addition to supporting interactive visualisation, a motivation for this reimplementation was to provide greater flexibility to include new types of information, such as BUSCO results and BLAST hit distributions.

BlobTools2 supports command-line filtering of datasets, assembly files and read files based on values or categories assigned to assembly contigs/scaffolds through the blobtools filter command. Interactive filters and selections made using the BlobToolKit Viewer can be reproduced on the command line and used to generate new, filtered datasets which retain all fields from the original dataset.

BlobTools2 is built around a file-based data structure, with data for each field contained in a separate JSON file within a directory (BlobDir) containing a single meta.json file with metadata for each field and the dataset as a whole. Additional fields can be added to an existing BlobDir using the blobtools add command, which parses an input to generate one or more additional JSON files and updates the dataset metadata. Fields are treated as generic datatypes, Variable (e.g. gc content, length and coverage), Category (e.g. taxonomic assignment based on BLAST hits) alongside Array and MultiArray datatypes to store information such as start, end, NCBI taxid and bitscore for a set of blast hits to a single sequence. Support for new analyses can be added to BlobTools2 by creating a new python module with an appropriate parse function.

To learn more about the development of the BlobTools approach, take a look at the papers by Laetsch DR and Blaxter ML, 2017 and Kumar et al., 2013.

Installing

The most up to date instructions can be found at blobtoolkit.genomehubs.org/install/.

Examples

Create a new dataset

Adding more data

Setting dataset metadata

Filtering datasets

Visualising datasets

Contributing

If you find a problem or want to suggest a feature, please submit an issue.

If you want to contribute code, pull requests are welcome but please make sure your code passes the linting, testing and style checks before submitting a pull request. Additional development dependencies are listed in requirements.txt

Run linter/testing locally:

./run_tests.sh

Set up pre-commit hook to automate test running:

ln -s ../../pre-commit.sh .git/hooks/pre-commit

blobtools2's People

Contributors

epaule avatar rjchallis avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.