tcezard / eva-assembly-ingestion

This project forked from ebivariation/eva-assembly-ingestion

Automation for ingesting a new assembly for a species

License: Apache License 2.0

Shell 3.26% Python 74.49% Java 3.31% Nextflow 18.95%

eva-assembly-ingestion

Automation for ingesting a new assembly for a species (taxonomy) in the EVA databases.

Scripts

Adding a target assembly

Primary job for ingesting a new assembly, including finding assemblies for the taxonomy that need to be remapped, doing the remapping and clustering, and updating all related metadata and other databases.

Supports the following tasks:

  • load_tracker: Retrieves the source assemblies and their number of studies and loads them into the tracker. Will not load anything if any jobs already exist for this taxonomy / target assembly pair.
  • remap_cluster: Remaps all source assemblies in the tracker and clusters on the target assembly. Will only start or resume jobs not marked as complete.
  • update_dbs: Updates the following with the new assembly: supported assembly table, metadata, and contig alias. Will not do any updates if any incomplete jobs are present in the tracker.

Example usage:

# Run everything
add_target_assembly.py --taxonomy 9031 --target_assembly GCA_016699485.1 --release_version 5

# Run remapping and clustering only, resume and run on a specific instance
add_target_assembly.py --taxonomy 9031 --target_assembly GCA_016699485.1 --release_version 5 --tasks remap_cluster --instance 3 --resume
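
The preconditions attached to the three tasks listed earlier can be read as simple checks on the state of the tracker. The following is a minimal sketch of that logic only, not the project's implementation: the Job class, its fields, and the function names are hypothetical, and the real tracker is a database table rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """Hypothetical tracker row: one source assembly to remap onto the target."""
    source_assembly: str
    complete: bool = False


def can_load_tracker(existing_jobs: List[Job]) -> bool:
    # load_tracker does nothing if any job already exists for this
    # taxonomy / target assembly pair.
    return len(existing_jobs) == 0


def jobs_to_remap(jobs: List[Job]) -> List[Job]:
    # remap_cluster only starts or resumes jobs not marked as complete.
    return [job for job in jobs if not job.complete]


def can_update_dbs(jobs: List[Job]) -> bool:
    # update_dbs is skipped while any job in the tracker is incomplete.
    return all(job.complete for job in jobs)


if __name__ == "__main__":
    # "source_A" and "source_B" are placeholder names, not real accessions.
    tracker = [Job("source_A", complete=True), Job("source_B")]
    print([job.source_assembly for job in jobs_to_remap(tracker)])
    print(can_update_dbs(tracker))
```

Under this reading, rerunning remap_cluster after a partial failure only revisits the unfinished source assemblies, and update_dbs stays blocked until every job is complete.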

Custom assembly generation

Executable to generate custom assemblies and assembly reports. This is called in the main target assembly job and can be used for other remapping jobs as well.

# Standard run
get_custom_assembly.py --assembly-accession GCA_016699485.1 --fasta-file /path/to/fasta --report-file /path/to/report

# Disable contig renaming
get_custom_assembly.py --assembly-accession GCA_016699485.1 --fasta-file /path/to/fasta --report-file /path/to/report --no-rename
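
Since the executable is meant to be reusable from other remapping jobs, it may be convenient to call it from Python. The sketch below builds the same command lines as the two examples above, using only the flags shown there. It assumes get_custom_assembly.py is on the PATH; the function names custom_assembly_command and run_custom_assembly are hypothetical helpers, not part of this repository.

```python
import subprocess
from typing import List


def custom_assembly_command(accession: str, fasta_file: str, report_file: str,
                            rename: bool = True) -> List[str]:
    """Build the argument list for get_custom_assembly.py."""
    command = [
        "get_custom_assembly.py",
        "--assembly-accession", accession,
        "--fasta-file", fasta_file,
        "--report-file", report_file,
    ]
    if not rename:
        # --no-rename disables contig renaming, as in the second example above.
        command.append("--no-rename")
    return command


def run_custom_assembly(accession: str, fasta_file: str, report_file: str,
                        rename: bool = True) -> None:
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(
        custom_assembly_command(accession, fasta_file, report_file, rename),
        check=True,
    )
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.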

Configuration

The scripts require a configuration YAML file providing the locations of executables and other parameters. The default config is .assembly_config.yml in the user's home directory, or a path can be provided via the environment variable ASSEMBLYCONFIG.
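
The lookup described above can be sketched as follows. This is an illustration rather than the project's own code: the function name resolve_config_path is hypothetical, and the sketch assumes that ASSEMBLYCONFIG, when set, takes precedence over the default file.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return the config path: ASSEMBLYCONFIG if set, else ~/.assembly_config.yml."""
    env = os.environ if environ is None else environ
    override = env.get("ASSEMBLYCONFIG")
    if override:
        # Assumption: the environment variable wins over the default location.
        return Path(override).expanduser()
    return Path.home() / ".assembly_config.yml"


if __name__ == "__main__":
    print(resolve_config_path())
```

The optional environ argument exists only so the function can be exercised without modifying the real environment.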

A complete config looks like the following:

maven:
  environment: development
  settings_file: /path/to/settings.xml

remapping:
  base_directory: /path/to/remapping_dir

eutils_api_key: 12345

genome_downloader:
  output_directory: /path/to/genomes_dir

executable:
  python_activate: /path/to/remapping_env
  nextflow: /path/to/nextflow
  bcftools: /path/to/bcftools
  samtools: /path/to/samtools
  bedtools: /path/to/bedtools
  minimap2: /path/to/minimap
  bgzip: /path/to/bgzip
  tabix: /path/to/tabix
  genome_downloader: /path/to/genome_downloader
  custom_assembly: /path/to/custom_assembly

jar:
  vcf_extractor: /path/to/extraction.jar
  vcf_ingestion: /path/to/ingestion.jar
  clustering: /path/to/clustering.jar

nextflow:
  remapping: /path/to/remapping.nf
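
Because a missing entry may only surface partway through a long job, it can help to check the config up front. The sketch below assumes the YAML has already been parsed into nested dictionaries (for example with PyYAML's yaml.safe_load, which is not imported here). The function missing_keys and the REQUIRED_KEYS list are hypothetical; the list is a subset of the key paths from the example above, and which keys the scripts actually require is not stated in this document.

```python
from typing import Any, Dict, List, Tuple

# Key paths taken from the example config above (illustrative subset).
REQUIRED_KEYS: List[Tuple[str, ...]] = [
    ("maven", "environment"),
    ("maven", "settings_file"),
    ("remapping", "base_directory"),
    ("eutils_api_key",),
    ("genome_downloader", "output_directory"),
    ("executable", "nextflow"),
    ("jar", "clustering"),
    ("nextflow", "remapping"),
]


def missing_keys(config: Dict[str, Any],
                 required: List[Tuple[str, ...]] = REQUIRED_KEYS) -> List[str]:
    """Return dotted names of required keys absent from a parsed config."""
    missing = []
    for path in required:
        node: Any = config
        for key in path:
            # Stop descending as soon as a level is absent or not a mapping.
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


if __name__ == "__main__":
    partial = {"maven": {"environment": "development"}, "eutils_api_key": 12345}
    print(missing_keys(partial))
```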

Contributors

apriltuesday, tcezard
