nbisweden / earth-biogenome-project-pilot Goto Github PK

View Code? Open in Web Editor NEW

9.0 36.0 8.0 394 KB

Assembly and Annotation workflows for analysing data in the Earth Biogenome Project pilot project.

Home Page: https://www.earthbiogenome.org/

License: GNU General Public License v3.0

Nextflow 97.62% Dockerfile 1.76% Awk 0.38% Groovy 0.24%

earth-biogenome-project-pilot's Introduction

Earth Biogenome Project - Pilot Workflow

The primary workflow for the Earth Biogenome Project Pilot at NBIS.

Workflow overview

General aim:

flowchart LR
    hifi[/ HiFi reads /] --> data_inspection
    ont[/ ONT reads /] -->  data_inspection
    hic[/ Hi-C reads /] --> data_inspection
    data_inspection[[ Data inspection ]] --> preprocessing
    preprocessing[[ Preprocessing ]] --> assemble
    assemble[[ Assemble ]] --> validation
    validation[[ Assembly validation ]] --> curation
    curation[[ Assembly curation ]] --> validation

Current implementation:

flowchart TD
    input[/ Input file/] --> hifi
    input --> hic
    input --> taxonkit[[ TaxonKit name2taxid/reformat ]]
    taxonkit --> goat_taxon[[ GOAT taxon search ]]
    goat_taxon --> busco
    goat_taxon --> dtol[[ DToL lookup ]]
    hifi --> samtools_fa[[ Samtools fasta ]]
    samtools_fa --> fastk_hifi
    samtools_fa --> mash_screen
    hifi[/ HiFi reads /] --> fastk_hifi[[ FastK - HiFi ]]
    hifi --> meryl_hifi[[ Meryl - HiFi ]]
    hic[/ Hi-C reads /] --> fastk_hic[[ FastK - Hi-C ]]
    hifi --> meryl_hic[[ Meryl - Hi-C ]]
    assembly[/ Assembly /] --> quast[[ Quast ]]
    fastk_hifi --> histex[[ Histex ]]
    histex --> genescopefk[[ GeneScopeFK ]]
    fastk_hifi --> ploidyplot[[ PloidyPlot ]]
    fastk_hifi --> katgc[[ KatGC ]]
    fastk_hifi --> merquryfk[[ MerquryFK ]]
    assembly --> merquryfk
    meryl_hifi --> merqury[[ Merqury ]]
    assembly --> merqury
    fastk_hifi --> katcomp[[ KatComp ]]
    fastk_hic --> katcomp
    assembly --> busco[[ Busco ]]
    refseq_sketch[( RefSeq sketch )] --> mash_screen[[ Mash Screen ]]
    hifi --> mash_screen
    fastk_hifi --> hifiasm[[ HiFiasm ]]
    hifiasm --> assembly
    assembly --> purgedups[[ Purgedups ]]
    input --> mitoref[[ Mitohifi - Find reference ]]
    assembly --> mitohifi[[ Mitohifi ]]
    assembly --> fcsgx[[ FCS GX ]]
    fcs_fetchdb[( FCS fetchdb )] --> fcsgx
    mitoref --> mitohifi
    genescopefk --> quarto[[ Quarto ]]
    goat_taxon --> multiqc[[ MultiQC ]]
    quarto --> multiqc
    dtol --> multiqc
    katgc --> multiqc
    ploidyplot --> multiqc
    busco --> multiqc
    quast --> multiqc

Usage

nextflow run -params-file <params.yml> \
    [ -c <custom.config> ] \
    [ -profile <profile> ] \
    NBISweden/Earth-Biogenome-Project-pilot

where:

params.yml is a YAML formatted file containing workflow parameters such as input paths to the assembly specification, and settings for tools within the workflow.

Example:

input: 'assembly_spec.yml'
outdir: results
fastk: # Optional
  kmer_size: 31 # default 31
genescopefk: # Optional
  kmer_size: 31 # default 31
hifiasm: # Optional, default = no extra options: Key (e.g. 'opts01') is used in assembly build name (e.g., 'hifiasm-raw-opts01').
  opts01: "--opts A"
  opts02: "--opts B"
busco: # Optional, default: retrieved from GOAT
  lineages: 'auto' # comma separated string of lineages or auto.

Alternatively parameters can be provided on the command-line using the --parameter notation (e.g., --input <path> ).

<custom.config> is a Nextflow configuration file which provides additional configuration. This is used to customise settings other than workflow parameters, such as cpus, time, and command-line options to tools.

Example:

process {
    withName: 'BUSCO' {  // Selects the process to apply settings.
        cpus     = 6     // Overrides cpu settings defined in nextflow.config
        time     = 4.d   // Overrides time settings defined in nextflow.config to 4 days. Use .h for hours, .m for minutes.
        memory   = '20GB'  // Overrides memory settings defined in nextflow.config to 20 GB.
        // ext.args supplies command-line options to the process tool
        // overrides settings found in configs/modules.config
        ext.args = '--long'  // Supplies these as command-line options to Busco
    }
}

<profile> is one of the preconfigured execution profiles (uppmax, singularity_local, docker_local, etc: see nextflow.config). Alternatively, you can provide a custom configuration to configure this workflow to your execution environment. See Nextflow Configuration for more details.

Workflow parameter inputs

Mandatory:

input: A YAML formatted input file. Example assembly_spec.yml (See also test profile input TODO:: Update test profile):

sample:                          # Required: Meta data
  name: 'Laetiporus sulphureus'  # Required: Species name. Correct spelling is important to look up species information.
  ploidy: 2                      # Optional: Estimated ploidy (default: retrieved from GOAT)
  genome_size: 2345              # Optional: Estimated genome size (default: retrieved from GOAT)
  haploid_number: 13             # Optional: Estimated haploid chromosome count (default: retrieved from GOAT)
  taxid: 5630                    # Optional: Taxon ID (default: retrieved with Taxonkit)
  kingdom: Eukaryota             # Optional: (default: retrived with Taxonkit)
assembly:                        # Optional: List of assemblies to curate and validate.
  - assembler: hifiasm           # For each entry, the assembler,
    stage: raw                   # stage of assembly,
    id: uuid                     # unique id,
    pri_fasta: /path/to/primary_asm.fasta # and paths to sequences are required.
    alt_fasta: /path/to/alternate_asm.fasta
    pri_gfa: /path/to/primary_asm.gfa
    alt_gfa: /path/to/alternate_asm.gfa
  - assembler: ipa
    stage: raw
    id: uuid
    pri_fasta: /path/to/primary_asm.fasta
    alt_fasta: /path/to/alternate_asm.fasta
hic:                             # Optional: List of hi-c reads to QC and use for scaffolding
  - read1: '/path/to/raw/data/hic/LS_HIC_R001_1.fastq.gz'
    read2: '/path/to/raw/data/hic/LS_HIC_R001_2.fastq.gz'
hifi:                            # Required: List of hifi-reads to QC and use for assembly/validation
  - reads: '/path/to/raw/data/hifi/LS_HIFI_R001.bam'
rnaseq:                          # Optional: List of Rna-seq reads to use for validation
  - read1: '/path/to/raw/data/rnaseq/LS_RNASEQ_R001_1.fastq.gz'
    read2: '/path/to/raw/data/rnaseq/LS_RNASEQ_R001_2.fastq.gz'
isoseq:                          # Optional: List of Isoseq reads to use for validation
  - reads: '/path/to/raw/data/isoseq/LS_ISOSEQ_R001.bam'

Optional:

outdir: The publishing path for results (default: results).
publish_mode: (values: 'symlink' (default), 'copy') The file publishing method from the intermediate results folders (see Table of publish modes).
steps: The workflow steps to execute (default is all steps). Choose from:
- inspect: 01 - Read inspection
- preprocess: 02 - Read preprocessing
- assemble: 03 - Assembly
- purge: 04 - Duplicate purging
- polish: 05 - Error polishing
- screen: 06 - Contamination screening
- scaffold: 07 - Scaffolding
- curate: 08 - Rapid curation
- alignRNA: 09 - Align RNAseq data

Software specific:

Tool specific settings are provided by supplying values to specific keys or supplying an array of settings under a tool name. The input to -params-file would look like this:

input: assembly.yml
outdir: results
fastk:
  kmer_size: 31
genescopefk:
  kmer_size: 31
hifiasm:
  opts01: "--opts A"
  opts02: "--opts B"
busco:
  lineages: 'auto'

multiqc_config: Path to MultiQC configuration file (default: configs/multiqc_conf.yaml).

Uppmax and PDC cluster specific:

project: NAISS Compute allocation number.

Workflow outputs

All results are published to the path assigned to the workflow parameter results.

TODO:: List folder contents in results file

Customization for Uppmax

A custom profile named uppmax is available to run this workflow specifically on UPPMAX clusters. The process executor is slurm so jobs are submitted to the Slurm Queue Manager. All jobs submitted to slurm must have a project allocation. This is automatically added to the clusterOptions in the uppmax profile. All Uppmax clusters have node local disk space to do computations, and prevent heavy input/output over the network (which slows down the cluster for all). The path to this disk space is provided by the $SNIC_TMP variable, used by the process.scratch directive in the uppmax profile. Lastly the profile enables the use of Singularity so that all processes must be executed within Singularity containers. See nextflow.config for the profile specification.

The profile is enabled using the -profile parameter to nextflow:

nextflow run -profile uppmax <nextflow_script>

A NAISS compute allocation should also be supplied using the --project parameter.

Customization for PDC

A custom profile named dardel is available to run this workflow specifically on the PDC cluster Dardel. The process executor is slurm so jobs are submitted to the Slurm Queue Manager. All jobs submitted to slurm must have a project allocation. This is automatically added to the clusterOptions in the dardel profile. Calculations are performed in the scratch space allocated by PDC_TMP which is also on the lustre file system and is not node local storage. The path to this disk space is provided by the $PDC_TMP variable, used by the process.scratch directive in the dardel profile. Lastly the profile enables the use of Singularity so that all processes must be executed within Singularity containers. See nextflow.config for the profile specification.

The profile is enabled using the -profile parameter to nextflow:

nextflow run -profile dardel <nextflow_script>

A NAISS compute allocation should also be supplied using the --project parameter.

Workflow organization

The workflows in this folder manage the execution of your analyses from beginning to end.

workflow/
 | - .github/                        Github data such as actions to run
 | - assets/                         Workflow assets such as test samplesheets
 | - bin/                            Custom workflow scripts
 | - configs/                        Configuration files that govern workflow execution
 | - dockerfiles/                    Custom container definition files
 | - docs/                           Workflow usage and interpretation information
 | - modules/                        Process definitions for tools used in the workflow
 | - subworkflows/                   Custom workflows for different stages of the main analysis
 | - tests/                          Workflow tests
 | - main.nf                         The primary analysis script
 | - nextflow.config                 General Nextflow configuration
 \ - modules.json                    nf-core file which tracks modules/subworkflows from nf-core

earth-biogenome-project-pilot's People

Contributors

Stargazers

Watchers

Forkers

mahesh-panchal mptrsen sgtss ksenia-krasheninnikova nylander martinpippel gbdias estel-kitsune

earth-biogenome-project-pilot's Issues

Hifiasm missing p_ctgs in output.

Describe the bug
Hifiasm process does not output *asm.p_ctgs.fa. Only hap1 and hap2 are produced.

To Reproduce
Steps to reproduce the behavior:
Ran the pipeline (version 2b8526a) with PacBio HiFi data only.

#! /usr/bin/env bash
#SBATCH -A naiss2023-5-307
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 1-00:00:00
#SBATCH -J gc_ebp

RESULTS="${PWD/analyses/data/outputs}"
NEXTFLOW_OPTS=${NEXTFLOW_OPTS:-"-resume -ansi-log false"}
export NXF_SINGULARITY_CACHEDIR=${NXF_SINGULARITY_CACHEDIR:-"/proj/snic2021-6-194/nobackup/ebp-singularity-cache"}

source activate nextflow-env

nextflow run /home/guibo205/git/NBIS/Earth-Biogenome-Project-pilot $NEXTFLOW_OPTS \
    -profile uppmax,execution_report \
    --input assembly_parameters.yml \
    --outdir "${RESULTS}" \
    --project 'naiss2023-5-307'

nextflow clean -f -before $( nextflow log -q | tail -n 1 )

Expected behavior
I expect asm.p_ctg.fa to be produced, as well as hap1 and hap2.

Screenshots

├── 03_assembly
│   └── null
│       ├── Gomphus_clavatus.asm.bp.hap1.p_ctg.assembly_summary
│       ├── Gomphus_clavatus.asm.bp.hap1.p_ctg.fasta.gz
│       ├── Gomphus_clavatus.asm.bp.hap1.p_ctg.gfa
│       ├── Gomphus_clavatus.asm.bp.hap1.p_ctg.gfa.gz
│       ├── Gomphus_clavatus.asm.bp.hap2.p_ctg.assembly_summary
│       ├── Gomphus_clavatus.asm.bp.hap2.p_ctg.fasta.gz
│       ├── Gomphus_clavatus.asm.bp.hap2.p_ctg.gfa
│       ├── Gomphus_clavatus.asm.bp.hap2.p_ctg.gfa.gz
│       ├── Gomphus_clavatus.asm.bp.p_utg.gfa
│       ├── Gomphus_clavatus.asm.bp.r_utg.gfa
│       ├── Gomphus_clavatus.asm.ec.bin
│       ├── Gomphus_clavatus.asm.ovlp.reverse.bin
│       └── Gomphus_clavatus.asm.ovlp.source.bin

Desktop (please complete the following information):

OS: CentOS Linux 7

New Module: La Jolla Assembler

Which tool should be included?
La Jolla Assembler

How is it used?
https://www.nature.com/articles/s41587-022-01220-6

Which workflow should it be included in?
Assemblers

Custom classes to properly maintain metadata

This is a suggestion that may not be feasible, but perhaps using custom objects might make handling meta data easier.

The issue is that meta map manipulation is a common feature of workflows because due to using the nf-core way of passing meta data around the workflow. This often means meta-maps are manipulated putting things in and out to get the correct fields to join on.

Being able to use custom classes might make the metadata handling situation better. As for the most part, these are simply data stores. The primary issue though is handling file staging. This maybe possible though by extending nextflow's ArrayBag class.

It's possible to use custom classes https://github.com/mahesh-panchal/nxf-custom-object-test at least as input and perform operations on.

Potential meta data objects

Sample:

ID:
Ploidy:
Kmersize:
taxon ID:

Busco_lineage:

Read data:

Read1
Read2: null if no pair
single_end: true if not paired end data.
Kmercov
readcov
readgroup

Assembly:

primary: hap1 or consensus sequence
alternate

New Module: DNAapler

Which tool should be included?
DNAapler : https://github.com/gbouras13/dnaapler
Reorients microbial genomes

New Module: BUSCO

We need a module for busco.

Originally on Rackham it's something like this:

module load bioinfo-tools BUSCO
JOB=$SLURM_ARRAY_TASK_ID

GENOME=input_genome.fasta
PREFIX=for_the_output

run_BUSCO.py -i "${GENOME}" -m geno -l vertebrata_odb10 -o "${PREFIX}"

New Module: Yahs

Which tool should be included?
https://github.com/c-zhou/yahs
https://www.biorxiv.org/content/10.1101/2022.06.09.495093v1

How is it used?

yahs contigs.fa hic-to-contigs.bam

Which workflow should it be included in?
hic scaffolding

New Module: GOAT

Which tool should be included?
Genomes on a tree

How is it used?

$ goat-cli taxon search -lt "Laetiporus sulphureus" -v odb10_lineage
taxon_id        taxon_rank      scientific_name odb10_lineage
5303    order   Polyporales     polyporales_odb10
155619  class   Agaricomycetes  agaricomycetes_odb10
5204    phylum  Basidiomycota   basidiomycota_odb10
4751    kingdom Fungi   fungi_odb10
2759    superkingdom    Eukaryota       eukaryota_odb10

https://github.com/nf-core/modules/tree/master/modules/nf-core/goat/taxonsearch

Which workflow should it be included in?
Data inspection.

New Module: Verkko

Which tool should be included?

https://github.com/marbl/verkko
https://www.biorxiv.org/content/10.1101/2022.06.24.497523v1

How is it used?

verkko -d <work-directory> --hifi <hifi-read-files> [--nano <ont-read-files>]

Which workflow should it be included in?

Full assembly workflow ?

New Module: MitoHiFi

Which tool should be included?

https://github.com/marcelauliano/MitoHiFi

How is it used?

singularity exec \
    --bind /path/on/disk/to/data/:/data/ \
    /path/to/mitohifi-v2.2.sif  \
        mitohifi.py \
            -r "/data/f1.fasta /data/f2.fasta /data/f3.fasta" \
            -f /data/reference.fasta \
            -g /data/reference.gb  \
            -t 10 \
            -o 2

Which workflow should it be included in?

Organelle assembly workflow

Question: How to run EVALUATE_ASSEMBLY process without the 'assemble' step

Now that the assembly evaluation is included in the 'assemble' workflow step, how do I evaluate an assembly produced outside of the pipeline?

I can skip assembly by using the steps parameter:

params {
    steps              = 'inspect,preprocess,screen'
}

But that skips evaluation as well.

New Module: Hostile

https://github.com/bede/hostile

New Module: Pairtools

Which tool should be included?
Pairtools

How is it used?

https://github.com/nf-core/modules/tree/master/modules/nf-core/pairtools

Which workflow should it be included in?

Hi-C scaffolding

Output report discussion

Aim

We need a report to summarise output.

What should be in the report?

Need versioning.
We should have a way to highlight what's changed since the last report.

Can this all be in MultiQC, or do we need a Quarto or something else report?

Decisions

The report creation is automated by the workflow.

Upgrade k-mer profiling with FastK, GeneScope2, and MerquryFK.

Is your feature request related to a problem? Please describe.
Upgrade k-mer profiling with FastK.

Describe the solution you'd like
Add a workflow path that allows one to select between using Meryl, or FastK.

FastK replaces Meryl
GeneScope2 replaces GenomeScope2
MerquryFK replaces Merqury, KAT, and Smudgeplot

FASTK: https://github.com/thegenemyers/FASTK
GeneScope2: https://github.com/thegenemyers/GENESCOPE.FK
MerquryFK: https://github.com/thegenemyers/MERQURY.FK

The tools are not packaged either in bioconda or containers

New Module: Shasta

Which tool should be included?

Shasta

How is it used?

https://github.com/nf-core/modules/blob/master/modules/nf-core/shasta

Which workflow should it be included in?

Assembly

New Module: HiFiAdapterFilt

Which tool should be included?
HiFiAdapterFilt

How is it used?
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-022-08375-1

Which workflow should it be included in?
Preprocessing workflow

How should we treat non-"chromosomal" contigs?

Question from Remi:

How should we treat the non-“chromosomal” contigs, i.e. the debris I filtered out in the inspector hits?

params.yml template doesn't exist

Describe the bug
The params.yml described in the README doesn't exist.

To Reproduce
Check the README

Expected behavior
A link to the params.yml file.

New Module: Telomere Identifier

Which tool should be included?

https://github.com/tolkit/telomeric-identifier

conda install -c bioconda tidk

New Module: teloclip

https://github.com/Adamtaranto/teloclip

A tool for the recovery of unassembled telomeres from soft-clipped read alignments.

New Module: PREP_HIFI

The HiFi reads come in a BAM file, which is not easily parsed downstream, so converting to FASTA.

originally:

#!/usr/bin/env bash
module load bioinfo-tools samtools

output=/where/to/write

while read name read; do
	samtools fasta "${read}" > "${output}/${name}.fasta"
done < list.tab

What are the “streaking” artefacts in the Hi-C contact maps?

Question from Remi:

What are the “streaking” artefacts in the Hi-C contact maps? I’ve seen them in some other assemblies, but have never drilled down into what is causing this. Is it just repeats?

Pipeline fail at BUSCO and PURGE_DUPS

Describe the bug
Pipeline execution trace shows failed status for the process EVALUATE_ASSEMBLY:BUSCO, and aborted status for the process PURGE_DUPLICATES:MINIMAP2_ALIGN_READS.

To Reproduce
Steps to reproduce the behavior:

RESULTS="${PWD/analyses/data/outputs}"
NEXTFLOW_OPTS=${NEXTFLOW_OPTS:-"-resume -ansi-log false"}
export NXF_SINGULARITY_CACHEDIR=${NXF_SINGULARITY_CACHEDIR:-"/proj/snic2021-6-194/nobackup/ebp-singularity-cache"}

source activate nextflow-env

nextflow run /home/guibo205/git/NBIS/Earth-Biogenome-Project-pilot $NEXTFLOW_OPTS \
    -profile uppmax,execution_report \
    --input assembly_parameters.yml \
    --outdir "${RESULTS}" \
    --project 'naiss2023-5-307' \
    -c custom.config

nextflow clean -f -before $( nextflow log -q | tail -n 1 )

# Mandatory - sample metadata
sample:
  id: 'Gomphus_clavatus'
  kmer_size: 31
  ploidy: 2
  busco_linages:
    - 'bacteria_odb10'
    - 'basidiomycota_odb10'
    - 'agaricomycetes_odb10'
# Optional - frozen/finalized assemblies
#assembly:
#  - id: 'prefix-buildID'
#    pri_fasta: '/path/to/data'
#    alt_fasta: '/path/to/data'
# Optional - Hi-C data if available
#hic:
#  - read1: ''
#    read2: '/path/to/data'
# Optional - HiFi data if available
hifi:
  - reads: '/proj/snic2021-6-194/VREBP-Gomphus_clavatus-2023-AsmAnno/data/raw-data/PacBio-HiFi-WGS/hifiwgs.fastq.gz'
#  - reads: '/path/to/data'
# Optional - RNASeq data if available
rnaseq:
  - read1: '/proj/snic2021-6-194/VREBP-Gomphus_clavatus-2023-AsmAnno/data/raw-data/Illumina-RNAseq/rnaseq_R1.fastq.gz'
    read2: '/proj/snic2021-6-194/VREBP-Gomphus_clavatus-2023-AsmAnno/data/raw-data/Illumina-RNAseq/rnaseq_R2.fastq.gz'
# Optional - Isoseq data if available
isoseq:
  - reads: '/proj/snic2021-6-194/VREBP-Gomphus_clavatus-2023-AsmAnno/data/raw-data/PacBio-HiFi-ISOSEQ/hq_transcripts.fasta'

Expected behavior
Purge_dups and BUSCO should complete and generate the expected outputs.

Screenshots

#! /usr/bin/env bash
#SBATCH -A naiss2023-5-307
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 3-00:00:00
#SBATCH -J gc_ebp

RESULTS="${PWD/analyses/data/outputs}"
NEXTFLOW_OPTS=${NEXTFLOW_OPTS:-"-resume -ansi-log false"}
export NXF_SINGULARITY_CACHEDIR=${NXF_SINGULARITY_CACHEDIR:-"/proj/snic2021-6-194/nobackup/ebp-singularity-cache"}

source activate nextflow-env

nextflow run /home/guibo205/git/NBIS/Earth-Biogenome-Project-pilot $NEXTFLOW_OPTS \
    -profile uppmax,execution_report \
    --input assembly_parameters.yml \
N E X T F L O W  ~  version 23.04.1
WARN: It appears you have never run this project before -- Option `-resume` is ignored
Launching `/home/guibo205/git/NBIS/Earth-Biogenome-Project-pilot/main.nf` [happy_hilbert] DSL2 - revision: c23d5c5980

    Running NBIS Earth Biogenome Project Assembly workflow.

Pulling Singularity image docker://quay.io/biocontainers/hifiasm:0.19.8--h43eeafb_0 [cache /proj/snic2021-6-194/nobackup/ebp-singularity-cache/quay.io-biocontainers-hifiasm-0.19.8--h43eeafb_0.img]
[40/ca3078] Submitted process > BUILD_HIFI_DATABASES:FASTK_FASTK (Gomphus_clavatus)
Staging foreign file: https://gembox.cbcb.umd.edu/mash/refseq.genomes%2Bplasmid.k21s1000.msh
[ee/835901] Submitted process > HIFIASM (Gomphus_clavatus)
[6f/50f0ad] Submitted process > SCREEN_READS:MASH_SCREEN (Gomphus_clavatus)
[59/8de15b] Submitted process > GENOME_PROPERTIES:MERQURYFK_PLOIDYPLOT (Gomphus_clavatus)
[56/5808a4] Submitted process > GENOME_PROPERTIES:MERQURYFK_KATGC (Gomphus_clavatus)
[2d/ebf96a] Submitted process > GENOME_PROPERTIES:FASTK_HISTEX (Gomphus_clavatus)
[9f/32940b] Submitted process > GENOME_PROPERTIES:GENESCOPEFK (Gomphus_clavatus)
[cb/44927f] Submitted process > SCREEN_READS:MASH_FILTER (Gomphus_clavatus)
[bb/58374f] Submitted process > GFASTATS (Gomphus_clavatus)
[45/eed2ad] Submitted process > GFATOOLS_GFA2FA (Gomphus_clavatus)
[20/36f3b4] Submitted process > GFATOOLS_GFA2FA (Gomphus_clavatus)
[1b/9ad4a6] Submitted process > GFASTATS (Gomphus_clavatus)
[e9/dd136b] Submitted process > EVALUATE_ASSEMBLY:BUSCO (hifiasm-auto)
[b6/dcd967] Submitted process > PURGE_DUPLICATES:PURGEDUPS_SPLITFA_PRIMARY (hifiasm)
[05/981ea1] Submitted process > PURGE_DUPLICATES:MINIMAP2_ALIGN_READS (hifiasm)
[05/a16ade] Submitted process > COMPARE_ASSEMBLIES:QUAST (Gomphus_clavatus)
[ff/ff9cc9] Submitted process > PURGE_DUPLICATES:MINIMAP2_ALIGN_ASSEMBLY_PRIMARY (hifiasm)
ERROR ~ Error executing process > 'EVALUATE_ASSEMBLY:BUSCO (hifiasm-auto)'

Caused by:
  Missing output file(s) `*-busco/*/run_*/busco_sequences` expected by process `EVALUATE_ASSEMBLY:BUSCO (hifiasm-auto)`

Command executed:

  # Nextflow changes the container --entrypoint to /bin/bash (container default entrypoint: /usr/local/env-execute)
  # Check for container variable initialisation script and source it.
  if [ -f "/usr/local/env-activate.sh" ]; then
      set +u  # Otherwise, errors out because of various unbound variables
      . "/usr/local/env-activate.sh"
      set -u
  fi

  # If the augustus config directory is not writable, then copy to writeable area
  if [ ! -w "${AUGUSTUS_CONFIG_PATH}" ]; then
      # Create writable tmp directory for augustus
      AUG_CONF_DIR=$( mktemp -d -p $PWD )
      cp -r $AUGUSTUS_CONFIG_PATH/* $AUG_CONF_DIR
      export AUGUSTUS_CONFIG_PATH=$AUG_CONF_DIR
      echo "New AUGUSTUS_CONFIG_PATH=${AUGUSTUS_CONFIG_PATH}"
  fi

  # Ensure the input is uncompressed
  INPUT_SEQS=input_seqs
  mkdir "$INPUT_SEQS"
  cd "$INPUT_SEQS"
  for FASTA in ../tmp_input/*; do
      if [ "${FASTA##*.}" == 'gz' ]; then
          gzip -cdf "$FASTA" > $( basename "$FASTA" .gz )
      else
          ln -s "$FASTA" .
      fi
  done
  cd ..

  busco \
      --cpu 6 \
      --in "$INPUT_SEQS" \
      --out hifiasm-auto-busco \
      --auto-lineage \
       \
       \
      --mode genome

  # clean up
  rm -rf "$INPUT_SEQS"

  # Move files to avoid staging/publishing issues
  mv hifiasm-auto-busco/batch_summary.txt hifiasm-auto-busco.batch_summary.txt
  mv hifiasm-auto-busco/*/short_summary.*.{json,txt} . || echo "Short summaries were not available: No genes were found."

  cat <<-END_VERSIONS > versions.yml
  "EVALUATE_ASSEMBLY:BUSCO":
      busco: $( busco --version 2>&1 | sed 's/^BUSCO //' )
  END_VERSIONS

Command exit status:
  0

Command output:
  2023-11-14 00:51:40 INFO:     [hmmsearch]     51 of 255 task(s) completed
  2023-11-14 00:51:41 INFO:     [hmmsearch]     77 of 255 task(s) completed
  2023-11-14 00:51:41 INFO:     [hmmsearch]     102 of 255 task(s) completed
  2023-11-14 00:51:41 INFO:     [hmmsearch]     128 of 255 task(s) completed
  2023-11-14 00:51:42 INFO:     [hmmsearch]     153 of 255 task(s) completed
  2023-11-14 00:51:43 INFO:     [hmmsearch]     179 of 255 task(s) completed
  2023-11-14 00:51:44 INFO:     [hmmsearch]     204 of 255 task(s) completed
  2023-11-14 00:51:44 INFO:     [hmmsearch]     230 of 255 task(s) completed
  2023-11-14 00:51:47 INFO:     [hmmsearch]     255 of 255 task(s) completed
  2023-11-14 00:51:48 INFO:     Results:        C:94.1%[S:93.7%,D:0.4%],F:4.3%,M:1.6%,n:255

  2023-11-14 00:51:48 INFO:     Extracting missing and fragmented buscos from the file refseq_db.faa...
  2023-11-14 00:51:50 INFO:     Running 1 job(s) on metaeuk, starting at 11/14/2023 00:51:50
  2023-11-14 00:57:32 INFO:     [metaeuk]       1 of 1 task(s) completed
  2023-11-14 00:57:33 INFO:     ***** Run HMMER on gene sequences *****
  2023-11-14 00:57:33 INFO:     Running 15 job(s) on hmmsearch, starting at 11/14/2023 00:57:33
  2023-11-14 00:57:34 INFO:     [hmmsearch]     2 of 15 task(s) completed
  2023-11-14 00:57:34 INFO:     [hmmsearch]     3 of 15 task(s) completed
  2023-11-14 00:57:34 INFO:     [hmmsearch]     5 of 15 task(s) completed
  2023-11-14 00:57:34 INFO:     [hmmsearch]     6 of 15 task(s) completed
  2023-11-14 00:57:34 INFO:     [hmmsearch]     8 of 15 task(s) completed
  2023-11-14 00:57:34 INFO:     [hmmsearch]     9 of 15 task(s) completed
  2023-11-14 00:57:34 INFO:     [hmmsearch]     11 of 15 task(s) completed
  2023-11-14 00:57:34 INFO:     [hmmsearch]     12 of 15 task(s) completed
  2023-11-14 00:57:34 INFO:     [hmmsearch]     14 of 15 task(s) completed
  2023-11-14 00:57:34 INFO:     [hmmsearch]     15 of 15 task(s) completed
  2023-11-14 00:57:41 INFO:     Validating exons and removing overlapping matches
  2023-11-14 00:57:42 INFO:     Results:        C:96.1%[S:95.7%,D:0.4%],F:2.7%,M:1.2%,n:255

  2023-11-14 00:57:43 INFO:     eukaryota_odb10 selected

  2023-11-14 00:57:43 INFO:     ***** Searching tree for chosen lineage to find best taxonomic match *****

  2023-11-14 00:57:44 INFO:     Extract markers...
  2023-11-14 00:57:44 INFO:     Downloading file 'https://busco-data.ezlab.org/v5/data/placement_files/list_of_reference_markers.eukaryota_odb10.2019-12-16.txt.tar.gz'
  2023-11-14 00:57:44 INFO:     Decompressing file '/scratch/42565899/nxf.TXgxyRoRSA/busco_downloads/placement_files/list_of_reference_markers.eukaryota_odb10.2019-12-16.txt.tar.gz'
  2023-11-14 00:57:44 INFO:     Downloading file 'https://busco-data.ezlab.org/v5/data/placement_files/tree.eukaryota_odb10.2019-12-16.nwk.tar.gz'
  2023-11-14 00:57:45 INFO:     Decompressing file '/scratch/42565899/nxf.TXgxyRoRSA/busco_downloads/placement_files/tree.eukaryota_odb10.2019-12-16.nwk.tar.gz'
  2023-11-14 00:57:45 INFO:     Downloading file 'https://busco-data.ezlab.org/v5/data/placement_files/tree_metadata.eukaryota_odb10.2019-12-16.txt.tar.gz'
  2023-11-14 00:57:46 INFO:     Decompressing file '/scratch/42565899/nxf.TXgxyRoRSA/busco_downloads/placement_files/tree_metadata.eukaryota_odb10.2019-12-16.txt.tar.gz'
  2023-11-14 00:57:46 INFO:     Downloading file 'https://busco-data.ezlab.org/v5/data/placement_files/supermatrix.aln.eukaryota_odb10.2019-12-16.faa.tar.gz'
  2023-11-14 00:57:49 INFO:     Decompressing file '/scratch/42565899/nxf.TXgxyRoRSA/busco_downloads/placement_files/supermatrix.aln.eukaryota_odb10.2019-12-16.faa.tar.gz'
  2023-11-14 00:57:49 INFO:     Downloading file 'https://busco-data.ezlab.org/v5/data/placement_files/mapping_taxids-busco_dataset_name.eukaryota_odb10.2019-12-16.txt.tar.gz'
  2023-11-14 00:57:49 INFO:     Decompressing file '/scratch/42565899/nxf.TXgxyRoRSA/busco_downloads/placement_files/mapping_taxids-busco_dataset_name.eukaryota_odb10.2019-12-16.txt.tar.gz'
  2023-11-14 00:57:49 INFO:     Downloading file 'https://busco-data.ezlab.org/v5/data/placement_files/mapping_taxid-lineage.eukaryota_odb10.2019-12-16.txt.tar.gz'
  2023-11-14 00:57:50 INFO:     Decompressing file '/scratch/42565899/nxf.TXgxyRoRSA/busco_downloads/placement_files/mapping_taxid-lineage.eukaryota_odb10.2019-12-16.txt.tar.gz'
  2023-11-14 00:57:50 INFO:     Place the markers on the reference tree...
  2023-11-14 00:57:50 INFO:     Running 1 job(s) on sepp, starting at 11/14/2023 00:57:50
  2023-11-14 01:00:52 INFO:     [sepp]  1 of 1 task(s) completed
  Short summaries were not available: No genes were found.

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  INFO:    Environment variable SINGULARITYENV_SNIC_TMP is set, but APPTAINERENV_SNIC_TMP is preferred
  2023-11-14 01:00:52 ERROR:    Placements failed. Try to rerun increasing the memory or select a lineage manually.
  mv: cannot stat 'hifiasm-auto-busco/*/short_summary.*.json': No such file or directory
  mv: cannot stat 'hifiasm-auto-busco/*/short_summary.*.txt': No such file or directory

Work dir:
  /crex/proj/snic2021-6-194/VREBP-Gomphus_clavatus-2023-AsmAnno/analyses/01_assembly-workflow_fourth-run_rackham/work/e9/dd136be530ebb9845f40d19f627a84

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

        The workflow completed unsuccessfully.

        Please read over the error message. If you are unable to solve it, please
        post an issue at https://github.com/NBISweden/Earth-Biogenome-Project-pilot/issues
        where we will do our best to help.

WARN: Killing running tasks (1)

Additional context
Add any other context about the problem here.

New Module: OMArk

Which tool should be included?
OMArk.
https://github.com/DessimozLab/OMArk

How is it used?

# installation
pip install omark
# download full database
wget https://omabrowser.org/All/LUCA.h5
# run search and benchmark
omamer search --db LUCA.h5 --query my_proteome.fa --out my_proteome.omamer --nthreads 10
omark -f my_proteome.omamer -d LUCA.h5 -o results

Which workflow should it be included in?
Quality control.

New Module: GFAstats

Which tool should be included?
GFA stats

How is it used?

https://github.com/nf-core/modules/tree/master/modules/nf-core/gfastats

nf-core modules install gfastats

Which workflow should it be included in?

Quality control

New Module: FCS Decontamination

Which tool should be included?
FCS Adaptor and FCS-GX

How is it used?

https://github.com/nf-core/modules/tree/master/subworkflows/nf-core/fasta_clean_fcs

Which workflow should it be included in?

Decontamination

New Module: Inspector

Which tool should be included?
Inspector: https://github.com/ChongLab/Inspector

How is it used?

inspector.py -d 'hifi' -c contigs.fa -r reads.fastq.gz -o "$PREFIX" -t "$CPUS"

Which workflow should it be included in?
Assembly validation

Reporting a warning when below coverage for GenomeScope and Smudgeplot

In general, having at least 15x coverage per homolog for GenomeScope and 25x coverage per homolog for Smudgeplot is required.

https://www.nature.com/articles/s41467-020-14998-3

If coverage is below this, we should display a warning of some kind.

New Module: Asset

Which tool should be included?
https://github.com/dfguan/asset

How is it used?

# Find Gaps
bin/detgaps $asm > $output_dir/gaps.bed

# Process Pacbio data
for fl in $pblist
do
	minimap2 -xmap-hifi -t 12 $asm $fl > $fl.paf
done

bin/ast_pb $fl1.paf $fl2.paf $fl3.paf ... >$output_dir/pb.bed 2>ast_pb.log

# Process Hi-C data 
bin/split_fa $asm > split.fa
samtools faidx split.fa 
bwa index split.fa
while read -r r1 r2
do
	prefix=`basename $r1 .fq.gz`
	dirn=`dirname $r1`
	bwa mem -SP -B10 -t12 split.fa $r1 $r2 | samtools view -b - > $dirn/$prefix.bam
done < $hiclist
bin/col_conts *.bam > $output_dir/links.mat
bin/ast_hic2 split.fa.fai $output_dir/links.mat >$output_dir/hic2.bed 2>ast_hic.log

# Accumulate evidence
bin/acc $output_dir/gaps.bed $output_dir/{pb,bn}.bed $output_dir/bn.bed > $output_dir/pb_bn.bed 
bin/acc $output_dir/gaps.bed $output_dir/{10x,hic2,bn}.bed > $output_dir/10x_hic2_bn.bed  

# Detect misassemblies
bin/pchlst -c $output_dir/gaps.bed $output_dir/pb_bn.bed > $output_dir/pchlst_ctg.bed
bin/pchlst $output_dir/gaps.bed $output_dir/10x_hic2_bn.bed > $output_dir/pchlst_scaf.bed 
bin/union_brks $output_dir/gaps.bed $output_dir/pchlst_{ctg,scaf}.bed > $output_dir/pchlst_final.bed

Which workflow should it be included in?

Curation

When should we classify a scaffold as a chromosome?

Question from Remi:

When should we classify a scaffold as a chromosome?

New Module: OMArk

https://github.com/DessimozLab/OMArk

Possible BUSCO alternative.

https://www.nature.com/articles/s41587-024-02147-w#Sec22

New Module: PGAP

https://github.com/ncbi/pgap

NCBI Prokaryotic Genome Annotation Pipeline

The NCBI Prokaryotic Genome Annotation Pipeline is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).

It's CWL pipeline though.
Do we need it?

Structured Output Folders

Aim

To make data findable, and rigid folder structure of output is needed that clearly describes content, version, and logical grouping.

Desired Features

Minimal folders to navigate.
A partition between results (data we want to keep) for public archiving, and other results that that don't need to be publicly archived.
Some kind of clear versioning method so we can keep older results, but know they're not the latest, and conversely some way to mark more recent runs which will not be used for further analyses.
A versioning method that makes it clear which former folders a tool run refers to (e.g. busco was run on build 3 of hifi-asm haplotypes).

Decisions

New Module: PretextMap

Which tool should be included?

PretextMap

How is it used?

https://github.com/nf-core/modules/tree/master/modules/nf-core/pretextmap

Which workflow should it be included in?

Hi-C Curation

New Module: CRAQ

Which tool should be included?
CRAQ described here: https://www.nature.com/articles/s41467-023-42336-w.epdf

How is it used?
Not sure

Which workflow should it be included in?
Assembly QC

New Module: HiFiasm

Which tool should be included?

HiFiASM

How is it used?

https://github.com/nf-core/modules/tree/master/modules/nf-core/hifiasm

Which workflow should it be included in?

Assembly

What analyses should be run on the various states of assembly.

IPA produces:

*.purged.haplotigs.fasta
*.purged.primary.fasta

HiFi asm produces:

*.bp.hap1.p_ctg.{fasta,gfa}
*.bp.hap2.p_ctg.{fasta,gfa}
*.bp.p_ctg.{fasta,gfa}
*.bp.p_utg.gfa
*.bp.r_utg.gfa

What analyses should we be running on which files?

Quast, Busco, Blobtools, Merqury, Inspector [ *.bp.p_ctg.fasta, *.purged.primary.fasta ]
Bandage [ *.bp.p_ctg.gfa ]
Merqury [ *.bp.hap1.p_ctg.fasta+*.bp.hap2.p_ctg.fasta ] ?

Inspector is still being evaluated, but for now we'll include it anyway.

@iggyB What are you currently running on these outputs?
@aersoares81 Any other opinions on what we should be analyzing here.

What should we be doing with the haplotigs and unitigs?

Circos plot to visualize before and after curating.

Can create a circos plot of before and after curating.

https://wellcomeopenresearch.org/articles/6-172

How to flag mitochondrial or chloroplast contigs?

Question from Henrik:

What are we doing at the moment to flag mitochondrial or chloroplast contigs?

New Module: Charcoal

Which tool should be included?
https://github.com/dib-lab/charcoal
Remove contaminated contigs from genomes using k-mers and taxonomies.

How is it used?
Provide an example of how the tool is used.

Write a configuration file a la demo/demo.conf

charcoal download-db
charcoal init newproject --genome-dir example-genomes \
    --lineages example-genomes/provided-lineages.csv
python -m charcoal run newproject.conf -j 4

Which workflow should it be included in?
Assembly curation

Test Data - Laetiporus sulphureus - Chicken of the Woods

We need a suitable test data set. Matthieu Muffato ( Wellcome Sanger Institute ) suggested Laetiporus sulphureus ( 37MB; Fungus )

PacBio Sequel IIe: ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR680/ERR6808041/m64229e_210602_121910.ccs.bc1020_BAK8B_OA--bc1020_BAK8B_OA.bam

Hi-C Arima v2: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_1.fastq.gz, ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR668/000/ERR6688740/ERR6688740_2.fastq.gz

There's also 10XGenomics data available too.

Pipeline fails on "PREPARE_INPUT:GOAT_TAXONSEARCH" because of missing taxon identifier

I am trying to run the pipeline using the assembly-project-template after first editing "assembly_parameters.yml" according to thee instructions for the template.

Running: bash ./run_nextflow.sh

Gives me this error now:

N E X T F L O W ~ version 23.10.0
Launching https://github.com/NBISweden/Earth-Biogenome-Project-pilot [romantic_cajal] DSL2 - revision: 5b9171f [main]

Running NBIS Earth Biogenome Project Assembly workflow.

ERROR ~ Error executing process > 'PREPARE_INPUT:GOAT_TAXONSEARCH (1)'

Caused by:
No input. Valid input: single taxon identifier or a .txt file with identifiers -- Check script '/home/tomaslar/.nextflow/assets/NBISweden/Earth-Biogenome-Project-pilot/modules/nf-core/goat/taxonsearch/main.nf' at line: 24

Source block:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
input = taxa_file ? "-f ${taxa_file}" : "-t "${taxon}""
if (!taxon && !taxa_file) error "No input. Valid input: single taxon identifier or a .txt file with identifiers"
if (taxon && taxa_file ) error "Only one input is required: a single taxon identifier or a .txt file with identifiers"
"""
goat-cli taxon search \
$args \
$input > ${prefix}.tsv

  cat <<-END_VERSIONS > versions.yml
  "${task.process}":
      goat: \$(goat-cli --version | cut -d' ' -f2)
  END_VERSIONS
  """

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

-- Check '.nextflow.log' file for details

    The workflow completed unsuccessfully.

    Please read over the error message. If you are unable to solve it, please
    post an issue at https://github.com/NBISweden/Earth-Biogenome-Project-pilot/issues
    where we will do our best to help.

Dardel profile

Is your feature request related to a problem? Please describe.
Rackham is coming to its end of life

Describe the solution you'd like
Add a profile similar to the nf-core profile

New Module: QC3C

https://github.com/cerebis/qc3C

For quality checking Hi-C data

Test Data - Amphimallon solstitiale

Candidate for full data? Genome size 1.34 GB.

https://goat.genomehubs.org/record?recordId=360071&result=taxon&taxonomy=ncbi#Amphimallon%20solstitiale

PacBio Revio data: ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR117/ERR11788370/m84047_230428_164743_s4.hifi_reads.bc1011_BAK8A_OA.bam
Hi-C Restriction Digest: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR114/003/ERR11439603/ERR11439603_1.fastq.gz,ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR114/003/ERR11439603/ERR11439603_2.fastq.gz

New Module: BLOBTOOLKIT

Which tool should be included?
blobtoolkit

How is it used?
The toolkit is a Snakemake Workflow which uses blobtools2.

snakemake -p \
                     --use-conda \
                     --conda-prefix /blobtoolkit/.conda \
                     --directory /blobtoolkit/data \
                     --configfile /blobtoolkit/data/$ASSEMBLY.yaml \
                     --stats $ASSEMBLY.snakemake.stats \
                     -j $THREADS \
                     -s /blobtoolkit/insdc-pipeline/Snakefile \
                     --resources btk=1

Pipeline instructions are here

Which workflow should it be included in?
Assembly validation.

New Module: PurgeDups

Which tool should be included?

https://github.com/dfguan/purge_dups

How is it used?

for i in $pb_list
do
	minimap2 -x map-hifi $pri_asm $i | gzip -c - > $i.paf.gz
done
bin/pbcstat *.paf.gz (produces PB.base.cov and PB.stat files)
bin/calcuts PB.stat > cutoffs 2>calcults.log
bin/split_fa $pri_asm > $pri_asm.split
minimap2 -xasm5 -DP $pri_asm.split $pri_asm.split | gzip -c - > $pri_asm.split.self.paf.gz
bin/purge_dups -2 -T cutoffs -c PB.base.cov $pri_asm.split.self.paf.gz > dups.bed 2> purge_dups.log
bin/get_seqs -e dups.bed $pri_asm 
cat hap.fa $hap_asm

Which workflow should it be included in?

Post assembly curation

Path to params.yml template is broken

Describe the bug
A path to the params.yml template is broken.

To Reproduce
Steps to reproduce the behavior:

Go to the main page and click on the link. Also, try searching for a params.yml file.

New Module: Taxon Kit

Which tool should be included?

taxonkit

How is it used?

taxonkit name2taxid

Which workflow should it be included in?

Get the taxid for the organism to allow FCSGX to work when it's not a eukaryote. Add meta data such that GOAT only runs/gets data for eukaryotes.

New Module: KAT_COMP

Add a new module KAT_COMP to generate a histogram of the k-mer spectra.

Usage (old script):

module load bioinfo-tools KAT

CPUS="${SLURM_NPROCS:-8}"
JOB=$SLURM_ARRAY_TASK_ID

SAMPLE_PREFIX=SampleA_trimmed_no_human_normalised
DATA_DIR=/path/to/reads
FASTA_DIR=/path/to/assemblies
FILES=( $FASTA_DIR/*.fasta )

apply_katcomp () {
	ASSEMBLY="$1"		# The assembly is the first parameter to this function
	READ1="$2" 		# The first read pair is the second parameter to this function
	READ2="$3" 		# The second read pair is the third parameter to this function
	PREFIX=$( basename "${ASSEMBLY}" .fasta)
	TMP_FASTQ=$(mktemp -u --suffix ".fastq")
	mkfifo "${TMP_FASTQ}" && zcat "$READ1" "$READ2" > "${TMP_FASTQ}" &		# Make a named pipe and combine reads
	sleep 5																				# Give a little time for the pipe to be made
	kat comp -H 800000000 -t "$CPUS" -o "${PREFIX}_vs_reads.cmp" "${TMP_FASTQ}" "$ASSEMBLY" 	# Compare Reads to Assembly
	rm "${TMP_FASTQ}"
}

FASTA="${FILES[$JOB]}"
apply_katcomp "$FASTA" "$DATA_DIR/${SAMPLE_PREFIX}_R"{1,2}.fastq.gz

nbisweden / earth-biogenome-project-pilot Goto Github PK

earth-biogenome-project-pilot's Introduction

Earth Biogenome Project - Pilot Workflow

Workflow overview

Usage

Workflow parameter inputs

Workflow outputs

Customization for Uppmax

Customization for PDC

Workflow organization

earth-biogenome-project-pilot's People

Contributors

Stargazers

Watchers

Forkers

earth-biogenome-project-pilot's Issues

Potential meta data objects

Sample:

Read data:

Assembly:

Aim

Decisions

Aim

Desired Features

Decisions

Recommend Projects

Recommend Topics

Recommend Org