
pathogen-genomics-cymru / lodestone


Mycobacterial pipeline

License: GNU Affero General Public License v3.0

Python 52.91% Nextflow 32.90% Roff 14.19%
bioinformatics bioinformatics-pipeline genomics global-health infectious-diseases next-generation-sequencing nextflow pathogen sequencing tuberculosis


Lodestone


This pipeline takes as input reads presumed to be from one of 10 mycobacterial genomes: abscessus, africanum, avium, bovis, chelonae, chimaera, fortuitum, intracellulare, kansasii, tuberculosis. Input should be in the form of one directory containing pairs of fastq(.gz) or bam files.

The pipeline cleans and QCs reads with fastp and FastQC, classifies them with Kraken2 and Afanc, removes non-bacterial content, and, by alignment to any minority genomes, disambiguates mixtures of bacterial reads. Cleaned reads are aligned to one of the 10 supported genomes and variants are called. The pipeline produces as output one directory per sample, containing cleaned fastqs; a sorted, indexed BAM; a VCF; F2 and F47 statistics; an antibiogram; and summary reports.

Note that while Mykrobe is included within this pipeline, it runs as an independent process and is not used for any downstream reporting.

WARNING: There are currently known errors with vcfmix; as such, errorStrategy 'ignore' has been added to the process vcfpredict:vcfmix to stop the pipeline from crashing. Please check the stdout from Nextflow to see whether this process has run successfully.

Quick Start

This is a Nextflow DSL2 pipeline, so it requires a version of Nextflow that supports DSL2 and the stub-run feature. It is recommended to run the pipeline with NXF_VER=20.11.0-edge, as this is the version the pipeline has been tested against. E.g. to download this version:

export NXF_VER="20.11.0-edge"
curl -fsSL https://get.nextflow.io | bash

The workflow is designed to run with either Docker (-profile docker) or Singularity (-profile singularity). The container images are pulled from quay.io, and a Singularity cache directory is set in nextflow.config.

E.g. to run the workflow:

NXF_VER=20.11.0-edge nextflow run main.nf -profile singularity --filetype fastq --input_dir fq_dir --pattern "*_R{1,2}.fastq.gz" --unmix_myco yes \
--output_dir . --kraken_db /path/to/database --bowtie2_index /path/to/index --bowtie_index_name hg19_1kgmaj

NXF_VER=20.11.0-edge nextflow run main.nf -profile docker --filetype bam --input_dir bam_dir --unmix_myco no \
--output_dir . --kraken_db /path/to/database --bowtie2_index /path/to/index --bowtie_index_name hg19_1kgmaj

There is also a pre-configured climb profile for running Lodestone on a CLIMB Jupyter Notebook Server: add -profile climb to your command invocation. The input directory can point natively to an S3 bucket (e.g. --input_dir s3://my-team/bucket). By default this profile runs the workflow in Docker containers and takes advantage of Kubernetes pods, and the Kraken2, Bowtie2 and Afanc databases point to the pluspf16, hg19_1kgmaj_bt2 and Mycobacteriaciae_DB_7.0 directories respectively, which are mounted from a public S3 bucket hosted on CLIMB.
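For example, a minimal CLIMB invocation might look like the following (the S3 paths are placeholders for your own team's storage):

NXF_VER=20.11.0-edge nextflow run main.nf -profile climb --filetype fastq --input_dir s3://my-team/fastq-bucket \
--pattern "*_R{1,2}.fastq.gz" --unmix_myco no --output_dir s3://my-team/results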

Executors

By default, the pipeline runs on the local machine. To run on a cluster, modify nextflow.config to add the appropriate executor, e.g. for a SLURM cluster add process.executor = 'slurm'. For more information on executor options see the Nextflow docs: https://www.nextflow.io/docs/latest/executor.html
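For example, a minimal sketch of the addition to nextflow.config for a SLURM cluster (the queue name is a placeholder):

  process {
      executor = 'slurm'
      queue    = 'compute'  // placeholder queue name
  }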

System Requirements

Minimum recommended requirements: 32 GB RAM, 8 CPUs

Params

The following parameters should be set in nextflow.config or specified on the command line (a sketch of a complete params block follows this list):

  • input_dir
    Directory containing fastq OR bam files
  • filetype
    File type in input_dir. Either "fastq" or "bam"
  • pattern
    Regex to match fastq files in input_dir, e.g. "*_R{1,2}.fq.gz". Only mandatory if --filetype is "fastq"
  • output_dir
    Output directory for results
  • unmix_myco
    Do you want to disambiguate mixed-mycobacterial samples by read alignment? Either "yes" or "no":
    • If "yes" workflow will remove reads mapping to any minority mycobacterial genomes but in doing so WILL ALMOST CERTAINLY ALSO reduce coverage of the principal species
    • If "no" then mixed-mycobacterial samples will be left alone. Mixtures of mycobacteria + non-mycobacteria will still be disambiguated
  • species
    Principal species in each sample, assuming genus Mycobacterium. Defaults to 'null'. If used, takes one of 10 values: abscessus, africanum, avium, bovis, chelonae, chimaera, fortuitum, intracellulare, kansasii, tuberculosis. Using this parameter applies an additional sanity test to your sample:
    • If you DO NOT use this parameter (default option), pipeline will determine principal species from the reads and consider any other species a contaminant
    • If you DO use this parameter, pipeline will expect this to be the principal species. It will fail the sample if reads from this species are not actually the majority
  • kraken_db
    Directory containing *.k2d Kraken2 database files (k2_pluspf_16gb recommended, obtain from https://benlangmead.github.io/aws-indexes/k2)
  • bowtie2_index
    Directory containing Bowtie2 index (obtain from ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/hg19_1kgmaj_bt2.zip). The specified path should NOT include the index name
  • bowtie_index_name
    Name of the bowtie index, e.g. hg19_1kgmaj
  • vcfmix
    Run vcfmix, yes or no. Set to no for synthetic samples
  • resistance_profiler
    Run resistance profiling for Mycobacterium tuberculosis. Either "tb-profiler" or "none".
  • afanc_myco_db
    Path to the afanc database used for speciation. Obtain from https://s3.climb.ac.uk/microbial-bioin-sp3/Mycobacteriaciae_DB_7.0.tar.gz
  • update_tbprofiler
    Update tb-profiler. Either "yes" or "no". "yes" may be useful when running outside of a container for the first time, as a tb-profiler database matching our reference will not yet have been constructed. This is not needed with the climb, docker and singularity profiles, as the reference has already been added. Alternatively you can run tb-profiler update_tbdb --match_ref <lodestone_dir>/resources/tuberculosis.fasta.
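As a starting point, a minimal params block for nextflow.config might look like the sketch below; all paths are placeholders, and the remaining parameters are shown with their documented defaults or example values:

  params {
      input_dir           = "/path/to/fq_dir"         // placeholder
      filetype            = "fastq"
      pattern             = "*_R{1,2}.fastq.gz"
      output_dir          = "results"
      unmix_myco          = "no"
      species             = "null"
      kraken_db           = "/path/to/kraken_db"      // directory containing *.k2d files
      bowtie2_index       = "/path/to/bt2_index_dir"  // path only, index name excluded
      bowtie_index_name   = "hg19_1kgmaj"
      vcfmix              = "yes"
      resistance_profiler = "tb-profiler"
      afanc_myco_db       = "/path/to/Mycobacteriaciae_DB_7.0"
      update_tbprofiler   = "no"
  }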

For more information on the parameters, run nextflow run main.nf --help.

The path to the Singularity images can also be changed in the singularity profile in nextflow.config. The default value is ${baseDir}/singularity.
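For example, a sketch of the relevant section of nextflow.config (the cache path shown is the documented default):

  profiles {
      singularity {
          singularity.enabled  = true
          singularity.cacheDir = "${baseDir}/singularity"  // path to singularity images
      }
  }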

Stub-run

To test the stub run:

NXF_VER=20.11.0-edge nextflow run main.nf -stub -config testing.config

Checkpoints

Checkpoints are used throughout this workflow to fail a sample or issue warnings:

processes preprocessing:checkFqValidity or preprocessing:checkBamValidity

  1. (Fail) If sample does not pass fqtools 'validate' or samtools 'quickcheck', as appropriate.

process preprocessing:countReads

  1. (Fail) If sample contains < 100k pairs of raw reads.

process preprocessing:fastp

  1. (Fail) If the sample contains < 100k pairs of cleaned reads, all of which are required to be > 50bp (cleaning is performed by fastp with --length_required 50 --average_qual 10 --low_complexity_filter --correction --cut_right --cut_tail --cut_tail_window_size 1 --cut_tail_mean_quality 20; an equivalent standalone command is shown below).
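For reference, a standalone fastp command equivalent to the cleaning step described above would look roughly as follows (file names are placeholders):

fastp -i sample_R1.fastq.gz -I sample_R2.fastq.gz -o sample_R1.clean.fastq.gz -O sample_R2.clean.fastq.gz \
--length_required 50 --average_qual 10 --low_complexity_filter --correction \
--cut_right --cut_tail --cut_tail_window_size 1 --cut_tail_mean_quality 20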

process preprocessing:kraken2

  1. (Fail) If the top family hit is not Mycobacteriaceae
  2. (Fail) If there are fewer than 100k reads classified as Mycobacteriaceae
  3. (Warn) If the top family classification is mycobacterial, but this is not consistent with top genus and species classifications
  4. (Warn) If the top family is Mycobacteriaceae but no G1 (species complex) classifications meet the minimum thresholds of > 5000 reads or > 0.5% of the total reads (see the sketch after this list; this is not necessarily a concern, as not all mycobacteria have a taxonomic classification at this rank)
  5. (Warn) If sample is mixed or contaminated - defined as containing reads > the 5000/0.5% thresholds from multiple non-human species
  6. (Warn) If sample contains multiple classifications to mycobacterial species complexes, each meeting the > 5000/0.5% thresholds
  7. (Warn) If no species classification meets the 5000/0.5% thresholds
  8. (Warn) If no genus classification meets the 5000/0.5% thresholds
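A minimal sketch of the "5000 reads / 0.5%" threshold test referenced in warnings 4-8 above (the function and argument names are illustrative, not the pipeline's actual code):

  def meets_thresholds(clade_reads: int, total_reads: int) -> bool:
      # A classification meets the minimum thresholds if it accounts for
      # more than 5000 reads or more than 0.5% of the total reads.
      return clade_reads > 5000 or clade_reads > 0.005 * total_reads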

process preprocessing:identifyBacterialContaminants

  1. (Fail) If regardless of what Kraken reports, Afanc does not make a species-level mycobacterial classification (note that we do not use Kraken mycobacterial classifications other than to determine whether 100k reads are family Mycobacteriaceae; for higher-resolution classification, we defer to Afanc)
  2. (Fail) If the sample is not contaminated and the top species hit is not one of the 10 supported Mycobacteria: abscessus|africanum|avium|bovis|chelonae|chimaera|fortuitum|intracellulare|kansasii|tuberculosis
  3. (Fail) If the sample is not contaminated and the top species hit is contrary to the species expected (e.g. "avium" rather than "tuberculosis" - only tested if you provide that expectation)
  4. (Warn) If the top Afanc species hit, on the basis of highest % coverage, does not also have the highest median depth
  5. (Warn) If we are unable to associate an NCBI taxon ID to any given contaminant species, which means we will not be able to locate its genome, and thereby remove it as a contaminant
  6. (Warn) If we are unable to determine a URL for the latest RefSeq genome associated with a contaminant species' taxon ID
  7. (Warn) If no complete genome could be found for a contaminant species. The workflow will proceed with alignment-based contaminant removal, but you're warned that there's reduced confidence in detecting reads from this species

process preprocessing:downloadContamGenomes

  1. (Fail) If a contaminant is detected but we are unable to download a representative genome, and thereby remove it

process preprocessing:summarise

  1. (Fail) If after having taken an alignment-based approach to decontamination, Kraken still detects a contaminant species
  2. (Fail) If after having taken an alignment-based approach to decontamination, the top species hit is not one of the 10 supported Mycobacteria
  3. (Fail) If, after successfully removing contaminants, the top species hit is contrary to the species expected (e.g. "avium" rather than "tuberculosis" - only tested if you provide that expectation)

process clockwork:alignToRef

  1. (Fail) If < 100k reads could be aligned to the reference genome
  2. (Fail) If, after aligning to the reference genome, the average read mapping quality < 10
  3. (Fail) If < 50% of the reference genome was covered at 10-fold depth

process clockwork:minos

  1. (Warn) If the sample is not TB, it is not passed to a resistance profiler

Acknowledgements

For a list of direct authors of this pipeline, please see the contributors list. All of the software dependencies of this pipeline are recorded in version.json.

The preprocessing sub-workflow is based on the preprocessing Nextflow DSL1 pipeline written by Stephen Bush, University of Oxford. The clockwork sub-workflow uses aspects of the variant calling workflow from https://github.com/iqbal-lab-org/clockwork (lead author Martin Hunt, Iqbal Lab at EMBL-EBI).

Contributors

annacprice, arthurvm, jezsw, maximfilimonovgh, whalleyt


lodestone's Issues

Update identify_tophit_and_contaminants2.py to reflect the new NCBI taxonomy

The NCBI taxonomy for Mycobacteriaceae has been expanded in recent years to include the following genera: Mycobacterium, Mycobacteroides, Mycolicibacter, Mycolicibacterium, and Mycolicibacillus. Afanc and recent versions of Kraken2 databases use this taxonomy. Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8376243/

Older versions of the NCBI taxonomy just use the genus Mycobacterium. Mykrobe and older (3 years+) Kraken2 databases use this taxonomy.

The script identify_tophit_and_contaminants2.py identifies the top hit from a Mykrobe/Afanc report and the contaminant genomes from a Kraken2 report (and the Mykrobe/Afanc report if unmix_myco=yes).

However, the script was written to recognise only the old Mycobacterium taxonomy, and doesn't recognise Mycolicibacterium etc. as part of Mycobacteriaceae. This means the script only works with Mykrobe and old Kraken2 databases. When run on Afanc and recent Kraken2 reports, Mycolicibacterium etc. are incorrectly identified as contaminants (when unmix_myco=no) and the workflow tries to remove them.

Suggested fix:
Update identify_tophit_and_contaminants2.py to reflect the new taxonomy and drop support for Mykrobe, which uses the old taxonomy. Mykrobe will still run as an independent process, but will NOT be used in any downstream reporting

species hits from Afanc missing dot abbreviations

Problems with parsing of assembly_summary_refseq.txt: matches aren't found for some of the species hits from Afanc due to missing dot abbreviations. E.g. Afanc reports Mycobacterium avium subsp paratuberculosis (after the underscore is removed), but in assembly_summary_refseq.txt it's reported as Mycobacterium avium subsp. paratuberculosis.
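A sketch of one possible fix: normalise the rank abbreviations before matching (the helper name is hypothetical, not the pipeline's actual code):

  import re

  def normalise_species_name(name: str) -> str:
      # Add the trailing dot to rank abbreviations (subsp, var, str) so that
      # Afanc species hits match the assembly_summary_refseq.txt spelling.
      return re.sub(r"\b(subsp|var|str)\b(?!\.)", r"\1.", name)

  # "Mycobacterium avium subsp paratuberculosis"
  #   -> "Mycobacterium avium subsp. paratuberculosis"
  print(normalise_species_name("Mycobacterium avium subsp paratuberculosis"))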

AttributeError: 'list' object attribute 'append' is read-only in bin/create_final_json.py

The create_final_json.py script fails when trying to append to the list used to build the warnings field of the final JSON in process clockwork:alignToRef:

Traceback (most recent call last):
  File "/scratch/c.c1656075/sp3_testing_2/tb-pipeline/bin/create_final_json.py", line 128, in <module>
    out = read_and_parse_input_files(stats_file, report_file)
  File "/scratch/c.c1656075/sp3_testing_2/tb-pipeline/bin/create_final_json.py", line 96, in read_and_parse_input_files
    warnings.append = "there was %d error but no warnings" %num_errors
AttributeError: 'list' object attribute 'append' is read-only

It appears this is because the script assigns to the append attribute rather than calling append on the warnings list.
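The likely fix is a one-line change: call the method instead of assigning to it.

  # buggy: assigns to the bound method, raising AttributeError
  warnings.append = "there was %d error but no warnings" % num_errors

  # fixed: appends the message to the warnings list
  warnings.append("there was %d error but no warnings" % num_errors)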

Samples not passing to clockwork

The logic for passing samples to clockwork is broken: when unmix_myco=no and contaminants are found, samples are not passed to clockwork.

samtools sort error in clockwork:alignToRef

The following error has been recorded:

[ERROR] failed to open file 'null'
[bam_mating_core] ERROR: Couldn't read header
samtools sort: failed to read header from "-"
[markdup] error reading header

Issue with preprocessing:bowtie2 (knock-on effect on clockwork workflow)

The channel going into preprocessing:bowtie2 is incorrectly defined. This causes preprocessing:bowtie2 to run for the first processed sample only, with subsequent samples incorrectly skipping the process. This has a knock-on effect on the clockwork sub-workflow, meaning clockwork:alignToRef will also only run for the first sample.

Singularity permissions error


Caused by:
  Process `vcfpredict:tbprofiler (1)` terminated with an error exit status (1)

Command executed:

  bgzip SAMPLE_ID_allelic_depth.minos.vcf
  tb-profiler profile --vcf SAMPLE_ID_allelic_depth.minos.vcf.gz --threads 1
  mv results/tbprofiler.results.json SAMPLE_ID.tbprofiler-out.json
  
  cp SAMPLE_ID_report.json SAMPLE_ID_report_previous.json
  
  echo '{"complete":"workflow complete without error"}' | jq '.' > SAMPLE_ID_err.json
  
  jq -s ".[0] * .[1] * .[2]" SAMPLE_ID_err.json SAMPLE_ID_report_previous.json  SAMPLE_ID.tbprofiler-out.json > SAMPLE_ID_report.json

Command exit status:
  1

Command output:
  [00:05:44] INFO     Using ref file:                                    db.py:594
                      /opt/conda/share/tbprofiler//tbdb.fasta                     
             INFO     Using gff file:                                    db.py:594
                      /opt/conda/share/tbprofiler//tbdb.gff                       
             INFO     Using bed file:                                    db.py:594
                      /opt/conda/share/tbprofiler//tbdb.bed                       
             INFO     Using json_db file:                                db.py:594
                      /opt/conda/share/tbprofiler//tbdb.dr.json                   
             INFO     Using variables file:                              db.py:594
                      /opt/conda/share/tbprofiler//tbdb.variables.json            
             INFO     Using spoligotype_spacers file:                    db.py:594
                      /opt/conda/share/tbprofiler//tbdb.spoligotype_spac          
                      ers.txt                                                     
             INFO     Using spoligotype_annotations file:                db.py:594
                      /opt/conda/share/tbprofiler//tbdb.spoligotype_list          
                      .csv                                                        
             INFO     Using bedmask file:                                db.py:594
                      /opt/conda/share/tbprofiler//tbdb.mask.bed                  
             INFO     Using barcode file:                                db.py:594
                      /opt/conda/share/tbprofiler//tbdb.barcode.bed               
  [00:05:45] INFO     Running snpEff                                    vcf.py:119
  [00:05:47] ERROR    mkdtemp(/bcftools.p6luoZ) failed: Read-only     utils.py:391
                      file system                                                 
                                                                                  
             ERROR                                                  tb-profiler:58
                                                                                  
                      ################################# ERROR                     
                      #######################################                     
                                                                                  
                      This run has failed. Please check all                       
                      arguments and make sure all input files                     
                      exist. If no solution is found, please open                 
                      up an issue at                                              
                      https://github.com/jodyphelan/TBProfiler/issu               
                      es/new and paste or attach the                              
                      contents of the error log                                   
                      (tbprofiler.errlog.txt)                                     
                                                                                  
                      #############################################               
                      ##################################                          
                                                                                  

Command error:
  Traceback (most recent call last):
    File "/opt/conda/bin/tb-profiler", line 562, in <module>
      args.func(args)
    File "/opt/conda/bin/tb-profiler", line 110, in main_profile
      results.update(pp.run_profiler(args))
    File "/opt/conda/lib/python3.10/site-packages/pathogenprofiler/cli.py", line 74, in run_profiler
      results = vcf_profiler(conf=args.conf,prefix=args.files_prefix,sample_name=args.prefix,vcf_file=args.vcf,delly_vcf_file=args.delly_vcf)
    File "/opt/conda/lib/python3.10/site-packages/pathogenprofiler/profiler.py", line 121, in vcf_profiler
      vcf_obj = vcf_obj.run_snpeff(conf["snpEff_db"],conf["ref"],conf["gff"],rename_chroms= conf.get("chromosome_conversion",None))
    File "/opt/conda/lib/python3.10/site-packages/pathogenprofiler/vcf.py", line 134, in run_snpeff
      run_cmd("bcftools view -c 1 -a %(filename)s | bcftools view -v snps | combine_vcf_variants.py --ref %(ref_file)s --gff %(gff_file)s | %(rename_cmd)s snpEff ann %(snpeff_data_dir_opt)s -noLog -noStats %(db)s - %(re_rename_cmd)s | bcftools sort -Oz -o %(tmp_file1)s && bcftools index %(tmp_file1)s" % vars(self))
    File "/opt/conda/lib/python3.10/site-packages/pathogenprofiler/utils.py", line 392, in run_cmd
      raise ValueError("Command Failed:\n%s\nstderr:\n%s" % (cmd,result.stderr.decode()))
  ValueError: Command Failed:
  /bin/bash -c set -o pipefail; bcftools view -c 1 -a bec971b8-a6c5-4d7b-8fc0-f4321e950049.targets.vcf.gz | bcftools view -v snps | combine_vcf_variants.py --ref /opt/conda/share/tbprofiler//tbdb.fasta --gff /opt/conda/share/tbprofiler//tbdb.gff | rename_vcf_chrom.py --source NC_000962.3 --target Chromosome | snpEff ann -dataDir /opt/conda/share/snpeff-5.2-0/data -noLog -noStats Mycobacterium_tuberculosis_h37rv - | rename_vcf_chrom.py --source Chromosome --target NC_000962.3 | bcftools sort -Oz -o 2e834baf-bbb5-41be-ba37-b6192ea6df35.vcf.gz && bcftools index 2e834baf-bbb5-41be-ba37-b6192ea6df35.vcf.gz
  stderr:
  mkdtemp(/bcftools.p6luoZ) failed: Read-only file system
  
  Cleaning up after failed run

Work dir:
  /home/ubuntu/data2/lodestone/work/63/cc1294a65c49c2fb547641964a5c48

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line



===========================================
Finished with errors

Need to pin version of clockwork in container

In the container, the dependencies for clockwork are installed manually and are pinned by version/git commit according to pre-v0.11.0 clockwork, but the git clone uses the main branch of clockwork, which is now at v0.11.0.

Samples not passing to gnomonicus

Due to differences in how Mycobacterium tuberculosis is reported in the Afanc report compared to the Mykrobe report, TB samples are failing to pass to gnomonicus.

E.g.
Mykrobe top hit: Mycobacterium tuberculosis
Afanc top hit: Mycobacterium tuberculosis H37Rv

The if-statement in clockwork:minos needs to be updated to reflect Afanc reporting, i.e. to test that the top hit "starts with" Mycobacterium tuberculosis.
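In Python terms the change would look roughly like this (the actual condition lives in the Nextflow process, so this is illustrative only):

  top_hit = "Mycobacterium tuberculosis H37Rv"  # Afanc-style species report

  # old exact-match check fails on Afanc output:
  is_tb = top_hit == "Mycobacterium tuberculosis"

  # updated prefix check handles both reporting styles:
  is_tb = top_hit.startswith("Mycobacterium tuberculosis")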

Bowtie2 database

Thank you for the excellent workflows. I'm trying to run the pipeline on my PC, but I'm confused about exactly which Bowtie2 database to download; I found at least three links. Please let me know which link is correct.

https://genome-idx.s3.amazonaws.com/bt/hg19.zip (from https://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
https://genome-idx.s3.amazonaws.com/bt/hg19_1kgmaj_snvs_bt2.zip
https://genome-idx.s3.amazonaws.com/bt/hg19_1kgmaj_snvindels_bt2.zip
(both link 2&3 from github https://github.com/BenLangmead/bowtie-majref)

Best,
Trung

Update: please ignore this, I just had to remind myself how to download from an FTP link.

intracellulare when statement bug


Caused by:
  Process `vcfpredict:add_allelic_depth (1)` terminated with an error exit status (2)

Command executed:

  samtools faidx intracellulare.fasta
  samtools dict intracellulare.fasta -o intracellulare.dict
  gatk VariantAnnotator -R intracellulare.fasta -I SAMPLE_ID.bam -V SAMPLE_ID.minos.vcf -A DepthPerAlleleBySample -O SAMPLE_ID_allelic_depth.minos.vcf

Command exit status:
  2

Command output:
  (empty)

Command error:
  Using GATK jar /opt/conda/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /opt/conda/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar VariantAnnotator -R intracellulare.fasta -I SAMPLE_ID.bam -V SAMPLE_ID.minos.vcf -A DepthPerAlleleBySample -O SAMPLE_ID_allelic_depth.minos.vcf
  04:09:45.030 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
  04:09:45.135 INFO  VariantAnnotator - ------------------------------------------------------------
  04:09:45.194 INFO  VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.4.0.0
  04:09:45.194 INFO  VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
  04:09:45.194 INFO  VariantAnnotator - Executing as mambauser@parallelrunningtb on Linux v5.4.0-170-generic amd64
  04:09:45.194 INFO  VariantAnnotator - Java runtime: OpenJDK 64-Bit Server VM v17.0.8-internal+0-adhoc..src
  04:09:45.195 INFO  VariantAnnotator - Start Date/Time: February 13, 2024 at 4:09:44 AM UTC
  04:09:45.195 INFO  VariantAnnotator - ------------------------------------------------------------
  04:09:45.195 INFO  VariantAnnotator - ------------------------------------------------------------
  04:09:45.196 INFO  VariantAnnotator - HTSJDK Version: 3.0.5
  04:09:45.196 INFO  VariantAnnotator - Picard Version: 3.0.0
  04:09:45.197 INFO  VariantAnnotator - Built for Spark Version: 3.3.1
  04:09:45.197 INFO  VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
  04:09:45.197 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
  04:09:45.198 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
  04:09:45.199 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
  04:09:45.200 INFO  VariantAnnotator - Deflater: IntelDeflater
  04:09:45.200 INFO  VariantAnnotator - Inflater: IntelInflater
  04:09:45.200 INFO  VariantAnnotator - GCS max retries/reopens: 20
  04:09:45.200 INFO  VariantAnnotator - Requester pays: disabled
  04:09:45.200 INFO  VariantAnnotator - Initializing engine
  WARNING	2024-02-13 04:09:45	SamFiles	The index file /home/ubuntu/data2/lodestone/work/97/14dacc900afcd5200dc703457c0ba7/SAMPLE_ID.bam.bai was found by resolving the canonical path of a symlink: SAMPLE_ID.bam -> /home/ubuntu/data2/lodestone/work/97/14dacc900afcd5200dc703457c0ba7/SAMPLE_ID.bam
  04:09:45.329 INFO  FeatureManager - Using codec VCFCodec to read file file://SAMPLE_ID.minos.vcf
  04:09:45.341 INFO  VariantAnnotator - Shutting down engine
  [February 13, 2024 at 4:09:45 AM UTC] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.01 minutes.
  Runtime.totalMemory()=201326592
  ***********************************************************************
  
  A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
    reference contigs = [NC_016946.1]
    features contigs = [NC_000962.3]
  
  ***********************************************************************
  Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

Work dir:
  /home/ubuntu/data2/lodestone/work/d7/5b3bfc56c15af6202ac715590fa01e

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`



===========================================
Finished with errors

Additions for v0.9.8

  • Add compatibility for CLIMB Jupyter Notebooks (uses K8s/S3)
  • Replace gnomonicus with tb-profiler
  • Add csv output from mykrobe
  • Minor reporting bug fix (add missing publishDir statement to reAfanc and reKraken)
  • Publish full afanc report

Update TB profiler manually

tb-profiler update_db doesn't work directly in Docker recipes without specifying the version and commits, for reasons unclear, so the databases need to be updated manually.

Empty contaminated.fa and failed in preprocessing:mapToContamFa process

Hello, I found that tb-pipeline would be very useful to our lab. However, the pipeline terminated at the mapToContamFa step. A contam_dir was created in the work directory, and GCF_000001405.39_GRCh38.p13_genomic.fna.gz (~920 MB) was downloaded at the downloadContamGenomes step.

After this step, a contaminants.fa file was created and the contam_dir was removed. However, contaminants.fa is empty and the pipeline terminated.

I am not sure whether the pipeline failure was due to the empty contaminants.fa file. Is there any reason why the file would be empty? Is it possible to skip the contaminant-mapping step?

Also, I assumed the human reads were removed using Bowtie2 against hg19_1kgmaj, so why was GCF_000001405.39_GRCh38.p13_genomic.fna.gz downloaded again?

Thanks in advance for any reply.

Output json of parse_samtools_stats.py is in incorrect format

The JSON produced by parse_samtools_stats.py is not in the format expected by create_final_json.py. This causes the clockwork sub-workflow to exit prematurely after clockwork:alignToRef. In addition, an error message is incorrectly recorded in the error log and the final report JSON is incomplete.

Minos fails when trying to compare against an empty VCF file

Running Minos with an empty VCF file gives an error in the minos process on the command:

minos adjudicate --force --reads sample minos ref.fa A19U007635_1561353218R1_M04557_164.bcftools.vcf sample.cortex.vcf

In this case the BCFtools VCF is empty and the Cortex VCF has a low number of variants. I have not seen any other instances of this happening. I'll have a proper look and update accordingly.

Read pair counting bug in preprocessing:fastp

There is a bug in preprocessing:fastp where the read-pair count is checked as > 100k. The count is pulled from the fastp output JSON; however, this is the total number of reads rather than the number of pairs. Using the fqtools approach from preprocessing:countReads resolves the issue.
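A sketch of the distinction, assuming the standard layout of fastp's JSON report (halving the read total, or counting one mate file with fqtools as preprocessing:countReads does, both yield the pair count):

  import json

  with open("fastp.json") as fh:
      report = json.load(fh)

  # fastp reports the total number of reads, i.e. twice the number of pairs
  total_reads = report["summary"]["after_filtering"]["total_reads"]
  read_pairs = total_reads // 2

  if read_pairs < 100_000:
      raise SystemExit("error: fewer than 100k read pairs after cleaning")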

Unable to analyze single FASTQ reads

Hi.

I am trying to analyze single-end reads but I am getting this error message:

WARN: Input tuple does not match input set cardinality declared by process preprocessing:checkFqValidity -- offending value: [ERR11243647, /home/olawoyei/projects/def-guthriej/olawoyei/mtb/fastq/ERR11243647.fastq.gz, /project/6083771/olawoyei/work/39/36be7d9f46f13a9a5a5d62ea719385/version.json]

Does lodestone only work with paired FASTQ reads?
