pathogen-genomics-cymru / lodestone
Mycobacterial pipeline
License: GNU Affero General Public License v3.0
Caused by:
Process `vcfpredict:tbprofiler (1)` terminated with an error exit status (1)
Command executed:
bgzip SAMPLE_ID_allelic_depth.minos.vcf
tb-profiler profile --vcf SAMPLE_ID_allelic_depth.minos.vcf.gz --threads 1
mv results/tbprofiler.results.json SAMPLE_ID.tbprofiler-out.json
cp SAMPLE_ID_report.json SAMPLE_ID_report_previous.json
echo '{"complete":"workflow complete without error"}' | jq '.' > SAMPLE_ID_err.json
jq -s ".[0] * .[1] * .[2]" SAMPLE_ID_err.json SAMPLE_ID_report_previous.json SAMPLE_ID.tbprofiler-out.json > SAMPLE_ID_report.json
Command exit status:
1
Command output:
[00:05:44] INFO Using ref file: /opt/conda/share/tbprofiler//tbdb.fasta (db.py:594)
INFO Using gff file: /opt/conda/share/tbprofiler//tbdb.gff (db.py:594)
INFO Using bed file: /opt/conda/share/tbprofiler//tbdb.bed (db.py:594)
INFO Using json_db file: /opt/conda/share/tbprofiler//tbdb.dr.json (db.py:594)
INFO Using variables file: /opt/conda/share/tbprofiler//tbdb.variables.json (db.py:594)
INFO Using spoligotype_spacers file: /opt/conda/share/tbprofiler//tbdb.spoligotype_spacers.txt (db.py:594)
INFO Using spoligotype_annotations file: /opt/conda/share/tbprofiler//tbdb.spoligotype_list.csv (db.py:594)
INFO Using bedmask file: /opt/conda/share/tbprofiler//tbdb.mask.bed (db.py:594)
INFO Using barcode file: /opt/conda/share/tbprofiler//tbdb.barcode.bed (db.py:594)
[00:05:45] INFO Running snpEff (vcf.py:119)
[00:05:47] ERROR mkdtemp(/bcftools.p6luoZ) failed: Read-only file system (utils.py:391)
ERROR tb-profiler:58
################################# ERROR #######################################
This run has failed. Please check all arguments and make sure all input files
exist. If no solution is found, please open up an issue at
https://github.com/jodyphelan/TBProfiler/issues/new and paste or attach the
contents of the error log (tbprofiler.errlog.txt)
###############################################################################
Command error:
Traceback (most recent call last):
  File "/opt/conda/bin/tb-profiler", line 562, in <module>
    args.func(args)
  File "/opt/conda/bin/tb-profiler", line 110, in main_profile
    results.update(pp.run_profiler(args))
  File "/opt/conda/lib/python3.10/site-packages/pathogenprofiler/cli.py", line 74, in run_profiler
    results = vcf_profiler(conf=args.conf,prefix=args.files_prefix,sample_name=args.prefix,vcf_file=args.vcf,delly_vcf_file=args.delly_vcf)
  File "/opt/conda/lib/python3.10/site-packages/pathogenprofiler/profiler.py", line 121, in vcf_profiler
    vcf_obj = vcf_obj.run_snpeff(conf["snpEff_db"],conf["ref"],conf["gff"],rename_chroms= conf.get("chromosome_conversion",None))
  File "/opt/conda/lib/python3.10/site-packages/pathogenprofiler/vcf.py", line 134, in run_snpeff
    run_cmd("bcftools view -c 1 -a %(filename)s | bcftools view -v snps | combine_vcf_variants.py --ref %(ref_file)s --gff %(gff_file)s | %(rename_cmd)s snpEff ann %(snpeff_data_dir_opt)s -noLog -noStats %(db)s - %(re_rename_cmd)s | bcftools sort -Oz -o %(tmp_file1)s && bcftools index %(tmp_file1)s" % vars(self))
  File "/opt/conda/lib/python3.10/site-packages/pathogenprofiler/utils.py", line 392, in run_cmd
    raise ValueError("Command Failed:\n%s\nstderr:\n%s" % (cmd,result.stderr.decode()))
ValueError: Command Failed:
/bin/bash -c set -o pipefail; bcftools view -c 1 -a bec971b8-a6c5-4d7b-8fc0-f4321e950049.targets.vcf.gz | bcftools view -v snps | combine_vcf_variants.py --ref /opt/conda/share/tbprofiler//tbdb.fasta --gff /opt/conda/share/tbprofiler//tbdb.gff | rename_vcf_chrom.py --source NC_000962.3 --target Chromosome | snpEff ann -dataDir /opt/conda/share/snpeff-5.2-0/data -noLog -noStats Mycobacterium_tuberculosis_h37rv - | rename_vcf_chrom.py --source Chromosome --target NC_000962.3 | bcftools sort -Oz -o 2e834baf-bbb5-41be-ba37-b6192ea6df35.vcf.gz && bcftools index 2e834baf-bbb5-41be-ba37-b6192ea6df35.vcf.gz
stderr:
mkdtemp(/bcftools.p6luoZ) failed: Read-only file system
Cleaning up after failed run
Work dir:
/home/ubuntu/data2/lodestone/work/63/cc1294a65c49c2fb547641964a5c48
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
===========================================
Finished with errors
`tb-profiler update_db` doesn't work directly in Docker recipes without specifying the version and commit, for reasons that are unclear, so the databases need to be updated manually.
The create_final_json.py script fails when trying to append to a list used to create the warnings field in the final JSON in process clockwork:alignToRef:
Traceback (most recent call last):
  File "/scratch/c.c1656075/sp3_testing_2/tb-pipeline/bin/create_final_json.py", line 128, in <module>
    out = read_and_parse_input_files(stats_file, report_file)
  File "/scratch/c.c1656075/sp3_testing_2/tb-pipeline/bin/create_final_json.py", line 96, in read_and_parse_input_files
    warnings.append = "there was %d error but no warnings" % num_errors
AttributeError: 'list' object attribute 'append' is read-only
It appears this is because the script is trying to assign to a member called `append` rather than append to the `warnings` list.
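The fix is a call rather than an assignment. A minimal reproduction of the bug and its correction (`num_errors` is a stand-in value here; the real variable comes from create_final_json.py):

```python
num_errors = 1
warnings = []

# Buggy line from the traceback (assigning to the attribute raises
# AttributeError: 'list' object attribute 'append' is read-only):
#     warnings.append = "there was %d error but no warnings" % num_errors

# Fix: call append() on the list instead of assigning to it
warnings.append("there was %d error but no warnings" % num_errors)
```
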
The JSON produced by parse_samtools_stats.py is not in the format expected by create_final_json.py. This causes the clockwork sub-workflow to exit prematurely after clockwork:alignToRef. In addition, an error message is incorrectly recorded in the error log and the final report JSON is incomplete.
Thank you for the excellent workflow; I'm trying to run it on my PC. For the bowtie2 database, I'm confused about exactly what to download: I found at least three links. Please let me know which link is correct.
https://genome-idx.s3.amazonaws.com/bt/hg19.zip (from https://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
https://genome-idx.s3.amazonaws.com/bt/hg19_1kgmaj_snvs_bt2.zip
https://genome-idx.s3.amazonaws.com/bt/hg19_1kgmaj_snvindels_bt2.zip
(links 2 and 3 are from https://github.com/BenLangmead/bowtie-majref)
Best,
Trung
Update: please disregard this; I just needed to remind myself how to download an FTP link.
The channel going into preprocessing:bowtie2 is incorrectly defined. This causes preprocessing:bowtie2 to run only for the first processed sample, with subsequent samples incorrectly skipping this process. This has a knock-on effect on the clockwork sub-workflow: clockwork:alignToRef will also only run for the first sample.
A report JSON needs to be generated for the scenario where contaminants are not successfully removed from a sample.
The NCBI taxonomy for Mycobacteriaceae has been expanded in recent years to include the following genera: Mycobacterium, Mycobacteroides, Mycolicibacter, Mycolicibacterium, and Mycolicibacillus. Afanc and recent versions of Kraken2 databases use this taxonomy. Ref: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8376243/
Older versions of the NCBI taxonomy just use the genus Mycobacterium. Mykrobe and older (3 years+) Kraken2 databases use this taxonomy.
The script identify_tophit_and_contaminants2.py identifies the top hit from a Mykrobe/Afanc report and the contaminant genomes from a Kraken2 report (and the Mykrobe/Afanc report if unmix_myco=yes).
The script identify_tophit_and_contaminants2.py has been written to recognise the old Mycobacterium taxonomy, and doesn't recognise Mycolicibacterium etc. as being part of Mycobacteriaceae. This means that the script only works for Mykrobe and old Kraken2 databases. When running the script on Afanc and recent Kraken2 reports, Mycolicibacterium etc. are incorrectly identified as contaminants (when unmix_myco=no) and the workflow tries to remove them.
Suggested fix: update identify_tophit_and_contaminants2.py to reflect the new taxonomy and drop support for Mykrobe, which uses the old taxonomy. Mykrobe will still run as an independent process but will NOT be used in any downstream reporting.
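A sketch of how the genus test could be widened, assuming the script classifies hits by the first word of the species name (the helper below is illustrative, not identify_tophit_and_contaminants2.py's actual code):

```python
# Genera that NCBI now places within the family Mycobacteriaceae,
# as used by Afanc and recent Kraken2 databases.
MYCOBACTERIACEAE_GENERA = {
    "Mycobacterium",
    "Mycobacteroides",
    "Mycolicibacter",
    "Mycolicibacterium",
    "Mycolicibacillus",
}

def is_mycobacteriaceae(species_name: str) -> bool:
    """Return True if the genus (first word of the species name)
    belongs to the expanded Mycobacteriaceae family."""
    genus = species_name.split()[0]
    return genus in MYCOBACTERIACEAE_GENERA
```

With a check like this, a hit such as "Mycolicibacterium smegmatis" would no longer be misclassified as a contaminant when unmix_myco=no.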
fqtools count is running on the raw reads when it should be running on the cleaned reads from fastp
Following error has been recorded:
[ERROR] failed to open file 'null'
[bam_mating_core] ERROR: Couldn't read header
samtools sort: failed to read header from "-"
[markdup] error reading header
There is a bug in preprocessing:fastp where the read-pair count is checked as >100k. The count is pulled from the fastp output JSON; however, this is the total number of reads rather than the number of pairs. Using the fqtools approach from preprocessing:countReads resolves this issue.
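The distinction can be sketched as follows, assuming fastp's standard report layout where summary.before_filtering.total_reads counts individual reads, so for paired-end data the pair count is half that value (the helper is illustrative, not pipeline code):

```python
def read_pairs_from_fastp(fastp_report: dict) -> int:
    """Return the number of read PAIRS from a parsed fastp JSON report.

    fastp's total_reads counts every read individually, so comparing
    it directly against a 100k *pair* threshold double-counts.
    """
    total_reads = fastp_report["summary"]["before_filtering"]["total_reads"]
    return total_reads // 2
```

In the pipeline this dict would come from json.load() on the fastp report; checking `total_reads > 100000` directly would pass samples with only 50k pairs.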
Caused by:
Process `vcfpredict:add_allelic_depth (1)` terminated with an error exit status (2)
Command executed:
samtools faidx intracellulare.fasta
samtools dict intracellulare.fasta -o intracellulare.dict
gatk VariantAnnotator -R intracellulare.fasta -I SAMPLE_ID.bam -V SAMPLE_ID.minos.vcf -A DepthPerAlleleBySample -O SAMPLE_ID_allelic_depth.minos.vcf
Command exit status:
2
Command output:
(empty)
Command error:
Using GATK jar /opt/conda/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /opt/conda/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar VariantAnnotator -R intracellulare.fasta -I SAMPLE_ID.bam -V SAMPLE_ID.minos.vcf -A DepthPerAlleleBySample -O SAMPLE_ID_allelic_depth.minos.vcf
04:09:45.030 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/conda/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
04:09:45.135 INFO VariantAnnotator - ------------------------------------------------------------
04:09:45.194 INFO VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.4.0.0
04:09:45.194 INFO VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
04:09:45.194 INFO VariantAnnotator - Executing as mambauser@parallelrunningtb on Linux v5.4.0-170-generic amd64
04:09:45.194 INFO VariantAnnotator - Java runtime: OpenJDK 64-Bit Server VM v17.0.8-internal+0-adhoc..src
04:09:45.195 INFO VariantAnnotator - Start Date/Time: February 13, 2024 at 4:09:44 AM UTC
04:09:45.195 INFO VariantAnnotator - ------------------------------------------------------------
04:09:45.195 INFO VariantAnnotator - ------------------------------------------------------------
04:09:45.196 INFO VariantAnnotator - HTSJDK Version: 3.0.5
04:09:45.196 INFO VariantAnnotator - Picard Version: 3.0.0
04:09:45.197 INFO VariantAnnotator - Built for Spark Version: 3.3.1
04:09:45.197 INFO VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
04:09:45.197 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
04:09:45.198 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
04:09:45.199 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
04:09:45.200 INFO VariantAnnotator - Deflater: IntelDeflater
04:09:45.200 INFO VariantAnnotator - Inflater: IntelInflater
04:09:45.200 INFO VariantAnnotator - GCS max retries/reopens: 20
04:09:45.200 INFO VariantAnnotator - Requester pays: disabled
04:09:45.200 INFO VariantAnnotator - Initializing engine
WARNING 2024-02-13 04:09:45 SamFiles The index file /home/ubuntu/data2/lodestone/work/97/14dacc900afcd5200dc703457c0ba7/SAMPLE_ID.bam.bai was found by resolving the canonical path of a symlink: SAMPLE_ID.bam -> /home/ubuntu/data2/lodestone/work/97/14dacc900afcd5200dc703457c0ba7/SAMPLE_ID.bam
04:09:45.329 INFO FeatureManager - Using codec VCFCodec to read file file://SAMPLE_ID.minos.vcf
04:09:45.341 INFO VariantAnnotator - Shutting down engine
[February 13, 2024 at 4:09:45 AM UTC] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=201326592
***********************************************************************
A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found.
reference contigs = [NC_016946.1]
features contigs = [NC_000962.3]
***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
Work dir:
/home/ubuntu/data2/lodestone/work/d7/5b3bfc56c15af6202ac715590fa01e
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
===========================================
Finished with errors
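The contig clash above (reference NC_016946.1, an M. intracellulare accession, vs. VCF records on NC_000962.3, the H37Rv chromosome) suggests the wrong reference was paired with this sample. A pre-flight check is possible from just the .fai index and the VCF text; this is an illustrative sketch, not part of the pipeline:

```python
def fai_contigs(fai_lines) -> set:
    """Contig names from samtools faidx .fai lines (first column)."""
    return {line.split("\t")[0] for line in fai_lines if line.strip()}

def vcf_contigs(vcf_lines) -> set:
    """Contig names used by the records of a VCF (header lines skipped)."""
    return {line.split("\t")[0] for line in vcf_lines
            if line.strip() and not line.startswith("#")}

def contigs_compatible(fai_lines, vcf_lines) -> bool:
    """True if the reference and the VCF share at least one contig,
    i.e. GATK VariantAnnotator would not reject the pair outright."""
    return bool(fai_contigs(fai_lines) & vcf_contigs(vcf_lines))
```

Running such a check before gatk VariantAnnotator would surface the sample/reference mix-up with a clearer message than the GATK USER ERROR.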
Running minos with an empty VCF file gives an error in process minos on the command:
minos adjudicate --force --reads sample minos ref.fa A19U007635_1561353218R1_M04557_164.bcftools.vcf sample.cortex.vcf
In this case the bcftools VCF is empty and the Cortex VCF has a low number of variants. I have not seen any other instances of this happening. I'll have a proper look and update accordingly.
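Until the root cause is pinned down, the empty bcftools VCF could be detected before calling minos adjudicate. A minimal sketch of such a guard (hypothetical, not existing pipeline code):

```python
def vcf_has_variants(vcf_lines) -> bool:
    """True if the VCF contains at least one non-header record line.

    A VCF whose every line starts with '#' (or is blank) has headers
    only and no variants, which is the case that trips up minos.
    """
    return any(line.strip() and not line.startswith("#")
               for line in vcf_lines)
```

The process could skip or short-circuit adjudication when this returns False for the bcftools VCF.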
Human reads are removed by preprocessing:bowtie2; however, preprocessing:identifyBacterialContaminants is identifying human reads as a contaminant to remove.
identify_tophit_and_contaminants2.py should not be identifying human reads as a contaminant
Hi.
I am trying to analyze single reads but I am getting this error message:
WARN: Input tuple does not match input set cardinality declared by process preprocessing:checkFqValidity
-- offending value: [ERR11243647, /home/olawoyei/projects/def-guthriej/olawoyei/mtb/fastq/ERR11243647.fastq.gz, /project/6083771/olawoyei/work/39/36be7d9f46f13a9a5a5d62ea719385/version.json]
Does lodestone only work with paired FASTQ reads?
The logic for passing samples to clockwork is broken: when unmix_myco=no and contaminants are found, samples are not passed to clockwork.
Observed that sometimes contaminants.fa is empty (due to failed download) and no error is thrown in preprocessing:downloadContamGenomes
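A guard in preprocessing:downloadContamGenomes could turn this silent failure into a hard error. A hedged sketch (the function and message are illustrative, not existing pipeline code):

```python
import os
import sys

def check_contaminants_fasta(path: str) -> None:
    """Exit non-zero if the contaminant FASTA is missing or empty,
    so the workflow records the failed download instead of silently
    passing an empty file to the mapping step."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        sys.exit(f"ERROR: {path} is missing or empty; "
                 "contaminant genome download likely failed")
```

Calling this at the end of the process script would make Nextflow fail the task (and allow -resume after re-download) rather than proceeding with an empty contaminants.fa.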
Afanc is used exclusively for downstream reporting, but error messages etc. still refer to Mykrobe.
Due to differences in Mycobacterium tuberculosis species reporting in the Afanc report compared to the Mykrobe report, TB samples are failing to pass to gnomonicus
E.g.
Mykrobe top hit: Mycobacterium tuberculosis
Afanc top hit: Mycobacterium tuberculosis H37Rv
The if-statement in clockwork:minos needs to be updated to reflect Afanc reporting, i.e. a "starts with Mycobacterium tuberculosis" check.
E.g. "error": "top hit ($top_hit) is not one of the 10 accepted mycobacteria"
Need to use the --arg parameter with jq.
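The prefix check could look like the following sketch (the function name is illustrative; the example top hits are taken from above):

```python
def is_tuberculosis_tophit(top_hit: str) -> bool:
    """Accept both Mykrobe-style and Afanc-style top hits, e.g.
    "Mycobacterium tuberculosis" and
    "Mycobacterium tuberculosis H37Rv"."""
    return top_hit.startswith("Mycobacterium tuberculosis")
```

On the shell side, $top_hit should be passed into the error JSON via jq's --arg option (e.g. `jq --arg top_hit "$top_hit" ...`) rather than interpolated directly into the filter string.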
Samples that aren’t mixed and hence don’t pass through TM05-TM08 (because there’s no contaminant to download and map to) are not being passed to clockwork
In the container, the dependencies for clockwork are installed manually and are pinned by version/git commit according to pre-v0.11.0 clockwork, but the git clone is using the main branch for clockwork, which is now at v0.11.0.
Hello, I found that tb-pipeline would be very useful to our lab.
However, the pipeline terminated at the mapToContamFa step.
I found that a contam_dir was created in the work directory and GCF_000001405.39_GRCh38.p13_genomic.fna.gz (~920 MB) was downloaded during the downloadContamGenomes step.
After this step, a contaminants.fa file was created and the contam_dir was removed.
However, contaminants.fa is empty and the pipeline terminated.
I am not sure whether the pipeline failure was due to the empty contaminants.fa file.
Is there any reason why the file is empty? Is it possible to skip the contaminant-mapping step?
I also assume the human reads were processed using bowtie2 against hg19_1kgmaj?
Why was GCF_000001405.39_GRCh38.p13_genomic.fna.gz downloaded again?
Thanks in advance for any reply.
withLabel cpu and memory declarations are not working
.fai files are being written outside the working directory to /tb-pipeline/resources. This can cause permission issues with read-only partitions.
Problems with the parsing of assembly_summary_refseq.txt: matches aren't found for some of the species hits from Afanc due to missing dot abbreviations. E.g. Afanc reports Mycobacterium avium subsp paratuberculosis (after the underscore is removed), but in assembly_summary_refseq.txt it's reported as Mycobacterium avium subsp. paratuberculosis.
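One possible fix is to normalise both the Afanc name and the assembly_summary_refseq.txt organism name before comparing. A hedged sketch (the helper is illustrative; underscore removal mirrors the Afanc handling mentioned above):

```python
import re

def normalise_species(name: str) -> str:
    """Normalise a species name for matching against
    assembly_summary_refseq.txt: replace underscores with spaces,
    drop the dot in rank abbreviations such as "subsp.", and
    lower-case, so that "Mycobacterium avium subsp paratuberculosis"
    matches "Mycobacterium avium subsp. paratuberculosis"."""
    name = name.replace("_", " ")
    name = re.sub(r"\b(subsp|var|str)\.", r"\1", name)
    return " ".join(name.split()).lower()
```

Comparing normalise_species() of both sides would make the lookup insensitive to the dot-abbreviation and underscore differences.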