
metagenome-atlas / atlas


ATLAS - Three commands to start analyzing your metagenome data

Home Page: https://metagenome-atlas.github.io/

License: BSD 3-Clause "New" or "Revised" License

Python 95.05% CSS 1.35% Shell 1.73% Ruby 0.86% HTML 1.00%
metagenomics annotation snakemake assembly genomic-binning functional-annotation taxonomic-classifications

atlas's Introduction

Metagenome-Atlas


Metagenome-Atlas is an easy-to-use metagenomic pipeline based on Snakemake. It handles all steps from QC and assembly to binning and annotation.

[scheme of workflow]

You can start using atlas with three commands:

    mamba install -y -c bioconda -c conda-forge metagenome-atlas={latest_version}
    atlas init --db-dir databases path/to/fastq/files
    atlas run all

where {latest_version} should be replaced by the latest version number.

Webpage

metagenome-atlas.github.io

Documentation

https://metagenome-atlas.readthedocs.io/

Tutorial

Citation

ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data.
Kieser, S., Brown, J., Zdobnov, E. M., Trajkovski, M. & McCue, L. A.
BMC Bioinformatics 21, 257 (2020).
doi: 10.1186/s12859-020-03585-4

Development/Extensions

Here are some ideas I am working on, or want to work on, when I have time. If you want to contribute or have some ideas, let me know via a feature request issue.

  • Optimized MAG recovery (e.g. Spacegraphcats)
  • Integration of viruses/plasmids, which live for now as extensions
  • Add statistics and visualisations as in atlas_analyze
  • Implementation of most rules as snakemake wrapper
  • Cloud execution
  • Update to new Snakemake version and use cool reports.

atlas's People

Contributors

alienzj, aroarz, brwnj, cedricmidoux, colinbrislawn, coverall2357, ctb, github-actions[bot], jmtsuji, johnne, jotech, llansing, mdehollander, mladen5000, molecules, njohner, philippbayer, raw937, roshni-b, silask, smcolby, vmikk, wangzhichao1990, waschina, yanhui09


atlas's Issues

16S reads are not incorporated into the final output.

I downloaded the new version of atlas and ran it by mistake on 16S reads.

I found that all my reads were lost after the decontamination step, so there must be a bug in the new way of getting the quality reads.

diamond database

Apparently there are still some problems downloading the database. #48

Fix:

@camel315 It seems that you didn't correctly download the diamond database refseq.dmnd.

Can you check that it is downloaded and that the path in the config file points to the right file?

You can download the database from here.
But even better is to download refseq.fasta and build the database yourself (e.g. with diamond makedb), which takes some time.

Infinite runtime with disallowed sample names

I just noticed that I had disallowed characters in my sample names. This resulted in snakemake running forever while not doing anything. The sample names in config.yaml are auto-generated from the fastq files with the make-config option. They look like: I16-1253-31-AH34-TACGGTCA-CTCTCTAT-L002-001
I see in the README that only A-Z characters are allowed. When I change to only A-Z characters, snakemake starts :)
Is there a specific reason not to allow other characters? It would be nice to get at least a warning during config creation that the sample names are invalid. Or maybe fix them directly, so that the automatic creation of the config file works.
I am happy to contribute if you think this is a good plan.
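
A minimal sketch of the kind of warning and auto-fix suggested here (a hypothetical helper, not atlas code; the letters-only rule follows the README):

    import re
    import warnings

    VALID_SAMPLE_NAME = re.compile(r"^[A-Za-z]+$")  # letters only, per the README

    def sanitize_sample_name(name):
        """Return a letters-only sample name, warning when characters are dropped."""
        if VALID_SAMPLE_NAME.match(name):
            return name
        clean = re.sub(r"[^A-Za-z]", "", name)
        warnings.warn("Sample name %r contains disallowed characters; using %r" % (name, clean))
        return clean

    # e.g. 'I16-1253-31-AH34-TACGGTCA-CTCTCTAT-L002-001' -> 'IAHTACGGTCACTCTCTATL'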

ambiguous mapping for coverage calculation

To calculate the coverage over features or contigs, there is always the problem of ambiguous reads.
Ideally one would keep all ambiguous positions where a read maps and then divide the count by the number of ambiguous positions, or do some more sophisticated repartition of reads.

bbmap has the possibility to keep all ambiguous mappings, and pileup.sh (from the bbmap package) can count the secondary alignments, but this means a read can be counted multiple times.

@brwnj Could you send me your original code with samtools? How does it handle ambiguous reads?
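
For illustration, the "divide by the number of ambiguous positions" idea could look like this (a sketch, not atlas code; alignments are simplified to (read, contig) pairs):

    from collections import defaultdict

    def fractional_counts(alignments):
        """alignments: iterable of (read_id, contig_id) pairs, one per mapping.
        A read mapping ambiguously to k contigs contributes 1/k to each,
        so no read is counted more than once in total."""
        hits_per_read = defaultdict(list)
        for read_id, contig_id in alignments:
            hits_per_read[read_id].append(contig_id)
        counts = defaultdict(float)
        for contigs in hits_per_read.values():
            weight = 1.0 / len(contigs)
            for contig_id in contigs:
                counts[contig_id] += weight
        return dict(counts)

    # fractional_counts([("r1", "c1"), ("r1", "c2"), ("r2", "c1")])
    # -> {'c1': 1.5, 'c2': 0.5}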

Adding deduplication step

What do you think of adding a deduplication step? Especially when using a PCR-based library, this could remove bias. The BBMap suite includes the script clumpify.sh, which also reduces the file size of the gzipped fastq files.
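
A minimal sketch of what such a rule might look like (Snakemake pseudocode with illustrative file names; in/in2, out/out2, dedupe and threads are documented clumpify.sh options):

    # Sketch only: dedupe=t drops duplicates; clumpify also groups similar
    # reads together, which improves gzip compression of the output.
    rule deduplicate:
        input:
            r1="{sample}_R1.fastq.gz",
            r2="{sample}_R2.fastq.gz",
        output:
            r1="{sample}_deduplicated_R1.fastq.gz",
            r2="{sample}_deduplicated_R2.fastq.gz",
        threads: 8
        shell:
            "clumpify.sh in={input.r1} in2={input.r2} "
            "out={output.r1} out2={output.r2} "
            "dedupe=t threads={threads}"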

Contaminant references not found in config file

I'm having the same problem already reported here, but I can't fix it.

KeyError in line 116 of /home/NIOO.INT/mattiash/lib/python3.6/site-packages/pnnl_atlas-1.0.14-py3.6.egg/atlas/rules/assemble.snakefile:
'contaminant_references'
File "/home/NIOO.INT/mattiash/lib/python3.6/site-packages/pnnl_atlas-1.0.14-py3.6.egg/atlas/Snakefile", line 186, in <module>
File "/home/NIOO.INT/mattiash/lib/python3.6/site-packages/pnnl_atlas-1.0.14-py3.6.egg/atlas/rules/assemble.snakefile", line 116, in <module>

I already tried to change the conf.py, but this didn't work.

rule initialize_checkm failed

I ran into this problem, with the error 'Waiting at most 10 seconds for missing files.
Error in job initialize_checkm while creating output files....'
The checkm-genome version is checkm-genome 1.0.7-py35_0 (bioconda).
--latency-wait has been set to 10.
Please see the attached file. Looking forward to your instructions. Thank you.
err.txt

Wrong default shell when using docker

Problem description

I am attempting to use ATLAS within a Docker container (based on continuumio/miniconda). Within the container, the Snakefile appears to run using /bin/sh instead of the typical default /bin/bash, which prevents the pipeline from recognizing where conda binaries are stored.

Both in the previous and current ATLAS releases, I get the following type of error (last line in the code block):

rule init_QC:
    input: /home/atlas/data/NE1_R1_sub100k.fastq, /home/atlas/data/NE1_R2_sub100k.fastq
    output: NE1-sub100k/sequence_quality_control/NE1-sub100k_raw_R1.fastq.gz, NE1-sub100k/sequence_quality_control/NE1-sub100k_raw_R2.fastq.gz
    log: NE1-sub100k/logs/NE1-sub100k_init.log
    jobid: 34
    wildcards: sample=NE1-sub100k
    priority: 80
    threads: 12
    resources: mem=32

reformat.sh in=/home/atlas/data/NE1_R1_sub100k.fastq in2=/home/atlas/data/NE1_R2_sub100k.fastq         interleaved=f         out1=NE1-sub100k/sequence_quality_control/NE1-sub100k_raw_R1.fastq.gz out2=NE1-sub100k/sequence_quality_control/NE1-sub100k_raw_R2.fastq.gz         qout=33         overwrite=true         verifypaired=t         addslash=t         trimreaddescription=t         threads=12         -Xmx32G 2> NE1-sub100k/logs/NE1-sub100k_init.log
        
/bin/sh: 1: source: not found

Proposed solution

Adding the following to the top of the Snakefile (as recommended here) appears to solve the issue:

shell.executable("/bin/bash")
shell.prefix("source ~/.bashrc; ")

This sets the shell explicitly.

Init error

When starting the pipeline, if the samples declared in the config file are not found, a specific error should be given instead of the default snakemake error.
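
A sketch of the early check this suggests (not atlas code; the config structure is assumed for illustration):

    import os

    def check_sample_files(samples):
        """samples: dict of sample name -> list of fastq paths from config.yaml."""
        missing = {name: [p for p in paths if not os.path.exists(p)]
                   for name, paths in samples.items()}
        missing = {name: paths for name, paths in missing.items() if paths}
        if missing:
            raise FileNotFoundError(
                "Samples declared in the config file were not found: %s" % missing)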

Update python package manifest

With the additional import of qc.snakefile, we need to update the manifest (MANIFEST.in) to ensure the Python installer moves the file into the install directory.
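
The addition would be a single include line along these lines (hypothetical path; the real location of qc.snakefile inside the package may differ):

    # MANIFEST.in
    include atlas/rules/qc.snakefile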

Make java_mem_min config option

Background

Currently, ATLAS has a java_mem setting in the configuration (.yaml) file that allows users to specify the memory allowance of tools that run on Java (e.g., most BBTools programs). This appears to control Java's -Xmx flag for the max. allowed memory.

I am running ATLAS on a server with a ZFS ARC cache. Our ARC cache allows up to half of our system's RAM to be used to store files for faster read/write speed. Upon memory pressure, the ARC cache will automatically scale down to allow programs to utilize more RAM.

Problem description

ATLAS fails when first initializing various BBTools programs due to a memory mapping error, e.g.,

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 71583137792 bytes for committing reserved memory.

I think this is because, although the RAM is available on our server, it is currently occupied by the ARC cache. The ARC cache would shrink if given the chance, but the BBTools program fails immediately.

Proposed solution

Because only the -Xmx flag is set for most BBTools, the max. RAM must be completely available when the program first initializes. I propose also setting the -Xms flag to give the minimum required memory for the program to initialize. This should alleviate any problems interacting with our ARC cache.

To do so, the java_mem parameter in the config (.yaml) file would need to be split into java_mem_min and java_mem_max. Then both settings could be passed to each BBTool. For example, rule deduplicate could be changed as follows (last portion of the rule shown):

Current:

        resources:
            mem = config.get("java_mem", JAVA_MEM)
        shell:
            """
            clumpify.sh \
                {params.inputs} \
                {params.outputs} \
                overwrite=true\
                dedupe=t \
                dupesubs={params.dupesubs} \
                optical={params.only_optical}\
                threads={threads} \
                -Xmx{resources.mem}G 2> {log}
            """

Updated

        resources:
            mem_min = config.get("java_mem_min", JAVA_MEM_MIN),
            mem_max = config.get("java_mem_max", JAVA_MEM_MAX)
        shell:
            """
            clumpify.sh \
                {params.inputs} \
                {params.outputs} \
                overwrite=true\
                dedupe=t \
                dupesubs={params.dupesubs} \
                optical={params.only_optical}\
                threads={threads} \
                -Xms{resources.mem_min}G \
                -Xmx{resources.mem_max}G 2> {log}
            """

Does this seem like a reasonable change? If so, I'd be willing to start a branch with the updates.

Thanks for all your work on this tool.

picard fails with new installs (JRE SIGSEGV)

I am running a new install of atlas and am getting a new error during the remove_pcr_duplicates job. I get the same error when running a config file that previously ran without issue. Any insights? Thanks.

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f68e765952a, pid=9385, tid=0x00007f68e51cf700
#
# JRE version: OpenJDK Runtime Environment (8.0_121-b15) (build 1.8.0_121-b15)
# Java VM: OpenJDK 64-Bit Server VM (25.121-b15 mixed mode linux-amd64 )
# Problematic frame:
# V  [libjvm.so+0x61652a]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x14a
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/mdjlynch/working/metagenombio/analysis_metagenome/greenhouse/data/processed/atlas/hs_err_pid9385.log
#
# If you would like to submit a bug report, please visit:
#   http://www.azulsystems.com/support/
#
/home/mdjlynch/working/metagenombio/analysis_metagenome/greenhouse/data/processed/atlas/.snakemake/conda/cc9cda9a/bin/picard: line 62:  9385 Aborted                 (core dumped) /home/mdjlynch/working/metagenombio/analysis_metagenome/greenhouse/data/processed/atlas/.snakemake/conda/cc9cda9a/bin/java -Xmx64g -jar /home/mdjlynch/working/metagenombio/analysis_metagenome/greenhouse/data/processed/atlas/.snakemake/conda/cc9cda9a/share/picard-2.11.0-0/picard.jar MarkDuplicates INPUT=tRN5-S41-L001-001/megahit_21_121_20_normalization_k21_t100/alignments/tRN5-S41-L001-001.bam OUTPUT=tRN5-S41-L001-001/megahit_21_121_20_normalization_k21_t100/alignments/tRN5-S41-L001-001_markdup.bam METRICS_FILE=tRN5-S41-L001-001/megahit_21_121_20_normalization_k21_t100/alignments/tRN5-S41-L001-001_markdup_metrics.txt ASSUME_SORT_ORDER=coordinate MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 REMOVE_DUPLICATES=TRUE VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE
Error in job remove_pcr_duplicates while creating output files tRN5-S41-L001-001/megahit_21_121_20_normalization_k21_t100/alignments/tRN5-S41-L001-001_markdup_metrics.txt, tRN5-S41-L001-001/megahit_21_121_20_normalization_k21_t100/alignments/tRN5-S41-L001-001_markdup.bam.
RuleException:
CalledProcessError in line 618 of /home/mdjlynch/anaconda3/lib/python3.5/site-packages/atlas/rules/assemble.snakefile:
Command ' picard MarkDuplicates -Xmx64g INPUT=tRN5-S41-L001-001/megahit_21_121_20_normalization_k21_t100/alignments/tRN5-S41-L001-001.bam                OUTPUT=tRN5-S41-L001-001/megahit_21_121_20_normalization_k21_t100/alignments/tRN5-S41-L001-001_markdup.bam METRICS_FILE=tRN5-S41-L001-001/megahit_21_121_20_normalization_k21_t100/alignments/tRN5-S41-L001-001_markdup_metrics.txt ASSUME_SORT_ORDER=coordinate                MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 REMOVE_DUPLICATES=TRUE                VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=TRUE' returned non-zero exit status 134
  File "/home/mdjlynch/anaconda3/lib/python3.5/site-packages/atlas/rules/assemble.snakefile", line 618, in __rule_remove_pcr_duplicates
  File "/home/mdjlynch/anaconda3/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Removing output files of failed job remove_pcr_duplicates since they might be corrupted:
tRN5-S41-L001-001/megahit_21_121_20_normalization_k21_t100/alignments/tRN5-S41-L001-001_markdup.bam
Will exit after finishing currently running jobs.

Contaminant references not found in config file

After creating the config with atlas make-config, the contaminant references are not configured in config.yaml:

KeyError in line 116 of /home/NIOO.INT/mattiash/lib/python3.6/site-packages/pnnl_atlas-1.0.14-py3.6.egg/atlas/rules/assemble.snakefile:
'contaminant_references'
  File "/home/NIOO.INT/mattiash/lib/python3.6/site-packages/pnnl_atlas-1.0.14-py3.6.egg/atlas/Snakefile", line 186, in <module>
  File "/home/NIOO.INT/mattiash/lib/python3.6/site-packages/pnnl_atlas-1.0.14-py3.6.egg/atlas/rules/assemble.snakefile", line 116, in <module>

What is this license?

Hi,

Excuse my curiosity, but what is the license you're using? Is this a handcrafted license, specifically written for this project? Does it have a name? Why not use a more common one?

Best regards.

deduplication

It seems deduplication is better at the beginning of the pipeline. I will make deduplication on by default and put it at the beginning of the pipeline.

From a discussion on the bbmap forum:

I probably have a problem with PCR duplicates and thought I would use clumpify to remove duplicates. I did some tests and realised that if the reads don't have the same length, they are not marked as duplicates. E.g. if I remove one nucleotide from the end of a read, the pair is no longer marked as a duplicate.

Is this behaviour intentional or a bug? I think that if two paired-end reads start at the same position and are identical (allowing for some mismatches), they can be considered PCR duplicates, can't they? The read pairs don't necessarily need to stop at the same position, especially since the processing guide recommends deduplication after quality trimming: during trimming, PCR-duplicated reads can be trimmed to different lengths.

During quality trimming, one read of a pair might also be removed, and I don't know how to find duplicates between a single-end and a paired-end library.

Here is my take (could be wrong): that processing guide may have been written before clumpify existed. You should use clumpify on raw data (before anything is done to it); that is the best way to identify duplicates. You can then follow that up with trimming.

initial database download fail

I had to edit workflows.py to remove references to 'conda' and 'dryrun' to get the downloads to run in 1.0.14

import logging
import os
from subprocess import check_call

# get_snakefile() is defined elsewhere in atlas/workflows.py

def download(jobs, out_dir, snakemake_args):
    """Build and run the snakemake command for the download workflow."""
    out_dir = os.path.realpath(out_dir)
    cmd = ("snakemake --snakefile {snakefile} --directory {parent_dir} "
           "--printshellcmds --jobs {jobs} --rerun-incomplete "
           "--config workflow=download db_dir='{out_dir}' {add_args} "
           "{args}").format(snakefile=get_snakefile(),
                            parent_dir=os.path.dirname(out_dir),
                            jobs=jobs,
                            out_dir=out_dir,
                            add_args=" " if snakemake_args and snakemake_args[0].startswith("-") else "--",
                            args=" ".join(snakemake_args))
    logging.info("Executing: %s" % cmd)
    check_call(cmd, shell=True)

Updating the documentation

Default protocol:

Initial QC
Deduplication (optional)
Quality trimming
Decontamination

*Read stats after each of those steps

[There are some insert size calculation steps here, but the output is currently not used downstream]

Normalize coverage
Error correction (optional)
Merge read pairs (optional)
Assemble (and stats)
Filter contigs

On the filtered contigs we:

Annotation with Prokka
Diamond blastp against RefSeq for taxonomy

Bin genomes
QC bins with CheckM

Align QC'd sequences to final contigs
Feature counts using ORFs from Prodigal

Unicode & snakemake problems while setting up atlas

Hi! I'm having trouble setting up atlas on my MacBook Pro (OSX El Capitan 10.11.6). When initially trying to install atlas with pip, I ran into this:

    return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2365: ordinal not in range(128)

Yesterday the installation went through, but now I'm getting this error when downloading databases:

Traceback (most recent call last):
  File "/Users/lplotta/miniconda3/bin/atlas", line 11, in <module>
    sys.exit(cli())
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/core.py", line 676, in main
    _verify_python3_env()
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/_unicodefun.py", line 118, in _verify_python3_env
    'for mitigation steps.' + extra)
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment.  Consult http://click.pocoo.org/python3/for mitigation steps.

A similar (?) issue is raised when invoking the atlas executable, for example with atlas --version:

Traceback (most recent call last):
  File "/Users/lplotta/miniconda3/bin/atlas", line 11, in <module>
    sys.exit(cli())
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/core.py", line 676, in main
    _verify_python3_env()
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/_unicodefun.py", line 118, in _verify_python3_env
    'for mitigation steps.' + extra)
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment.  Consult http://click.pocoo.org/python3/for mitigation steps.

ALSO: I previously had this error when trying to download databases:

[2017-05-09 09:54] Executing: snakemake -s /Users/lplotta/miniconda3/lib/python3.5/site-packages/atlas/Snakefile -d / -p -j 8 --nolock --rerun-incomplete --config db_dir='/databases' workflow=download -- 
Traceback (most recent call last):
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/snakemake/__init__.py", line 426, in snakemake
    force_use_threads=use_threads)
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/snakemake/workflow.py", line 317, in execute
    list_params_changes)
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/snakemake/persistence.py", line 24, in __init__
    os.mkdir(self.path)
PermissionError: [Errno 13] Permission denied: '/.snakemake'
Traceback (most recent call last):
  File "/Users/lplotta/miniconda3/bin/atlas", line 11, in <module>
    sys.exit(cli())
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/atlas/atlas.py", line 308, in run_download
    download(jobs, out_dir, snakemake_args)
  File "/Users/lplotta/miniconda3/lib/python3.5/site-packages/atlas/workflows.py", line 42, in download
    check_call(cmd, shell=True)
  File "/Users/lplotta/miniconda3/lib/python3.5/subprocess.py", line 271, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /Users/lplotta/miniconda3/lib/python3.5/site-packages/atlas/Snakefile -d / -p -j 8 --nolock --rerun-incomplete --config db_dir='/databases' workflow=download -- ' returned non-zero exit status 1

Initialise conda envs in download

We should cycle through the envs and install them in the download command, which is called from a server with internet access. Then atlas could be run without an internet connection, I think.
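
A sketch of the idea (not atlas code; assumes a snakemake version that supports --use-conda and --conda-create-envs-only):

    from subprocess import check_call

    def create_conda_envs(snakefile, working_dir):
        """Pre-build every rule's conda environment while internet is available."""
        check_call([
            "snakemake",
            "--snakefile", snakefile,
            "--directory", working_dir,
            "--use-conda",
            "--conda-create-envs-only",
        ])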

Memory requirements

checkm, in run_checkm_lineage_wf, needs up to 80 GB of memory, with no possibility to restrict it.
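
One possible mitigation, if it fits the workflow: CheckM's lineage_wf accepts a --reduced_tree flag that uses a smaller reference tree and lowers the memory footprint considerably (roughly 16 GB instead of 40+ GB, according to CheckM's documentation).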

Update documentation

We need to outline the new features (deduplicate, merge_pairs) that were added, and their associated config parameters, in the skeleton config generation method as well as in the docs for assembly. We've also updated the QC output files a bit, and the docs should reflect those changes.

bbduk2.sh is not installed with bbmap

Hello,

Atlas is really a good pipeline.
I'm trying it out and had multiple errors because bbduk2.sh was not installed.

I propose you mark the BBTools suite as a prerequisite, especially bbduk2.sh, which wasn't shipped with the latest bbmap package.

Kind regards
Silas

Rule find counts per region - error

I encountered an error while running the assemble workflow on Atlas (version 1.0.19). From the counts_per_region_.log, the error was:

Warning: Unknown annotation format: gtf. GTF format is used.
ERROR: invalid parameter: '−−minOverlap'

It seems the encoding of the two dashes used in the minimum-overlap parameter is the issue: they are Unicode minus signs ('−−') rather than ASCII hyphens ('--').

import chardet
s_encoding = chardet.detect('−−minOverlap'.encode('utf-8'))['encoding']
print(s_encoding)  # utf-8

A possible solution was to retype the dashes in the "−−minOverlap" parameter:

s_retyped = chardet.detect(b'--minOverlap')['encoding']
print(s_retyped)  # ascii

I also added a space before the backslash on lines 684, 685, and 689 of assemble.snakefile. After both changes, the assemble workflow completed successfully.

Below is the complete log

Building DAG of jobs...
Creating conda environment /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/optional_genome_binning.yaml...
Environment for ../../../../../home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/optional_genome_binning.yaml created (location: .snakemake/conda/9596cb25)
Creating conda environment /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/required_packages.yaml...
Environment for ../../../../../home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/required_packages.yaml created (location: .snakemake/conda/212a2e89)
Provided cores: 24
Rules claiming more threads will be scaled down.
Unlimited resources: mem
Job counts:
count jobs
1 QC_report
1 add_contig_metadata
1 align_reads_to_final_contigs
1 all
1 build_decontamination_db
2 calculate_contigs_stats
1 calculate_insert_size
1 calculate_prefiltered_contig_coverage_stats
1 combine_insert_stats
1 combine_read_counts
1 combine_read_length_stats
1 convert_gff_to_gtf
1 convert_sam_to_bam
1 decontamination
1 deduplicate
1 error_correction
1 filter_by_coverage
1 finalize_QC
1 finalize_contigs
1 find_counts_per_region
1 init_QC
1 initialize_checkm
1 make_maxbin_abundance_file
1 merge_pairs
1 merge_sample_tables
1 normalize_coverage_across_kmers
1 parse_blastp
1 pileup
1 postprocess_after_decontamination
1 quality_filter
5 read_stats
1 rename_contigs
1 rename_megahit_output
1 run_checkm_lineage_wf
1 run_checkm_tree_qa
1 run_diamond_blastp
1 run_maxbin
1 run_megahit
1 run_prokka_annotation
1 sort_munged_blast_hits
1 update_prokka_tsv
46

rule init_QC:
input: /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/data/PHH12_O-8024.3.89990.GGTAGC.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_init.log
jobid: 32
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
priority: 80
threads: 24
resources: mem=40

reformat.sh in=/media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/data/PHH12_O-8024.3.89990.GGTAGC.fastq.gz interleaved=t out1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz qout=33 overwrite=true verifypaired=t addslash=t trimreaddescription=t threads=24 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_init.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 32.
1 of 46 steps (2%) done

rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 13
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=raw
priority: 30
threads: 24
resources: mem=40

Finished job 13.
2 of 46 steps (4%) done

rule deduplicate:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_deduplicate.log
jobid: 33
benchmark: logs/benchmarks/deduplicate/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40

        clumpify.sh                 in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz                 out1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz                 overwrite=true                dedupe=t                 dupesubs=2                 optical=f                threads=24                 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_deduplicate.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz.
Finished job 33.
3 of 46 steps (7%) done

rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 17
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=deduplicated
priority: 30
threads: 24
resources: mem=40

Finished job 17.
4 of 46 steps (9%) done

rule quality_filter:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filtering_stats.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filter.log
jobid: 19
benchmark: logs/benchmarks/quality_filter/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40

    bbduk2.sh in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz             out1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz outs=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz             rref=/home/william/dbs/atlas_db.v2/adapters.fa lref=/home/william/dbs/atlas_db.v2/adapters.fa             mink=8 qout=33 stats=PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filtering_stats.txt             hdist=1 k=27 trimq=10             qtrim=rl threads=24             minlength=51 trd=t             minbasefrequency=0.05             interleaved=t            overwrite=true             ecco=t             -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filter.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz.
Finished job 19.
5 of 46 steps (11%) done

rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 15
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=filtered
priority: 30
threads: 24
resources: mem=40

Finished job 15.
6 of 46 steps (13%) done

rule build_decontamination_db:
output: ref/genome/1/summary.txt
log: logs/build_decontamination_db.log
jobid: 31
threads: 24
resources: mem=40

bbsplit.sh -Xmx40G ref_PhiX=/home/william/dbs/atlas_db.v2/phiX174_virus.fa ref_rRNA=/home/william/dbs/atlas_db.v2/silva_rfam_all_rRNAs.fa threads=24 k=13 local=t 2> logs/build_decontamination_db.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 31.
7 of 46 steps (15%) done

rule decontamination:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz, ref/genome/1/summary.txt
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/PhiX_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/PhiX_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/PhiX_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/rRNA_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/rRNA_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/rRNA_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_decontamination.log
jobid: 12
benchmark: logs/benchmarks/decontamination/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40

        if [ "true" = true ] ; then
            bbsplit.sh in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz                     outu1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz outu2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz                     basename="PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/%_R#.fastq.gz"                     maxindel=20 minratio=0.65                     minhits=1 ambiguous=best refstats=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt                    threads=24 k=13 local=t                     -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_decontamination.log
        fi

        bbsplit.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz                  outu=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz                 basename="PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/%_se.fastq.gz"                 maxindel=20 minratio=0.65                 minhits=1 ambiguous=best refstats=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt append                 interleaved=f threads=24 k=13 local=t                 -Xmx40G 2>> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_decontamination.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz.
Finished job 12.
8 of 46 steps (17%) done

rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 14
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=clean
priority: 30
threads: 24
resources: mem=40

Finished job 14.
9 of 46 steps (20%) done

localrule postprocess_after_decontamination:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz
jobid: 11
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

localrule initialize_checkm:
output: logs/checkm_init.txt
log: logs/initialize_checkm.log
jobid: 29

python /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/rules/initialize_checkm.py /home/william/dbs/atlas_db.v2/checkm logs/checkm_init.txt logs/initialize_checkm.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/9596cb25.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz.
Finished job 11.
10 of 46 steps (22%) done
Finished job 29.
11 of 46 steps (24%) done

rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 16
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=QC
priority: 30
threads: 24
resources: mem=40

Finished job 16.
12 of 46 steps (26%) done

rule normalize_coverage_across_kmers:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_normalization.log
jobid: 45
benchmark: logs/benchmarks/normalization/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40

    if [ in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz != "null" ];
    then
        bbnorm.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz                 extra=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz                 out=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz                 k=21 t=100                 interleaved=f minkmers=15 prefilter=t                 threads=24                 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_normalization.log
    fi

    if [ t = "t" ];
    then
        bbnorm.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz                 extra=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz                 out=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz                 k=21 t=100                 interleaved=f minkmers=15 prefilter=t                 threads=24                 -Xmx40G 2>> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_normalization.log
    fi

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 45.
13 of 46 steps (28%) done

rule merge_pairs:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_merge_pairs.log
jobid: 44
benchmark: logs/benchmarks/merge_pairs/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, previous_steps=normalized
threads: 24
resources: mem=40

Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz.
Finished job 44.
14 of 46 steps (30%) done

rule error_correction:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_error_correction.log
jobid: 43
benchmark: logs/benchmarks/error_correction/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, previous_steps=normalized.merged
threads: 24
resources: mem=40

    tadpole.sh -Xmx40G             prealloc=1             in1=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz             out1=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz             mode=correct             threads=24             ecc=t ecco=t 2>> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_error_correction.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz.
Finished job 43.
15 of 46 steps (33%) done

rule run_megahit:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_megahit.log
jobid: 40
benchmark: logs/benchmarks/assembly/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 8
resources: mem=50

Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz.
Finished job 40.
16 of 46 steps (35%) done

localrule rename_megahit_output:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa
output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta
jobid: 34
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

cp PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa.
Finished job 34.
17 of 46 steps (37%) done

rule rename_contigs:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta
jobid: 23
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

rename.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta out=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta ow=t prefix=PHH12-O-8024.3.89990.GGTAGC
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta.
Finished job 23.
18 of 46 steps (39%) done

rule calculate_prefiltered_contig_coverage_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt, PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam
log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/prefiltered_contig_coverage_stats.log
jobid: 35
benchmark: logs/benchmarks/calculate_prefiltered_contig_coverage_stats/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40

bbwrap.sh nodisk=t ref=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz,null fast=t interleaved=auto threads=24 -Xmx40G append out=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam 2> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/prefiltered_contig_coverage_stats.log

        pileup.sh ref=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta in=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam threads=24             -Xmx40G covstats=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt physcov 2>> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/prefiltered_contig_coverage_stats.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam.
Finished job 35.
19 of 46 steps (41%) done

rule filter_by_coverage:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt
output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_discarded_contigs.fasta
log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/filter_by_coverage.log
jobid: 24
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
resources: mem=40

filterbycoverage.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta cov=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt out=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta outd=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_discarded_contigs.fasta minc=5 minp=40 minr=0 minl=2200 trim=100 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/filter_by_coverage.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.

rule calculate_contigs_stats:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_contig_stats.txt
jobid: 4
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, assembly_step=prefilter
resources: mem=40

stats.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta format=3 -Xmx40G > PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_contig_stats.txt
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 4.
20 of 46 steps (43%) done
Finished job 24.
21 of 46 steps (46%) done

rule calculate_contigs_stats:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/final_contig_stats.txt
jobid: 5
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, assembly_step=final
resources: mem=40

stats.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta format=3 -Xmx40G > PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/final_contig_stats.txt
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.

localrule finalize_contigs:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
jobid: 36
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

cp PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
Finished job 36.
22 of 46 steps (48%) done
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta.
Finished job 5.
23 of 46 steps (50%) done

rule align_reads_to_final_contigs:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam, PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_se.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log
jobid: 38
benchmark: logs/benchmarks/align_reads_to_filtered_contigs/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40

bbwrap.sh nodisk=t ref=PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz,null trimreaddescriptions=t outm=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam outu1=PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_se.fastq.gz outu2=PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R2.fastq.gz,null threads=24 pairlen=1000 pairedonly=t mdtag=t xstag=fs nmtag=t sam=1.3 local=t ambiguous=best secondary=t ssao=t maxsites=10 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 38.
24 of 46 steps (52%) done

rule pileup:
input: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam
output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_base_coverage.txt.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_histogram.txt, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_stats.txt, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_binned.txt
log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log
jobid: 42
benchmark: logs/benchmarks/align_reads_to_filtered_contigs/PHH12-O-8024.3.89990.GGTAGC_pileup.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40

pileup.sh ref=PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta in=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam threads=24 -Xmx40G covstats=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_stats.txt hist=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_histogram.txt basecov=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_base_coverage.txt.gz concise=t physcov=t secondary=f bincov=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_binned.txt 2>> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_base_coverage.txt.gz.
Finished job 42.
25 of 46 steps (54%) done

rule convert_sam_to_bam:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam
output: PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam
jobid: 27
wildcards: file=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC
threads: 24

samtools view -@ 24 -bSh1 PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam | samtools sort -m 1536M -@ 24 -T /tmp/PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC_tmp -o PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam -O bam -
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam.
Finished job 27.
26 of 46 steps (57%) done

rule run_prokka_annotation:
input: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.err, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.faa, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.ffn, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.fna, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.fsa, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gbk, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.log, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.sqn, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.tbl, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.tsv, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.txt
jobid: 25
benchmark: logs/benchmarks/prokka/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24

prokka --outdir PHH12-O-8024.3.89990.GGTAGC/annotation/prokka --force --prefix PHH12-O-8024.3.89990.GGTAGC --locustag PHH12-O-8024.3.89990.GGTAGC --kingdom Bacteria --metagenome --cpus 24 PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 25.
27 of 46 steps (59%) done

rule calculate_insert_size:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_insert_size_hist.txt, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_calculate_insert_size.log
jobid: 18
benchmark: logs/benchmarks/merge_pairs/PHH12-O-8024.3.89990.GGTAGC_insert_size.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40

        bbmerge.sh -Xmx40G threads=24                 in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz                 loose ecct k=62                 extend2=50                 ihist=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_insert_size_hist.txt merge=f                 mininsert0=35 minoverlap0=8                 prealloc=t prefilter=t                 minprob=0.8 2> >(tee PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_calculate_insert_size.log)

        readlength.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz out=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt 2> >(tee PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_calculate_insert_size.log)

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 18.
28 of 46 steps (61%) done

rule convert_gff_to_gtf:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff
output: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf
jobid: 28
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

localrule combine_insert_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_insert_size_hist.txt
output: stats/insert_stats.tsv
jobid: 21

localrule combine_read_length_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt
output: stats/read_length_stats.tsv
jobid: 22

localrule finalize_QC:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt, PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filtering_stats.txt, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/finished_QC, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/read_counts.tsv
jobid: 2
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

rule update_prokka_tsv:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff
output: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC_plus.tsv
jobid: 6
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

rule make_maxbin_abundance_file:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_stats.txt
output: PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC_contig_coverage.tsv
jobid: 39
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
atlas gff2tsv PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC_plus.tsv

Finished job 28.
29 of 46 steps (63%) done
Finished job 39.
30 of 46 steps (65%) done
Finished job 21.
31 of 46 steps (67%) done
Finished job 22.
32 of 46 steps (70%) done
Touching output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/finished_QC.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw_read_counts.tsv.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean_read_counts.tsv.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered_read_counts.tsv.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_counts.tsv.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated_read_counts.tsv.
Finished job 2.
33 of 46 steps (72%) done

localrule combine_read_counts:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/read_counts.tsv
output: stats/read_counts.tsv
jobid: 20

Finished job 6.
34 of 46 steps (74%) done
Finished job 20.
35 of 46 steps (76%) done

rule run_diamond_blastp:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.faa, /home/william/dbs/atlas_db.v2/refseq.dmnd
output: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv
jobid: 41
benchmark: logs/benchmarks/run_diamond_blastp/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24

diamond blastp --threads 24 --outfmt 6 --out PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv --query PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.faa --db /home/william/dbs/atlas_db.v2/refseq.dmnd --top 2 --evalue 1e-06 --id 50 --query-cover 50 --gapopen 11 --gapextend 1 --tmpdir /tmp --block-size 2 --index-chunks 4
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 41.
36 of 46 steps (78%) done

rule add_contig_metadata:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff
output: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv
jobid: 37
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

localrule QC_report:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/finished_QC, stats/read_counts.tsv, stats/insert_stats.tsv, stats/read_length_stats.tsv
output: finished_QC
jobid: 3
atlas munge-blast PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv

    if [ -d ref ]; then
        rm -r ref
    fi

Touching output file finished_QC.
Finished job 3.
37 of 46 steps (80%) done
Finished job 37.
38 of 46 steps (83%) done

rule sort_munged_blast_hits:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv
output: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus_sorted.tsv
jobid: 26
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

sort -k1,1 -k2,2 -k13,13rn PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv > PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus_sorted.tsv
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv.
Finished job 26.
39 of 46 steps (85%) done

rule run_maxbin:
input: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC_contig_coverage.tsv
output: PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC.summary, PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC.marker
log: PHH12-O-8024.3.89990.GGTAGC/logs/maxbin2.log
jobid: 30
benchmark: logs/benchmarks/maxbin2/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24

run_MaxBin.pl -contig PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta -abund PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC_contig_coverage.tsv -out PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC -min_contig_length 200 -thread 24 -prob_threshold 0.9 -max_iteration 50 > PHH12-O-8024.3.89990.GGTAGC/logs/maxbin2.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/9596cb25.
Finished job 30.
40 of 46 steps (87%) done

rule run_checkm_lineage_wf:
input: logs/checkm_init.txt, PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC.marker
output: PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm/completeness.tsv
jobid: 10
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24

rm -r PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm && checkm lineage_wf --file PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm/completeness.tsv --tab_table --quiet --extension fasta --threads 24 PHH12-O-8024.3.89990.GGTAGC/genomic_bins PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/9596cb25.
Finished job 10.
41 of 46 steps (89%) done

rule find_counts_per_region:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf, PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam
output: PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt.summary, PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log
jobid: 9
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24

featureCounts -p −−minOverlap 1 -B -F gtf -T 24 --primary -O --fraction -t CDS -g ID -a PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf -o PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam 2> PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Error in rule find_counts_per_region:
jobid: 9
output: PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt.summary, PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log

RuleException:
CalledProcessError in line 683 of /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/rules/assemble.snakefile:
Command 'source activate /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89; set -euo pipefail; featureCounts -p −−minOverlap 1 -B -F gtf -T 24 --primary -O --fraction -t CDS -g ID -a PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf -o PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam 2> PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log ' returned non-zero exit status 255.
File "/home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/rules/assemble.snakefile", line 683, in __rule_find_counts_per_region
File "/home/william/miniconda3/envs/atlas.72/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/log/2018-01-10T094504.256226.snakemake.log
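A close look at the failing command shows that −−minOverlap is written with Unicode minus signs (U+2212) rather than ASCII hyphens, so featureCounts most likely rejects it as an unknown argument, which is consistent with exit status 255. A minimal sketch of the corrected call, assuming nothing else in the rule needs to change:

    featureCounts -p --minOverlap 1 -B -F gtf -T 24 --primary -O --fraction -t CDS -g ID -a PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf -o PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam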

Issue with databases download

Hi

While running the databases download, I ran into this error for each of the files to download:

Error in job transfer_files while creating output file /data1/Atlas/databases/refseq.tree.
RuleException:
CalledProcessError in line 30 of /home/test/miniconda3/envs/atlas_env/lib/python3.5/site-packages/atlas/rules/download.snakefile:
Command 'curl 'ftp://observers:[email protected]/outgoing/atlas/refseq.tree' -s > /data1/Atlas/databases/refseq.tree' returned non-zero exit status 8
File "/home/test/miniconda3/envs/atlas_env/lib/python3.5/site-packages/atlas/rules/download.snakefile", line 30, in __rule_transfer_files
File "/home/test/miniconda3/envs/atlas_env/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Exiting because a job execution failed. Look above for error message

Best
Greg
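curl exit status 8 means the FTP server sent a reply curl could not parse. A hedged way to see where the conversation breaks is a verbose retry (the URL below is a placeholder standing in for the atlas FTP path from the error):

    curl -v --ftp-method nocwd 'ftp://user:password@ftp.example.org/outgoing/atlas/refseq.tree' -o /data1/Atlas/databases/refseq.tree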

Genome binning

@brwnj I started implementing concoct and am hesitant about using anvi'o, which has a lot of additional tools (maybe too many). What do you think?

Co-assembly

@brwnj My professor and I don't think that a co-assembly over multiple files is a good idea. He proposes that I filter out known genomes to better assemble the unknown ones.
On the other hand, co-assembly is something that is done in the field (e.g. recommended by anvi'o's Meren).

Let's try and see if megahit manages the computational burden.

Make normalization optional

Normalization, as it is implemented now, normalizes:

  1. the single-end/merged paired-end reads, but takes the unnormalized paired-end reads into account.
  2. the paired-end reads, but takes the unnormalized single-end reads into account.

To me this seems like two biased normalizations.

Megahit should be able to handle metagenomes with very unevenly abundant organisms, so does normalization even help the assembly? Spades has an internal normalization function, so we could use that one instead of bbnorm.

diamond database version

Hi. I installed atlas on a new server through conda. I am now getting this error message:

#CPU threads: 12
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Percentage range of top alignment score to report hits: 2
Temporary directory: /tmp
Opening the database... [0.00483s]
Error: Database was built with a different version of diamond as is incompatible.
Error in job run_diamond_blastp while creating output file cRN5-S40-L001-001/megahit_21_121_20_normalization_k21_t100/functional_annotation/refseq/cRN5-S40-L001-001_hits.tsv.
RuleException:
CalledProcessError in line 672 of /home/mdjlynch/anaconda3/lib/python3.5/site-packages/atlas/rules/assemble.snakefile:
Command ' diamond blastp --threads 12 --outfmt 6 --out cRN5-S40-L001-001/megahit_21_121_20_normalization_k21_t100/functional_annotation/refseq/cRN5-S40-L001-001_hits.tsv --query cRN5-S40-L001-001/megahit_21_121_20_normalization_k21_t100/functional_annotation/prokka/cRN5-S40-L001-001.faa --db /home/mdjlynch/databases/refseq.dmnd --top 2 --evalue 1e-06 --id 50 --query-cover 60 --gapopen 11 --gapextend 1 --tmpdir /tmp --block-size 2 --index-chunks 4' returned non-zero exit status 1
File "/home/mdjlynch/anaconda3/lib/python3.5/site-packages/atlas/rules/assemble.snakefile", line 672, in __rule_run_diamond_blastp
File "/home/mdjlynch/anaconda3/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

I will rebuild with a current version of diamond to see if that resolves the problem, but thought I should post the issue to GitHub. Thanks.
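If rebuilding is the route, a minimal sketch is to regenerate the .dmnd file with the same diamond binary atlas calls (refseq.faa below is a placeholder for the protein FASTA the original database was built from):

    diamond version
    diamond makedb --in refseq.faa --db /home/mdjlynch/databases/refseq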

snakemake tries to recreate conda environments on cluster without internet connection

Hello,

I have a new problem with the new version of Atlas

I run atlas on a Slurm cluster without an internet connection. In the log I see that snakemake tries to create the conda environments for my rules.

[kieser@bee Preprocessing]$ cat slurm-2901691.out 
[Fri Jan 19 16:37:32 2018] Building DAG of jobs... 
[Fri Jan 19 16:37:42 2018] Creating conda environment
../atlas/atlas/envs/required_packages.yaml...

I didn't have this problem before, using the same cluster submission scripts. Previously the environments were only activated on the cluster.

Can someone tell me what changed?

The same happens when I first run the command with --create-envs-only.
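A hedged workaround, assuming the login node has internet access and shares a filesystem with the compute nodes, is to build the environments once up front and pin their location, so that cluster jobs reuse them instead of recreating them (the paths below are placeholders):

    snakemake --snakefile /path/to/atlas/Snakefile --use-conda --conda-prefix /shared/conda_envs --create-envs-only
    # then submit with the same --conda-prefix so jobs find the existing envs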

Count table and assignment output

We should be able to easily determine the confidence of an assignment. We will also need to know more about the contig, such as its length, number of ORFs, etc.

snakemake issue with atlas assemble

Hello. I'm having an issue with atlas assemble on Ubuntu 16.04. Dependencies have been installed and I am using atlas_env.

atlas assemble mgm4637809.3.yaml
...
subprocess.CalledProcessError: Command 'snakemake -s... returned non-zero exit status 1

Looks like a very useful pipeline. Any help is appreciated.

Thanks.

bioconda

Need to add click as a prerequisite.

cluster deployment

@camel315 have you already heard of the snakemake cluster configuration?

It is the ideal way to let atlas control submission to the cluster system.
I implemented cluster scripts for a Slurm system, but they should be easily adaptable to an LSF system. Do you want to try them?
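For reference, a minimal sketch of such a submission, assuming Slurm and that each rule declares its memory under resources (the sbatch flags are illustrative, not atlas's shipped scripts):

    snakemake --jobs 99 --use-conda --cluster "sbatch --cpus-per-task={threads} --mem={resources.mem}G"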

16S analysis

Hey @brwnj

I remember in the original figure you had planned to recover 16S sequences and to apply MerCat.

I started applying Atlas on my data. Now I have a discrepancy between 16S rDNA amplicon sequencing and 16S recovered from the metagenome (decontamination step). I find an OTU increased with amplicon sequencing which I don't find increased when mapping the metagenome reads to the representatives of my amplicon OTUs.

We think it might originate from the deduplication step. Clumpify removes exact duplicates. What do you think? Is it a good idea to do de-duplication on the 16S reads in the metagenome?
The same question applies to error correction.

If not it would be an additional argument for #31

Cluster support

@brwnj Cluster support

If I understood it correctly, you want to use the environment variable SHPFXM to allow cluster support.

Currently I'm using cluster scheduler scripts described here.

what do you think?

Conda Env create fails on fraggenescan

I'm having an issue installing ATLAS on an Ubuntu 16.04 server using miniconda. All dependencies install correctly, with the exception of maxbin2. When installing it I receive the following error:

Using Anaconda API: https://api.anaconda.org
Fetching package metadata ...............
Solving package specifications:

NoPackagesFoundError: Dependency missing in current linux-64 channels:

  • maxbin2 -> fraggenescan >=1.30 -> perl 5.22.0*

Any help is appreciated!
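A hedged workaround, assuming the unresolvable dependency is the pinned perl build, is to name the channels explicitly and pin perl to what fraggenescan expects (the channel order is an assumption):

    conda install -c bioconda -c conda-forge maxbin2 "fraggenescan>=1.30" "perl=5.22.0"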

MaxBin doesn't find enough marker genes.

Several times now, maxbin has given me the following error.

MaxBin 2.2.1
Input contig: F35/F35_contigs.fasta
Located abundance file [F35/genomic_bins/F35_contig_coverage.tsv]
out header: F35/genomic_bins/F35
Min contig length: 200
Thread: 6
Probability threshold: 0.9
Max iteration: 50
Searching against 107 marker genes to find starting seed contigs for [F35/F35_contigs.fasta]...
Try harder to dig out marker genes from contigs.
Marker gene search reveals that the dataset cannot be binned (the medium of marker gene number <= 1). Program stop.

But my contig statistics don't look that bad, so I don't know what the problem is.

n_scaffolds | scaf_bp  | scaf_N50 | scaf_L50 | scaf_N90 | scaf_L90 | scaf_max
12470       | 65564177 | 1130     | 13095    | 7423     | 1725     | 165795

and the cov file looks like this:

F35_0   18.9348
F35_1   15.5894
F35_2   18.8915
F35_3   20.1290
F35_4   28.7693
F35_5   33.6804
F35_6   24.2962
F35_7   19.3721
F35_8   20.3123
F35_9   17.1395
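One detail that may matter: the run above uses -min_contig_length 200, while MaxBin's default is 1000, and 200 bp fragments rarely carry recognizable marker genes. A hedged sketch of a rerun with a longer minimum contig length:

    run_MaxBin.pl -contig F35/F35_contigs.fasta -abund F35/genomic_bins/F35_contig_coverage.tsv -out F35/genomic_bins/F35 -min_contig_length 1000 -thread 6 -prob_threshold 0.9 -max_iteration 50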

QC report

Are the numbers of reads passing each step recorded somewhere, so we could aggregate them at the end into a QC report?

Preprocessing sequences

In the configuration we added:

conf["assembly_preprocessing_steps"]=['normalized', 'errorcorr', 'merged']

This gives the impression that these steps are optional or that their order can be changed. They are currently required steps of preprocessing and occur linearly starting with normalization.

Error in assemble

When I run assemble (atlas assemble ../config.yaml), I get the following error:

5 of 31 steps (16%) done
Error in rule normalize_coverage_across_kmers:
    jobid: 30
    output: testeverificar/sequence_quality_control/testeverificar_04_pe.fastq.gz
    log: testeverificar/logs/testeverificar_normalization.log

RuleException:
CalledProcessError in line 213 of /home/lricardo/miniconda3/lib/python3.5/site-packages/pnnl_atlas-1.0.15-py3.5.egg/atlas/rules/assemble.snakefile:
Command ' bbnorm.sh in=testeverificar/sequence_quality_control/testeverificar_03_pe.fastq.gz out=testeverificar/sequence_quality_control/testeverificar_04_pe.fastq.gz k=21 t=100                interleaved=t minkmers=15 prefilter=t                threads=40 2> testeverificar/logs/testeverificar_normalization.log' returned non-zero exit status 137
  File "/home/lricardo/miniconda3/lib/python3.5/site-packages/pnnl_atlas-1.0.15-py3.5.egg/atlas/rules/assemble.snakefile", line 213, in __rule_normalize_coverage_across_kmers
  File "/home/lricardo/miniconda3/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2017-10-02T14:03:40.202404.snakemake.log
Traceback (most recent call last):
  File "/home/lricardo/miniconda3/bin/atlas", line 11, in <module>
    load_entry_point('pnnl-atlas==1.0.15', 'console_scripts', 'atlas')()
  File "/home/lricardo/miniconda3/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/lricardo/miniconda3/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/lricardo/miniconda3/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/lricardo/miniconda3/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/lricardo/miniconda3/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/lricardo/miniconda3/lib/python3.5/site-packages/pnnl_atlas-1.0.15-py3.5.egg/atlas/atlas.py", line 218, in run_assemble
    assemble(os.path.realpath(config), jobs, out_dir, no_conda, dryrun, snakemake_args)
  File "/home/lricardo/miniconda3/lib/python3.5/site-packages/pnnl_atlas-1.0.15-py3.5.egg/atlas/workflows.py", line 47, in assemble
    check_call(cmd, shell=True)
  File "/home/lricardo/miniconda3/lib/python3.5/subprocess.py", line 271, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /home/lricardo/miniconda3/lib/python3.5/site-packages/pnnl_atlas-1.0.15-py3.5.egg/atlas/Snakefile -d /home/lricardo/resultados_testecerto -p -j 80 --rerun-incomplete --configfile '/home/lricardo/config.yaml' --nolock --use-conda --config workflow=complete  --' returned non-zero exit status 1
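Exit status 137 is 128 + 9 (SIGKILL), which on Linux usually means the kernel's out-of-memory killer terminated bbnorm. A hedged sketch of the same step with an explicit heap cap below the node's physical RAM (the 50g value is a placeholder):

    bbnorm.sh -Xmx50g threads=40 in=testeverificar/sequence_quality_control/testeverificar_03_pe.fastq.gz out=testeverificar/sequence_quality_control/testeverificar_04_pe.fastq.gz k=21 t=100 interleaved=t minkmers=15 prefilter=t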

Config file is invalid

Hi,
Following the tutorial, I generated the config file, and when I try to run the software it tells me "The configuration file is invalid."
Greg

config.txt

decontamination databases may overwrite each other

I got the following exception during decontamination.
F33_decontamination.log

A similar error was discussed here:

This means that you were running multiple different indexing processes in the same directory at the same time. Unless you use a different directory for each process, or specify a different index location with "path=", or specify a different build number, the indexes can overwrite each other leading to corrupt zip files (which, fortunately, normally get detected, as in this case).

If you want to do all of these mapping operations to the same references, just index once, wait for it to finish, and then run all the mapping operations without specifying "ref=". E.g.

bbsplit.sh ref=ecoli.fa,salmonella.fa

(wait for finish)

bbsplit.sh a.fq basename=out_a_%.fq
bbsplit.sh b.fq basename=out_b_%.fq
bbsplit.sh c.fq basename=out_c_%.fq
...etc

If each one needs different references, then either run them serially, or use a different directory/build each time.

I definitely think that we should (see the sketch after this list):

  • make a rule that builds the reference once for all decontamination steps.
  • use the database files as input for the decontamination step.
  • execute decontamination in a shadow directory.
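A minimal sketch of the index-once pattern using bbsplit's path= option (directory and reference names are placeholders):

    # build the index once, in a dedicated directory
    bbsplit.sh ref=phiX174.fa,host.fa path=decontamination_index/
    # every decontamination job then reuses the index read-only
    bbsplit.sh in=sampleA.fq path=decontamination_index/ basename=sampleA_%.fq
    bbsplit.sh in=sampleB.fq path=decontamination_index/ basename=sampleB_%.fq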

qc.snakefile

@brwnj There is probably an error in line 407 of qc.snakefile. Please check.
rule calculate_insert_size:
input: OD3/sequence_quality_control/OD3_QC_R2.fastq.gz, OD3/sequence_quality_control/OD3_QC_R1.fastq.gz, OD3/sequence_quality_control/OD3_QC_se.fastq.gz
output: OD3/sequence_quality_control/read_stats/QC_insert_size_hist.txt, OD3/sequence_quality_control/read_stats/QC_read_length_hist.txt
log: OD3/logs/OD3_calculate_insert_size.log
jobid: 4
benchmark: logs/benchmarks/merge_pairs/OD3_insert_size.txt
wildcards: sample=OD3

threads: 24
resources: mem=72

bbmerge.sh -Xmx60G threads=36 in1=OD3/sequence_quality_control/OD3_QC_R1.fastq.gz in2=OD3/sequence_quality_control/OD3_QC_R2.fastq.gz ecct k=62 extend2=50 ihist=OD3/sequence_quality_control/read_stats/QC_insert_size_hist.txt merge=f mininsert0=35 minoverlap0=8 2> >(tee OD3/logs/OD3_calculate_insert_size.log)

            readlength.sh in=OD3/sequence_quality_control/OD3_QC_R1.fastq.gz in2=OD3/sequence_quality_control/OD3_QC_R2.fastq.gz out=OD3/sequence_quality_control/read_stats/QC_read_length_hist.txt 2> >(tee OD3/logs/OD3_calculate_insert_size.log)

Activating conda environment /panfs/panfs14.gfz-hpcc.cluster/home/gmb/syang/results/.snakemake/conda/dbc7d302.
Error in rule calculate_insert_size:
jobid: 4
output: OD3/sequence_quality_control/read_stats/QC_insert_size_hist.txt, OD3/sequence_quality_control/read_stats/QC_read_length_hist.txt
log: OD3/logs/OD3_calculate_insert_size.log

RuleException:
CalledProcessError in line 407 of /home/syang/anaconda3/lib/python3.5/site-packages/atlas/rules/qc.snakefile:
Command 'source activate /home/syang/results/.snakemake/conda/dbc7d302; set -euo pipefail; bbmerge.sh -Xmx60G threads=36 in1=OD3/sequence_quality_control/OD3_QC_R1.fastq.gz in2=OD3/sequence_quality_control/OD3_QC_R2.fastq.gz ecct k=62 extend2=50 ihist=OD3/sequence_quality_control/read_stats/QC_insert_size_hist.txt merge=f mininsert0=35 minoverlap0=8 2> >(tee OD3/logs/OD3_calculate_insert_size.log)

            readlength.sh in=OD3/sequence_quality_control/OD3_QC_R1.fastq.gz in2=OD3/sequence_quality_control/OD3_QC_R2.fastq.gz out=OD3/sequence_quality_control/read_stats/QC_read_length_hist.txt 2> >(tee OD3/logs/OD3_calculate_insert_size.log) ' returned non-zero exit status 1

File "/home/syang/anaconda3/lib/python3.5/site-packages/atlas/rules/qc.snakefile", line 407, in __rule_calculate_insert_size
File "/home/syang/anaconda3/lib/python3.5/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /home/syang/.snakemake/log/2017-12-20T232825.193914.snakemake.log
[2017-12-20 23:31 CRITICAL] Command 'snakemake --snakefile /home/syang/anaconda3/lib/python3.5/site-packages/atlas/Snakefile --directory /home/syang/results --printshellcmds --jobs 24 --rerun-incomplete --configfile '/home/syang/config.yaml' --nolock --use-conda --config workflow=complete --latency-wait 240 --until calculate_insert_size' returned non-zero exit status 1

Even after setting prealloc=t for tadpole.sh, which is involved in rule calculate_insert_size within qc.snakefile, it still reports 'ran out of memory':

Exception in thread "Thread-8" java.lang.OutOfMemoryError: GC overhead limit exceeded
at ukmer.KmerNodeU.<init>(KmerNodeU.java:22)
at ukmer.KmerNodeU1D.<init>(KmerNodeU1D.java:22)
at ukmer.HashForestU.makeNode(HashForestU.java:47)
at ukmer.HashForestU.makeNode(HashForestU.java:43)
at ukmer.HashForestU.incrementAndReturnNumCreated(HashForestU.java:130)
at ukmer.HashArrayU1D.incrementAndReturnNumCreated(HashArrayU1D.java:72)
at ukmer.HashBufferU.dumpBuffer_inner(HashBufferU.java:196)
at ukmer.HashBufferU.dumpBuffer(HashBufferU.java:168)
at ukmer.HashBufferU.incrementAndReturnNumCreated(HashBufferU.java:57)
at ukmer.KmerTableSetU$LoadThread.addKmersToTable(KmerTableSetU.java:553)
at ukmer.KmerTableSetU$LoadThread.run(KmerTableSetU.java:479)
Exception in thread "Thread-26" java.lang.OutOfMemoryError: GC overhead limit exceeded
at ukmer.KmerNodeU.<init>(KmerNodeU.java:22)
at ukmer.KmerNodeU1D.<init>(KmerNodeU1D.java:22)
at ukmer.HashForestU.makeNode(HashForestU.java:47)
at ukmer.HashForestU.makeNode(HashForestU.java:43)
at ukmer.HashForestU.incrementAndReturnNumCreated(HashForestU.java:130)
at ukmer.HashArrayU1D.incrementAndReturnNumCreated(HashArrayU1D.java:72)
at ukmer.HashBufferU.dumpBuffer_inner(HashBufferU.java:196)
at ukmer.HashBufferU.dumpBuffer(HashBufferU.java:168)
at ukmer.HashBufferU.incrementAndReturnNumCreated(HashBufferU.java:57)
at ukmer.KmerTableSetU$LoadThread.addKmersToTable(KmerTableSetU.java:553)
at ukmer.KmerTableSetU$LoadThread.run(KmerTableSetU.java:489)
Exception in thread "Thread-3" Exception in thread "Thread-14" java.lang.OutOfMemoryError: GC overhead limit exceeded
at ukmer.KmerNodeU.<init>(KmerNodeU.java:22)
at ukmer.KmerNodeU1D.<init>(KmerNodeU1D.java:22)
at ukmer.HashForestU.makeNode(HashForestU.java:47)
at ukmer.HashForestU.makeNode(HashForestU.java:43)
at ukmer.HashForestU.incrementAndReturnNumCreated(HashForestU.java:130)
at ukmer.HashArrayU1D.incrementAndReturnNumCreated(HashArrayU1D.java:72)
at ukmer.HashBufferU.dumpBuffer_inner(HashBufferU.java:196)
at ukmer.HashBufferU.dumpBuffer(HashBufferU.java:168)
at ukmer.HashBufferU.incrementAndReturnNumCreated(HashBufferU.java:57)
at ukmer.KmerTableSetU$LoadThread.addKmersToTable(KmerTableSetU.java:553)
at ukmer.KmerTableSetU$LoadThread.run(KmerTableSetU.java:489)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at ukmer.HashForestU.makeNode(HashForestU.java:47)
at ukmer.HashForestU.makeNode(HashForestU.java:43)
at ukmer.HashForestU.incrementAndReturnNumCreated(HashForestU.java:130)
at ukmer.HashArrayU1D.incrementAndReturnNumCreated(HashArrayU1D.java:72)
at ukmer.HashBufferU.dumpBuffer_inner(HashBufferU.java:196)
at ukmer.HashBufferU.dumpBuffer(HashBufferU.java:168)
at ukmer.HashBufferU.incrementAndReturnNumCreated(HashBufferU.java:57)
at ukmer.KmerTableSetU$LoadThread.addKmersToTable(KmerTableSetU.java:553)
at ukmer.KmerTableSetU$LoadThread.run(KmerTableSetU.java:479)
Exception in thread "Thread-19" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Thread-28" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Thread-27" Exception in thread "Thread-9" java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded

This program ran out of memory.
Try increasing the -Xmx flag and using tool-specific memory-related parameters.

Any solutions? Thank you.
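One hedged option is to retry with a larger heap and bbmerge's prefilter=t, which trades speed for a smaller k-mer table (the 120g value is a placeholder; stay below the node's physical RAM):

    bbmerge.sh -Xmx120g threads=36 in1=OD3/sequence_quality_control/OD3_QC_R1.fastq.gz in2=OD3/sequence_quality_control/OD3_QC_R2.fastq.gz ecct k=62 extend2=50 ihist=OD3/sequence_quality_control/read_stats/QC_insert_size_hist.txt merge=f mininsert0=35 minoverlap0=8 prealloc=t prefilter=t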

Error correction

I think error correction should happen later in the pipeline, e.g. after the decontamination step.
Tadpole has quite aggressive error correction, which is optimized for spades according to Brian, but this might not be the best choice for megahit and for downstream analyses, e.g. SNP calling.

rRNA reads should be excluded from error correction, I think.
