Combine data from CheckM, CheckM2, and GTDB-Tk into a single report
```bash
# put an R >= 4.1.0 installation first on PATH (site-specific path; adjust as needed)
export PATH=/programs/R-4.2.0/bin:$PATH
Rscript binSummary.R \
  -q quality_report.tsv \
  -a abundance_report.tsv \
  -g gtdbtk.bac120.summary.tsv \
  -n sample \
  -o /path/to/outdir
```
Option | Description |
---|---|
`-q` / `--quality_report` | REQUIRED: completeness and contamination report generated by `checkm2 predict` |
`-a` / `--abundance_report` | REQUIRED: abundance report generated by `checkm profile` |
`-g` / `--gtdbtk` | REQUIRED: taxonomy summary generated by `gtdbtk classify_wf` |
`-o` / `--out_dir` | Output directory, without a trailing `/` (default: `getwd()`) |
`-n` / `--name` | REQUIRED: sample name used as the output file prefix (`${name}_binSummary.txt`) |
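To summarize several samples, the command can be wrapped in a loop. The sketch below assumes a hypothetical layout in which each sample has its own directory under `samples/` holding the three input files with the same names as in the usage example, and writes every summary to a hypothetical `summaries/` directory. `run_bin_summary` is an illustrative helper, not part of `binSummary.R`.

```shell
#!/usr/bin/env bash
# Sketch: run binSummary.R once per sample.
# ASSUMPTION (hypothetical layout): samples/<sample>/ contains quality_report.tsv,
# abundance_report.tsv and gtdbtk.bac120.summary.tsv; results go to summaries/.
set -euo pipefail
shopt -s nullglob   # if nothing matches samples/*/, the loop simply does not run

# run_bin_summary SAMPLE_DIR OUT_DIR (hypothetical helper)
run_bin_summary() {
  local sample_dir=${1%/}           # drop a trailing slash from the glob match
  local out_dir=${2%/}              # -o expects a directory without a trailing /
  local name
  name=$(basename "$sample_dir")    # use the directory name as the sample name
  mkdir -p "$out_dir"
  Rscript binSummary.R \
    -q "$sample_dir/quality_report.tsv" \
    -a "$sample_dir/abundance_report.tsv" \
    -g "$sample_dir/gtdbtk.bac120.summary.tsv" \
    -n "$name" \
    -o "$out_dir"
}

for dir in samples/*/; do
  run_bin_summary "$dir" summaries
done
```

Deriving `-n` from the directory name keeps each `${name}_binSummary.txt` unique, so all outputs can share one directory.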
The output file has 18 fields.
Field | Description |
---|---|
bin | bin name |
contamination | `checkm2 predict`: percent contamination |
model | `checkm2 predict`: completeness model |
completeness | `checkm2 predict`: percent completeness |
Mbp | bin size (Mbp) |
mapped_reads | reads mapping to the contigs that make up the bin |
percent_mapped_reads | `checkm profile`: (reads mapped to bin) / (total reads mapped to assembly) |
percent_binned_populations | `checkm profile`: the proportion of a bin relative to all recovered bins |
percent_community | `checkm profile`: the proportion of a bin relative to the number of reads mapped to the assembly, adjusted for bin size |
domain | from `gtdbtk classify_wf` |
phylum | ... |
class | ... |
order | ... |
family | ... |
genus | ... |
species | ... |
closest_reference | closest reference genome reported by `gtdbtk classify_wf` |
gtdbtk_warnings | warnings reported by `gtdbtk classify_wf` |
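Because everything ends up in one table, bins can be screened straight from the shell. This sketch assumes the summary is tab-delimited with a header row using the field names listed above; the default thresholds (completeness > 90, contamination < 5) follow the MIMAG high-quality draft cut-offs and are only illustrative. `filter_bins` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print the names of bins that pass completeness/contamination thresholds.
# ASSUMPTIONS: tab-delimited file with a header row containing the columns
# "bin", "completeness" and "contamination".
# Usage: filter_bins FILE [MIN_COMPLETENESS] [MAX_CONTAMINATION]
filter_bins() {
  local file=$1 min_comp=${2:-90} max_cont=${3:-5}
  awk -F'\t' -v min_comp="$min_comp" -v max_cont="$max_cont" '
    BEGIN { min_comp += 0; max_cont += 0 }   # force numeric comparison
    NR == 1 {
      # find columns by name so the column order does not matter
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("bin" in col) || !("completeness" in col) || !("contamination" in col)) {
        print "missing required column" > "/dev/stderr"
        exit 1
      }
      next
    }
    ($(col["completeness"]) + 0 > min_comp) && ($(col["contamination"]) + 0 < max_cont) {
      print $(col["bin"])
    }
  ' "$file"
}
```

For example, `filter_bins sample_binSummary.txt 50 10` would list bins meeting the looser medium-quality thresholds.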
- `binSummary.R` makes use of the native pipe (`|>`) and therefore requires R ≥ 4.1.0.
- The script will attempt to install `optparse`, `dplyr`, `readr`, `tidyr`, and `stringr` if they are not already installed.
- Tested with CheckM v1.2.1, CheckM2 v0.1.3, and GTDB-Tk v2.1.1.
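Since the native pipe fails to parse on older R, it can be worth confirming which `Rscript` is first on `PATH` before a long batch run. A minimal sketch, assuming GNU coreutils `sort -V` is available for version ordering; `version_ge` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: check that the Rscript on PATH is new enough for binSummary.R.
# ASSUMPTION: GNU coreutils `sort -V` is available.

# version_ge A B: succeed if version A >= version B
version_ge() {
  # the smaller version sorts first; if that is B, then A >= B
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=4.1.0
if command -v Rscript >/dev/null 2>&1; then
  # getRversion() reports the version of the R that Rscript runs
  r_version=$(Rscript -e 'cat(as.character(getRversion()))')
  if version_ge "$r_version" "$required"; then
    echo "R $r_version OK (>= $required)"
  else
    echo "R $r_version is older than $required; binSummary.R needs the native pipe" >&2
  fi
else
  echo "Rscript not found on PATH" >&2
fi
```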