This is a set of miscellaneous bioinformatics tools.
This script extracts cutadapt statistics from a set of cutadapt log files and outputs the data in a tabular format.
cutadapt-stats [-h] [-v] [-n] [-c] <filename> [<filename> ...]
positional arguments:
<filename>
: cutadapt log file(s) to process
optional arguments:
-h, --help
: show this help message and exit
-v, --version
: show program's version number and exit
-n, --no-header
: suppress column headers
-c, --csv
: output data as CSV (ignores -n)
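The usage above matches what an `argparse` parser would print. The following is a minimal sketch of such a parser, not the script's actual source: the function name `build_parser`, the description text, the destination names, and the version string are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented cutadapt-stats usage (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="cutadapt-stats",
        description="Extract statistics from cutadapt log files.",  # assumed wording
    )
    # The version string is a placeholder; the real value is not documented here.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s (version placeholder)")
    parser.add_argument("-n", "--no-header", action="store_true",
                        help="suppress column headers")
    parser.add_argument("-c", "--csv", action="store_true",
                        help="output data as CSV (ignores -n)")
    parser.add_argument("filenames", metavar="<filename>", nargs="+",
                        help="cutadapt log file(s) to process")
    return parser


# Parse an example command line; the file names are hypothetical.
args = build_parser().parse_args(["-c", "sample1.log", "sample2.log"])
print(args.csv, args.no_header, args.filenames)
```

With `nargs="+"`, at least one log file is required, which corresponds to the `<filename> [<filename> ...]` part of the usage line.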
For each log file, the script returns either a single entry (for a single-end run) or two entries (for a paired-end run). The output columns are:
id
: The log file ID
file
: The log file
type
: The type
r_total
: Total reads processed
b_total
: Total bases processed
r_out
: Total reads passing filter
b_out
: Total bases passing filter
r_adapt
: Number of reads with adapter contamination
r_rm_long
: Reads rejected as too long (flag -M)
r_rm_short
: Reads rejected as too short (flag -m)
r_rm_other
: Reads rejected for any other reason
b_rm_adapt
: Bases trimmed as adapter
b_rm_qual
: Bases trimmed as low quality (flag -q)
b_rm_other
: Bases trimmed for any other reason
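As an illustration of how the columns above might be consumed downstream, here is a minimal sketch that reads CSV output and computes the fraction of reads passing the filter for each entry. It assumes that the CSV output starts with a header row using the column names listed above. The sample row, including the `type` value `single`, is made up for the example, and `pass_rates` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical CSV output; every value below is invented for illustration.
SAMPLE = (
    "id,file,type,r_total,b_total,r_out,b_out,r_adapt,"
    "r_rm_long,r_rm_short,r_rm_other,b_rm_adapt,b_rm_qual,b_rm_other\n"
    "s1,s1.log,single,1000,100000,950,90000,120,0,50,0,4000,6000,0\n"
)


def pass_rates(handle):
    """Map (id, type) to the fraction of processed reads that passed the filter."""
    rates = {}
    for row in csv.DictReader(handle):
        total = int(row["r_total"])
        # Guard against an empty input file with zero reads processed.
        rates[(row["id"], row["type"])] = int(row["r_out"]) / total if total else 0.0
    return rates


rates = pass_rates(io.StringIO(SAMPLE))
print(rates)
```

Keying on both `id` and `type` keeps the two entries of a paired log file apart, on the assumption that they share the same `id`.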
The script should run on any system with Python 3 installed. On Python versions earlier than 3.2, the argparse package is not part of the standard library and must be installed separately.
This script processes a set of FastQC ZIP files to produce summary tables.
fastqc-extract [-h] [-v] [-c] [-e] [-s] <module> <filename> [<filename> ...]
positional arguments:
<module>
: The module
<filename>
: FastQC ZIP file(s) to process
optional arguments:
-h, --help
: show this help message and exit
-v, --version
: show program's version number and exit
-c, --csv
: return CSV-formatted data
-e, --header
: include header in output
-s, --summary
: generate module summary
Modules can be provided by name or by number. Non-standard modules must be supplied by name:
Number | Name |
---|---|
1 | Basic Statistics |
2 | Per base sequence quality |
3 | Per tile sequence quality |
4 | Per sequence quality scores |
5 | Per base sequence content |
6 | Per sequence GC content |
7 | Per base N content |
8 | Sequence Length Distribution |
9 | Sequence Duplication Levels |
10 | Overrepresented sequences |
11 | Adapter Content |
12 | Kmer Content |
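The name-or-number rule can be sketched as a small lookup. This is an illustrative guess at the behaviour, not the script's actual code: `STANDARD_MODULES` and `resolve_module` are hypothetical names, and raising an error for an unknown number is an assumption.

```python
# Standard FastQC modules, numbered as in the table above.
STANDARD_MODULES = {
    1: "Basic Statistics",
    2: "Per base sequence quality",
    3: "Per tile sequence quality",
    4: "Per sequence quality scores",
    5: "Per base sequence content",
    6: "Per sequence GC content",
    7: "Per base N content",
    8: "Sequence Length Distribution",
    9: "Sequence Duplication Levels",
    10: "Overrepresented sequences",
    11: "Adapter Content",
    12: "Kmer Content",
}


def resolve_module(arg):
    """Return a module name for a command-line argument given as number or name."""
    if arg.isdigit():
        number = int(arg)
        if number not in STANDARD_MODULES:
            raise ValueError("unknown module number: " + arg)
        return STANDARD_MODULES[number]
    # Anything that is not a number is treated as a module name,
    # which is how a non-standard module would be supplied.
    return arg


print(resolve_module("2"))
print(resolve_module("Adapter Content"))
```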
The script should run on any system with Python 3 installed. On Python versions earlier than 3.2, the argparse package is not part of the standard library and must be installed separately.
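For readers who want to post-process FastQC archives themselves, the sketch below shows one way to pull a single module out of a FastQC result. It relies on the FastQC archive containing a `fastqc_data.txt` file in which each module starts with a `>>Name<TAB>status` line and ends with `>>END_MODULE`. The helper names `read_fastqc_data` and `extract_module` are hypothetical, the sample text is invented, and this is not necessarily how fastqc-extract is implemented.

```python
import zipfile


def read_fastqc_data(zip_path):
    """Return the text of fastqc_data.txt from a FastQC ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.endswith("fastqc_data.txt"):
                return archive.read(member).decode("utf-8")
    raise FileNotFoundError("no fastqc_data.txt in " + zip_path)


def extract_module(text, name):
    """Return (status, header, rows) for one module, or None if it is absent."""
    status, header, rows, inside = None, [], [], False
    for line in text.splitlines():
        if line.startswith(">>END_MODULE"):
            if inside:
                return status, header, rows
        elif line.startswith(">>"):
            title, _, state = line[2:].partition("\t")
            if title == name:
                inside, status = True, state
        elif inside:
            if line.startswith("#"):
                # Some modules carry several '#' lines; the last one is kept as header.
                header = line[1:].split("\t")
            else:
                rows.append(line.split("\t"))
    return None


# Invented sample in the fastqc_data.txt layout, for illustration only.
SAMPLE_DATA = (
    "##FastQC\t0.11.9\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Filename\tsample.fastq\n"
    ">>END_MODULE\n"
)

result = extract_module(SAMPLE_DATA, "Basic Statistics")
print(result)
```

For a real archive, the text would come from `read_fastqc_data` with the path to a FastQC ZIP file, and the returned rows could then be written out as tab-separated or CSV data.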