nbisweden / generode

GitHub repository for GenErode, a Snakemake pipeline for the analysis of whole-genome sequencing data from historical and modern samples to study patterns of genome erosion.

License: GNU General Public License v3.0

Python 99.13% Dockerfile 0.87%

generode's Introduction

GenErode pipeline

GitHub repository for GenErode, a Snakemake pipeline for the analysis of whole-genome sequencing data from historical and modern samples to study patterns of genome erosion.

Documentation

The full pipeline documentation can be found on the repository wiki.

Citation

If you've used GenErode to produce results, please cite our paper:

Kutschera VE, Kierczak M, van der Valk T, von Seth J, Dussex N, Lord E, Dehasque M, Stanton DWG, Emami P, Nystedt B, Dalén L, Díez-del-Molino D (2022) GenErode: a bioinformatics pipeline to investigate genome erosion in endangered and extinct species. BMC Bioinformatics 23, 228 https://doi.org/10.1186/s12859-022-04757-0

Pipeline overview

Figure 1: Overview of the GenErode pipeline data processing tracks. Input and output file formats, dependencies between steps, and main software used are shown. Optional steps are highlighted in red.

Figure 2: Overview of the GenErode pipeline data analysis tracks and final reports. Input file formats and main software used are shown.

Licence information

GenErode pipeline

Copyright (C) 2022 Verena Kutschera

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Logo: Jonas Söderberg

generode's People

Contributors

verku

generode's Issues

Repeatmasker fails when attempting to zip the *cat file

Submitted via email:

The main issue seems to be the following error:
gzip: reference_genomic.upper.fasta.cat: No such file or directory

...

For example, when I run repeatmasker, it creates a folder with the path ../GenErode/reference_genomes/repeatmasker/reference_genomic/reference_genomic.upper.fasta.preSatMar42206472023.RMoutput, and the reference_genomic.upper.fasta.cat file is found in that folder. The job then fails saying the ".cat" file does not exist.
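One way to make the compression step tolerant of this layout is to search for the `.cat` file below the RepeatMasker output directory instead of assuming a fixed location. The following is only a minimal sketch under that assumption: the helper name `gzip_cat_file` and its arguments are hypothetical and are not taken from the pipeline's rule code.

```python
import gzip
import shutil
from pathlib import Path


def gzip_cat_file(rm_dir, cat_name):
    """Find `cat_name` anywhere below `rm_dir` and write a gzipped copy.

    Hypothetical helper: the search is recursive, so a .cat file left
    inside a *.RMoutput subfolder is found as well.
    """
    rm_dir = Path(rm_dir)
    matches = sorted(rm_dir.rglob(cat_name))
    if not matches:
        raise FileNotFoundError(f"{cat_name} not found below {rm_dir}")
    # Write the compressed copy at the top level of the output directory.
    target = rm_dir / (cat_name + ".gz")
    with open(matches[0], "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

If several matching files exist, this sketch simply takes the first one in sorted order; a real rule would probably want to fail or choose explicitly.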

Issues with cookiecutter in UPPMAX

Hello,

I'm trying to set up the slurm profile with cookiecutter (using the latest version of the pipeline, v0.4.2), but the options look very different from what is described in GenErode's Wiki.

This is what it looks like:

[Screenshot: 2022-12-14]

Apparently it is asking me to manually input all the configuration instead of retrieving it from the config/cluster.yaml.

I used cookiecutter earlier this year both in UPPMAX and another slurm-based cluster and it was working exactly as it is described in the Wiki. I'm not sure if the issue I'm finding now is related to recent changes in UPPMAX or in the Snakemake git profile.

Thanks in advance, I'd appreciate any suggestions on how to proceed in order to set up the Snakemake profile.

Replace rescale_gerp rule with gerpcol parameter "-s"

Replace the "rescale_gerp" rule with the gerpcol parameter "-s 0.001" in the "compute_gerp" rule, which is doing the exact same thing (dividing GERP scores by 1000). That way, it gets a bit easier for advanced pipeline users to change or remove the scaling parameter if necessary for their project, e.g. if they want to generate GERP scores for another purpose than relative mutational load estimates.
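The equivalence between the two approaches can be checked with a few lines of Python. This is only an illustration of the arithmetic: the function name `scale_gerp_scores` and the example scores are made up, and the snippet does not call gerpcol.

```python
import math


def scale_gerp_scores(scores, factor=0.001):
    """Multiply each score by `factor` (0.001 mirrors the proposed "-s 0.001")."""
    return [score * factor for score in scores]


# Made-up example values, not real GERP output.
raw_scores = [1500.0, -250.0, 0.0]

scaled = scale_gerp_scores(raw_scores)
divided = [score / 1000 for score in raw_scores]

# Both approaches agree up to floating-point rounding.
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(scaled, divided))
```

Exposing the factor as a single parameter is what makes it easy to change or drop the scaling, for example by passing `factor=1.0`.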

rule 7_mlrho dependent on final 3.1 bam.bai temporary file

Rule 7_mlRho looks for .bam.bai files as input even though it only uses the .bam files. However, in both the 3.1 and 3.3 bam processing steps, these bam.bai files are marked as temporary and deleted at the end of the pipeline run. Thus, if any files are missing from the expected output of 3.1 (i.e. sorted bams), the pipeline will remap all affected samples from the beginning.

This is different from the 4_genotyping rule, which only depends on the final bam file from 3.1/3.2/3.3, so the absence of the bam.bai file will not trigger a remapping of any samples.

One option would be to make the bam.bai files not temporary, or otherwise remove the code calling them in 7_mlRho.smk (which works).

Implement shadow rules for some rules (e.g. repeatmodeler)

Shadow rules cause each execution of a rule to run in an isolated temporary directory. This “shadow” directory contains symlinks to files and directories in the current workdir. This is useful for running programs that generate lots of unused files which you don’t want to clean up manually in your Snakemake workflow. It can also be useful if you want to keep your workdir clean while the program executes, or to simplify your workflow by not having to worry about unique filenames for all outputs of all rules.
shadow: "minimal" symlinks the inputs to the rule.
Once the rule successfully executes, the output file will be moved if necessary to the real path as indicated by output.
Shadow directories are stored one per rule execution in .snakemake/shadow/, and are cleared on successful execution. Consider running with the --cleanup-shadow argument every now and then to remove any remaining shadow directories from aborted jobs. The base shadow directory can be changed with the --shadow-prefix command line argument.
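The mechanism described above can be imitated in plain Python to see what it buys us. This is a simplified sketch, not Snakemake's implementation: the function `run_in_shadow` and the `job` callback are hypothetical, inputs are linked by base name only, and the temporary directory is removed even when the job fails (whereas Snakemake can leave shadow directories behind after aborted jobs).

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_in_shadow(inputs, output, job, shadow_root=None):
    """Run `job(shadow_dir)` in a temporary directory that contains symlinks
    to `inputs`, then move the expected output file to its real path."""
    inputs = [Path(p).resolve() for p in inputs]
    output = Path(output).resolve()
    with tempfile.TemporaryDirectory(dir=shadow_root) as tmp:
        shadow = Path(tmp)
        # Only the declared inputs are visible inside the shadow directory.
        for src in inputs:
            os.symlink(src, shadow / src.name)
        # The job may write any number of scratch files into the shadow dir.
        job(shadow)
        produced = shadow / output.name
        if not produced.exists():
            raise FileNotFoundError(f"job did not create {output.name}")
        output.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(produced), str(output))
    # Scratch files were deleted together with the temporary directory.
    return output
```

For a tool such as RepeatModeler, which writes many intermediate files, only the declared output would end up in the working directory.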

Singularity issue?

Hi,

Since the last maintenance window on uppmax, I am having issues running the pipeline and got the following error:

WorkflowError:
Minimum singularity version is 2.4.1.
File "/home/nicd/.conda/envs/generode/lib/python3.7/site-packages/snakemake/deployment/singularity.py", line 48, in __init__

I tried recreating the generode environment, as such:

conda env create -n generode -f environment.yml
conda activate generode

But I still have the same issue when launching the snakemake command.

Would you mind helping me with this please?

Thanks!

Rerun issue with snakemake 7.8

Many of us are now running GenErode with snakemake 7 to avoid issues with Singularity version changes. However, Snakemake changed its rerun behaviour in Snakemake 7.8 (see snakemake/snakemake#1694). This means that after changing metadata tables, for example, snakemake will run everything from the beginning, stating "Set of input files has changed since last execution". To get around this, you can add "--rerun-triggers mtime" to the snakemake command. The same applies to any local changes in code or other parameters.
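To make the effect of restricting the triggers to modification time more concrete, here is a simplified sketch of an mtime-only decision. It is an illustration under stated assumptions, not Snakemake's actual logic: the function name `needs_rerun_mtime` is hypothetical, and changes to code, parameters or the set of input files are deliberately ignored.

```python
import os


def needs_rerun_mtime(inputs, output):
    """Return True if `output` is missing or older than any of `inputs`.

    Only file modification times are compared; nothing else about the
    rule (code, params, input set) is taken into account.
    """
    if not os.path.exists(output):
        return True
    output_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > output_mtime for path in inputs)
```

Under such a check, editing a metadata table does not by itself invalidate existing outputs; only input files that are newer than the output do.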

snpEff prepare_db_build does not recognize file type

ImproperOutputException in line 148 of /crex/proj/sllstore2017093/b2016342/b2016342_nobackup/lts/genome_erosion_pipeline/verena_testing/maintenance/issues/GenErode/workflow/rules/12_snpEff.smk:
Outputs of incorrect type (directories when expecting files or vice versa). Output directories must be flagged with directory(). for rule prepare_db_build:
/proj/sllstore2017093/b2016342/b2016342_nobackup/lts/genome_erosion_pipeline/verena_testing/development/testdata/gerp/outgroup_Sc9M7eS_2_HRSCAF_41/all_scaffolds/snpEff/data/GCF_000283155.1_CerSimSim1.0_genomic.Sc9M7eS_2_HRSCAF_41/genes.gtf

rule multiqc_historical_raw: Stuck on 'Searching'

Whenever I try to run MultiQC rules, the rule gets stuck on [INFO ] multiqc : Searching : /path/to/stats/, and my job gets cancelled when it reaches the requested job time limit. I've tried increasing the time from 2 to 6 hours, but it spends the whole time still searching for files. For rule multiqc_historical_raw I've got around 550 files, but still, should it really take that long to search?

log file:

[WARNING]         multiqc : MultiQC Version v1.14 now available!
[INFO   ]         multiqc : This is MultiQC v1.9
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching   : /path/to/GenErode/workflow/data/raw_reads_symlinks/historical/stats

slurm standard error file:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 5
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=1000, disk_mb=1000
Select jobs to execute...

[Wed May 10 21:50:20 2023]
rule multiqc_historical_raw:
    input: data/raw_reads_symlinks/historical/stats/K1010_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1010_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1011_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1011_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1012_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1012_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K101_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K101_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K102_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K102_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K103_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K103_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K104_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K104_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K105_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K105_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K106_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K106_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K107_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K107_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K108_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K108_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K109_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K109_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K111_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K111_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K121_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K121_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K122_01_01_R1_fastqc.html, 
data/raw_reads_symlinks/historical/stats/K122_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K123_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K123_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1310_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1310_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13111_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13111_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1311_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1311_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1312_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1312_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1313_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1313_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13141_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13141_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13142_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13142_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1315_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1315_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1316_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1316_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1317_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1317_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13181_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13181_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1318_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K1318_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13191_01_01_R1_fastqc.html, 
data/raw_reads_symlinks/historical/stats/K13191_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13192_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13192_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13193_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13193_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13194_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13194_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13195_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13195_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K131_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K131_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13201_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13201_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13202_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13202_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13203_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13203_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13204_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13204_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13205_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13205_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13206_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13206_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13207_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13207_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13208_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13208_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13209_01_01_R1_fastqc.html, 
data/raw_reads_symlinks/historical/stats/K13209_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13211_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13211_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13212_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13212_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13213_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13213_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13214_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13214_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13215_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13215_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13221_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13221_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13222_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13222_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13223_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13223_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13224_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13224_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13225_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13225_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13231_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13231_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13232_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13232_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13233_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13233_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13234_01_01_R1_fastqc.html, 
data/raw_reads_symlinks/historical/stats/K13234_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13241_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13241_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13242_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13242_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13243_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13243_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13244_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13244_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13245_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13245_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13251_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13251_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13252_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13252_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13253_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13253_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13254_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13254_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13255_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13255_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13256_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13256_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13261_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13261_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13262_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13262_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13271_01_01_R1_fastqc.html, 
data/raw_reads_symlinks/historical/stats/K13271_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13272_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13272_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13273_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13273_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13281_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13281_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13282_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13282_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13291_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13291_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13292_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13292_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K132_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K132_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13301_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13301_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13302_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13302_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13311_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13311_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13312_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13312_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13321_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13321_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13322_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13322_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13323_01_01_R1_fastqc.html, 
data/raw_reads_symlinks/historical/stats/K13323_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13331_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13331_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13332_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13332_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13333_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13333_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13334_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13334_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13335_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13335_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13341_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13341_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13342_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13342_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13343_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13343_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13344_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13344_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13345_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13345_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13346_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13346_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13347_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13347_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13351_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13351_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13352_01_01_R1_fastqc.html, 
data/raw_reads_symlinks/historical/stats/K13352_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13361_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13361_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13362_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13362_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13363_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13363_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13364_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13364_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13365_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13365_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13366_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13366_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13371_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13371_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K13372_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K13372_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K133_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K133_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K134_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K134_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K135_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K135_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K136_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K136_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K137_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K137_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K138_01_01_R1_fastqc.html, 
data/raw_reads_symlinks/historical/stats/K138_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K139_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K139_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K41_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K41_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K42_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K42_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K43_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K43_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K44_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K44_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K51_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K51_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K52_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K52_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K610_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K610_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K611_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K611_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K61_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K61_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K62_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K62_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K631_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K631_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K63_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K63_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K64_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K64_01_01_R2_fastqc.html, 
data/raw_reads_symlinks/historical/stats/K65_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K65_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K66_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K66_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K67_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K67_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K68_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K68_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K69_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K69_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K6eleven_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K6eleven_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K71_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K71_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K72_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K72_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K73_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K73_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K81_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K81_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K911_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K911_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K91_01_01_R1_fastqc.html, data/raw_reads_symlinks/historical/stats/K91_01_01_R2_fastqc.html, data/raw_reads_symlinks/historical/stats/K1010_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1010_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1011_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1011_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1012_01_01_R1_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K1012_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K101_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K101_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K102_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K102_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K103_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K103_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K104_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K104_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K105_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K105_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K106_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K106_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K107_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K107_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K108_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K108_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K109_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K109_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K111_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K111_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K121_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K121_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K122_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K122_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K123_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K123_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1310_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1310_01_01_R2_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K13111_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13111_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1311_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1311_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1312_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1312_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1313_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1313_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13141_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13141_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13142_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13142_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1315_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1315_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1316_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1316_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1317_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1317_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13181_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13181_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1318_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K1318_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13191_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13191_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13192_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13192_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13193_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13193_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13194_01_01_R1_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K13194_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13195_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13195_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K131_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K131_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13201_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13201_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13202_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13202_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13203_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13203_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13204_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13204_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13205_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13205_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13206_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13206_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13207_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13207_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13208_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13208_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13209_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13209_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13211_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13211_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13212_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13212_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13213_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13213_01_01_R2_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K13214_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13214_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13215_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13215_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13221_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13221_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13222_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13222_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13223_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13223_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13224_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13224_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13225_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13225_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13231_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13231_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13232_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13232_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13233_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13233_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13234_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13234_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13241_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13241_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13242_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13242_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13243_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13243_01_01_R2_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K13244_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13244_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13245_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13245_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13251_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13251_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13252_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13252_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13253_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13253_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13254_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13254_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13255_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13255_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13256_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13256_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13261_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13261_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13262_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13262_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13271_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13271_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13272_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13272_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13273_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13273_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13281_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13281_01_01_R2_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K13282_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13282_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13291_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13291_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13292_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13292_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K132_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K132_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13301_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13301_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13302_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13302_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13311_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13311_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13312_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13312_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13321_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13321_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13322_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13322_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13323_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13323_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13331_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13331_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13332_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13332_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13333_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13333_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13334_01_01_R1_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K13334_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13335_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13335_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13341_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13341_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13342_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13342_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13343_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13343_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13344_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13344_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13345_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13345_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13346_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13346_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13347_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13347_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13351_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13351_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13352_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13352_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13361_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13361_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13362_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13362_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13363_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13363_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13364_01_01_R1_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K13364_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13365_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13365_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13366_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13366_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13371_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13371_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13372_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K13372_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K133_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K133_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K134_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K134_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K135_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K135_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K136_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K136_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K137_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K137_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K138_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K138_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K139_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K139_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K41_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K41_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K42_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K42_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K43_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K43_01_01_R2_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K44_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K44_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K51_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K51_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K52_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K52_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K610_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K610_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K611_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K611_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K61_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K61_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K62_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K62_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K631_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K631_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K63_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K63_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K64_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K64_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K65_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K65_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K66_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K66_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K67_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K67_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K68_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K68_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K69_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K69_01_01_R2_fastqc.zip, 
data/raw_reads_symlinks/historical/stats/K6eleven_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K6eleven_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K71_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K71_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K72_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K72_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K73_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K73_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K81_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K81_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K911_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K911_01_01_R2_fastqc.zip, data/raw_reads_symlinks/historical/stats/K91_01_01_R1_fastqc.zip, data/raw_reads_symlinks/historical/stats/K91_01_01_R2_fastqc.zip
    output: data/raw_reads_symlinks/historical/stats/multiqc/multiqc_report.html
    log: data/logs/1.1_fastq_processing/historical/multiqc_historical_raw.log
    jobid: 0
    reason: Missing output files: data/raw_reads_symlinks/historical/stats/multiqc/multiqc_report.html
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/scratch/37865468

Activating singularity image /path/to/GenErode/workflow/.snakemake/singularity/55ba4dd2d036ee76e44e9a36b417da1d.simg
slurmstepd: error: *** JOB 37865468 ON r340 CANCELLED AT 2023-05-10T23:50:20 DUE TO TIME LIMIT ***

Pipeline report: image embedding

The pipeline logo is not shown when the report is moved from its original directory, because the path to the image is hard-coded.

individual vcfs merging failing

Hi,

I am trying to run step 9 (9_merge_vcfs.smk) with the latest version of the pipeline, but the first job of this rule (i.e. bcftools merge) always fails without any indication of the possible error in the slurm.out or log files. The first merged.bcf is then empty, containing only the header. It could be a memory issue (68 mammalian genomes), but I am not sure. If I run the rule manually outside the pipeline, the same first step fails as well.

Previously, I would run this rule manually with a slightly different script, but since I want to estimate load with GERP scores, I want to merge the vcfs inside the pipeline.

My guess is that the '-m snps' option is causing this problem in the bcftools merge step:

bcftools merge -m snps -O b -o {output.merged} {input.bcf} 2> {log}

I didn't use it in my manual merging script since the selection of snps only is used in the following step:

bcftools view -m2 -M2 -v snps -Ob -o {output.bcf} {input.bcf} 2> {log} &&

But maybe I am wrong.

Thanks for your help!

pipeline stuck at 'Building DAG of jobs'

Hi,

After modifying a rule (common.smk) to run GERP scores and to split the genomes into more chunks so that the step runs more quickly, the pipeline (dry or main run) gets stuck at

'Config file config/config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...'

The only job that is run is the job splitting the genomes into chunk*.bed files, and this is done as modified in my rule (i.e. the genome is divided into 2000 instead of 200 chunks). The first time I did this, I had to cancel the run, delete the 'my_Ref_path/gerp' directory and restart it, as I noticed that those bed files were still in the original setup (i.e. 200 chunks). Other than that, the pipeline was running as it was supposed to. But now the run gets stuck as above.

So I then did the following to try and fix it:

  • deleted all logs relevant to the GERP rule
  • deleted again the my_Ref_path/gerp
  • cleaned my conda environment
  • started a new tmux session
  • checked if my other instances of the pipeline worked (they do).

The only file that I edited was common.smk, so I am not sure what else could have gone wrong. Is there any other file/logs I should delete or something else I could check to try and fix this?

Thanks a lot!
Nic

Missing input files error for group job in mitogenome mapping step

On a different HPC than Uppmax, input file names are scrambled for some rules in the mitogenome mapping step, throwing an error in the group job historical_mito_bams_group:

Missing input files for rule mitogenome_bam_stats: results/historical/mitogenomes_mapping/CC022_02_L4_merged_cow_NC_reads_006853.sorted.bam

Try out constraining the wildcard mitoref?

filter_mpile.py does not keep first and last nucleotide in a read

The filter_mpile.py script always removes (prints “N” for) the first nucleotide of a “read”, because in the pileup it is preceded by a marker such as “^]” (the read-start symbol “^” followed by a character encoding the mapping quality). The script also always removes (prints “N” for) the last nucleotide of a “read”, because it is followed by “$” (the read-end symbol).
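
One possible direction for a fix, as a minimal sketch only: strip the read-start and read-end markers from the pileup base column before the bases are evaluated, so the nucleotide next to them is kept. The function name `strip_read_markers` is hypothetical and not taken from filter_mpile.py, and the sketch assumes the standard samtools mpileup conventions ("^" plus one mapping-quality character at a read start, "$" at a read end). Indel annotations are not handled here.

```python
import re

# Read markers in the samtools mpileup base column:
#   ^X  start of a read; X is a single character encoding mapping quality
#   $   end of a read
# "\^." is listed first so that a "$" used as the quality character
# (as in "^$A") is consumed together with the "^".
_READ_MARKERS = re.compile(r"\^.|\$")


def strip_read_markers(bases: str) -> str:
    """Remove read start/end markers but keep the base calls themselves."""
    return _READ_MARKERS.sub("", bases)


if __name__ == "__main__":
    print(strip_read_markers("^]A"))  # A
    print(strip_read_markers("T$"))   # T
```

With the markers removed, the first and last base of a read look like any other base call and would no longer need to be masked with “N”.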

Add slurm profile to repository that works with Snakemake version in the conda environment

Many clusters are using slurm. The slurm profile is continuously being updated and will at some point not work anymore with the Snakemake version the pipeline is using.
Add a folder with a slurm profile to the repository that works on Rackham, and point the users to that instead of using cookiecutter to set up the profile themselves. Users of other clusters can then manually edit the profile/config.yaml file.

Wrong input file in rule index_realigned_bams in workflow 3.1_bam_rmdup_realign_indels.smk?

Hi,

On line 545 in '3.1_bam_rmdup_realign_indels.smk' you find the rule 'index_realigned_bams', which is designed to index the output bam file from the previous rule 'indel_realigner', as I understood it.

The output bam file from rule 'indel_realigner' is: {sample}.merged.rmdup.merged.realn.bam

The input bam file for rule 'index_realigned_bams' is: {sample}.merged.rmdup.merged.bam
The output bai file from rule 'index_realigned_bams' is: {sample}.merged.rmdup.merged.realn.bam.bai

Am I missing something, or should the input bam file for rule 'index_realigned_bams' not be: {sample}.merged.rmdup.merged.realn.bam?

Sincerely,
Johanna

compressing vcf file (rule remove_CpG_vcf - 8.1_vcf_CpG_filtering.smk)

Hi,

In the '8.1_vcf_CpG_filtering.smk' file, the rule 'remove_CpG_vcf' uses bedtools intersect to generate a vcf file without CpG sites, such as:

bedtools intersect -a {input.vcf} -b {input.bed} -header -sorted -g {input.genomefile} > {output.filtered} 2> {log}

However, the {output.filtered} file is not compressed and thus very large, which means that a project directory can very quickly be full.

Would it be possible to compress this vcf file to save space with something like below to generate a *vcf.gz file?

bedtools intersect -a {input.vcf} -b {input.bed} -header -sorted -g {input.genomefile} | bgzip -c > {output.filtered} 2> {log}

Much appreciated,
Nic

RepeatModeler fails on big genomes

I am trying to run repeatmodeler on a 6.2Gb genome (a concatenated genome), but get an error message. I think it might be related to this issue: Dfam-consortium/RepeatModeler#101. I am running GenErode version 0.4.1, but I don't think this issue has been addressed in any of the updates before.

Output of rule repeatmodeler below:

Building database GCF_024166365.1_mEleMax1.human_g1k_v37.DQ188829.2:
Reading ../../GCF_024166365.1_mEleMax1.human_g1k_v37.DQ188829.2.upper.fasta...
The makeblastdb program did not generate the
file GCF_024166365.1_mEleMax1.human_g1k_v37.DQ188829.2.nsq. Please check your input file(s) for potential formating errors.
/usr/local/bin/makeblastdb returned:

Building a new DB, current time: 10/11/2023 13:26:03
New DB name: ...
New DB title: ./OSWae0eqDW
Sequence type: Nucleotide
Deleted existing Nucleotide BLAST database named ....
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 149 sequences in 60.8019 seconds.

Fastqc Memory issues

Apparently the new version of FastQC handles memory better. It might be good to update to that version?

FastQC v0.12.1 is now out which should fix most of these memory related issues by increasing the default memory allocation. If there are still files which generate memory errors then there's now an option (--memory) where you can specifically increase the memory allocation without having to mess around with the number of threads to use.

Add slurm profile config.yaml

As the use of a cluster configuration file cluster.yaml is discouraged, a slurm profile config.yaml with rule-specific compute resources is added

Update cluster.yaml

Users reported:

"At least two rules on the culster.yaml file fastqc_historical_merged (named the same in the rule's file "1.1_fastq_processing.smk") and convert_historical_sam (named as sai2bam in the rule's file "2_mapping.smk" ) are not getting assigned the resources (time and cpus-per-task) written for the specific rule and instead are just getting assigned the minimum default parameters defined at the beginning of the cluster.yaml. As a consequence of this bug, jobs keep getting killed because of OUT OF TIME.
In the case of fastqc_historical_merged, even though both in the cluster.yaml and in the rule's file ask for 2 cores, it only gets assigned 1. The only solution we found was to change the default from 1 to 2.
The case of sai2bam is a bit trickier, since the rule asks for 8 cores and 10 days and is only getting assigned 2 cores and 2 hours. The temporary solution I found here is to extend the time set in the default parameters from 2 to 4 or 6 hours (as I don't think it is a good idea to keep increasing the default minimum number of cores).
Seems like Snakemake is having trouble passing the right instructions onto Slurm in these two specific cases. "
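
A mismatch like the one reported for convert_historical_sam and sai2bam could be caught with a small check that compares the keys of the cluster file against the rule names. The sketch below is only an illustration: the function name `unmatched_cluster_entries` is hypothetical, and both lists are assumed to have been collected already (for example by loading cluster.yaml and listing the rules of the workflow). The `__default__` key is skipped because it holds the fallback resources rather than a rule.

```python
def unmatched_cluster_entries(cluster_keys, rule_names):
    """Return cluster-config keys that do not match any rule name.

    Entries returned here never apply to a job, so the affected rules
    silently fall back to the default resources.
    """
    return sorted(set(cluster_keys) - set(rule_names) - {"__default__"})


if __name__ == "__main__":
    cluster_keys = ["__default__", "fastqc_historical_merged", "convert_historical_sam"]
    rule_names = ["fastqc_historical_merged", "sai2bam"]
    print(unmatched_cluster_entries(cluster_keys, rule_names))
    # ['convert_historical_sam']
```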

"Changing the rule name in the cluster file from convert_historical_sam to sai2bam should do the trick. If things have different names in the cluster and rule file, snakemake can’t link them to each other. "

Rewrite memory allocation for java-based tools (e.g. qualimap)

resources: mem_mb=lambda wildcards, input, threads, attempt: 6000 * threads - 2000

unset DISPLAY
qualimap bamqc -bam {input.bam} --java-mem-size={resources.mem_mb}M -nt {threads} -outdir {output}

or:
def qualimap_mem(wildcards, input, threads, attempt):
    return 6000 * threads - 2000

resources: mem_mb=qualimap_mem

Update gerp_derived_alleles.py to work with phased genotypes

From a user contacting us:

I was getting an error and noticed that it crashes when there are genotypes in the vcf file that are phased (0|0 instead of 0/0). So I just modified line 138 to :

focal_genotype = re.split(r"/|\|", row[samplename])

and imported re at the beginning of the script:

import re
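
As a small, self-contained illustration of splitting both genotype styles (not the actual code of gerp_derived_alleles.py): the separator is either "/" for unphased or "|" for phased calls, so a character class matches both without having to escape the pipe. The helper name `split_genotype` is made up for this sketch.

```python
import re

# "/" separates unphased alleles, "|" separates phased alleles.
# Inside a character class the pipe is a literal and needs no escaping.
GT_SEPARATOR = re.compile(r"[/|]")


def split_genotype(genotype: str) -> list:
    """Split a VCF GT value such as '0/1' or '0|1' into its alleles."""
    return GT_SEPARATOR.split(genotype)


if __name__ == "__main__":
    print(split_genotype("0/1"))  # ['0', '1']
    print(split_genotype("0|0"))  # ['0', '0']
```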

Problems running pipeline in Uppmax using the rhinos test dataset and slurm profile

Dear GenErode team,

First of all, thank you for developing this great tool.

I have been trying to run the pipeline in Uppmax on the test dataset of rhinos provided in the Wiki and the slurm profile, but it fails. I have examined various things, and even deleted and reinstalled both the pipeline and the conda environment several times, without luck. Thus I wonder if any of you could help me identify the source of the problem to fix it?

Attached, please find a PDF (markdown) summarising the steps followed, as well as the input and output/log files of the pipeline.

Thanks in advance for any help!

Best regards,

Angela

Troubleshooting_problem_running_test_dataset_GenErode_in_Uppmax.pdf
help_files.zip

GERP scores - size of chunks for derived alleles calculations

Hi,

I am estimating gerp scores for ~70 mammalian genomes that were mapped to a chromosome-level assembly (~2.5 Gb; 35 chromosomes and ~5000 unplaced scaffolds).

As I understand it, the pipeline divides the genomes and vcfs into chunks. In my case, each chunk comprises 27 contigs or so. However, since I have a chromosome-level assembly, chunk1 and chunk2 contain all of my chromosomes (35) and the other chunks cover the remaining, much smaller 5000 unplaced scaffolds. This means that the jobs to estimate derived alleles for chunk 1 and 2 take ~10 days, whereas the jobs for the other scaffolds are done within minutes.

Would there be a way to split the genome per scaffold/contig or to specify that each chunk should cover X number of scaffolds/contigs?

Cheers,
Nic
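
One way the chunks could be balanced by sequence length rather than by number of contigs is sketched below. This is not how the pipeline currently splits the genome; it is a hypothetical greedy approach (longest contig first, always added to the chunk with the fewest base pairs so far), and the function name `balance_chunks` and the example lengths are made up for illustration.

```python
import heapq


def balance_chunks(contig_lengths, n_chunks):
    """Assign contigs to n_chunks chunks with roughly equal total length.

    contig_lengths maps contig name -> length in bp (e.g. read from a
    .fai index). Contigs are placed longest first into the chunk that
    currently holds the fewest base pairs.
    """
    heap = [(0, i) for i in range(n_chunks)]  # (total bp so far, chunk index)
    heapq.heapify(heap)
    chunks = [[] for _ in range(n_chunks)]
    by_length = sorted(contig_lengths.items(), key=lambda kv: kv[1], reverse=True)
    for name, length in by_length:
        total, idx = heapq.heappop(heap)
        chunks[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return chunks


if __name__ == "__main__":
    lengths = {"chr1": 100, "chr2": 90, "scaf1": 5, "scaf2": 5, "scaf3": 5}
    for chunk in balance_chunks(lengths, 2):
        print(chunk, sum(lengths[c] for c in chunk))
```

With such a scheme the chromosomes end up in separate chunks and the small scaffolds are spread across them, so no single job has to process all chromosomes.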

Update the WIKI

  • Snakemake version 7
  • remove fix for slurm profile
  • update instructions to set up the slurm profile and point to the new profile config.yaml file

Add code to edit GenErode pipeline report

The rulegraph in the report can get very large if the pipeline is run with several downstream steps set to "True". Edit the html file similarly to the report created by Snakemake prior to version 7.

snpEff build fails with "Out of memory"

For a user, the rule build_snpEff_db failed due to out of memory.

She'll test the following:

java -jar -Xmx64g /usr/local/share/snpeff-4.3.1t-3/snpEff.jar build -gtf22 -c {params.abs_config} -dataDir {params.abs_data_dir} -treatAllAsProteinCoding -v {params.ref_name} 2> {log}

If it works, update all snpEff rules with the java -jar -Xmx flags and automatic calculation of memory from the number of threads (like QualiMap and Picard)
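
A minimal sketch of what the automatic calculation could look like, assuming the same formula as proposed for QualiMap (6000 MB per thread minus 2000 MB of headroom). Whether these numbers suit snpEff is an assumption, and the helper names `java_mem_mb` and `xmx_flag` are hypothetical.

```python
def java_mem_mb(threads, mb_per_thread=6000, overhead_mb=2000):
    """Memory in MB to hand to the JVM, scaled by the number of threads."""
    return mb_per_thread * threads - overhead_mb


def xmx_flag(threads):
    """Build the java maximum heap size flag from the thread count."""
    return f"-Xmx{java_mem_mb(threads)}m"


if __name__ == "__main__":
    print(xmx_flag(8))  # -Xmx46000m
```

In a rule, the resulting value could then be passed to the java call in place of the hard-coded -Xmx64g.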
