snakemake-workflows / rna-seq-kallisto-sleuth

License: MIT License

Languages: Python 62.19%, R 37.81%
Topics: snakemake, kallisto, sleuth, rna-seq, differential-expression, sciworkflows, reproducibility

rna-seq-kallisto-sleuth's Introduction

Snakemake workflow: rna-seq-kallisto-sleuth


A Snakemake workflow for differential expression analysis of RNA-seq data with Kallisto and Sleuth.

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and its DOI (see above).

rna-seq-kallisto-sleuth's People

Contributors

antoniev, austin-s-h, dlaehnemann, github-actions[bot], henningtimm, jafors, johanneskoester, manuelphilip, tedil


rna-seq-kallisto-sleuth's Issues

Do you have plans to add annotation of the transcriptome?

Do you have any plans to add an annotation pipeline for the transcriptome (e.g. through Trinotate)? I am struggling with that portion of the analysis, and I really like the Snakemake workflows you make; they are very helpful for developing my own pipelines (I am working on a Trinity -> Kallisto -> Trinotate -> DESeq2 pipeline with Snakemake).

Whitespaces in config/units.tsv

Hello,

I am currently trying to run step 4 of the workflow:
snakemake --cores all --use-conda

Apparently, whitespace is detected in the fq1 and fq2 columns of config/units.tsv:

[screenshot: units.tsv whitespace validation error, 2023-10-13]

But there is no whitespace in the file, and the paths are functional. I used LibreOffice to edit the table, but in a text editor everything looks fine (tab-separated, no additional whitespace). Am I missing something obvious? I am still quite new to the field.

Thanks a lot!
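The usual culprit with spreadsheet-edited TSVs is invisible characters, such as the non-breaking spaces that LibreOffice exports can introduce; they look like ordinary spaces in a text editor. A minimal diagnostic sketch (hypothetical, not part of the workflow; assumes the default config/units.tsv location) that makes such characters visible in the fq1/fq2 columns:

    import pandas as pd

    # Load units.tsv without NA conversion so every cell stays a string.
    units = pd.read_csv("config/units.tsv", sep="\t", dtype=str, keep_default_na=False)

    for col in ("fq1", "fq2"):
        for idx, value in units[col].items():
            # repr() exposes trailing blanks, tabs, and non-breaking
            # spaces (U+00A0) that spreadsheet exports sometimes add.
            if value != value.strip() or " " in value or "\u00a0" in value:
                print(f"row {idx}, column {col}: {value!r}")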

libcrypto.so.1.1 not available?

I am seeing an error while running the kallisto/sleuth pipeline.

Error in group sleuth-init:
jobs:
rule sleuth_init:
jobid: 2
output: results/sleuth/model_X.rds, results/sleuth/model_X.designmatrix.rds
log: logs/sleuth/model_X.init.log (check log file(s) for error details)
rule compose_sample_sheet:
jobid: 9
output: results/sleuth/model_X.samples.tsv
log: logs/model_X.compose-sample-sheet.log (check log file(s) for error details)

Error executing group job sleuth-init on cluster (jobid: 9159f0df-40ec-5076-ac29-7ac8600dd138, external: Your job 1509748 ("snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh") has been submitted, jobscript: /home/RNA/.snakemake/tmp.b2r32iv8/snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh). For error details see the cluster log and the log files of the involved rule(s).
Removing output files of failed job compose_sample_sheet since they might be corrupted:
results/sleuth/model_X.samples.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-12-18T142901.516512.snakemake.log

When I looked at the log files:

cat logs/sleuth/model_X.init.log
Error: package or namespace load failed for ‘sleuth’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/RNA/.snakemake/conda/e9ae8cf1b054539d07bb34138d5bde75_/lib/R/library/rhdf5/libs/rhdf5.so':
libcrypto.so.1.1: cannot open shared object file: No such file or directory
Execution halted

Do I need to install some other library?
Thank you.


The following is the output of .snakemake/log/2023-12-18T142901.516512.snakemake.log:

Workflow defines that rule get_transcriptome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_transcript_info is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule convert_pfam is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule calculate_cpat_hexamers is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule calculate_cpat_logit_model is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_spia_db is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /usr/bin/bash
Provided cluster nodes: 1
Singularity containers: ignored
Job stats:
job                                count
-------------------------------  -------
all                                    1
compose_sample_sheet                   2
cutadapt_pe                            2
diffexp_datavzrd                       1
get_transcript_info                    1
get_transcriptome                      1
ihw_fdr_control                        3
kallisto_index                         1
kallisto_quant                         2
logcount_matrix                        1
plot_bootstrap                         1
plot_diffexp_heatmap                   1
plot_diffexp_pval_hist                 3
plot_fragment_length_dist              2
plot_group_density                     1
plot_pca                               1
render_datavzrd_config_diffexp         1
sleuth_diffexp                         1
sleuth_init                            2
vega_volcano_plot                      1
total                                 29

Select jobs to execute...

[Mon Dec 18 14:29:11 2023]
rule cutadapt_pe:
input: /RNAs/RNAseq_test_PPM1D_231201_N12/EOL-1_GSK_1.fq.gz, /RNAs/RNAseq_test_PPM1D_231201_N12/EOL-1_GSK_2.fq.gz
output: results/trimmed/EOL-2-1.1.fastq.gz, results/trimmed/EOL-2-1.2.fastq.gz, results/trimmed/EOL-2-1.qc.txt
log: results/logs/cutadapt/EOL-2-1.log
jobid: 8
reason: Missing output files: results/trimmed/EOL-2-1.2.fastq.gz, results/trimmed/EOL-2-1.1.fastq.gz
wildcards: sample=EOL-2, unit=1
threads: 8
resources: mem_mb=16486, mem_mib=15723, disk_mb=16486, disk_mib=15723, tmpdir=

Submitted job 8 with external jobid 'Your job 1509741 ("snakejob.cutadapt_pe.8.sh") has been submitted'.
[Mon Dec 18 14:33:01 2023]
Finished job 8.
1 of 29 steps (3%) done
Select jobs to execute...

[Mon Dec 18 14:33:01 2023]
rule cutadapt_pe:
input: /RNAs/RNAseq_test_PPM1D_231201_N12/EOL-1_DMSO_1.fq.gz, /RNAs/RNAseq_test_PPM1D_231201_N12/EOL-1_DMSO_2.fq.gz
output: results/trimmed/EOL-1-1.1.fastq.gz, results/trimmed/EOL-1-1.2.fastq.gz, results/trimmed/EOL-1-1.qc.txt
log: results/logs/cutadapt/EOL-1-1.log
jobid: 4
reason: Missing output files: results/trimmed/EOL-1-1.2.fastq.gz, results/trimmed/EOL-1-1.1.fastq.gz
wildcards: sample=EOL-1, unit=1
threads: 8
resources: mem_mb=15400, mem_mib=14687, disk_mb=15400, disk_mib=14687, tmpdir=

Submitted job 4 with external jobid 'Your job 1509742 ("snakejob.cutadapt_pe.4.sh") has been submitted'.
[Mon Dec 18 14:39:51 2023]
Finished job 4.
2 of 29 steps (7%) done
Select jobs to execute...

[Mon Dec 18 14:39:51 2023]
rule get_transcript_info:
output: resources/transcript-info.rds
log: logs/get_transcript_info.log
jobid: 10
reason: Missing output files: resources/transcript-info.rds
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=

Submitted job 10 with external jobid 'Your job 1509743 ("snakejob.get_transcript_info.10.sh") has been submitted'.
[Mon Dec 18 14:44:41 2023]
Finished job 10.
3 of 29 steps (10%) done
Select jobs to execute...

[Mon Dec 18 14:44:41 2023]
rule get_transcriptome:
output: resources/transcriptome.cdna.fasta
log: logs/get-transcriptome/cdna.log
jobid: 6
reason: Missing output files: resources/transcriptome.cdna.fasta
wildcards: type=cdna
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=

Submitted job 6 with external jobid 'Your job 1509744 ("snakejob.get_transcriptome.6.sh") has been submitted'.
[Mon Dec 18 14:47:42 2023]
Finished job 6.
4 of 29 steps (14%) done
Select jobs to execute...

[Mon Dec 18 14:47:42 2023]
rule kallisto_index:
input: resources/transcriptome.cdna.fasta
output: results/kallisto_cdna/transcripts.cdna.idx
log: results/logs/kallisto_cdna/index.cdna.log
jobid: 5
reason: Missing output files: results/kallisto_cdna/transcripts.cdna.idx; Input files updated by another job: resources/transcriptome.cdna.fasta
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=

Submitted job 5 with external jobid 'Your job 1509745 ("snakejob.kallisto_index.5.sh") has been submitted'.
[Mon Dec 18 14:54:22 2023]
Finished job 5.
5 of 29 steps (17%) done
Select jobs to execute...

[Mon Dec 18 14:54:22 2023]
rule kallisto_quant:
input: results/trimmed/EOL-2-1.1.fastq.gz, results/trimmed/EOL-2-1.2.fastq.gz, results/kallisto_cdna/transcripts.cdna.idx
output: results/kallisto_cdna/EOL-2-1
log: results/logs/kallisto_cdna/quant/EOL-2-1.log
jobid: 7
reason: Missing output files: results/kallisto_cdna/EOL-2-1; Input files updated by another job: results/kallisto_cdna/transcripts.cdna.idx, results/trimmed/EOL-2-1.2.fastq.gz, results/trimmed/EOL-2-1.1.fastq.gz
wildcards: sample=EOL-2, unit=1
threads: 5
resources: mem_mb=20797, mem_mib=19834, disk_mb=20797, disk_mib=19834, tmpdir=

Submitted job 7 with external jobid 'Your job 1509746 ("snakejob.kallisto_quant.7.sh") has been submitted'.
[Mon Dec 18 15:20:13 2023]
Finished job 7.
6 of 29 steps (21%) done
Select jobs to execute...

[Mon Dec 18 15:20:13 2023]
rule kallisto_quant:
input: results/trimmed/EOL-1-1.1.fastq.gz, results/trimmed/EOL-1-1.2.fastq.gz, results/kallisto_cdna/transcripts.cdna.idx
output: results/kallisto_cdna/EOL-1-1
log: results/logs/kallisto_cdna/quant/EOL-1-1.log
jobid: 3
reason: Missing output files: results/kallisto_cdna/EOL-1-1; Input files updated by another job: results/kallisto_cdna/transcripts.cdna.idx, results/trimmed/EOL-1-1.2.fastq.gz, results/trimmed/EOL-1-1.1.fastq.gz
wildcards: sample=EOL-1, unit=1
threads: 5
resources: mem_mb=19758, mem_mib=18843, disk_mb=19758, disk_mib=18843, tmpdir=

Submitted job 3 with external jobid 'Your job 1509747 ("snakejob.kallisto_quant.3.sh") has been submitted'.
[Mon Dec 18 15:37:44 2023]
Finished job 3.
7 of 29 steps (24%) done
Select jobs to execute...
[Mon Dec 18 15:37:45 2023]

group job sleuth-init (jobs in lexicogr. order):

[Mon Dec 18 15:37:45 2023]
rule compose_sample_sheet:
input: config/samples.tsv, config/units.tsv, results/kallisto_cdna/EOL-1-1, results/kallisto_cdna/EOL-2-1
output: results/sleuth/model_X.samples.tsv
log: logs/model_X.compose-sample-sheet.log
jobid: 9
reason: Missing output files: results/sleuth/model_X.samples.tsv; Input files updated by another job: results/kallisto_cdna/EOL-1-1, results/kallisto_cdna/EOL-2-1
wildcards: model=model_X
resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=

[Mon Dec 18 15:37:45 2023]
rule sleuth_init:
input: results/kallisto_cdna/EOL-1-1, results/kallisto_cdna/EOL-2-1, results/sleuth/model_X.samples.tsv, resources/transcript-info.rds
output: results/sleuth/model_X.rds, results/sleuth/model_X.designmatrix.rds
log: logs/sleuth/model_X.init.log
jobid: 2
reason: Missing output files: results/sleuth/model_X.rds; Input files updated by another job: resources/transcript-info.rds, results/kallisto_cdna/EOL-1-1, results/sleuth/model_X.samples.tsv, results/kallisto_cdna/EOL-2-1
wildcards: model=model_X
threads: 6
resources: mem_mb=, disk_mb=, tmpdir=

Submitted group job 9159f0df-40ec-5076-ac29-7ac8600dd138 with external jobid 'Your job 1509748 ("snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh") has been submitted'.
[Mon Dec 18 15:38:04 2023]
Error in group sleuth-init:
jobs:
rule sleuth_init:
jobid: 2
output: results/sleuth/model_X.rds, results/sleuth/model_X.designmatrix.rds
log: logs/sleuth/model_X.init.log (check log file(s) for error details)
rule compose_sample_sheet:
jobid: 9
output: results/sleuth/model_X.samples.tsv
log: logs/model_X.compose-sample-sheet.log (check log file(s) for error details)

Error executing group job sleuth-init on cluster (jobid: 9159f0df-40ec-5076-ac29-7ac8600dd138, external: Your job 1509748 ("snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh") has been submitted, jobscript: /home/RNA/.snakemake/tmp.b2r32iv8/snakejob.sleuth-init.9159f0df-40ec-5076-ac29-7ac8600dd138.sh). For error details see the cluster log and the log files of the involved rule(s).
Removing output files of failed job compose_sample_sheet since they might be corrupted:
results/sleuth/model_X.samples.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-12-18T142901.516512.snakemake.log
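The traceback points at shared-library resolution rather than at a missing R package: the conda-packaged rhdf5 was linked against libcrypto.so.1.1, which the environment no longer provides, so installing further libraries will not help. A minimal diagnostic sketch (hypothetical, not part of the workflow):

    import ctypes

    # Ask the dynamic linker for the exact soname named in the error message.
    try:
        ctypes.CDLL("libcrypto.so.1.1")
        print("libcrypto.so.1.1 resolves")
    except OSError as err:
        print(f"cannot load libcrypto.so.1.1: {err}")
        # A common remedy for such mixed-channel library mismatches is to
        # enable strict channel priority, as the Snakemake log above already
        # suggests (conda config --set channel_priority strict), and then
        # recreate the affected conda environment.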

Missing input files for rule kallisto_index:

Sorry if this is a dumb question; I am very new to Snakemake. I got the following error and don't know how to fix it.

MissingInputException in line 1 of /home/user/GitClones/rna-seq-kallisto-sleuth/workflow/rules/quant.smk:
Missing input files for rule kallisto_index:
resources/ref/Homo-sapiens.GRCh38.cdna.chr21.fa

This is probably not what I am supposed to do, but I downloaded kallisto indices and put them in the resources/ref/ directory, hoping it would solve the problem. It did not. In any case, my FASTA is not divided by chromosome.

Any help would be appreciated.

Problem with creating a new copy from the template

Hi, I tried to create a new copy of this repository using the "Use this template" button and got this error: "We're sorry something went wrong. We were unable to clone snakemake-workflows/rna-seq-kallisto-sleuth's contents into dence/test_rnaseq_snakemake. The template you used includes files that are larger than 10 megabytes. Please ask snakemake-workflows to remove those files from the template and try again."

I was able to clone the repository and, checking the contents with du, saw that there is a 17 MB file "./rna-seq-kallisto-sleuth/.test/report.html", which is probably the cause of that error.

Thanks,
Daniel

cutadapt not available?

Hi,
Thank you for building a nice tool and pipeline. I am a first-time user of Snakemake; I followed the documentation to install it.

My samples.tsv looks like this:

sample condition batch_effect
EOL-1 DMSO batch1
EOL-1 GSK batch1

My units.tsv looks like this:

sample unit fragment_len_mean fragment_len_sd fq1 fq2
EOL-1 1 NA NA /WTS/RNAseq_test_1D_231201_N12/EOL-1_DMSO_1.fq.gz /NAS2/WTS/RNAseq_test_1D_231201_N12/EOL-1_DMSO_2.fq.gz
EOL-1 1 NA NA /WTS/RNAseq_test_1D_231201_N12/EOL-1_GSK_1.fq.gz /WTS/RNAseq_test_1D_231201_N12/EOL-1_GSK_2.fq.gz

When I run snakemake --cluster qsub --jobs 1, the following is the output:

Workflow defines that rule get_transcriptome is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_transcript_info is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule convert_pfam is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule calculate_cpat_hexamers is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule calculate_cpat_logit_model is eligible for caching between workflows (use the --cache argument to enable this).
Workflow defines that rule get_spia_db is eligible for caching between workflows (use the --cache argument to enable this).
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 1
Conda environments: ignored
Singularity containers: ignored

Job stats:
job                                count
-------------------------------  -------
all                                    1
compose_sample_sheet                   2
cutadapt_pe                            2
diffexp_datavzrd                       1
get_transcript_info                    1
get_transcriptome                      1
ihw_fdr_control                        3
kallisto_index                         1
kallisto_quant                         2
logcount_matrix                        1
plot_bootstrap                         1
plot_diffexp_heatmap                   1
plot_diffexp_pval_hist                 3
plot_fragment_length_dist              2
plot_group_density                     1
plot_pca                               1
render_datavzrd_config_diffexp         1
sleuth_diffexp                         1
sleuth_init                            2
vega_volcano_plot                      1
total                                 29

Select jobs to execute...

[Fri Dec 15 13:56:28 2023]
rule cutadapt_pe:
input: /WTS/RNAseq_test_1D_231201_N12/EOL-1_DMSO_1.fq.gz, /WTS/RNAseq_test_1D_231201_N12/EOL-1_DMSO_2.fq.gz
output: results/trimmed/EOL-1-1.1.fastq.gz, results/trimmed/EOL-1-1.2.fastq.gz, results/trimmed/EOL-1-1.qc.txt
log: results/logs/cutadapt/EOL-1-1.log
jobid: 4
reason: Missing output files: results/trimmed/EOL-1-1.1.fastq.gz, results/trimmed/EOL-1-1.2.fastq.gz
wildcards: sample=EOL-1, unit=1
threads: 8
resources: mem_mb=15400, mem_mib=14687, disk_mb=15400, disk_mib=14687, tmpdir=

Submitted job 4 with external jobid 'Your job 1509737 ("snakejob.cutadapt_pe.4.sh") has been submitted'.
[Fri Dec 15 13:56:48 2023]
Error in rule cutadapt_pe:
jobid: 4
input: /WTS/RNAseq_test_1D_231201_N12/EOL-1_DMSO_1.fq.gz, /WTS/RNAseq_test_1D_231201_N12/EOL-1_DMSO_2.fq.gz
output: results/trimmed/EOL-1-1.1.fastq.gz, results/trimmed/EOL-1-1.2.fastq.gz, results/trimmed/EOL-1-1.qc.txt
log: results/logs/cutadapt/EOL-1-1.log (check log file(s) for error details)
conda-env: /home/him/RNA/.snakemake/conda/b97fd9bce60732e534a403eba0f5c294_
cluster_jobid: Your job 1509737 ("snakejob.cutadapt_pe.4.sh") has been submitted

Error executing rule cutadapt_pe on cluster (jobid: 4, external: Your job 1509737 ("snakejob.cutadapt_pe.4.sh") has been submitted, jobscript: /home/him/RNA/.snakemake/tmp.s42vvmk3/snakejob.cutadapt_pe.4.sh). For error details see the cluster log and the log files of the involved rule(s).
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-12-15T135619.975453.snakemake.log


When I looked at the log file:

cat results/logs/cutadapt/EOL-1-1.log
/bin/bash: cutadapt: command not found


My question is: should I install cutadapt and the other tools myself?

Thank you.
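Two lines of the pasted output point at the cause rather than at a missing system-wide install: the invocation snakemake --cluster qsub --jobs 1 omits --use-conda, and the log accordingly reports "Conda environments: ignored", so Snakemake looks for cutadapt on the node's PATH instead of creating the per-rule conda environments. A hedged guess at the fix, mirroring the invocation shown in the whitespace issue above:

    snakemake --cluster qsub --jobs 1 --use-conda

With --use-conda, each rule runs inside its own conda environment, so the tools do not need to be installed manually.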

Can base_level take multiple values when there are many groups?

Dear author,

Thanks a lot for the great tools!

May I know whether it is possible to set base_level to multiple values, or only one? If there are multiple groups/comparisons, do they need to be run separately (splitting the units file and the samples file)?

Thank you very much!

diffexp:
  # samples to exclude (e.g. outliers due to technical problems)
  exclude:
  # model for sleuth differential expression analysis
  models:
    model_X:
      full: ~condition + batch
      reduced: ~batch
      # Binary valued covariate that shall be used for fold change/effect size
      # based downstream analyses.
      primary_variable: condition
      # base level of the primary variable (will be considered as denominator
      # in the fold change/effect size estimation).
      base_level: untreated
  # significance level to use for volcano, ma- and qq-plots
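Judging from the comment in the excerpt ("denominator in the fold change/effect size estimation"), base_level takes a single value per model. Several comparisons can instead be expressed as several entries under models, so they run in one invocation without splitting the units and samples files. A hedged sketch (model names, conditions, and levels are hypothetical):

diffexp:
  models:
    model_drugA_vs_untreated:      # hypothetical model name
      full: ~condition + batch
      reduced: ~batch
      primary_variable: condition
      base_level: untreated        # denominator for this comparison
    model_drugB_vs_drugA:          # second comparison, different base level
      full: ~condition + batch
      reduced: ~batch
      primary_variable: condition
      base_level: drugA

Each model then produces its own set of outputs (e.g. results/sleuth/model_X.rds above).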

BioMart-provoked errors in sleuth_init

The sleuth_init rule fails with no useful information in the Snakemake output. The log files logs/sleuth/all.init.log and logs/sleuth/model_X.init.log reveal the error:

➜  sleuth git:(master) ✗ cat model_X.init.log
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::select() masks biomaRt::select()
Error in curl::curl_fetch_memory(url, handle = handle) :
  transfer closed with outstanding read data remaining
Calls: %>% ... request_fetch -> request_fetch.write_memory -> <Anonymous>
Execution halted
➜  sleuth git:(master) ✗ cat all.init.log
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
✖ dplyr::select() masks biomaRt::select()
Error in biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id",  :
  The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1. Please report this to the mailing list.
Calls: %>% -> eval -> eval -> <Anonymous>
Execution halted
➜  sleuth git:(master) ✗

Presumably a change in the BioMart API has caused this? Has someone already figured out how to fix it?

Error in rule diffexp_datavzrd

Hello,

the workflow ran almost to the end, but the following error occurred:

Error in rule diffexp_datavzrd:
jobid: 41
input: results/datavzrd/diffexp/model_X.yaml, results/tables/logcount-matrix/model_X.logcount-matrix.tsv, results/tables/diffexp/model_X.transcripts.diffexp.tsv, results/tables/diffexp/model_X.genes-aggregated.diffexp.tsv, results/tables/diffexp/model_X.genes-representative.diffexp.tsv, results/plots/interactive/volcano/model_X.vl.json
output: results/datavzrd-reports/diffexp-model_X
log: logs/datavzrd-report/diffexp.model_X/diffexp.model_X.log (check log file(s) for error details)
conda-env: /home/afienemann/kallisto/.snakemake/conda/e31e6658fb06c01d5e441b8bd51e5051_

RuleException:
CalledProcessError in file https://raw.githubusercontent.com/snakemake-workflows/rna-seq-kallisto-sleuth/v2.5.1/workflow/rules/datavzrd.smk, line 95:
Command 'source /home/afienemann/miniconda3/bin/activate '/home/afienemann/kallisto/.snakemake/conda/e31e6658fb06c01d5e441b8bd51e5051_'; set -euo pipefail; /home/afienemann/miniconda3/envs/snakemake/bin/python3.11 /home/afienemann/kallisto/.snakemake/scripts/tmpy4hs05xj.wrapper.py' returned non-zero exit status 1.
File "https://raw.githubusercontent.com/snakemake-workflows/rna-seq-kallisto-sleuth/v2.5.1/workflow/rules/datavzrd.smk", line 95, in __rule_diffexp_datavzrd
File "/home/afienemann/miniconda3/envs/snakemake/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Exiting because a job execution failed. Look above for error message

The mentioned log:

"Error: Column "B6_mtb_d14_N1-1" under path "results/tables/diffexp/model_X.transcripts.diffexp.tsv" seems to have multiple definitions. Please check your config file."

  1. Does the log refer to this config file? => /results/datavzrd/diffexp/model_X.yaml

  2. What does "multiple definitions" mean in this context?

Thanks!

Docker run fails when using the "use this template" import

Hello,
I was also making a Snakemake workflow (although less evolved) to use Kallisto and Sleuth.

I tried to import the workflow in my own organisation by clicking on the "use this template" button.

All tests passed (see screenshot), except for the Docker test.

Do you have an idea why?
Thank you
Marc

Run snakemake/snakemake-github-action@…
  with:
    directory: .test
    snakefile: workflow/Snakefile
    args: --use-conda --show-failed-logs
/usr/bin/docker run --name af96b420b6d454309b40cba0b188e44ed8cacd_fdba3b --label af96b4 --workdir /github/workspace --rm -e INPUT_DIRECTORY -e INPUT_SNAKEFILE -e INPUT_ARGS -e INPUT_STAGEIN -e HOME -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e RUNNER_OS -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e GITHUB_ACTIONS=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/rna-seq-kallisto-sleuth/rna-seq-kallisto-sleuth":"/github/workspace" af96b4:20b6d454309b40cba0b188e44ed8cacd  ".test" "workflow/Snakefile" "--use-conda --show-failed-logs" ""
Building DAG of jobs...
MissingInputException in line 1 of /github/workspace/workflow/rules/trim.smk:
Missing input files for rule cutadapt_pe:
data/reads/a.chr21.2.fq
data/reads/a.chr21.1.fq
##[error]Docker run failed with exit code 1


Changes in `units.tsv` do not prompt a rerun of the workflow

When an entry is added to or removed from units.tsv, this should prompt a rerun of the whole workflow, at least from the rules sleuth_init and compose_sample_sheet onward. This is currently not the case.

This might be caused by the compose_sample_sheet rule, which assembles a sample sheet for sleuth from the kallisto output and the samples file. It is possible that changes to the units file are not properly propagated to this rule. It might be enough to add units.tsv as an input to this rule; see the sketch below.

Another potential point of failure could be the checkpoint sleuth_diffexp.
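A minimal sketch of that proposed fix (input names and the helper function are hypothetical; the real rule's inputs and script differ): once units.tsv is declared as an input, Snakemake considers the rule, and everything downstream of it, outdated whenever the file changes.

    rule compose_sample_sheet:
        input:
            samples=config["samples"],
            units=config["units"],  # proposed addition: rerun when units.tsv changes
            kallisto=get_kallisto_output,  # hypothetical input function
        output:
            "results/sleuth/{model}.samples.tsv",
        log:
            "logs/{model}.compose-sample-sheet.log",
        script:
            "../scripts/compose-sample-sheet.py"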

InputFunctionException in line 29 .../quant.smk

The error message:

$ snakemake --cores 80 --use-conda --dry-run
Building DAG of jobs...
~/miniconda3/envs/snakeflow-kallisto-sleuth/lib/python3.9/site-packages/pandas/core/indexing.py:925: PerformanceWarning: indexing past lexsort depth may impact performance.
  return self._getitem_tuple(key)
InputFunctionException in line 29 of https://github.com/snakemake-workflows/rna-seq-kallisto-sleuth/raw/v2.3.0/workflow/rules/quant.smk:
Error:
  ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Wildcards:
  sample=20SE01776
  unit=1
Traceback:
  File "https://github.com/snakemake-workflows/rna-seq-kallisto-sleuth/raw/v2.3.0/workflow/rules/common.smk", line 77, in get_trimmed
  File "~/miniconda3/envs/snakeflow-kallisto-sleuth/lib/python3.9/site-packages/pandas/core/generic.py", line 1537, in __nonzero__

My config files:

$ head config/*tsv
==> config/samples.tsv <==
sample  condition       batch_effect
20SE01776       1       London
20SF01123       1       London
20SH01318       1       London
20SH01483       1       London
20SI01410       1       London
20SP01339       1       London
20SS01190       1       London
20ST01399       1       London
20SW01296       1       London

==> config/units.tsv <==
sample  unit    fragment_len_mean       fragment_len_sd fq1     fq2
20SE01776       1       NA      NA      .../20SE01776_unmapped_left.fq .../20SE01776_unmapped_right.fq
20SF01123       1       NA      NA      .../20SF01123_unmapped_left.fq .../20SF01123_unmapped_right.fq 
(absolute paths given)

Thank you for this module in reproducible science. I really appreciate Snakemake and its workflows.
However, as this is my first attempt, could you help me get this workflow running? The documentation is a bit scarce.
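The traceback bottoms out in pandas' __nonzero__, which is typically raised when code expecting a single row gets several, e.g. when a (sample, unit) pair occurs more than once in units.tsv, so a lookup returns a whole DataFrame instead of one row; the PerformanceWarning about lexsort depth likewise points at the units index. A hedged diagnostic sketch (hypothetical, not part of the workflow):

    import pandas as pd

    units = pd.read_csv("config/units.tsv", sep="\t", dtype=str)

    # Every (sample, unit) combination must identify exactly one row;
    # duplicates make index lookups return multiple rows where the
    # workflow's input functions expect a single one.
    dups = units[units.duplicated(subset=["sample", "unit"], keep=False)]
    if dups.empty:
        print("no duplicate (sample, unit) pairs found")
    else:
        print("duplicate (sample, unit) pairs:")
        print(dups[["sample", "unit", "fq1", "fq2"]])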

conda command not available despite singularity

Hi,

First of all thanks for making this workflow available - looking forward to using it!

I was trying to run the pipeline via --use-conda --use-singularity, but when trying snakemake -n --use-singularity --use-conda --cores 10 I get an error saying that conda is not available:

Building DAG of jobs...
Singularity image docker://continuumio/miniconda3 will be pulled.
CreateCondaEnvironmentException:
The 'conda' command is not available in the shell /usr/bin/bash that will be used by Snakemake. You have to ensure that it is in your PATH, e.g., first activating the conda base environment with `conda activate base`.

This is odd, since my understanding was that everything should run within the miniconda3 container, where conda should of course be available. Or am I misunderstanding something here? Perhaps I am just doing something wrong.

I just tried to enter the relevant miniconda3 container:

singularity shell docker://continuumio/miniconda3

and in there, conda is in the PATH.

I am using snakemake version 5.4.5 and singularity version 2.6.1-dist.

Thanks in advance for any comments/help!

Best wishes,
Christoph

ValidationError: 'nperm' is a required property

Hi,

upon running snakemake --dag on the workflow (version 2.3.2) I got:

WorkflowError in line 8 of /home/cm/Downloads/rna-seq-kallisto-sleuth-2.3.2/workflow/rules/common.smk:
Error validating config file.
ValidationError: 'nperm' is a required property

nperm, however, is not part of the current config file at the time of writing, nor is it explained. It is, though, part of the schema file (schemas/config.schema.yaml) and the report rst file. Perhaps the parameter should be included in the template config.yaml, too?

Best regards
Christian
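For context, the line the error points at is presumably a standard Snakemake schema validation; a minimal sketch of what such a call looks like (schema path as named above):

    from snakemake.utils import validate

    # Inside a Snakefile, `config` holds the parsed config.yaml; validate()
    # checks it against the schema, and any property marked "required" in
    # config.schema.yaml but missing from the config raises a
    # ValidationError like the one above.
    validate(config, schema="../schemas/config.schema.yaml")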

config.yaml is not part of the release download

Hi,

very nice workflow! I would, however, suggest including the config directory as a sample directory in the release; otherwise users have to download everything separately (or clone the development branch).

Best regards
Christian

Get rid of checkpoints

Checkpoints are not needed here anymore, because the report flag can now take a directory and add all contained files to the report (see here).

Hence, we need to replace the bootstrap jobs with a single job that iterates over all significant genes and writes the bootstrap plots into a directory. That directory can then be annotated for the report as follows:

report("results/plots/bootstrap", patterns=["{transcript}.bootstrap.pdf"], category=..., caption=...)
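A hedged sketch of such a replacement rule (rule, path, and script names are hypothetical), combining a directory() output with the report() annotation above:

    rule plot_all_bootstraps:
        input:
            sleuth_object="results/sleuth/{model}.rds",
            diffexp="results/tables/diffexp/{model}.transcripts.diffexp.tsv",
        output:
            report(
                # one job writes all per-transcript plots into this directory;
                # report() then picks them up individually via `patterns`
                directory("results/plots/bootstrap/{model}"),
                patterns=["{transcript}.bootstrap.pdf"],
                category="Bootstrap plots",
                caption="../report/bootstrap.rst",
            ),
        script:
            "../scripts/plot-all-bootstraps.R"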
