epigen / enrichment_analysis

A Snakemake workflow for performing genomic region set and gene set enrichment analyses using LOLA, GREAT, and GSEApy.

Home Page: https://epigen.github.io/enrichment_analysis/

License: MIT License

Python 61.26% R 38.74%
bioinformatics genomic-regions enrichment-analysis atac-seq biomedical-data-science chip-seq gene-set-enrichment gene-sets rna-seq visualization

enrichment_analysis's Introduction

Genomic Region Set & (Ranked) Gene Set Enrichment Analysis & Visualization Snakemake Workflow for Human and Mouse Genomes.

Given human (hg19 or hg38) or mouse (mm9 or mm10) genomic region sets (i.e., region sets) and/or (ranked) gene sets of interest, together with the respective background region/gene sets, the enrichment within the configured databases is determined using LOLA, GREAT, and GSEApy (over-representation analysis (ORA) and preranked GSEA), and the results are saved as CSV files. Additionally, the most significant results are plotted for each region/gene set, database queried, and analysis performed. Finally, the results within the same "group" (e.g., stemming from the same DEA) are aggregated per database and analysis in summary CSV files and visualized using hierarchically clustered heatmaps and bubble plots. For collaboration, communication, and documentation of results, methods, and workflow information, a detailed self-contained HTML report can be generated.

This workflow adheres to the module specifications of MR. PARETO, an effort to augment research by modularizing (biomedical) data science. For more details and modules check out the project's repository.

If you use this workflow in a publication, please don't forget to give credit to the authors by citing it using this DOI 10.5281/zenodo.7810621.

Software

This project wouldn't be possible without the following software and their dependencies:

Software Reference (DOI)
Enrichr https://doi.org/10.1002/cpz1.90
ggplot2 https://ggplot2.tidyverse.org/
GREAT https://doi.org/10.1371/journal.pcbi.1010378
GSEA https://doi.org/10.1073/pnas.0506580102
GSEApy https://doi.org/10.1093/bioinformatics/btac757
LOLA https://doi.org/10.1093/bioinformatics/btv612
pandas https://doi.org/10.5281/zenodo.3509134
pheatmap https://cran.r-project.org/package=pheatmap
rGREAT https://doi.org/10.1093/bioinformatics/btac745
Snakemake https://doi.org/10.12688/f1000research.29032.2

Methods

This is a template for the Methods section of a scientific publication and is intended to serve as a starting point. Only retain paragraphs relevant to your analysis. References [ref] to the respective publications are curated in the software table above. Versions (ver) have to be taken from the respective conda environment specifications (workflow/envs/*.yaml files) or, post execution, from results_dir/envs/enrichment_analysis/*.yaml files. Parameters that have to be adapted depending on the data or workflow configuration are denoted in square brackets, e.g., [X].

The outlined analyses were performed using the programming languages R (ver) [ref] and Python (ver) [ref] unless stated otherwise. All approaches statistically correct their results using expressed/accessible background genomic region/gene sets from the respective analyses that yielded the query region/gene sets.

Genomic region set enrichment analyses

LOLA. Genomic region set enrichment analysis was performed using LOLA (ver) [ref], which uses Fisher’s exact test. The following databases were queried [lola_dbs].

GREAT. Genomic region set enrichment analysis was performed using GREAT [ref] implemented with rGREAT (ver) [ref]. The following databases were queried [great_dbs].

Furthermore, genomic regions (query- and background-sets) were mapped to genes using GREAT and then analyzed as gene-sets as described below for a complementary and extended perspective.

Gene set enrichment analyses (GSEA)

Over-representation analysis (ORA). Gene set ORA was performed using Enrichr [ref], which uses Fisher’s exact test (i.e., hypergeometric test), implemented with GSEApy's (ver) [ref] function enrich. The following databases were queried [enrichr_dbs][local_gmt_dbs][local_json_dbs].
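To illustrate the statistic behind ORA, the one-sided Fisher's exact test can be written as a hypergeometric upper-tail probability. The following is a minimal sketch using only the Python standard library; the function name `ora_p_value` and the toy numbers are hypothetical and are not part of GSEApy or of this workflow.

```python
from math import comb


def ora_p_value(overlap: int, query_size: int, term_size: int, background_size: int) -> float:
    """Hypergeometric upper tail: P(X >= overlap).

    overlap         -- query genes annotated with the term
    query_size      -- number of query genes (all within the background)
    term_size       -- number of background genes annotated with the term
    background_size -- total number of background genes
    """
    max_overlap = min(query_size, term_size)
    # Number of ways to draw the query set from the background.
    total = comb(background_size, query_size)
    # Sum the probabilities of observing `overlap` or more annotated genes.
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 2 of 2 query genes hit a term that covers 2 of 4 background genes.
print(ora_p_value(overlap=2, query_size=2, term_size=2, background_size=4))
```

Because the background enters the formula directly, the choice of background gene set changes the resulting p-values, which is why the workflow asks for an explicit background.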

Preranked GSEA. Preranked GSEA was performed using GSEA [ref], implemented with GSEApy's (ver) [ref] function prerank. The following databases were queried [enrichr_dbs][local_gmt_dbs][local_json_dbs].

Aggregation. The results of all queries belonging to the same analysis [group] were aggregated by method and database. Additionally, we filtered the results by retaining only the union of terms that were statistically significant (i.e., [adj_pvalue] < [adjp_th]) in at least one query.
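The aggregation and filtering logic described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the workflow's actual script: the function name `aggregate_group`, the added `name` column, and the toy column names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def aggregate_group(
    results_by_query: Dict[str, List[Row]],
    adjp_col: str,
    term_col: str,
    adjp_th: float,
) -> Tuple[List[Row], List[Row]]:
    """Concatenate per-query results and keep terms significant in >= 1 query."""
    # Long-format table: one row per (query, term), tagged with the query name.
    all_rows = [
        {"name": query, **row}
        for query, rows in results_by_query.items()
        for row in rows
    ]
    # Union of terms that pass the threshold in at least one query.
    sig_terms = {row[term_col] for row in all_rows if row[adjp_col] < adjp_th}
    # Keep every row of those terms, including non-significant ones from other queries.
    sig_rows = [row for row in all_rows if row[term_col] in sig_terms]
    return all_rows, sig_rows


toy = {
    "up": [{"Term": "A", "adjp": 0.01}, {"Term": "B", "adjp": 0.5}],
    "down": [{"Term": "A", "adjp": 0.9}, {"Term": "C", "adjp": 0.2}],
}
all_rows, sig_rows = aggregate_group(toy, adjp_col="adjp", term_col="Term", adjp_th=0.05)
print(len(all_rows), [(r["name"], r["Term"]) for r in sig_rows])
```

Note that the filtered table retains a term for every query in the group once it is significant in any one of them, which keeps the summary plots rectangular.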

Visualization. All analysis results were visualized in the same way.

For each query, method and database combination an enrichment dot plot was used to visualize the most important results. The top [top_n] terms were ranked (along the y-axis) by the mean rank of statistical significance ([p_value]), effect-size ([effect_size]), and overlap ([overlap]) with the goal to make the results more balanced and interpretable. The significance (adjusted p-value) is denoted by the dot color, effect-size by the x-axis position, and overlap by the dot size.
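The mean-rank ordering can be sketched as below. This is a hedged illustration: the function names are hypothetical, and the rank directions (smaller p-value is better, larger effect-size and overlap are better) as well as the average-rank tie handling are assumptions rather than a description of the workflow's plotting script.

```python
def rank_values(values, descending=False):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def top_terms_by_mean_rank(terms, p_values, effect_sizes, overlaps, top_n):
    """Order terms by the mean of their three ranks and return the best top_n."""
    p_rank = rank_values(p_values)                       # assumed: smaller is better
    e_rank = rank_values(effect_sizes, descending=True)  # assumed: larger is better
    o_rank = rank_values(overlaps, descending=True)      # assumed: larger is better
    mean_rank = [(p + e + o) / 3 for p, e, o in zip(p_rank, e_rank, o_rank)]
    order = sorted(range(len(terms)), key=lambda i: mean_rank[i])
    return [terms[i] for i in order[:top_n]]


print(top_terms_by_mean_rank(
    terms=["A", "B", "C"],
    p_values=[0.01, 0.001, 0.5],
    effect_sizes=[2.0, 1.0, 3.0],
    overlaps=[0.5, 0.4, 0.1],
    top_n=2,
))
```

Averaging ranks rather than raw values keeps any single criterion, such as an extreme effect-size with little support, from dominating the selection.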

The aggregated results per analysis [group], method and database combination were visualized using hierarchically clustered heatmaps and bubble plots. The union of the top [top_terms_n] most significant terms per query was determined, and their effect-size and significance were visualized as hierarchically clustered heatmaps, with statistical significance ([adj_pvalue] < [adjp_th]) denoted by *. Furthermore, a hierarchically clustered bubble plot encoding both effect-size (color) and statistical significance (size) is provided, with statistical significance denoted by *. The values of all summary visualizations were capped by [adjp_cap]/[or_cap]/[nes_cap] to avoid shifts in the coloring scheme caused by outliers.
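The capping rules can be sketched as follows, based on the comments in the workflow configuration (-log10(adjusted p-value) is capped at adjp_cap; log2(odds ratio) and NES are capped symmetrically at or_cap and nes_cap). The function names are hypothetical, and the handling of zero values is an assumption made for this illustration.

```python
import math


def cap_adjp(adj_pvalue: float, adjp_cap: float) -> float:
    """Return -log10(adjusted p-value), capped at adjp_cap."""
    if adj_pvalue <= 0:
        # Assumption: a p-value of zero maps directly to the cap.
        return float(adjp_cap)
    return min(-math.log10(adj_pvalue), adjp_cap)


def cap_symmetric(value: float, cap: float) -> float:
    """Clamp value to the interval [-cap, cap], preserving its sign."""
    return max(-cap, min(cap, value))


def cap_log2_odds_ratio(odds_ratio: float, or_cap: float) -> float:
    """Return log2(odds ratio), capped symmetrically at or_cap."""
    if odds_ratio <= 0:
        # Assumption: an odds ratio of zero maps to the negative cap.
        return -float(or_cap)
    return cap_symmetric(math.log2(odds_ratio), or_cap)


print(cap_adjp(1e-10, adjp_cap=4))           # capped
print(cap_log2_odds_ratio(0.5, or_cap=5))    # within range, unchanged
print(cap_symmetric(-7.2, cap=5))            # NES-style symmetric cap
```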

The analysis and visualizations described here were performed using a publicly available Snakemake (ver) [ref] workflow [10.5281/zenodo.7810621].

Features

The three tools LOLA, GREAT and GSEApy (over-representation analysis (ORA) & preranked GSEA) are used for various enrichment analyses. Databases to be queried can be configured (see ./config/config.yaml). All approaches statistically correct their results using the provided background region/gene sets.

  • enrichment analysis methods:
    • region-set
      • LOLA: Genomic Locus Overlap Enrichment Analysis is run locally. Required databases are downloaded automatically during the first run and cached. Supported databases depend on the genome (lola_dbs).
      • GREAT using rGREAT: Genomic Regions Enrichment of Annotations Tool is queried remotely (requires a working internet connection). Supported databases depend on the genome (great_dbs).
        • query region sets with >500,000 regions are not supported and empty output files are generated to satisfy Snakemake
        • background region sets with >1,000,000 regions are not supported and the whole genome is used as background
    • gene-set over-representation analysis (ORA_GSEApy)
      • GSEApy enrich() function performs Fisher’s exact test (i.e., hypergeometric test) and is run locally.
    • region-based gene-set over-representation analysis (ORA_GSEApy)
      • region-gene associations for each query and background region-set are obtained using GREAT.
      • they are used for a complementary ORA using GSEApy.
      • thereby an extended region-set enrichment perspective can be gained through association to genes by querying the same and/or additional databases that are not supported/provided by region-based tools.
      • limitation: if the background region set exceeds GREAT's capacity (i.e., 1,000,000 regions), no background gene list is generated and a background gene number (bg_n) of 20,000 is used in the ORA.
    • preranked gene-set enrichment analysis (preranked_GSEApy)
      • GSEApy prerank() function performs preranked GSEA and is run locally.
      • no duplicates allowed: only entries with the largest absolute score are kept.
  • resources (databases) for both gene-based analyses are downloaded (Enrichr) or copied (local files) and saved as JSON files in /resources
    • all Enrichr databases can be queried (enrichr_dbs).
    • local JSON database files can be queried (local_json_dbs).
    • local GMT database files (e.g., from MSigDB) can be queried (local_gmt_dbs).
  • group aggregation of results per method and database
    • results of all queries belonging to the same group are aggregated per method (e.g., ORA_GSEApy) and database (e.g., GO_Biological_Process_2021) by concatenation and saved as a long-format table (CSV).
    • a filtered version taking the union of all statistically significant (i.e., adjusted p-value <{adjp_th}) terms per query is also saved as CSV file.
  • visualization
    • region/gene-set specific enrichment dot plots are generated for each query, method and database combination
      • the top {top_n} terms are ranked (along the y-axis) by the mean rank of statistical significance ({p_value}), effect-size ({effect_size} e.g., log2(odds ratio) or normalized enrichment scores), and overlap ({overlap} e.g., coverage or support) with the goal to make the results more balanced and interpretable
      • significance (adjusted p-value) is presented by the dot color
      • effect-size is presented by the x-axis position
      • overlap is presented by the dot size
    • group summary/overview
      • the union of the top {top_terms_n} most significant terms per query, method, and database within a group is determined.
      • their effect-size (effect) and statistical significance (adjp) are visualized as hierarchically clustered heatmaps, with statistical significance denoted by * (PDF).
      • a hierarchically clustered bubble plot encoding both effect-size (color) and significance (size) is provided, with statistical significance denoted by * (PNG and SVG).
      • all summary visualizations are configured to cap the values ({adjp_cap}/{or_cap}/{nes_cap}) to avoid shifts in the coloring scheme caused by outliers.
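The duplicate handling for preranked GSEA listed above (only the entry with the largest absolute score is kept per gene) can be sketched as follows. The function name and the (gene, score) input shape are assumptions for illustration and do not describe the workflow's script.

```python
def keep_largest_abs_score(ranked):
    """Keep one entry per gene: the one with the largest absolute score.

    ranked -- iterable of (gene, score) pairs, possibly with repeated genes
    Returns a list of (gene, score) pairs sorted by descending score.
    """
    best = {}
    for gene, score in ranked:
        # Replace the stored score only if the new one is larger in magnitude.
        if gene not in best or abs(score) > abs(best[gene]):
            best[gene] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


print(keep_largest_abs_score([("A", 1.0), ("B", -3.0), ("A", -2.5), ("B", 0.5)]))
```

Comparing by absolute value means a strongly negative score wins over a weakly positive one for the same gene, so the direction of the strongest signal is preserved.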

Results

The result directory {result_path}/enrichment_analysis contains a folder for each region/gene-set {query} and {group}:

  • {query}/{method}/{database}/ containing:
    • result table (CSV file): {query}_{database}.csv
    • enrichment dot plot (SVG and PNG): {query}_{database}.{svg|png}
  • {group}/{method}/{database}/ containing:
    • aggregated result table (CSV file): {group}_{database}_all.csv
    • filtered aggregated result table (CSV file): {group}_{database}_sig.csv
    • hierarchically clustered heatmaps visualizing statistical significance and effect-sizes of the top {top_terms_n} terms (PDF): {group}_{database}_{adjp|effect}_hm.pdf
    • hierarchically clustered bubble plot visualizing statistical significance and effect-sizes simultaneously (PNG and SVG): {group}_{database}_summary.{svg|png}
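The naming scheme above can be turned into a small helper for checking which files to expect after a run. This is a sketch that only restates the patterns listed in this section; the function names are hypothetical.

```python
from pathlib import PurePosixPath


def query_outputs(result_path, query, method, database):
    """Expected per-query files: result table and enrichment dot plots."""
    base = PurePosixPath(result_path) / "enrichment_analysis" / query / method / database
    stem = f"{query}_{database}"
    return [str(base / f"{stem}.{ext}") for ext in ("csv", "svg", "png")]


def group_outputs(result_path, group, method, database):
    """Expected per-group files: aggregated tables, heatmaps and bubble plots."""
    base = PurePosixPath(result_path) / "enrichment_analysis" / group / method / database
    stem = f"{group}_{database}"
    names = [
        f"{stem}_all.csv",
        f"{stem}_sig.csv",
        f"{stem}_adjp_hm.pdf",
        f"{stem}_effect_hm.pdf",
        f"{stem}_summary.svg",
        f"{stem}_summary.png",
    ]
    return [str(base / name) for name in names]


for path in group_outputs("results", "mysterySets", "LOLA", "LOLACore"):
    print(path)
```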

Usage

Here are some tips for the usage of this workflow:

  • run the analysis on every query gene/region set of interest (e.g., results of differential analyses) with the respective background genes/regions (e.g., all expressed genes in the data or consensus regions).
  • generate the Snakemake Report.
  • look through the overview plots of your dedicated groups and queried databases in the report.
  • dig deeper by looking at the:
    • aggregated result table underlying the summary/overview plot
    • enrichment plots for the individual query sets
  • investigate interesting hits further by looking into the individual query result tables.

Configuration

Detailed specifications can be found in ./config/README.md

Examples

We provide four example queries:

We provide two local example databases

Follow these steps to run the complete analysis:

  1. activate your snakemake conda environment
    conda activate snakemake
  2. enter the workflow directory
    cd enrichment_analysis
  3. run a snakemake dry-run (-n flag) using the provided configuration to check if everything is in order
    snakemake -p --use-conda --configfile .test/config/example_enrichment_analysis_config.yaml -n
  4. run the workflow
    snakemake -p --use-conda --configfile .test/config/example_enrichment_analysis_config.yaml
  5. generate report
    snakemake --report .test/report.html --configfile .test/config/example_enrichment_analysis_config.yaml

Links

Resources

Publications

The following publications successfully used this module for their analyses.

  • ...

enrichment_analysis's People

Contributors

sreichl

enrichment_analysis's Issues

update and document all packages to latest version

use the latest version of all packages, re-test & provide versions in environment .yaml files

Error in rule gene_ORA_GSEApy: jobid: 31, raise ValueError("No objects to concatenate")

Hi!

Do you have an idea why your pipeline always fails on this step? :)
I've tried 2 different inputs and there is always a problem at step 32 of 81.

Below is the part of the console output with the error message:

`Warning message:
package ‘svglite’ was built under R version 4.1.3 
[Fri May 26 18:19:55 2023]
Finished job 22.
32 of 81 steps (40%) done
Select jobs to execute...

[Fri May 26 18:19:55 2023]
rule gene_ORA_GSEApy:
    input: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/UP_sorted_cyto_from_10kb_bins.bed/GREAT/genes.txt, /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/background/GREAT/genes.txt, resources/Sorted_cyto_from_bins_2_run/MyDB.json
    output: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/UP_sorted_cyto_from_10kb_bins.bed/ORA_GSEApy/MyDB/UP_sorted_cyto_from_10kb_bins.bed_MyDB.csv
    log: logs/rules/gene_ORA_GSEApy_UP_sorted_cyto_from_10kb_bins.bed_MyDB.log
    jobid: 31
    reason: Missing output files: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/UP_sorted_cyto_from_10kb_bins.bed/ORA_GSEApy/MyDB/UP_sorted_cyto_from_10kb_bins.bed_MyDB.csv; Input files updated by another job: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/UP_sorted_cyto_from_10kb_bins.bed/GREAT/genes.txt, /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/background/GREAT/genes.txt
    wildcards: gene_set=UP_sorted_cyto_from_10kb_bins.bed, db=MyDB
    resources: tmpdir=/tmp, mem_mb=32000

python -c "import sys; print('.'.join(map(str, sys.version_info[:2])))"
Activating conda environment: .snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_
python /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/scripts/tmp49w6b_kb.gene_ORA_GSEApy.py
Activating conda environment: .snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_
2023-05-26 18:19:57,009 [INFO] Input dict object named with gs_ind_0
2023-05-26 18:19:57,009 [INFO] Run: gs_ind_0 
2023-05-26 18:19:57,010 [INFO] No hits return, for gene set: Custom140555313936960
Traceback (most recent call last):
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/scripts/tmp49w6b_kb.gene_ORA_GSEApy.py", line 91, in <module>
    res = gp.enrich(gene_list=gene_list,
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_/lib/python3.9/site-packages/gseapy/__init__.py", line 607, in enrich
    enr.run()
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_/lib/python3.9/site-packages/gseapy/enrichr.py", line 534, in run
    self.results = pd.concat(self.results, ignore_index=True)
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 274, in concat
    op = _Concatenator(
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_/lib/python3.9/site-packages/pandas/core/reshape/concat.py", line 331, in __init__
    raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
[Fri May 26 18:19:57 2023]
Error in rule gene_ORA_GSEApy:
    jobid: 31
    output: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/UP_sorted_cyto_from_10kb_bins.bed/ORA_GSEApy/MyDB/UP_sorted_cyto_from_10kb_bins.bed_MyDB.csv
    log: logs/rules/gene_ORA_GSEApy_UP_sorted_cyto_from_10kb_bins.bed_MyDB.log (check log file(s) for error message)
    conda-env: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_

RuleException:
CalledProcessErrorin line 65 of /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/workflow/rules/enrichment_analysis.smk:
Command 'source /mnt/polkanowa2/programs/bin/activate '/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_'; set -eo pipefail; python /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/scripts/tmp49w6b_kb.gene_ORA_GSEApy.py' returned non-zero exit status 1.
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/workflow/rules/enrichment_analysis.smk", line 65, in __rule_gene_ORA_GSEApy
  File "/mnt/polkanowa2/programs/lib/python3.9/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-05-26T181430.372545.snakemake.log
(base) root@SRV602-88C9:/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis# `

Here is attached the log file.
2023-05-26T181430.372545.snakemake.log

Please let me know if you need anything else!
Thanks
Sandra
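The log line "No hits return" may indicate that none of the query genes matched any term in the local database, which would leave nothing to concatenate. One way to check this before running ORA is sketched below. It assumes a local JSON database shaped as a dictionary of term to gene-symbol list and a gene list with one symbol per line; the function name and the case-insensitive comparison are assumptions for this illustration, not part of the workflow or GSEApy.

```python
def overlap_with_database(genes, database):
    """Return, per term, the query genes that also occur in that term.

    genes    -- iterable of gene symbols (e.g., lines of genes.txt)
    database -- dict mapping term name to a list of gene symbols
    """
    # Normalise symbols: strip whitespace/newlines, ignore empty lines, upper-case.
    query = {g.strip().upper() for g in genes if g.strip()}
    hits = {}
    for term, term_genes in database.items():
        shared = query & {g.upper() for g in term_genes}
        if shared:
            hits[term] = sorted(shared)
    return hits


toy_db = {"MyDB_Term1": ["TP53", "EGFR"], "MyDB_Term2": ["KRAS"]}
print(overlap_with_database(["tp53", "MYC\n", ""], toy_db))
```

An empty result would point to a mismatch between the gene identifiers in the query and those in the database (for example, different naming conventions), rather than to a problem in the Snakemake rule itself.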

TFBS/motif enrichment analysis

  • using Rcistarget (code in macroStim project) or SCENIC packages
  • input is the same: regions & genes
  • First for gene lists then region lists (=more complicated?)
  • https://scenicplus.readthedocs.io/en/latest/
  • pycisTarget seems promising, but only for regions? maybe mapping from genes to regions is required & easy
  • pycistarget (Rcistarget) based
  • with summarization to maxNES (and/or enrGene Numbers? -> config?)

PPI analysis using STRING R package

  • code in macroStim project
  • apply to all provided gene sets (e.g., up regulated)
  • or go beyond and implement "network-based" enrichment analysis using e.g., OMNIPATH testing if genes are closer in the network than random chance / background gene set of expressed genes

The results are not generating

Hi!

It was a great idea to create such tool!
I have an issue and I don't know what the root cause might be: the results are not being generated.

When running:
$ snakemake -p --conda-frontend conda --configfile config/config.yaml -c1
I receive:

Config file config/config.yaml is extended by additional config specified via the command line.
Building DAG of [jobs...]
Nothing to be done (all requested files are present and up to date).
Complete log: .snakemake/log/2023-04-06T075124.125638.snakemake.log

In the complete log is the same information which is displayed on the screen above.

I attach the enrichment_analysis_annotation.csv file. Here is a config.yaml content:

# alwayse use absolute paths

##### RESOURCES #####
partition: 'tinyq'
mem: '32000'
threads: 1


##### GENERAL #####
annotation: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/enrichment_analysis_annotation.csv
result_path: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/
project_name: Sorted_cyto_from_10kb_bins

# genome
# human 'hg19' or 'hg38' 
# mouse 'mm9' or 'mm10'
genome: 'hg38'


##### TOOLS #####

### GSEApy - ORA Enrichr (Fisher/hypergeometric test) and preranked GSEA based analysis

# Databases downloaded from Enrichr (https://maayanlab.cloud/Enrichr/#libraries)
# example: enrichr_dbs: ["KEGG_2021_Mouse", "GO_Biological_Process_2021", "WikiPathways_2019_Mouse"]
enrichr_dbs: ["KEGG_2021_Human", "GO_Biological_Process_2021", "WikiPathways_2019_Human"]

# Databases in GMT format containing Gene Symbols e.g, downloaded from MSigDB (http://www.gsea-msigdb.org/gsea/msigdb)
local_gmt_dbs:
    MyMSigDB: "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/databases/msigdb.v2023.1.Hs.symbols.gmt"

# path to local databases as JSON files will be loaded as dictionaries
# example content: { "MyDB_Term1": ["geneA","geneB","geneC"],"MyDB_Term2": ["geneX","geneY","geneZ"]}
local_json_dbs:
    MyDB: "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/databases/c2.cp.wikipathways.v2023.1.Hs.json"

### GREAT - region-gene association based analysis

# databases to be queried from GREAT (https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655440/Ontologies)
# not all ontologies are available for all genomes and GREAT versions (here we use version 4)
great_dbs: ['GO Molecular Function','GO Biological Process','GO Cellular Component','Mouse Phenotype','Mouse Phenotype Single KO','Human Phenotype']

### LOLA - region overlap based analysis

# databases to be queried by LOLA (https://databio.org/regiondb)
# not all databases are available for all genomes (eg mm10 only supports LOLACore)
lola_dbs: ['LOLACore','jaspar_motifs','roadmap_epigenomics']

### Enrichment plot

# tool specific column names for aggregation, plotting & summaries
column_names:
    ORA_GSEApy:
        top_n: 25
        p_value: 'P_value'
        adj_pvalue: 'Adjusted_P_value'
        effect_size: 'Odds_Ratio'
        overlap: 'Overlap'
        term: 'Term'
    preranked_GSEApy:
        top_n: 25
        p_value: 'NOM_p_val'
        adj_pvalue: 'FDR_q_val'
        effect_size: 'NES'
        overlap: 'Tag'
        term: 'Term'
    GREAT:
        top_n: 25
        p_value: "HyperP"
        adj_pvalue: "HyperFdrQ"
        effect_size: "RegionFoldEnrich"
        overlap: "TermCov"
        term: "Desc"
    LOLA:
        top_n: 25
        p_value: "pValue"
        adj_pvalue: "qValue"
        effect_size: "oddsRatio"
        overlap: "support"
        term: "description"


# GREAT before
#     GREAT:
#         top_n: 25
#         p_value: "Hyper_Raw_PValue"
#         adj_pvalue: "Hyper_Adjp_BH"
#         effect_size: "Hyper_Fold_Enrichment"
#         overlap: "Hyper_Region_Set_Coverage"
#         term: "name"

##### AGGREGATE & SUMMARIZE #####

# adjusted p-value threshold per tool to denote statistical significance
adjp_th:
    ORA_GSEApy: 0.05
    preranked_GSEApy: 0.05
    GREAT: 0.01
    LOLA: 0.01

# number of top terms per feature set within each group for all overview plots (adjusted p-value, effect-size and bubble-heatmap)
top_terms_n: 5

# cap for adjusted p-value plotting: -log10(adjusted p-value) > adjp_cap -> adjp_cap
adjp_cap: 4

# cap for odds ratio plotting: abs(log2(odds ratio)) > or_cap -> sign(log2(odds ratio)) * or_cap
or_cap: 5

# cap for  normalized enrichemnt scores (NES) abs(nes) > nes_cap -> sign(nes) * nes_cap
# applicable only to preranked_GSEApy
nes_cap: 5

If you need anything else please let me know!
Thanks for your help!

enrichment_analysis_annotation.csv

enhanced visualization considering hierarchical nature of enrichment terms

investigate preranked GSEApy behaviour for +/-Inf scores

DEA modules calculate feature scores. When a raw p-value is zero, its -log10 is Inf, hence the feature scores are +/- Inf.

  • Test the behavior and handle the exception in case of undesired behaviors
  • document it either way.
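One possible way to handle such scores before ranking is sketched below: infinite values are replaced by the most extreme finite score of the same sign direction. This policy is only an assumption for discussion; it is not the workflow's current behavior, and the function name is hypothetical.

```python
import math


def replace_infinite_scores(scores):
    """Replace +inf with the largest and -inf with the smallest finite score.

    scores -- dict mapping feature name to score
    NaN values are left untouched; an input without any finite score is rejected.
    """
    finite = [s for s in scores.values() if math.isfinite(s)]
    if not finite:
        raise ValueError("no finite scores available to derive replacement values")
    largest, smallest = max(finite), min(finite)
    cleaned = {}
    for name, score in scores.items():
        if score == math.inf:
            cleaned[name] = largest
        elif score == -math.inf:
            cleaned[name] = smallest
        else:
            cleaned[name] = score
    return cleaned


print(replace_infinite_scores({"A": math.inf, "B": 2.0, "C": -math.inf, "D": -1.5}))
```

A drawback of this choice is that replaced features become tied with the most extreme finite feature, so their relative order in the ranking is no longer informative.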

raise KeyError(key) from err KeyError: 'qValue' Error in rule aggregate:

Hi!

That's me again 😄
Sorry to bother you, but I'm testing the latest version of your pipeline on many different input files, and for some of them I get a new error message:

python -c "import sys; print('.'.join(map(str, sys.version_info[:2])))"
Activating conda environment: .snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_
python /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/scripts/tmpn224jb2o.aggregate.py
Activating conda environment: .snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_
Traceback (most recent call last):
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'qValue'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/scripts/tmpn224jb2o.aggregate.py", line 55, in <module>
    sig_terms = result_df.loc[result_df[adjp_col]<adjp_th, term_col].unique()
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'qValue'
[Mon Jun  5 22:36:57 2023]
Error in rule aggregate:
    jobid: 73
    output: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/mysterySets/LOLA/LOLACore/mysterySets_LOLACore_all.csv, /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/mysterySets/LOLA/LOLACore/mysterySets_LOLACore_sig.csv
    log: logs/rules/aggregate_mysterySets_LOLA_LOLACore.log (check log file(s) for error message)
    conda-env: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_

RuleException:
CalledProcessErrorin line 19 of /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/workflow/rules/aggregate.smk:
Command 'source /mnt/polkanowa2/programs/bin/activate '/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/conda/db2069d06c67e34fe0d5f2324aefa1c9_'; set -eo pipefail; python /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/.snakemake/scripts/tmpn224jb2o.aggregate.py' returned non-zero exit status 1.
  File "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/workflow/rules/aggregate.smk", line 19, in __rule_aggregate
  File "/mnt/polkanowa2/programs/lib/python3.9/concurrent/futures/thread.py", line 58, in run
Removing output files of failed job aggregate since they might be corrupted:
/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/enrichment_analysis/mysterySets/LOLA/LOLACore/mysterySets_LOLACore_all.csv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-06-05T222043.715874.snakemake.log

If you need anything else for your testing purposes, please let me know.

Thanks for your help as always! :)
Sandra
