
spliz's Introduction

Introduction

salzmanlab/spliz is a bioinformatics best-practice analysis pipeline for calculating the splicing z-score (SpliZ) in single-cell RNA-seq analysis.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Quick Start

  1. Install Nextflow (>=20.04.0) and conda.

  2. Download the environment file.

    wget https://raw.githubusercontent.com/salzmanlab/SpliZ/main/environment.yml
  3. Create the conda environment and activate it.

    conda env create --name spliz_env --file=environment.yml
    conda activate spliz_env
  4. Run the pipeline on the test data set. You may need to modify the executor scope in the config file to suit your compute environment.

    nextflow run salzmanlab/spliz \
        -r main \
        -latest \
        -profile small_test_data

    Sherlock users should use the sherlock profile:

    nextflow run salzmanlab/spliz \
        -r main \
        -latest \
        -profile small_test_data,sherlock
    
  5. Run the pipeline on your own dataset.

    1. Edit your config file with the parameters below. (You can use /small_data/small.config as a template; be sure to include any memory or time parameters.)
    2. Run with your config file:
    nextflow run salzmanlab/spliz \
        -r main \
        -latest \
        -c YOUR_CONFIG_HERE.conf
    

See usage docs for all of the available options when running the pipeline.

Pipeline Summary

By default, the pipeline currently performs the following:

  • Calculate the SpliZ scores for:
    • Identifying variable splice sites
    • Identifying differential splicing between cell types.

Input Parameters

Argument | Description | Example usage
dataname | Descriptive name for the SpliZ run | "Tumor_5"
run_analysis | Whether the pipeline performs splice site identification and differential splicing analysis | true, false
input_file | File to be used as SpliZ input | tumor_5_with_postprocessing.txt
SICILIAN | Whether input_file is output from SICILIAN | true, false
pin_S | Bound splice site residuals at this quantile (values below the pin_S quantile or above the 1 - pin_S quantile are set to those quantile limits) | 0.1
pin_z | Bound SpliZ scores at this quantile (values below the pin_z quantile or above the 1 - pin_z quantile are set to those quantile limits) | 0
bounds | Only include cell/gene pairs that have more than this many junctional reads for the gene | 5
light | Only output the minimum number of columns | true, false
svd_type | Type of SVD calculation | normdonor, normgene
n_perms | Number of permutations | 100
grouping_level_1 | Metadata column by which the data is initially partitioned | "tissue"
grouping_level_2 | Metadata column by which the partitioned data is grouped | "compartment"
libraryType | Library preparation method of the input data | 10X, SS2
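
The pin_S, pin_z and bounds rules in the table above can be illustrated with a short Python sketch. It assumes that pinning is a plain quantile clip over one vector of values (using NumPy's default linear interpolation for the quantiles) and that bounds is a strict "more than" read-count threshold. The helper names pin_at_quantiles and passes_bounds are hypothetical, and the pipeline's own scripts may group values or compute quantiles differently.

```python
import numpy as np


def pin_at_quantiles(values, q):
    """Hypothetical sketch of pin_S / pin_z: clip values to the [q, 1 - q] quantiles.

    Values below the q quantile are raised to it and values above the
    1 - q quantile are lowered to it. With q == 0 the limits are the
    minimum and maximum, so the values come back unchanged.
    """
    values = np.asarray(values, dtype=float)
    if q <= 0:
        return values.copy()
    lower, upper = np.quantile(values, [q, 1 - q])  # default linear interpolation
    return np.clip(values, lower, upper)


def passes_bounds(junction_reads, bounds):
    """Hypothetical sketch of `bounds`: keep cell/gene pairs with MORE than `bounds` reads."""
    return np.asarray(junction_reads) > bounds


if __name__ == "__main__":
    residuals = np.arange(11)  # toy residuals 0, 1, ..., 10
    # With q = 0.1 the limits are 1.0 and 9.0, so 0 becomes 1.0 and 10 becomes 9.0.
    print(pin_at_quantiles(residuals, 0.1))
    reads = np.array([3, 5, 6, 12])  # made-up junctional read counts per cell/gene pair
    print(passes_bounds(reads, 5))  # only the pairs with 6 and 12 reads pass
```

With pin_z = 0, as in the example column, the quantile limits are the minimum and maximum, so the SpliZ scores are left unchanged.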

Optional Parameters for non-SICILIAN Inputs (SICILIAN = false)

Argument | Description | Example usage
samplesheet | If input files are in BAM format, this file specifies the locations of the input BAM files. Samplesheet formatting is specified below. | Tumor_5_samplesheet.csv
annotator_pickle | Genome-specific annotation file for gene names | hg38_refseq.pkl
exon_pickle | Genome-specific annotation file for exon boundaries | hg38_refseq_exon_bounds.pkl
splice_pickle | Genome-specific annotation file for splice sites | hg38_refseq_splices.pkl
gtf | GTF file used as the reference annotation file for the genome assembly | GRCh38_genomic.gtf
meta | If input files are in BAM format, this file contains per-cell annotations. It must contain columns for grouping_level_1 and grouping_level_2. | metadata_tumor_5.tsv
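
Because the meta file must contain the columns named by grouping_level_1 and grouping_level_2, a quick check before launching Nextflow can catch a mismatch early. This is a hypothetical helper, not part of the pipeline. It assumes the meta file is tab-separated with a header row, as the .tsv extension in the example suggests.

```python
import csv
import io


def check_meta_columns(handle, grouping_level_1, grouping_level_2, delimiter="\t"):
    """Hypothetical pre-flight check: confirm the meta header names both grouping columns.

    Assumes a delimited file whose first row is a header. Returns the header.
    """
    header = next(csv.reader(handle, delimiter=delimiter), None)
    if header is None:
        raise ValueError("metadata file is empty")
    missing = [col for col in (grouping_level_1, grouping_level_2) if col not in header]
    if missing:
        raise ValueError("metadata is missing column(s): " + ", ".join(missing))
    return header


if __name__ == "__main__":
    # Toy metadata; the "cell" column name and the values are made up for illustration.
    meta = "cell\ttissue\tcompartment\nc1\tLung\tImmune\n"
    print(check_meta_columns(io.StringIO(meta), "tissue", "compartment"))
```

For a real file, pass an open handle instead, e.g. `with open("metadata_tumor_5.tsv", newline="") as fh: check_meta_columns(fh, "tissue", "compartment")`.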

Samplesheets

The samplesheet must be in comma-separated values (CSV) format, without a header row. The sampleID must be a unique identifier for each BAM file entry.

For non-SICILIAN samples, samplesheets must have 2 columns: sampleID and path to the bam file.

Tumor_5_S1,tumor_5_S1_L001.bam
Tumor_5_S2,tumor_5_S2_L002.bam
Tumor_5_S3,tumor_5_S3_L003.bam

For SICILIAN SS2 samples, samplesheets must have 3 columns: sampleID, read 1 BAM file, and read 2 BAM file.

Tumor_5_S1,tumor_5_S1_L001_R1.bam,tumor_5_S1_L001_R2.bam
Tumor_5_S2,tumor_5_S2_L002_R1.bam,tumor_5_S2_L002_R2.bam
Tumor_5_S3,tumor_5_S3_L003_R1.bam,tumor_5_S3_L003_R2.bam
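
The samplesheet rules above (no header, unique sampleID, 2 columns for non-SICILIAN inputs and 3 for SICILIAN SS2 inputs) can be checked with Python's standard csv module. The function below is a hypothetical sketch for validating a sheet before a run, not code taken from the pipeline.

```python
import csv
import io


def read_samplesheet(handle, n_columns):
    """Hypothetical validator for a headerless samplesheet.

    n_columns is 2 (sampleID, BAM) for non-SICILIAN inputs or
    3 (sampleID, read 1 BAM, read 2 BAM) for SICILIAN SS2 inputs.
    Returns {sampleID: [BAM path, ...]} and raises ValueError on bad rows.
    """
    samples = {}
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:  # tolerate blank lines
            continue
        if len(row) != n_columns:
            raise ValueError(f"line {line_no}: expected {n_columns} columns, got {len(row)}")
        sample_id, *bams = (field.strip() for field in row)
        if sample_id in samples:
            raise ValueError(f"line {line_no}: duplicate sampleID {sample_id!r}")
        samples[sample_id] = bams
    return samples


if __name__ == "__main__":
    sheet = (
        "Tumor_5_S1,tumor_5_S1_L001.bam\n"
        "Tumor_5_S2,tumor_5_S2_L002.bam\n"
    )
    print(read_samplesheet(io.StringIO(sheet), n_columns=2))
```

Since the sheet has no header, the function cannot tell a stray header row from a sample; such a row would simply be read as an extra sampleID.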

Credits

salzmanlab/spliz was originally written by Salzman Lab.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

This repository contains code to perform the analyses in this paper:

The SpliZ generalizes “Percent Spliced In” to reveal regulated splicing at single-cell resolution

Julia Eve Olivieri*, Roozbeh Dehghannasiri*, Julia Salzman.

Nature Methods 2022 Mar 3. doi: https://www.nature.com/articles/s41592-022-01400-x.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

spliz's People

Contributors

kaitlinchaung, salzmanlab-admin


spliz's Issues

Input files and experiment design

Thanks for this great tool!

SpliZ small_test_data runs perfectly on my cluster. However, some things confuse me. I use SICILIAN for my 10X data, but there are some differences between my sicilian_called_splice_juncs.tsv output and the example file small.tsv in small_data.

I know that the called column is 1 for all junctions that should be included in the analysis and 0 otherwise, but my data has no called column, and I don't know how to create it.

The numReads column in my tsv gives the total reads for a specific junction, and numReads_per_cell gives the reads in each cell. I guess numReads in small.tsv corresponds to numReads_per_cell in my tsv.

Should I remove the other columns that do not exist in small.tsv?

Is the cell column made by concatenating sample IDs and barcodes?

I want to use SpliZ to study differential splicing between experimental groups within the same cell type. I think I should set grouping_level_1 = cell type and grouping_level_2 = experimental group. However, should I make a tsv file that includes barcodes, cell type, and experimental group? Should I merge this file with sicilian_called_splice_juncs.tsv, or set it as an input parameter?

Is it enough to change only lines 61:63 of nextflow.config, or should I change more lines and files to configure SpliZ for my resources?

Can I use scZ_median and scZ_pval as cutoff parameters for downstream analysis?

For the sashimi plots in the eLife and Nature Methods papers, do you use data from the SpliZ outputs or the SICILIAN outputs?

I'm sorry for asking so many questions.

Example input files for non-SICILIAN Inputs

Hello,

Thanks for developing this great tool! Would it be possible to provide examples of the argument files required for non-SICILIAN inputs, e.g. Tumor_5_samplesheet.csv (samplesheet) and metadata_tumor_5.tsv (meta)?

Thanks!

how to create input?

For a 10x dataset, SICILIAN postprocessing turns all sicilian_called_splice_juncs.tsv files into four files, namely runname.pq, runname.tsv, runname_GLM_outputs_consolidated.txt and runname_with_postprocessing.txt.

May I ask:

  1. What do these files contain?
  2. I found that only runname_with_postprocessing.txt has the called column. The format of runname.pq and runname.tsv looks the same as the small test data. Could you please specify which files should be used as input for SpliZ?
  3. According to the SICILIAN pipeline, cell and tissue annotation is not performed during postprocessing. Moreover, the columns in small.tsv differ from the output of SICILIAN postprocessing. It seems that I need to recreate the data by annotating all cells, removing the columns that do not exist in small.tsv, and reordering the columns. How did you create small.tsv?

Thanks.

Question about the .config file

Hi, thanks for the great tool. I am trying to use it for my project. I have 10x data that I aligned to the human reference with Cell Ranger, which gave me a BAM file. Now I want to configure the .config file, but it does not seem friendly to input files other than SICILIAN output. I can't figure out how to write the input_file and meta files. Could you please give me some examples? I also don't understand the definitions of grouping_level_1 and grouping_level_2; could you explain them? Thank you in advance!

Timing out of SUMMARIZE_RESULTS

I got the following error while running a different dataset. I have been able to run this workflow on two other datasets without issue so far, but now I get this error. It appears that the last step (SUMMARIZE_RESULTS) is timing out; the rest of the processes run to completion.

Error executing process > 'NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:SUMMARIZE_RESULTS (kubota-cochlea-comp)'

Caused by:
  Process exceeded running time limit (1h)

Command executed:

  final_summary.py \
      --perm_pvals kubota-cochlea-comp_pvals_free_annotation-compartment_100_S_0.1_z_0.0_b_5_SICILIAN.tsv \
      --first_evec first_evec_kubota-cochlea-comp_pvals_free_annotation-compartment_100_S_0.1_z_0.0_b_5_SICILIAN.tsv \
      --second_evec second_evec_kubota-cochlea-comp_pvals_free_annotation-compartment_100_S_0.1_z_0.0_b_5_SICILIAN.tsv \
      --third_evec third_evec_kubota-cochlea-comp_pvals_free_annotation-compartment_100_S_0.1_z_0.0_b_5_SICILIAN.tsv \
      --splizvd kubota-cochlea-comp_sym_SVD_normdonor_S_0.1_z_0.0_b_5_SICILIAN_subcol.tsv \
      --grouping_level_2 free_annotation \
      --grouping_level_1 compartment \
      --outname summary_kubota-cochlea-comp_free_annotation-compartment_S_0.1_z_0.0_b_5_SICILIAN.tsv \
      --outname_log summarize_results.log

Command exit status:
  -

Command output:
  (empty)

Command error:
  /wynton/home/tjan/adavid/.conda/envs/nf_spliz_env/lib/python3.9/site-packages/numpy/lib/nanfunctions.py:1117: RuntimeWarning: Mean of empty slice
    return np.nanmean(a, axis, out=out, keepdims=keepdims)
  [the two lines above are repeated 25 times in total]

Work dir:
  /wynton/home/tjan/adavid/kubota/work/c6/9e6e6087df1e227cb74ea553db3757

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Issues about running my own data

Thanks for the great tool. I ran SpliZ on a BAM file downloaded from 10X Genomics (https://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_possorted_genome_bam.bam). I filtered this BAM to the pbmc3k cell barcodes from the SeuratData package, following the method suggested by 10X Genomics (https://kb.10xgenomics.com/hc/en-us/articles/360022448251-Is-there-way-to-filter-the-BAM-file-produced-by-10x-pipelines-with-a-list-of-barcodes-).

Here is my pbmc3k.config:

params {
  dataname = "pbmc3k"
  SICILIAN = false
  samplesheet = "/home/data/wangxiong/project/SpliZ/pbmc3k_samplesheet.csv"
  annotator_pickle = "/home/data/wangxiong/project/SpliZ/sicilian_wx/SICILIAN_human_hg38_Refs/annotator_file/hg38_refseq.pkl"
  exon_pickle = "/home/data/wangxiong/project/SpliZ/sicilian_wx/SICILIAN_human_hg38_Refs/exon_pickle_file/hg38_refseq_exon_bounds.pkl"
  splice_pickle = "/home/data/wangxiong/project/SpliZ/sicilian_wx/SICILIAN_human_hg38_Refs/splice_pickle_file/hg38_refseq_splices.pkl"
  gtf = "/home/data/wangxiong/project/SpliZ/sicilian_wx/SICILIAN_human_hg38_Refs/gtf_file/grch38_known_genes.gtf"
  meta = "/home/data/wangxiong/project/SpliZ/pbmc3k_meta.tsv"
  pin_S = 0.1
  pin_z = 0.0
  bounds = 5
  light = false      
  svd_type = "normdonor"
  n_perms = 100
  grouping_level_2 = "grouping_level_2"
  grouping_level_1 = "grouping_level_1"
  libraryType = "10X"
  run_analysis = true
}

params.outdir = "./results/${params.dataname}"
params.tracedir = "./results/${params.dataname}/pipeline_info"
params.schema_ignore_params = "input,single_end,show_hidden_params,validate_params,igenomes_ignore,tracedir,igenomes_base,help,monochrome_logs,plaintext_email,max_multiqc_email_size,email_on_fail,email,multiqc_config,publish_dir_mode,genome,genomes" 

Here is my pbmc3k_samplesheet.csv:

pbmc3k_filtered,pbmc3k_filtered.bam

And my pbmc3k_meta.tsv:

# head -5 pbmc3k_meta.tsv
cell_id grouping_level_1 grouping_level_2
CB:Z:AAACATACAACCAC-1 Control Memory CD4 T
CB:Z:AAACATTGAGCTAC-1 Control B
CB:Z:AAACATTGATCAGC-1 Control Memory CD4 T
CB:Z:AAACCGTGCTTCCG-1 Control CD14+ Mono

I ran my own data using this code:

nextflow run salzmanlab/spliz -r main -latest -c pbmc3k.config

However, the following errors occurred:
[two error screenshots]

I ran cat .command.sh:

#!/bin/bash -euo pipefail
process_CI.py \
    --input_file collect-file.data \
    --meta pbmc3k_meta.tsv \
    --libraryType 10X \
    --outname pbmc3k.pq

I don't know why it failed.

Error running test data set (small.pq, small.config)

I tried running the test data set following the directions in the README and came across this error. It is unclear to me where `path domain` is missing; is it in the *.config file?

$ nextflow run salzmanlab/spliz -r main -latest -c small.config                                                                  
N E X T F L O W  ~  version 21.04.0
Pulling salzmanlab/spliz ...
 Already-up-to-date
Launching `salzmanlab/spliz` [wise_cuvier] - revision: 6c708f518b [main]


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/spliz v1.0dev
------------------------------------------------------


WARN: Found unexpected parameters:
* --outdir: ./results/test
* --numGenes: null
* --domain: null
- Ignore this warning: params.schema_ignore_params = "outdir,numGenes,domain" 

Core Nextflow options
  revision              : main
  runName               : wise_cuvier
  container             : kaitlinchaung/spliz:dev
  launchDir             : /***
  workDir               : /***
  projectDir            : /***
  userName              : ***
  profile               : standard
  configFiles           : /***/.nextflow/assets/salzmanlab/spliz/nextflow.config, /***/small.config


Input/output options
  dataname              : test
  input_file            : small.pq
  SICILIAN              : true
  pin_S                 : 0.1
  pin_z                 : 0.0
  bounds                : 5
  light                 : false
  svd_type              : normdonor
  grouping_level_1      : tissue
  grouping_level_2      : compartment
  n_perms               : 100
  libraryType           : 10X
  run_analysis          : true

Max job request options
  max_memory            : 800 GB
  max_time              : 10d

Other parameters
  max_multiqc_email_size: 25 MB

------------------------------------------------------
 Only displaying parameters that differ from defaults.
------------------------------------------------------
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:SPLIZ:CALC_SPLIZVD         -
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:PVAL_PERMUTATIONS -
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:FIND_SPLIZ_SITES  -
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:SUMMARIZE_RESULTS -


A process input channel evaluates to null -- Invalid declaration `path domain`

 -- Check script '.nextflow/assets/salzmanlab/spliz/./workflows/./../subworkflows/local/analysis.nf' at line: 49 or see '.nextflow.log' file for more details

And the .nextflow.log file shows this:

(base) [adavid@dev3 ~]$ cat .nextflow.log
Sep-11 21:18:17.084 [main] DEBUG nextflow.cli.Launcher - Setting http proxy: [prox1, 3128]
Sep-11 21:18:17.186 [main] DEBUG nextflow.cli.Launcher - Setting https proxy: [prox1, 3128]
Sep-11 21:18:17.187 [main] DEBUG nextflow.cli.Launcher - $> nextflow run salzmanlab/spliz -r main -latest -c small.config
Sep-11 21:18:17.321 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 21.04.0
Sep-11 21:18:19.037 [main] DEBUG nextflow.scm.AssetManager - Git config: /wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz/.git/config; branch: master; remote: origin; url: https://github.com/salzmanlab/SpliZ.git
Sep-11 21:18:19.076 [main] DEBUG nextflow.scm.AssetManager - Git config: /wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz/.git/config; branch: master; remote: origin; url: https://github.com/salzmanlab/SpliZ.git
Sep-11 21:18:19.077 [main] INFO  nextflow.cli.CmdRun - Pulling salzmanlab/spliz ...
Sep-11 21:18:19.078 [main] DEBUG nextflow.scm.AssetManager - Pull pipeline salzmanlab/spliz  -- Using local path: /wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz
Sep-11 21:18:21.157 [main] INFO  nextflow.cli.CmdRun -  Already-up-to-date
Sep-11 21:18:21.299 [main] INFO  nextflow.cli.CmdRun - Launching `salzmanlab/spliz` [wise_cuvier] - revision: 6c708f518b [main]
Sep-11 21:18:21.316 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz/nextflow.config
Sep-11 21:18:21.319 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /wynton/home/tjan/adavid/small.config
Sep-11 21:18:21.320 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz/nextflow.config
Sep-11 21:18:21.320 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /wynton/home/tjan/adavid/small.config
Sep-11 21:18:21.333 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Sep-11 21:18:21.802 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Sep-11 21:18:21.811 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Sep-11 21:18:21.820 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Sep-11 21:18:22.149 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Sep-11 21:18:22.292 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; plugins-dir=/wynton/home/tjan/adavid/.nextflow/plugins
Sep-11 21:18:22.296 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Sep-11 21:18:22.297 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins local root: .nextflow/plr/empty
Sep-11 21:18:22.303 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Sep-11 21:18:22.303 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Sep-11 21:18:22.306 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Sep-11 21:18:22.320 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Sep-11 21:18:22.432 [main] DEBUG nextflow.Session - Session uuid: 34d210c5-9c89-4ab7-8dc7-56a8b0a80ef8
Sep-11 21:18:22.432 [main] DEBUG nextflow.Session - Run name: wise_cuvier
Sep-11 21:18:22.433 [main] DEBUG nextflow.Session - Executor pool size: 32
Sep-11 21:18:22.558 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 21.04.0 build 5552
  Created: 02-05-2021 16:22 UTC (09:22 PDT)
  System: Linux 3.10.0-1160.36.2.el7.x86_64
  Runtime: Groovy 3.0.7 on OpenJDK 64-Bit Server VM 11.0.9.1-internal+0-adhoc..src
  Encoding: UTF-8 (UTF-8)
  Process: [email protected] [172.26.44.133]
  CPUs: 32 - Mem: 503.8 GB (417.3 GB) - Swap: 4 GB (1.3 GB)
Sep-11 21:18:22.598 [main] DEBUG nextflow.Session - Work-dir: /wynton/home/tjan/adavid/work [fhgfs]
Sep-11 21:18:22.649 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Sep-11 21:18:22.667 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Sep-11 21:18:23.387 [main] DEBUG nextflow.Session - Session start invoked
Sep-11 21:18:23.408 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /wynton/home/tjan/adavid/results/pipeline_info/execution_trace_2021-09-11_21-18-22.txt
Sep-11 21:18:23.446 [main] DEBUG nextflow.Session - Using default localLib path: /wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz/lib
Sep-11 21:18:23.455 [main] DEBUG nextflow.Session - Adding to the classpath library: /wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz/lib
Sep-11 21:18:23.456 [main] DEBUG nextflow.Session - Adding to the classpath library: /wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz/lib/nfcore_external_java_deps.jar
Sep-11 21:18:25.347 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Sep-11 21:18:25.385 [main] INFO  nextflow.Nextflow - 

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/spliz v1.0dev
------------------------------------------------------

Sep-11 21:18:25.737 [main] WARN  nextflow.Nextflow - Found unexpected parameters:
* --outdir: ./results/test
* --numGenes: null
* --domain: null
Sep-11 21:18:25.739 [main] INFO  nextflow.Nextflow - - Ignore this warning: params.schema_ignore_params = "outdir,numGenes,domain" 
Sep-11 21:18:25.824 [main] INFO  nextflow.Nextflow - Core Nextflow options
  revision              : main
  runName               : wise_cuvier
  container             : kaitlinchaung/spliz:dev
  launchDir             : /***
  workDir               : /***
  projectDir            : /***
  userName              : ***
  profile               : standard
  configFiles           : /***/.nextflow/assets/salzmanlab/spliz/nextflow.config, /***/small.config

Input/output options
  dataname              : test
  input_file            : small.pq
  SICILIAN              : true
  pin_S                 : 0.1
  pin_z                 : 0.0
  bounds                : 5
  light                 : false
  svd_type              : normdonor
  grouping_level_1      : tissue
  grouping_level_2      : compartment
  n_perms               : 100
  libraryType           : 10X
  run_analysis          : true

Max job request options
  max_memory            : 800 GB
  max_time              : 10d

Other parameters
  max_multiqc_email_size: 25 MB

------------------------------------------------------
 Only displaying parameters that differ from defaults.
------------------------------------------------------
Sep-11 21:18:27.492 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: PLOT, CLASS_INPUT_SS2, output_documentation, FIND_SPLIZ_SITES, CONVERT_PARQUET, PROCESS_CLASS_INPUT, ANN_SPLICES, CLASS_INPUT_10X, CALC_SPLIZVD, fastqc, PVAL_PERMUTATIONS, SUMMARIZE_RESULTS, get_software_versions, multiqc
Sep-11 21:18:27.586 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:process_medium` matches label `process_medium` for process with name NFCORE_SPLIZ:SPLIZ_PIPELINE:SPLIZ:CALC_SPLIZVD
Sep-11 21:18:27.592 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Sep-11 21:18:27.592 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Sep-11 21:18:27.601 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
Sep-11 21:18:27.612 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=32; memory=503.8 GB; capacity=32; pollInterval=100ms; dumpInterval=5m
Sep-11 21:18:27.763 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:process_medium` matches label `process_medium` for process with name NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:PVAL_PERMUTATIONS
Sep-11 21:18:27.766 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Sep-11 21:18:27.767 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Sep-11 21:18:27.780 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:process_medium` matches label `process_medium` for process with name NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:FIND_SPLIZ_SITES
Sep-11 21:18:27.782 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Sep-11 21:18:27.783 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Sep-11 21:18:27.790 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:process_low` matches label `process_low` for process with name NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:SUMMARIZE_RESULTS
Sep-11 21:18:27.792 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: local
Sep-11 21:18:27.792 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Sep-11 21:18:27.802 [main] DEBUG nextflow.script.ProcessConfig - Config settings `withLabel:process_medium` matches label `process_medium` for process with name NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:PLOT
Sep-11 21:18:27.813 [main] DEBUG nextflow.Session - Session aborted -- Cause: A process input channel evaluates to null -- Invalid declaration `path domain`
Sep-11 21:18:27.851 [main] DEBUG nextflow.Session - The following nodes are still active:
  [operator] map
  [operator] reduce
  [operator] map

Sep-11 21:18:27.911 [main] WARN  nextflow.Nextflow - Found unexpected parameters:
* --outdir: ./results/test
* --numGenes: null
* --domain: null
Sep-11 21:18:27.916 [main] INFO  nextflow.Nextflow - - Ignore this warning: params.schema_ignore_params = "outdir,numGenes,domain" 
Sep-11 21:18:27.926 [main] ERROR nextflow.cli.Launcher - A process input channel evaluates to null -- Invalid declaration `path domain`
java.lang.IllegalArgumentException: A process input channel evaluates to null -- Invalid declaration `path domain`
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:72)
	at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.callConstructor(ConstructorSite.java:105)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:59)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:263)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:277)
	at nextflow.script.params.BaseInParam.checkFromNotNull(BaseInParam.groovy:170)
	at jdk.internal.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:193)
	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:61)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
	at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:66)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
	at nextflow.script.params.BaseInParam.setFrom(BaseInParam.groovy:181)
	at nextflow.script.ProcessDef.run(ProcessDef.groovy:175)
	at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:52)
	at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:41)
	at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:95)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:397)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:339)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
	at Script_b1ebe85a$_runScript_closure1$_closure2.doCall(Script_b1ebe85a:49)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:406)
	at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:186)
	at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:170)
	at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:52)
	at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:41)
	at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:95)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:397)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:339)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:212)
	at Script_1a9f4c93$_runScript_closure1$_closure2.doCall(Script_1a9f4c93:43)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:406)
	at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:186)
	at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:170)
	at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:52)
	at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:41)
	at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:95)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:397)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:339)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:176)
	at Script_613c6d34$_runScript_closure11$_closure27.doCall(Script_613c6d34:238)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:406)
	at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:186)
	at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:170)
	at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:52)
	at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:41)
	at nextflow.script.WorkflowBinding.invokeMethod(WorkflowBinding.groovy:95)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:397)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:339)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:61)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:171)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:176)
	at Script_613c6d34$_runScript_closure12$_closure28.doCall(Script_613c6d34:242)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:406)
	at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:186)
	at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:170)
	at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:52)
	at nextflow.script.ChainableDef$invoke_a.call(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
	at nextflow.script.BaseScript.runDsl2(BaseScript.groovy:191)
	at nextflow.script.BaseScript.run(BaseScript.groovy:200)
	at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:221)
	at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:212)
	at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:120)
	at nextflow.cli.CmdRun.run(CmdRun.groovy:302)
	at nextflow.cli.Launcher.run(Launcher.groovy:475)
	at nextflow.cli.Launcher.main(Launcher.groovy:657)

ERROR 'The truth value of a DataFrame is ambiguous'

Description of the bug

An error occurred when running the SpliZ Nextflow pipeline:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The problem appears to be in calc_splizvd.py, line 387:

df["cov"] = df["gene"].map(grouped.apply(lambda x: x['z_Start'].cov(x['z_End'])))

It should instead read:

df["cov"] = df["gene"].map(grouped.apply(lambda x: x['z_Start'].cov(x['z_End'])).to_dict())

The same problem also exists, as reported in juliaolivieri/SpliZ_pipeline#10.
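For readers hitting the same error, here is a minimal sketch of the idea behind the proposed fix. It assumes a data frame with the `gene`, `z_Start` and `z_End` columns used in the snippet above, and the toy values are made up. It builds the per-gene covariance as a plain dict before calling `map`, so `map` always receives a gene-to-value mapping, whatever shape `groupby(...).apply(...)` happens to return in a given pandas version. It is an illustration, not the code in calc_splizvd.py.

```python
import pandas as pd

# Toy input with the column names used in calc_splizvd.py (values are hypothetical).
df = pd.DataFrame({
    "gene": ["A", "A", "A", "B", "B", "B"],
    "z_Start": [1.0, 2.0, 3.0, 1.0, 0.0, -1.0],
    "z_End": [2.0, 4.0, 6.0, 1.0, 2.0, 3.0],
})

# Per-gene sample covariance of z_Start and z_End, collected into a plain dict
# so that Series.map() receives an unambiguous gene -> value mapping.
cov_by_gene = {
    gene: group["z_Start"].cov(group["z_End"])
    for gene, group in df.groupby("gene")
}
df["cov"] = df["gene"].map(cov_by_gene)
print(df)
```

Every row of a gene receives that gene's covariance (here 2.0 for A and -1.0 for B).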

Result explanation

Dear Dr. Chaung, I obtained a table of differential SpliZ across different cell types and conditions, but I do not know how to interpret it. I want to find genes with significantly different splicing between conditions within the same cell type, and also genes with significantly different splicing between cell types within the same condition. However, this table does not seem to give me a p-value.
[image]

Looking at the table above, I especially want to find genes that are differentially spliced between MS and Ctrl, but the scZ_pvalue is NaN. Could you help me? Thank you in advance!

Running small_test_data

nextflow run salzmanlab/spliz \
    -r main \
    -latest \
    -profile small_test_data

When I ran this command, the following output appeared:

N E X T F L O W  ~  version 21.04.0
Pulling salzmanlab/spliz ...
Already-up-to-date
Launching salzmanlab/spliz [distracted_agnesi] - revision: 694d32b88a [main]
WARNING: Could not load nf-core/config profiles: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config
WARNING: Could not load nf-core/config profiles: https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config

Then an error occurred:
[image]

I don't know whether this is caused by my network or by something else.

Issues with grouping levels

Dear kaitlinchaung, in #4 (comment), you mentioned that:

An example I can provide is if you have data from multiple tissues (e.g. lung, kidney, and heart) and multiple cell types (e.g. endothelial, blood, capillary) within each tissue.

If grouping_level_1 = tissue and grouping_level_2 = cell_type, then you would be looking for differential SpliZ in endothelial vs blood vs capillary FOR EACH tissue.
If grouping_level_2 = tissue and there is no grouping_level_1, then you would be looking for differential SpliZ in endothelial vs blood vs capillary, irrespective of tissue.
If grouping_level_2 = cell_type and there is no grouping_level_1, then you would be looking for differential SpliZ in lung vs kidney vs heart, irrespective of cell_type.

I see the last two cases differently:
If grouping_level_2 = tissue and there is no grouping_level_1, then you would be looking for differential SpliZ in lung vs kidney vs heart, irrespective of cell_type, because in this case the meta file provides only tissue information and no cell type information.

If grouping_level_2 = cell_type and there is no grouping_level_1, then you would be looking for differential SpliZ in endothelial vs blood vs capillary, irrespective of tissue, because in this case the meta file provides only cell type information and no tissue information.

I look forward to your reply.
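The interpretation argued above can be stated compactly: the values in the grouping_level_2 column are what get compared, and grouping_level_1, when given, only splits the cells into separate contexts in which that comparison is repeated. The sketch below encodes that reading with a hypothetical helper, `comparison_sets`, on a made-up meta table. It illustrates the semantics under discussion, not the pipeline's own implementation.

```python
import pandas as pd

def comparison_sets(meta, grouping_level_2, grouping_level_1=None):
    """Map each context to the sorted grouping_level_2 values compared within it.

    Hypothetical helper: without grouping_level_1, all cells form one context ("all");
    with it, grouping_level_2 values are compared separately within each level-1 value.
    """
    if grouping_level_1 is None:
        return {"all": sorted(meta[grouping_level_2].unique())}
    return {
        context: sorted(sub[grouping_level_2].unique())
        for context, sub in meta.groupby(grouping_level_1)
    }

# Made-up meta table with one tissue column and one cell type column.
meta = pd.DataFrame({
    "tissue": ["lung", "lung", "kidney", "kidney", "heart", "heart"],
    "cell_type": ["endothelial", "blood", "endothelial", "capillary", "blood", "capillary"],
})

print(comparison_sets(meta, "cell_type", "tissue"))  # cell types compared within each tissue
print(comparison_sets(meta, "tissue"))               # tissues compared, irrespective of cell type
print(comparison_sets(meta, "cell_type"))            # cell types compared, irrespective of tissue
```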

Error executing process > 'NFCORE_SPLIZ:SPLIZ_PIPELINE:PREPROCESS:CONVERT_BAM:PROCESS_CLASS_INPUT (1)'

Description of the bug

I'm trying to run SpliZ on BAM files produced by STARsolo. Everything runs well until NFCORE_SPLIZ:SPLIZ_PIPELINE:PREPROCESS:CONVERT_BAM:PROCESS_CLASS_INPUT, specifically in the process_CI.py script.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line:
nextflow -bg run /home/gabrod/bin/SpliZ/main.nf \
    -c /home/gabrod/bin/SpliZ/nextflow.config \
    --dataname HumanColon \
    --outdir /share/ScratchGeneral/gabrod/forSpliZ/HumanColon \
    -profile wolfpack,conda \
    --run_analysis true \
    --SICILIAN false \
    --pin_S 0.1 \
    --pin_z 0 \
    --bounds 5 \
    --light false \
    --svd_type normdonor \
    --n_perms 100 \
    --grouping_level_1 Tissue \
    --grouping_level_2 Stage \
    --samplesheet /share/ScratchGeneral/gabrod/forSpliZ/Human_colon.csv \
    --annotator_pickle /share/ScratchGeneral/gabrod/GTF/hg38_ensembl.pkl \
    --exon_pickle /share/ScratchGeneral/gabrod/GTF/hg38_ensembl_exon_bounds.pkl \
    --splice_pickle /share/ScratchGeneral/gabrod/GTF/hg38_ensembl_splices.pkl \
    --gtf /share/ScratchGeneral/gabrod/GTF/hg38.ensGene.gtf \
    --libraryType 10X \
    --meta /share/ScratchGeneral/gabrod/forSpliZ/Metadata.txt \
    --outdir /share/ScratchGeneral/gabrod/forSpliZ/HumanColon
  2. See error:
Command error:
  Traceback (most recent call last):
    File "/share/ScratchGeneral/gabrod/forSpliZ/conda/nf-core-spliz-1.0dev-39088092ea5be9e9ec0058b86103f2cc/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
      return self._engine.get_loc(casted_key)
    File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: 'barcode'
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "/home/gabrod/bin/SpliZ/bin/process_CI.py", line 72, in <module>
      main()
    File "/home/gabrod/bin/SpliZ/bin/process_CI.py", line 35, in main
      df["barcode_refName"] = df["barcode"].astype(str) + df["refName_ABR1"]
    File "/share/ScratchGeneral/gabrod/forSpliZ/conda/nf-core-spliz-1.0dev-39088092ea5be9e9ec0058b86103f2cc/lib/python3.9/site-packages/pandas/core/frame.py", line 3455, in __getitem__
      indexer = self.columns.get_loc(key)
    File "/share/ScratchGeneral/gabrod/forSpliZ/conda/nf-core-spliz-1.0dev-39088092ea5be9e9ec0058b86103f2cc/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
      raise KeyError(key) from err
  KeyError: 'barcode'
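The traceback shows that line 35 of process_CI.py assumes the intermediate table has both a `barcode` and a `refName_ABR1` column, and fails with a bare `KeyError` when `barcode` is absent. A small pre-check like the one below would name the missing columns and list the ones that are present, which helps when comparing STARsolo-derived input against what the script expects. The helper name and message are hypothetical, not part of SpliZ, and the check does not explain why the column is missing for this input.

```python
import pandas as pd

# Columns used by the concatenation on line 35 of process_CI.py.
REQUIRED = ("barcode", "refName_ABR1")

def add_barcode_refname(df):
    """Hypothetical helper: the line-35 concatenation, with a clearer error if columns are missing."""
    missing = [col for col in REQUIRED if col not in df.columns]
    if missing:
        raise KeyError(f"input table lacks column(s) {missing}; found {list(df.columns)}")
    out = df.copy()
    out["barcode_refName"] = out["barcode"].astype(str) + out["refName_ABR1"]
    return out

# Example with made-up values.
print(add_barcode_refname(pd.DataFrame({"barcode": ["AAAC"], "refName_ABR1": ["chr1:GENE"]})))
```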

Expected behaviour

To get the SpliZ output.

Log files

Have you provided the following extra information/files:

  • The command used to run the pipeline
  • The .nextflow.log file

System

  • Hardware:
  • Executor:
  • OS:
  • Version

Nextflow Installation

  • Version:

Container engine

  • Engine:
  • version:
  • Image tag:

Additional context

Issue about running my own data

Hello, I set up my files following the example provided with SpliZ, but the results are blank. My .bam file is from Cell Ranger.
This is my meta.tsv:
[image]
This is my samplesheets.csv:
[image]
This is my .config:
[image]

Thanks a lot!

Building a configuration file

I have one .bam file with 7,000+ cells, and the cells have been divided into two types. I want to compare alternative splicing (AS) events between these two types, but I got an unexpected error.

My meta.tsv:

cell_id grouping_level_1 grouping_level_2
Sper_AACCATGAGATTACCC-1 Spermatocytes-E3 Eu
Sper_AACCATGGTATCAGTC-1 Spermatocytes-E3 Para
Sper_AACTCCCGTCAGTGGA-1 Spermatocytes-E3 Eu
Sper_AAGACCTAGCGGATCA-1 Spermatocytes-E3 Eu

My .conf:

params {
    dataname = "Sper"
    SICILIAN = false
    samplesheet = "samplesheet.csv"
    annotator_pickle = "gtf_part.pkl"
    exon_pickle = "gtf_part_exon_bounds.pkl"
    splice_pickle = "gtf_part_splices.pkl"
    gtf = "gtf_part.gtf"
    meta = "meta.tsv"
    pin_S = 0.1
    pin_z = 0.0
    bounds = 5
    light = false
    svd_type = "normdonor"
    n_perms = 100
    grouping_level_2 = "grouping_level_1"
    grouping_level_1 = "grouping_level_2"
    libraryType = "10X"
    run_analysis = true
}

My samplesheet.csv:

Sper, cellsorted.bam

My error:

[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:PREPROCESS:CONVERT_BAM:CLASS_INPUT_10X     -
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:PREPROCESS:CONVERT_BAM:PROCESS_CLASS_INPUT -
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:PREPROCESS:CONVERT_BAM:ANN_SPLICES         -
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:SPLIZ:CALC_SPLIZVD                         -
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:PVAL_PERMUTATIONS                 -
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:FIND_SPLIZ_SITES                  -
[-        ] process > NFCORE_SPLIZ:SPLIZ_PIPELINE:ANALYSIS:SUMMARIZE_RESULTS                 -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/spliz] Pipeline completed with errors-
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Execution aborted due to an unexpected error
 -- Check script '/public2/home/rotation/.nextflow/assets/salzmanlab/spliz/./workflows/./../subworkflows/local/./../../modules/local/class_input_10X.nf' at line: 2 or see '.nextflow.log' file for more details

Is my .conf wrong? I have no idea how to deal with this. Also, I don't see any guidance in the pipeline for the case of a single .bam file. Could you offer some help? Thanks a lot.

Yu Liu
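One way to rule out simple input mistakes before rerunning is to check the two input files against the config. The sketch below is a hypothetical check, not something the pipeline performs. It assumes the samplesheet has one `name,path` row per BAM and that meta.tsv is tab-separated with a header row, and it verifies that the columns named by grouping_level_1 and grouping_level_2 exist. It also flags whitespace around the BAM path, as in `Sper, cellsorted.bam` above; whether the pipeline actually trips over that space is an assumption, not something confirmed here.

```python
import csv
import io

def check_inputs(samplesheet_text, meta_text, grouping_level_1, grouping_level_2):
    """Return a list of human-readable problems found in the two input files (hypothetical check)."""
    problems = []
    rows = csv.reader(io.StringIO(samplesheet_text))
    for n, row in enumerate(rows, start=1):
        if len(row) != 2:
            problems.append(f"samplesheet line {n}: expected 2 fields, got {len(row)}")
        elif row[1] != row[1].strip():
            problems.append(f"samplesheet line {n}: whitespace around BAM path {row[1]!r}")
    # Assumes a tab-separated header row in meta.tsv.
    header = meta_text.splitlines()[0].split("\t")
    for name in (grouping_level_1, grouping_level_2):
        if name not in header:
            problems.append(f"meta.tsv: no column named {name!r}")
    return problems

samplesheet = "Sper, cellsorted.bam\n"
meta = "cell_id\tgrouping_level_1\tgrouping_level_2\nSper_AACCATGAGATTACCC-1\tSpermatocytes-E3\tEu\n"
# Values as in the .conf above: grouping_level_1 = "grouping_level_2", grouping_level_2 = "grouping_level_1".
for problem in check_inputs(samplesheet, meta, "grouping_level_2", "grouping_level_1"):
    print(problem)
```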

Divide by zero error in variance_adjusted_permutations_bytiss.py

I am running my own data through the pipeline. It computed the SpliZ scores, but with the analysis step enabled (run_analysis = true) in the config file, it failed with the error shown below.

Also, I noticed that the free_annotation variable is neither mentioned in the config file nor present in the output of calculated SpliZ scores. We are interested in differential splicing analysis: do I need to change the grouping variables to achieve this? My free annotation is in the format (S1 ... S14).

  Traceback (most recent call last):
    File "/wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz/bin/variance_adjusted_permutations_bytiss.py", line 196, in <module>
      main()
    File "/wynton/home/tjan/adavid/.nextflow/assets/salzmanlab/spliz/bin/variance_adjusted_permutations_bytiss.py", line 154, in main
      out_df["pval_adj"] = multipletests(out_df["pval"],alpha, method="fdr_bh")[1]
    File "/wynton/home/tjan/adavid/.conda/envs/nf_spliz_env/lib/python3.9/site-packages/statsmodels/stats/multitest.py", line 147, in multipletests
      alphacSidak = 1 - np.power((1. - alphaf), 1./ntests)
  ZeroDivisionError: float division by zero

Work dir:
  /wynton/home/tjan/adavid/work/bb/67b2a8e594213a901087bdf709e303

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

Output of .command.sh:

#!/bin/bash -euo pipefail
variance_adjusted_permutations_bytiss.py \
    --input /wynton/home/tjan/adavid/work/8c/f4bf9521c938f448d35c84f4073dca/utricle-filtered_sym_SVD_normdonor_S_0.1_z_0.0_b_5_SICILIAN.pq \
    --num_perms 100 \
    --grouping_level_2 tissue \
    --grouping_level_1 compartment \
    --outname_all_pvals utricle-filtered_outdf_tissue-compartment_100_S_0.1_z_0.0_b_5_SICILIAN.tsv \
    --outname_perm_pvals utricle-filtered_pvals_tissue-compartment_100_S_0.1_z_0.0_b_5_SICILIAN.tsv \
    --outname_log pval_permutations.log
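In the traceback, statsmodels computes `1./ntests`, where `ntests` is the number of p-values passed in, so the call fails exactly when `out_df["pval"]` is empty, that is, when no comparisons were tested at all. The sketch below shows the kind of guard that avoids the crash. To stay dependency-free it uses a plain Python Benjamini-Hochberg adjustment in place of `multipletests(..., method="fdr_bh")`, and the function name `bh_adjust` is hypothetical. Why no tests were produced for this data (for example, which groups the chosen grouping levels yield) is a separate question that this sketch does not answer.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, in input order; returns [] for empty input instead of failing."""
    m = len(pvals)
    if m == 0:
        return []  # no tests were run, so there is nothing to adjust
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and capping at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(bh_adjust([]))
print(bh_adjust([0.01, 0.04, 0.03]))
```

In the script's terms, the equivalent guard would skip or short-circuit the `pval_adj` step when `out_df` has no rows, rather than calling `multipletests` on an empty column.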

Issue about calculating SpliZ

I have four cell types (CD4 T, CD8 T, B, and NK cells) and three groups (control, early, and late). If I only want to compare splicing between control and early within CD4 T and CD8 T cells, should I include all groups and cell types in meta.tsv?

If only the barcodes of CD4 T and CD8 T cells in the control and early groups are included in meta.tsv, will the SpliZ values be affected?

cell_id grouping_level_2 grouping_level_1
Control_AAACATTGAGCTAC Control CD8 T
Control_AGTACTCTCAACCA Control CD8 T
Control_ATGTTCACCGTAGT Control CD4 T
Control_CGCTACTGAACAGA Control CD4 T
Early_TAGTTAGATGAACC Early CD8 T
Early_ACCTATTGTGCCCT Early CD8 T
Early_TAGAATTGTATCGG Early CD4 T
Early_ATAGATACCATGGT Early CD4 T

Or like this:

cell_id grouping_level_2 grouping_level_1
Control_AAACATTGAGCTAC Control CD8 T
Control_AGTACTCTCAACCA Control CD8 T
Control_ATGTTCACCGTAGT Control CD4 T
Control_CGCTACTGAACAGA Control CD4 T
Early_TAGTTAGATGAACC Early CD8 T
Early_ACCTATTGTGCCCT Early CD8 T
Early_TAGAATTGTATCGG Early CD4 T
Early_ATAGATACCATGGT Early CD4 T
Late_TAGTTAGATGAACC Late CD8 T
Late_ACCTATTGTGCCCT Late CD8 T
Late_TAGAATTGTATCGG Late CD4 T
Late_ATAGATACCATGGT Late CD4 T
Control_AAACATTGAGCTAC Control NK
Control_AGTACTCTCAACCA Control NK
Control_ATGTTCACCGTAGT Control NK
Control_CGCTACTGAACAGA Control NK
Early_TAGTTAGATGAACC Early NK
Early_ACCTATTGTGCCCT Early NK
Early_TAGAATTGTATCGG Early NK
Early_ATAGATACCATGGT Early NK
Late_TAGTTAGATGAACC Late NK
Late_ACCTATTGTGCCCT Late NK
Late_TAGAATTGTATCGG Late NK
Late_ATAGATACCATGGT Late NK

Thanks.
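If the first, restricted layout is wanted, it can be derived from the full table rather than written by hand. Below is a minimal pandas sketch. It assumes meta.tsv is tab-separated with the three columns shown, uses a few rows from the second table above, and relies on `subset_meta`, a hypothetical helper. Whether restricting the cells changes the resulting SpliZ values is the question being asked here, and the sketch does not settle it.

```python
import io

import pandas as pd

# A few rows in the format of the meta.tsv above (tab-separated).
META_TSV = (
    "cell_id\tgrouping_level_2\tgrouping_level_1\n"
    "Control_AAACATTGAGCTAC\tControl\tCD8 T\n"
    "Control_ATGTTCACCGTAGT\tControl\tCD4 T\n"
    "Early_TAGTTAGATGAACC\tEarly\tCD8 T\n"
    "Late_TAGTTAGATGAACC\tLate\tCD8 T\n"
    "Control_AGTACTCTCAACCA\tControl\tNK\n"
)

def subset_meta(meta, conditions, cell_types):
    """Keep only cells whose condition and cell type are both in the requested sets."""
    keep = meta["grouping_level_2"].isin(conditions) & meta["grouping_level_1"].isin(cell_types)
    return meta[keep].reset_index(drop=True)

meta = pd.read_csv(io.StringIO(META_TSV), sep="\t")
restricted = subset_meta(meta, {"Control", "Early"}, {"CD4 T", "CD8 T"})
print(restricted.to_csv(sep="\t", index=False))
```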
