
bimsbbioinfo / pigx_scrnaseq


Pipeline for analysis of Dropseq single cell data

Home Page: http://bioinformatics.mdc-berlin.de/pigx

R 18.45% Python 59.32% Shell 17.65% Scheme 0.73% M4 2.49% Makefile 1.36%
dropseq single-cell-rna-seq snakemake-workflows guix

pigx_scrnaseq's Issues

--printshellcmds not working

The --printshellcmds flag is recognized but doesn't actually do anything (as of the current tip of the cluster_compatibility branch; I didn't notice this until rebasing with master).

Develop indrops parser

Different single-cell technologies have different cell barcode and UMI adapter locations in R1.
A parser should be able to convert this kind of data into Drop-seq-like adapters.

inDrop uses variable-length adapters. Adapter specification (inDrop UMI description):
sequence W1 adapter: AAGGCGTCACAAGCAATCACTC

B1 anatomy: BBBBBBBB[BBB]WWWWWWWWWWWWWWWWWWWWWWCCCCCCCCUUUUUUTTTTTTTTTT______________
B = Barcode1, can be 8, 9, 10 or 11 bases long.
W = 'W1' sequence, specified below
C = Barcode2, always 8 bases
U = UMI, always 6 bases
T = Beginning of polyT tail.
_ = Either sequencing survives across the polyT tail, or the signal starts dropping off
(and starts being anything, likely with poor quality)
minimal_polyT_len_on_R1 = 7
hamming_threshold_for_W1_matching = 3
w1 = "GAGTGATTGCTTGTGACGCCTT"
rev_w1 = "AAGGCGTCACAAGCAATCACTC" #Hard-code so we don't recompute on every one of millions of calls
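The anatomy above can be sketched as a small parser. This is an illustrative sketch only, not pipeline code (the function names are made up here); it uses the constants from the specification and requires a perfect minimal polyT, whereas a production parser would likely tolerate mismatches there too:

```python
W1 = "GAGTGATTGCTTGTGACGCCTT"  # 'w1' constant from the spec above
MIN_POLYT = 7                  # minimal_polyT_len_on_R1
HAMMING_THRESHOLD = 3          # hamming_threshold_for_W1_matching

def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def parse_indrop_r1(seq):
    """Return (cell_barcode, umi) or None if R1 does not match the anatomy."""
    # Barcode1 can be 8-11 bases long, so try each candidate W1 offset.
    for bc1_len in (8, 9, 10, 11):
        w1_candidate = seq[bc1_len:bc1_len + len(W1)]
        if len(w1_candidate) < len(W1):
            break  # read too short for this (and any longer) offset
        if hamming(w1_candidate, W1) <= HAMMING_THRESHOLD:
            rest = seq[bc1_len + len(W1):]
            bc2, umi, tail = rest[:8], rest[8:14], rest[14:]
            # Require the full UMI and the minimal polyT stretch (exact here,
            # for simplicity; real implementations may allow mismatches).
            if len(umi) == 6 and tail[:MIN_POLYT] == "T" * MIN_POLYT:
                # Concatenate the two barcodes into one Drop-seq-like
                # cell barcode.
                return seq[:bc1_len] + bc2, umi
    return None
```

The output pair can then be attached to R2 as Drop-seq-style cell/UMI tags.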

Manual parameter should override default parameters

Manually set parameters should override the parameters determined during the pipeline run.

Example: the number of genes per cell, as determined in the find_absolute_read_cutoff rule, should be overridden by the corresponding parameter in the config.
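The intended precedence can be sketched as follows (an illustration of the merge semantics, not the pipeline's actual implementation; the function name and keys are made up here):

```python
def resolve_params(computed, manual):
    """Manual (config) parameters override pipeline-computed defaults.

    computed: dict of values derived by rules such as find_absolute_read_cutoff
    manual:   dict of values from the user's config; None means 'not set'
    """
    resolved = dict(computed)
    # Only explicitly set manual values win; unset (None) keys keep defaults.
    resolved.update({k: v for k, v in manual.items() if v is not None})
    return resolved
```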

STAR breaks mm Matrix format for Velocyto output

The STAR mm matrix for the Velocyto output has three count columns: spliced, unspliced, and the rest (ambiguous).
The mm parser from scipy does not allow such a format, so the Velocyto counts in the pigx output are currently wrong (they contain spliced instead of unspliced counts).

Java fails with old kernel.

Error discovered yesterday: Java fails while building a virtual machine because it cannot allocate a stack of a certain size. If the memory values are too low, we simply run out of memory; if they are too high, we hit this error.

```
    jobid: 0
    output: /home/bosberg/projects/pigx_scrnaseq_masterbak/tests/out/Mapped/WT_HEK_4h_br1/WT_HEK_4h_br1.fastq.bam
    log: /home/bosberg/projects/pigx_scrnaseq_masterbak/tests/out/Log/WT_HEK_4h_br1.merge_fastq_to_bam.log
```

Log file contents:

```
$ tail  /home/bosberg/projects/pigx_scrnaseq_masterbak/tests/out/Log/WT_HEK_4h_br1.merge_fastq_to_bam.log
DEBUG    2018-03-22 17:56:54    ClassFinder    could not load class: picard.vcf.processor.VariantAccumulatorExecutor$MultiThreadedChunkBased$MultiException$1java.lang.NoClassDefFoundError: com/google/common/base/Function
DEBUG    2018-03-22 17:56:54    ClassFinder    could not load class: picard.vcf.processor.VariantIteratorProducer$Threadsafe$3java.lang.NoClassDefFoundError: com/google/common/base/Function
DEBUG    2018-03-22 17:56:54    ClassFinder    could not load class: picard.vcf.processor.VariantIteratorProducer$Threadsafe$NonUniqueVariantPredicatejava.lang.NoClassDefFoundError: com/google/common/base/Predicate
DEBUG    2018-03-22 17:56:54    ClassFinder    could not load class: picard.vcf.processor.VariantIteratorProducer$Threadsafe$OverlapsPredicatejava.lang.NoClassDefFoundError: com/google/common/base/Predicate
DEBUG    2018-03-22 17:56:54    ClassFinder    could not load class: picard.vcf.processor.VcfFileSegmentGenerator$1$1java.lang.NoClassDefFoundError: com/google/common/base/Predicate
DEBUG    2018-03-22 17:56:54    ClassFinder    could not load class: picard.vcf.processor.VcfFileSegmentGenerator$ByWholeContig$1java.lang.NoClassDefFoundError: com/google/common/base/Function
DEBUG    2018-03-22 17:56:54    ClassFinder    could not load class: picard.vcf.processor.VcfFileSegmentGenerator$WidthLimitingDecorator$1java.lang.NoClassDefFoundError: com/google/common/base/Function
[Thu Mar 22 17:56:54 CET 2018] picard.sam.FastqToSam FASTQ=/home/bosberg/projects/pigx_scrnaseq_masterbak/tests/sample_data/reads/HEK_4h_br1_R1.fastq.gz FASTQ2=/home/bosberg/projects/pigx_scrnaseq_masterbak/tests/sample_data/reads/HEK_4h_br1_R2.fastq.gz QUALITY_FORMAT=Standard OUTPUT=/home/bosberg/projects/pigx_scrnaseq_masterbak/tests/out/Mapped/WT_HEK_4h_br1/WT_HEK_4h_br1.fastq.bam SAMPLE_NAME=WT_HEK_4h_br1 SORT_ORDER=queryname    USE_SEQUENTIAL_FASTQS=false READ_GROUP_NAME=A MIN_Q=0 MAX_Q=93 STRIP_UNPAIRED_MATE_NUMBER=false ALLOW_AND_IGNORE_EMPTY_LINES=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
```

It seems that all of the rules that run with Java have this problem, and the latest efforts at diagnosis indicate that it is related to the old kernel on the MAX cluster.
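If the failure is the JVM refusing to allocate its thread stack, one knob to experiment with is pinning the stack size explicitly alongside the heap bound. This is a hedged workaround sketch, not a verified fix, and the values are illustrative, not tuned:

```shell
# _JAVA_OPTIONS is read by HotSpot JVMs; every subsequent java invocation
# (e.g. the picard FastqToSam calls) inherits these limits.
export _JAVA_OPTIONS="-Xss2m -Xmx4g"

# Alternatively, pass the flags per call:
#   java -Xss2m -Xmx4g -jar picard.jar FastqToSam ...
```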

Add README

At the very least we should describe the format of the settings.yaml and the sample sheet.
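To seed such a README, a sketch of the two files follows. All field and column names here are illustrative placeholders, not the pipeline's actual schema; the real keys should be copied from the test configuration under tests/:

```yaml
# settings.yaml (illustrative structure only)
locations:
  reads-dir: sample_data/reads/
  output-dir: out/
annotation:
  gtf: sample_data/genome.gtf
  genome-fasta: sample_data/genome.fa

# sample_sheet.csv (one row per sample; column names are placeholders):
#   name,reads1,reads2
#   WT_HEK_4h_br1,HEK_4h_br1_R1.fastq.gz,HEK_4h_br1_R2.fastq.gz
```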

m4/ax_r_package.m4 is outdated

The old m4 macro leads to configure errors under some circumstances. It should be updated, e.g. to the version used by pigx-rnaseq.

Log file output not going through on SM jobs on the cluster

Currently, when jobs crash after cluster submission, nothing is printed to the log files; instead, we have to fish out the execution command and submit it manually before the file can be read.
We should figure out why.

First question should be whether the same jobs print logs on local submission

Develop parsers for different single cell technologies.

Different single-cell technologies have different cell barcode and UMI adapter locations in R1.
A parser should be able to convert this kind of data into Drop-seq-like adapters.

The inDrop adapter specification is given in the "Develop indrops parser" issue above.

too little explanation in the report

There should be a bit more explanation in the reports. A couple of sentences explaining what is shown in the plots would be helpful. This could also be a simple enough update to get the interns started.

environment.yml file missing for conda installation

Hello there Pigx team!

I'm a junior bioinformatician in charge of our research group's analyses, and I'd like to test PiGx for analyzing some FASTQ files for RNA velocity estimation (I was originally going to use the Drop-seq pipeline). However, I'm having trouble installing it, since the GNU Guix installation was troublesome on our HPC cluster.

I then tried to install via conda. However, the environment.yml file is missing from the git source obtained with git clone https://github.com/BIMSBbioinfo/pigx_scrnaseq.git. I need it to run conda env create -f environment.yml and proceed with the conda installation of PiGx.

Any chance this can receive an update?

Cheers!

Enable counting of intronic reads

Adapt the pipeline to produce a UMI matrix of intronic reads:

  1. Define the intronic annotation from the GFF file
  2. Convert the annotation to the format used by the Drop-seq pipeline
  3. Make a parallel set of rules that tag the mapped BAM file with introns using a custom tag
  4. Make a UMI matrix based on intronic reads
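Step 1 can be sketched as interval arithmetic: introns are the gaps between merged exons within a gene. A minimal illustration, assuming the per-gene exon coordinates have already been parsed out of the GFF (the function name is made up here):

```python
def introns_from_exons(exons):
    """exons: list of (start, end) tuples, 1-based inclusive -> intron list."""
    # Merge overlapping/adjacent exons so alternative isoforms don't
    # produce spurious introns.
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Introns are the gaps between consecutive merged exons.
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(merged, merged[1:])]
```

The resulting intervals would then be written out in the annotation format the tagging rules expect (step 2).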

Gene count matrix

Currently the pipeline outputs only the UMI matrix; it should also output the gene count matrix.

This requires an adaptation of the get_umi_matrix rule.
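The distinction between the two matrices can be sketched with pandas, assuming a per-read table carrying cell, gene, and UMI tags (column and function names here are hypothetical, not the pipeline's):

```python
import pandas as pd

def count_matrices(reads):
    """Return (umi_matrix, gene_count_matrix) as cell x gene DataFrames."""
    grouped = reads.groupby(["cell", "gene"])
    umi = grouped["umi"].nunique().unstack(fill_value=0)   # deduplicated molecules
    counts = grouped.size().unstack(fill_value=0)          # raw read counts
    return umi, counts
```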

Add important stuff

Important stuff:

  • memory consumption
  • tempdir location
  • there was a third important thing

Local pipeline install fails during convert_matrix_from_mtx_to_loom.py

Hi BIMSB team,

I've been working on getting your pipeline up and running as a non-root user on our group's compute cluster. Unfortunately, installing guix isn't an option in my case, but after some trial and error I was able to put together a conda environment with the listed dependencies that manages to get through the configure/make process.

However, when I go to run the pipeline on the test dataset, the pipeline fails with the following message:

./pigx-scrnaseq tests/sample_sheet.csv -s tests/settings.yaml

Error in job convert_matrix_from_mtx_to_loom while creating output file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_hg19_UMI.matrix.loom.
RuleException:
CalledProcessError in line 37 of /users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts/Accessory_Functions.py:
Command '/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/bin/python /users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts/convert_matrix_from_mtx_to_loom.py --sample_id WT_HEK_4h_br1 --input_dir /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_Solo.out --gtf_file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Annotation/hg19/hg19.gtf --star_output_types_keys Gene GeneFull Velocyto Velocyto --star_output_types_vals Counts GeneFull Spliced Unspliced --output_file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_hg19_UMI.matrix.loom --sample_sheet_file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/sample_sheet.csv --path_script /users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts &> /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Log/WT_HEK_4h_br1.hg19.convert_matrix_from_mtx_to_loom.log' returned non-zero exit status 1.
File "/users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/Snake_Dropseq.py", line 769, in __rule_convert_matrix_from_mtx_to_loom
File "/users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts/Accessory_Functions.py", line 37, in print_shell
File "/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message

I then checked the log referenced in the error message:

cat /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Log/WT_HEK_0h_br1.hg19.convert_matrix_from_mtx_to_loom.log

Parsing gene ids from gtf file /mnt/lustre/users/tcarroll/bin/git/pigx_scrnaseq/tests/out/Annotation/hg19/hg19.gtf
Reading input files ...
Counts
Features ...
Traceback (most recent call last):
File "/users/tcarroll/sharedscratch/bin/git/pigx_scrnaseq/run/libexec/pigx_scrnaseq/scripts/convert_matrix_from_mtx_to_loom.py", line 102, in
genes.columns = ['gene_id','gene_id2']
File "/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/lib/python3.6/site-packages/pandas/core/generic.py", line 5287, in setattr
return object.setattr(self, name, value)
File "pandas/_libs/properties.pyx", line 67, in pandas._libs.properties.AxisProperty.set
File "/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/lib/python3.6/site-packages/pandas/core/generic.py", line 661, in _set_axis
self._data.set_axis(axis, labels)
File "/users/tcarroll/sharedscratch/anaconda3/envs/pigx_scrnaseq/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 178, in set_axis
f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements

Looks like some issue with pandas, where there are 2 elements instead of the expected 3? I'm not as experienced with Python, so I'm having trouble figuring this out. Perhaps it's a dependency I missed during my manual local install? If you could provide any advice on what might be going on here, or where to look for more troubleshooting info, it would be most appreciated. I'm running in a CentOS Linux 7 (Core) environment, with tools and packages installed by conda where possible and manually otherwise.

Thanks for the help!
-Tom

mapping with hisat2

Check whether it would be possible to map the data using hisat2 - it has much lower memory requirements than STAR.

convert_matrix_from_mtx_to_loom:102 - ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements

Hello,
I ran into this error while running the build tests of what is tagged as version 1.1.6.

$ cat /home/moeller/git/med-team/pigx/pigx-scrnaseq/tests/out/Log/WT_HEK_4h_br1.hg19.convert_matrix_from_mtx_to_loom.log
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
Parsing gene ids from gtf file /home/moeller/git/med-team/pigx/pigx-scrnaseq/tests/out/Annotation/hg19/hg19.gtf
Reading input files ...
Counts
Features ...
Traceback (most recent call last):
  File "/home/moeller/git/med-team/pigx/pigx-scrnaseq/scripts/convert_matrix_from_mtx_to_loom.py", line 102, in <module>
    genes.columns = ['gene_id','gene_id2']
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 5287, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 67, in pandas._libs.properties.AxisProperty.__set__
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 661, in _set_axis
    self._data.set_axis(axis, labels)
  File "/usr/lib/python3/dist-packages/pandas/core/internals/managers.py", line 177, in set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements

which was invoked in this context:

Error in rule convert_matrix_from_mtx_to_loom:
    jobid: 0
    output: /home/moeller/git/med-team/pigx/pigx-scrnaseq/tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_hg19_UMI.matrix.loom
    log: /home/moeller/git/med-team/pigx/pigx-scrnaseq/tests/out/Log/WT_HEK_4h_br1.hg19.convert_matrix_from_mtx_to_loom.log (check log file(s) for error message)

RuleException:
CalledProcessError in line 37 of /home/moeller/git/med-team/pigx/pigx-scrnaseq/scripts/Accessory_Functions.py:                                                                                                                                                                                                                 
Command 'set -euo pipefail;  /usr/bin/python3 /home/moeller/git/med-team/pigx/pigx-scrnaseq/scripts/convert_matrix_from_mtx_to_loom.py --sample_id WT_HEK_4h_br1 --input_dir /home/moeller/git/med-team/pigx/pigx-scrnaseq/tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_Solo.out --gtf_file /home/moeller/git/med-team/pigx/pigx-scrnaseq/tests/out/Annotation/hg19/hg19.gtf --star_output_types_keys Gene GeneFull Velocyto Velocyto --star_output_types_vals Counts GeneFull Spliced Unspliced --output_file /home/moeller/git/med-team/pigx/pigx-scrnaseq/tests/out/Mapped/WT_HEK_4h_br1/hg19/WT_HEK_4h_br1_hg19_UMI.matrix.loom --sample_sheet_file /home/moeller/git/med-team/pigx/pigx-scrnaseq/tests/sample_sheet.csv --path_script /home/moeller/git/med-team/pigx/pigx-scrnaseq/scripts &> /home/moeller/git/med-team/pigx/pigx-scrnaseq/tests/out/Log/WT_HEK_4h_br1.hg19.convert_matrix_from_mtx_to_loom.log' returned non-zero exit status 1.                                
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 2168, in run_wrapper                                                                                                                                                                                                                             
  File "/home/moeller/git/med-team/pigx/pigx-scrnaseq/Snake_Dropseq.py", line 769, in __rule_convert_matrix_from_mtx_to_loom                                                                                                                                                                                                   
  File "/home/moeller/git/med-team/pigx/pigx-scrnaseq/scripts/Accessory_Functions.py", line 37, in print_shell                                                                                                                                                                                                                 
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 529, in _callback                                                                                                                                                                                                                                
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run                                                                                                                                                                                                                                                      
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 515, in cached_or_run                                                                                                                                                                                                                            
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 2199, in run_wrapper                                                                                                                                                                                                                             
Exiting because a job execution failed. Look above for error message

Would you have any immediate idea for me on what to check/change? As a disclaimer, I am running this with Debian packages, not with guix. But all dependencies should be very recent and build-tested themselves.

Many thanks and regards
Steffen
