bioinform / rnacocktail Goto Github PK


License: Other

Python 18.90% HTML 3.92% Shell 1.23% Jupyter Notebook 75.43% Dockerfile 0.52%

rna-seq

rnacocktail's Introduction

RNACocktail: A comprehensive framework for accurate and efficient RNA-Seq analysis

See http://bioinform.github.io/rnacocktail/ for help and downloads.


rnacocktail's Issues

IOError: [Errno 36] File name too long:

I passed 80 samples to the "--sample" parameter of "run_rnacocktail.py diff" and got this error:
IOError: [Errno 36] File name too long:
I found that this happens because rnacocktail uses all the sample names to build the log file name. However, common Linux file systems such as ext4 limit a file name to 255 bytes. Could you provide an alternative solution?
I would appreciate it!
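
One possible workaround, shown here only as a minimal sketch: fall back to a short hash of the sample names whenever the joined name would exceed the limit. The helper name `short_log_name`, the `prefix` argument, and the name layout are hypothetical and are not taken from the rnacocktail source.

```python
import hashlib

# Common per-name limit on ext4, as mentioned in the report above.
MAX_NAME_BYTES = 255


def short_log_name(prefix, samples, suffix=".log", max_bytes=MAX_NAME_BYTES):
    """Build a log file name that never exceeds max_bytes (hypothetical helper)."""
    joined = "-".join(samples)
    name = "{0}-{1}{2}".format(prefix, joined, suffix)
    if len(name.encode("utf-8")) <= max_bytes:
        # Short enough: keep the readable name.
        return name
    # Too long: replace the sample list with its count and a stable digest.
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]
    return "{0}-{1}samples-{2}{3}".format(prefix, len(samples), digest, suffix)
```

With a scheme like this, the full sample list would have to be written inside the log itself, because it can no longer be read back from the file name.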

Exception: Unable to detect format from ['SNV;ENSG00000225630', '1', '+', '50', '0']

Hi,

First thanks for this very complete pipeline.

I have a problem with step 5 of the editing mode.

I think the problem is at line 88 of the editing.py file, more precisely in the merge_info_SNV function.

The exception is raised at line 90 of this file because, I think, the cat() call cannot combine SNV_fwd and SNV_fwd1.

This is a capture of the SNV_no var:

a capture of the "feature" var on the merge_info_SNV function:

and of my vcf file :

Do you have any idea of what's going on?

Thanks!

Error in run_editing STEP 10

I am currently using RNAcocktail to identify RNA editing sites. Everything went smoothly until STEP 10, where the log shows the following:
INFO 2019-04-09 11:37:47,149 src.run_editing --------------------------STEP 9--------------------------
INFO 2019-04-09 11:37:47,150 src.run_editing Task: Rerun GIREMI for 1351
INFO 2019-04-09 11:37:47,150 src.run_editing Running "bash -c cd /cloud/data/zxcai/soft/giremi && giremi -f /cloud/data/zxcai/ref/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -l /cloud/data/zxcai/RNAcocktail_test/giremi/1351/SNV_annotated_filtered.bed -o /cloud/data/zxcai/RNAcocktail_test/giremi/1351/giremi_out.txt /cloud/data/zxcai/RNAcocktail_test/giremi/1351/alignments.pos_sorted.bam"
INFO 2019-04-09 12:28:29,244 src.run_editing Returned code 1 (3042.03 seconds)
INFO 2019-04-09 12:28:29,252 src.utils Creating directory /cloud/data/zxcai/RNAcocktail_test/out/giremi/1351
INFO 2019-04-09 12:28:29,287 src.run_editing --------------------------STEP 10--------------------------
INFO 2019-04-09 12:28:29,296 src.run_editing GIREMI failed!
INFO 2019-04-09 12:28:29,296 src.utils Run log is saved in /cloud/data/zxcai/RNAcocktail_test/logs/run-editing-20190409-113746.log
INFO 2019-04-09 12:28:29,296 src.utils All Done!

When I checked the work dir for the output files generated during the analysis, I found that giremi_out.txt existed but giremi_out.txt.res (which should be copied to the output dir in STEP 10) did not. I tried to locate where the problem comes from, but could not find where giremi_out.txt.res is generated in run_editing.py.
My guess is that when GIREMI is rerun to remove N variants, the output should perhaps be giremi_out.txt.res?

Variant Calling

Hello!

I am trying to use rnacocktail to call variants in my data (after alignment with HISAT2). I have run into the following error:

INFO 10:17:19,797 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 10:17:19,797 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 10:17:19,797 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 10:17:19,797 HelpFormatter - [Thu Feb 14 10:17:19 EST 2019] Executing on Linux 3.10.0-957.el7.x86_64 amd64
INFO 10:17:19,797 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_191-b12
INFO 10:17:19,800 HelpFormatter - Program Args: -T HaplotypeCaller -R Homo_sapiens.GRCh37.dna.primary_assembly.fa -I working/gatk/MHXXXXXXX-XXXXXXXXXX/bsqr.bam -o working/gatk/MHXXXXXXX-XXXXXXXXXX/variants.vcf -stand_call_conf 20.000000 -stand_emit_conf 20.000000 -dontUseSoftClippedBases
INFO 10:17:19,808 HelpFormatter - Executing as XXXXXX@XXXXXX on Linux 3.10.0-957.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_191-b12.
INFO 10:17:19,809 HelpFormatter - Date/Time: 2019/02/14 10:17:19
INFO 10:17:19,809 HelpFormatter - ----------------------------------------------------------------------------------
INFO 10:17:19,809 HelpFormatter - ----------------------------------------------------------------------------------

ERROR ------------------------------------------------------------------------------------------

ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836):

ERROR

ERROR This means that one or more arguments or inputs in your command are incorrect.

ERROR The error message below tells you what is the problem.

ERROR

ERROR If the problem is an invalid argument, please check the online documentation guide

ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.

ERROR

ERROR Visit our website and forum for extensive documentation and answers to

ERROR commonly asked questions https://software.broadinstitute.org/gatk

ERROR

ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.

ERROR

ERROR MESSAGE: Invalid command line: The parameter standard_min_confidence_threshold_for_emitting is deprecated. This argument is no longer used in GATK versions 3.7 and newer. Please see the online documentation for the latest usage recommendations.

Does rnacocktail only support GATK version 3.5?

Thanks so much for your help!
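
A version check along the following lines might avoid the error. This is a sketch only: it assumes that `-stand_emit_conf` in the logged command is the short form of the deprecated `standard_min_confidence_threshold_for_emitting` parameter, and it takes the 3.7 cutoff from the error message above. Both function names are hypothetical and not part of rnacocktail.

```python
import re


def parse_gatk_version(text):
    """Extract (major, minor) from a string such as '3.8-0-ge9d806836'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    return int(match.group(1)), int(match.group(2))


def haplotypecaller_conf_args(version, conf=20.0):
    """Return confidence arguments suited to the given GATK 3.x version."""
    args = ["-stand_call_conf", str(conf)]
    if version < (3, 7):
        # Assumption: only releases before 3.7 still accept the emit threshold.
        args += ["-stand_emit_conf", str(conf)]
    return args
```

For the version in the log above, `haplotypecaller_conf_args(parse_gatk_version("3.8-0-ge9d806836"))` would leave out the emit threshold.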

option '-k' is only recognised by salmon quant of salmon version 0.11.0 but not above

I was running RNACocktail for RNA quantification using the shell command below:

# ${SRR} stands for SRR accession of the sample
run_rnacocktail.py quantify \
  --quantifier_idx /home/data/refdir/index/salmon/hg38_index \
  --1 ./sra/${SRR}_1.fastq.gz \
  --2 ./sra/${SRR}_2.fastq.gz \
  --libtype IU \
  --outdir ./sra/out \
  --workdir ./sra/work \
  --threads 10 \
  --sample ${SRR} \
  --unzip

I built every salmon version above 0.11.0 from source, only to find that 'salmon quant' does not recognise option '-k' in any of them. The output was the same for every salmon version and in the salmon_smem.log of every sample processed:

Exception : [unrecognised option '-k']. Exiting.

Below is the command line output:

INFO 2021-06-06 20:51:29,182 src.run_quantify     --------------------------STEP 1--------------------------
INFO 2021-06-06 20:51:29,183 src.run_quantify     Task: Salmon-SMEM for SRR10435206 
INFO 2021-06-06 20:51:29,183 src.run_quantify     Running "bash -c salmon quant -i /home/data/refdir/index/salmon/hg38_index  -p 10 -k 19 -l IU -1 <(gunzip -c ./sra/SRR10435206_1.fastq.gz) -2 <(gunzip -c ./sra/SRR10435206_2.fastq.gz) -o ./sra/work/salmon_smem/SRR10435206" 
INFO 2021-06-06 20:51:29,543 src.run_quantify     Returned code 1 (0.357945 seconds)
ERROR 2021-06-06 20:51:29,543 src.run_quantify     Aborting!
INFO 2021-06-06 20:51:29,543 src.run_quantify     Salmon-SMEM failed!
ERROR 2021-06-06 20:51:29,543 src.run_quantify     Failed Salmon-SMEM for SRR10435206. Log file: ./sra/work/salmon_smem/SRR10435206/salmon_smem.log

As stated on your GitHub page, salmon version 0.11.0 works fine.
I think it is time for RNACocktail to adopt a salmon version newer than 0.11.0, at least 0.11.1, because option '-k' is no longer recognised, as noted above. Installing salmon 0.11.0 from Conda is also problematic because of a libtbb issue:

error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory
(ERR): Description of arguments failed!

ldd shows that libtbb does not resolve to any .so file on the server.
Salmon 0.11.0 is clearly in poor shape. Please retire it, given that you were maintaining RNACocktail at least until 11 Nov 2020 (judging by the date of your latest release), only 7 months before I opened this issue.
Alternatively, you could modify the Python code to detect the salmon version and avoid passing option '-k' to 'salmon quant' for versions above 0.11.0, which seems to be the simplest and most compatible solution.
Anyway, I greatly appreciate your efforts in developing such a versatile tool to simplify RNA-seq analysis. You've done a great job! I hope you are still actively maintaining this tool.
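
The version check suggested above could look roughly like this. It is a sketch under assumptions: the 0.11.0 cutoff comes from this report rather than from the salmon documentation, the function names are hypothetical, and the version string would have to be obtained separately (for example from the output of `salmon --version`).

```python
import re


def parse_salmon_version(text):
    """Extract (major, minor, patch) from a string such as 'salmon 0.11.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    return tuple(int(part) for part in match.groups())


def salmon_quant_args(version, index, threads, libtype, k=19):
    """Build the leading 'salmon quant' arguments, adding -k only when accepted."""
    args = ["salmon", "quant", "-i", index, "-p", str(threads)]
    if version <= (0, 11, 0):
        # Assumption from the report: newer releases reject -k on 'quant'.
        args += ["-k", str(k)]
    args += ["-l", libtype]
    return args
```

The read and output options (`-1`, `-2`, `-o`) would be appended unchanged, as in the logged command.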

hisat2_jun2bed.py error

When I ran rnacocktail in align mode, I got the error below.

Traceback (most recent call last):
  File "/usr/local/bin/hisat2_jun2bed.py", line 32, in <module>
    int_start =  int(locus_ls[1])-51
ValueError: invalid literal for int() with base 10: 'ctg9'

I then checked this error carefully and found that when an input chromosome name contains the separator "_", the program throws the error above. Many organisms' chromosome names contain "_", especially in newly assembled genomes, so I recommend changing line 23 of the script 'hisat2_jun2bed.py' from locus = "_".join([line_list[0],leftpos,rightpos,line_list[3]]) to locus = "__".join([line_list[0],leftpos,rightpos,line_list[3]]), and line 30 from locus_ls = locus.split("_") to locus_ls = locus.split("__"), or choosing another separator. With that change the script works properly.

Sincerely
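
Another option, sketched below, keeps the single "_" separator but splits from the right, so underscores inside the chromosome name are left alone. It assumes the three fields after the chromosome name never contain "_"; the function names and the `tag` label for the fourth field are hypothetical and not taken from hisat2_jun2bed.py.

```python
def make_locus(chrom, leftpos, rightpos, tag):
    """Join the four fields into one key, as the script does with '_'."""
    return "_".join([chrom, str(leftpos), str(rightpos), tag])


def split_locus(locus):
    """Split from the right so that '_' inside the chromosome name survives."""
    chrom, leftpos, rightpos, tag = locus.rsplit("_", 3)
    return chrom, int(leftpos), int(rightpos), tag
```

If the last field can itself contain "_", the "__" separator proposed above (or a character that cannot appear in any field) is the safer choice.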

samtools version 1.3 or higher incompatible

The alignment script is not compatible with samtools version 1.3 or higher.

Keep moving

RNAcocktail is a very impressive tool that handles almost every kind of analysis based on RNA-seq data. But the tools integrated in RNAcocktail have not been updated for a long time. Could you upgrade them, for example salmon? RNA splicing is also a major part of RNA-seq analysis, and tools such as DART and SpliceAI improve splicing prediction.

Transcript quantification of pacbio long reads (ccs)

Hi there,
rnacocktail is an excellent and comprehensive tool for RNA-seq analysis. As we all know, third-generation sequencing (such as PacBio) has more power to detect isoforms. A challenging problem is the quantification of those isoforms. If I want to do this, how should I proceed? Is there any good software for this work? I need your help!

Thanks in advance.
Sincerely,
Yizhong Huang

docker test fails on fusioncatcher in 0.3.0

When running docker_test.sh

mkdir fusioncatcher_data
cd fusioncatcher_data
wget https://sourceforge.net/projects/fusioncatcher/files/data/human_v95.tar.gz.aa
--2019-11-20 20:51:01-- https://sourceforge.net/projects/fusioncatcher/files/data/human_v95.tar.gz.aa
Resolving sourceforge.net (sourceforge.net)... 216.105.38.13
Connecting to sourceforge.net (sourceforge.net)|216.105.38.13|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-11-20 20:51:02 ERROR 404: Not Found.

GIREMI failed!

I'm getting some errors with the following command. I hope you can help me solve this; thank you very much.

$ docker run --rm -u container_R -v /D8:/data -v /Container:/write rnacocktail:latest run_rnacocktail.py editing --alignment /data/D8.sorted.bam --variant /data/D8.snv.vcf --strand_pos /data/hg19_strand_pos.bed --genes_pos /data/hg19_genes_pos.bed --outdir /write/out --workdir /write/work --threads 10 --sample D8 --ref_genome /data/Homo_sapiens.GRCh37.dna.primary_assembly.fa --knownsites /data/common_all_20180418.vcf --giremi_dir /usr/local/bin/

INFO 2019-02-27 09:02:18,708 src.utils Running RNASeqPipeline 0.2.2
INFO 2019-02-27 09:02:18,708 src.utils Command-line /usr/local/bin/run_rnacocktail.py editing --alignment /data/D8.sorted.bam --variant /data/D8.snv.vcf --strand_pos /data/hg19_strand_pos.bed --genes_pos /data/hg19_genes_pos.bed --outdir /write/out --workdir /write/work --threads 10 --sample D8 --ref_genome /data/Homo_sapiens.GRCh37.dna.primary_assembly.fa --knownsites /data/common_all_20180418.vcf --giremi_dir /usr/local/bin/ --htslib_dir=/opt/htslib-1.3/
INFO 2019-02-27 09:02:18,708 src.utils Arguments are Namespace(VariantAnnotator_opts='', alignment='/data/D8.sorted.bam', editing_caller='GIREMI', gatk='GenomeAnalysisTK.jar', genes_pos='/data/hg19_genes_pos.bed', giremi_dir='/usr/local/bin/', giremi_opts='', htslib_dir='/opt/htslib-1.3/', java='java', java_opts='-Xms1g -Xmx5g', knownsites='/data/common_all_20180418.vcf', mode='editing', outdir='/write/out', ref_genome='/data/Homo_sapiens.GRCh37.dna.primary_assembly.fa', sample='D8', samtools='samtools', start=0, strand_pos='/data/hg19_strand_pos.bed', threads=10, timeout=10000000, variant='/data/D8.snv.vcf', workdir='/write/work')
INFO 2019-02-27 09:02:18,709 src.utils Run log will be saved in /write/work/logs/run-editing-20190227-090218.log
INFO 2019-02-27 09:02:18,709 src.utils Run in mode: editing
INFO 2019-02-27 09:02:18,709 src.utils Running RNA editing calling step using GIREMI
INFO 2019-02-27 09:02:18,709 src.run_editing Running RNA editing detection (GIREMI) for D8
ERROR 2019-02-27 09:02:18,710 src.run_editing Aborting!
INFO 2019-02-27 09:02:18,710 src.run_editing GIREMI failed!
ERROR 2019-02-27 09:02:18,710 src.run_editing No alignment file /data/D8.sorted.bam

Cannot start run_rnacocktail.py

I'm just trying to get started with RNAcocktail, but I can't get it to run; I can't even get the help menu. I don't see how to install it, and I can't find anything about this in the README files.

703404669@bioitutil2:/illumina/runs/RNASeq/rnacocktail-0.2.1/scripts$ ./run_rnacocktail.py 
Traceback (most recent call last):
  File "./run_rnacocktail.py", line 5, in <module>
    from src.main import run_pipeline
ImportError: No module named src.main
703404669@bioitutil2:/illumina/runs/RNASeq/rnacocktail-0.2.1/scripts$ ./run_rnacocktail.py -h
Traceback (most recent call last):
  File "./run_rnacocktail.py", line 5, in <module>
    from src.main import run_pipeline
ImportError: No module named src.main
703404669@bioitutil2:/illumina/runs/RNASeq/rnacocktail-0.2.1/scripts$
