RNACocktail: A comprehensive framework for accurate and efficient RNA-Seq analysis
See http://bioinform.github.io/rnacocktail/ for help and downloads.
License: Other
I am passing 80 samples to the "--sample" parameter of "run_rnacocktail.py diff".
I got the following error:
IOError: [Errno 36] File name too long:
The cause seems to be that RNACocktail uses all the sample names to build the log file name. However, common Linux file systems such as ext4 limit file names to 255 bytes. Could you provide an alternative solution?
Appreciate it!
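One possible workaround, sketched below, is to fall back to a short hash of the sample names whenever the combined name would exceed the file system limit. This is only an illustration: `short_log_name`, the `run-diff` prefix, and the sample names are hypothetical and are not taken from the RNACocktail source.

```python
import hashlib

# ext4 limits a single file name component to 255 bytes.
MAX_NAME_BYTES = 255


def short_log_name(prefix, samples, suffix=".log", max_bytes=MAX_NAME_BYTES):
    """Return a log file name that stays within max_bytes.

    Hypothetical helper: keeps the readable name when it fits, otherwise
    replaces the sample list with its length and a short SHA-1 digest.
    """
    candidate = "{}-{}{}".format(prefix, "-".join(samples), suffix)
    if len(candidate.encode("utf-8")) <= max_bytes:
        return candidate
    digest = hashlib.sha1(",".join(samples).encode("utf-8")).hexdigest()[:12]
    return "{}-{}samples-{}{}".format(prefix, len(samples), digest, suffix)


if __name__ == "__main__":
    many = ["sample_{:03d}_treated_replicate".format(i) for i in range(80)]
    print(short_log_name("run-diff", ["A", "B"]))
    print(short_log_name("run-diff", many))
```

The digest is deterministic, so the same set of samples always maps to the same log name, and the full sample list can still be written inside the log itself.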
Hi,
First, thanks for this very complete pipeline.
I have a problem with step 5 of the editing mode.
I think the problem is at line 88 of the editing.py file, more precisely in the merge_info_SNV function.
The exception is raised at line 90 of this file because (again, I think) the cat() call between SNV_fwd and SNV_fwd1 fails.
This is a capture of the SNV_no var:
a capture of the "feature" var in the merge_info_SNV function:
and of my VCF file:
Do you have any idea of what's going on?
Thanks!
I am currently using RNACocktail to identify RNA editing sites. All steps ran smoothly until STEP 10, where the log shows the following:
INFO 2019-04-09 11:37:47,149 src.run_editing --------------------------STEP 9--------------------------
INFO 2019-04-09 11:37:47,150 src.run_editing Task: Rerun GIREMI for 1351
INFO 2019-04-09 11:37:47,150 src.run_editing Running "bash -c cd /cloud/data/zxcai/soft/giremi && giremi -f /cloud/data/zxcai/ref/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -l /cloud/data/zxcai/RNAcocktail_test/giremi/1351/SNV_annotated_filtered.bed -o /cloud/data/zxcai/RNAcocktail_test/giremi/1351/giremi_out.txt /cloud/data/zxcai/RNAcocktail_test/giremi/1351/alignments.pos_sorted.bam"
INFO 2019-04-09 12:28:29,244 src.run_editing Returned code 1 (3042.03 seconds)
INFO 2019-04-09 12:28:29,252 src.utils Creating directory /cloud/data/zxcai/RNAcocktail_test/out/giremi/1351
INFO 2019-04-09 12:28:29,287 src.run_editing --------------------------STEP 10--------------------------
INFO 2019-04-09 12:28:29,296 src.run_editing GIREMI failed!
INFO 2019-04-09 12:28:29,296 src.utils Run log is saved in /cloud/data/zxcai/RNAcocktail_test/logs/run-editing-20190409-113746.log
INFO 2019-04-09 12:28:29,296 src.utils All Done!
When I checked the work dir for the output files generated during the analysis, I found that giremi_out.txt existed but giremi_out.txt.res (which should be copied to the output dir in STEP 10) did not. I tried to locate the source of the problem, but could not find where giremi_out.txt.res is generated in run_editing.py.
My guess is that when GIREMI is rerun to remove N variants, the output should perhaps be giremi_out.txt.res?
Hello!
I am working to use rnacocktail to call variants in my data (following alignment using HISAT2). I have run into the following error:
INFO 10:17:19,797 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836, Compiled 2017/07/28 21:26:50
INFO 10:17:19,797 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 10:17:19,797 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 10:17:19,797 HelpFormatter - [Thu Feb 14 10:17:19 EST 2019] Executing on Linux 3.10.0-957.el7.x86_64 amd64
INFO 10:17:19,797 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_191-b12
INFO 10:17:19,800 HelpFormatter - Program Args: -T HaplotypeCaller -R Homo_sapiens.GRCh37.dna.primary_assembly.fa -I working/gatk/MHXXXXXXX-XXXXXXXXXX/bsqr.bam -o working/gatk/MHXXXXXXX-XXXXXXXXXX/variants.vcf -stand_call_conf 20.000000 -stand_emit_conf 20.000000 -dontUseSoftClippedBases
INFO 10:17:19,808 HelpFormatter - Executing as XXXXXX@XXXXXX on Linux 3.10.0-957.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_191-b12.
INFO 10:17:19,809 HelpFormatter - Date/Time: 2019/02/14 10:17:19
INFO 10:17:19,809 HelpFormatter - ----------------------------------------------------------------------------------
INFO 10:17:19,809 HelpFormatter - ----------------------------------------------------------------------------------
ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Invalid command line: The parameter standard_min_confidence_threshold_for_emitting is deprecated. This argument is no longer used in GATK versions 3.7 and newer. Please see the online documentation for the latest usage recommendations.
Does rnacocktail only support GATK version 3.5?
Thanks so much for your help!
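The error message itself says that the emit-confidence argument is no longer used in GATK 3.7 and newer, so one way a wrapper could cope is to add `-stand_emit_conf` only for older versions. The sketch below illustrates that idea; `parse_gatk_version` and `haplotypecaller_conf_args` are hypothetical helpers, not functions from RNACocktail or GATK, and the version string is assumed to begin with a dotted major.minor number as in the log above.

```python
import re


def parse_gatk_version(text):
    """Extract (major, minor) from a string such as '3.8-0-ge9d806836'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in {!r}".format(text))
    return int(match.group(1)), int(match.group(2))


def haplotypecaller_conf_args(version, call_conf=20.0, emit_conf=20.0):
    """Build the confidence-threshold arguments for HaplotypeCaller.

    -stand_emit_conf is only added for versions before 3.7, following the
    deprecation notice quoted in the error message.
    """
    args = ["-stand_call_conf", str(call_conf)]
    if version < (3, 7):
        args += ["-stand_emit_conf", str(emit_conf)]
    return args


if __name__ == "__main__":
    version = parse_gatk_version("3.8-0-ge9d806836")
    print(version, haplotypecaller_conf_args(version))
```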
I was running RNACocktail for RNA quantification using the shell command below:
# ${SRR} stands for SRR accession of the sample
run_rnacocktail.py quantify \
--quantifier_idx /home/data/refdir/index/salmon/hg38_index \
--1 ./sra/${SRR}_1.fastq.gz \
--2 ./sra/${SRR}_2.fastq.gz \
--libtype IU \
--outdir ./sra/out \
--workdir ./sra/work \
--threads 10 \
--sample ${SRR} \
--unzip
I built all salmon versions above 0.11.0 from source, only to find that the option '-k' is not recognised by the command salmon quant. The output was the same for every version of salmon and in the salmon_smem.log of every sample it processed:
Exception : [unrecognised option '-k']. Exiting.
Below is the command line output:
INFO 2021-06-06 20:51:29,182 src.run_quantify --------------------------STEP 1--------------------------
INFO 2021-06-06 20:51:29,183 src.run_quantify Task: Salmon-SMEM for SRR10435206
INFO 2021-06-06 20:51:29,183 src.run_quantify Running "bash -c salmon quant -i /home/data/refdir/index/salmon/hg38_index -p 10 -k 19 -l IU -1 <(gunzip -c ./sra/SRR10435206_1.fastq.gz) -2 <(gunzip -c ./sra/SRR10435206_2.fastq.gz) -o ./sra/work/salmon_smem/SRR10435206"
INFO 2021-06-06 20:51:29,543 src.run_quantify Returned code 1 (0.357945 seconds)
ERROR 2021-06-06 20:51:29,543 src.run_quantify Aborting!
INFO 2021-06-06 20:51:29,543 src.run_quantify Salmon-SMEM failed!
ERROR 2021-06-06 20:51:29,543 src.run_quantify Failed Salmon-SMEM for SRR10435206. Log file: ./sra/work/salmon_smem/SRR10435206/salmon_smem.log
As stated on your GitHub page, salmon version 0.11.0 works fine instead.
I think it is time for RNACocktail to move on and adopt a salmon version newer than 0.11.0, at least 0.11.1, because, as mentioned above, newer versions no longer recognise the '-k' option. Installing salmon version 0.11.0 from Conda is also discouraged because of a libtbb issue:
error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory
(ERR): Description of arguments failed!
ldd shows that libtbb resolves to no .so file on the server.
Salmon version 0.11.0 is clearly past its time. Please retire it: judging by the date of your latest release, you were maintaining RNACocktail at least until 11 Nov 2020, only 7 months before I opened this issue.
Alternatively, you could modify the Python code to detect the salmon version and avoid passing the '-k' option to salmon quant for versions above 0.11.0, which seems to be the simplest and most compatible solution.
In any case, I greatly appreciate your efforts in developing such a versatile tool to simplify RNA-seq analysis. You've done a great job! I hope you are still actively maintaining this tool.
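The version check suggested above could look roughly like the following sketch. It assumes that the text printed by `salmon --version` contains a dotted three-part version number, and it takes the 0.11.0 cut-off from this report rather than from the salmon documentation. `parse_salmon_version` and `salmon_quant_args` are hypothetical names, not part of RNACocktail.

```python
import re


def parse_salmon_version(text):
    """Extract (major, minor, patch) from text such as 'salmon 0.11.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in {!r}".format(text))
    return tuple(int(part) for part in match.groups())


def salmon_quant_args(version, index, libtype, threads, kmer=19):
    """Build a 'salmon quant' argument list, adding -k only for old versions.

    Assumption from the report above: versions newer than 0.11.0 reject -k.
    """
    args = ["salmon", "quant", "-i", index, "-p", str(threads)]
    if version <= (0, 11, 0):
        args += ["-k", str(kmer)]
    args += ["-l", libtype]
    return args


if __name__ == "__main__":
    for banner in ("salmon 0.11.0", "salmon 1.4.0"):
        version = parse_salmon_version(banner)
        print(version, " ".join(salmon_quant_args(version, "hg38_index", "IU", 10)))
```

In a real pipeline the banner text would come from running the salmon executable once at start-up; the read files and output directory would then be appended to the list as before.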
When I ran rnacocktail in align mode, I got the error below.
Traceback (most recent call last):
File "/usr/local/bin/hisat2_jun2bed.py", line 32, in <module>
int_start = int(locus_ls[1])-51
ValueError: invalid literal for int() with base 10: 'ctg9'
I then checked this error carefully and found that when an input chromosome name contains the separator "_", the program throws the error above. Many organisms' chromosome names contain the separator '_', especially in newly assembled genomes, so I recommend changing line 23 from locus = "_".join([line_list[0],leftpos,rightpos,line_list[3]])
to locus = "__".join([line_list[0],leftpos,rightpos,line_list[3]])
and line 30 from locus_ls = locus.split("_")
to locus_ls = locus.split("__")
in the script 'hisat2_jun2bed.py', or switching to another separator. That will work properly!
Sincerely
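The failure and the proposed fix can be reproduced in isolation with the sketch below. The chromosome name `scaffold_ctg9`, the coordinates, and the helper names `make_locus` and `parse_locus` are made up for illustration; only the join and split expressions mirror the lines quoted from hisat2_jun2bed.py.

```python
def make_locus(chrom, leftpos, rightpos, strand, sep="__"):
    """Join the junction fields into one key using the given separator."""
    return sep.join([chrom, str(leftpos), str(rightpos), strand])


def parse_locus(locus, sep="__"):
    """Split a key produced by make_locus back into typed fields."""
    chrom, leftpos, rightpos, strand = locus.split(sep)
    return chrom, int(leftpos), int(rightpos), strand


if __name__ == "__main__":
    # With a single underscore, the chromosome name itself gets split,
    # so field 1 is part of the name rather than a coordinate.
    broken = "_".join(["scaffold_ctg9", "1000", "2000", "+"])
    print(broken.split("_")[1])

    # With a double underscore the fields round-trip correctly.
    key = make_locus("scaffold_ctg9", 1000, 2000, "+")
    print(parse_locus(key))
```

A double underscore would still break on a chromosome name that itself contains "__", so a separator that cannot occur in sequence names, such as a tab, may be a safer choice.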
The alignment script is not compatible with samtools version 1.3 or higher.
RNACocktail is a very impressive tool that handles almost all kinds of analyses based on RNA-seq data. However, the tools integrated in RNACocktail have not been updated for a long time. Could you upgrade them, for example salmon? RNA splicing is also a major part of RNA-seq analysis, and tools such as DART and SpliceAI improve splicing prediction.
Hi there,
RNACocktail is an excellent and comprehensive tool for RNA-seq analysis. As we all know, third-generation sequencing (such as PacBio) has more power in detecting isoforms. A challenging problem is the quantification of the isoforms. If I want to do this, what should I do? Is there any precise software for this work? I need your help!
Thanks in advance
Sincerely
Yizhong Huang
When running docker_test.sh, I get the following output:
INFO 2019-02-27 09:02:18,708 src.utils Running RNASeqPipeline 0.2.2
INFO 2019-02-27 09:02:18,708 src.utils Command-line /usr/local/bin/run_rnacocktail.py editing --alignment /data/D8.sorted.bam --variant /data/D8.snv.vcf --strand_pos /data/hg19_strand_pos.bed --genes_pos /data/hg19_genes_pos.bed --outdir /write/out --workdir /write/work --threads 10 --sample D8 --ref_genome /data/Homo_sapiens.GRCh37.dna.primary_assembly.fa --knownsites /data/common_all_20180418.vcf --giremi_dir /usr/local/bin/ --htslib_dir=/opt/htslib-1.3/
INFO 2019-02-27 09:02:18,708 src.utils Arguments are Namespace(VariantAnnotator_opts='', alignment='/data/D8.sorted.bam', editing_caller='GIREMI', gatk='GenomeAnalysisTK.jar', genes_pos='/data/hg19_genes_pos.bed', giremi_dir='/usr/local/bin/', giremi_opts='', htslib_dir='/opt/htslib-1.3/', java='java', java_opts='-Xms1g -Xmx5g', knownsites='/data/common_all_20180418.vcf', mode='editing', outdir='/write/out', ref_genome='/data/Homo_sapiens.GRCh37.dna.primary_assembly.fa', sample='D8', samtools='samtools', start=0, strand_pos='/data/hg19_strand_pos.bed', threads=10, timeout=10000000, variant='/data/D8.snv.vcf', workdir='/write/work')
INFO 2019-02-27 09:02:18,709 src.utils Run log will be saved in /write/work/logs/run-editing-20190227-090218.log
INFO 2019-02-27 09:02:18,709 src.utils Run in mode: editing
INFO 2019-02-27 09:02:18,709 src.utils Running RNA editing calling step using GIREMI
INFO 2019-02-27 09:02:18,709 src.run_editing Running RNA editing detection (GIREMI) for D8
ERROR 2019-02-27 09:02:18,710 src.run_editing Aborting!
INFO 2019-02-27 09:02:18,710 src.run_editing GIREMI failed!
ERROR 2019-02-27 09:02:18,710 src.run_editing No alignment file /data/D8.sorted.bam
I'm just trying to get started with RNACocktail, but I can't even get the help menu. I don't see how to install it, and I can't find anything about this in the README files.
703404669@bioitutil2:/illumina/runs/RNASeq/rnacocktail-0.2.1/scripts$ ./run_rnacocktail.py
Traceback (most recent call last):
File "./run_rnacocktail.py", line 5, in <module>
from src.main import run_pipeline
ImportError: No module named src.main
703404669@bioitutil2:/illumina/runs/RNASeq/rnacocktail-0.2.1/scripts$ ./run_rnacocktail.py -h
Traceback (most recent call last):
File "./run_rnacocktail.py", line 5, in <module>
from src.main import run_pipeline
ImportError: No module named src.main
703404669@bioitutil2:/illumina/runs/RNASeq/rnacocktail-0.2.1/scripts$