zhengxia / dapars
DaPars (Dynamic analysis of Alternative PolyAdenylation from RNA-seq)
License: GNU General Public License v2.0
DaPars won't run. I get the following error:
File "DaPars-0.9.0/DaPars_main.py", line 7, in <module>
import scipy.stats
File "/apps/python/2.7.6/lib/python2.7/site-packages/scipy/stats/__init__.py", line 334, in <module>
from .stats import *
File "/apps/python/2.7.6/lib/python2.7/site-packages/scipy/stats/stats.py", line 187, in <module>
from . import distributions
File "/apps/python/2.7.6/lib/python2.7/site-packages/scipy/stats/distributions.py", line 10, in <module>
from ._distn_infrastructure import (entropy, rv_discrete, rv_continuous,
File "/apps/python/2.7.6/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 19, in <module>
from scipy.special import (comb, chndtr, gammaln, hyp0f1,
ImportError: cannot import name entr
Original issue reported on code.google.com by [email protected]
on 10 Oct 2014 at 4:18
I would like to use DaPars with two datasets from different studies but from the same organism and tissue. Normally I would normalize the aligned reads and remove batch effects, but what is recommended in the case of DaPars?
Hello, zhengxia:
DaPars has been really useful for analyzing APA over the last few days.
Recently, however, I obtained some strand-specific RNA-seq data built with a dUTP-style library protocol. Does DaPars support this, or is there a specific parameter for it?
Best wishes,
Haifeng Sun
TL;DR:
+ strand: the DaPars BED/'Loci' 3'UTR start is 1 nt downstream of the actual annotated start
- strand: the DaPars BED/'Loci' 3'UTR end is 1 nt upstream of the actual annotated end
In lines 38-45 of DaPars_Extract_Anno.py, the 'start' coordinates for UTRs (last exons) are converted to '1-based' by adding 1 to the start coordinate. Given that the script outputs a BED file, and according to the BED format convention start coordinates are included in the range, adding 1 to the start coordinate actually excludes the first nucleotide of the UTR on the plus strand and the last nucleotide of the UTR on the minus strand.
Here is an example of extracted UTRs on the + strand. 'D Extracted 3'UTR' is the output BED file of DaPars_Extract_Anno.py, with 'D2 Extracted 3'UTR' the same file from the DaPars2 repo (the script is identical between the two tools). 'GTF' is the reference GTF file from which the input BED12 file was derived. In both cases, the last exon reported in the DaPars BED file begins 1 nt downstream of the source transcript:
Again, on the - strand the DaPars BED file terminates 1 nt upstream of the source transcript end:
I'm not sure how DaPars handles this internally (i.e. the coordinates in the BED file may not be interpreted according to BED conventions), and in any case I don't expect 1 nt to make a drastic difference to the algorithm's output. The real issue arises when using the BED file or the 'Loci' column of the output files to extract predicted polyA sites, especially the distal polyA site on the minus strand (because the Start coordinate, which corresponds to the UTR end, is shifted by 1 nt).
Screenshots were generated using output from the DaPars & DaPars2 execution workflows from the APAeval project with test data used as input.
Edit: This issue has also been reported to the DaPars2 repo at 3UTR/DaPars2#8
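For readers less familiar with the convention discussed above, here is a minimal sketch of how a 1-based, inclusive interval (as in GTF) maps to a 0-based, half-open BED interval, and what happens if a 1-based start is written into a BED file. The function names and the example coordinates are hypothetical and are not taken from DaPars_Extract_Anno.py.

```python
def gtf_to_bed(start, end):
    """Convert a 1-based, inclusive interval (GTF) to 0-based, half-open (BED)."""
    return start - 1, end


def bed_to_gtf(start, end):
    """Convert a 0-based, half-open interval (BED) to 1-based, inclusive (GTF)."""
    return start + 1, end


def bed_length(start, end):
    """Length in nucleotides of a BED interval."""
    return end - start


# Hypothetical last exon annotated in a GTF as 1001..2000 (1000 nt).
exon_bed = gtf_to_bed(1001, 2000)

# If the 1-based start is written into the BED file instead, a reader that
# follows the BED convention sees an interval that is 1 nt shorter.
shifted = (exon_bed[0] + 1, exon_bed[1])

print(exon_bed, bed_length(*exon_bed))
print(shifted, bed_length(*shifted), bed_to_gtf(*shifted))
```

Under BED conventions the shifted interval reads as 1002..2000 in 1-based coordinates, which matches the kind of 1 nt loss described in this issue.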
When I mix male and female data, the computation fails with an error:
[Fri 08 Dec 2017 04:59:34 PM ] Start Analysis ...
[Fri 08 Dec 2017 04:59:34 PM ] Loading coverage ...
Traceback (most recent call last):
File "/data5/exe/dapars/src/DaPars_main.py", line 548, in <module>
De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main(sys.argv)
File "/data5/exe/dapars/src/DaPars_main.py", line 154, in De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main
All_samples_Target_3UTR_coverages, All_samples_sequencing_depths, UTR_events_dict = Load_Target_Wig_files(All_Sample_files, Annotated_3UTR_file)
File "/data5/exe/dapars/src/DaPars_main.py", line 509, in Load_Target_Wig_files
curr_sample_All_chroms_coverage_dict[chrom_name][1].append(0)
KeyError: 'chrY'
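One way to check whether the samples simply cover different chromosomes (for example, a female sample with no chrY lines) is to compare the chromosome names seen in each coverage file before running DaPars. The sketch below is only a diagnostic under that assumption; the helper names are hypothetical, and it assumes whitespace-separated bedGraph-style lines with the chromosome in the first column.

```python
def chromosomes_in(lines):
    """Collect chromosome names from bedGraph-style lines (chrom start end value)."""
    chroms = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and common header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chroms.add(line.split()[0])
    return chroms


def chromosomes_missing_per_sample(samples):
    """Map each sample name to the chromosomes present elsewhere but absent from it."""
    per_sample = {name: chromosomes_in(lines) for name, lines in samples.items()}
    all_chroms = set().union(*per_sample.values())
    return {name: sorted(all_chroms - seen) for name, seen in per_sample.items()}


# Toy input; in practice each value would be an open coverage file.
samples = {
    "male": ["chr1 0 10 5", "chrY 0 10 2"],
    "female": ["chr1 0 10 7"],
}
print(chromosomes_missing_per_sample(samples))
```

If a chromosome is reported as missing for some samples, restricting the annotation to chromosomes shared by all samples may avoid the KeyError, though this has not been confirmed against the DaPars source.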
I've been using the DaPars tool to identify changes in APA in our RNA-seq data. I have successfully produced the output table but even after reading both the publications it is not totally clear to me what each column in the output table is.
Please can you provide a description for what each column in the output table is?
Thanks in advance!
Kathryn
Hi,
I'm using DaPars to find APA events, but I've run into a problem that confuses me:
python /home/pc/biosoft/dapars/src/DaPars_Extract_Anno.py -b $refbed -s $genesymbol -o Dapars_extracted_3UTR.bed
Generating regions ...
Traceback (most recent call last):
File "/home/pc/biosoft/dapars/src/DaPars_Extract_Anno.py", line 151, in <module>
Extract_Anno_main(sys.argv[1:])
File "/home/pc/biosoft/dapars/src/DaPars_Extract_Anno.py", line 139, in Extract_Anno_main
Annotation_prepar_3UTR_extraction(gene_bed_file, gene_symbol_annotation_file,output_extract_file)
File "/home/pc/biosoft/dapars/src/DaPars_Extract_Anno.py", line 17, in Annotation_prepar_3UTR_extraction
gene_symbol = fields[1]
IndexError: list index out of range
Do you know what happened? How should I solve this problem?
Thanks very much!
Haifeng Sun
China Nanjing Medical University
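An IndexError on fields[1] usually means that some line of the gene symbol file yields fewer than two fields when split. The sketch below lists such lines so they can be inspected. It assumes the file is tab-delimited with at least two columns per line, which is a guess about what DaPars_Extract_Anno.py expects; the helper name is hypothetical.

```python
def find_short_lines(lines, delimiter="\t", min_fields=2):
    """Return (line_number, text) for lines with fewer than min_fields fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if len(text.split(delimiter)) < min_fields:
            bad.append((number, text))
    return bad


# Toy input; in practice pass an open file handle for the gene symbol file.
example = ["NM_1\tGENE1\n", "NM_2 GENE2\n", "\n"]
for number, text in find_short_lines(example):
    print("line %d has too few tab-separated fields: %r" % (number, text))
```

Space-separated lines and trailing blank lines are the two cases this catches in the toy input.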
Hello, looks like the docs link is broken:
http://lilab.research.bcm.edu/dldcc-web/lilab/zheng/DaPars_Documentation/html/DaPars.html
I would like to use DaPars but need some examples, docs, or install instructions.
Please advise,
Thanks,
Gregor
Hello,
I'm trying to install dapars on a computing cluster and am unable to get it running. I'm trying to use it in a virtual environment with python 2.7, rpy2(v2.8.6) and r-base(v3.4.1). When I try to run the program I get:
Traceback (most recent call last):
File "/project/klynclab/software/dapars-0.9.1/src/DaPars_main.py", line 11, in <module>
from rpy2.robjects.packages import importr
File "/home/fmax/miniconda3/envs/dapars/lib/python2.7/site-packages/rpy2/robjects/__init__.py", line 16, in <module>
import rpy2.rinterface as rinterface
File "/home/fmax/miniconda3/envs/dapars/lib/python2.7/site-packages/rpy2/rinterface/__init__.py", line 92, in <module>
from rpy2.rinterface._rinterface import (baseenv,
ImportError: libicuuc.so.64: cannot open shared object file: No such file or directory
Can anyone give me a list of the dependencies and packages required to run DaPars properly?
Dear all,
After some time I came back to using DaPars and ran into the following issue with the same config and annotation files I used before:
python2.7 ../dapars/src/DaPars_main.py Configure_file.txt
[Mon 17 Dec 2018 03:47:16 PM ] Start Analysis ...
[Mon 17 Dec 2018 03:47:16 PM ] Loading coverage ...
Traceback (most recent call last):
File "../dapars/src/DaPars_main.py", line 548, in <module>
De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main(sys.argv)
File "../dapars/src/DaPars_main.py", line 154, in De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main
All_samples_Target_3UTR_coverages, All_samples_sequencing_depths, UTR_events_dict = Load_Target_Wig_files(All_Sample_files, Annotated_3UTR_file)
File "../dapars/src/DaPars_main.py", line 509, in Load_Target_Wig_files
curr_sample_All_chroms_coverage_dict[chrom_name][1].append(0)
UnboundLocalError: local variable 'chrom_name' referenced before assignment
I would be grateful for any help with this.
Additional info: I mapped my data against the Ensembl reference (chromosomes named 1, 2, 3, ..., X), while the annotation file is based on GENCODE (chr1, chr2, ..., chrX). However, earlier bedGraphs that used the same chromosome naming as the annotation file produced the same error.
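If the naming mismatch does play a role, one option is to rename the chromosomes in the bedGraph files so they match the annotation. The following is a rough sketch under that assumption; the helper names are hypothetical, it only covers the simple cases (1..22, X, Y, MT), and unplaced scaffolds would need a proper mapping table. As noted above, matching names alone did not remove the error in this report, so this may not be the cause.

```python
def to_ucsc_chrom(name):
    """Map an Ensembl-style chromosome name to a UCSC/GENCODE-style one."""
    if name.startswith("chr"):
        return name
    if name == "MT":
        return "chrM"
    return "chr" + name


def rename_bedgraph_line(line):
    """Rename the chromosome column of one tab-separated bedGraph line."""
    text = line.rstrip("\n")
    # Leave blank lines and header lines untouched.
    if not text or text.startswith(("#", "track", "browser")):
        return text
    fields = text.split("\t")
    fields[0] = to_ucsc_chrom(fields[0])
    return "\t".join(fields)


for line in ["1\t0\t10\t5\n", "MT\t0\t10\t3\n", "chrX\t0\t10\t1\n"]:
    print(rename_bedgraph_line(line))
```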
Thank you for your tool.
I'm sorry to bother you, but I don't quite understand what the output means.
According to the article, to filter the data I need to set the cutoff for adjusted.P_val to 0.05. However, I found that every value in the PASS_filter column is "N".
I am also confused about A_1_long_exp and A_1_short_exp.
What do they refer to?
Thank you.
DaPars_main.py is only outputting 100 UTRs.
My extracted UTR.bed file has 50k UTRs
wc -l gencode.vM7.extracted_utr.bed
50752 gencode.vM7.extracted_utr.bed
However, the results file only has 100 entries
wc -l DaPars_data_All_Prediction_Results.txt
100 DaPars_data_All_Prediction_Results.txt
I have tried changing the coverage_cutoff, FDR_cutoff, PDUI_cutoff, and Fold_Change_cutoff, and still only get 100 results.
Excuse me, where does hg19_4_19_2012_Refseq_id_from_UCSC.txt come from?
And if I use mm10, how should I generate the equivalent file?
Thanks very much for any answer!
Hi ZhengXia,
Just a minor bug: in Python 3, print must be called as a function.
This:
print("Hello world.")
rather than:
print "Hello world."
After changing all print statements, the program works without error in Python.
Best,
Nico
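As a small illustration of the change described above, the sketch below uses the function form of print, including the file= argument that replaces the Python 2 `print >>stream, ...` syntax. The log helper is hypothetical and not part of DaPars. On Python 2.7, the same code should also run if `from __future__ import print_function` is added as the first statement of the script.

```python
import io
import sys


def log(message, stream=sys.stdout):
    """Write one message line using the function form of print."""
    print(message, file=stream)


log("Hello world.")

# Writing to another stream, here an in-memory buffer.
buffer = io.StringIO()
log("Loading coverage ...", stream=buffer)
```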
Rpy2 no longer supports Python 2. The last version to support Python 2 does not support the latest version of R. It seems that the best way forward is to upgrade dapars to Python 3.
-Current version DaPars v0.9.0
-Description of error and relevant line number in the program
When using wiggle files converted from bedGraphs or bigWigs, some of the
coverage values may be in scientific notation (e.g. 1.6e+06).
-Example lines in input wiggle
chrM 1900 1901 1.15903e+06
chrM 1901 1902 1.24796e+06
chrM 1902 1903 1.31714e+06
chrM 1903 1904 1.40726e+06
-Produced error
Traceback (most recent call last):
File "/home/dreyfusslab/Desktop/DaPars_APA_analysis/DaPars_main.py", line 548, in <module>
De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main(sys.argv)
File "/home/dreyfusslab/Desktop/DaPars_APA_analysis/DaPars_main.py", line 154, in De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main
All_samples_Target_3UTR_coverages, All_samples_sequencing_depths, UTR_events_dict = Load_Target_Wig_files(All_Sample_files, Annotated_3UTR_file)
File "/home/dreyfusslab/Desktop/DaPars_APA_analysis/DaPars_main.py", line 500, in Load_Target_Wig_files
cur_sample_total_depth += int(fields[-1]) * (region_end - region_start)
ValueError: invalid literal for int() with base 10: '1.15903e+06'
-Relevant lines in DaPars_main.py
500: cur_sample_total_depth += int(fields[-1]) * (region_end - region_start)
507: curr_sample_All_chroms_coverage_dict[chrom_name][1].append(int(fields[-1]))
-Proposed solution
One can use float() to parse the scientific notation into a value that int()
can then convert. For example, updated versions of lines 500 and 507 would be:
500: cur_sample_total_depth += int(float(fields[-1])) * (region_end - region_start)
507: curr_sample_All_chroms_coverage_dict[chrom_name][1].append(int(float(fields[-1])))
With this change, the error no longer occurs.
Original issue reported on code.google.com by [email protected]
on 9 Jul 2014 at 4:21
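The proposed conversion can be tried in isolation with the sketch below. The helper names are hypothetical and only mirror the arithmetic of line 500 as quoted above; they are not the DaPars functions themselves. Note that int(float(...)) truncates any fractional coverage toward zero.

```python
def parse_coverage(value):
    """Parse a coverage field that may be written as '12', '2.7' or '1.15903e+06'."""
    return int(float(value))


def total_depth(lines):
    """Sum coverage * region length over bedGraph-style lines."""
    depth = 0
    for line in lines:
        fields = line.split()
        region_start = int(fields[1])
        region_end = int(fields[2])
        depth += parse_coverage(fields[-1]) * (region_end - region_start)
    return depth


# Example lines from this report.
wiggle_lines = [
    "chrM 1900 1901 1.15903e+06",
    "chrM 1901 1902 1.24796e+06",
    "chrM 1902 1903 1.31714e+06",
    "chrM 1903 1904 1.40726e+06",
]
print(total_depth(wiggle_lines))
```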
What steps will reproduce the problem?
python /data/raw_data/DaPars/DaPars-0.9.0/DaPars_main.py DaPars_hg19_CD8_configure.txt
[Tue 19 Aug 2014 11:40:08 AM ] Start Analysis ...
[Tue 19 Aug 2014 11:40:08 AM ] Loading coverage ...
[Tue 19 Aug 2014 12:26:06 PM ] Loading coverage finished ...
[Tue 19 Aug 2014 01:22:06 PM ] Filtering the result ...
/home/ski/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2563: RuntimeWarning: divide by zero encountered in double_scalars
if float(np.abs(pexact - pmode)) / np.abs(np.max(pexact, pmode)) <= 1 - epsilon:
[Tue 19 Aug 2014 01:23:05 PM ] Finished!
The warning appears in the "Filtering the result" step and is likely due to the
Fisher exact test.
What is the expected output? What do you see instead?
We expected to see many cases of 3'UTR shortening, but none were detected.
What version of the product are you using? On what operating system?
DaPars-0.9.0, and the operating system is red hat x86_64
Please provide any additional information below.
Thank you so much. Hope to see this issue addressed quickly!
Original issue reported on code.google.com by [email protected]
on 19 Aug 2014 at 10:14