zhengxia / dapars

DaPars (Dynamic analysis of Alternative PolyAdenylation from RNA-seq)

License: GNU General Public License v2.0

Python 100.00%

dapars's Issues

Cannot run DaPars: ImportError: cannot import name entr

DaPars won't run. I get the following error:

File "DaPars-0.9.0/DaPars_main.py", line 7, in <module>
    import scipy.stats
  File "/apps/python/2.7.6/lib/python2.7/site-packages/scipy/stats/__init__.py", line 334, in <module>
    from .stats import *
  File "/apps/python/2.7.6/lib/python2.7/site-packages/scipy/stats/stats.py", line 187, in <module>
    from . import distributions
  File "/apps/python/2.7.6/lib/python2.7/site-packages/scipy/stats/distributions.py", line 10, in <module>
    from ._distn_infrastructure import (entropy, rv_discrete, rv_continuous,
  File "/apps/python/2.7.6/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 19, in <module>
    from scipy.special import (comb, chndtr, gammaln, hyp0f1,
ImportError: cannot import name entr

Original issue reported on code.google.com by [email protected] on 10 Oct 2014 at 4:18

RNA-seq data normalization

I would like to use DaPars with two datasets from different studies of the same organism and tissue. Normally I would normalize the aligned reads and remove batch effects, but what is recommended when using DaPars?

Does DaPars support strand-specific RNA-Seq?

Hello zhengxia,

DaPars has been really useful for my APA analysis over the last few days.

Recently, however, I obtained some strand-specific RNA-Seq data from libraries built with the dUTP method. Does DaPars support this, or is there a specific parameter for it?

Best wishes,
Haifeng Sun

DaPars_Extract_Anno.py - coordinates in output BED file (and 'Loci' column) are shifted 1 nucleotide from source transcript

TL;DR:

+ strand: DaPars BED/'Loci' 3'UTR start is 1 nt downstream of the actual annotated start
- strand: DaPars BED/'Loci' 3'end of the 3'UTR is 1 nt upstream of the actual annotated end

In lines 38-45 of DaPars_Extract_Anno.py, the 'start' coordinates for UTRs (last exons) are converted to '1-based' by adding 1 to the start coordinate. Given that the script outputs a BED file, and according to the BED format convention start coordinates are included in the range, adding 1 to the start coordinate actually excludes the first nucleotide of the UTR on the plus strand and the last nucleotide of the UTR on the minus strand.
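The convention at issue can be illustrated with a minimal sketch. BED intervals are 0-based and half-open, while GTF intervals are 1-based and inclusive. The function names and the example coordinates below are hypothetical and are not taken from DaPars_Extract_Anno.py; the sketch only shows why adding 1 to a start that is still written as BED shortens the interval by one nucleotide.

```python
def gtf_to_bed(start, end):
    """Convert a 1-based, inclusive GTF interval to a 0-based, half-open BED interval."""
    return start - 1, end


def bed_to_gtf(start, end):
    """Convert a 0-based, half-open BED interval back to 1-based, inclusive coordinates."""
    return start + 1, end


# Hypothetical last exon annotated in a GTF at positions 1001..1500 (500 nt).
bed_start, bed_end = gtf_to_bed(1001, 1500)
print(bed_start, bed_end)        # 1000 1500
print(bed_end - bed_start)       # 500

# Adding 1 to the start but still writing the result as BED drops one nucleotide.
shifted_start = bed_start + 1
print(bed_end - shifted_start)   # 499
```

On the minus strand the BED start column corresponds to the 3' end of the UTR, which is why the shift affects the distal polyA site there.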

Here is an example of extracted UTRs on the + strand. 'D Extracted 3'UTR' is the output BED file of DaPars_Extract_Anno.py, with 'D2 Extracted 3'UTR' the same file from the DaPars2 repo (the script is identical between the two tools). 'GTF' is the reference GTF file from which the input BED12 file was derived. In both cases, the last exon reported in the DaPars BED file begins 1nt downstream of the source transcript:

Again, on the - strand the DaPars BED file terminates 1nt upstream of the source transcript end:

I'm not sure how DaPars handles this internally (i.e. the coordinates in the BED file may not be interpreted according to BED conventions) and regardless I don't expect 1nt to make a drastic difference to the algorithm's output. The real issue comes when using the BED file/'Loci' column in output files to extract predicted polyA sites, especially the distal polyA site on the minus strand (as the Start coordinate which corresponds to the UTR end is shifted by 1nt).

Screenshots were generated using output from the DaPars & DaPars2 execution workflows from the APAeval project with test data used as input.

Edit: This issue has also been reported to the DaPars2 repo at 3UTR/DaPars2#8

Throws errors when mixing male and female data

When I mix male and female data, the computation fails with an error:

[Fri 08 Dec 2017 04:59:34 PM ] Start Analysis ...
[Fri 08 Dec 2017 04:59:34 PM ] Loading coverage ...
Traceback (most recent call last):
  File "/data5/exe/dapars/src/DaPars_main.py", line 548, in <module>
    De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main(sys.argv)
  File "/data5/exe/dapars/src/DaPars_main.py", line 154, in De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main
    All_samples_Target_3UTR_coverages, All_samples_sequencing_depths, UTR_events_dict = Load_Target_Wig_files(All_Sample_files, Annotated_3UTR_file)
  File "/data5/exe/dapars/src/DaPars_main.py", line 509, in Load_Target_Wig_files
    curr_sample_All_chroms_coverage_dict[chrom_name][1].append(0)
KeyError: 'chrY'
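The KeyError suggests that a chromosome present in one sample's coverage is absent from another's, as would be expected for chrY in female samples. A minimal sketch for checking this before running DaPars follows. It assumes four-column bedGraph-style input (chromosome, start, end, coverage); the helper names are hypothetical and this is not part of DaPars.

```python
def chromosomes_in_bedgraph(lines):
    """Collect chromosome names from four-column bedGraph-style lines."""
    chroms = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines and header lines such as 'track ...' or '#...'.
        if len(fields) < 4 or fields[0].startswith(("#", "track", "browser")):
            continue
        chroms.add(fields[0])
    return chroms


def missing_chromosomes(samples):
    """Map each sample name to the chromosomes seen in other samples but not in it."""
    per_sample = {name: chromosomes_in_bedgraph(lines) for name, lines in samples.items()}
    all_chroms = set().union(*per_sample.values()) if per_sample else set()
    return {name: sorted(all_chroms - chroms) for name, chroms in per_sample.items()}


# Toy data: the female sample has no chrY coverage.
samples = {
    "male": ["chr1\t0\t10\t5", "chrY\t0\t10\t3"],
    "female": ["chr1\t0\t10\t7"],
}
print(missing_chromosomes(samples))
```

With real data, each value in `samples` could be an open file handle instead of a list of strings. Whether restricting all samples to their shared chromosomes avoids the error in DaPars itself is an assumption that would need testing.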

Output column definitions

I've been using the DaPars tool to identify changes in APA in our RNA-seq data. I have successfully produced the output table but even after reading both the publications it is not totally clear to me what each column in the output table is.

Please can you provide a description for what each column in the output table is?

Thanks in advance!
Kathryn

Problem: IndexError: list index out of range

Hi,
I'm using DaPars to find APA events, but I've run into a problem that confuses me:
python /home/pc/biosoft/dapars/src/DaPars_Extract_Anno.py -b $refbed -s $genesymbol -o Dapars_extracted_3UTR.bed

Generating regions ...
Traceback (most recent call last):
  File "/home/pc/biosoft/dapars/src/DaPars_Extract_Anno.py", line 151, in <module>
    Extract_Anno_main(sys.argv[1:])
  File "/home/pc/biosoft/dapars/src/DaPars_Extract_Anno.py", line 139, in Extract_Anno_main
    Annotation_prepar_3UTR_extraction(gene_bed_file, gene_symbol_annotation_file,output_extract_file)
  File "/home/pc/biosoft/dapars/src/DaPars_Extract_Anno.py", line 17, in Annotation_prepar_3UTR_extraction
    gene_symbol = fields[1]
IndexError: list index out of range

Do you know what happened? How should I solve this problem?
Thanks very much!

Haifeng Sun
China Nanjing Medical University
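The failing statement, `gene_symbol = fields[1]`, implies that at least one line of the gene symbol file produced fewer than two fields when split. A minimal sketch for locating such lines follows. It assumes the file is expected to be tab-delimited, which is not confirmed by the traceback; the function name and the toy rows are hypothetical.

```python
def find_short_lines(lines, min_fields=2, sep="\t"):
    """Return (line_number, text) pairs for lines with fewer than min_fields columns."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if len(text.split(sep)) < min_fields:
            problems.append((number, text))
    return problems


# Toy example: line 2 is space-separated and line 3 is empty.
example = ["NM_000014\tA2M\n", "NM_000015 NAT2\n", "\n"]
for number, text in find_short_lines(example):
    print("line %d has too few columns: %r" % (number, text))
```

For a real file, passing an open file handle (for example the file given to `-s`) in place of `example` would list the offending lines.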

Docs link broken?

Hello, looks like the docs link is broken:

http://lilab.research.bcm.edu/dldcc-web/lilab/zheng/DaPars_Documentation/html/DaPars.html

Would like to use daPars but would need some examples / docs / install instructions.

Please advise,
Thanks,
Gregor

Possible to install using conda?

Hello,

I'm trying to install dapars on a computing cluster and am unable to get it running. I'm trying to use it in a virtual environment with python 2.7, rpy2(v2.8.6) and r-base(v3.4.1). When I try to run the program I get:

Traceback (most recent call last):
  File "/project/klynclab/software/dapars-0.9.1/src/DaPars_main.py", line 11, in <module>
    from rpy2.robjects.packages import importr
  File "/home/fmax/miniconda3/envs/dapars/lib/python2.7/site-packages/rpy2/robjects/__init__.py", line 16, in <module>
    import rpy2.rinterface as rinterface
  File "/home/fmax/miniconda3/envs/dapars/lib/python2.7/site-packages/rpy2/rinterface/__init__.py", line 92, in <module>
    from rpy2.rinterface._rinterface import (baseenv,
ImportError: libicuuc.so.64: cannot open shared object file: No such file or directory

Can anyone give me a list of the dependencies/packages required to properly run dapars?
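Until a dependency list is available, one way to see which imports fail in a given environment is to try them one at a time. The sketch below is a hypothetical helper, not part of DaPars. The two module names are the ones that appear in the tracebacks reported on this page for DaPars_main.py, so the list is probably incomplete.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and return {name: None or an error description}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or errors raised while loading shared libraries
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


# Modules named in tracebacks on this page; other dependencies may exist.
for name, error in check_imports(["scipy.stats", "rpy2.robjects.packages"]).items():
    print(name, "OK" if error is None else error)
```

An error such as the missing `libicuuc.so.64` above would show up here as a failure of the rpy2 import, which points at the R installation or its shared libraries rather than at DaPars.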

Chrom name issue

Dear all,

After some time away I came back to DaPars and ran into the following issue, using the same config and annotation files as before:
python2.7 ../dapars/src/DaPars_main.py Configure_file.txt
[Mon 17 Dec 2018 03:47:16 PM ] Start Analysis ...
[Mon 17 Dec 2018 03:47:16 PM ] Loading coverage ...
Traceback (most recent call last):
  File "../dapars/src/DaPars_main.py", line 548, in <module>
    De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main(sys.argv)
  File "../dapars/src/DaPars_main.py", line 154, in De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main
    All_samples_Target_3UTR_coverages, All_samples_sequencing_depths, UTR_events_dict = Load_Target_Wig_files(All_Sample_files, Annotated_3UTR_file)
  File "../dapars/src/DaPars_main.py", line 509, in Load_Target_Wig_files
    curr_sample_All_chroms_coverage_dict[chrom_name][1].append(0)
UnboundLocalError: local variable 'chrom_name' referenced before assignment

I would be happy if you could help me with this.
Additional info: I mapped my data against the Ensembl reference (chromosomes named 1, 2, 3, ..., X), while the annotation file is based on GENCODE (chr1, chr2, ..., chrX). However, earlier bedGraphs that used the same chromosome naming as the annotation file produced the same error.
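The Ensembl/GENCODE naming difference can be removed by rewriting the chromosome column of the coverage files. The following is a minimal sketch, assuming tab-delimited four-column bedGraph-style lines; the function name is hypothetical. Since the reporter saw the same error with matching names, this addresses only the naming mismatch and may not resolve the UnboundLocalError.

```python
def to_chr_prefixed(line):
    """Rewrite the chromosome column from Ensembl style (1, X, MT) to chr-prefixed style."""
    text = line.rstrip("\n")
    fields = text.split("\t")
    # Leave header lines and malformed lines untouched.
    if len(fields) < 4 or fields[0].startswith(("#", "track", "browser")):
        return text
    chrom = fields[0]
    if chrom == "MT":
        # Ensembl names the mitochondrial chromosome MT; GENCODE uses chrM.
        fields[0] = "chrM"
    elif not chrom.startswith("chr"):
        fields[0] = "chr" + chrom
    return "\t".join(fields)


for line in ["1\t100\t200\t12", "X\t5\t9\t3", "MT\t0\t4\t1", "chr2\t0\t4\t1"]:
    print(to_chr_prefixed(line))
```

Scaffold or patch names that differ between the two sources are not handled by this sketch and would need an explicit mapping.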

What does PASS_filter refer to?

Thank you for your tool.
I'm sorry to bother you, but I don't quite understand what the output means.
According to the article, to filter the data I need to set the adjusted.P_val cutoff to 0.05. However, every entry in the PASS_filter column is "N".
I am also confused about A_1_long_exp and A_1_short_exp.
What do they refer to?
Thank you.

Only outputting 100 UTRs

DaPars_main.py is only outputting 100 UTRs.
My extracted UTR.bed file has 50k UTRs

wc -l gencode.vM7.extracted_utr.bed 
50752 gencode.vM7.extracted_utr.bed

However, the results file only has 100 entries

wc -l DaPars_data_All_Prediction_Results.txt 
100 DaPars_data_All_Prediction_Results.txt

I have tried changing the coverage_cutoff, FDR_cutoff, PDUI_cutoff, and Fold_Change_cutoff, and still only get 100 results.

hg19_4_19_2012_Refseq_id_from_UCSC.txt

Excuse me, where does hg19_4_19_2012_Refseq_id_from_UCSC.txt come from?
And if I use mm10, what should I do?
Thanks very much if you can answer!

print() syntax in python3

Hi ZhengXia,

Just a minor bug. In Python 3, print must be called as a function.
This:
print("Hello world.")
rather than:
print "Hello world."

After changing all print statements, the program works without error in Python.

Best,
Nico

Migrate to Python 3

Rpy2 no longer supports Python 2. The last version to support Python 2 does not support the latest version of R. It seems that the best way forward is to upgrade dapars to Python 3.

DaPars does not handle scientific notation in the wig file coverage field (field 4)

-Current version DaPars v0.9.0

-Description of error and relevant line number in the program
When wiggle files are converted from bedGraphs or bigWigs, some of the coverage values may be in scientific notation (e.g. 1.6e+06).


-Example lines in input wiggle
chrM    1900    1901    1.15903e+06
chrM    1901    1902    1.24796e+06
chrM    1902    1903    1.31714e+06
chrM    1903    1904    1.40726e+06


-Produced error
Traceback (most recent call last):
  File "/home/dreyfusslab/Desktop/DaPars_APA_analysis/DaPars_main.py", line 548, in <module>
    De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main(sys.argv)
  File "/home/dreyfusslab/Desktop/DaPars_APA_analysis/DaPars_main.py", line 154, in De_Novo_3UTR_Identification_Loading_Target_Wig_for_TCGA_Multiple_Samples_Main
    All_samples_Target_3UTR_coverages, All_samples_sequencing_depths, UTR_events_dict = Load_Target_Wig_files(All_Sample_files, Annotated_3UTR_file)
  File "/home/dreyfusslab/Desktop/DaPars_APA_analysis/DaPars_main.py", line 500, in Load_Target_Wig_files
    cur_sample_total_depth += int(fields[-1]) * (region_end - region_start)
ValueError: invalid literal for int() with base 10: '1.15903e+06'


-Relevant lines in DaPars_main.py
500:  cur_sample_total_depth += int(fields[-1]) * (region_end - region_start)
507:  curr_sample_All_chroms_coverage_dict[chrom_name][1].append(int(fields[-1]))


-Proposed solution
One can use float() to parse the scientific notation into a value that int() then accepts. For example, updated versions of lines 500 and 507 would be:
500:  cur_sample_total_depth += int(float(fields[-1])) * (region_end - region_start)
507:  curr_sample_All_chroms_coverage_dict[chrom_name][1].append(int(float(fields[-1])))

This does not produce any error.
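The proposed conversion can be checked in isolation with one of the example wiggle lines above. This is a minimal standalone sketch; `parse_coverage` is a hypothetical helper name and not a function in DaPars_main.py.

```python
def parse_coverage(field):
    """Parse a coverage value that may be written as '12', '12.0' or '1.15903e+06'."""
    # int() rejects strings in scientific notation, so go through float() first.
    # Note that int() truncates any fractional part, e.g. '2.7' becomes 2.
    return int(float(field))


line = "chrM    1900    1901    1.15903e+06"
fields = line.split()
region_start, region_end = int(fields[1]), int(fields[2])
depth = parse_coverage(fields[-1]) * (region_end - region_start)
print(depth)  # 1159030
```

Because of the truncation, fractional coverage values (for example from normalized tracks) lose precision with this approach.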

Original issue reported on code.google.com by [email protected] on 9 Jul 2014 at 4:21

DaPars produces a divide-by-zero warning

What steps will reproduce the problem?
python /data/raw_data/DaPars/DaPars-0.9.0/DaPars_main.py DaPars_hg19_CD8_configure.txt
[Tue 19 Aug 2014 11:40:08 AM ] Start Analysis ...
[Tue 19 Aug 2014 11:40:08 AM ] Loading coverage ...
[Tue 19 Aug 2014 12:26:06 PM ] Loading coverage finished ...
[Tue 19 Aug 2014 01:22:06 PM ] Filtering the result ...
/home/ski/anaconda/lib/python2.7/site-packages/scipy/stats/stats.py:2563: RuntimeWarning: divide by zero encountered in double_scalars
  if float(np.abs(pexact - pmode)) / np.abs(np.max(pexact, pmode)) <= 1 - epsilon:
[Tue 19 Aug 2014 01:23:05 PM ] Finished!

The warning appears in the "Filtering the result" step and is likely due to Fisher's exact test.


What is the expected output? What do you see instead?
We expected to see many cases of 3'UTR shortening, but none were detected.

What version of the product are you using? On what operating system?
DaPars-0.9.0, and the operating system is red hat x86_64

Please provide any additional information below.
Thank you so much. Hope to see this issue addressed quickly!

Original issue reported on code.google.com by [email protected] on 19 Aug 2014 at 10:14
