Comments (6)
If the GTF contains at least "intron, intergenic and CDS" in the 3rd column, PEKA moves on. However, after detection of noxn and ntxn, I am now stuck with another error message:
Traceback (most recent call last):
File "peka.py", line 1461, in <module>
main()
File "peka.py", line 1457, in main
set_seeds
File "peka.py", line 1125, in run
reference, genome, genome_chr_sizes, window + kmer_length, window + kmer_length, merge_overlaps=False
File "peka.py", line 543, in get_sequences
return [line.split("\t")[1].strip() for line in open(seq_tab.seqfn)]
File "peka.py", line 543, in <listcomp>
return [line.split("\t")[1].strip() for line in open(seq_tab.seqfn)]
IndexError: list index out of range
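The IndexError in the last frame means that at least one line of the file read at line 543 contained no tab character, so `split("\t")` returned a single-element list. A minimal sketch of a more defensive reader is shown below; `read_sequences` is a hypothetical helper written for illustration, not part of PEKA, and it assumes the file holds one name and one sequence per line, separated by a tab.

```python
def read_sequences(lines):
    """Return the second tab-separated field of each line.

    Hypothetical helper (not PEKA code): raises a descriptive error
    instead of a bare IndexError when a line has no tab.
    """
    sequences = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            raise ValueError(
                f"line {number} has no tab-separated sequence field: {line!r}"
            )
        sequences.append(fields[1].strip())
    return sequences


# A well-formed line parses; a line without a tab reproduces the failure mode.
good = ["chr1:0-5\tACGTA\n"]
bad = ["chr1:0-5\tACGTA\n", "ACGTA\n"]
print(read_sequences(good))
try:
    read_sequences(bad)
except ValueError as error:
    print(error)
```

Reporting the offending line number and its raw content makes it easier to see which input record produced the malformed output.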
from peka.
Hi!
Indeed, having a GTF without all the specified regions causes issues in PEKA, as it will try to generate results for composite regions such as "whole_gene", which comprises introns, CDS and UTRs.
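Before running PEKA, the region types present in the third column of a GTF can be listed with a short script. The sketch below assumes a standard tab-separated GTF; the function names are hypothetical, and the default `required` tuple is only the set mentioned in this thread ("intron, intergenic and CDS"), not an authoritative list of what PEKA needs.

```python
from collections import Counter


def count_gtf_features(lines):
    """Count the values in the 3rd (feature) column of GTF lines."""
    counts = Counter()
    for line in lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def missing_features(lines, required=("intron", "intergenic", "CDS")):
    """Return the required feature names that never occur (assumed set)."""
    counts = count_gtf_features(lines)
    return [name for name in required if counts[name] == 0]


example = [
    'chr1\tsrc\tCDS\t1\t100\t.\t+\t.\tgene_id "g1";\n',
    'chr1\tsrc\tintergenic\t101\t200\t.\t+\t.\tgene_id ".";\n',
]
print(missing_features(example))  # ['intron']
```

For a real file, the same functions can be called on an open file handle, for example `missing_features(open("annotation.gtf"))`, where the file name is a placeholder.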
The second error is related to sequence extraction from the genome, but I don't know why it occurred. I suppose it occurred in the intronic region? Can you please explain how you added introns to the GTF and whether you also added crosslinks to intronic sites? Could you please paste the full output that was printed before the second error? If you share your input files with me (you can upload them to Dropbox or Google Drive and send a link to [email protected]), I will try to reproduce the error and add a fix.
Thanks for your report!
Hi!
I changed the last intergenic region in the GTF to an intron region, just to have the intron requirement fulfilled. I want this to run on the whole genome (as UTRs are not annotated in the GTF). I have shared the input files with you here.
The output printed before the traceback was:
Getting thresholded crosslinks
Thresholding intron
lenght of df_reg for intron is: 96
Thresholding intron runtime: 0.00 min
Thresholding intergenic
lenght of df_reg for intergenic is: 121112
Thresholding intergenic runtime: 0.11 min
Thresholding cds_utr_ncrna
lenght of df_reg for cds_utr_ncrna is: 439304
Thresholding cds_utr_ncrna runtime: 0.32 min
Thresholding runtime: 0.43 min for 125120 thresholded crosslinks
559762 total sites. All sites taging runtime: 0.42 min
125120 thresholded sites on genome
559762 all sites on genome
noxn 411358 on genome
ntxn 55340 on genome
Thank you! I'll have a look
Hi!
I found the cause of the second error: your FASTA file was formatted as ASCII text with CRLF line terminators. This means that each line of the sequence ended with the nonstandard line separator "^M$" (a carriage return before the newline), like so:
SaciDSM639^M$
GGGGAGTTAAATGACTGTAATCCCTATTTCAAAATAGTTTGAGGTAACACCATTGGAAATTATTGTAAAC^M$
TAAAATAACGTATGAAACATATGTATTCATGTGGCAATTAGGGAAACGCTGAAAGGTGGCAAAGGGGAGG^M$
This formatting causes faulty output from bedtools getfasta, which is why PEKA fails when it tries to process it.
The error was resolved by converting your FASTA file to UNIX format by running
dos2unix your-fasta-file
Please try converting your FASTA file to UNIX format, run PEKA with it, and let me know whether that resolves your issue.
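If a line-ending conversion tool is not installed, the same check and conversion can be sketched in Python. The functions below are hypothetical helpers written for illustration; they only replace CRLF with LF and do not reproduce any other behaviour of dedicated conversion tools. `convert_file` reads the whole file into memory, which is assumed to be acceptable for a small genome.

```python
def has_crlf(data: bytes) -> bool:
    """Report whether the data contains Windows-style CRLF line endings."""
    return b"\r\n" in data


def to_unix_newlines(data: bytes) -> bytes:
    """Replace every CRLF with a single LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src: str, dst: str) -> bool:
    """Write a copy of src with LF endings to dst.

    Returns True if the source contained CRLF line endings.
    """
    with open(src, "rb") as handle:
        data = handle.read()
    with open(dst, "wb") as handle:
        handle.write(to_unix_newlines(data))
    return has_crlf(data)


# In-memory example with two CRLF-terminated sequence lines.
sample = b"GGGGAGTTAAATGAC\r\nTAAAATAACGTATGA\r\n"
print(has_crlf(sample))          # True
print(to_unix_newlines(sample))  # b'GGGGAGTTAAATGAC\nTAAAATAACGTATGA\n'
```

Writing to a separate output path keeps the original file untouched, so the two versions can be compared before rerunning PEKA.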
As for the first error you reported regarding the need to have all regions annotated in the GTF, I'll implement a fix in the future, so there won't be a need to make "hackish" adjustments to input files.
Hi Klara!
Seems like reformatting the FASTA file did the job. Thanks a lot!