PEKA on intronless genomes [peka, CLOSED]

ulelab commented on August 26, 2024
PEKA on intronless genomes

from peka.

Comments (6)

sykorami commented on August 26, 2024

If the GTF contains at least "intron", "intergenic" and "CDS" in the 3rd column, PEKA moves on. However, after the detection of noxn and ntxn, I am now stuck with another error message:

Traceback (most recent call last):
  File "peka.py", line 1461, in <module>
    main()
  File "peka.py", line 1457, in main
    set_seeds
  File "peka.py", line 1125, in run
    reference, genome, genome_chr_sizes, window + kmer_length, window + kmer_length, merge_overlaps=False
  File "peka.py", line 543, in get_sequences
    return [line.split("\t")[1].strip() for line in open(seq_tab.seqfn)]
  File "peka.py", line 543, in <listcomp>
    return [line.split("\t")[1].strip() for line in open(seq_tab.seqfn)]
IndexError: list index out of range

kkuret commented on August 26, 2024

Hi!
Indeed, a GTF that lacks any of the specified regions causes issues in PEKA, because it tries to generate results for composite regions such as "whole_gene", which comprises introns, CDS and UTRs.
The second error is related to sequence extraction from the genome, but I don't know why it occurred. I suppose it occurred in the intronic region? Can you please explain how you added introns to the GTF and whether you also added crosslinks to intronic sites? Could you also paste the full output that was printed before the second error? If you share your input files with me (you can upload them to Dropbox or Google Drive and send a link to [email protected]), I will try to reproduce the error and add a fix.
Thanks for your report!
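
A quick pre-flight check of the GTF can show which region types are missing before PEKA is run. The snippet below is only a sketch: the required set is taken from the comment above ("intron", "intergenic" and "CDS" in the 3rd column), and the helper names `count_gtf_features` and `missing_features` are hypothetical, not part of PEKA.

```python
from collections import Counter

# Assumption: the minimum set of feature types, as quoted in this thread.
REQUIRED_FEATURES = {"intron", "intergenic", "CDS"}


def count_gtf_features(lines):
    """Count the values found in the 3rd (feature) column of GTF lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


def missing_features(lines, required=REQUIRED_FEATURES):
    """Return the required feature types that never occur, sorted by name."""
    counts = count_gtf_features(lines)
    return sorted(name for name in required if counts[name] == 0)


# Example usage with a file on disk (the path is a placeholder):
# with open("annotation.gtf") as handle:
#     print(missing_features(handle))
```

An empty list means all three feature types occur at least once. It does not guarantee that PEKA will accept the file.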

sykorami commented on August 26, 2024

Hi!
I changed the last intergenic region to an intron region in the GTF just to fulfill the intron requirement. I want this to run on the whole genome (as UTRs are not annotated in the GTF). I have shared the input files with you here.

The output printed before the traceback was:
Getting thresholded crosslinks
Thresholding intron
lenght of df_reg for intron is: 96
Thresholding intron runtime: 0.00 min
Thresholding intergenic
lenght of df_reg for intergenic is: 121112
Thresholding intergenic runtime: 0.11 min
Thresholding cds_utr_ncrna
lenght of df_reg for cds_utr_ncrna is: 439304
Thresholding cds_utr_ncrna runtime: 0.32 min
Thresholding runtime: 0.43 min for 125120 thresholded crosslinks
559762 total sites. All sites taging runtime: 0.42 min
125120 thresholded sites on genome
559762 all sites on genome
noxn 411358 on genome
ntxn 55340 on genome

kkuret commented on August 26, 2024

Thank you! I'll have a look

kkuret commented on August 26, 2024

Hi!
I found the cause of the second error: your FASTA file was formatted as ASCII text with CRLF line terminators. This means that the sequence lines were terminated with a nonstandard line separator, "^M$", like so:

>SaciDSM639^M$
GGGGAGTTAAATGACTGTAATCCCTATTTCAAAATAGTTTGAGGTAACACCATTGGAAATTATTGTAAAC^M$
TAAAATAACGTATGAAACATATGTATTCATGTGGCAATTAGGGAAACGCTGAAAGGTGGCAAAGGGGAGG^M$

This formatting causes faulty output from bedtools getfasta, which is why PEKA fails when it tries to process it.
The error was resolved by converting your FASTA file to UNIX format by running
dos2unix your-fasta-file
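
If dos2unix is not available, the same check and conversion can be sketched in Python. This is an illustrative alternative under stated assumptions, not something PEKA provides: `has_crlf` and `convert_to_lf` are hypothetical helper names, and the file paths in the usage comment are placeholders.

```python
def has_crlf(path):
    """Return True if any line in the file ends with CRLF (\\r\\n)."""
    # Binary mode keeps the line endings exactly as stored on disk.
    with open(path, "rb") as handle:
        return any(line.endswith(b"\r\n") for line in handle)


def convert_to_lf(src, dst):
    """Write a copy of src to dst with CRLF line endings replaced by LF."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
            fout.write(line)


# Example usage (paths are placeholders):
# if has_crlf("genome.fa"):
#     convert_to_lf("genome.fa", "genome.unix.fa")
```

Writing to a separate output file leaves the original untouched, so the two can be compared before the converted copy is passed to PEKA.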

Please try converting your FASTA file to UNIX format, run PEKA with it, and let me know whether that resolves your issue.

As for the first error you reported, regarding the need to have all regions annotated in the GTF, I'll implement a fix in the future so that there is no need to make "hackish" adjustments to input files.

sykorami commented on August 26, 2024

Hi Klara!
Seems like reformatting the FASTA file did the job. Thanks a lot!
