
Error using Flair correct (flair, 7 comments, closed)

brookslabucsc commented on August 11, 2024
Error using Flair correct

from flair.

Comments (7)

csoulette commented on August 11, 2024

Hi Katherine,

The correct module builds a database from your GTF annotation input in order to compare the junctions within each read to known junctions from your annotations. It looks like the script is getting caught on a junction that has a negative chromosome coordinate value (kind of weird to see that). I've tried to recapitulate your error using the same annotation file you used in your run, but was unable to trigger this negative value error. When running correct, a temporary folder is created that contains all of the junction annotations in bed format. Could you have a look through those files to see if there are any bed entries with negative values? We can use that to figure out which GTF annotation is causing this issue. Thanks~~
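
One way to do that check without reading every file by eye is a short scan of the temporary folder. This is a hypothetical helper, not part of FLAIR; it assumes the folder contains tab-separated *.bed files with start and end coordinates in columns 2 and 3 (standard bed layout), and the function names are illustrative.

```python
import glob
import os


def parse_coords(fields):
    """Return (start, end) as ints, or None if the row has no numeric coordinates."""
    try:
        return int(fields[1]), int(fields[2])
    except (IndexError, ValueError):
        return None


def find_negative_bed_entries(bed_dir):
    """Yield (file name, line number, fields) for bed rows with a negative start or end.

    Assumption: bed_dir holds tab-separated *.bed files with start/end in
    columns 2 and 3. This is not a documented FLAIR layout.
    """
    for path in sorted(glob.glob(os.path.join(bed_dir, "*.bed"))):
        with open(path) as handle:
            for line_number, line in enumerate(handle, start=1):
                fields = line.rstrip("\n").split("\t")
                coords = parse_coords(fields)
                if coords is None:
                    continue  # blank, header, or track line
                start, end = coords
                if start < 0 or end < 0:
                    yield os.path.basename(path), line_number, fields
```

Call it as find_negative_bed_entries("<temporary folder>"); the path is a placeholder, since the thread does not name the folder. A clean result does not rule out the problem entirely: a coordinate can still go negative later, if a window is subtracted from it while the junction database is being built.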

-CMS

knewl commented on August 11, 2024

Hi,

Thanks very much for getting back to me. I was surprised that you couldn't replicate the error, so I downloaded the latest annotation file:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/004/195/GCF_000004195.3_Xenopus_tropicalis_v9.1/GCF_000004195.3_Xenopus_tropicalis_v9.1_genomic.gff.gz

And all is fine. I'm not sure what was wrong with my previous gtf. I did have a look at the temporary bed files and didn't see any negative values. Anyway, all is ok now.

Thanks again,

Katherine

csoulette commented on August 11, 2024

Hi Katherine,

Glad to hear the newly downloaded annotation worked out for you!

Just as a side note, I'd recommend using the GTF annotation from the NCBI FTP instead (that is what I conducted my testing on). GFF is formatted slightly differently. It should work given the way we parse annotation files, but it hasn't been fully tested. I believe you can grab the GTF file from the same FTP URL you already used:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/004/195/GCF_000004195.3_Xenopus_tropicalis_v9.1/GCF_000004195.3_Xenopus_tropicalis_v9.1_genomic.gtf.gz
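
One visible difference between the two formats is the attribute column (column 9): GTF writes key "value"; pairs, while GFF3 writes key=value pairs separated by semicolons. The sketch below is a simplified, hypothetical parser that accepts either style, just to illustrate the difference; it is not how FLAIR parses annotation files.

```python
import re

# GFF3 attributes look like:  key=value;key=value
GFF3_ITEM = re.compile(r"^([^=\s]+)=(.*)$")
# GTF attributes look like:   key "value"; key "value";
GTF_ITEM = re.compile(r'^(\S+)\s+"?([^"]*)"?$')


def parse_attributes(column9):
    """Parse column 9 of a GTF or GFF3 line into a dict (simplified sketch).

    Not handled: a ';' inside a quoted GTF value, repeated keys,
    and percent-encoded GFF3 characters.
    """
    attrs = {}
    for item in column9.split(";"):
        item = item.strip()
        if not item:
            continue
        # A GTF item never matches GFF3_ITEM, because its key is followed by whitespace.
        match = GFF3_ITEM.match(item) or GTF_ITEM.match(item)
        if match:
            key, value = match.groups()
            attrs[key] = value
    return attrs


print(parse_attributes('gene_id "LOC100"; transcript_id "XM_1.1";'))
print(parse_attributes("ID=gene0;Dbxref=GeneID:100;Name=abc"))
```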

I'll go ahead and close this issue, since the problem seems to have been resolved with your new file.

Thanks!

Best,
CMS

knewl commented on August 11, 2024

Hello again,

I have just got back to working on this project and am having the same error as before using the gtf file you pointed to above.

I am running Flair correct on 12 samples, and 4 of them fail with the following error message:

Traceback (most recent call last):
  File "../flair/flair-1.3/bin/ssPrep.py", line 391, in <module>
    main()
  File "../flair/flair-1.3/bin/ssPrep.py", line 382, in main
    intTree, ssData = buildIntervalTree(knownJuncs, wiggle)
  File "../flair/flair-1.3/bin/ssPrep.py", line 342, in buildIntervalTree
    x.add(c1S,c1E,c1)
  File "src/kerneltree.pyx", line 22, in kerneltree.IntervalTree.add
OverflowError: can't convert negative value to unsigned long
Traceback (most recent call last):
  File "../flair/flair-1.3/bin/ssCorrect.py", line 304, in <module>
    main()
  File "../flair/flair-1.3/bin/ssCorrect.py", line 287, in main
    with open("%s_inconsistent.bed" % chrom,'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'NW_016686139.1_inconsistent.bed'
usage: script.py chromsizes bedfile pslfile
usage: script.py psl ref.gtf/ref.gp isos_matched.psl 
rm: cannot remove ‘BC06.flair.correct_all_corrected.unnamed.psl’: No such file or directory

I have tried to look into the error myself, and it seems to occur when there is an exon junction right at the beginning of a scaffold. For example, here c1 is 12, which becomes negative (12 - 15 = -3) when the wiggle of 15 is subtracted in buildIntervalTree in ssPrep.py:

KnownJuncs : NW_016686139.1_known_juncs.bed
c1:  12 c2:  160 Wiggle:  15

Traceback (most recent call last):
  File "../flair/flair-1.3/bin/ssPrep.py", line 391, in <module>
    main()
  File "../flair/flair-1.3/bin/ssPrep.py", line 382, in main
    intTree, ssData = buildIntervalTree(knownJuncs, wiggle)
  File "../flair/flair-1.3/bin/ssPrep.py", line 342, in buildIntervalTree
    x.add(c1S,c1E,c1)
  File "src/kerneltree.pyx", line 22, in kerneltree.IntervalTree.add
OverflowError: can't convert negative value to unsigned long

This is the example bed file that's causing the problem...

$ cat  NW_016686139.1_known_juncs.bed
NW_016686139.1	12	160	gtf	.	+
NW_016686139.1	241	2956	gtf	.	+
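
The arithmetic described above can be checked against the known-junctions file with a short sketch. It is a hypothetical helper, not FLAIR code, and assumes the lower edge of each splice-site window is c1 - wiggle, as described for buildIntervalTree.

```python
def junctions_near_contig_start(bed_lines, wiggle=15):
    """Return (chrom, c1, c2, c1 - wiggle) for junctions whose window would start below 0.

    Assumption: the lower edge of the window around the first splice site is
    c1 - wiggle (the c1S value passed to IntervalTree.add in the traceback).
    Since c2 > c1 in a bed junction, checking c1 is enough.
    """
    flagged = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        chrom, c1, c2 = fields[0], int(fields[1]), int(fields[2])
        if c1 - wiggle < 0:
            flagged.append((chrom, c1, c2, c1 - wiggle))
    return flagged


known_juncs = [
    "NW_016686139.1\t12\t160\tgtf\t.\t+\n",
    "NW_016686139.1\t241\t2956\tgtf\t.\t+\n",
]
print(junctions_near_contig_start(known_juncs))  # [('NW_016686139.1', 12, 160, -3)]
```

Only the first junction is flagged: its window would start at -3, which an unsigned long cannot hold, matching the OverflowError above.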

Thanks again for your help with this issue,

Katherine

csoulette commented on August 11, 2024

Hi knewl,

Thanks for following through with this error. This is an interesting situation that raises some questions about the alignment of junctions near the beginning of contigs/chromosomes. My initial instinct would be to throw such splice sites out, given that 1-15 nucleotides is such a small window to align a noisy nanopore read against. I know from experience that minimap2 (the older versions I've worked with) has a difficult time aligning to exonic regions shorter than ~20 nt.

I'm not too familiar with the organization of the Xenopus genome/transcriptome, and I'm curious: how often are these contigs used in analyses?

For now I've fixed the code to include these edge-case splice sites as possible splice sites for correction. This means that the database of known splice sites that your reads are corrected against will include them. Moreover, if a read contains one of these splice sites, the read will be included in the final set of corrected reads.
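
A minimal sketch of one way to keep such edge-case sites is to clamp the lower edge of each window at 0 instead of letting it go negative. This is an assumption about the general approach, not the actual FLAIR patch; the function names are hypothetical, and a plain linear scan stands in for the kerneltree interval tree used in ssPrep.py.

```python
from typing import List, Optional, Tuple

# (window_start, window_end, splice_site)
Window = Tuple[int, int, int]


def splice_site_windows(junctions: List[Tuple[int, int]], wiggle: int = 15) -> List[Window]:
    """Build a [site - wiggle, site + wiggle] window around both ends of each junction.

    The lower edge is clamped at 0, so a site within `wiggle` bases of the
    contig start (e.g. 12 with wiggle 15) is kept instead of producing a
    negative coordinate that an unsigned integer type cannot hold.
    """
    windows: List[Window] = []
    for c1, c2 in junctions:
        for site in (c1, c2):
            windows.append((max(0, site - wiggle), site + wiggle, site))
    return windows


def nearest_known_site(windows: List[Window], position: int) -> Optional[int]:
    """Return the known site nearest to `position` among windows containing it, or None."""
    hits = [site for start, end, site in windows if start <= position <= end]
    return min(hits, key=lambda site: abs(site - position), default=None)


windows = splice_site_windows([(12, 160), (241, 2956)])
print(windows)  # [(0, 27, 12), (145, 175, 160), (226, 256, 241), (2941, 2971, 2956)]
print(nearest_known_site(windows, 3))  # 12
```

With a wiggle of 15, the 12-160 junction from the bed file above yields windows (0, 27) and (145, 175), so a read splice site at position 3 would still be corrected to the known site at 12.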

Thanks~

-CMS

knewl commented on August 11, 2024

Hi CMS,

Thank you very much for responding to this so quickly. I have re-run the correct stage and it worked. I'm quite new to the Xenopus genome, so I'm not sure how often these contigs with splice sites near the ends are useful in analyses. A new long-read assembly would probably help a lot. In my experience, Minimap2 is able to handle short alignments; in fact, one of the transcripts I'm interested in has a 15 bp exon that reads are mapped to without problems.

Anyway, thanks again for making an excellent tool which I will use again and will recommend to others. There is a tiny bug in the collapse stage which I will create a separate issue for, but it worked once I removed a few lines of code and I now have a good transcriptome to work with.

Cheers,

Katherine

wangjiawen2013 commented on August 11, 2024

I ran into this error too. There are so many GTF versions to choose from that I can't be sure which one to use. My GTF works well with Illumina short-read data, but flair.py correct fails with it. Could FLAIR be adapted to avoid this error? Otherwise, we have to download various GTFs and try them one by one. Using different GTFs for Illumina and nanopore data would also introduce inconsistencies across datasets.
