Comments (7)
Hi Katherine,
The correct module builds a database from your GTF annotation input in order to compare the junctions within each read to known junctions from your annotations. It looks like the script is getting caught on a junction that has a negative chromosome coordinate value (kind of weird to see that). I've tried to reproduce your error using the same annotation file you used in your run, but was unable to trigger this negative value error. When running correct, a temporary folder is created that contains all of the junction annotations in BED format. Could you have a look through those files to see if there are any BED entries with negative values? We can use that to figure out which GTF annotation is causing this issue. Thanks~~
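A quick way to check those temporary files is sketched below. This is only an illustration and not part of FLAIR: the function name `find_negative_bed_entries`, the `*.bed` pattern, and the assumption that the start and end coordinates sit in the second and third whitespace-separated columns are all assumptions made for the example.

```python
import glob
import os


def find_negative_bed_entries(bed_dir, pattern="*.bed"):
    """Return (path, line_number, line) for BED rows with a negative start or end.

    Hypothetical helper: assumes columns 2 and 3 hold the coordinates.
    """
    hits = []
    for path in sorted(glob.glob(os.path.join(bed_dir, pattern))):
        with open(path) as handle:
            for line_number, line in enumerate(handle, start=1):
                fields = line.split()
                if len(fields) < 3:
                    continue  # blank or truncated row
                try:
                    start, end = int(fields[1]), int(fields[2])
                except ValueError:
                    continue  # header or malformed row
                if start < 0 or end < 0:
                    hits.append((path, line_number, line.rstrip("\n")))
    return hits
```

Calling `find_negative_bed_entries("path/to/temp_folder")` returns an empty list when no row has a negative coordinate; the folder path here is a placeholder.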
-CMS
from flair.
Hi,
Thanks very much for getting back to me. I was surprised that you couldn't replicate the error so I downloaded the latest annotation file:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/004/195/GCF_000004195.3_Xenopus_tropicalis_v9.1/GCF_000004195.3_Xenopus_tropicalis_v9.1_genomic.gff.gz
And all is fine. I'm not sure what was wrong with my previous gtf. I did have a look at the temporary bed files and didn't see any negative values. Anyway, all is ok now.
Thanks again,
Katherine
Hi Katherine,
Glad to hear the newly downloaded annotation worked out for you!
Just as a side note, I'd recommend using the GTF annotation from the NCBI FTP instead (that is what I ran my tests on). GFF is formatted slightly differently. It should work given the way we parse annotation files, but it hasn't been fully tested. I believe you can grab the GTF file from the same FTP location you already used:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/004/195/GCF_000004195.3_Xenopus_tropicalis_v9.1/GCF_000004195.3_Xenopus_tropicalis_v9.1_genomic.gtf.gz
I'll go ahead and close this issue since it seems to have been resolved with your new file.
Thanks!
Best,
CMS
Hello again,
I have just got back to working on this project and am having the same error as before using the gtf file you pointed to above.
I am running Flair correct on 12 samples, and 4 of them fail with the following error message:
Traceback (most recent call last):
  File "../flair/flair-1.3/bin/ssPrep.py", line 391, in <module>
    main()
  File "../flair/flair-1.3/bin/ssPrep.py", line 382, in main
    intTree, ssData = buildIntervalTree(knownJuncs, wiggle)
  File "../flair/flair-1.3/bin/ssPrep.py", line 342, in buildIntervalTree
    x.add(c1S,c1E,c1)
  File "src/kerneltree.pyx", line 22, in kerneltree.IntervalTree.add
OverflowError: can't convert negative value to unsigned long
Traceback (most recent call last):
  File "../flair/flair-1.3/bin/ssCorrect.py", line 304, in <module>
    main()
  File "../flair/flair-1.3/bin/ssCorrect.py", line 287, in main
    with open("%s_inconsistent.bed" % chrom,'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: 'NW_016686139.1_inconsistent.bed'
usage: script.py chromsizes bedfile pslfile
usage: script.py psl ref.gtf/ref.gp isos_matched.psl
rm: cannot remove ‘BC06.flair.correct_all_corrected.unnamed.psl’: No such file or directory
I have tried to look into the error myself, and it seems to appear when there is an exon junction right at the beginning of a scaffold. For example, here the value of c1 is 12, which becomes negative when 15 (the wiggle) is subtracted in buildIntervalTree in ssPrep.py:
KnownJuncs : NW_016686139.1_known_juncs.bed
c1: 12 c2: 160 Wiggle: 15
Traceback (most recent call last):
  File "../flair/flair-1.3/bin/ssPrep.py", line 391, in <module>
    main()
  File "../flair/flair-1.3/bin/ssPrep.py", line 382, in main
    intTree, ssData = buildIntervalTree(knownJuncs, wiggle)
  File "../flair/flair-1.3/bin/ssPrep.py", line 342, in buildIntervalTree
    x.add(c1S,c1E,c1)
  File "src/kerneltree.pyx", line 22, in kerneltree.IntervalTree.add
OverflowError: can't convert negative value to unsigned long
This is the example bed file that's causing the problem...
$ cat NW_016686139.1_known_juncs.bed
NW_016686139.1 12 160 gtf . +
NW_016686139.1 241 2956 gtf . +
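The arithmetic can be reproduced on this file with a small sketch. It is not taken from ssPrep.py: the function name `junctions_near_contig_start` is made up for illustration, and it assumes, as described above, that the window start is simply the junction coordinate minus the wiggle.

```python
def junctions_near_contig_start(bed_text, wiggle=15):
    """Flag junction coordinates whose wiggle window would start below zero.

    Hypothetical helper: assumes window start = coordinate - wiggle.
    Returns a list of (contig, coordinate, coordinate - wiggle).
    """
    flagged = []
    for line in bed_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated rows
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        for coord in (start, end):
            if coord - wiggle < 0:
                flagged.append((chrom, coord, coord - wiggle))
    return flagged


example = """NW_016686139.1 12 160 gtf . +
NW_016686139.1 241 2956 gtf . +"""

print(junctions_near_contig_start(example))
# prints [('NW_016686139.1', 12, -3)]
```

Only the first junction is flagged: 12 - 15 = -3, which cannot be stored as an unsigned value.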
Thanks again for your help with this issue,
Katherine
Hi knewl,
Thanks for following through with this error. This is an interesting situation that raises some questions about the alignment of junctions near the beginning of contigs/chromosomes. My first instinct would be to throw such splice sites out, given that 1-15 nucleotides is such a small window to align a noisy nanopore read against. I know from experience that minimap2 (at least the older versions that I've worked with) has a difficult time aligning to exonic regions shorter than roughly 20 nt.
I'm not too familiar with the organization of the Xenopus genome/transcriptomics, and I'm curious how often these contigs are used in analyses?
For now I've fixed the code to include these edge-case splice sites as possible splice sites for correction. This means that the database of known splice sites against which your reads are corrected will include such splice sites. Moreover, if a read contains these types of splice sites, the read will be included in the final set of corrected reads.
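The actual change to ssPrep.py is not shown in this thread. Purely as an illustration of one way such edge cases could be kept, the sketch below clamps the wiggle window at the contig boundaries. The function name `wiggle_window` and the optional `contig_length` parameter are assumptions for the example, not FLAIR's API.

```python
def wiggle_window(coord, wiggle, contig_length=None):
    """Return a (start, end) window around a splice site, kept inside the contig.

    Hypothetical helper, not FLAIR's implementation: the start is clamped
    at 0 so it never goes negative, and the end is clamped at contig_length
    when that length is provided.
    """
    start = max(0, coord - wiggle)
    end = coord + wiggle
    if contig_length is not None:
        end = min(end, contig_length)
    return start, end
```

With the junction from the report, `wiggle_window(12, 15)` gives `(0, 27)` rather than a window starting at -3, so the interval can be stored without a negative coordinate.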
Thanks~
-CMS
Hi CMS,
Thank you very much for responding to this so quickly. I have re-run the correct stage and it worked. I'm quite new to the Xenopus genome, so I'm not sure how often these contigs with splice sites near the ends are useful in analyses. A new long-read assembly would probably help a lot. In my experience, minimap2 is able to handle short alignments. In fact, one of the transcripts I'm interested in has a 15 bp exon, which is mapped without problems.
Anyway, thanks again for making an excellent tool which I will use again and will recommend to others. There is a tiny bug in the collapse stage which I will create a separate issue for, but it worked once I removed a few lines of code and I now have a good transcriptome to work with.
Cheers,
Katherine
I ran into this error too. There are so many GTF versions to choose from that I cannot be sure which one to use. My GTF works well with Illumina short-read data, but fails when running flair.py correct. Can FLAIR be adapted to avoid this error? Otherwise, we have to download various GTFs and try them one by one. Also, using different GTFs for Illumina and nanopore data would introduce inconsistency between datasets.