Comments (21)
Hi Alison,
@amira glad you were able to fix it. That behavior is expected. We throw out the junctions that don't have a strand because we use strand information when determining whether a read is corrected or inconsistent.
Just to conclude on my side: I realized that the short-read junctions I used for the correction had no strand information (no XS tag). So I redid the mapping to get this extra tag and was able to run FLAIR with no error. (The "hack" I implemented forced the correction regardless of the strand info; there was no need for it.)
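For anyone who wants to check their short-read alignments for the same problem, here is a minimal sketch that counts spliced alignments with and without an XS tag. It assumes plain SAM text lines as input (for example from samtools view -h); the function name is made up for illustration and is not part of FLAIR.

```python
def spliced_without_strand(sam_lines):
    """Count spliced alignments (CIGAR contains N) with and without an XS:A tag.

    Hypothetical helper for illustration; not part of FLAIR.
    Returns a (with_xs, without_xs) tuple.
    """
    with_xs = without_xs = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Field 6 (index 5) is the CIGAR string; N marks a skipped region (intron).
        if len(fields) < 11 or "N" not in fields[5]:
            continue
        # Optional tags start at field 12 (index 11).
        if any(tag.startswith("XS:A:") for tag in fields[11:]):
            with_xs += 1
        else:
            without_xs += 1
    return with_xs, without_xs
```

If the second number is large compared to the first, the junctions derived from that alignment will mostly lack strand information.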
Comments here have been very helpful, thanks a lot!
Best,
Amira
from flair.
Hi Dharm,
Since you have the raw reads aligned ok, have you already looked into the corrected reads bed or psl file? Are the full length reads present there? Also, your commands look fine and thanks for your patience as we work together to figure this out!
-Alison
from flair.
Hello Alison,
As per your suggestion, I looked into the .psl and .bed files that flair align produced from the nanopore reads. This is the command I ran:
python /opt/flair/flair.py align -r all_4.fastq -g Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL5.fa -m /opt/minimap2-2.14_x64-linux/minimap2 -o flair.aligned.PL5. -t 40 -p -v1.3
I can see long isoforms for GAPDH and for the genes in the PL5 regions in both the .psl and .bed files. The images are attached for your reference:
The .psl alignment for the GAPDH long isoforms:
The .psl alignment for the long isoforms of several genes in the PL5 regions:
The .bed alignment for the GAPDH long isoforms:
The .bed alignment for the PL5 region long isoforms:
Please advise which parameters I need to change to avoid losing the long isoforms in the output FASTA files of the collapse step.
Thanks,
With Regards,
Dharm
from flair.
To clarify, the igv shots you are showing me are aligned reads, right? I think you should also check the *all_corrected.bed file (or the *all_corrected.psl file, they're equivalent). It's possible that the flair-correct step is removing the long reads.
-Alison
from flair.
Yes, you are absolutely right, the images above show the aligned files from the first step. As per your suggestion, I also looked into the *all_corrected files, which show that the long isoforms are broken up and not connected, so they are not counted as single long isoforms. The images are attached for your reference. Please let me know what steps I can take to avoid this:
The .bed file for the same GAPDH region as above:
Why are the long isoforms missing from the corrected .bed and .psl files?
Thanks,
With Regards,
Dharm
from flair.
The purpose of flair-correct is to correct spurious splice junctions in noisy reads to splice junctions we're more confident in (i.e. annotations, short-reads), and if the junction can't be corrected then the read is removed. This informs me that there may be an issue with how the script is handling splice junctions from your GTF. Can you send me a link to where you downloaded your gtf from? We might have to add some code to cover cases like this. Thanks for the cooperation :)
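To make the idea concrete, here is a minimal sketch of window-based junction correction. This is not the actual flair-correct code: the function names, the 10 bp window, and the representation of a read as a list of (intron_start, intron_end) pairs are all assumptions for illustration. The known sites must be sorted lists.

```python
from bisect import bisect_left

def nearest_site(known_sites, pos, window):
    """Return the known site closest to pos if it lies within window, else None."""
    i = bisect_left(known_sites, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = known_sites[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda s: abs(s - pos), default=None)
    if best is None or abs(best - pos) > window:
        return None
    return best

def correct_read(junctions, known_starts, known_ends, window=10):
    """Snap every junction of a read to trusted sites.

    Returns the corrected junction list, or None when any junction
    cannot be corrected (the read would then be removed).
    """
    corrected = []
    for start, end in junctions:
        new_start = nearest_site(known_starts, start, window)
        new_end = nearest_site(known_ends, end, window)
        if new_start is None or new_end is None:
            return None
        corrected.append((new_start, new_end))
    return corrected
```

In this sketch a single uncorrectable junction is enough to drop the whole read, which is why an incomplete set of trusted junctions can remove long multi-exon reads while single-exon reads pass through.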
from flair.
I am working on RNA-seq of pig cells using Nanopore long-read technology. The original GTF file was downloaded from ftp://ftp.ensembl.org/pub/release-95/gtf/sus_scrofa/Sus_scrofa.Sscrofa11.1.95.gtf.gz. Some of the transgenes we are trying to express in the Payload 5 regions were added to the GTF file as additional chromosomes.
Checking the GAPDH gene would be the most helpful.
Thanks
from flair.
Hello Alison,
Just wondering whether you have had time to look into the GTF file above to see if it has problems that may lead to shorter isoforms and affect the counts.
Thanks
from flair.
Hi Dharmendra,
We've had a bit of trouble recapitulating the weird filtering issue you've raised in this thread.
I've taken nanopore reads derived from the GAPDH locus from a human sequencing experiment, and successfully converted them into isoforms using the Sus_scrofa.Sscrofa11.1.dna.toplevel.fa genome and associated annotation file. When reads are unnecessarily filtered during the correction step, it means that their splice sites could not be corrected. This is usually due to a problem in building the splice site database that we query each read against.
I was wondering if it would be possible to share some of the reads that are being filtered so that we can try to recapitulate the issue ourselves. Perhaps it would be possible to share reads from a locus that is being heavily filtered but is not important to your studies?
Thanks~
-CMS
from flair.
Hello CMS,
Thanks for your comments and for working on this issue. As you requested, here are selected reads from the GAPDH region where the long isoforms are missing:
long_reads_selected_for_GAPDH_region_marked_duplicates.bam.gz
Hope this will help us resolve this issue.
Thanks
from flair.
So I took the reads you provided, made a BED file, and ran:
python flair.py correct -q long_reads_selected_for_GAPDH_region_marked_duplicates.bed -f Sus_scrofa.Sscrofa11.1.95.gtf -c pig.chromsizes
Here are the *_all_corrected.bed reads that I got: https://genome.ucsc.edu/s/atang14/susScr11
You'll notice that it has the single-exon reads in your screenshots, but it also looks like many of the multi-exonic reads are getting corrected and kept. Seeing as we're running essentially the same commands on the same data and getting different results, I'm guessing it's something about the environment. Maybe the script didn't finish running? Did it output any errors?
-Alison
from flair.
No, I didn't get an error message. I repeated the run and still got no error message. I also tried with the selected GAPDH-region reads that I sent you previously, and I am still missing the long reads.
from flair.
Do you have a dockerized version of FLAIR? I could use that to rule out environment problems on our end, since I am missing the longer isoforms but not getting any error messages when I run FLAIR.
Thanks
from flair.
Hi Dharm,
We're working on a Docker now to hopefully solve whatever issue is occurring. In the meantime, could you send me your annotation file that you've been using with extra PL5 entries? Since we were able to run the correction step successfully in our hands with your reads, with the only difference in our commands being the annotation file, I'd like to try running the correction step with your reads/annotation to try and debug further.
-Alison
from flair.
Hi Alison,
I am contributing to this thread because I think I am facing the same problem: FLAIR runs without errors but does not report some isoforms that we can clearly see after mapping. As mentioned above, it looks like it's due to the correction step: when a read does not overlap a known/annotated junction, it is removed.
I work on transposable elements in Arabidopsis thaliana, and most of their genes are poorly annotated. Is there any way to keep these reads and correct them using only short reads?
Thank you!
Best,
Amira
from flair.
Hi Amira and Dharm,
We have made a docker for FLAIR. You can use it like so:
docker pull quay.io/brookslab/flair
docker run -w /usr/data -v [your_path_to_data]:/usr/data -t -d [image_id]
docker exec [container_id] python3 /usr/local/flair/flair.py align [rest_of_your_command]
If the Docker image gives the same results (no long reads after flair-correct), then we'd have to take a closer look at the data/annotation. Dharm, I know you're using a custom annotation file, so that might be causing issues for FLAIR, since in our hands the portion of data that you sent looks OK.
Amira, we are working on correcting with only short reads specified if you don't have an annotation.
-Alison
from flair.
Hi Amira,
You will find the latest version of ssCorrect will now allow you to run correction without GTF annotations. Let me know if there are any issues. Thanks ~
Best,
CMS
from flair.
Hi Alison and CMS,
My apologies for the late reply and thank you for the fix!
I updated FLAIR and reran the correction step. I first ran the command without the GTF file, assuming that it was no longer mandatory, but it still is. So I tried with the GTF file. Here's my command:
python $SOURCE/flair.py correct -f $TAIR10GTF -c $CHRLEN -q $queryONTreads -j $shortReadsJunctions -o $OUTDIR -t 30
I checked the correction output and some isoforms are still missing. Here are some examples:
grey track: *all_corrected.bed
green track: *all_inconsistent.bed
For me, the inconsistent isoforms here (green) are the ones that should be reported as consistent.
Could you please look a bit more into it? I can send you the GTF file I am working with if it helps.
Many thanks in advance,
Best
Amira
from flair.
Thanks for your patience Amira.
We have made the fixes necessary so that the gtf argument is optional, and at least one of a gtf file or short read junction file needs to be specified. Please try again with only your short read junctions and let us know how that goes.
We have also slightly altered the syntax for the flair-correct command such that a genome sequence FASTA file is required. The genome file must also be indexed (you can run samtools faidx yourgenome.fa to generate the .fai). So your command might now look something like:
python $SOURCE/flair.py correct -c $CHRLEN -q $queryONTreads -j $shortReadsJunctions -o $OUTDIR -t 30 -g genome.fa
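If it helps, the requirements above can be checked before launching the run. This is only a sketch under the assumptions stated in this comment (at least one of a GTF or a junction file, plus an indexed genome FASTA); check_correct_inputs is a made-up helper, not something FLAIR provides.

```python
import os

def check_correct_inputs(genome_fasta, gtf=None, junctions_bed=None):
    """Return a list of problems with the inputs; an empty list means OK.

    Hypothetical pre-flight check for illustration; not part of FLAIR.
    """
    problems = []
    # At least one source of trusted junctions is needed.
    if gtf is None and junctions_bed is None:
        problems.append("provide a GTF, a short-read junction file, or both")
    # Every file that was given must exist.
    for path in (genome_fasta, gtf, junctions_bed):
        if path is not None and not os.path.isfile(path):
            problems.append(f"file not found: {path}")
    # samtools faidx writes the index next to the FASTA as <name>.fai.
    if not os.path.isfile(genome_fasta + ".fai"):
        problems.append(
            f"missing index {genome_fasta}.fai (run: samtools faidx {genome_fasta})"
        )
    return problems
```

For example, check_correct_inputs("genome.fa", junctions_bed="junctions.bed") returns an empty list only when both files and genome.fa.fai are present.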
-Alison
from flair.
Hi Alison,
Thanks again for the fixes!
I updated FLAIR and tried again by running the command as you suggested. I got the following error:
> python $SOURCE/flair.py correct -c $CHRLEN -q $queryONTreads -j $shortReadsJunctions -o $OUTDIR -t 30 -g $GENOME
Step 2/5: Processing additional junction file /kingdoms/a2e/workspace2/kramdi/ATAC-1_A2016/ONT_analysis/detect_isoforms/FLAIR/extract-spliceJunction-shortReads/shortReadsA2E_junctions.nreads5.bed ...
No junctions from GTF or junctionsBed to correct with. Exiting...
Correction command did not exit with success status
I looked a bit into the code and it seems that, in the script ssCorrect.py, the case where the strand is "0" (unknown) in the junction file is ignored and only the values "1" and "2" are converted to "+" and "-".
I edited my local version and was able to run the script with no error. The consistent isoforms are now the ones I expect visually. Here are the new results for the examples shown in my previous post:
Blue track: consistent isoforms
Green track: inconsistent isoforms
Could you please check the part of the script ssCorrect.py that generates the error and update the repo?
I'll carry on with the rest of the steps (collapse and quantify). So far, it looks pretty good! So thanks a lot!
Best,
Amira
from flair.
Hi @Dharmendra-G-1, you may want to check out issue #34. They had the same issue with longer reads getting removed at the correction step, and it was due to the kind of genome annotation file format they were using. We've fixed that issue now, and since the only thing that was different between your failed and my successful FLAIR runs on the data you provided was the annotation, it's possible that your issues may be fixed. A little late, but I just thought I'd let you know :)
@amira glad you were able to fix it. That behavior is expected. We throw out the junctions that don't have a strand because we use strand information when determining whether a read is corrected or inconsistent.
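A minimal sketch of that behavior, not the actual ssCorrect.py code: it assumes junction rows of the form (chrom, start, end, strand_code), where the codes "1" and "2" stand for "+" and "-" and "0" means unknown, as described earlier in this thread. The names below are made up for illustration.

```python
# Strand codes accepted in this sketch; anything else counts as unknown.
STRAND_CODES = {"1": "+", "2": "-", "+": "+", "-": "-"}

def stranded_junctions(rows):
    """Split junction rows into (kept, dropped) by whether the strand is known."""
    kept, dropped = [], []
    for chrom, start, end, code in rows:
        strand = STRAND_CODES.get(str(code))
        if strand is None:
            # Unknown strand ("0" or anything unrecognized): not usable for correction.
            dropped.append((chrom, start, end, code))
        else:
            kept.append((chrom, start, end, strand))
    return kept, dropped
```

If every row ends up in dropped, there is nothing left to correct against, which would match the "No junctions from GTF or junctionsBed to correct with" message reported above.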
-Alison
from flair.