Hi shamsbhuiyan,
Can you please also include the commands you used to run correct and collapse, so that we can better assess why you're not seeing any gene names associated with your collapsed isoforms? Thanks!
-CMS
from flair.
Sure. I ran flair correct on four different samples; here is the command for one of them:
python ~/flair/flair.py correct -c ~/rn6.chrom.sizes -f /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf -q Chip170.bed -t 20
After that, I concatenated the four flair_all_corrected.psl files and ran FLAIR collapse on the concatenated file. Here is my FLAIR collapse command:
python ~/flair/flair.py collapse -r /space/collaborator/upload/Chip176_G90_pass.fq,/space/collaborator/upload/Chip170_N10_pass.fq,/space/collaborator/upload/Chip160_G10_pass.fq,/space/collaborator/upload/Chip158_N90_pass.fq -q allSamplesIsoforms.psl -g /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa -n N -m ~/minimap2/ -t 60 -f /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf
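The concatenation step above can be sketched in Python as follows. This is a minimal illustration, not part of FLAIR, and it assumes each corrected PSL file has no psLayout header. It also guards against a file whose last record lacks a trailing newline, which would otherwise fuse two records when files are joined naively.

```python
from typing import Iterable


def concatenate_psl(inputs: Iterable[str], output: str) -> int:
    """Concatenate headerless PSL files into one file; return the number of records written."""
    n_records = 0
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for line in handle:
                    if not line.strip():
                        continue  # skip blank lines
                    # Make sure every record ends with a newline so records from
                    # consecutive files never end up on the same line.
                    out.write(line if line.endswith("\n") else line + "\n")
                    n_records += 1
    return n_records
```

Usage would look like `concatenate_psl(["Chip176/flair_all_corrected.psl", ...], "allSamplesIsoforms.psl")`, where the per-sample directories are hypothetical and should point at wherever each correct run wrote its output.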
Also, in case it helps, here's the image for Cacna1c. After the flair collapse step, 28 transcripts are annotated to this gene. I guess the problem is that these isoforms get annotated to Cacna1c (as they should), but the isoforms that align to the Cacna1h locus do not get annotated to Cacna1h (the image in my first post).
Thanks for the info!
Can you send me two things: (1) the CACNA1C GTF entries or a link to where you downloaded your GTF, and (2) the PSL entry for at least one of your isoforms that should get annotated? If all the other steps seemed to work OK, I'd assume it's an issue with the GTF. The gene name assignment was tested mainly on the GENCODE human GTFs, and the identify_gene_isoform.py script works by comparing the splice junctions used by each isoform with the splice junctions in the GTF. Thanks for your patience and help :)
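To make the junction-matching idea concrete, here is a minimal Python sketch of comparing an isoform's splice junctions against junctions indexed from a GTF. It is not the code in identify_gene_isoform.py; the majority-vote rule and the helper names (junctions_by_gene, best_gene) are assumptions for illustration. One practical point it highlights: GTF exons are 1-based and inclusive while PSL blocks are 0-based and half-open, so both have to be put on the same coordinate system before any junction can match.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# (chrom, strand, intron_start, intron_end); intron coordinates are 0-based, half-open.
Junction = Tuple[str, str, int, int]


def _attribute(attributes: str, key: str) -> Optional[str]:
    """Pull one quoted value (e.g. gene_id "X") out of a GTF attribute column."""
    match = re.search(re.escape(key) + r' "([^"]+)"', attributes)
    return match.group(1) if match else None


def junctions_by_gene(gtf_lines: Iterable[str]) -> Dict[Junction, Set[str]]:
    """Index every annotated splice junction to the gene_id(s) whose transcripts use it."""
    exons = defaultdict(list)  # transcript_id -> [(chrom, strand, start, end, gene_id)]
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        tx = _attribute(fields[8], "transcript_id")
        gene = _attribute(fields[8], "gene_id")
        if tx is None or gene is None:
            continue
        # GTF is 1-based inclusive; store the exon as 0-based half-open [start - 1, end).
        exons[tx].append((fields[0], fields[6], int(fields[3]) - 1, int(fields[4]), gene))
    index: Dict[Junction, Set[str]] = defaultdict(set)
    for tx_exons in exons.values():
        tx_exons.sort(key=lambda exon: exon[2])
        # Each intron spans from the end of one exon to the start of the next.
        for left, right in zip(tx_exons, tx_exons[1:]):
            chrom, strand, _, left_end, gene = left
            index[(chrom, strand, left_end, right[2])].add(gene)
    return index


def best_gene(isoform_junctions: Iterable[Junction],
              index: Dict[Junction, Set[str]]) -> Optional[str]:
    """Return the gene sharing the most junctions with the isoform, or None if none match."""
    votes = Counter(gene for j in isoform_junctions for gene in index.get(j, ()))
    return votes.most_common(1)[0][0] if votes else None
```

Keying junctions on strand as well as position means an isoform aligned to the opposite strand of an annotated gene will not pick up that gene's name.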
If you can, you may want to try running collapse with --max_ends=1 to reduce the number of isoforms that are built with the same splice junction chain. Also, just to clarify: the first image you sent shows the isoforms that don't have a gene name assigned to them, and the second image shows the isoforms that did get a gene name assigned to them?
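The effect of capping ends can be pictured like this: isoforms that share an identical splice junction chain but differ only in their start/end coordinates are grouped, and only the best-supported few are kept. The sketch below is a simplified model of that idea under assumed inputs (a hypothetical tuple layout with a read count per isoform); it is not FLAIR's actual TSS/TES selection logic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical isoform record: (name, junction_chain, (start, end), supporting_reads)
Isoform = Tuple[str, Tuple[Tuple[int, int], ...], Tuple[int, int], int]


def cap_ends_per_chain(isoforms: List[Isoform], max_ends: int) -> List[Isoform]:
    """Keep at most `max_ends` end variants per splice junction chain, best-supported first."""
    by_chain: Dict[tuple, List[Isoform]] = defaultdict(list)
    for iso in isoforms:
        by_chain[iso[1]].append(iso)  # group by the exact junction chain
    kept: List[Isoform] = []
    for variants in by_chain.values():
        variants.sort(key=lambda iso: iso[3], reverse=True)  # most supporting reads first
        kept.extend(variants[:max_ends])
    return kept
```

With a cap of 1, two isoforms that use the same junctions but start at slightly different positions collapse to the one with more reads, while isoforms with a different junction chain are untouched.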
-Alison
Yes, none of the isoforms are assigned to Cacna1h in the first image, whereas 28 isoforms are assigned to Cacna1c in the second image.
GTF file found here:
ftp://ftp.ensembl.org/pub/release-96/gtf/rattus_norvegicus
PSL lines for a Cacna1h isoform:
0 0 0 0 0 0 0 0 - 6138b3a4-7679-409f-95f9-06204c85e90f;16_10:14731000 6748 0 6748 10 112626471 14731795 14788481 33 953,161,169,246,122,79,71,134,110,152,193,90,126,185,69,124,101,423,209,91,156,118,186,152,440,790,93,316,160,98,134,112,190, 0,1622,1948,2204,2538,3571,4817,5111,5409,5948,7490,7893,8086,8710,8988,9138,9382,9925,10848,11387,11665,11960,12507,13261,15479,16199,17918,18152,19800,21582,21851,22063,56496, 14731795,14733417,14733743,14733999,14734333,14735366,14736612,14736906,14737204,14737743,14739285,14739688,14739881,14740505,14740783,14740933,14741177,14741720,14742643,14743182,14743460,14743755,14744302,14745056,14747274,14747994,14749713,14749947,14751595,14753377,14753646,14753858,14788291,
0 0 0 0 0 0 0 0 - 6138b3a4-7679-409f-95f9-06204c85e90f;16-1_10:14731000 6748 0 6748 10 112626471 14732281 14788481 33 467,161,169,246,122,79,71,134,110,152,193,90,126,185,69,124,101,423,209,91,156,118,186,152,440,790,93,316,160,98,134,112,190, 0,1622,1948,2204,2538,3571,4817,5111,5409,5948,7490,7893,8086,8710,8988,9138,9382,9925,10848,11387,11665,11960,12507,13261,15479,16199,17918,18152,19800,21582,21851,22063,56496, 14732281,14733417,14733743,14733999,14734333,14735366,14736612,14736906,14737204,14737743,14739285,14739688,14739881,14740505,14740783,14740933,14741177,14741720,14742643,14743182,14743460,14743755,14744302,14745056,14747274,14747994,14749713,14749947,14751595,14753377,14753646,14753858,14788291,
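Because the annotation works from splice junctions, it helps to see how they fall out of a PSL record like the two above: each intron runs from the end of one aligned block (tStart + blockSize) to the start of the next block. A minimal Python sketch, assuming standard 21-column PSL where column 18 holds blockSizes and column 20 holds target (genomic) tStarts:

```python
from typing import List, Tuple


def psl_junctions(psl_line: str) -> Tuple[str, str, str, List[Tuple[int, int]]]:
    """Return (qName, tName, strand, introns) for one PSL record.

    Introns are 0-based, half-open intervals on the target (genome).
    """
    fields = psl_line.split()
    strand, q_name, t_name = fields[8], fields[9], fields[13]
    sizes = [int(x) for x in fields[18].rstrip(",").split(",")]
    starts = [int(x) for x in fields[20].rstrip(",").split(",")]
    if len(sizes) != int(fields[17]) or len(starts) != len(sizes):
        raise ValueError("blockCount does not match blockSizes/tStarts")
    # Pair each block with the next one: the gap between them is an intron.
    introns = [(start + size, nxt) for start, size, nxt in zip(starts, sizes, starts[1:])]
    return q_name, t_name, strand, introns
```

Applied to the two records above, both give the same first intron on chromosome 10, (14732748, 14733417), since 14731795 + 953 and 14732281 + 467 both equal 14732748. The records differ only in where the first block begins, so they share an identical junction chain.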
Hmm. Well, I put those two lines in a file called shams.isos.psl and ran the following command, which does the annotation:
python ~/flair/bin/identify_gene_isoform.py shams.isos.psl Rattus_norvegicus.Rnor_6.0.96.gtf shams.isos.annotated.psl
and got
0 0 0 0 0 0 0 0 - 6138b3a4-7679-409f-95f9-06204c85e90f_ENSRNOG00000033893 6748 0 6748 10 112626471 14731795 14788481 33 953,161,169,246,122,79,71,134,110,152,193,90,126,185,69,124,101,423,209,91,156,118,186,152,440,790,93,316,160,98,134,112,190, 0,1622,1948,2204,2538,3571,4817,5111,5409,5948,7490,7893,8086,8710,8988,9138,9382,9925,10848,11387,11665,11960,12507,13261,15479,16199,17918,18152,19800,21582,21851,22063,56496, 14731795,14733417,14733743,14733999,14734333,14735366,14736612,14736906,14737204,14737743,14739285,14739688,14739881,14740505,14740783,14740933,14741177,14741720,14742643,14743182,14743460,14743755,14744302,14745056,14747274,14747994,14749713,14749947,14751595,14753377,14753646,14753858,14788291,
0 0 0 0 0 0 0 0 - 6138b3a4-7679-409f-95f9-06204c85e90f-1_ENSRNOG00000033893 6748 0 6748 10 112626471 14732281 14788481 33 467,161,169,246,122,79,71,134,110,152,193,90,126,185,69,124,101,423,209,91,156,118,186,152,440,790,93,316,160,98,134,112,190, 0,1622,1948,2204,2538,3571,4817,5111,5409,5948,7490,7893,8086,8710,8988,9138,9382,9925,10848,11387,11665,11960,12507,13261,15479,16199,17918,18152,19800,21582,21851,22063,56496, 14732281,14733417,14733743,14733999,14734333,14735366,14736612,14736906,14737204,14737743,14739285,14739688,14739881,14740505,14740783,14740933,14741177,14741720,14742643,14743182,14743460,14743755,14744302,14745056,14747274,14747994,14749713,14749947,14751595,14753377,14753646,14753858,14788291,
So it worked for me: the script was able to annotate the gene this time, even though it didn't for you (identify_gene_isoform.py puts chromosome:position instead of a gene name if it can't find one).
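Since unmatched isoforms keep a chromosome:position suffix, one quick way to check a whole run after rerunning the script is to tally the names. A minimal Python sketch, assuming the annotated name is the PSL qName (column 10) and ends in either _<gene ID> or _<chrom>:<position>, as in the output above; the helper name is hypothetical:

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: names end in "_<suffix>", where <suffix> is either a gene ID or a
# chromosome:position placeholder such as "10:14731000".
PLACEHOLDER = re.compile(r"_[^_]+:\d+$")


def tally_annotation(psl_lines: Iterable[str]) -> Counter:
    """Count isoforms per assigned gene; names ending in chrom:position count as 'unassigned'."""
    counts: Counter = Counter()
    for line in psl_lines:
        if not line.strip():
            continue
        name = line.split()[9]  # qName column
        if PLACEHOLDER.search(name):
            counts["unassigned"] += 1
        else:
            counts[name.rsplit("_", 1)[-1]] += 1
    return counts
```

Looking up a gene ID of interest in the result, for example the Cacna1h gene ID from the Ensembl GTF, shows directly whether any isoforms were assigned to it.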
I did make some minor changes to the identify_gene_isoform.py script yesterday, so I'd suggest as next steps (1) updating your FLAIR scripts, just to be sure, and then (2) running that script on all of your isoforms. Let me know how it goes.
-Alison
Hey Alison,
Sorry for the late response. I will update my version of FLAIR and let you know whether this fixes things.
Hi, I re-downloaded FLAIR a couple of weeks ago. My collapse step runs but then fails here:
Renaming isoforms [31/1285]
Aligning reads to first-pass isoform reference
[M::mm_idx_gen::7.696*1.50] collected minimizers
[M::mm_idx_gen::8.277*2.59] sorted minimizers
[M::main::8.286*2.59] loaded/built the index for 63407 target sequence(s)
[M::mm_mapopt_update::8.320*2.58] mid_occ = 29116
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 63407
[M::mm_idx_stat::8.341*2.58] distinct minimizers: 1473546 (57.71% are singletons); average occurrences: 39.526; average spacing: 5.373
[M::worker_pipeline::1809.525*57.94] mapped 105582 sequences
[M::worker_pipeline::3518.154*58.68] mapped 105374 sequences
[ERROR] failed to write the results
Possible minimap2/samtools error, specify paths or make sure they are in $PATH
My collapse command is:
python /space/grp/bin/flair/flair.py collapse -r /space/collaborator/upload/Chip176_G90_pass.fq,/space/collaborator/upload/Chip170_N10_pass.fq,/space/collaborator/upload/Chip160_G10_pass.fq,/space/collaborator/upload/Chip158_N90_pass.fq -q allSamples.psl -g /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa -n N -m ~/minimap2/ -t 60 -f /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf
OK, this has to do with where FLAIR puts temporary files. It uses a Python module to generate a unique filename, usually in /tmp, /scratch/tmp, /data/tmp, etc., and minimap2 then tries to create a file in a temp folder that you don't have write permission for.
I have updated flair-collapse so that you can specify a directory for temporary files. You can add --temp_dir "./" to your command to use your working directory as the temporary directory instead.
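The permission problem and the fix can be sketched with Python's standard tempfile module: by default a unique name is created under the system temp directory (tempfile.gettempdir()), and passing dir= redirects it, which is roughly what an option like --temp_dir "./" amounts to. This assumes FLAIR's behavior is comparable; make_temp_path is a hypothetical helper, not FLAIR code. The up-front os.access check fails with a clear message instead of surfacing later as a downstream "[ERROR] failed to write the results".

```python
import os
import tempfile
from typing import Optional


def make_temp_path(suffix: str, temp_dir: Optional[str] = None) -> str:
    """Create a unique temporary file in temp_dir (or the system default) and return its path."""
    directory = temp_dir if temp_dir is not None else tempfile.gettempdir()
    # Check write and search permission before handing the path to an external tool.
    if not os.access(directory, os.W_OK | os.X_OK):
        raise PermissionError(f"cannot write temporary files in {directory!r}")
    handle, path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(handle)  # the external program (e.g. an aligner) reopens the path itself
    return path
```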
-Alison
Hi Alison, sorry for the late response again. I reran FLAIR, this time with a recent build and with splice junction data from short reads, but with exactly the same Oxford Nanopore data.
When I first made this post, FLAIR collapse produced ~7,000 isoforms. With the updated FLAIR, I get ~52,000 isoforms after FLAIR collapse. To bring it back to the Cacna1h example from my first post, this is what the isoforms for Cacna1h look like now:
This was the command I used to run FLAIR collapse:
python ~/flair/flair.py collapse -r /space/collaborator/upload/Chip176_G90_pass.fq,/space/collaborator/upload/Chip170_N10_pass.fq,/space/collaborator/upload/Chip160_G10_pass.fq,/space/collaborator/upload/Chip158_N90_pass.fq -q allSamples.psl -g /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa -n N -m ~/minimap2/ -t 60 -f /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf --temp_dir "./"
Unlike my original post, this run of FLAIR included short-read data at the FLAIR correct step. Could this explain the strange new number of isoforms, or the very short isoforms in the screenshot? Also, this time flair collapse took about a week to run! Does adding short-read data really slow things down that much?
Hey, so I think it's my short-read data that is messing up FLAIR. I'll open a new issue to describe exactly what's going on. Without the short-read splice junctions, the isoforms do get named correctly and the number of isoforms makes sense.