Comments (11)

csoulette commented on August 11, 2024

Hi shamsbhuiyan,

Can you please also include the commands you've used to run correct and collapse so that we can better assess why you're not seeing any gene names associated with your collapsed isoforms. Thanks!

-CMS

from flair.

shamsbhuiyan commented on August 11, 2024

Sure. I ran flair correct on four different samples; here is the command for one of them:
python ~/flair/flair.py correct -c ~/rn6.chrom.sizes -f /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf -q Chip170.bed -t 20

Afterwards, I concatenated the four flair_all_corrected.psl files and ran FLAIR collapse on the concatenated file. Here is my FLAIR collapse command:
python ~/flair/flair.py collapse -r /space/collaborator/upload/Chip176_G90_pass.fq,/space/collaborator/upload/Chip170_N10_pass.fq,/space/collaborator/upload/Chip160_G10_pass.fq,/space/collaborator/upload/Chip158_N90_pass.fq -q allSamplesIsoforms.psl -g /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa -n N -m ~/minimap2/ -t 60 -f /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf

shamsbhuiyan commented on August 11, 2024

Also, I don't know if this helps, but here's the image for Cacna1c. From the flair collapse step, I see that 28 transcripts are annotated to this gene. I guess the problem is that these isoforms get annotated to Cacna1c (which they should), but the isoforms that align to the Cacna1h locus do not get annotated to Cacna1h (the image in my first post).
[Image: Cacna1c isoforms from flair collapse]

belgravia commented on August 11, 2024

Thanks for the info!

Can you send me two things: (1) the CACNA1C GTF entries or a link to where you downloaded your GTF, and (2) the PSL entry for at least one of your isoforms that should get annotated? If all the other steps seemed to work OK, I'd assume the problem is with the GTF. The gene name assignment was tested mainly on the GENCODE human GTFs, and what the identify_gene_isoform.py script does is compare the splice junctions used by each isoform with the splice junctions in the GTF. Thanks for your patience and help :)
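The junction comparison described above can be illustrated with a short Python sketch. This is not FLAIR's identify_gene_isoform.py; it is a simplified model under stated assumptions: every gap between aligned PSL blocks is treated as an intron, GTF attributes follow the Ensembl/GENCODE `key "value";` layout, and an isoform is assigned to whichever gene shares the most junctions with it. The helper names (psl_introns, gtf_introns, assign_gene) are hypothetical.

```python
from collections import defaultdict


def psl_introns(psl_line):
    """Return (chrom, introns) for one PSL record.

    Introns are 0-based, half-open (start, end) gaps between consecutive
    aligned blocks (assumption: every gap is a splice junction).
    """
    f = psl_line.split()
    chrom = f[13]                                            # tName
    sizes = [int(x) for x in f[18].rstrip(",").split(",")]   # blockSizes
    starts = [int(x) for x in f[20].rstrip(",").split(",")]  # tStarts
    introns = [(s + size, nxt) for s, size, nxt in zip(starts, sizes, starts[1:])]
    return chrom, introns


def parse_attrs(field):
    """Parse a GTF attribute column such as: gene_id "X"; transcript_id "Y";"""
    attrs = {}
    for item in field.split(";"):
        key, _, value = item.strip().partition(" ")
        if key:
            attrs[key] = value.strip('"')
    return attrs


def gtf_introns(gtf_lines):
    """Map (chrom, (start, end)) -> set of gene_ids, built from GTF exon records."""
    exons = defaultdict(list)  # (chrom, gene_id, transcript_id) -> [(start, end)]
    for line in gtf_lines:
        f = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(f) < 9 or f[2] != "exon":
            continue
        a = parse_attrs(f[8])
        exons[(f[0], a["gene_id"], a["transcript_id"])].append((int(f[3]), int(f[4])))
    index = defaultdict(set)
    for (chrom, gene_id, _), ex in exons.items():
        ex.sort()
        for (_, end_a), (start_b, _) in zip(ex, ex[1:]):
            # GTF is 1-based inclusive, so the intron is [end_a, start_b - 1) 0-based
            index[(chrom, (end_a, start_b - 1))].add(gene_id)
    return index


def assign_gene(psl_line, gtf_index):
    """Return the gene sharing the most junctions with the isoform, or None."""
    chrom, isoform_introns = psl_introns(psl_line)
    votes = defaultdict(int)
    for intron in isoform_introns:
        for gene_id in gtf_index.get((chrom, intron), ()):
            votes[gene_id] += 1
    return max(votes, key=votes.get) if votes else None
```

In this sketch an isoform whose junctions match no GTF intron, including the case where the PSL and the GTF name chromosomes differently, comes back as None.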

If you can, you may want to try running collapse with --max_ends=1 to reduce the number of isoforms that are built with the same splice junction chain. Also, just to clarify: does the first image you sent show the isoforms that don't have a gene name assigned to them, and the second image show the isoforms that did get a gene name assigned to them?
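To illustrate why capping the number of ends reduces the isoform count, here is a toy model in Python. It is an assumption-laden sketch, not FLAIR's actual end-selection logic: isoforms are grouped by their splice junction chain, and within each chain only the `max_ends` start/end pairs with the most read support are kept. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict


def cap_ends(isoforms, max_ends=2):
    """Keep at most `max_ends` (start, end) variants per splice junction chain.

    isoforms: iterable of (chain, start, end, support), where chain is a
    hashable tuple of introns and support is a read count (toy model).
    """
    by_chain = defaultdict(list)
    for chain, start, end, support in isoforms:
        by_chain[chain].append((support, start, end))
    kept = []
    for chain, variants in by_chain.items():
        # Highest-support ends first; ties keep input order (sort is stable).
        variants.sort(key=lambda v: v[0], reverse=True)
        kept.extend((chain, s, e, n) for n, s, e in variants[:max_ends])
    return kept
```

With three end variants sharing one junction chain, max_ends=2 keeps two of them and max_ends=1 keeps only the best-supported one, so the number of isoforms per chain can never exceed the cap.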

-Alison

shamsbhuiyan commented on August 11, 2024

Yes, none of the isoforms are assigned to Cacna1h in the first image, whereas 28 isoforms are assigned to Cacna1c in the second image.
GTF file found here:
ftp://ftp.ensembl.org/pub/release-96/gtf/rattus_norvegicus

PSL lines for a Cacna1h isoform

0       0       0       0       0       0       0       0       -       6138b3a4-7679-409f-95f9-06204c85e90f;16_10:14731000     6748    0       6748    10      112626471       14731795        14788481        33      953,161,169,246,122,79,71,134,110,152,193,90,126,185,69,124,101,423,209,91,156,118,186,152,440,790,93,316,160,98,134,112,190,   0,1622,1948,2204,2538,3571,4817,5111,5409,5948,7490,7893,8086,8710,8988,9138,9382,9925,10848,11387,11665,11960,12507,13261,15479,16199,17918,18152,19800,21582,21851,22063,56496,       14731795,14733417,14733743,14733999,14734333,14735366,14736612,14736906,14737204,14737743,14739285,14739688,14739881,14740505,14740783,14740933,14741177,14741720,14742643,14743182,14743460,14743755,14744302,14745056,14747274,14747994,14749713,14749947,14751595,14753377,14753646,14753858,14788291,
0       0       0       0       0       0       0       0       -       6138b3a4-7679-409f-95f9-06204c85e90f;16-1_10:14731000   6748    0       6748    10      112626471       14732281        14788481        33      467,161,169,246,122,79,71,134,110,152,193,90,126,185,69,124,101,423,209,91,156,118,186,152,440,790,93,316,160,98,134,112,190,   0,1622,1948,2204,2538,3571,4817,5111,5409,5948,7490,7893,8086,8710,8988,9138,9382,9925,10848,11387,11665,11960,12507,13261,15479,16199,17918,18152,19800,21582,21851,22063,56496,       14732281,14733417,14733743,14733999,14734333,14735366,14736612,14736906,14737204,14737743,14739285,14739688,14739881,14740505,14740783,14740933,14741177,14741720,14742643,14743182,14743460,14743755,14744302,14745056,14747274,14747994,14749713,14749947,14751595,14753377,14753646,14753858,14788291,

belgravia commented on August 11, 2024

Hmm. Well, I put those two lines in a file called shams.isos.psl and ran the following command, which does the annotation:
python ~/flair/bin/identify_gene_isoform.py shams.isos.psl Rattus_norvegicus.Rnor_6.0.96.gtf shams.isos.annotated.psl
and got:

0 0 0 0 0 0 0 0 - 6138b3a4-7679-409f-95f9-06204c85e90f_ENSRNOG00000033893 6748 0 6748 10 112626471 14731795 14788481 33 953,161,169,246,122,79,71,134,110,152,193,90,126,185,69,124,101,423,209,91,156,118,186,152,440,790,93,316,160,98,134,112,190, 0,1622,1948,2204,2538,3571,4817,5111,5409,5948,7490,7893,8086,8710,8988,9138,9382,9925,10848,11387,11665,11960,12507,13261,15479,16199,17918,18152,19800,21582,21851,22063,56496, 14731795,14733417,14733743,14733999,14734333,14735366,14736612,14736906,14737204,14737743,14739285,14739688,14739881,14740505,14740783,14740933,14741177,14741720,14742643,14743182,14743460,14743755,14744302,14745056,14747274,14747994,14749713,14749947,14751595,14753377,14753646,14753858,14788291,
0 0 0 0 0 0 0 0 - 6138b3a4-7679-409f-95f9-06204c85e90f-1_ENSRNOG00000033893 6748 0 6748 10 112626471 14732281 14788481 33 467,161,169,246,122,79,71,134,110,152,193,90,126,185,69,124,101,423,209,91,156,118,186,152,440,790,93,316,160,98,134,112,190, 0,1622,1948,2204,2538,3571,4817,5111,5409,5948,7490,7893,8086,8710,8988,9138,9382,9925,10848,11387,11665,11960,12507,13261,15479,16199,17918,18152,19800,21582,21851,22063,56496, 14732281,14733417,14733743,14733999,14734333,14735366,14736612,14736906,14737204,14737743,14739285,14739688,14739881,14740505,14740783,14740933,14741177,14741720,14742643,14743182,14743460,14743755,14744302,14745056,14747274,14747994,14749713,14749947,14751595,14753377,14753646,14753858,14788291,

So it worked for me: the script was able to annotate the gene this time, even though it didn't for you (identify_gene_isoform.py puts chromosome:position instead of a gene name if it can't find one).
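Based on that fallback, one way to count how many isoforms were left without a gene name is to look for names ending in an underscore followed by chromosome:position, like the unannotated name 6138b3a4-7679-409f-95f9-06204c85e90f;16_10:14731000 posted earlier. The pattern below is an assumption inferred from the example names in this thread, not a documented FLAIR format, and it would misfire on chromosome or scaffold names containing underscores or colons. The function names are hypothetical.

```python
import re

# Assumed format (inferred from the names in this thread): unannotated
# isoforms end in "_<chrom>:<position>", annotated ones in "_<gene_id>".
UNANNOTATED = re.compile(r"_[^_:]+:\d+$")


def is_unannotated(isoform_name):
    """True if the name ends in the assumed chromosome:position fallback."""
    return UNANNOTATED.search(isoform_name) is not None


def count_unannotated(psl_path):
    """Return (total, unannotated) isoform counts for a PSL file."""
    total = missing = 0
    with open(psl_path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 21:
                continue  # skip blank or malformed lines
            total += 1
            if is_unannotated(fields[9]):  # qName column
                missing += 1
    return total, missing
```

For example, count_unannotated("your_isoforms.psl") (a placeholder file name) returns a (total, unannotated) pair that can be compared before and after updating the scripts.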

I did make some minor changes to the identify_gene_isoform.py script yesterday, so I'd say your next steps would be to (1) update your FLAIR scripts just to be sure and then (2) run that script on all of your isoforms. Let me know how it goes.

-Alison

shamsbhuiyan commented on August 11, 2024

Hey Alison,

Sorry for the late response. I will update my version of FLAIR and let you know if this fixes things.

shamsbhuiyan commented on August 11, 2024

Hi, so I re-downloaded FLAIR a couple of weeks ago. My collapse step runs but then fails here:

Renaming isoforms
Aligning reads to first-pass isoform reference
[M::mm_idx_gen::7.696*1.50] collected minimizers
[M::mm_idx_gen::8.277*2.59] sorted minimizers
[M::main::8.286*2.59] loaded/built the index for 63407 target sequence(s)
[M::mm_mapopt_update::8.320*2.58] mid_occ = 29116
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 63407
[M::mm_idx_stat::8.341*2.58] distinct minimizers: 1473546 (57.71% are singletons); average occurrences: 39.526; average spacing: 5.373
[M::worker_pipeline::1809.525*57.94] mapped 105582 sequences
[M::worker_pipeline::3518.154*58.68] mapped 105374 sequences
[ERROR] failed to write the results
Possible minimap2/samtools error, specify paths or make sure they are in $PATH

My collapse command is:

python /space/grp/bin/flair/flair.py collapse -r /space/collaborator/upload/Chip176_G90_pass.fq,/space/collaborator/upload/Chip170_N10_pass.fq,/space/collaborator/upload/Chip160_G10_pass.fq,/space/collaborator/upload/Chip158_N90_pass.fq -q allSamples.psl -g /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa -n N -m ~/minimap2/ -t 60 -f /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf
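The log ends with FLAIR's hint "Possible minimap2/samtools error, specify paths or make sure they are in $PATH". A quick way to rule that out before a long rerun is to confirm that both tools resolve, checking the directory passed with -m (here ~/minimap2/) first and then $PATH. This is a hypothetical helper, not part of FLAIR, and it only checks that an executable file exists, not that the tool runs correctly.

```python
import os
import shutil


def find_tool(name, hint_dir=None):
    """Return a path to an executable `name`, trying `hint_dir` before $PATH."""
    if hint_dir:
        candidate = os.path.join(os.path.expanduser(hint_dir), name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which(name)  # None if not found on $PATH


if __name__ == "__main__":
    # "~/minimap2/" mirrors the -m value in the collapse command above.
    for tool, hint in (("minimap2", "~/minimap2/"), ("samtools", None)):
        print(f"{tool}: {find_tool(tool, hint) or 'NOT FOUND in hint dir or $PATH'}")
```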

belgravia commented on August 11, 2024

OK, this has to do with where FLAIR puts temporary files. It uses a Python module to generate a unique filename, usually in /tmp, /scratch/tmp, /data/tmp, etc., and minimap2 then tries to create a file in a temp folder that you don't have write permission for.

I have updated flair-collapse so that you can specify a directory for where the temporary files should be put. So you can add --temp_dir "./" to your command and that will use your working directory as the temporary directory instead.
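The fix above amounts to letting the caller choose where temporary files are created. Assuming the Python module in question is the standard-library tempfile (an assumption; the comment does not name it), a minimal sketch of that pattern follows. With dir=None, tempfile picks a default from the TMPDIR, TEMP or TMP environment variables and then standard locations such as /tmp; passing a directory, the way --temp_dir "./" does, overrides that. The function name make_temp_path is hypothetical.

```python
import os
import tempfile


def make_temp_path(suffix, temp_dir=None):
    """Create an empty temporary file and return its path.

    temp_dir=None lets tempfile choose its default location; a directory such
    as "./" keeps the file somewhere the user is known to be able to write.
    """
    if temp_dir is not None and not os.access(temp_dir, os.W_OK):
        raise PermissionError(f"temporary directory is not writable: {temp_dir}")
    fd, path = tempfile.mkstemp(suffix=suffix, dir=temp_dir)
    os.close(fd)  # only the unique name is needed; the path can be handed to another tool
    return path
```

Failing early with a clear PermissionError is the design choice here: a long run stops before hours of mapping rather than at the final write.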

-Alison

shamsbhuiyan commented on August 11, 2024

Hi Alison, sorry for the late response again. I reran FLAIR, this time with a recent build of FLAIR and with splice junction data from short reads, but with the exact same Oxford Nanopore data.

When I made my original post, FLAIR collapse produced ~7,000 isoforms. With the updated FLAIR, I get ~52,000 isoforms after FLAIR collapse. To ground it back in the Cacna1h example from my first post, this is what the Cacna1h isoforms look like now:

[Image: Cacna1h isoforms after rerunning with the updated FLAIR]

This was the command I used to run FLAIR collapse:
python ~/flair/flair.py collapse -r /space/collaborator/upload/Chip176_G90_pass.fq,/space/collaborator/upload/Chip170_N10_pass.fq,/space/collaborator/upload/Chip160_G10_pass.fq,/space/collaborator/upload/Chip158_N90_pass.fq -q allSamples.psl -g /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa -n N -m ~/minimap2/ -t 60 -f /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf --temp_dir "./"

Unlike in my original post, this run of FLAIR included short-read data at the FLAIR correct step. Could this explain the strange new number of isoforms or the very short isoforms in the screenshot? Also, I noticed that this time flair collapse took about a week to run! Does adding short-read data really slow things down that much?

shamsbhuiyan commented on August 11, 2024

Hey, so I think it's my short-read data that is messing up FLAIR. I'll open a new issue to describe exactly what's going on. Without the short-read splice junctions, the isoforms do get named correctly and the number of isoforms makes sense.
