Hello,
I am following your steps to collapse isoforms and obtain isoform counts, but I believe we are losing many isoforms during the collapse step.
The steps I followed from your protocol are:
flair align:
python /opt/flair/flair.py align -r all_4.fastq -g Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL5.fa -m /opt/minimap2-2.14_x64-linux/minimap2 -o flair.aligned.PL5. -t 40 -p -v1.3
flair correct:
python /opt/flair/bin/bam2Bed12.py -i flair.aligned.PL5.bam > flair.aligned.PL5.bam.bed12
python /opt/flair/flair.py correct -f Sus_scrofa.Sscrofa11.1.95_with_PL5.gtf -c chromsizes_PL5.tsv -q flair.aligned.PL5.bam.bed12 -t 40
flair collapse:
python /opt/flair/flair.py collapse -r all_4.fastq -q flair_all_corrected.psl -g Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL5.fa -m /opt/minimap2-2.14_x64-linux/minimap2 -f Sus_scrofa.Sscrofa11.1.95_with_PL5.gtf -t 40
flair quantify:
python /opt/flair/flair.py quantify -r reads_manifest.tsv -i flair.collapse.isoforms.fa -t 40 -m /opt/minimap2-2.14_x64-linux/minimap2 -o PL5_counts_matrix_v3.tsv
I then remapped the collapsed isoforms to the reference sequence:
`#Remapping of collapse isoforms:
/opt/minimap2-2.14_x64-linux/minimap2 -ax splice Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL5.fa flair.collapse.isoforms.fa > aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam
samtools view -S -b aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam -o aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam.bam
samtools sort aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam.bam > aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam.sorted.bam
samtools index aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam.sorted.bam`
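For reference, the isoform counts can be checked with a rough sketch like the one below. It assumes the headers in flair.collapse.isoforms.fa have the form `>isoformID_geneID`, so the gene is taken as the text after the last underscore. The function names `count_isoforms` and `isoforms_per_gene` are just illustrative helpers, not part of FLAIR.

```shell
# Total number of collapsed isoforms = number of FASTA header lines.
count_isoforms() {
    grep -c '^>' "$1"
}

# Isoforms per gene, most isoforms first.
# Assumption: headers look like ">isoformID_geneID", so the gene ID is
# whatever follows the last underscore in the header line.
isoforms_per_gene() {
    grep '^>' "$1" | awk -F'_' '{print $NF}' | sort | uniq -c | sort -k1,1nr
}
```

Usage would be `count_isoforms flair.collapse.isoforms.fa` and `isoforms_per_gene flair.collapse.isoforms.fa | head`, which makes it easy to see whether a gene such as GAPDH has fewer isoforms than expected.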
The resulting flair.collapse.isoforms.fa is missing common isoforms for GAPDH and other important genes:
Please refer to the following figure: the top track (red box) shows the isoforms defined by FLAIR, and the bottom track shows the raw nanopore reads (BAM file) aligned to the reference. You can see that FLAIR fails to join the long isoform into a single isoform.
Similarly, for the transgene sequence we are interested in, flair.collapse.isoforms.fa does not contain one long isoform spanning the region. Read coverage there is high, but the collapsed isoforms assigned by FLAIR (top track, green box) do not extend over the full length seen in the raw nanopore reads:
Could you please advise which parameters we are missing to obtain full-length isoforms, and which steps we should take to avoid losing isoforms that are visible in the raw aligned reads (BAM)?
Thanks,
With Regards,
Dharm