brookslabucsc / flair

Full-Length Alternative Isoform analysis of RNA

License: Other

Python 98.12% Shell 0.25% Dockerfile 0.27% Makefile 1.30% Emacs Lisp 0.05%

flair's Introduction

flair

FLAIR (Full-Length Alternative Isoform analysis of RNA) is a tool for the correction, isoform definition, and alternative splicing analysis of noisy reads. FLAIR has primarily been used for nanopore cDNA, native RNA, and PacBio sequencing reads.

The complete FLAIR manual is available via readthedocs.

Cite FLAIR

If you use or discuss FLAIR, please cite the following paper:

Tang, A.D., Soulette, C.M., van Baren, M.J. et al. Full-length transcript characterization of SF3B1 mutation in chronic lymphocytic leukemia reveals downregulation of retained introns. Nat Commun 11, 1438 (2020). https://doi.org/10.1038/s41467-020-15171-6

flair's People

chuanj smaegol shaniamare sawyerhicks laffayb minw2828 diekhans pedronachtigall mjblow alexg9010 andr-kun shulp2211 xjtu-omics yangxiaofeill bowangxjtu rmtsoa yjzhang2013 ozgegizlenci pang-hd arontommi huangziyan11111 wqhf plantgenomicslab cassimons elliotmartin92 caspargross echo1610 hj1994412 mustafaelshani tderrien mglubber yingli1218 legendzdy icanccwhite jonn-smith xnxqc standardgalactic juansilva89 baraaorabi wzf9 beiusxzw genomicsnx hongyhong dar19 ctl43 nailouzhang yuxing123 hacaoe bharathananth lananh-ngn mubashermohammed pawarad sisov wenmm xyloforce zxgsy520 pyoelii mintstella0419 jeltje guanguiwensy rnaimehaom changlabsnu chrisamiller mviscardi-ucsc listen2099 gerikson cafelton wook2014 byee4

flair's Issues

License and License file missing

Hi all,

We wanted to use flair and package it for further distribution; unfortunately, it is not clear under which license flair is developed.

It would be great if you could clarify this and maybe add a license file.

Thanks!
Bjoern

Error using Flair correct

Hello,

Thank you very much for writing this tool - I'm looking forward to using it. I'm trying to run the 'correct' stage on some nanopore cDNA reads aligned to the Xenopus tropicalis genome and am getting an error which seems to be due to an unexpected negative number. Here is my command:

$ python flair.py correct -f ../reference/GCF_000004195.3_Xenopus_tropicalis_v9.1_genomic.gtf -c ../reference/chrom_lengths.txt -q flair_alignments/BC02.flair.bed -o BC02.flair.correct

I get several instances of the error below:

  File "/flair-1.3/bin/ssPrep.py", line 386, in <module>
    main()
  File "/flair-1.3/bin/ssPrep.py", line 377, in main
    intTree, ssData = buildIntervalTree(knownJuncs, wiggle)
  File "/flair-1.3/bin/ssPrep.py", line 339, in buildIntervalTree
    x.add(c1S,c1E,c1)
  File "src/kerneltree.pyx", line 22, in kerneltree.IntervalTree.add
OverflowError: can't convert negative value to unsigned long

It looks like c1S sometimes becomes negative when 15 is subtracted. Please let me know if you have any ideas.

Thanks,

Katherine
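The traceback is consistent with that reading: kerneltree.IntervalTree.add cannot accept a negative start, and a junction closer to the sequence start than the wiggle (15 here) would produce one. Below is a minimal sketch of a guard, assuming the padded window is simply (start - wiggle, end + wiggle); padded_window is a hypothetical helper, not code from ssPrep.py.

```python
def padded_window(start, end, wiggle):
    """Return (start - wiggle, end + wiggle) with the start clamped at 0.

    Hypothetical helper: an interval tree keyed on unsigned integers
    (as the OverflowError suggests) cannot store a negative coordinate.
    """
    if start < 0 or end < start:
        raise ValueError(f"invalid junction coordinates: {start}-{end}")
    return max(0, start - wiggle), end + wiggle


# A junction 10 bp from the start of the sequence, padded by a wiggle of 15.
print(padded_window(10, 500, 15))     # (0, 515)
print(padded_window(1000, 2000, 15))  # (985, 2015)
```

Clamping loses nothing, because no coordinate below 0 exists on the sequence; skipping such junctions entirely would be the other option.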

Issues regarding flair

Hi @anbrooks @belgravia

I tried to run the flair pipeline on my MinION RNA-seq dataset and have faced some issues.

While trying to align, I came across the following error:

[ERROR] unknown preset ' splice '

So I mapped the reads independently with minimap2. Then I ran collapse:

python /u/home/Resource/flair/flair.py collapse -r test.fastq -g GRCh37.primary_assembly.genome.fa -q test_strand_corrected.psl -m minimap2 -f GenCode_v28.gtf

It gives me a blank output file. I also tried running the psl_to_sequence.py tool separately, and it also gives a blank output file.

Am I doing anything wrong?

error correct

Hi,

I am trying to get flair running on our Nanopore data.
I aligned reads using minimap, converted the BAM to BED12, then ran the flair correct script with:

python flair.py correct -c /mnt/BIG/MINION/sizes.genome -g /mnt/BIG/MINION/GRCh38.fa -q /mnt/BIG/MINION/aln_test_52_sorted.bed12 -f annotation_trimmed_GRCh38.gtf --print_check -t 4 -o /mnt/BIG/MINION/test_52

The script stays at 60% in the fifth step for a while but is apparently still running.
I got the following error message in the err_tmp file (just the part where the error starts to appear). Any hints?

Thank you,
Yann

** Correcting /root/flair/tmp_a08014da-94b6-4222-9c78-002cd0d37027/chr2_temp_reads.bed with a wiggle of 15 against /root/flair/tmp_a08014da-94b6-4222-9c78-002cd0d37027/chr2_known_juncs.bed. Checking splice sites with genome /mnt/BIG/MINION/GRCh38.fa.
** Initializing int tree for chromosome chr2
** Checking SS motifs for chromosome chr2
** Checked 19801 splice sites for chromosome chr5... Adding to int tree
** Unsuccessful correction for chromosome chr5 Traceback (most recent call last):
File "/root/flair/bin/ssPrep.py", line 479, in <module>
main()
File "/root/flair/bin/ssPrep.py", line 470, in main
intTree, ssData = buildIntervalTree(knownJuncs, wiggle, fa)
File "/root/flair/bin/ssPrep.py", line 398, in buildIntervalTree
if dinucDict[c1][0] != strand or dinucDict[c2][0] != strand:
KeyError: 110739040

** Correcting /root/flair/tmp_a08014da-94b6-4222-9c78-002cd0d37027/chr1_temp_reads.bed with a wiggle of 15 against /root/flair/tmp_a08014da-94b6-4222-9c78-002cd0d37027/chr1_known_juncs.bed. Checking splice sites with genome /mnt/BIG/MINION/GRCh38.fa.
** Initializing int tree for chromosome chr1
** Checking SS motifs for chromosome chr1
** Checked 15299 splice sites for chromosome chr4... Adding to int tree
** Unsuccessful correction for chromosome chr4 Traceback (most recent call last):
File "/root/flair/bin/ssPrep.py", line 479, in <module>
main()
File "/root/flair/bin/ssPrep.py", line 470, in main
intTree, ssData = buildIntervalTree(knownJuncs, wiggle, fa)
File "/root/flair/bin/ssPrep.py", line 398, in buildIntervalTree
if dinucDict[c1][0] != strand or dinucDict[c2][0] != strand:
KeyError: 75913968

** Correcting /root/flair/tmp_a08014da-94b6-4222-9c78-002cd0d37027/chr9_temp_reads.bed with a wiggle of 15 against /root/flair/tmp_a08014da-94b6-4222-9c78-002cd0d37027/chr9_known_juncs.bed. Checking splice sites with genome /mnt/BIG/MINION/GRCh38.fa.
** Initializing int tree for chromosome chr9
** Checking SS motifs for chromosome chr9
** Checked 9625 splice sites for chromosome chr9... Adding to int tree
** Unsuccessful correction for chromosome chr9 Traceback (most recent call last):
File "/root/flair/bin/ssPrep.py", line 479, in <module>
main()
File "/root/flair/bin/ssPrep.py", line 470, in main
intTree, ssData = buildIntervalTree(knownJuncs, wiggle, fa)
File "/root/flair/bin/ssPrep.py", line 398, in buildIntervalTree
if dinucDict[c1][0] != strand or dinucDict[c2][0] != strand:
KeyError: 64440180
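Each KeyError means a junction coordinate (c1 or c2) was never added to dinucDict before the strand comparison on line 398. Without knowing why that coordinate is missing, one defensive pattern is to look coordinates up with dict.get and skip or log junctions that have no motif entry, instead of aborting correction for the whole chromosome. The sketch assumes, as the failing line suggests, that dinucDict maps a coordinate to a tuple whose first element is the inferred strand; junction_strand_ok and the sample motif table are hypothetical.

```python
def junction_strand_ok(dinuc_dict, c1, c2, strand):
    """Return True/False for a strand match, or None if a coordinate is unknown.

    Assumes dinuc_dict maps a splice-site coordinate to a tuple whose first
    element is the strand inferred from the dinucleotide motif.
    """
    site1 = dinuc_dict.get(c1)
    site2 = dinuc_dict.get(c2)
    if site1 is None or site2 is None:
        return None  # never motif-checked; let the caller decide what to do
    return site1[0] == strand and site2[0] == strand


# Hypothetical motif table: coordinate -> (strand, motif)
motifs = {1000: ("+", "GT"), 1500: ("+", "AG")}
skipped = []
for c1, c2, strand in [(1000, 1500, "+"), (1000, 110739040, "+")]:
    if junction_strand_ok(motifs, c1, c2, strand) is None:
        skipped.append((c1, c2))
print(skipped)  # [(1000, 110739040)]
```

Skipping hides the symptom rather than explaining it, so logging the skipped coordinates is worthwhile when tracking down the root cause.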

Flair correct returns blank file

Hello,

I ran flair correct on my Nanopore direct RNA sequencing data. It ran without any problems, but the flair_all_corrected files were blank. Please, can you help me? What did I do wrong?

Thanks in advance,
Natasha

This is my command line:

python ~/Tools/flair-master/flair.py correct -f Homo_sapiens.GRCh38.95.gtf -c /scr/becherovka/natasha/Alzheimer/hg38.extended.chr_sizes.txt -q second_call.minimap2.sort.bed12 -t 8 -n

This was the last output:

Step 5/5: Correcting Splice Sites: 0it [00:00, ?it/s]0:00, 507387.35it/s]██████████████████████████████████████████████████████████████████████████████████████████████████████████| 47/47 [00:06<00:00, 7.79it/s]

And these are the contents of the output directory:

-rw-r--r-- 1 natasha staff 16M Jul 8 16:02 second_call.minimap2.sort.bed12
-rw-r--r-- 1 natasha staff 411M Jul 8 16:33 Homo_sapiens.GRCh38.95.gff3
-rw-r--r-- 1 natasha staff 0 Jul 8 16:37 GL000009.2_known_juncs.bed
-rw-r--r-- 1 natasha staff 0 Jul 8 16:37 GL000216.2_known_juncs.bed
-rw-r--r-- 1 natasha staff 0 Jul 8 16:37 GL000218.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 0 Jul 8 16:37 GL000225.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 0 Jul 8 16:37 KI270442.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 0 Jul 8 16:37 KI270744.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 0 Jul 8 16:37 KI270750.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 0 Jul 8 16:37 flair_all_inconsistent.bed
-rw-r--r-- 1 natasha staff 0 Jul 8 16:37 flair_all_corrected.bed
-rw-r--r-- 1 natasha staff 1.1G Jul 8 16:56 Homo_sapiens.GRCh38.95.gtf
-rw-r--r-- 1 natasha staff 0 Jul 8 16:58 flair_all_corrected.psl
-rw-r--r-- 1 natasha staff 2.3M Jul 8 17:06 1_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.9M Jul 8 17:06 2_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.5M Jul 8 17:06 3_known_juncs.bed
-rw-r--r-- 1 natasha staff 956K Jul 8 17:06 4_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.1M Jul 8 17:06 5_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.1M Jul 8 17:06 6_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.2M Jul 8 17:06 7_known_juncs.bed
-rw-r--r-- 1 natasha staff 780K Jul 8 17:06 X_known_juncs.bed
-rw-r--r-- 1 natasha staff 880K Jul 8 17:06 8_known_juncs.bed
-rw-r--r-- 1 natasha staff 900K Jul 8 17:06 9_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.4M Jul 8 17:06 11_known_juncs.bed
-rw-r--r-- 1 natasha staff 928K Jul 8 17:06 10_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.4M Jul 8 17:06 12_known_juncs.bed
-rw-r--r-- 1 natasha staff 437K Jul 8 17:06 13_known_juncs.bed
-rw-r--r-- 1 natasha staff 823K Jul 8 17:06 14_known_juncs.bed
-rw-r--r-- 1 natasha staff 930K Jul 8 17:06 15_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.1M Jul 8 17:06 16_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.5M Jul 8 17:06 17_known_juncs.bed
-rw-r--r-- 1 natasha staff 413K Jul 8 17:06 18_known_juncs.bed
-rw-r--r-- 1 natasha staff 533K Jul 8 17:06 20_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.4M Jul 8 17:06 19_known_juncs.bed
-rw-r--r-- 1 natasha staff 87K Jul 8 17:06 Y_known_juncs.bed
-rw-r--r-- 1 natasha staff 535K Jul 8 17:06 22_known_juncs.bed
-rw-r--r-- 1 natasha staff 282K Jul 8 17:06 21_known_juncs.bed
-rw-r--r-- 1 natasha staff 694 Jul 8 17:06 MT_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.5K Jul 8 17:06 KI270728.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 887 Jul 8 17:06 KI270727.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 193 Jul 8 17:06 GL000194.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 130 Jul 8 17:06 GL000205.2_known_juncs.bed
-rw-r--r-- 1 natasha staff 161 Jul 8 17:06 GL000195.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 99 Jul 8 17:06 KI270733.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 93 Jul 8 17:06 GL000219.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 1.2K Jul 8 17:06 KI270734.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 594 Jul 8 17:06 GL000213.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 99 Jul 8 17:06 GL000220.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 62 Jul 8 17:06 KI270731.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 210 Jul 8 17:06 KI270721.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 62 Jul 8 17:06 KI270726.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 2.0K Jul 8 17:06 KI270711.1_known_juncs.bed
-rw-r--r-- 1 natasha staff 62 Jul 8 17:06 KI270713.1_known_juncs.bed

Basic questions to FLAIR for use with 1D cDNA reads

I am quite new to bioinformatics and am currently implementing FLAIR in our lab, so I have some basic questions that I hope you can answer:

For 1D cDNA reads, would you recommend using the -n flag during FLAIR align?
In Tang et al. 2018, you mention that you used mRNAtoGene following spliced alignment mapping with minimap2. Is mRNAtoGene integrated in the FLAIR align module?
Though not required, would you recommend including TSS/TES to increase the confidence in the isoforms? Or what is the purpose of this?

Thanks a lot in advance!

Output from diff_iso_usage

Hello,

I had a question about the output from the diff_iso_usage.py script. I don't quite understand the information contained in the alternative isoforms columns, as well as the last few columns at the end, and was wondering if you could explain it a bit more.

Thanks!

Nanopore1_diffEx.txt.zip

Decreased number of isoform when comparing multiple samples

When I run the flair collapse and quantify steps separately for each of 3 samples:

python $flair collapse -r $base".fastq"  -q $base".corrected_all_corrected.psl" -g $genome -m $minimap -o $base -f $annotation -sam $samtools
echo -e $base"\t"$base"\t"$base"\t"$base".fastq" > "manifest.tsv"
python3 $flair quantify -i $base".isoforms.fa" -r "manifest.tsv" -m $minimap

I get the following number of isoforms for a gene (in this case TEAD3):

sample1: 6 (in quantification: 154 reads in total)
sample2: 7 (in quantification: 123 reads in total)
sample3: 13 (in quantification: 430 reads in total)

I would like to compare the samples, therefore I concatenated the flair-corrected read psl files, concatenated the manifest files and gave raw read files as a comma-separated list for -r.
python $flair collapse -r $base1".fastq,"$base2".fastq",$base3".fastq" -q "concatenated_corrected.psl" -g $genome -m $minimap -f $annotation -sam $samtools

In this case, flair.collapse.isoforms.gtf does not contain any exons for the gene and consequently there are no isoforms detected or quantified for this gene.

How should I proceed in this case?
Thanks!
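For a joint run, quantify takes a single manifest listing every sample. Judging from the echo line above, each row has four tab-separated fields; the sketch below assumes they are sample name, condition, batch, and path to the reads (column meanings inferred from this thread, not checked against the manual). write_manifest and all sample values are placeholders.

```python
import csv


def write_manifest(path, samples):
    """Write a headerless, tab-separated reads manifest.

    Each sample is (name, condition, batch, reads_path); this column order
    is an assumption based on the single-sample echo command above.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for name, condition, batch, reads in samples:
            writer.writerow([name, condition, batch, reads])


samples = [
    ("sample1", "conditionA", "batch1", "sample1.fastq"),
    ("sample2", "conditionA", "batch2", "sample2.fastq"),
    ("sample3", "conditionB", "batch1", "sample3.fastq"),
]
write_manifest("manifest.tsv", samples)
```

The resulting manifest.tsv would then be passed to quantify with -r, as in the single-sample command.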

diffExp outputs

I have been running flair in a conda environment created from your flair_env_conda.yaml file, and after running flair.py diffExp I get the following output files:

dge_QCplots_Treatment_v_CTRL.pdf
dge_Treatment_v_CTRL_deseq2_results.tsv
dge_Treatment_v_CTRL_deseq2_results_shrinkage.tsv
die_QCplots_Treatment_v_CTRL.pdf
die_Treatment_v_CTRL_deseq2_results.tsv
die_Treatment_v_CTRL_deseq2_results_shrinkage.tsv
diu_Treatment_v_CTRL_drimseq2_results.tsv
dge_stderr.txt

The "dge_stderr.txt" file contains the following text:

"running DESEQ2 dge
running DESEQ2 die
running DRIMSEQ diu
/home/nanopore/src/miniconda3/envs/flair_env/lib/python3.6/site-packages/rpy2/robjects/pandas2ri.py:191: FutureWarning: from_items is deprecated. Please use DataFrame.from_dict(dict(items), ...) instead. DataFrame.from_dict(OrderedDict(items)) may be used to preserve the key order.
res = PandasDataFrame.from_items(items)
/home/nanopore/src/miniconda3/envs/flair_env/lib/python3.6/site-packages/pandas/core/dtypes/missing.py:220: RuntimeWarning: invalid value encountered in isnan
result = np.isnan(values)"

Am I getting all the output files?
Or is the error in the "dge_stderr.txt" file truncating/abrogating the output?
How are the "_shrinkage" files different from the corresponding .tsv files?
Should replicates of the same condition be denoted by a different batch descriptor (in flair quantify) or is the variation between replicates taken into account by flair diffExp?
Finally: can you elaborate on the '-e' flag in flair diffExp (default=10) and how this influences the stringency of the analysis?

Thanks for your time!
/Sebastian

conda install flair errors

Hi,
I tried to install flair with conda install -c bioconda flair, but it reported the following errors:
"""
(flair_cor) [qinh@node29 ~]$ conda install -c bioconda flair
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment:
Found conflicts! Looking for incompatible packages. failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Package sqlite conflicts for:
python=3.6.5 -> sqlite[version='>=3.22.0,<4.0a0|>=3.23.1,<4.0a0']
Package pysam conflicts for:
flair -> pysam
Package xz conflicts for:
python=3.6.5 -> xz[version='>=5.2.3,<6.0a0']
Package minimap2 conflicts for:
flair -> minimap2
Package pip conflicts for:
python=3.6.5 -> pip
Package libstdcxx-ng conflicts for:
python=3.6.5 -> libstdcxx-ng[version='>=7.2.0']
Package bedtools conflicts for:
flair -> bedtools
Package samtools conflicts for:
flair -> samtools
Package bioconductor-drimseq conflicts for:
flair -> bioconductor-drimseq
Package seaborn conflicts for:
flair -> seaborn
Package bioconductor-stager conflicts for:
flair -> bioconductor-stager
Package pybedtools conflicts for:
flair -> pybedtools
Package ncls conflicts for:
flair -> ncls
Package tk conflicts for:
python=3.6.5 -> tk[version='>=8.6.7,<8.7.0a0']
Package r-qqman conflicts for:
flair -> r-qqman
Package r-ggplot2 conflicts for:
flair -> r-ggplot2=2.2.1
Package numpy conflicts for:
flair -> numpy
Package pandas conflicts for:
flair -> pandas
Package tqdm conflicts for:
flair -> tqdm
Package ncurses conflicts for:
python=3.6.5 -> ncurses[version='>=6.0,<7.0a0']
Package matplotlib conflicts for:
flair -> matplotlib
Package libgcc-ng conflicts for:
python=3.6.5 -> libgcc-ng[version='>=7.2.0']
Package rpy2 conflicts for:
flair -> rpy2
Package libffi conflicts for:
python=3.6.5 -> libffi[version='>=3.2.1,<4.0a0']
Package readline conflicts for:
python=3.6.5 -> readline[version='>=7.0,<8.0a0']
Package bioconductor-deseq2 conflicts for:
flair -> bioconductor-deseq2
Package zlib conflicts for:
python=3.6.5 -> zlib[version='>=1.2.11,<1.3.0a0']
Package openssl conflicts for:
python=3.6.5 -> openssl[version='>=1.0.2o,<1.0.3a']
Package intervaltree conflicts for:
flair -> intervaltree
"""
I tried various versions of python, such as 2.7/3.6.5/3.6.9/3.7, and all of them report this error.
In previous installation attempts, I've installed the Cython, intervaltree, kerneltree, tqdm, pybedtools, pysam-v0.15, minimap2 and bedtools packages, but still got the same errors.
Could you give me some advice? Thanks.


error in diffExp

Hi,
I'm using flair on native RNA sequencing data. While running the diffExp submodule with my count_matrix.tsv, I got the following error:

deFLAIR.py:107: RuntimeWarning: invalid value encountered in divide
self.usage = ["%.2f" % np.divide(iso,gene) for iso,gene in zip(self.exp,self.parent.exp)]

We don't know how to solve the problem, do you have any idea?
Thanks
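The warning comes from the usage line shown: isoform usage is the isoform count divided by its parent gene's count, so a gene with zero counts in a sample gives 0/0, which NumPy reports as an invalid value and turns into nan. A RuntimeWarning does not by itself stop the run. If a placeholder is preferred over nan, a guarded version could look like the sketch below; isoform_usage and the "NA" placeholder are assumptions, not deFLAIR.py's behavior.

```python
def isoform_usage(iso_counts, gene_counts, placeholder="NA"):
    """Format per-sample isoform usage (isoform count / gene count).

    Samples where the parent gene has zero counts get `placeholder`
    instead of nan, so the 0/0 division is never attempted.
    """
    usage = []
    for iso, gene in zip(iso_counts, gene_counts):
        if gene == 0:
            usage.append(placeholder)
        else:
            usage.append("%.2f" % (iso / gene))
    return usage


print(isoform_usage([3, 0, 5], [10, 0, 20]))  # ['0.30', 'NA', '0.25']
```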

Syntax error in diff_iso_usage.py

Hi!

I just found a minor syntax issue in the diff_iso_usage.py script. In line 39 it should be count2 instead of counts2.
Currently the script immediately fails due to this error.

Best,

isoforms shorter and missing introns

Hi,
I am having an issue where many of the isoforms identified by flair collapse are shortened and do not cover reads mapped to much longer regions of the genome (see picture from IGV). Additionally, none of the isoforms in either the firstpass debugging files or the final psl file include introns; they only span a single exon or UTR. In the picture, the top track is the corrected alignments (from flair correct), the next two are debugging files, the next one is the file containing the final isoforms (from flair collapse), and the bottom track is the original mapped reads. My data are Nanopore long reads from a human cell line.

I mapped my reads separately with Minimap2 and proceeded with your protocol starting from flair correct, after converting my bam file to bed12 format.

bam_input='2019-6-25_A431/minimap_ontLR/transcriptclean_VA/_clean_sorted.bam'
python software/flair/bin/bam2Bed12.py -i "$bam_input" > 2019-6-25_A431/minimap_ontLR/transcriptclean_VA/_clean_sorted.bed12

#correct
bedfile='2019-6-25_A431/minimap_ontLR/transcriptclean_VA/_clean_sorted.bed12'
genome='2019-6-13_A431/GRCh38_trimmed.fa'
output_dir='2019-6-25_A431/minimap_ontLR/transcriptclean_VA/flair/'
annot='software/flair/GRCh38_latest_genomic.gff'
samtools faidx "$genome"
python3 software/flair/flair.py correct \
    -c software/bedtools2/genomes/human.hg38.genome.chrom.sizes \
    -g "$genome" \
    -q "$bedfile" \
    -o "$output_dir" \
    -f "$annot"

#collapse
reads='2019-6-13_A431/fasta/merged.fasta'  # raw reads before alignment
query='2019-6-25_A431/minimap_ontLR/transcriptclean_VA/flair/flair_all_corrected.psl'
output_file='2019-6-25_A431/minimap_ontLR/transcriptclean_VA/flair/flair.collapse'
# -s: minimum number of supporting reads for an isoform (3)
python3 software/flair/flair.py collapse \
    -g "$genome" \
    -r "$reads" \
    -f "$annot" \
    -s 1 \
    -o "$output_file" \
    -q "$query"

If you have any advice for how to troubleshoot this or need any additional information in order to resolve this issue, please let me know! Thank you in advance ~

error reading junctions

Full command:
python /work/flair/flair.py align -t 16 -c /biodata/genome_indexes/hg38noAlt.fa.fai -p -v1.3 -r ../file.fastq.gz -g /biodata/genome_indexes/hg38noAlt.fa

Maps, converts, sorts, then fails:

Converting sam output to bed
[bam_sort_core] merging from 16 files and 16 in-memory blocks...
Traceback (most recent call last):
  File "/work/flair/bin/bam2Bed12.py", line 144, in <module>
    main();
  File "/work/flair/bin/bam2Bed12.py", line 124, in main
    for num, readData in enumerate(sObj.readJuncs(),0):
  File "/work/flair/bin/samJuncs.py", line 174, in readJuncs
    chromosome = read.reference_name
AttributeError: 'pysam.calignmentfile.AlignedSegment' object has no attribute 'reference_name'

Test data set

I am looking into this tool for differential isoform usage analysis, and I was wondering if you could make a test data set available, along with your results for it, so that I can check FLAIR is working correctly on my system before running my own data.

That would be greatly appreciated!

FLAIR collapse misidentifying (and missing) isoforms

I'm having lots of fun using FLAIR to look at transcript isoforms, particularly in situations with lots of overlapping (polycistronic) transcripts, but am noticing some unusual outputs that don't make sense from looking at the read data. The attached screenshot showcases several of these curiosities.

Issue #1: Transcript isoforms shown in the wrong orientation. The left most transcription unit (gene A) comprises two exons on the forward strand. FLAIR correctly identified this but for some reason also shows some transcript isoforms as being present on the reverse strand. This is not obviously supported by the read data and I am curious as to why this is happening?

Issue #2: Distinct transcripts are being merged together. Next to the two-exon transcript (on the forward strand) is a single-exon transcript (gene B). For some reason FLAIR has merged this transcript with the previous one to create an alternative isoform. While this isoform is supported by a small subset of reads (and is due to read-through transcription), the vast majority of reads support gene A and gene B as distinct entities. Why isn't FLAIR also presenting an isoform for gene B alone?

Issue #3: The next transcription unit (gene C) is on the reverse strand but no isoform is detected by FLAIR, regardless of whether using the --stringent setting or not.

Issue #4: The right most transcript unit suggests the presence of a two exon transcript. This is supported by a subset of reads but like Issue #2, additional transcript isoforms are not shown.

I should note that the transcripts I am describing (expecting) are all experimentally validated with confirmed TSS and TES data. I figure that if I can optimize the parameters for FLAIR so that I can generate an output close to 'reality' then I can apply this to the remainder of the genome in which transcripts are less well defined.

I am continually experimenting with different parameters (-w, -s, --max_ends) to try and resolve this but no joy yet. What I do have though is data on TSS and TES derived both from other techniques (e.g. CAGE) and from the direct RNASeq data itself. It would be very useful if I could feed this data to FLAIR to inform its decision making around overlapping transcripts.

Error in collapse

Hello,

I've just noticed a tiny bug in the collapse stage which causes the following error:

Filtering isoforms
Renaming isoforms
Traceback (most recent call last):
  File "../flair/bin/identify_gene_isoform.py", line 105, in <module>
    tn_to_juncs[chrom][this_transcript] = junctions
KeyError: 'NC_006839.1'

This is my command:

$ python flair.py collapse -r reads.fastq.gz -g ../../reference/GCF_000004195.3_Xenopus_tropicalis_v9.1_genomic.fna -q flair.correct.psl -o flair.collapse -f ../../reference/GCF_000004195.3_Xenopus_tropicalis_v9.1_genomic.gtf -t 8

I think it's due to the fact that the final chromosome in my gtf file (NC_006839.1) is the mitochondrial genome, which has no exons. So in my case I had to remove 3 lines of code (105-107) in identify_gene_isoform.py and it worked fine.

Cheers,

Katherine
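Rather than deleting lines 105-107, one way to avoid this KeyError is to create the per-chromosome dictionary on first use. The sketch assumes tn_to_juncs is a dict of dicts keyed by chromosome and then transcript, as the failing line suggests; the records (including the scaffold_1 name) are hypothetical.

```python
from collections import defaultdict

# chromosome -> {transcript name: junctions}; inner dicts are created on demand.
# Note: with defaultdict, even *reading* a missing chromosome creates an empty entry.
tn_to_juncs = defaultdict(dict)

# Hypothetical (chromosome, transcript, junctions) records; NC_006839.1 stands in
# for the mitochondrial sequence that previously had no entry.
records = [
    ("scaffold_1", "tx1", ((100, 200), (300, 400))),
    ("NC_006839.1", "tx2", ()),
]
for chrom, this_transcript, junctions in records:
    tn_to_juncs[chrom][this_transcript] = junctions

print(sorted(tn_to_juncs))  # ['NC_006839.1', 'scaffold_1']
```

With a plain dict, tn_to_juncs.setdefault(chrom, {})[this_transcript] = junctions has the same effect without the read-creates-entry behavior.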

path to Minimap2

Hi,
I am running into an issue in flair quantify: it asks me to provide the executable path for Minimap2. I have provided it, as you can see in my command below (I also tried software/flair/minimap2/), but I keep receiving the same error. I did not have to specify the minimap2 path for flair collapse, which has run successfully several times in a row. Do you have any insight into this issue?

python software/flair/flair.py quantify -m software/flair/minimap2 --tpm -r 2019-7-8_3CL/SHSY5Y/SHSY5Y_reads_manifest.txt -o 2019-7-8_3CL/SHSY5Y/trcl_VA/flair/counts_matrix.tsv -i 2019-7-8_3CL/SHSY5Y/trcl_VA/flair/3.flair.collpase.isoforms.fa

Thank you,
Taylor
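One thing worth ruling out is whether the value given to -m names the minimap2 executable itself or the directory containing it, and whether that file is actually executable; commands elsewhere on this page pass both forms. The standard-library sketch below resolves either form. check_executable is a hypothetical helper, and which form FLAIR expects for -m should be confirmed against its help text.

```python
import os
import shutil


def check_executable(path):
    """Return an absolute path to an executable minimap2, or None.

    If `path` is a directory, look for a file named 'minimap2' inside it;
    otherwise treat `path` as the executable itself (or a bare name on $PATH).
    """
    if os.path.isdir(path):
        path = os.path.join(path, "minimap2")
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return os.path.abspath(path)
    # For a name with a directory part, shutil.which only checks that exact path.
    return shutil.which(path)


print(check_executable("software/flair/minimap2"))
```

A None result points at a wrong path or a missing execute permission rather than at FLAIR itself.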

Question on corrected reads

Hi!

I have some questions about the wonderful script that you have provided.

For flair-correct, what criteria does the algorithm use to classify reads as corrected or inconsistent? Is there an option to modify the filtering parameters?
For the flair-collapse step, does it only use the corrected reads and not the inconsistent ones? If so, I am losing quite a lot of reads (about half) as inconsistent in the flair-correct step. Is this alright?

Thank you for your wonderful help!

Missing chromosomes in annotation cause collapse to fail

After successfully correcting reads using gencode and a custom, unique splice junction file, FLAIR collapse fails, presumably because it can't find the decoy chromosomes in the reference chromosome library.

Can you recommend a solution that doesn't involve stripping out non-Gencode chromosomes from the mappings/annotations/corrected reads?

Error log:

Collapsing isoforms
Annotated ends extracted from GTF
Read data extracted
Single-exon genes grouped, collapsing
Traceback (most recent call last):
  File "/home/jamfer/work/flair/bin/collapse_isoforms_precise.py", line 545, in <module>
    res = p.map(run_se_collapse, chrom_names)
  File "/home/jamfer/work/python-2714/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/jamfer/work/python-2714/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
KeyError: 'chr1_KI270706v1_random'

Command used:
python flair.py collapse -t 24 -f gencode.v30.annotation.gtf -e comprehensive -r reads.fastq -q flair_all_corrected.psl -g hg38noAlt.fa
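Before deciding how to handle the decoy sequences, it can help to see which target sequences in the corrected PSL have no features in the GTF, and how many reads fall on each; this changes nothing by itself. The sketch assumes the standard PSL layout, where the target name (tName) is the 14th column, and takes sequence names from GTF column 1. gtf_chroms and unannotated_targets are hypothetical helpers that accept any iterable of lines, such as open file handles.

```python
from collections import Counter


def gtf_chroms(gtf_lines):
    """Sequence names (GTF column 1) that carry at least one feature."""
    chroms = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        chroms.add(line.split("\t", 1)[0])
    return chroms


def unannotated_targets(psl_lines, annotated):
    """Count PSL alignments whose target (tName, column 14) is not in `annotated`."""
    counts = Counter()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 14 or not fields[0].isdigit():
            continue  # skip psLayout header lines or malformed rows
        t_name = fields[13]
        if t_name not in annotated:
            counts[t_name] += 1
    return counts
```

For example, open gencode.v30.annotation.gtf and flair_all_corrected.psl, pass the handles to these two functions, and print the result of most_common() to list the unannotated sequences by read count.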

flair align: bed file did not generate

Hello,

I ran flair align on my reads, and I believe the correct SAM file is generated, as well as a BAM file. However, no BED file is generated. I get this output after minimap2 finishes running:

[M::main] Real time: 73222.383 sec; CPU: 731142.740 sec
Converting sam output to bed
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -n         Sort by read name
  -t TAG     Sort by value of TAG. Uses position as secondary index (or read name if -n is set)
  -o FILE    Write final output to FILE rather than standard output
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]
If using samtools v1.3+, please specify -v1.3 argument

My samtools version is 1.8, and I ran flair align like this:

python ~/flair/flair.py align -m ~/minimap2/ -t 10 -r /space/collaborator/upload/Chip158_N90_pass.fq -g /space/grp/Pipelines/rnaseq-pipeline/Assemblies/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa

I also saw a previous issue from January suggesting to just run the bam2Bed12.py script as follows: python pathtoflair/bin/bam2Bed12.py -i sorted.bam > sorted.bed
However, this resulted in the following error:

Traceback (most recent call last):
  File "/home/sbhuiyan/flair/bin/bam2Bed12.py", line 22, in <module>
    from samJuncs import SAM
  File "/home/sbhuiyan/flair/bin/samJuncs.py", line 24, in <module>
    from tqdm import *
ImportError: No module named tqdm

What should I do?
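The usage message printed by flair align asks for -v1.3 when using samtools v1.3 or later, and 1.8 qualifies. If a wrapper script should decide this automatically, the version can be parsed from the first line of samtools --version output, which for 1.x releases reads like "samtools 1.8". The helpers below only parse that text; needs_v13_flag is hypothetical, and the 1.3 threshold is taken from the message above.

```python
import re


def parse_samtools_version(text):
    """Return (major, minor) parsed from the first line of `samtools --version` output."""
    match = re.match(r"samtools\s+(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError("unrecognised samtools version output: %r" % text[:40])
    return int(match.group(1)), int(match.group(2))


def needs_v13_flag(version_text):
    """True when the reported samtools version is 1.3 or newer."""
    # Compare integer tuples so that 1.10 correctly sorts after 1.3.
    return parse_samtools_version(version_text) >= (1, 3)


# In practice the text could come from:
#   subprocess.run(["samtools", "--version"], capture_output=True, text=True).stdout
print(needs_v13_flag("samtools 1.8\nUsing htslib 1.8"))  # True
```

Separately, the ImportError for tqdm means the python interpreter running bam2Bed12.py does not have the tqdm package installed.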

Error with flair.py align

Hi, I am using flair for the first time and am trying to align my fastq file.

I used $ python flair.py align --threads 10 -g GRCh38.fa -r ./rawdata/NA12878-cDNA-1D.pass.dedup.fastq

I downloaded the test fastq file from https://github.com/nanopore-wgs-consortium
but I get the following error:

Converting sam output to bed
[E::hts_open_format] Failed to open file flair.aligned.sam
samtools view: failed to open "flair.aligned.sam" for reading: No such file or directory
Possible issue with samtools executable

I have double checked and samtools executable is able to run.

flair align - no mapping

Hello,

I just found your tool and I am really excited to try it out for my PacBio data.
However, I first tried flair align and it did not work at all. It seems to skip the mapping, because my generated bam file is empty...

Of course I can map my data independently but maybe you can check the wrapper function?

Here my command & output:

python ~/programs/flair/flair.py align -r filtered.A_Nipponbare.fasta -g Oryza_sativa.IRGSP-1.0.dna.toplevel.fa -sam samtools1.9 -o test_Nipponbare -m ~/programs/minimap2/minimap2

Aligning to the genome with minimap2
Converting output sam
[E::hts_open_format] Failed to open file test_Nipponbare.sam
samtools view: failed to open "test_Nipponbare.sam" for reading: No such file or directory
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -n         Sort by read name
  -t TAG     Sort by value of TAG. Uses position as secondary index (or read name if -n is set)
  -o FILE    Write final output to FILE rather than standard output
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
  -@, --threads INT
               Number of additional threads to use [0]
[E::hts_open_format] Failed to open file test_Nipponbare.bam
samtools index: failed to open "test_Nipponbare.bam": No such file or directory

Gene names not assigning correctly in FLAIR correct

Hi,

I have rat transcriptomic data on which I ran FLAIR align, correct and collapse, using an Ensembl rat gtf file for the correct and collapse steps. When I look at my flair.collapse.isoforms.psl or flair.collapse.isoforms.gtf file, I notice that many of the isoforms are not assigned to a gene, even if they overlap the gene's chromosomal location.

For example, none of the isoforms get annotated to gene Cacna1h (one of the genes the transcriptomic data was targeting). I have attached a screenshot of my flair.collapse.isoforms.gtf file uploaded to the UCSC genome browser. There are clearly isoforms for that gene, so why aren't they being annotated as isoforms of Cacna1h?

(Apologies for the size of the image - There are a lot of isoforms, you can see the NCBI gene and Ensembl gene for Cacna1h at the bottom)

incorrect documentation on chromsize input

The documentation claims that flair takes chromsizes in TSV format, which is not correct. TSV files are similar to CSV files in that the first row is a header of column names.
https://en.wikipedia.org/wiki/Tab-separated_values

The code expects a simple tab-separated file without a header, which is fine, as that is the format the UCSC browser provides.
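To illustrate the format the code actually expects, here is a minimal reader for a UCSC-style chrom.sizes file: no header, one "name<TAB>length" pair per line. read_chrom_sizes is a hypothetical helper, not flair's own parser; it fails loudly if a header row such as "chrom<TAB>size" is present.

```python
def read_chrom_sizes(path):
    """Read a headerless, tab-separated chrom.sizes file into {name: length}.

    Hypothetical helper for illustration; not flair's own parser.
    """
    sizes = {}
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            fields = line.split("\t")
            if len(fields) < 2:
                raise ValueError(f"{path}:{lineno}: expected 'name<TAB>length', got {line!r}")
            try:
                sizes[fields[0]] = int(fields[1])
            except ValueError:
                # A header row such as "chrom<TAB>size" ends up here.
                raise ValueError(f"{path}:{lineno}: length is not an integer: {fields[1]!r}") from None
    return sizes
```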

isoforms are shorter and missed several isoforms seen on raw nanopore reads

Hello,
I am following your steps to consolidate and count isoforms, but I believe we are missing a lot of isoforms during the collapse step.

The steps I have taken to follow your protocol are as follows:

flair align:
python /opt/flair/flair.py align -r all_4.fastq -g Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL5.fa -m /opt/minimap2-2.14_x64-linux/minimap2 -o flair.aligned.PL5. -t 40 -p -v1.3

flair correct:
python /opt/flair/bin/bam2Bed12.py -i flair.aligned.PL5.bam > flair.aligned.PL5.bam.bed12
python /opt/flair/flair.py correct -f Sus_scrofa.Sscrofa11.1.95_with_PL5.gtf -c chromsizes_PL5.tsv -q flair.aligned.PL5.bam.bed12 -t 40

flair collapse:
python /opt/flair/flair.py collapse -r all_4.fastq -q flair_all_corrected.psl -g Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL5.fa -m /opt/minimap2-2.14_x64-linux/minimap2 -f Sus_scrofa.Sscrofa11.1.95_with_PL5.gtf -t 40

flair quantify:
python /opt/flair/flair.py quantify -r reads_manifest.tsv -i flair.collapse.isoforms.fa -t 40 -m /opt/minimap2-2.14_x64-linux/minimap2 -o PL5_counts_matrix_v3.tsv

The result is remapped to the reference sequence:

# Remapping of collapsed isoforms:
/opt/minimap2-2.14_x64-linux/minimap2 -ax splice Sus_scrofa.Sscrofa11.1.dna.toplevel_with_PL5.fa flair.collapse.isoforms.fa > aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam

samtools view -S -b aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam -o aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam.bam

samtools sort aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam.bam > aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam.sorted.bam

samtools index aln_file_collapse_spliced_whole_genome_vs_ref_whole_genome.sam.sorted.bam

The resulting flair.collapse.isoforms.fa misses common isoforms for GAPDH and other important genes.
Please refer to the following figure: the top view (red box) shows the isoforms defined by FLAIR, and the bottom view shows the raw nanopore reads (BAM) aligned to the reference. You can see that FLAIR fails to connect the long isoform into a single isoform.

Similarly, in flair.collapse.isoforms.fa, the other transgene sequence we are interested in is not connected and annotated as one long isoform. We can see high read coverage, but the isoform is not extended across that region. In the top view (green box), the collapsed isoforms assigned by FLAIR miss isoforms that can be seen in the raw nanopore reads:

If you don't mind, could you please advise which parameters we are missing to get full-length isoforms, and what steps we should take to avoid missing isoforms that we can see in the raw aligned reads?

Thanks,

With Regards,
Dharm

flair collapse: number of reads per isoform

Hello,

Is there any way to determine the number of reads per isoform after the flair collapse step? Would that be in the *.firstpass.q1.counts file that is output?

conda package for flair

Hello,

Thanks for this great tool. I was wondering whether you have already planned or considered providing a conda package for flair. Since a conda package can be installed with a one-line command that takes care of all dependencies and versioning, it would make the tool more accessible and deployable.

As far as I checked, almost all dependencies should be available in bioconda and other channels.

Best,

FLAIR collapse stringency

When I am running FLAIR collapse, it looks like the module is too strict and removes even some isoforms that are clearly present, while the firstpass isoforms appear to be a more accurate representation of the isoforms present in my samples.

GAPDH locus after FLAIR correct for the untreated sample (before FLAIR collapse)

GAPDH locus after FLAIR collapse:

The goal is to investigate differentially used isoforms between two samples with high confidence, but also to obtain a broad picture of the genes affected globally. Do you have any suggestions regarding the stringency criteria for such an analysis? Note that I am only using the annotation GTF during correction, since no short-read data is available to me, and I have 100,000 reads for each of the two conditions.

Thanks in advance!

find_alt3prime_5prime_ss.py

Hi, for the script "find_alt3prime_5prime_ss.py", you mention that the "colnum" argument refers to "the 0-indexed column number of the two extra columns (assumed to be last two)". I am confused about this: should I give one column or two columns for this argument? I've tried commands like:
python flair/bin/find_alt3prime_5prime_ss.py output.psl 21 alt_acceptor.txt alt_donor.txt

Traceback (most recent call last):
File "flair/bin/find_alt3prime_5prime_ss.py", line 106, in <module>
alljuncs = pslreader(psl)
File "flair/bin/find_alt3prime_5prime_ss.py", line 25, in pslreader
count0, count1 = int(line[colnum]), int(line[colnum + 1])
ValueError: invalid literal for int() with base 10: '27.0'

python flair/bin/find_alt3prime_5prime_ss.py output.psl 21,22 alt_acceptor.txt alt_donor.txt

Neither of them works; both fail on the first line of output.psl. I am not sure what is going on.

The first line of my output.psl file is shown as below:
0 0 0 0 0 0 0 0 + 6753031f-f907-46de-933b-83191c940b72;0_1:5481000 484 0 484 1 30427671 5481659 5482143 1 484, 0, 5481659, 27.0 4.0
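The traceback shows int() rejecting the string '27.0': the appended count columns were written as floats. Counting the columns in the line above, 21 itself looks like the right value, since a PSL line has 21 standard columns (0-based indices 0 to 20) and the two counts sit at indices 21 and 22. A minimal sketch of a more tolerant parser follows; read_count_columns is hypothetical, not the script's actual code, and it converts through float() while still rejecting non-whole counts.

```python
PSL_STANDARD_COLUMNS = 21  # matches ... tStarts


def read_count_columns(psl_line, colnum=PSL_STANDARD_COLUMNS):
    """Return the two counts stored at 0-based columns colnum and colnum + 1.

    Hypothetical helper: accepts counts written as whole-number floats such as '27.0'.
    """
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < colnum + 2:
        raise ValueError(f"expected at least {colnum + 2} columns, got {len(fields)}")
    counts = []
    for raw in fields[colnum:colnum + 2]:
        value = float(raw)
        if not value.is_integer():
            raise ValueError(f"count is not a whole number: {raw!r}")
        counts.append(int(value))
    return counts[0], counts[1]
```

An alternative that needs no code change is to rewrite the two count columns as integers before running the script.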

Thanks

flair exits with success status on error cases

There are many places where flair exits with a zero status when an error occurs. This makes it impossible to build a robust pipeline with flair.

For example:
except:
    sys.stderr.write('Possible minimap2/samtools error, specify paths or make sure they are in $PATH\n')
    sys.exit()
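A bare sys.exit() exits with status 0, so the handler quoted above reports success to the calling pipeline. A minimal sketch of the alternative, assuming external tools are launched through subprocess; run_or_die is a hypothetical helper, not part of flair.

```python
import subprocess
import sys


def run_or_die(cmd, description):
    """Run cmd (a list of arguments); on any failure, report it and exit with status 1."""
    try:
        subprocess.run(cmd, check=True)
    except FileNotFoundError:
        # The executable itself was not found, e.g. minimap2 is not on $PATH.
        sys.stderr.write(f"{description}: executable not found: {cmd[0]}\n")
        sys.exit(1)
    except subprocess.CalledProcessError as err:
        # The tool ran but returned a non-zero status.
        sys.stderr.write(f"{description} failed with exit status {err.returncode}\n")
        sys.exit(1)
```

Catching specific exceptions rather than using a bare except: also keeps the message accurate, since a missing executable and a tool that ran but failed are reported differently.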

Flair Align no bed file

Hi there,

I was using the recommended default commands for flair align (shown below) but it only generates sam and bam files, and not bed files.

Would you kindly be able to let me know how I can get these bed files?

Thank you

python "/media/Programmes/flair/flair.py" align -r "/media/BRCA12D.FASTA" -g "/media/GRCh38.genome.fa"

Minor bug

Hi,

I just found another small thing. When aligning reads to the first-pass isoform reference including salmon, the samtools call fails because it cannot open the output file.
I fixed the issue by replacing stdout=open(alignout+'.mapped.sam')): with stdout=open(alignout+'.mapped.sam', "w")): (line 277).
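The fix works because open() defaults to mode "r": opening a file that does not exist yet raises FileNotFoundError, and a read-only handle cannot receive the child process's output. A minimal sketch of the corrected pattern; run_to_file is a hypothetical name, and any command (here a stand-in for the samtools call) can be passed in.

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run cmd and send its standard output to out_path, creating or truncating it."""
    # Mode "w" is required: the default mode "r" fails on a missing file
    # and would not let the child process write to it anyway.
    with open(out_path, "w") as out:
        return subprocess.call(cmd, stdout=out)
```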

Best,

error in correct

Hi,
I noticed that FLAIR can use short reads to correct nanopore reads, and the example shows that I need to use shortreads.sam as the input file. My question is: how can I get the shortreads.sam file? Is it obtained by mapping the Illumina reads to the reference with an alignment tool?

flair create temporary files in the current directory

flair correct creates files in the current directory with names like chr22_known_juncs.bed.

flair collapse creates files in the form temp_0.firstpass.sam

If multiple instances of flair are run in the same directory, as would be common in cluster runs, the jobs will corrupt each other's data.

The files are also not deleted after the run.

I suggest using the tempfile facility or the output file prefix.
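A minimal sketch of that suggestion, assuming per-chromosome files like chr22_known_juncs.bed are written next to the output prefix: tempfile.TemporaryDirectory gives each run its own uniquely named directory and deletes it on exit, so concurrent jobs cannot clobber each other's files. process_per_chrom and its arguments are hypothetical, not flair functions.

```python
import os
import tempfile


def process_per_chrom(juncs_by_chrom, out_prefix, process):
    """Write one junction file per chromosome into a private temp dir and process each.

    Returns (results, tmp_dir); tmp_dir no longer exists when this returns.
    """
    out_dir = os.path.dirname(os.path.abspath(out_prefix))
    prefix = os.path.basename(out_prefix) + ".tmp."
    # A unique directory per run, created next to the output files, removed on exit.
    with tempfile.TemporaryDirectory(prefix=prefix, dir=out_dir) as tmp_dir:
        results = {}
        for chrom, lines in juncs_by_chrom.items():
            path = os.path.join(tmp_dir, f"{chrom}_known_juncs.bed")
            with open(path, "w") as fh:
                fh.writelines(lines)
            results[chrom] = process(path)
    return results, tmp_dir
```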

Question about nanopore strand correction using splice sites

Hi,

I have cDNA ONT reads that I correct with junctions from stranded short-read libraries. I used the -n option to resolve the ONT read strand during the correction step.

I noticed that running flair correct with -n does not correct the strand of the ONT reads, whereas the reads are corrected in the absence of the -n option (see example Gene01; what is shown in red/green are the transcripts after flair collapse).
I looked up how the -n option is handled in the scripts (flair.py):

if not args.n:
		correction_cmd += ['--correctStrand']

Shouldn't this be the opposite? It would explain why the strand is corrected in the absence of the -n option.

Gene02 does not have a splice site, so I guess its strand is not resolved. In this case, how is the final strand inferred when a transcript is supported by both + and - reads?
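Assuming -n is declared as a plain on/off switch (argparse action="store_true"; the declaration itself is not shown in this issue), the quoted condition adds --correctStrand exactly when -n is omitted, which matches the behavior described. correction_flags is a hypothetical stand-in that reproduces only that condition.

```python
import argparse


def correction_flags(argv):
    """Reproduce the quoted flair.py condition; '-n' as store_true is an assumption."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", action="store_true")  # False unless -n is given
    args = parser.parse_args(argv)
    flags = []
    if not args.n:  # true when -n is ABSENT, so strand correction is the default
        flags += ["--correctStrand"]
    return flags
```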

Many thanks!

Best,
Amira

mark_productivity.py producing no valid transcripts (?)

Thank you for the great tool. I'm running mark_productivity.py on flair.py-generated *isoforms.psl files, and I keep getting the following message:
Traceback (most recent call last):
  File "../../../SourceFiles/flair-master/bin/mark_productivity.py", line 228, in <module>
    sys.stderr.write('Unproductive proportion estimate ' + str(unproductive / float(valid_transcripts)) + '\n')
ZeroDivisionError: float division by zero

It seems the script doesn't identify any valid transcripts, yet I can see transcripts beginning with ATG when I look at the isoforms.fa files of the predicted isoforms.

Can you please help me by explaining what's going on? I might be missing something here. One thing I did when running the flair collapse module was to skip filtering the firstpass.sam files by quality (which generates firstpass.q1.sam), because I wanted to look at all the entries first. So I used firstpass.sam to generate the final .psl files that I used as input for mark_productivity.py.

My script for running mark_productivity.py:
for file in ../*.isoforms.psl;
do
out=${file%%.psl}; out=${out##../};
python3 ../../../SourceFiles/flair-master/bin/mark_productivity.py \
${file} \
../../../references/GRCh38_ucsc.gtf \
../../../references/human38_sequins_cal_merged.fa \
> ${out}.productivity.psl;
done
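The traceback itself only says that valid_transcripts was 0 when the summary line was printed, i.e. no transcript in the input was counted as valid, and the division then fails before any explanation is given. A minimal guarded version of that summary, hypothetical rather than the script's code, makes the empty case explicit.

```python
import sys


def report_unproductive(unproductive, valid_transcripts, stream=sys.stderr):
    """Write the unproductive-proportion summary without dividing by zero."""
    if valid_transcripts == 0:
        stream.write("No valid transcripts found in the input\n")
        return None
    proportion = unproductive / float(valid_transcripts)
    stream.write("Unproductive proportion estimate " + str(proportion) + "\n")
    return proportion
```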

Thanks a lot.

Error in correct : no inconsistent.bed files

I'm having trouble figuring out the error message from flair correct. The command and error message are:

$python flair.py correct -f gencode.v30lift37.fixed.annotation.gtf -c hg19.chrom.sizes.fixed2.txt -q pancreas.bed
100%|##########| 31/31 [01:14<00:00,  2.03s/it]
Traceback (most recent call last):
  File "/tools/flair/bin/ssCorrect.py", line 327, in <module>
    main()
  File "tools/flair/bin/ssCorrect.py", line 309, in main
    with open(os.path.join(tempDir, "%s_inconsistent.bed" % chrom),'rb') as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/analysis/flair/tmp_bda22e0a-6614-4624-8703-4212e76e4505/7_inconsistent.bed'
Correction command did not exit with success status

It seems to be failing at the splice site correction step (5/5).

The gencode GTF, the bed file and the chromosome size file all have chr7 represented.

Also, not sure if it helps, but there is a *_known_juncs.bed and a *_temp_juncs.bed in the temp folder for every chromosome, but there are no *_inconsistent.bed files.
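The missing file is named 7_inconsistent.bed rather than chr7_inconsistent.bed. Without knowing how ssCorrect.py derives that name, one cheap thing to rule out is a chromosome-naming mismatch between the inputs (for example "7" in one file and "chr7" in another). The hypothetical helper below, not part of flair, lists for each file the first-column sequence names that occur in the other files but not in it; it assumes tab-separated files whose first column is the sequence name, which holds for GTF, BED and chrom.sizes.

```python
def first_column_names(path, skip_prefixes=("#", "track", "browser")):
    """Collect the distinct first-column values of a tab-separated file."""
    names = set()
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(skip_prefixes):
                continue  # skip blank lines, comments and BED header lines
            names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def missing_names(paths):
    """For each path, list names that occur in some other file but not in this one."""
    by_file = {p: first_column_names(p) for p in paths}
    all_names = set().union(*by_file.values())
    return {p: sorted(all_names - names) for p, names in by_file.items()}
```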

Any idea how I can get this to run? Thank you!

Error in correct

Hello,

I'm trying to run this pipeline with data I've previously aligned, and I'm getting an error. I have created the bed12 file, and when I run:

python ~/flair/flair.py correct -f /data2/csijcs/hg38/gencode.v29.annotation.gtf -c /data2/csijcs/hg38/hg38.chrom.sizes -q /data2/csijcs/nanopore/AML_009_combined/AML_009_combined_sorted.bed

I get the following error:

Traceback (most recent call last):
  File "/home/csijcs/flair/bin/ssPrep.py", line 21, in <module>
    from kerneltree import IntervalTree
ImportError: No module named kerneltree
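"ImportError: No module named kerneltree", with the module name unquoted, is how Python 2 words this error; Python 3 puts quotes around the name. So it is worth checking which interpreter python resolves to and whether kerneltree is installed for it. A small, hypothetical check follows; run it with the interpreter you intend to use for flair (it needs Python 3.4 or later for importlib.util.find_spec).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def describe_environment(names):
    """Summarize which interpreter is running and which of `names` it is missing."""
    return f"Python {sys.version.split()[0]} at {sys.executable}; missing: {missing_modules(names)}"
```

For example, print(describe_environment(["kerneltree"])).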

Can you tell me what's going wrong here?

Thanks,
J

What does the "-1" after the isoform identifier mean?

In my FLAIR collapse output, I have noted that some isoform identifiers are followed by "-1" or "-2":

>ENSRNOT00000043626_ENSRNOG00000033893
>ENSRNOT00000043626-1_ENSRNOG00000033893
>ENSRNOT00000043626-2_ENSRNOG00000033893
>ENSRNOT00000043626-3_ENSRNOG00000033893

Does this mean that the reads were similar enough to ENSRNOT00000043626, but they were four distinct groups of reads?

-Shams

Identified isoforms can be found in UCSC genome browser

Hi,

I can successfully run through the flair pipeline and obtain quite a lot of novel isoforms. But when I import a few at random into the UCSC Genome Browser, it seems to me that the isoforms are already annotated, i.e. not novel.

I am currently using GENCODE version 31, which is the latest. I used the Comprehensive gene annotation GTF file for the PRI assembly (https://www.gencodegenes.org/human/).

I also attached a screenshot here for one of the isoforms. Am I missing something here? My primary purpose is to identify novel isoforms. My track is the top one, and you can see exactly the same isoform in one of the tracks below.

Thanks a lot for developing such a handy tool for the nanopore RNAseq community.

Best,
Yeting

Input for diff_iso_usage.py

I have a problem figuring out which columns are used as input in the diff_iso_usage.py script.

Before appending the counts to .psl, it is clear that the columns containing the read counts are #2 and #3 in my counts_matrix.tsv file.

But after appending the counts to the .psl, I cannot figure out the right columns to specify as input in my output.psl file for diff_iso_usage.py.
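Columns appended to a PSL file start after the 21 standard PSL columns, so with two samples the counts sit at 0-based indices 21 and 22 (22 and 23 if counted from 1). Whether diff_iso_usage.py expects 0- or 1-based column numbers is not stated here, so treat the sketch below as a way to locate the columns, not as the script's convention; appended_column_indices is a hypothetical helper.

```python
PSL_STANDARD_COLUMNS = 21  # matches through tStarts


def appended_column_indices(psl_line):
    """Return the 0-based indices of any columns appended after the 21 standard PSL columns."""
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < PSL_STANDARD_COLUMNS:
        raise ValueError(f"not a PSL line: only {len(fields)} columns")
    return list(range(PSL_STANDARD_COLUMNS, len(fields)))
```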

tsv_psl.tar.gz

An error in correct step

Hello,
I ran into an error in the correct step. The error message is:

File "/path/flair-master/bin/ssCorrect.py", line 304, in <module>
main()
File "/path/flair-master/bin/ssCorrect.py", line 269, in main
outDict[chrom] = open("%s_temp_reads.bed" % chrom,'w')
IOError: [Errno 24] Too many open files: 'NW_017860819.1_temp_reads.bed'
Exception KeyError: KeyError(<weakref at 0x2b95b617ac00; to 'tqdm' at 0x2b95b61efd50>,) in <bound method tqdm.__del__ of Step 4/5: Preparing reads for correction: 11379698it [00:24, 154307.47it/s]> ignored
usage: script.py psl ref.gtf/ref.gp isos_matched.psl
rm: cannot remove ‘test.cDNA_all_corrected.unnamed.psl’: No such file or directory

I checked ssCorrect.py, and I think the error may be caused by my reference. This reference is not assembled very well: it has more than 9000 scaffolds and no chromosome-level sequences. Maybe the script can't handle so many scaffolds. How can I solve this problem? I can't find another version of the reference genome. Could you give me some suggestions?
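Errno 24 means the process hit its limit on simultaneously open files. The traceback shows one *_temp_reads.bed file being opened per sequence, so 9000+ scaffolds can exceed a common default soft limit of 1024 on Linux. On Linux and macOS the soft limit can be raised up to the hard limit, either in the shell with ulimit -n before running flair, or from Python with the standard resource module. A minimal sketch; raise_open_file_limit is hypothetical and not part of flair.

```python
import resource


def raise_open_file_limit(needed):
    """Raise the soft RLIMIT_NOFILE toward `needed`, capped at the hard limit.

    Returns the resulting soft limit. Unix only; setrlimit raises ValueError
    or OSError if the system refuses the new value.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= needed:
        return soft  # already high enough
    new_soft = needed if hard == resource.RLIM_INFINITY else min(needed, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft
```

For example, raise_open_file_limit(10000) before the per-scaffold files are opened; the shell equivalent is ulimit -n 10000. If the hard limit is below the number of scaffolds, raising it requires administrator rights.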

Error with collapse

I was running flair.py collapse:
flair.py collapse -f genome.gtf -g genome.fa -r sample.fastq -q corrected.psl
and got the following error:

Annotated ends extracted from GTF
Read data extracted
Single-exon genes grouped, collapsing
Filtering isoforms
Renaming isoforms
Traceback (most recent call last):
File "/home/zoe/bin/flair-master/bin/identify_gene_isoform.py", line 177, in <module>
if tn_to_juncs[chrom][t] == junctions:
KeyError: '3'

Please help.

bedToPsl error

Hi,

I am attempting to analyse some Nanopore direct RNA sequencing reads using flair but appear to be stuck at the correction/collapse steps. Specifically, I cannot obtain the psl of corrected reads for further analysis. Below is my command and the error message.

python flair.py correct -f /Users/vvernlee/Downloads/ToxoDB-39_TgondiiME49.gff -q /Users/vvernlee/Downloads/flair-master/T2.bed\ -c chromsize.tsv

...

Traceback (most recent call last):
File "flair.py", line 98, in <module>
subprocess.call([path+'bin/bedToPsl', args.c, args.o+'.corrected.bed', args.o+'.corrected.unnamed.psl'])
File "/anaconda3/lib/python3.6/subprocess.py", line 267, in call
with Popen(*popenargs, **kwargs) as p:
File "/anaconda3/lib/python3.6/subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "/anaconda3/lib/python3.6/subprocess.py", line 1344, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: 'bin/bedToPsl'
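Errno 8, "Exec format error", means the operating system refused to run bin/bedToPsl as a program, which typically happens when a binary was built for a different platform. The paths under /Users suggest macOS, so one thing to check is whether the bundled binary is a Linux (ELF) executable. The hypothetical helpers below only inspect the file's first four bytes (its magic number) and compare them with the running OS.

```python
import sys

ELF_MAGIC = b"\x7fELF"
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit Mach-O, either byte order
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit Mach-O, either byte order
    b"\xca\xfe\xba\xbe",  # universal ("fat") Mach-O; Java class files share this magic
}


def binary_format(path):
    """Classify an executable by its first four bytes (magic number)."""
    with open(path, "rb") as fh:
        magic = fh.read(4)
    if magic == ELF_MAGIC:
        return "ELF (Linux)"
    if magic in MACHO_MAGICS:
        return "Mach-O (macOS)"
    if magic.startswith(b"#!"):
        return "script"
    return "unknown"


def native_format():
    """Executable format this operating system runs natively (coarse, by OS only)."""
    if sys.platform == "darwin":
        return "Mach-O (macOS)"
    if sys.platform.startswith("linux"):
        return "ELF (Linux)"
    return "unknown"
```

For example, compare binary_format("bin/bedToPsl") with native_format(); if they differ, a bedToPsl build for your platform is needed (UCSC distributes its command-line utilities for both Linux and macOS).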

The corrected BED file is produced, but when I attempt to use it for flair collapse, I get:

Collapsing isoforms
Traceback (most recent call last):
File "bin/collapse_isoforms_precise.py", line 304, in <module>
singleexon[chrom] = add_se(singleexon[chrom], tss, tes, line)
File "bin/collapse_isoforms_precise.py", line 104, in add_se
for coord in sedict.keys():
RuntimeError: dictionary changed size during iteration
Filtering isoforms
usage: script.py collapsed.psl (default/comprehensive/ginormous) filtered.psl [tolerance]
mv: rename /Users/vvernlee/Downloads/flair-master/flair.corrected.firstpass.filtered.psl to /Users/vvernlee/Downloads/flair-master/flair.corrected.firstpass.psl: No such file or directory
usage: script.py psl genome.fa outfilename
Aligning reads to first-pass isoform reference
[ERROR] failed to open file '/Users/vvernlee/Downloads/flair-master/flair.corrected.firstpass.fa'
Counting isoform expression
Filtering isoforms by read coverage
Traceback (most recent call last):
File "bin/match_counts.py", line 9, in <module>
for line in open(sys.argv[2]):
FileNotFoundError: [Errno 2] No such file or directory: '/Users/vvernlee/Downloads/flair-master/flair.corrected.firstpass.psl'
Removing intermediate files/done
rm: /Users/vvernlee/Downloads/flair-master/flair.corrected.firstpass.fa: No such file or directory
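"RuntimeError: dictionary changed size during iteration" is raised when a dict gains or loses keys while a loop is iterating over it. In Python 3, sedict.keys() is a live view rather than a copy (in Python 2 it returned a list), so code that mutates sedict inside a for coord in sedict.keys() loop fails there. The usual fix is to iterate over a snapshot, list(sedict.keys()). The merge logic below is hypothetical and only illustrates the pattern; it is not add_se's actual logic.

```python
def merge_overlapping(sedict, start, end, name):
    """Merge a new interval into every overlapping key of sedict (hypothetical logic).

    sedict maps (start, end) -> list of member names; keys are replaced as they merge.
    """
    members = [name]
    # list(...) takes a snapshot of the keys, so popping/inserting below is safe.
    for coord in list(sedict.keys()):
        s, e = coord
        if s <= end and start <= e:  # closed intervals overlap
            members.extend(sedict.pop(coord))
            start, end = min(start, s), max(end, e)
    sedict[(start, end)] = members
    return sedict
```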

Any help would be much appreciated.
Vern

make sure psl are valid and pass pslCheck

The alignment statistics are not filled in. While accurate values are not possible without the RNA sequence, they should at least be set to all bases matching, so that pslCheck can be used to validate the alignments.
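A minimal sketch of the suggestion, assuming a standard PSL line (any extra columns after the 21 standard ones are passed through unchanged): set matches to the total aligned length, i.e. the sum of blockSizes, and zero the misMatches, repMatches and nCount columns. fill_match_stats is a hypothetical helper, not part of flair.

```python
def fill_match_stats(psl_line):
    """Return the PSL line with matches = sum(blockSizes) and misMatches/repMatches/nCount = 0."""
    fields = psl_line.rstrip("\n").split("\t")
    # Column 18 (0-based) is blockSizes, a comma-separated list with a trailing comma.
    block_sizes = [int(size) for size in fields[18].split(",") if size]
    fields[0] = str(sum(block_sizes))  # matches: every aligned base treated as matching
    fields[1] = fields[2] = fields[3] = "0"  # misMatches, repMatches, nCount
    return "\t".join(fields)
```

The returned line has no trailing newline, so add one when writing it back out.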

overlapping annotations

Hi,

I'm trying to get a handle on how FLAIR handles overlapping annotations.

For instance, if I have two RNAs that share the same 3' termination but have different 5' start sites, would these be counted as distinct isoforms?

junctions_from_sam crashes with ValueError: invalid literal for int() with base 10: '11=' error

Hello!

Thanks for creating Flair! It looks like a really great tool. I am trying to use Illumina PE reads to correct Nanopore DRS reads, so the first thing I did was try to run junctions_from_sam.py, but it died with the error below:

Traceback (most recent call last):
File "/homes/brauerei/natasha/Tools/flair-master/bin/junctions_from_sam.py", line 820, in <module>
if __name__ == "__main__": main()
File "/homes/brauerei/natasha/Tools/flair-master/bin/junctions_from_sam.py", line 438, in main
downstr_len = int(n_split.pop().rstrip("M"))
ValueError: invalid literal for int() with base 10: '11='

My command line is:
$ python ~/Tools/flair-master/bin/junctions_from_sam.py -s UU_Library_X09_S44_mapped_sorted_unique.sam -n UU_Library_X09_S44_mapped_sorted_unique.sj

I am inside a conda environment with Python 3.7.2

The log is this:
Parsing sam/bam file
Not supporting softclipping, yet e.g., 4S21=
Expecting a junction read: 92=
Expecting a junction read: 92=
Expecting a junction read: 150=
Expecting a junction read: 24=1X15=1X8=1X4=1X5=1X42=1X47=
Expecting a junction read: 23=1X15=1X8=1X4=2X4=1X41=2X47=
Expecting a junction read: 23=1X15=1X8=1X4=1X5=1X42=1X47=
Expecting a junction read: 22=1X15=1X8=1X4=1X5=1X42=1X49=
Expecting a junction read: 22=1X15=1X8=1X4=1X5=1X42=1X49=
Expecting a junction read: 22=1X15=1X8=1X4=1X5=1X42=1X49=
Expecting a junction read: 1=1X149=
Expecting a junction read: 151=
Expecting a junction read: 151=
Expecting a junction read: 4=1X15=1X8=1X4=1X5=1X42=1X16=1X50=
Expecting a junction read: 3=2X15=1X8=1X4=1X5=1X42=1X67=
Expecting a junction read: 2=1X15=1X8=1X4=1X5=1X42=1X68=
Expecting a junction read: 15=1X8=1X4=1X5=1X42=1X71=
Expecting a junction read: 15=1X8=1X4=1X5=1X42=1X71=
Expecting a junction read: 15=1X8=1X4=1X5=1X42=1X72=
Expecting a junction read: 151=
Expecting a junction read: 151=
Expecting a junction read: 1=1X42=1X5=1X84=1X15=
Expecting a junction read: 42=1X67=1X22=1X15=
Expecting a junction read: 37=1X108=1X1=
Expecting a junction read: 37=1X85=1X22=1X4=
Expecting a junction read: 37=1X110=
Expecting a junction read: 34=1X116=
Expecting a junction read: 30=1X66=
Expecting a junction read: 30=1X75=
Expecting a junction read: 25=1X123=
Expecting a junction read: 25=1X125=
Expecting a junction read: 25=1X124=
Expecting a junction read: 25=1X124=
Expecting a junction read: 24=1X111=
Expecting a junction read: 24=1X111=
Expecting a junction read: 150=
Expecting a junction read: 1X22=1X122=
Expecting a junction read: 1X22=1X122=
Expecting a junction read: 23=1X113=
Expecting a junction read: 23=1X113=
Expecting a junction read: 16=1X134=
Expecting a junction read: 149=
Expecting a junction read: 149=
Expecting a junction read: 1X15=1X24=1X79=
Expecting a junction read: 1X15=1X24=1X79=
Expecting a junction read: 14=1X50=1X10=1X73=
Expecting a junction read: 14=1X135=
Expecting a junction read: 10=1X139=
Expecting a junction read: 10=1X140=
Expecting a junction read: 10=1X139=
Expecting a junction read: 43=1X107=
Expecting a junction read: 8=1X142=
Expecting a junction read: 7=1X142=
Expecting a junction read: 7=1X118=
Expecting a junction read: 7=1X118=
Expecting a junction read: 4=1X145=
Expecting a junction read: 4=1X145=
Expecting a junction read: 2=1X147=
Expecting a junction read: 1=1X149=
Expecting a junction read: 1=1X37=1X64=1X17=1X27=
Expecting a junction read: 151=
Expecting a junction read: 66=1X82=
Expecting a junction read: 151=
Expecting a junction read: 151=
Expecting a junction read: 151=
Expecting a junction read: 143=
Expecting a junction read: 143=
Expecting a junction read: 151=
Expecting a junction read: 111=
Expecting a junction read: 111=
Expecting a junction read: 151=
Expecting a junction read: 64=1X86=
Expecting a junction read: 151=
Expecting a junction read: 131=
Expecting a junction read: 131=
Expecting a junction read: 143=1X7=
Expecting a junction read: 4=1X11=1X3=1X10=1X1=1X104=1X3=1X7=
Expecting a junction read: 150=
Expecting a junction read: 151=
Expecting a junction read: 151=
Expecting a junction read: 149=
Expecting a junction read: 150=
Expecting a junction read: 148=
Expecting a junction read: 150=
Expecting a junction read: 148=
Expecting a junction read: 150=
Expecting a junction read: 134=1X5=1X1=1X5=
Expecting a junction read: 148=
Expecting a junction read: 101=
Expecting a junction read: 101=
Expecting a junction read: 148=
Expecting a junction read: 148=
Expecting a junction read: 150=
Not supporting insertions, yet e.g., 148=1I1=1X
Expecting a junction read: 144=1X3=1X
Expecting a junction read: 138=
Expecting a junction read: 138=
Traceback (most recent call last):
File "/homes/brauerei/natasha/Tools/flair-master/bin/junctions_from_sam.py", line 820, in <module>
if __name__ == "__main__": main()
File "/homes/brauerei/natasha/Tools/flair-master/bin/junctions_from_sam.py", line 438, in main
downstr_len = int(n_split.pop().rstrip("M"))
ValueError: invalid literal for int() with base 10: '11='
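The CIGAR strings in this log use the extended = (sequence match) and X (mismatch) operators from the SAM specification rather than M, while the failing line, int(n_split.pop().rstrip("M")), only strips a trailing M, so '11=' cannot be converted. A parser that tokenizes every SAM CIGAR operator avoids this. The sketch below is hypothetical, not flair's code, and returns each N (skipped region) as a 0-based, half-open intron.

```python
import re

# Every CIGAR operation from the SAM spec: a length followed by one operator.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operators that consume reference bases (and so advance the reference position).
REF_CONSUMING = set("MDN=X")


def introns_from_cigar(chrom, pos, cigar):
    """Return [(chrom, start, end)] for each N operation; pos is the 1-based SAM POS."""
    introns = []
    ref = pos - 1  # convert SAM's 1-based POS to a 0-based coordinate
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((chrom, ref, ref + length))
        if op in REF_CONSUMING:
            ref += length
    return introns
```

For example, introns_from_cigar("chr1", 101, "10=1X5=100N20=") treats =, X and M alike when walking along the reference.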

I also tried with the sorted bam file but got the same error. Could you please help me?

best wishes,
Natasha

Error at flair collapse

I get the following messages when running flair:

Collapsing isoforms
Make sure all files (query, GTF) have valid paths and can be opened
Filtering isoforms
usage: script.py collapsed.psl (default/comprehensive/ginormous) filtered.psl [tolerance]
mv: cannot stat ‘$base.corrected_all_corrected.firstpass.filtered.psl’: No such file or directory
Renaming isoforms
usage: script.py psl gtf isos_matched.psl

What might the problem be here?
My code:

python $flair align -r $fnano -g $genome -m $minimap -o $base -sam $samtools
python $scriptdir"bam2Bed12.py" -i $base".sam" > $base".bed12"
python $scriptdir"junctions_from_sam.py" -s $frna -n shortreads
python $flair correct -f $annotation -c $chrSize -q $base".bed12" -j shortreads_junctions.bed -o $base".corrected"
python $flair collapse -r $fnano -q $base".corrected_all_corrected.psl" -g $genome -m $minimap -o $base -f $annotation -sam $samtools

Error message if --temp_dir is missing

Running flair collapse with the --temp_dir option fails with a very unspecific error message when the temporary folder does not exist:

"Possible minimap2/samtools error, specify paths or make sure they are in $PATH"

Could you either test whether the folder exists or report a more specific error message? That way other users won't run into the same issue; I struggled for quite a long time to find the root of the problem because I was searching for problems with minimap2/samtools.

flair/flair.py, line 254 in a7d2927:

alignout = args.temp_dir + '/' + tempfile_name[tempfile_name.rfind('/')+1:]+'.firstpass'

Thx Alex
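A minimal sketch of the check requested in this issue, run once before any alignment. It also builds the same path as the quoted line using os.path.join and os.path.basename (basename returns the part after the last '/', like the rfind slice). check_temp_dir and firstpass_prefix are hypothetical names, not flair functions.

```python
import os
import sys


def check_temp_dir(temp_dir):
    """Exit with a specific message if temp_dir is not a usable directory."""
    if not os.path.isdir(temp_dir):
        sys.stderr.write(f"--temp_dir {temp_dir!r} does not exist or is not a directory\n")
        sys.exit(1)
    if not os.access(temp_dir, os.W_OK | os.X_OK):
        sys.stderr.write(f"--temp_dir {temp_dir!r} is not writable\n")
        sys.exit(1)
    return temp_dir


def firstpass_prefix(temp_dir, tempfile_name):
    """Equivalent of the quoted alignout expression, built with os.path helpers."""
    # os.path.basename keeps the text after the last '/', like the rfind slice.
    return os.path.join(check_temp_dir(temp_dir), os.path.basename(tempfile_name) + ".firstpass")
```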
