Giter Site home page Giter Site logo

Comments (18)

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi,
Thank you for reporting this.
Could you, please, share a portion of your BAM file in SAM or BAM format?
Thanks,
Roberto

from tpmcalculator.

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi,
I've reading your GTF and BAM file and it seems to be some incompatibilities.

The SAM format specify that in the third column should be reference name. Therefore, TPMCalculator expect to find in that column the chromosome name. Your BAM file has the transcript id instead of the chromosome name.

I haven't used subjunc ever but you should check if the aligner is writing the BAM file using the transcript id as reference instead of the chromosome.

I would recommend you to use STAR (https://github.com/alexdobin/STAR) for the alignments and you will get a 100% compatible BAM file to use in TPMCalculator.

Please, let me know if this help.
Best,
Roberto

from tpmcalculator.

Moo-cow avatar Moo-cow commented on June 6, 2024

Hi,

Thank you very much for your help. I tried STAR once but mapping no result ,maybe my annotation file is not good for STAR and TPMCalculator . So I will try other annotation file.

Thank you once again for your help.

Best,
zyzhang

from tpmcalculator.

Moo-cow avatar Moo-cow commented on June 6, 2024

Hi Roberto,
The TPMCalculator is only use GTF ,GFF3 is not working .
Right ?
image
Best,
zyzhang

from tpmcalculator.

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi,
You're right. GFF3 is not implemented yet.
Best,
Roberto

from tpmcalculator.

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi,

Thank you very much for your help. I tried STAR once but mapping no result ,maybe my annotation file is not good for STAR and TPMCalculator . So I will try other annotation file.

Thank you once again for your help.

Best,
zyzhang

Your GTF seems good to me but it will be good if you can try another annotations as well.
Best,
Roberto

from tpmcalculator.

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi,
I've aligned the first 2 053 369 reads of the SRA accession: SRR3724613

Everything worked fine, see description below.

Please, send me a descrition of how did you get your BAM file to see if I can help to find the problem.

Best,
Roberto

I used the GTF you provided with the NCBI genome downloaded from:

Genome

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/433/935/GCA_001433935.1_IRGSP-1.0/GCA_001433935.1_IRGSP-1.0_assembly_structure/Primary_Assembly/assembled_chromosomes/FASTA/

I've renamed the header of the fasta files like in your GTF for each chromosome, ie.
>chr01

Subread indexes

subread-buildindex -o IRGSP-1.0 ../*.fna

Subjunc alignment

subjunc -T 16 -d 50 -D 600 -i /gfs/data/genomes/NCBI/IRGSP-1.0/subjunc/IRGSP-1.0 -r 1.fastq -R 2.fastq -o subjunc_results.bam -a /gfs/data/genomes/NCBI/IRGSP-1.0/IRGSP-1.0_representative_transcript_exon_2018-11-26.gtf -F GTF

Output

//============================= subjunc setting ==============================\\
||                                                                            ||
|| Function      : Read alignment + Junction detection (RNA-Seq)              ||
|| Input file 1  : 1.fastq                                                    ||
|| Input file 2  : 2.fastq                                                    ||
|| Output file   : subjunc_results.bam (BAM)                                  ||
|| Index name    : IRGSP-1.0                                                  ||
||                                                                            ||
||                    ------------------------------------                    ||
||                                                                            ||
||                               Threads : 16                                 ||
||                          Phred offset : 33                                 ||
||               # of extracted subreads : 14                                 ||
||                        Min read1 vote : 1                                  ||
||                        Min read2 vote : 1                                  ||
||                     Max fragment size : 600                                ||
||                     Min fragment size : 50                                 ||
||                        Max mismatches : 3                                  ||
||                      Max indel length : 5                                  ||
||            Report multi-mapping reads : no                                 ||
|| Max alignments per multi-mapping read : 1                                  ||
||                           Annotations : IRGSP-1.0_representative_trans ... ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

...

//================================= Summary ==================================\\
||                                                                            ||
||     Total fragments : 2053369                                              ||
||              Mapped : 1816629 (88.5%)                                      ||
||     Uniquely mapped : 1816629                                              ||
||       Multi-mapping : 0                                                    ||
||                                                                            ||
||            Unmapped : 236740                                               ||
||                                                                            ||
||    Correctly paired : 1654461                                              ||
|| Not mapped in pairs : 162168                                               ||
|| Only one end mapped : 112105                                               ||
||   Multi-chromosomes : 9786                                                 ||
||   Different strands : 945                                                  ||
||  Not in PE distance : 19339                                                ||
||      Abnormal order : 19993                                                ||
||                                                                            ||
||           Junctions : 86895                                                ||
||              Indels : 19946                                                ||
||                                                                            ||
||        Running time : 1.1 minutes                                          ||
||                                                                            ||
\\===================== http://subread.sourceforge.net/ ======================//

BAM

perseo:data> samtools view subjunc_results.bam | head
SRR3724613.1	89	chr03	10563073	40	100M1S	*	0	0	AAGTGAGCAGCGAGAAGCTTGGTGTGTGGTTTGGTTGCTGTCGGAGAATGGTTTTGTTTGACATGATGCTGTCAACACCGGCTATGTTATGTATCTATCTN	ACCCACBBBBBCC@CC@BBB?B?BA;?BB?CCCBD@DEFIGEIHFFDBEGIHGGGDGGFBDGIIGIFC3EDGE?EDF8GIGHBGIGIIHHHFDD>DD?:1#	HI:i:1	NH:i:1	NM:i:0
SRR3724613.1	165	*	0	0	*	chr03	10563073	0	GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN	<<###################################################################################################	HI:i:1	NH:i:1
SRR3724613.2	77	*	0	0	*	*	0	0	NTTTGAAATTCCACACTCCAATACGTCATTCTAGTGGTTCAGACAAAGAGCATCAAAGCAAGAATACACAGCAACACAATTGGAATCACCTAAAACGATGC	#073(()2)2<):=@())9=1)6:()2(.)2=>))1(2(:).3=(<).:)<)0))).91.7.6)6)).).((...33((:(.799=>1((--5;;28'(,(
SRR3724613.2	141	*	0	0	*	*	0	0	GGCNNNNNNNNNNNNNNNNNTTANNNNNNNNNNNNNNNNGACNNNNNNNNNNNNNNNNANANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN	(((#################(((################(0(################,#,########################################
SRR3724613.3	73	chr10	2756847	40	1S100M	*	0	0	NCCTTCCACTGGCAGGCGATCGTGTTTTTCACAGGATTTCTCGTTACTCATGTCAGCATTCTCACTTCTGATATCTCCAGGTCTTGTCACCAAAAACCTTC	#1=BDDDDDDDDDDEIAA>FDE3CDDDDDD9BD<D;9DDEI>;BC;C@7@CDE=;=CC7??CCDDDDDD:@@@;@@@A@A?AAA>AAAAAAA??????A>A	HI:i:1	NH:i:1	NM:i:0
SRR3724613.3	133	*	0	0	*	chr10	2756847	0	GANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN	<<###################################################################################################	HI:i:1	NH:i:1
SRR3724613.5	73	chr11	296266	40	1S100M	*	0	0	NGGCCCTTCTCCTTGAGCCAGTCCAGAAGCTCCCTCAGCGTGATGTTGCCAGTTATGGTCCAGCGGTCCCAAACAGTCCAGGCCATGTCCTGGTGCTTGAT	#1=DDFFFHHGHHIHIGIIIIICHDFHIHHBHIIIIIGIHFHGIIGGCGIIIHIIIIICHGGHIIGHEBDF@BCECCCCC>ABCCCCDCDCCCCCCCCABC	HI:i:1	NH:i:1	NM:i:0
SRR3724613.5	133	*	0	0	*	chr11	296266	0	AGCNNNNNNNNNNNNNNNNCGGGCNNNNNNNNNNNNNNTTCCTNNNNNNNANNNNNTGAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN	<<<################22@@<##############00<??#######-#####,,9==########################################	HI:i:1	NH:i:1
SRR3724613.7	73	chr10	22333670	40	1S7M364N93M	*	0	0	NTCCAAACCAGGTCATAGCCAAGTGGCAGAGCGAAATTTCCACCCACATTTTTGTAGACAGCAGCAACATGTGGTGGATGAATGCCAACATGAGCAATATC	#1=DDFFFHHGHHIJIJIJJJJJHIJIJJJJIIJJJJJJJIGIJJJJIJJJJJIIGHJJJHHHGHFFFFEECEEC?@BDDCDEDDDDDDDCCDCDDDCDDE	HI:i:1	NH:i:1	XS:A:-	NM:i:0
SRR3724613.7	133	*	0	0	*	chr10	22333670	0	CCTNNNNNNNNNNNNNNNNNTGTNNNNNNNNNNNNNNNNGCTNNNNNNNNNNNNNNNNCNCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN	<<<#################22@################00<################,#,########################################	HI:i:1	NH:i:1
perseo:data>

See the column 3 in the BAM output that is the chromosome name, the same in the GTF.

TPMCalculator

TPMCalculator -g /gfs/data/genomes/NCBI/IRGSP-1.0/IRGSP-1.0_representative_transcript_exon_2018-11-26.gtf -b subjunc_results.bam -e

Output

_gene.out

perseo:data> head subjunc_results_genes.out
Gene_Id	Chr	Start	End	Length	Reads	TPM	ExonLength	ExonReads	ExonTPM	IntronLength	IntronReads	IntronTPM	UniqueLegth	UniqueReads	UniqueTPM	UniqueExonLength	UniqueExonReads	UniqueExonTPM	UniqueIntronLength	UniqueIntronReads	UniqueIntronTPM
Os01g0100100	chr01	2982	10814	7833	115	8.68579	2935	115	14.0137	4898	0	0	7833	115	8.65548	2935	115	15.0465	4898	0	0
Os01g0100200	chr01	11217	12434	1218	2	0.971454	1127	2	0.6347	91	0	0	305	0	0	305	0	0	0	0	0
Os01g0100300	chr01	11371	12283	913	2	1.29598	810	2	0.883095	103	0	0	0	0	0	0	0	0	0	0	0
Os01g0100400	chr01	12720	15684	2965	25	4.98833	2161	25	4.13759	804	0	0	1794	19	6.24385	1082	19	6.7433	712	0	0
Os01g0100466	chr01	12807	13977	1171	6	3.03133	1074	6	1.99806	97	2	123.241	0	0	0	0	0	0	0	0	0
Os01g0100500	chr01	16398	20143	3746	136	21.4788	2222	129	20.7639	1524	12	47.0646	3746	136	21.4039	2222	129	22.2942	1524	12	98.2904
Os01g0100600	chr01	22840	26891	4052	48	7.00828	1960	46	8.39391	2092	3	8.57151	3488	33	5.57775	1418	31	8.39521	2070	3	18.0911
Os01g0100650	chr01	25860	26423	564	18	18.8813	564	18	11.4145	0	0	0	0	0	0	0	0	0	0	0	0
Os01g0100700	chr01	27142	28643	1502	420	165.432	906	420	165.8	596	2	20.0577	1502	420	164.854	906	420	178.019	596	2	41.8889
perseo:data>

_transcripts.out

perseo:data> head subjunc_results_transcripts.out
Gene_Id	Transcript_Id	Chr	Start	End	Length	Reads	TPM	ExonLength	ExonReads	ExonTPM	IntronLength	IntronReads	IntronTPM
Os01g0100100	Os01t0100100-01	chr01	2982	10814	7833	115	5.8889	2935	115	9.67839	4898	0	0
Os01g0100200	Os01t0100200-01	chr01	11217	12434	1218	2	0.658639	1127	2	0.438348	91	0	0
Os01g0100300	Os01t0100300-00	chr01	11371	12283	913	2	0.878666	810	2	0.609899	103	0	0
Os01g0100400	Os01t0100400-01	chr01	12720	15684	2965	25	3.38205	2161	25	2.85758	804	0	0
Os01g0100466	Os01t0100466-00	chr01	12807	13977	1171	6	2.05522	1074	6	1.37994	97	2	71.9577
Os01g0100500	Os01t0100500-01	chr01	16398	20143	3746	136	14.5625	2222	129	14.3403	1524	12	27.4799
Os01g0100600	Os01t0100600-01	chr01	22840	26891	4052	48	4.75156	1960	46	5.79716	2092	3	5.00471
Os01g0100650	Os01t0100650-00	chr01	25860	26423	564	18	12.8014	564	18	7.88327	0	0	0
Os01g0100700	Os01t0100700-01	chr01	27142	28643	1502	420	112.161	906	420	114.508	596	2	11.7112
perseo:data>

from tpmcalculator.

Moo-cow avatar Moo-cow commented on June 6, 2024

Wow ~
Thanks,

This is my subjunc code
subread-buildindex -o rice /ho me/zyzhang/RNA-seq/data/Reference/IRGSP-1.0_gene_2018-11-26.fasta subjunc -T 15 -i Reference/index/index_subjunc/rice -r 6914_1.fastq.gz -R 6914_2.fastq.gz -o result/aligned_subjunc/6914.bam featureCounts -p -T 64 -a /home/zyzhang/RNA-seq/data/Reference/IRGSP-1.0_representative_transcript_exon_2018-11-26.gtf -o 6914_featureCounts.out 6914.bam TPMCalculator -g /home/zyzhang/RNA-seq/data/Reference/IRGSP-1.0_representative_transcript_exon_2018-11-26.gtf -b 6914.bam

I will try with your procdure.Thank you for helping me patiently.

Best,
zyzhang

from tpmcalculator.

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi,
It looks good to me. I don't see anything weird here.

You should use the GTF with subjunc adding these options:

-a IRGSP-1.0_representative_transcript_exon_2018-11-26.gtf -F GTF

Can you check your fasta IRGSP-1.0_gene_2018-11-26.fasta header?

It should have all chromosomes independently:

>chr01 
>chr02
...

Best,
Roberto

from tpmcalculator.

Moo-cow avatar Moo-cow commented on June 6, 2024

Hi ,
You are right .
The IRGSP-1.0_gene_2018-11-26.fasta header is
image
It doesn't match my GTF .I will try with your Genome .
Thank you for your patience.

Best,
zyzhang

from tpmcalculator.

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi,
Your fasta doesn't seem to be a Genome, it looks to me as the genetic code of proteins. You should use the genome for the alignment and the GTF for the annotation.
Please, remember to change the header of the chromosomes fasta files before creating the indexes.
Best,
Roberto

from tpmcalculator.

Moo-cow avatar Moo-cow commented on June 6, 2024

Hi,
I got it . I have got good output files by TPMCalculaor.

-rw-rw-r-- 1 zyzhang zyzhang 9423565861 3月 13 22:55 subjunc_2_results.bam
-rw-rw-r-- 1 zyzhang zyzhang 7500797 3月 13 22:55 subjunc_2_results.bam.indel.vcf
-rw-rw-r-- 1 zyzhang zyzhang 10649766 3月 13 22:55 subjunc_2_results.bam.junction.bed
-rw-rw-r-- 1 zyzhang zyzhang 15689097 3月 13 23:30 subjunc_2_results_genes.ent
-rw-rw-r-- 1 zyzhang zyzhang 4608259 3月 13 23:30 subjunc_2_results_genes.out
-rw-rw-r-- 1 zyzhang zyzhang 13312048 3月 13 23:30 subjunc_2_results_genes.uni
-rw-rw-r-- 1 zyzhang zyzhang 443 3月 14 09:55 subjunc_2_results.stat
-rw-rw-r-- 1 zyzhang zyzhang 28265992 3月 13 23:30 subjunc_2_results_transcripts.ent
-rw-rw-r-- 1 zyzhang zyzhang 4941586 3月 13 23:30 subjunc_2_results_transcripts.out

subjunc_2_results_genes.out
Gene_Id Chr Start End Length Reads TPM ExonLength ExonReads ExonTPM IntronLength
gene:Os01g0100100 1 2982 10814 7833 2421 6.64109 2935 2391 12.1541 4898 72
gene:Os01g0100200 1 11217 12434 1218 18 0.31754 1127 18 0.238286 91
gene:Os01g0100300 1 11371 12283 913 18 0.423618 810 18 0.331542
gene:Os01g0100400 1 12720 15684 2965 285 2.06535 2161 285 1.96762 804 0
gene:Os01g0100466 1 12807 13977 1171 135 2.47713 1074 135 1.87534 97 4
gene:EPlOSAG00000007463 1 16182 16269 88 22 5.37171 88 22 3.72984 0 0
gene:Os01g0100500 1 16398 20143 3746 3300 18.9286 2222 3270 21.956 1524 158
gene:Os01g0100600 1 22840 26891 4052 3206 17.0007 1960 3127 23.8025 2092 206
gene:Os01g0100650 1 25860 26423 564 1080 41.145 564 1080 28.569 0 0
gene:Os01g0100700 1 27142 28643 1502 1558 22.288 906 1527 25.1456 596 51

So the problem is that the Genome is wrong . And have to usu the head of the Genome is the same as GTF.

Thank you for your patience these days.
Wish you send more paper.

Best,
zyzhang

from tpmcalculator.

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi Zyzhang,
Thank you so much for letting us now.
Please, remember to cite our TPMCalculator paper once you publish your research.
Best,
Roberto

from tpmcalculator.

Moo-cow avatar Moo-cow commented on June 6, 2024

Hi Roberto,
If I publish a paper, I'll quote it affirmatively.
Best,
zyzhang

from tpmcalculator.

VGalata avatar VGalata commented on June 6, 2024

Dear @r78v10a07,

I have the same issue with TMPcalculator - all output files are empty.
I looked into my GTF and BAM files and cannot find anything suspicious.

My set-up

GTF: The original file is from NCBI: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/007/325/GCF_000007325.1_ASM732v1. I decompressed the file and removed features without a gene ID using grep -v 'gene_id ""' (these were exon features of tRNAs and rRNAs).

BAM: The BAM file was created using STAR (version 2.7.3a). However, I used the original GTF file here, i.e. the features w/o a gene ID were not removed because this has not been an issue until now. The BAM file was processed afterwards using samtools (version 1.10):

samtools sort -@ {threads} -n -o {output.sorted} {input} &&
samtools fixmate -@ {threads} -m {output.sorted} {output.fm} &&
samtools sort -@ {threads} -o {output.sc} {output.fm} &&
samtools markdup -@ {threads} -r {output.sc} {output.rm} &&
samtools sort -@ {threads} -n -o {output.srmdup} {output.rm}

The file {output.srmdup} is the one used as input for TMPCalculator. Here are the first three lines:

NB501252:178:HJCCMBGXB:1:11101:1059:13206       99      NC_003454.1     39988   255     74M     =       40035   122     GCTGCTGTTGAATTTCCACTTAAATAGAANTTAAGGGAATCTGGGTTT
TCTACATTTATTTTTGAATTTGACCC      AAAAAEEEEEEEEEEEEEEEEEEEEEEEE#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEE      NH:i:1  HI:i:1  AS:i:146        nM:i:0  MQ:i:255       M
C:Z:75M ms:i:2573
NB501252:178:HJCCMBGXB:1:11101:1059:13206       147     NC_003454.1     40035   255     75M     =       39988   -122    TTCTACATTTATTTTTGAATTTGACCCTGTTGAATATAAAGGAACTGC
ATCTTTTACAAAATTACTTGTAGCCTT     A6EEEEEEEEAEEEEEEEEEEEEEEEEAEEEEEEEEEE/EEEEEEEEEE/EEEAEEEEEEEEEEEAEEEEAAAAA     NH:i:1  HI:i:1  AS:i:146        nM:i:0  MQ:i:255       M
C:Z:74M ms:i:2604
NB501252:178:HJCCMBGXB:1:11101:1060:11521       99      NC_003454.1     46237   255     75M     =       46698   536     CTCCATAATCAAAGCATTTTTTTGCTGCTTCATCTAAATTTTCTCCAA
TTAAAACTGCTATAACTTTCTCATTAT     AAAAAEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE//AEAEA<AE     NH:i:1  HI:i:1  AS:i:146        nM:i:1  MQ:i:255       M
C:Z:75M ms:i:2298

TMPCalculator
I installed the tool via conda using this YAML:

channels:
    - conda-forge
    - bioconda
dependencies:
    - tpmcalculator==0.0.3

The tool was called as follows:

TPMCalculator -g ATCC_25586_norna.gtf -b Fn2611_in_S24_Aligned.out.sorted.dedup.bam -p 2> err.log

The stderr output was:

Reading GTF file ... 
Done in 0.04 seconds
Parsing sample: Fn2611_in_S24_Aligned.out.sorted.dedup
 13188282 reads processed in 118.29 seconds
Printing results
Total time: 118.33 seconds

Thank you in advance!

from tpmcalculator.

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi @VGalata ,
I've checked your GTF and it does not include the exons. TPMCalculator use the "exon" keyword in the third column to create the gene model.
In your specific case, you could replace the keyword CDS with exon if what would you like to do is quantify the expression in the coding regions. That will make TPMCalculator able to create the gene model and quantify the data.
Let me know if you need anything else.

from tpmcalculator.

VGalata avatar VGalata commented on June 6, 2024

Hi @r78v10a07,

Thank you very much for the fast reply and the advice!
Replacing the keyword "CDS" with "exon" worked and the files *_genes.(ent|out|uni) are not empty anymore.

from tpmcalculator.

r78v10a07 avatar r78v10a07 commented on June 6, 2024

Hi @VGalata ,
I'm glad to know that it worked for you.
Please, let me know if you need anything else.
Best,
Roberto

from tpmcalculator.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.