ncbi / tpmcalculator Goto Github PK
View Code? Open in Web Editor NEWTPMCalculator quantifies mRNA abundance directly from the alignments by parsing BAM files
License: Other
TPMCalculator quantifies mRNA abundance directly from the alignments by parsing BAM files
License: Other
My input GTF and the GFF from which it was derived contains 62417 genes. I ran TPMCalculator on 15 BAM files corresponding to 15 RNA-seq experiments. The number of lines in the file alignments.sorted.pos.uniq_genes.out differs in each case:
31124
33419
33888
34408
34876
36120
36162
36412
37151
37192
37521
38853
40305
40581
40760
There was a repeated message to STDERR saying:
Chromosome with name: XXXXXX does not exist
But I don't know its meaning or whether it is an error or a warning.
Hello,
I am not sure I understand the meaning of the -c flag "-c Smaller size allowed for an intron created for genes. Default: 16. We recommend to use the reads length". Why do you recommend the reads length?
Also, how does the software treat multi-mapping reads, so reads matching multiple locations across the genome (value more than 1 in NH:i: field). Is that normally handled by MAPQ filtering?
All the best,
Agnieszka
Do we need to index or sort the bam before using TPMCalculator ?
Thanks.
Hello everyone, can anyone tell me what the error is?
when building nginx packages with the vidayoot etu oshibku brotley module
objs/addon/static/ngx_http_brotli_static_module.o
objs/ngx_modules.o
-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -pie -ldl -lpthread -lpthread -lcrypt -L/usr/local/src/ngx_brotli/deps/brotli/c/../out -lbrotlienc -lbrotlicommon -lm -lpcre2-8 -lssl -lcrypto -ldl -lpthread -lz
-Wl,-E
/usr/bin/ld: cannot find -lbrotlienc
/usr/bin/ld: cannot find -lbrotlicommon
collect2: error: ld returned 1 exit status
make[2]: *** [objs/Makefile:333: objs/nginx] Error 1
make[2]: Leaving directory '/usr/local/src/nginx-1.24.0/debian/build-nginx'
make[1]: *** [Makefile:10: build] Error 2
make[1]: Leaving directory '/usr/local/src/nginx-1.24.0/debian/build-nginx'
make: *** [debian/rules:52: build-arch.nginx] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
How could I set the name of the output and specify full path for the output in TPMCalculator tool?
My bam file comes from the STAR. When I run TPMCalculator, I received an error message:
Reading GTF file ...
Done in 0.19 seconds
Parsing sample: R500.Aligned.sortedByCoord.out.bam
Chromosome with name: A01 does not exist
Chromosome with name: A01 does not exist
Chromosome with name: A01 does not exist
Chromosome with name: A01 does not exist
Chromosome with name: A01 does not exist
Chromosome with name: A01 does not exist
...
Thanks for your help!
Hello,
I have used TPMCalculator to quantify RNA expression using bam files obtained using HISAT2.
For some genes I am getting extremely high intronic TPM counts when compared to exon values in the same gene. I have inspected those genes and I found that some introns overlap other genes in the opposite strand, partially or completely (see attached picture). So, when the TPM is calculated, the intron siganl seems to be inflated and doesn't correlate with the exonic values which are very low or zero.
To run TPMCalculator I use this commans:
TPMCalculator -v -g $annotation -k gene_id -t transcript_id -p -b $bam
My question is, when TPM counts the reads per feature, is the counting strand-specific?
Thank you,
Manuel.
I tried ensembl/grch37/release-94.
Didn't work ;(
Hi
I need to have tpm counts for paired end reads mapped with star, should i use option -p (Use only properly paired reads. Default: No. Recommended for paired-end reads.) or by default without it ? whats pros and cons os using it ?
Thanks
Hello, thanks for your good tool!
My problem is some genes are duplicated in *_genes.out result. I used hisat2 to align reads.
TPMCalculator -g ${GRCh38GTF} -b ${BamDir}/${Sample}.bam \
-c 150 -k gene_id -t transcript_id -q 60 -p -a
Then, some genes have multiple records in *_genes.out, for example "ENSG00000135406" and "ENSG00000235538"
# first 7 columns
ENSG00000135406.14#1 chr12 49293251 49293996 746 0 0
ENSG00000135406.14#2 chr12 49295146 49298685 3540 0 0
# ENSG00000235538
ENSG00000235538.3#1 chr6 163671576 163798848 127273 4 0.00366198
ENSG00000235538.3#2 chr6 163927121 164009979 82859 2 0.00281244
ENSG00000235538.3#3 chr6 164228330 164231609 3280 0 0
This is my GTF record:
$ grep "ENSG00000235538" gencode.v35.annotation.gtf | awk -v FS="\t" '$3!="exon" {$NF="";print}'
chr6 HAVANA gene 163671577 164231610 . + .
chr6 HAVANA transcript 163671577 163760739 . + .
chr6 HAVANA transcript 163671603 163716199 . + .
chr6 HAVANA transcript 163671609 163774630 . + .
chr6 HAVANA transcript 163671613 163774630 . + .
chr6 HAVANA transcript 163671641 163774628 . + .
chr6 HAVANA transcript 163671642 163773768 . + .
chr6 HAVANA transcript 163671643 163760376 . + .
chr6 HAVANA transcript 163671658 163760376 . + .
chr6 HAVANA transcript 163671666 163774624 . + .
chr6 HAVANA transcript 163671678 163774626 . + .
chr6 HAVANA transcript 163703904 163759841 . + .
chr6 HAVANA transcript 163748718 163760382 . + .
chr6 HAVANA transcript 163759267 163798849 . + .
chr6 HAVANA transcript 163927122 164009980 . + .
chr6 HAVANA transcript 164228331 164231610 . + .
THis is bug?
Hello,
I found some details about output in your article: "The TPMCalculator output is comprised of five files per sample reporting the TPM values and raw read counts for genes, transcripts, exons and introns."
In my case the TPMCalculator generated three files: _genes.uni, _genes.out and _genes.ent. I found most of the values in the _genes.out file, but there are two other files. What are they need for? Could you give a more detailed description of the TPMCalculator output, please.
Hi, I used TPMCalculator release 0.0.4 and the outputs don't quite match the descriptions on the wiki, could you please clarify what the outputs are?
My command:
TPMCalculator -a -p \
-q 255 \
-g ${gtf} \
-b ${bam}
I get 3 output files per BAM, e.g. Control_1.final_genes.out, Control_1.final_genes.ent, Control_1.final_genes.uni
final_genes.out
seems to match the first output file described, headers are a little different:
Gene_Id Chr Start End Length Reads TPM ExonLength ExonReads ExonTPM IntronLength IntronReads IntronTPM UniqueLength UniqueReads UniqueTPM UniqueExonLength UniqueExonReads UniqueExonTPM UniqueIntronLength UniqueIntronReads UniqueIntronTPM
The _genes.ent
file and _genes.uni
files have these headers - what are the differences and where could I find transcript ID?
Gene_Id Chr Type Type_Number start end Length Reads TPM
Thanks
Hi there,
I installed the BamTools and TPMCalculateor, but when I use make in the TPMCalulator fold, it reported some errors, as below:
"make" -f nbproject/Makefile-Release.mk QMAKE= SUBPROJECTS= .build-conf
make[1]: Entering directory '/home/congxiao/software/TPMCalculator-master'
"make" -f nbproject/Makefile-Release.mk bin/TPMCalculator
make[2]: Entering directory '/home/congxiao/software/TPMCalculator-master'
mkdir -p bin
g++ -g -I /home/congxiao/software/bamtools/include/bamtools/ -L /home/congxiao/software/bamtools/lib64 -Wl,-rpath,/home/congxiao/software/bamtools/lib64 -o bin/TPMCalculator build/Release/GNU-MacOSX/src/DiffExpIR.o build/Release/GNU-MacOSX/src/FastaFactory.o build/Release/GNU-MacOSX/src/RandomFactory.o build/Release/GNU-MacOSX/src/ReadFactory.o build/Release/GNU-MacOSX/src/Stats.o build/Release/GNU-MacOSX/src/TextParser.o build/Release/GNU-MacOSX/src/bd0.o build/Release/GNU-MacOSX/src/bratio.o build/Release/GNU-MacOSX/src/bstring.o build/Release/GNU-MacOSX/src/chebyshev.o build/Release/GNU-MacOSX/src/choose.o build/Release/GNU-MacOSX/src/dnorm.o build/Release/GNU-MacOSX/src/dt.o build/Release/GNU-MacOSX/src/gamma.o build/Release/GNU-MacOSX/src/lbeta.o build/Release/GNU-MacOSX/src/lgamma.o build/Release/GNU-MacOSX/src/lgammacor.o build/Release/GNU-MacOSX/src/main.o build/Release/GNU-MacOSX/src/pbeta.o build/Release/GNU-MacOSX/src/phyper.o build/Release/GNU-MacOSX/src/pnorm.o build/Release/GNU-MacOSX/src/pt.o build/Release/GNU-MacOSX/src/qnorm.o build/Release/GNU-MacOSX/src/qt.o build/Release/GNU-MacOSX/src/stirlerr.o build/Release/GNU-MacOSX/src/sunif.o build/Release/GNU-MacOSX/src/wilcox.o -L../../bamtools/lib -lbamtools -lm -lz
/usr/bin/ld: cannot find -lbamtools
collect2: error: ld returned 1 exit status
make[2]: *** [nbproject/Makefile-Release.mk:89: bin/TPMCalculator] Error 1
make[2]: Leaving directory '/home/congxiao/software/TPMCalculator-master'
make[1]: *** [nbproject/Makefile-Release.mk:85: .build-conf] Error 2
make[1]: Leaving directory '/home/congxiao/software/TPMCalculator-master'
make: *** [nbproject/Makefile-impl.mk:40: .build-impl] Error 2
I installed the bamtools just as your README file, I don't know why.
Thank you!
Hello, I have different BAM files wich corresponds do different tissue transcript sequencing reads.
So my question is the following:
Is there a way to pool all the BAM file, get a TPM value for a list of specific genes (with coordinates), and to know if the genes are expressed globaly (pooled BAM) and (specificaly to certain tissues)?
And also I wondered if TPMCalculator can calculate TPM values for genes but just with start/stop informations ? (I do not have any informations about exon/intron - my genes should be only exons since they are blast hits from a BLASTP analysis)
Thank you very much for your help
Hello!
How i understand that gtf file not contain version number of transcipts. Can you help me, how can i solve this problem without correct bam file?
Hi ,
Thank you for a exceeding useful tool. I am running RNA-seq of Rice now. I use subjunc mapping ,then i use TPMCalculator to run.But the result files don't have anything . I don't know why .I will upload I'd appreciate it if you could help.
The run code .
The gtf file.
IRGSP-1.0_representative_transcript_exon_2018-11-26.gtf.zip
A portion of the bam.
Hi,
I am using TPMCalculator using the following command:
TPMCalculator -g file.gtf -d ./ -a -p -e
In the current directory, I have sorted bam files deriving from HISAT2 alignment tool of RNA reads against the genome.
Here, you will find the head of samtools view on bam file.
Here, follows part of the log file:
Reading GTF file ...
Done in 12.0684 seconds
Parsing sample: trimmed_hisat_out.bam.sorted
0 reads processed in 23.4511 seconds
Printing results
Total time: 42.3443 seconds
Thanks for the support!
Br,
Marco
Hi,
I installed TPMcalculator using conda. I have aligned FASTQs to GRCh38 with bowtie and have obtained GTF from Ensembl. I am not too sure what to make of this error - any help would be great, thanks.
Tracy
[tc6463@gadi-login-02 Scripts]$ /scratch/er01/apps/miniconda3/bin/TPMCalculator -g ../References/GRCh38_v103/Homo_sapiens.GRCh38.103.gtf -b ../RNA_cutadapt_min17_align_Coenen-Stass/HB10-aln-hg38.sorted.dedup.bam
Reading GTF file ...
Done in 163.879 seconds
Parsing sample: HB10-aln-hg38.sorted.dedup/scratch/er01/apps/miniconda3/bin/TPMCalculator: symbol lookup error: /scratch/er01/apps/miniconda3/bin/TPMCalculator: undefined symbol: _ZN8BamTools11SamSequenceC1ERKS0_
Following the README instructions for building TPMCalculator from source fails for a couple of reasons:
1: setting LDFLAGS="-L $BAMTOOLS_DIR/lib64 -Wl,-rpath,$BAMTOOLS_DIR/lib64"
is not correct - the build process for bamtools does not generate a lib64 directory, rather
libbamtools.a is in the lib directory.
2: the TPMCalculator make fails since the final build command contains -L../../bamtools/lib
whereas the instructions have the default dir as /TPMCalculator from where the command
should be -L../bamtools/lib.
For paired-end alignments generated with STAR, the reported ExonReads number is almost exactly double the number that STAR generates using the --quantMode GeneCounts option. Counts generated using kallisto agree closely with those from STAR. This discrepancy does not occur with single-end alignments.
TPMCalculator is a useful tool to calculate expression of RNA-Seq datasets.
I try to install on our Linux system. But it failed.
So, could you provide binary version?
In the installation instructions it says "Use provided Dockerfile based on the BioContainers base image." Is this step necessary? I do not have docker and it is difficult to install on my work system. I do not have sudo access.
Hello,
Thank you for developing this nice tool. I tried using a GFF3 file with the -g flag, but it didn't work as expected. I then converted my GFF3 file to GTF, but I noticed that a significant amount of information was lost in the GTF file. So, I prefer to use the GFF3 file.
I am just wondering when the new version, incorporating the feature to handle GFF3 files, will be released.
Thanks in advance for your nice effort.
Cheers,
Farid
Need some documentation on how to use it, how to install it, etc.
I am getting the following error when I try to run TPMcalculator:
Key for isoform name was not found on GTF line.
Error processing GTF line at Chromosome level:
scaffold11323 maker exon 2123 2371 . - . gene_ID "Ha_G028047"; transcript_ID "Ha_T028047_A"
My GTF doesn't have isoforms and I don't see a way to either specific an isoform key or skip it.
From a gtf where coordinate is chr3 52554724 52554879 for an exon.
Looking into _genes.ent output then for this same exon :
chr3 52554724 52554880
It's always adding 1 to the end coordinate.
Weird, no ?
New feature to be added
I have been comparing HTSeq (htseq-count) counts to TPMcalcualtor UniqueReads counts.
I find them generally to be in very good agreement. However, there are occasionally some cases where HTSeq and TPMcalculator will be in agreement for the "Reads" column of TPMcalculator output. but will disagree in the last 8 columns (TPMcalculator output will report zeros).
For example:
DTO2_FUN_009269 scaffold_6 3072673 3073284 612 366201 9037.3 612 366201 7579.98 0 0 0 0 0 0 0 0 0 0 0 0
This can happen with genes that receive just a few reads or with genes have 10's of thousands of reads aligning. It can happen with intronless genes or with multi-exon genes.
Since I'm using de novo assembly and annotation, I'm wondering if there could be something about my gff file that is tripping up TPMcalculator?
Here's my gff file for the example case above
$ grep DTO2_FUN_009269 ../../final_assemblies/full_assemblies/DTO2/genome.masked.gff3
scaffold_6 funannotate gene 3072674 3073285 . - . ID=DTO2_FUN_009269;
scaffold_6 funannotate mRNA 3072674 3073285 . - . ID=DTO2_FUN_009269-T1;Parent=DTO2_FUN_009269;product=hypothetical protein;
scaffold_6 funannotate exon 3072674 3073285 . - . ID=DTO2_FUN_009269-T1.exon1;Parent=DTO2_FUN_009269-T1;
scaffold_6 funannotate CDS 3072674 3073285 . - 0 ID=DTO2_FUN_009269-T1.cds;Parent=DTO2_FUN_009269-T1;
Hello I'm using TPMcalculator on a cluster, so I installed the software with a BAMtools version installed in a conda environnement (during the installation it gave something like :
**```
export BAMTOOLS_DIR=/path1/mycondaenv/
export CPPFLAGS="-I $BAMTOOLS_DIR/include/bamtools/"
export LDFLAGS="-L $BAMTOOLS_DIR/lib64 -Wl,-rpath,$BAMTOOLS_DIR/lib64"
and then I simply ran a make, so the `/path1/TOOLS/TPMCalculator/bin/TPMCalculator` workes locally but when I try to run the program on a cluster I get the following error message :
` /path1/TOOLS/TPMCalculator/bin/TPMCalculatorr: error while loading shared libraries: libbamtools.so.2.4.0: cannot open shared object file: No such file or directory`
Do you have an idea of what is going on ?
Thank you for your help
I use gtf file from Gencode database.
I found the TPMCalculator would prodeced two or more gene information of one gene id (ENSMUSG00000026162.7#1, ENSMUSG00000026162.7#2).
But the original id of gtf file is ENSMUSG00000026162.7.
Why TPMCalculator output two or more gene id with "#"?
Hi
Follow the protocol to install and get the following error when doing make in TPM calculator folder.
:~/bin/TPMCalculator$ make
"make" -f nbproject/Makefile-Release.mk QMAKE= SUBPROJECTS= .build-conf
make[1]: Entering directory '/home/jean-philippe.villemin/bin/TPMCalculator'
"make" -f nbproject/Makefile-Release.mk bin/TPMCalculator
make[2]: Entering directory '/home/jean-philippe.villemin/bin/TPMCalculator'
mkdir -p build/Release/GNU-MacOSX/src
rm -f "build/Release/GNU-MacOSX/src/DiffExpIR.o.d"
g++ -g -I /home/jean-philippe.villemin/bin/bamtools/include -c -O2 -Iincludes -std=c++14 -MMD -MP -MF "build/Release/GNU-MacOSX/src/DiffExpIR.o.d" -o build/Release/GNU-MacOSX/src/DiffExpIR.o src/DiffExpIR.cpp
src/DiffExpIR.cpp:26:27: fatal error: api/BamReader.h: No such file or directory
#include "api/BamReader.h"
^
compilation terminated.
nbproject/Makefile-Release.mk:92: recipe for target 'build/Release/GNU-MacOSX/src/DiffExpIR.o' failed
make[2]: *** [build/Release/GNU-MacOSX/src/DiffExpIR.o] Error 1
make[2]: Leaving directory '/home/jean-philippe.villemin/bin/TPMCalculator'
nbproject/Makefile-Release.mk:85: recipe for target '.build-conf' failed
make[1]: *** [.build-conf] Error 2
make[1]: Leaving directory '/home/jean-philippe.villemin/bin/TPMCalculator'
nbproject/Makefile-impl.mk:39: recipe for target '.build-impl' failed
make: *** [.build-impl] Error
I know you didn't test TPMCalculator with Gencode gtf in your publication .
That would be great if you could confirm that there no incompatibilities using gtf from Gencode.
What main difference between ucsc and gencode gtf could lead to wrong results ?
From test I have done, it seems to work. I didn't get specific errors with human gtf v25. But still, I didn't go into details to check the counts are well done.
Thanks
--Hi,
is it possible to add an option to output CPM instead of TPM ?
thank you --
I am facing issues with TPMcalculator! First, I noticed that it is counting much less number of genes. Second, among the counted ones there are duplications (same gene_ID in two lines).
I've checked my gtf file and looks fine... What could it be?
My bam file generated from Star, when I used TPMCalculater, the output files are empty. It complains "Chromosome with name: TraesCS4D01G266700.2 does not exist Chromosome" for all the queries. I am not sure what's wrong. Thank you in advance!
btw, I also attached image file for bam, gtf and genome fasta files:
Hello,
I am using tpmcalculator (version 0.0.4) in conda env with a .gff3 annotation file. After running couple times with different -k parameters (first time default, second time ID, third time Parent), I got the same error message "Key gene_id/ID/Parent for gene name was not found on GTF line. Error processing GTF line at Chromosome level:"
The parameters are as following:
tpmcalculator
-g data/annotation/Cs_genes_v2_annot.gff3
-d $input1_dir
-b Aligned.sortedByCoord.out.bam
-p
-k "Parent"
The first couple line of my .gff3 annotation file are:
Chr1 AAFC_NRC gene 1 6504 . - . ID=Csa01g001000;Name=Csa01g001000;Note=methyl-CPG-binding domain 9
Chr1 AAFC_NRC gene 1 6504 . - . ID=Csa01g001000;Name=Csa01g001000
Chr1 AAFC_NRC mRNA 1 6504 . - . ID=Csa01g001000.1;Name=Csa01g001000.1;Parent=Csa01g001000;Note=methyl-CPG-binding domain 9
Chr1 AAFC_NRC five_prime_UTR 6380 6504 . - . ID=Csa01g001000.1.utr5p1;Parent=Csa01g001000.1
Chr1 AAFC_NRC exon 5865 6504 . - . ID=Csa01g001000.1.exon1;Parent=Csa01g001000.1
Does TPMCalculator not work with .gff3 annotation file? or I got some setting wrong when running the program?
Thank you in advance.
Hi,
I was trying to compare TPM counts from different experiments. Because they have different seq. depths, for sanity check, I downsampled the same file and analyzed TPM in both cases (10 million vs 30 million PE fragments).
It seems that at higher depth, there is underestimation of TPM. Is there something I should be adjusting when running the program to fix this?
Thanks!
Hi there,
Compilation seems to fail due to our cluster running a relatively old version of the gcc compiler (maybe)
Error:
make
"make" -f nbproject/Makefile-Release.mk QMAKE= SUBPROJECTS= .build-conf
make[1]: Entering directory `/SAN/vyplab/alb_projects/tools/TPMCalculator'
"make" -f nbproject/Makefile-Release.mk bin/TPMCalculator
make[2]: Entering directory `/SAN/vyplab/alb_projects/tools/TPMCalculator'
mkdir -p build/Release/GNU-MacOSX/src
rm -f "build/Release/GNU-MacOSX/src/DiffExpIR.o.d"
g++ -g -I /home/annbrown/tools/bamtools/include/bamtools/ -c -O2 -Iincludes -I../../bamtools/include/bamtools -std=c++14 -MMD -MP -MF "build/Release/GNU-MacOSX/src/DiffExpIR.o.d" -o build/Release/GNU-MacOSX/src/DiffExpIR.o src/DiffExpIR.cpp
g++: error: unrecognized command line option ‘-std=c++14
Our cluster currenty has
gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
Copyright (C) 2015 Free Software Foundation, Inc.
And a brief google of the error returns this SO
https://stackoverflow.com/questions/36245428/c-error-unrecognized-command-line-option-std-c14
which suggest changing
-std=c++14
to
-std=c++1y
However just editing the Makefile-Release.mk(which is the only place I can find calls to -std=c++14, and yes I see the warning I should not edit it ^^) leads to another error
In file included from src/DiffExpIR.cpp:33:0:
includes/Sequence.h: In member function ‘std::pair<std::shared_ptr<sequence::DNA>, bool> sequence::DNAContainer::addElement(std::string)’:
includes/Sequence.h:218:58: error: ‘make_unique’ is not a member of ‘std’
result = sequences.insert(std::make_pair(id, std::make_unique<DNA>()));
^
includes/Sequence.h:218:78: error: expected primary-expression before ‘>’ token
result = sequences.insert(std::make_pair(id, std::make_unique<DNA>()));
^
includes/Sequence.h:218:80: error: expected primary-expression before ‘)’ token
result = sequences.insert(std::make_pair(id, std::make_unique<DNA>()));
^
make[2]: *** [build/Release/GNU-MacOSX/src/DiffExpIR.o] Error 1
make[2]: Leaving directory `/SAN/vyplab/alb_projects/tools/TPMCalculator'
make[1]: *** [.build-conf] Error 2
make[1]: Leaving directory `/SAN/vyplab/alb_projects/tools/TPMCalculator'
make: *** [.build-impl] Error 2
I've got the bamtools installed already as well.
echo $CPPFLAGS
-I /home/annbrown/tools/bamtools/include/bamtools/
echo $LDFLAGS
-L /home/annbrown/tools/bamtools/lib64 -Wl,-rpath,/home/annbrown/tools/bamtools/lib64
cheers and thanks!
AL
How does this package deal with multi-mapped reads?
Hi! Thanks for developing such a useful tool! :)
I've installed it with Bioconda in a linux server and I'm trying to use it to analyze the STAR BAM files from PE reads, but I got two errors.
When running the following command:
nohup /localhome/icmm/kns601/.local/bin/anaconda3/bin/TPMCalculator -g /path/to/gencode.v35.annotation.gtf -d BAMwithMT -p -k gene_id &> nohup_TPMCounts_withMT.out &
I got this in the nohup report:
nohup: ignoring input
Reading GTF file ...
Done in 286.83 seconds
Parsing BAM files
Processing sample: PT979-2_Aligned.sortedByCoord.out
(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)(NUL)
Chromosome with name: GL000008.2 does not exist
Chromosome with name: GL000008.2 does not exist
Chromosome with name: GL000008.2 does not exist
Chromosome with name: GL000008.2 does not exist
Chromosome with name: GL000008.2 does not exist
Chromosome with name: GL000008.2 does not exist
...
Chromosome with name: KI270757.1 does not exist
Chromosome with name: KI270757.1 does not exist
Chromosome with name: KI270757.1 does not exist
Chromosome with name: KI270757.1 does not exist
Chromosome with name: KI270757.1 does not exist
Chromosome with name: KI270757.1 does not exist
in 2180.64 seconds. Processed 50949830 reads
Processing sample: 2730-006_Aligned.sortedByCoord.outCan't open file BAMwithMT/2730-006_Aligned.sortedByCoord.out.bam
The first problem I had is that its taking the geneIDs as chromosomes, and it prompts the "does not exist" error. It seems that when analyzing the genes from GTF reference that are in the regular gene annotation it doesn't prompt any error, but when arriving to the ones that are outside that annotation gets the geneID as chromosome for some reson.
The other problem is that the following file is not read properly, as the BAMwithMT/2730-006_Aligned.sortedByCoord.out.bam exists in my folder and the program itself has printed the path as I was using the "-d" option.
Can you please help me solving this issues?
Thanks!
Quim
hi~ How do you add threads in Linux?
Hi, I got this error message while compiling TPMcalculator 0.0.4:
export ...
make
...
...
g++ -g -I /share/app/bamtools/2.5.1/include/bamtools/ -L /share/app/bamtools/2.5.1/lib64 -Wl,-rpath,/share/app/bamtools/2.5.1/lib64 -o bin/TPMCalculator build/Release/GNU-MacOSX/src/DiffExpIR.o build/Release/GNU-MacOSX/src/FastaFactory.o build/Release/GNU-MacOSX/src/RandomFactory.o build/Release/GNU-MacOSX/src/ReadFactory.o build/Release/GNU-MacOSX/src/Stats.o build/Release/GNU-MacOSX/src/TextParser.o build/Release/GNU-MacOSX/src/bd0.o build/Release/GNU-MacOSX/src/bratio.o build/Release/GNU-MacOSX/src/bstring.o build/Release/GNU-MacOSX/src/chebyshev.o build/Release/GNU-MacOSX/src/choose.o build/Release/GNU-MacOSX/src/dnorm.o build/Release/GNU-MacOSX/src/dt.o build/Release/GNU-MacOSX/src/gamma.o build/Release/GNU-MacOSX/src/lbeta.o build/Release/GNU-MacOSX/src/lgamma.o build/Release/GNU-MacOSX/src/lgammacor.o build/Release/GNU-MacOSX/src/main.o build/Release/GNU-MacOSX/src/pbeta.o build/Release/GNU-MacOSX/src/phyper.o build/Release/GNU-MacOSX/src/pnorm.o build/Release/GNU-MacOSX/src/pt.o build/Release/GNU-MacOSX/src/qnorm.o build/Release/GNU-MacOSX/src/qt.o build/Release/GNU-MacOSX/src/stirlerr.o build/Release/GNU-MacOSX/src/sunif.o build/Release/GNU-MacOSX/src/wilcox.o -L../../bamtools/lib -lbamtools -lm -lz
build/Release/GNU-MacOSX/src/ReadFactory.o: In function `ngs::ReadFactory::ReadFactory()':
/software/TPMCalculator/src/ReadFactory.cpp:39: undefined reference to `BamTools::SamHeader::SamHeader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
build/Release/GNU-MacOSX/src/ReadFactory.o: In function `ngs::ReadFactory::processReadsFromBAM(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, unsigned short, unsigned short)':
/software/TPMCalculator/src/ReadFactory.cpp:349: undefined reference to `BamTools::BamReader::Open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
collect2: error: ld returned 1 exit status
make[2]: *** [bin/TPMCalculator] Error 1
make[2]: Leaving directory `/software/TPMCalculator'
make[1]: *** [.build-conf] Error 2
make[1]: Leaving directory `/software/TPMCalculator'
make: *** [.build-impl] Error 2
OS: CentOS Linux release 7.8.2003 (Core)
GCC: gcc version 7.5.0 (GCC)
Any idea how to solve this problem? Thanks!
I noticed that if using multiple bam files it generates a table '_data_per_samples.txt' of TPM values for each gene/transcript for each sample. In order to replace FeatureCounts in a workflow, was wondering if it be useful to have a similar table of read counts generated?
That way you could use the TPM table to identify genes/transcripts that had low TPM across samples (e.g. you only want to analyze genes that had a TPM of 2 or greater in 75% of your samples), filter out those genes from the read counts table and then go onto some differential gene analysis directly. Wouldn't that save a step of having to merge '_gene.out' files across each sample if you wanted read counts?
Maybe a simple table such as:
Gene_Id
Chr
sample1_Count_Reads
sample2_Count_Reads
sample3_Count_Reads
Hi,
running TPMCalculator installed with bioconda) on a directory: how do I get / create the output file as described in the "Outputr file section"?
command I used:
TPMCalculator -g GENOME.gtf -d BAMS/ -e
-> with all bam files in the filder BAM.
It worked fine :
Reading GTF file ...
Done in 8.92957 seconds
Parsing BAM files
...
Printing results
BUT I did not find in the usage then how to create the
Output file: [genes|transcripts]_data_per_samples.txt
Any help appreiated!!
Cheers Karen
Dear TPMCalculator team,
Thank you for a very useful tool, but I found that it produces a different number of genes for different BAM files with the same GTF file. It seems that the tool output only genes with a non-zero count. This is a wrong way - for example, one produced 10 bam files and calculated TPMs, next week he produced another 5 BAM files and calculated TPMs for them but it is very difficult to join all these 15 TPM count files because they have a different number of lines! The genes with non-zero counts should be present in the output and should always be in the same order based on the GTF file. It makes much easy to join them with just a "paste" Linux command.
best regards,
Andrey
Hello, first of all thank you for the program it is very usefull for my work !
I installed TPMCalculator with conda and mapped transcriptomic data against a genome, wich gave me :
- 1 Transcriptomic_map.bam
and I also create a GTF file format :
-2 GTF_candidates..gtf
Then I used the code :
/data/anaconda3/envs/my_python3.6_env/bin/TPMCalculator -g /data/Genomes/Species/Mapping/Transcriptomic_reads/GFF_candidates.gtf -b /data/these/Genomes/Species/Mapping/Transcriptomic_reads/Transcriptomic_map.bam -e -a
but here I get the following issue :
Reading GTF file ...
Done in 0.005154 seconds
Parsing sample: mapping_Species
/var/spool/slurmd/job1928015/slurm_script: line 46: 77660 Segmentation fault /data/anaconda3/envs/my_python3.6_env/bin/TPMCalculator -g /data/Genomes/Species/Mapping/Transcriptomic_reads/GTF_candidates.
gff -b /data/Genomes/Species/Mapping/Transcriptomic_reads/Transcriptomic_map.bam -e -a
I already used the program with other genomes and mapping and everything worked well, i do not understand what is going on with this one ?
Thank you for your help
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.