
Comments (4)

HUNNNGRY commented on June 19, 2024

Yes, you are right. I've checked ENST00000468844.1 and ENST00000477403.1, both of which have no CDS annotation in gencode.v27.gtf, only transcript and exon records.
Everything works fine now; many thanks for your contribution : )

from bed2gtf.

alejandrogzi commented on June 19, 2024

Hi @HUNNNGRY ,

Thanks for using bed2gtf and for reporting this bug! I ran some tests here and can confirm that this is a bug caused by an error within an iteration. I have fixed it; if you now convert your test.bed12 file, the output should look like this:

#provider: bed2gtf
#version: 1.8.0
#contact: github.com/alejandrogzi/bed2gtf
#date: 2024-1-3
chrX	bed2gtf	gene	119786504	119791643	.	-	.	gene_id "ENSG00000198918.7";
chrX	bed2gtf	transcript	119786504	119787456	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1";
chrX	bed2gtf	exon	119787281	119787456	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1"; exon_number "1"; exon_id "ENST00000477403.1.1";
chrX	bed2gtf	exon	119786504	119786732	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1"; exon_number "2"; exon_id "ENST00000477403.1.2";
chrX	bed2gtf	transcript	119786506	119791643	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3";
chrX	bed2gtf	exon	119791574	119791643	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "1"; exon_id "ENST00000361575.3.1";
chrX	bed2gtf	CDS	119791574	119791576	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "1"; exon_id "ENST00000361575.3.1";
chrX	bed2gtf	start_codon	119791574	119791576	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "1"; exon_id "ENST00000361575.3.1";
chrX	bed2gtf	exon	119789908	119790011	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "2"; exon_id "ENST00000361575.3.2";
chrX	bed2gtf	CDS	119789908	119790011	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "2"; exon_id "ENST00000361575.3.2";
chrX	bed2gtf	exon	119786506	119786732	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "3"; exon_id "ENST00000361575.3.3";
chrX	bed2gtf	CDS	119786687	119786732	.	-	1	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "3"; exon_id "ENST00000361575.3.3";
chrX	bed2gtf	stop_codon	119786684	119786686	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "3"; exon_id "ENST00000361575.3.3";
chrX	bed2gtf	five_prime_utr	119791577	119791643	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3";
chrX	bed2gtf	three_prime_utr	119786506	119786683	.	-	1	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3";
chrX	bed2gtf	transcript	119786506	119791595	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000468844.1";
chrX	bed2gtf	exon	119789908	119791595	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000468844.1"; exon_number "1"; exon_id "ENST00000468844.1.1";
chrX	bed2gtf	exon	119786506	119786732	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000468844.1"; exon_number "2"; exon_id "ENST00000468844.1.2";

Your catch will force a new release of bed2gtf; however, the remaining planned improvements will take 1-2 days because of my other responsibilities, which delays the new release by that amount of time. Sorry for any inconvenience.

In the meantime, you can run the fixed version of bed2gtf with:

git clone https://github.com/alejandrogzi/bed2gtf.git && cd bed2gtf

and

cargo run --release -- --bed test.bed12 --isoforms test.isoform --output test.gtf

Please let me know if everything works as intended, and do not hesitate to report any other bug or problem. I will let you know when the new release is published (probably tomorrow); alternatively, you can star this repo and the notifications will appear automatically on your home page.

Best,
Alejandro


HUNNNGRY commented on June 19, 2024

Thank you for your quick reply. The new version seems to output the correct result for the test data.
But I'm wondering why only one CDS/start_codon/UTR set is generated for a gene with multiple transcripts. I think it is common to output all CDS/start_codon/UTR features under each transcript of a gene (as in gencode.v27.gtf, downloaded from the GENCODE project).
And how did you select the representative transcript for each gene in this version? (Maybe the transcript with the longest CDS, or just at random?)


alejandrogzi commented on June 19, 2024

@HUNNNGRY,

I am glad to hear that!

"But I'm wondering why only one CDS/start_codon/UTR set is generated for a gene with multiple transcripts. I think it is common to output all CDS/start_codon/UTR features under each transcript of a gene [...] And how did you select the representative transcript for each gene in this version? (Maybe the transcript with the longest CDS, or just at random?)"

This happens because of the coordinates in your .bed file. bed2gtf does not choose the longest CDS; features are annotated independently for each transcript. Here is a step-by-step explanation:

Your .bed file looks like this:

chrX	119786505	119791643	ENST00000361575.3	0	-	119786683	119791576	0	3	227,104,70,	0,3402,5068,
chrX	119786505	119791595	ENST00000468844.1	0	-	119791595	119791595	0	2	227,1688,	0,3402,
chrX	119786503	119787456	ENST00000477403.1	0	-	119787456	119787456	0	2	229,176,	0,777,
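As a rough illustration of how these rows relate to the exon lines in the GTF output, the sketch below converts the BED12 block columns (0-based, half-open) into 1-based inclusive exon intervals. It is not bed2gtf's actual code; the function name `bed12_exons` is made up for this example.

```python
def bed12_exons(line):
    """Return (name, strand, exons) for one BED12 row.

    exons is a list of 1-based, inclusive (start, end) pairs, ordered 5'->3'
    (reversed for '-' strand rows so that exon 1 is the most downstream
    block in genomic coordinates).
    """
    f = line.rstrip("\n").split("\t")
    chrom_start = int(f[1])
    # blockSizes and blockStarts carry a trailing comma, so drop empty items.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    # BED is 0-based half-open; GTF is 1-based inclusive.
    exons = [(chrom_start + s + 1, chrom_start + s + size)
             for s, size in zip(starts, sizes)]
    if f[5] == "-":
        exons.reverse()
    return f[3], f[5], exons


row = ("chrX\t119786505\t119791643\tENST00000361575.3\t0\t-\t"
       "119786683\t119791576\t0\t3\t227,104,70,\t0,3402,5068,")
name, strand, exons = bed12_exons(row)
for number, (start, end) in enumerate(exons, start=1):
    print(name, number, start, end)
# ENST00000361575.3 1 119791574 119791643
# ENST00000361575.3 2 119789908 119790011
# ENST00000361575.3 3 119786506 119786732
```

These intervals match the three exon records reported for ENST00000361575.3 in the GTF output.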

Note that for ENST00000468844.1 and ENST00000477403.1 the CDS start and end (columns 7 and 8 of the .bed file, thickStart and thickEnd) are equal:

ENST00000468844.1 [...] 119791595	119791595
ENST00000477403.1 [...] 119787456	119787456
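A quick way to spot such rows before converting is to compare columns 7 and 8 directly. The following is a minimal sketch under the assumption that a row with thickStart equal to thickEnd carries no coding region; `has_cds` is a hypothetical helper, not part of bed2gtf.

```python
def has_cds(bed12_line):
    """True if thickStart < thickEnd, i.e. the row defines a non-empty CDS span."""
    fields = bed12_line.rstrip("\n").split("\t")
    thick_start, thick_end = int(fields[6]), int(fields[7])
    return thick_start < thick_end


rows = [
    "chrX\t119786505\t119791643\tENST00000361575.3\t0\t-\t119786683\t119791576\t0\t3\t227,104,70,\t0,3402,5068,",
    "chrX\t119786505\t119791595\tENST00000468844.1\t0\t-\t119791595\t119791595\t0\t2\t227,1688,\t0,3402,",
    "chrX\t119786503\t119787456\tENST00000477403.1\t0\t-\t119787456\t119787456\t0\t2\t229,176,\t0,777,",
]
for row in rows:
    name = row.split("\t")[3]
    print(name, "coding" if has_cds(row) else "no CDS span")
# ENST00000361575.3 coding
# ENST00000468844.1 no CDS span
# ENST00000477403.1 no CDS span
```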

Equal CDS start and end values do not allow bed2gtf to move around exon boundaries to establish frame-corrected CDSs. When those CDS coordinates are corrected, for example:

from

ENST00000477403.1 [...] 119787456	119787456

to

ENST00000477403.1 [...] 119786503	119787456

and bed2gtf is run, the output is now complete for that transcript:

#provider: bed2gtf
#version: 1.8.0
#contact: github.com/alejandrogzi/bed2gtf
#date: 2024-1-3
chrX	bed2gtf	gene	119786504	119791643	.	-	.	gene_id "ENSG00000198918.7";
chrX	bed2gtf	transcript	119786504	119787456	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1";
chrX	bed2gtf	exon	119787281	119787456	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1"; exon_number "1"; exon_id "ENST00000477403.1.1";
chrX	bed2gtf	CDS	119787281	119787456	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1"; exon_number "1"; exon_id "ENST00000477403.1.1";
chrX	bed2gtf	start_codon	119787454	119787456	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1"; exon_number "1"; exon_id "ENST00000477403.1.1";
chrX	bed2gtf	exon	119786504	119786732	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1"; exon_number "2"; exon_id "ENST00000477403.1.2";
chrX	bed2gtf	CDS	119786507	119786732	.	-	1	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1"; exon_number "2"; exon_id "ENST00000477403.1.2";
chrX	bed2gtf	stop_codon	119786504	119786506	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000477403.1"; exon_number "2"; exon_id "ENST00000477403.1.2";
chrX	bed2gtf	transcript	119786506	119791643	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3";
chrX	bed2gtf	exon	119791574	119791643	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "1"; exon_id "ENST00000361575.3.1";
chrX	bed2gtf	CDS	119791574	119791576	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "1"; exon_id "ENST00000361575.3.1";
chrX	bed2gtf	start_codon	119791574	119791576	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "1"; exon_id "ENST00000361575.3.1";
chrX	bed2gtf	exon	119789908	119790011	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "2"; exon_id "ENST00000361575.3.2";
chrX	bed2gtf	CDS	119789908	119790011	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "2"; exon_id "ENST00000361575.3.2";
chrX	bed2gtf	exon	119786506	119786732	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "3"; exon_id "ENST00000361575.3.3";
chrX	bed2gtf	CDS	119786687	119786732	.	-	1	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "3"; exon_id "ENST00000361575.3.3";
chrX	bed2gtf	stop_codon	119786684	119786686	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3"; exon_number "3"; exon_id "ENST00000361575.3.3";
chrX	bed2gtf	five_prime_utr	119791577	119791643	.	-	0	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3";
chrX	bed2gtf	three_prime_utr	119786506	119786683	.	-	1	gene_id "ENSG00000198918.7"; transcript_id "ENST00000361575.3";
chrX	bed2gtf	transcript	119786506	119791595	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000468844.1";
chrX	bed2gtf	exon	119789908	119791595	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000468844.1"; exon_number "1"; exon_id "ENST00000468844.1.1";
chrX	bed2gtf	exon	119786506	119786732	.	-	.	gene_id "ENSG00000198918.7"; transcript_id "ENST00000468844.1"; exon_number "2"; exon_id "ENST00000468844.1.2";

Note that the only transcript that now lacks a CDS is ENST00000468844.1. All these results are consistent with the counterpart tools from UCSC's utils (bedToGenePred | genePredToGtf), since bed2gtf uses their core functionality.
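To check an output file for transcripts without CDS records, a small script along these lines could be used. This is only a sketch: `transcripts_without_cds` is a hypothetical helper, and the sample lines are an abbreviated subset of the listing above.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcripts_without_cds(gtf_lines):
    """Return the sorted transcript IDs that have no CDS feature."""
    features = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:  # gene records carry no transcript_id and are skipped
            features[match.group(1)].add(fields[2])
    return sorted(t for t, seen in features.items() if "CDS" not in seen)


gene = 'gene_id "ENSG00000198918.7";'
sample = [
    "#provider: bed2gtf",
    f"chrX\tbed2gtf\tgene\t119786504\t119791643\t.\t-\t.\t{gene}",
    f'chrX\tbed2gtf\ttranscript\t119786506\t119791643\t.\t-\t.\t{gene} transcript_id "ENST00000361575.3";',
    f'chrX\tbed2gtf\tCDS\t119789908\t119790011\t.\t-\t0\t{gene} transcript_id "ENST00000361575.3";',
    f'chrX\tbed2gtf\ttranscript\t119786506\t119791595\t.\t-\t.\t{gene} transcript_id "ENST00000468844.1";',
    f'chrX\tbed2gtf\texon\t119789908\t119791595\t.\t-\t.\t{gene} transcript_id "ENST00000468844.1";',
]
print(transcripts_without_cds(sample))
# ['ENST00000468844.1']
```

For a real file, pass an open file handle instead of `sample`, for example `transcripts_without_cds(open("test.gtf"))`.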

Hope this helps!

Best,
Alejandro

