Comments (5)
Just remembered add_gff3_locus_tags.py. But apparently some entries in the GFF file don't get a locus_tag. I'm using this command line:
python3 gff/add_gff3_locus_tags.py -i ../juncus.fasta.transdecoder.refined.sort.gff3 -o ../juncus.fasta.transdecoder.refined.sort.lt.gff3 -p PREFIX -a 10
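To see which entries came out without a tag, something like the sketch below could be run on the output file. It is plain Python and not part of biocode; the function name is made up for illustration, and it assumes the tags are expected on `gene` rows (pass other feature types if that assumption is wrong for your file).

```python
def features_missing_locus_tag(gff3_lines, feature_types=("gene",)):
    """Return IDs of features of the given types that lack a locus_tag attribute.

    Hypothetical helper for illustration only; not a biocode function.
    """
    missing = []
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature rows
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in feature_types:
            continue
        # Column 9 holds semicolon-separated key=value attributes
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        if "locus_tag" not in attrs:
            missing.append(attrs.get("ID", "<no ID>"))
    return missing

# Usage (path taken from the command line above):
# with open("../juncus.fasta.transdecoder.refined.sort.lt.gff3") as fh:
#     print(len(features_missing_locus_tag(fh)))
```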
from biocode.
Home from my conference and travels. Will get to these tickets later today, just FYI.
from biocode.
I have tracked down this issue. The problem is with the library's treatment of genes with multiple isoforms. When it sees more than one mRNA for a particular gene, it currently spawns off another gene and attaches the mRNA to that one, flattening out the gene/mRNA relationships. I can find no justification for why this behavior was chosen (after about an hour spent tonight searching through fun archives of e-mails with NCBI staff from past eukaryotic genome submissions).
Your file has 95,646 genes and 120,335 mRNAs, so multiple isoforms are common. What was a little surprising was that the mRNA, CDS and exon counts are all 120,335. At first I thought it strange that all your genes were single-exon genes, then realized the source (transdecoder) implied these were from Trinity. So you're doing this in preparation for running tbl2asn for a transcriptome submission.
I'll fix this so that proper gene representation is done when more than one mRNA is present. If you haven't already, it would be good to review the submission guidelines to see if there are any transcriptome-specific format details. I'll be happy to add any you uncover.
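For anyone who wants to reproduce counts like the ones above on their own file, here is a minimal sketch. It reads the GFF3 directly rather than going through the biocode library, the function name is hypothetical, and it assumes each mRNA row points at its gene through the standard `Parent` attribute.

```python
from collections import Counter

def summarize_gff3(gff3_lines):
    """Count rows per feature type and genes with more than one mRNA.

    Hypothetical helper for illustration only; not a biocode function.
    """
    type_counts = Counter()
    mrnas_per_gene = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature rows
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        type_counts[cols[2]] += 1
        if cols[2] == "mRNA":
            attrs = dict(
                pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
            )
            # Parent may list several IDs separated by commas
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    mrnas_per_gene[parent] += 1
    multi_isoform_genes = sum(1 for n in mrnas_per_gene.values() if n > 1)
    return type_counts, multi_isoform_genes

# Usage:
# with open("../juncus.fasta.transdecoder.refined.sort.gff3") as fh:
#     counts, multi = summarize_gff3(fh)
#     print(counts["gene"], counts["mRNA"], counts["CDS"], counts["exon"], multi)
```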
from biocode.
Wonderful. Please send me a ping here, then I can try.
I guess @arsilan324 can say whether the counts of genes, mRNAs, CDSs, and exons are reasonable.
from biocode.
According to Brian Haas (TransDecoder developer): In the data model of TransDecoder, each CDS (and corresponding exon) is tied to its own mRNA, and a single gene is allowed to produce multiple mRNAs. It doesn't allow for the single-mRNA, multi-CDS arrangement (i.e., it doesn't do operons).
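If that model holds, every mRNA in the file should have exactly one CDS row and one exon row as direct children. A rough way to check this is sketched below; it is not TransDecoder or biocode code, the function name is invented here, and it assumes children reference their mRNA through the `Parent` attribute.

```python
from collections import Counter, defaultdict

def mrnas_breaking_one_cds_model(gff3_lines):
    """Return sorted IDs of mRNAs without exactly one CDS and one exon child.

    Hypothetical helper for illustration only.
    """
    mrna_ids = set()
    children = defaultdict(Counter)  # parent ID -> Counter of child types
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature rows
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        if cols[2] == "mRNA":
            if "ID" in attrs:
                mrna_ids.add(attrs["ID"])
        else:
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    children[parent][cols[2]] += 1
    return sorted(
        m for m in mrna_ids
        if children[m]["CDS"] != 1 or children[m]["exon"] != 1
    )
```

An empty list would be consistent with the equal mRNA, CDS and exon counts reported earlier in the thread.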
from biocode.