Comments (7)
A new parameter could be implemented to force the tool to accept a qualifier
value even if it does not respect the specification. I'm not sure about adding that; maybe in a future release.
You can easily fix that yourself:
Uninstall EMBLmyGFF3:
pip uninstall EMBLmyGFF3
clone the repo in a nice place:
mkdir ~/git
cd ~/git
git clone https://github.com/NBISweden/EMBLmyGFF3.git
cd EMBLmyGFF3
Comment out line 460 of the feature.py
file and save the change (here using the nano text editor, but you can use whatever you want):
nano EMBLmyGFF3/modules/feature.py
install:
python setup.py install
or, if you do not have administrative rights on your machine:
python setup.py install --user
from emblmygff3.
The warning remains with this modification, but you should now have the protein_id qualifier in the output.
If that is not the case, you should also comment out line 443 and un-indent line 444.
from emblmygff3.
e.g. https://www.ebi.ac.uk/ena/WebFeat/ (=> CDS)
Here is the official recommendation for the protein_id:
Qualifier | protein_id |
---|---|
Definition | protein identifier, issued by International collaborators. this qualifier consists of a stable ID portion (accessioned data before the end of 2018 uses a 3+5 format; from the end of 2018 new accessions may be extended to a 3+7 accession format with 3 position letters and 7 numbers) plus a version number after the decimal point. |
Value Format | |
Example | /protein_id="AAA12345.1" /protein_id="AAA1234567.1" |
Comment | when the protein sequence encoded by the CDS changes, only the version number of the /protein_id value is incremented; the stable part of the /protein_id remains unchanged and as a result will permanently be associated with a given protein; this qualifier is valid only on CDS features which translate into a valid protein. |
Accordingly, -PB is the problem: an identifier ending in -PB does not match the format described above.
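To see why such an identifier is rejected, here is a minimal sketch of a format check built only from the definition quoted above (3 letters, then 5 or 7 digits, then a dot and a version number). The pattern and the function name `is_insdc_protein_id` are my own illustration, not the check that EMBLmyGFF3 itself performs.

```python
import re

# Assumed pattern, derived from the quoted definition:
# 3 letters + 5 or 7 digits + "." + version number.
PROTEIN_ID_RE = re.compile(r"[A-Z]{3}(\d{5}|\d{7})\.\d+")


def is_insdc_protein_id(value):
    """Return True if the whole string matches the assumed protein_id format."""
    return PROTEIN_ID_RE.fullmatch(value) is not None


# The two examples from the specification, plus an Ensembl-style ID from this thread.
for candidate in ["AAA12345.1", "AAA1234567.1", "AAEL000117-PA"]:
    print(candidate, is_insdc_protein_id(candidate))
```

The two examples from the specification pass, while an Ensembl-style identifier such as AAEL000117-PA fails because it has four leading letters and no version number after a dot.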
from emblmygff3.
Dear Jacques,
I checked the EMBL file available for a previous assembly provided by Ensembl (available at ftp://ftp.ensemblgenomes.org/pub/metazoa/release-44/embl/aedes_aegypti/). The file contains protein IDs of the same format as in my GFF3 file.
FT   CDS             132625..133230
FT                   /gene="AAEL000117"
FT                   /protein_id="AAEL000117-PA"
FT                   /note="transcript_id=AAEL000117-RA"
FT                   /db_xref="RefSeq_peptide:XP_001657650.1"
FT                   /db_xref="RefSeq_dna:XM_001657600.1"
FT                   /db_xref="Uniprot/SPTREMBL:Q17Q75"
FT                   /db_xref="protein_id:EAT48841.1"
FT                   /db_xref="UniParc:UPI0000DA8512"
Is there any way to suppress or override the above warning? I only want to transfer annotations using RATT, not submit to EMBL.
from emblmygff3.
Dear Jacques,
I followed your instructions and commented out the return statement in feature.py (the 460th line). I'm still getting the same warning.
from emblmygff3.
It works fine, thank you.
I have one more question.
11:09:16 WARNING qualifier: Unknown db_xref 'RefSeq' - skipped.
I have references to external databases that are not supported by INSDC (http://www.insdc.org/db_xref.html), e.g. RefSeq, KEGG_Enzyme, etc., that I would like to retain. Can I modify the legal_dbxref.json file to include these databases?
from emblmygff3.
Great.
Yes, you can definitely do that.
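If you prefer to script the change rather than edit the file by hand, here is a rough sketch. It assumes that legal_dbxref.json holds a flat JSON list of database names; that layout is an assumption, so open your copy first and adapt the code if it is structured differently. The helper name `add_dbxref_names` and the `ExampleDB` entry are made up for illustration, and the demo runs on a throwaway file rather than on the installed one.

```python
import json
import os
import tempfile


def add_dbxref_names(json_path, new_names):
    """Append database names to a JSON list file, skipping names already present."""
    with open(json_path) as handle:
        names = json.load(handle)
    # Assumption: the file is a flat list of strings. Stop if it is not.
    if not isinstance(names, list):
        raise ValueError("expected a JSON list; inspect the file and adapt this sketch")
    for name in new_names:
        if name not in names:
            names.append(name)
    with open(json_path, "w") as handle:
        json.dump(names, handle, indent=2)
    return names


# Demo on a temporary stand-in file with a placeholder entry.
demo_path = os.path.join(tempfile.mkdtemp(), "legal_dbxref.json")
with open(demo_path, "w") as handle:
    json.dump(["ExampleDB"], handle)

updated = add_dbxref_names(demo_path, ["RefSeq", "KEGG_Enzyme", "ExampleDB"])
print(updated)
```

As with the feature.py change, you would presumably make the edit in the cloned repository and reinstall afterwards so that the modified file is the one actually used.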
from emblmygff3.