nbisweden / emblmygff3
An efficient way to convert gff3 annotation files into EMBL format ready to submit.
License: GNU General Public License v3.0
If the L1 (e.g. gene) and L2 (e.g. mRNA) features are missing for several CDS features that must be linked together (one CDS spanning several positions in the EMBL file), the tool creates one EMBL CDS feature per GFF CDS feature.
It would be nice to handle this case.
Currently, the workaround is to run agat_sp_gxf_to_gff.pl from AGAT to create the missing L1 and L2 features.
Traceback (most recent call last):
File "/opt/6.x/python-2.7.2/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/opt/6.x/python-2.7.2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/projet/fr2424/sib/lgueguen/git/gmod/EMBLmyGFF3/EMBLmyGFF3/__main__.py", line 4, in <module>
main()
File "/projet/fr2424/sib/lgueguen/git/gmod/EMBLmyGFF3/EMBLmyGFF3/EMBLmyGFF3.py", line 1277, in main
writer.set_organelle( args.organelle )
File "/projet/fr2424/sib/lgueguen/git/gmod/EMBLmyGFF3/EMBLmyGFF3/EMBLmyGFF3.py", line 964, in set_organelle
organelle = self._verify( self.organelle, "organelle")
AttributeError: 'EMBL' object has no attribute 'organelle'
Hi,
Thanks for developing this tool. It would be very useful for me.
Regarding the requirement to write the locus tag in the command (--locus_tag MY_LOCUS_TAG), do you mean the locus tag prefix, as described here? So, for Caenorhabditis elegans, this would be CELE because all gene features start with CELE?
Hi,
There are some rules for EBI about the locus tag:
https://ena-docs.readthedocs.io/en/latest/faq/locus_tags.html
I used your script to create an EMBL flat file, but for each locus, _LOCUSXX (with XX a number) is added after my prefix, for example: PRE_LOCUSXX
In their example, a locus tag would be /locus_tag="BN5_00001". Will PRE_LOCUS therefore be considered a prefix by EBI and then be refused because of this rule?
All characters must be alphanumeric with none such as -_*
Issue reported by a user:
When I used the option --translate, some CDSs were translated incorrectly in the EMBL file. For instance:
In the gff3
Chr_1 AUGUSTUS gene 55249 56486 0.84 - . ID=g13
Chr_1 AUGUSTUS transcript 55249 56486 0.84 - . ID=g13.t1;Parent=g13
Chr_1 AUGUSTUS stop_codon 55249 55251 . - 0 Parent=g13.t1
Chr_1 AUGUSTUS intron 55679 55753 0.95 - . Parent=g13.t1
Chr_1 AUGUSTUS intron 55904 55957 0.94 - . Parent=g13.t1
Chr_1 AUGUSTUS intron 56015 56069 1 - . Parent=g13.t1
Chr_1 AUGUSTUS intron 56228 56296 1 - . Parent=g13.t1
Chr_1 AUGUSTUS intron 56394 56472 0.99 - . Parent=g13.t1
Chr_1 AUGUSTUS CDS 55249 55678 0.91 - 1 ID=g13.t1.cds;Parent=g13.t1
Chr_1 AUGUSTUS CDS 55754 55903 0.94 - 1 ID=g13.t1.cds;Parent=g13.t1
Chr_1 AUGUSTUS CDS 55958 56014 0.94 - 1 ID=g13.t1.cds;Parent=g13.t1
Chr_1 AUGUSTUS CDS 56070 56227 1 - 0 ID=g13.t1.cds;Parent=g13.t1
Chr_1 AUGUSTUS CDS 56297 56393 0.99 - 1 ID=g13.t1.cds;Parent=g13.t1
Chr_1 AUGUSTUS CDS 56473 56486 1 - 0 ID=g13.t1.cds;Parent=g13.t1
Chr_1 AUGUSTUS start_codon 56484 56486 . - 0 Parent=g13.t1
# protein sequence = [MSLIRDSGPRRLVDGFWEYGRYYGSWRPRKYLFPIDAEELNRMDIFHKFFLVARDEALFASPLDPNRDQPLRILDLGT
#GTGIWAINVAEVTAVPPEIMVVDLHQIQPALIPLGISPLQFDIEEASWEPLMKDCDLVHIRMLYGSIQTDLWPDIYHKTFEHLKPGSGYIEHIEIDWV
#PRWDGNDVPPESSLHEWSQLLLRGLDRFNRNARIDVGEVRITLDKAGFVDFREETIRCYVNPWSSERREREIARWFNLGLSQCLEAMSLMPMIEGLSM
# TKEQVKELCDRAKKEICILRYHAYMTL]
In the converted embl
CDS complement(join(55249..55678,55754..55903,55958..56014,
FT 56070..56227,56297..56393,56473..56486))
FT /locus_tag="LOCUSTAG_LOCUS13"
FT /codon_start=1
FT /note="source:AUGUSTUS"
FT /note="ID:g13.t1.cds"
FT /translation="CP*LEIQGQGVL*MDFGSMAGIMAHGDRGSICSRLTRRNLTGWTS
FT FTSSS*LLETKLYLPPHWTRTGTNPFEYLILELVPEYGPLMLQK*LLFHRRSWLWISIR
FT FSQPSFPSVFLPYNLTSKKHHGSL**KIATWCTYECSMAVSRPICGQIYTIKLSNI*SL
FT GLDT*NTLKSIGCPGGMETTSRPSHRCMNGPSYYCEAWIVSTGMPELMWGKFE*PSTRP
FT GSSISEKRPFGAT*THGPRSVVSGKLRDGSTSGFLNVSRR*V*CP**RG*V*PKNKSRS
FT SVTGPKRRFAYCAITLI*RC"
FT /transl_table=1
Any suggestions?
Thanks
Edison
Hello! I have many ncRNA features (lncRNA, snoRNA, snRNA, etc.) in my GFF3 file. According to your instructions I included the following in the translation_gff_feature_to_embl_feature.json file:
"ncRNA_gene": {
"target": "ncRNA",
},
"snoRNA": {
"target": "ncRNA",
},
"lnc_RNA": {
"target": "ncRNA",
},
I am getting a new warning saying,
WARNING feature: The qualifier >ncRNA_class< is mandatory for the feature >ncRNA<. We will not report the feature.
I'm not quite sure how (and where) to add the ncRNA_class qualifier, since its value depends on the feature: if the feature is a lnc_RNA, it maps to ncRNA in the EMBL file but the ncRNA_class should be lncRNA; for a snoRNA, the ncRNA_class should be snoRNA; and so on.
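One possible workaround, sketched below, is to pre-process the GFF3 so that each ncRNA feature carries its own ncRNA_class attribute before conversion. This is an assumption, not a documented EMBLmyGFF3 feature: the mapping table and the function name add_ncrna_class are hypothetical, and whether the tool turns an ncRNA_class GFF attribute into the EMBL qualifier depends on its attribute-to-qualifier mapping, which is not shown here.

```python
# Hypothetical mapping from GFF3 feature type (column 3) to an ncRNA_class value.
NCRNA_CLASS_BY_TYPE = {
    "lnc_RNA": "lncRNA",
    "snoRNA": "snoRNA",
    "snRNA": "snRNA",
}


def add_ncrna_class(gff_line):
    """Append ncRNA_class=<value> to column 9 when the feature type is mapped.

    Comment lines, blank lines, malformed lines and unmapped feature types
    are returned unchanged.
    """
    if gff_line.startswith("#") or not gff_line.strip():
        return gff_line
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return gff_line
    ncrna_class = NCRNA_CLASS_BY_TYPE.get(cols[2])
    if ncrna_class is None or "ncRNA_class=" in cols[8]:
        return gff_line
    attributes = cols[8].rstrip(";")
    if attributes in ("", "."):
        cols[8] = "ncRNA_class=" + ncrna_class
    else:
        cols[8] = attributes + ";ncRNA_class=" + ncrna_class
    newline = "\n" if gff_line.endswith("\n") else ""
    return "\t".join(cols) + newline
```

Applied line by line to the input GFF3, this leaves every non-ncRNA feature untouched.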
I'm working with yeasts and I really need their EMBL files, but when I run the program (using bash and python), I encounter the following problem:
The following messages pop up on the terminal (though I don't think they are the cause of the problem):
17:25:17 ERROR feature: >>trna<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
17:25:17 WARNING feature: Unknown qualifier 'NAME' - skipped
17:25:17 ERROR feature: >>trna_exon<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
17:25:30 ERROR feature: >>UTR<< is not a valid EMBL feature type. You can ignore this message if you don't need the feature.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.
Conversion done
And, as the final line says, the conversion is done. When I open the generated EMBL file the features are fine, but the sequence is all question marks.
FT /transl_table=12
XX
SQ Sequence 2596028 BP; 0 A; 0 C; 0 G; 0 T; 2596028 other;
?????????? ?????????? ?????????? ?????????? ?????????? ?????????? 60
?????????? ?????????? ?????????? ?????????? ?????????? ?????????? 120
And if I keep scrolling, it is as if the conversion had started again:
???????? 2596028
//
ID XXX; XXX; linear; genomic DNA; XXX; XXX; 2596667 BP.
XX
AC XXX;
XX
AC * SOME_YEAST
XX
PR Project:XXX;
After that the only existing feature is "gap" and the sequence (SQ) now looks as it is supposed to:
FT gap 2556681..2556981
FT /estimated_length=301
XX
SQ Sequence 2596667 BP; 806943 A; 475017 C; 477105 G; 804281 T; 33321 other;
AATCTGCTCA GTAAGGCCCA TAAATCGGCT CTGCATTTCT TCTGTGGGCA TTTTGCCGTA 60
CTTTTTTAAT TATGTTGCAG ACGAAACTGA ATCAAGCTCG TCGACAGCTT CGTACAGCCT 120
I have no idea why this would happen. I really hope you can help me figure out what is happening,
as I really need those EMBL files.
It would be nice to implement a progress bar, perhaps using tqdm?
I don't have an issue, I am just really grateful you wrote this!
I was so frustrated trying to generate a gap file and it was so much easier to convert it directly with the annotation to EMBL format using your script! So, thank you!
Hi Jacques
Could you please add some functions to convert embl to gbk?
Best
Edison
Hi,
I've run EMBLmyGFF3 on a cluster in the following way:
EMBLmyGFF3 c_elegans.PRJNA13758.WS263.annotations.gff3 c_elegans.PRJNA13758.WS263.genomic.fa --topology linear --molecule_type 'genomic DNA' --transl_table 1 --species 'Caenorhabditis elegans' --locus_tag CELE --project_id PRJNA13758 -o c_elegans.PRJNA13758.WS263.annotations.embl
I have checked that I have Python 2.6, Biopython 1.67 and bcbio-gff 0.6.4. I can successfully pull out the help from EMBLmyGFF3.
I got the following output from my command above:
Traceback (most recent call last):
File "/nfs/users/nfs_c/user/anaconda3/envs/python2env/bin/EMBLmyGFF3", line 11, in <module>
load_entry_point('EMBLmyGFF3==1.2.3', 'console_scripts', 'EMBLmyGFF3')()
File "/nfs/users/nfs_c/user/anaconda3/envs/python2env/lib/python2.7/site-packages/EMBLmyGFF3-1.2.3-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 1275, in main
for record in GFF.parse(infile, base_dict=seq_dict):
File "build/bdist.linux-x86_64/egg/BCBio/GFF/GFFParser.py", line 742, in parse
File "build/bdist.linux-x86_64/egg/BCBio/GFF/GFFParser.py", line 322, in parse_in_parts
File "build/bdist.linux-x86_64/egg/BCBio/GFF/GFFParser.py", line 343, in parse_simple
File "build/bdist.linux-x86_64/egg/BCBio/GFF/GFFParser.py", line 637, in _gff_process
File "build/bdist.linux-x86_64/egg/BCBio/GFF/GFFParser.py", line 667, in _lines_to_out_info
File "build/bdist.linux-x86_64/egg/BCBio/GFF/GFFParser.py", line 189, in _gff_line_map
File "build/bdist.linux-x86_64/egg/BCBio/GFF/GFFParser.py", line 89, in _split_keyvals
KeyboardInterrupt
Terminated
I submitted the command (in a .sh file) as a batch job with 20 MB memory on a GNU/Linux machine.
It would be nice to make the tool Python 3 compliant.
Hi,
thank you for this great tool! It is of great help!
I would like to ask if it would be possible to introduce a new parameter with which the tool uses an already existing attribute as locus_tag, e.g. the ID or just locus_tag itself.
Thank You
Best Regards
Nadine
When a sequence from the GFF is not found within the provided FASTA file, the tool creates a string of ???? as the sequence. Its length corresponds to the end position of the last feature of the missing sequence.
It would be nice to inform the user that the sequence names in the GFF and the FASTA file may be out of sync.
On top of that, using the --translate option will raise an error because the ??? codon does not exist. A warning displayed before that error would help users understand the problem.
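Until such a warning exists, the mismatch can be detected before running the conversion. The following is a minimal sketch, not part of EMBLmyGFF3: the function names are hypothetical, and it assumes that the FASTA identifier is the first word after ">" and that the GFF sequence name is the first tab-separated column.

```python
def fasta_ids(fasta_lines):
    """Return the set of sequence identifiers (first word after '>')."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                ids.add(header[0])
    return ids


def gff_seqids(gff_lines):
    """Return the set of sequence names found in column 1 of the GFF."""
    ids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature lines
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_sequences(gff_lines, fasta_lines):
    """List GFF sequence names that have no matching FASTA record."""
    return sorted(gff_seqids(gff_lines) - fasta_ids(fasta_lines))
```

Both arguments can be open file handles or lists of lines. A non-empty result means some features refer to sequences that the FASTA file does not provide.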
Hi,
I've stumbled upon another thing. In the mitochondrion of our organism not all stop codons are used as such. More precisely, TGA is used to code for tryptophan and not as a stop codon. So each time a gene has a TGA inside I get the error "Stop codon found within the CDS...". Would it be possible to exclude certain stop (or even start) codons?
Thanks in advance
Best Regards,
Nadine
Hi,
It seems to me that the --locus_numbering_start parameter is not working.
I provide an integer for this parameter (e.g. --locus_numbering_start 30) but it is not taken into consideration.
Could this be related to a 10 step increment in my gff3 input file coming from Prokka?
Thanks in advance, best
Hello again. I'm getting the following warning:
WARNING feature: The value(s) ['AAEL012102-PB'] is(are) invalid for the qualifier protein_id of the feature CDS. We will not report the qualifier. (Here is the regex expected: [a-zA-Z]{3}[0-9]{5}\.[0-9]+)
I guess the hyphen in the name is causing an issue? All the protein IDs in my GFF3 file have a hyphen and end up triggering this error (after a point the program just gets tired of them and quits printing them). I would like to preserve this information in my EMBL file, can you suggest a fix?
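To see why the value is rejected, the regex quoted in the warning can be tested directly. This is a small sketch; the function name is hypothetical, and it assumes the whole value must match the pattern (the source does not say how the tool applies it).

```python
import re

# Pattern copied from the warning message above.
PROTEIN_ID_PATTERN = re.compile(r"[a-zA-Z]{3}[0-9]{5}\.[0-9]+")


def is_valid_protein_id(value):
    """Return True when the whole value matches the expected protein_id format."""
    return PROTEIN_ID_PATTERN.fullmatch(value) is not None
```

With this pattern, a made-up value such as ABC12345.1 passes, while AAEL012102-PB fails for more than the hyphen: it has four letters followed by six digits and no ".version" suffix.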
I have a translation table of 4 for mitochondria, but when I create the EMBL file it says there is a conflict with the species translation table 1. How do I set the organelle to get around this?
organelle = self._verify( self.organelle, "organelle")
AttributeError: 'EMBL' object has no attribute 'organelle'
Hi,
in testing EMBLmyGFF3 to prepare an EMBL file with genome annotation for submission to EMBL, I'm getting the following error:
EMBLmyGFF3 scaffold1.gff scaffold1.fa --topology linear --transl_table 1 --molecule_type 'genomic DNA' --species 'Salix viminalis' --locus_tag TEST --project_id PRJEB00001 --de 'Single-molecule assembly' -o scaffold1.embl
#############################################################################
# NBIS 2018 - Sweden #
# Authors: Martin Norling, Niclas Jareborg, Jacques Dainat #
# Please visit https://github.com/NBISweden/EMBLmyGFF3 for more information #
#############################################################################
12:25:01 WARNING feature: Unknown qualifier 'makerName' - skipped
12:25:01 WARNING feature: Unknown qualifier '_QI' - skipped
12:25:01 WARNING feature: Unknown qualifier '_AED' - skipped
12:25:01 WARNING feature: Unknown qualifier '_eAED' - skipped
12:25:01 ERROR qualifier: local variable 'new_value' referenced before assignment
Traceback (most recent call last):
File "/opt/pyenv/versions/2.7.10/envs/EMBLmyGFF3_venv/lib/python2.7/site-packages/EMBLmyGFF3/modules/qualifier.py", line 88, in _by_value_format
formatted_value=new_value
UnboundLocalError: local variable 'new_value' referenced before assignment
I'm just using one scaffold for testing purposes. The head of the GFF file looks like this:
##gff-version 3
scaffold1 repeatmasker match 2 581 896 + . ID=scaffold1:hit:709:1.3.0.0;Name=species:rnd-4_family-62|genus:Unspecified;Target=species:rnd-4_family-62|genus:Unspecified 253 760 +
scaffold1 maker gene 444 3240 . + . ID=salix_viminalisG00000000001;Name=at3g47200_37;makerName=genemark-scaffold1-processed-gene-0.6
scaffold1 maker mRNA 444 3240 . + . ID=salix_viminalisT00000000001;Parent=salix_viminalisG00000000001;Dbxref=PFAM:PF03140,InterPro:IPR004158;Name=at3g47200_37;_AED=0.25;_QI=0|0|0|0.25|1|1|4|0|412;_eAED=0.25;makerName=genemark-scaffold1-processed-gene-0.6-mRNA-1;product=UPF0481 protein At3g47200
scaffold1 maker exon 444 618 . + . ID=salix_viminalisE00000000001;Parent=salix_viminalisT00000000001;makerName=genemark-scaffold1-processed-gene-0.6-mRNA-1:1
scaffold1 maker CDS 444 618 . + 0 ID=salix_viminalisC00000000001;Parent=salix_viminalisT00000000001;makerName=genemark-scaffold1-processed-gene-0.6-mRNA-1:cds
scaffold1 maker exon 827 912 . + . ID=salix_viminalisE00000000002;Parent=salix_viminalisT00000000001;makerName=genemark-scaffold1-processed-gene-0.6-mRNA-1:2
scaffold1 maker CDS 827 912 . + 2 ID=salix_viminalisC00000000001;Parent=salix_viminalisT00000000001;makerName=genemark-scaffold1-processed-gene-0.6-mRNA-1:cds
scaffold1 maker exon 2167 3072 . + . ID=salix_viminalisE00000000003;Parent=salix_viminalisT00000000001;makerName=genemark-scaffold1-processed-gene-0.6-mRNA-1:3
scaffold1 maker CDS 2167 3072 . + 0 ID=salix_viminalisC00000000001;Parent=salix_viminalisT00000000001;makerName=genemark-scaffold1-processed-gene-0.6-mRNA-1:cds
Note that I have a local translation_gff_feature_to_embl_feature.json mapping "match" to "repeat_region":
"match": {
"target": "repeat_region"
},
The head of the FASTA file looks like this:
>scaffold1
TAAAATAAAAAAAATCGGGTCGGGCCCAACTAAATGGGCCGGCTAGCCAAGATGGGCCAAAGCCCAATTTTAATGGGCTGGGCCGAGAGCCGTCCAGCCC
AAGACACGCAGAAGAGGAAAAAAAGAAAAGGGCAAAACGGCACTGTTTAGCACATGTTAATTAAACAGTTTACGTATTTCGTGAACAGTAAAATGGTGGT
CGGCCGACCACGACGAGAGGGTCACCTGCTATTGCCGCCGGGTAGAGGAGGTCGAGGTGGTTGTCCCTGTGGTTGTGGAGTCGAAAATGGTGGCCCGTGG
CGGCCGGAGGAGGCGTTGGAAGTGGCCGGTCTGTTGCTTGCTGTCGTCGGGGCTGTCACTGTTTCTTCGCCGGAGAGGACGACTGAGCTGCTGGAGTTGA
GGGGAGGCTGAAGGTGGTGATGAGGGTGGATATGGGTGGTTGAATGGTGGCTGTTGGAGGAGAGAGAGAGACGCCGGGTCCTCTGGTTTTAGAGAGAGAA
TGCTGTCGGGGAGAGAGAAAGGAGCTGCAACAGGCTGAGAGACGAAGGAGAGAGAGAGAGAAAGGGCTGCTGTGTCGCCGGAGCTGGAGAGGAAGAAAGG
GTGGCTGCCTCTGCGTGTGTATGCTTGTGTTCTGCAAATTTACCACGTCTTCGTCTTCCTCCTCCAGCCTTAATTTGAAACTGAAACTAAAATATTCGCC
TCTGTTCTCTCAAAACTTCTCAGTTTCTTCCTTGCTTTTCTTTGCCCAAATTTCTGTCGATTTTCCTCCCGTTTTTTCTCCCTTCTTTCTCCCCCTTCTG
CATGCATTTCATGCATGTATTTATAGGTTTGAAAGGCAACCCTTCAGCTGCCCATGGCGTGCAGCGAAGGGTTGCCGCCTGTGATTGCAGGTGGCGTGCC
Also, I installed EMBLmyGFF3 in a virtual environment, but when I run the maker example with the command EMBLmyGFF3-maker-example, I do get the correct output EMBLmyGFF3-maker-example.embl without errors.
I can't seem to spot possible formatting errors in the input files but I'm just using this tool for the first time so there may be something that I'm missing.
Any help would be kindly appreciated.
Many thanks,
Pedro
The following points should be checked on the locus tag prefix given with option -i / --locus_tag (required when the locus-tag is registered at ENA):
A locus tag prefix must have the following format:
- starts with a letter
- is at least 3 characters long
- is upper case
- contains only alphanumeric characters and no symbols such as -_*
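The rules above can be expressed as a single regular expression. This is a minimal sketch of such a check, not the tool's own validation; the function name is hypothetical, and it encodes only the four rules listed here.

```python
import re

# One upper-case letter, then at least two more upper-case letters or digits.
LOCUS_TAG_PREFIX = re.compile(r"[A-Z][A-Z0-9]{2,}")


def is_valid_locus_tag_prefix(prefix):
    """Check a locus tag prefix against the four rules listed above."""
    return LOCUS_TAG_PREFIX.fullmatch(prefix) is not None
```

Under these rules CELE and BN5 are accepted, while PRE_LOCUS is rejected because of the underscore.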
Hi, excited to see something that may make things easier; it's a bit of a nightmare otherwise. I have IDs within my GFF and was expecting them to be used for the locus tags, but they are not; sequential numbers are used instead. A note is created with the ID, but I think it would be better if the ID simply became the locus_tag, as I think that is its purpose. I don't have gene names in the GFF, but ideally a tab file of gene names could be given to add these to the resulting EMBL file too, as this is the likely starting point for having gene names available. I would also like to parse the exon number after the : and add this in, although I don't think this is essential.
I'm still trying to work out the ENA format requirements for submission. I think I could just have a locus tag as the minimum feature, which is what I'm working towards. The Webin validation tool complains about overlapping UTR and CDS features of two genes in the same direction. Could a correction step be added to cleave the UTR and correct the gene when this is detected? Otherwise I have to work out how to fix this and start again. I know of a script somewhere that will at least do the UTR cleaving. Sorry for the several change requests; otherwise I'll try to make the changes myself when I have time, but that is harder when I don't know the code.
FT mRNA join(433449..433533,433946..434073,434612..434836,
FT 435438..435904)
FT /locus_tag="SPEXI_LOCUS1"
FT /note="source:maker"
FT /note="ID:SPEXI_01T000001"
FT CDS join(433449..433533,433946..434073,434612..434836,
FT 435438..435710)
FT /locus_tag="SPEXI_LOCUS1"
FT /note="source:maker"
FT /note="ID:SPEXI_01T000001:cds"
FT /transl_table=1
FT exon 433449..433533
FT /locus_tag="SPEXI_LOCUS1"
FT /note="source:maker"
FT /note="ID:SPEXI_01T000001:1"
Currently EMBLmyGFF3 joins intron entries. Here is an example of joined introns:
FT intron join(6625..6675,6797..6841,6924..6966,7119..7161,
FT 7245..7286,7423..7476,7630..7673,7750..7962,8110..8158,
FT 8225..8265,8365..8407)
FT /locus_tag="LOCUS1"
FT /note="source:AUGUSTUS"
As this does not make sense biologically, this issue should be fixed in later versions.
Hi,
I've followed the instructions to install EMBLmyGFF3 with git on Mac. I got the following error:
Download error on https://pypi.org/simple/bcbio-gff/: [Errno 54] Connection reset by peer -- Some packages may not be found! Couldn't find index page for 'bcbio-gff' (maybe misspelled?) Scanning index of all packages (this may take a while) Reading https://pypi.org/simple/ Download error on https://pypi.org/simple/: [Errno 54] Connection reset by peer -- Some packages may not be found! No local packages or working download links found for bcbio-gff==0.6.4 error: Could not find suitable distribution for Requirement.parse('bcbio-gff==0.6.4')
Does this mean I have to install bcbio myself before installing EMBLmyGFF3?
Cheers.
Thanks for this nice tool. I'm running into an issue trying to validate EMBL files that were generated with your tool. I'm using webin-cli-1.7.1 and it throws the error below when I try to validate/submit the EMBL files:
ERROR: "tRNA" Features locations are duplicated - consider merging qualifiers.
The command-line I used is this:
EMBLmyGFF3 test/6666666.419437.gff test/6666666.419437.contigs.fa -o test/test_new.embl
Any help in this regard would be highly appreciated
Hi there,
I installed EMBLmyGFF3 via conda. When I try to run it, I get an error:
Traceback (most recent call last):
File "/home/tagirdzh/miniconda/envs/EMBLmyGFF3/bin/EMBLmyGFF3", line 6, in <module>
from EMBLmyGFF3 import main
File "/home/tagirdzh/miniconda/envs/EMBLmyGFF3/lib/python3.8/site-packages/EMBLmyGFF3/__init__.py", line 3, in <module>
from .EMBLmyGFF3 import *
File "/home/tagirdzh/miniconda/envs/EMBLmyGFF3/lib/python3.8/site-packages/EMBLmyGFF3/EMBLmyGFF3.py", line 4, in <module>
from .modules.feature import Feature
File "/home/tagirdzh/miniconda/envs/EMBLmyGFF3/lib/python3.8/site-packages/EMBLmyGFF3/modules/feature.py", line 10, in <module>
from Bio.Alphabet.IUPAC import *
File "/home/tagirdzh/miniconda/envs/EMBLmyGFF3/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
Thanks!
emblmygff3 1.2.3 has requirement biopython==1.67, but you'll have biopython 1.72 which is incompatible.
Hello,
Thanks for making a very useful tool, I'm glad to have come across it.
There is an issue I'm having with line-wrapping. EMBLmyGFF3 appears to be wrapping lines at 80 characters. This creates problems with longer qualifier values, e.g. product names, because they are broken across several lines, often in the middle of a word. When one later runs the ENA flat file validator tool with the "fix" option, it unwraps the line but adds a space, so now the unwrapped name is broken.
Would it be possible to add an option to turn off line wrapping? Thank you!
-- Brandon
Hi,
I need to add more than one publication. How would this be possible? I already tried to reuse the --ra, --rt, --rl parameters for each publication, but only the last one is used. Of course I could do it manually, but then the RP field won't be filled automatically anymore and I would have to do it several times, which can be very time consuming.
Thank you very much in advance.
Best Regards,
Nadine
Hi,
I really like the tool, but there is a little problem when I convert my .gff file to an .embl file: I cannot see the annotation information in the output file. The rest seems okay. I am copying a subset of my data as a text file in this email for your review. Can you please comment?
Line 607 of feature.py has to be replaced by:
translated_seq = str(seq.translate(codon_table)).replace('B','X').replace('Z','X').replace('J','X')
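The effect of the three chained replace calls can be shown without Biopython. B, Z and J are ambiguity codes (B for Asp or Asn, Z for Glu or Gln, J for Leu or Ile), and the line above turns each of them into X. A minimal standalone sketch, with a hypothetical function name:

```python
def mask_ambiguous_residues(protein):
    """Replace the ambiguity codes B, Z and J with X, as the line above does."""
    for code in ("B", "Z", "J"):
        protein = protein.replace(code, "X")
    return protein
```

All other characters, including the stop symbol *, are left as they are.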
Original question from @Iseez
Just one more question: when I was trying to obtain the EMBL file for a different species, I encountered the following error:
Traceback (most recent call last):
File "/cm/shared/apps/emblmygff3/1.2.6/bin/EMBLmyGFF3", line 11, in <module>
load_entry_point('EMBLmyGFF3==1.2.6', 'console_scripts', 'EMBLmyGFF3')()
File "/cm/shared/apps/emblmygff3/1.2.6/lib/python2.7/site-packages/EMBLmyGFF3-1.2.6-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 1383, in main
writer.write_all( outfile )
File "/cm/shared/apps/emblmygff3/1.2.6/lib/python2.7/site-packages/EMBLmyGFF3-1.2.6-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 1179, in write_all
self._add_mandatory()
File "/cm/shared/apps/emblmygff3/1.2.6/lib/python2.7/site-packages/EMBLmyGFF3-1.2.6-py2.7.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 195, in _add_mandatory
if seq[end] == 'n' :
IndexError: string index out of range
Is the problem due to the files I'm using as input?
Hello all,
thank you for your tool - very useful in dealing with EMBL/GBK/GFF3 formatting nightmare.
I've recommended it to several people who need to do submissions of annotated genomes. One thing that makes it hard to use is the fact that most people use bioconda now - thus both python and pip are installed via conda. This breaks EMBLmyGFF3 - neither installation command specified in the readme provides a working script. If you could make it easier to install in a cluster environment, it would be great.
For a 370 Mb chromosome arm, it is going to take me 4-5 days to convert the GFF3 to EMBL. The feature part was done in a few seconds, but the sequence 'SQ' part takes a very long time. Is that normal?
Writing short sequences of a few hundred kb is very fast, though. I wonder why?
Thank you for this great tool! Very useful! and flexible enough to adapt to home attribute tags.
Please find below a suggestion for improvement.
Using this tool, I faced some errors with my big input GFF3 file (>300k features), e.g.:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 5: ordinal not in range(128)
ERROR feature: Stop codon found within the CDS. It will rise an error submiting the data to ENA. Please fix your gff file.
Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
And it was not easy to find out what character or what CDS was the wrong one in the GFF3 file. I think it would be great to provide information about the location of the error (e.g. line number, feature ID).
Hello
I am trying to use EMBLmyGFF3 and I get the following error:
Traceback (most recent call last):
File "/scratch/OSR/bin/EMBLmyGFF3/scripts/EMBLmyGFF3", line 11, in <module>
load_entry_point('EMBLmyGFF3==1.2.6', 'console_scripts', 'EMBLmyGFF3')()
File "/home/psur9757/.local/lib/python2.7/site-packages/EMBLmyGFF3/EMBLmyGFF3.py", line 1386, in main
EMBL.print_progress(True)
File "/home/psur9757/.local/lib/python2.7/site-packages/EMBLmyGFF3/EMBLmyGFF3.py", line 386, in print_progress
progress = "[" + "="* int(78 * (float(EMBL.progress)/EMBL.total_features))
ZeroDivisionError: float division by zero
I checked the biopython and bcbio-gff versions:
$ python --version
Python 2.7.9
$ python -c "import Bio; from BCBio import GFF; print 'biopython version: '+Bio.__version__; print 'bcbio-gff version: '+GFF.__version__"
biopython version: 1.67
bcbio-gff version: 0.6.4
My command:
samtools faidx $genome $scaffold -o $FASTAs/${scaffold}.fa
grep "^$scaffold" $gff > $GFFs/${scaffold}.gff3
$project/EMBLmyGFF3 --shame --no_progress --ra $AUTHOR --rg $REFERENCE_GROUP -i $LOCUS_TAG -p $PROJECT -m "$MOLECULE" -r $TABLE -t linear -s "$SPECIES" -x $TAXONOMY -o $EMBLs/${scaffold}.embl $GFFs/${scaffold}.gff3 $FASTAs/${scaffold}.fa
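The traceback above shows a division by EMBL.total_features, which fails when that count is zero (for example if the per-scaffold GFF written by grep is empty; that is a guess, the source does not confirm it). A progress bar can guard against this case. The sketch below is not the project's code: the function name and the choice to show an empty input as complete are assumptions.

```python
def progress_bar(done, total, width=78):
    """Render a text progress bar without dividing by zero.

    An input with no features (total <= 0) is shown as complete.
    """
    if total <= 0:
        fraction = 1.0
    else:
        # Clamp so that a miscounted 'done' cannot overflow the bar.
        fraction = min(max(float(done) / total, 0.0), 1.0)
    filled = int(width * fraction)
    return "[" + "=" * filled + " " * (width - filled) + "]"
```

Emitting a warning when total is zero would also tell the user that no feature was read from the GFF.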
Thanks for the conversion software. I have a query that might be a bit off track.
I am trying to use the output .embl (1.31 GB) from augustus gff3 with RATT software (run with linux) and get this error:
I am using the reference.fa. Please make sure that the description line of each fasta entry is the same than in the embl file name!
Just wonder if this is a known issue?
many thanks!
I've tried downloading the program with all 3 options, but I get errors every time I try to download it or run it and I'm not sure what is wrong. Is it because I have the updated version of Python?
Traceback (most recent call last):
File "/Users/chengh1/miniconda3/bin/EMBLmyGFF3", line 33, in <module>
sys.exit(load_entry_point('EMBLmyGFF3==2', 'console_scripts', 'EMBLmyGFF3')())
File "/Users/chengh1/miniconda3/bin/EMBLmyGFF3", line 25, in importlib_load_entry_point
return next(matches).load()
File "/Users/chengh1/miniconda3/lib/python3.8/importlib/metadata.py", line 77, in load
module = import_module(match.group('module'))
File "/Users/chengh1/miniconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/Users/chengh1/miniconda3/lib/python3.8/site-packages/EMBLmyGFF3-2-py3.8.egg/EMBLmyGFF3/__init__.py", line 3, in <module>
from .EMBLmyGFF3 import *
File "/Users/chengh1/miniconda3/lib/python3.8/site-packages/EMBLmyGFF3-2-py3.8.egg/EMBLmyGFF3/EMBLmyGFF3.py", line 4, in <module>
from .modules.feature import Feature
File "/Users/chengh1/miniconda3/lib/python3.8/site-packages/EMBLmyGFF3-2-py3.8.egg/EMBLmyGFF3/modules/feature.py", line 8, in <module>
from Bio.Seq import Seq
ModuleNotFoundError: No module named 'Bio'
Hello,
I've come across an issue with how CDS features are printed for genes encoded on the complementary strand. The problem manifests itself clearly when using the --translate flag, as it produces lots of erroneous translations riddled with stop codons (*).
I give an example below.
The EMBL output for an affected gene looks like:
FT gene complement(123273..128445)
FT /locus_tag="BANY_locus6"
FT /note="source:GenomeHubs"
FT /note="ID:BANY.1.2.g00007"
FT mRNA complement(join(128366..128445,126919..127115,124188..124
FT 406,123273..123484))
FT /locus_tag="BANY_locus6"
FT /note="source:GenomeHubs"
FT /note="ID:BANY.1.2.t00007"
FT exon complement(128366..128445)
FT /locus_tag="BANY_locus6"
FT /note="source:GenomeHubs"
FT /note="ID:BANY.1.2.t00007-E1"
FT exon complement(126919..127115)
FT /locus_tag="BANY_locus6"
FT /note="source:GenomeHubs"
FT /note="ID:BANY.1.2.t00007-E2"
FT exon complement(124188..124406)
FT /locus_tag="BANY_locus6"
FT /note="source:GenomeHubs"
FT /note="ID:BANY.1.2.t00007-E3"
FT exon complement(123273..123484)
FT /locus_tag="BANY_locus6"
FT /note="source:GenomeHubs"
FT /note="ID:BANY.1.2.t00007-E4"
FT CDS complement(join(<128366..128445,126919..127115,124188..12
FT 4406,123273..>123484))
FT /locus_tag="BANY_locus6"
FT /codon_start=1
FT /note="source:GenomeHubs"
FT /note="ID:BANY.1.2.t00007-CDS"
FT /translation="QKFI*SNIWC*HLVIRS*TTNALTLVCVTFSACRRGSSIRCRVVS
FT LHVAAALSSRAMEIPPRAMTTPL*VSS*QTNMDRE*RASNDRHTVVQRNVWRTCEDRKI
FT DS*RRNSNRKRLSV*GRCR*CCF*MWFR*L**MGSSYKL*FGEKCEIIKISKPIKSHWA
FT KENNLNLNELLSDGEYKELYRLAMIKWSEDMREKDYGCFCRAACENDVSTSNFTVQR*E
FT KVWQRFFN*SLKRK"
FT /transl_table=1
The mRNA feature looks fine, but there are some puzzling < and > characters in the CDS feature that I think may be the problem. The translation is then subsequently messed up, and in fact appears to be the translation for the exons in reverse order, as QKFI* corresponds to the first 4 "codons" of the last exon (E4, 123273..123484).
Hopefully an easy issue, and thanks for a great tool, this is going to be extremely useful :-)
Or maybe there is something funny in the GFF? The entry for this gene is:
BANY00001 GenomeHubs gene 123273 128445 . - . ID=BANY.1.2.g00007
BANY00001 GenomeHubs mRNA 123273 128445 . - . ID=BANY.1.2.t00007;Parent=BANY.1.2.g00007
BANY00001 GenomeHubs exon 128366 128445 . - . ID=BANY.1.2.t00007-E1;Parent=BANY.1.2.t00007
BANY00001 GenomeHubs exon 126919 127115 . - . ID=BANY.1.2.t00007-E2;Parent=BANY.1.2.t00007
BANY00001 GenomeHubs exon 124188 124406 . - . ID=BANY.1.2.t00007-E3;Parent=BANY.1.2.t00007
BANY00001 GenomeHubs exon 123273 123484 . - . ID=BANY.1.2.t00007-E4;Parent=BANY.1.2.t00007
BANY00001 GenomeHubs CDS 128366 128445 . - 0 ID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007
BANY00001 GenomeHubs CDS 126919 127115 . - 2 ID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007
BANY00001 GenomeHubs CDS 124188 124406 . - 1 ID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007
BANY00001 GenomeHubs CDS 123273 123484 . - 1 ID=BANY.1.2.t00007-CDS;Parent=BANY.1.2.t00007
Running biopython version: 1.67 and bcbio-gff version: 0.6.4
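The symptom described above (a translation that starts with the last exon) is what one would expect if minus-strand CDS parts are concatenated in transcript order and only then reverse-complemented. Whether that is the actual cause in EMBLmyGFF3 is not confirmed here. The sketch below, with hypothetical function names and a toy sequence, shows the usual approach: join the parts in ascending genomic order, then reverse-complement the whole.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_cds(genome, parts, strand):
    """Build the 5'->3' coding sequence from 1-based inclusive (start, end) parts.

    Parts are sorted by genomic position first, so the order in which they
    were listed in the GFF does not matter.
    """
    spliced = "".join(genome[start - 1:end] for start, end in sorted(parts))
    return reverse_complement(spliced) if strand == "-" else spliced


genome = "AAATTTCCCGGG"
# Minus-strand gene listed in transcript order: first exon 10..12, last exon 1..3.
parts = [(10, 12), (1, 3)]
correct = spliced_cds(genome, parts, "-")
# Joining in the listed order before reverse-complementing puts the last exon first.
wrong = reverse_complement("".join(genome[s - 1:e] for s, e in parts))
```

Here correct begins with the reverse complement of 10..12 (the first exon of the transcript), while wrong begins with that of 1..3, mirroring the reversed exon order in the report.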
Hi,
I encountered multiple, identical db_xref entries for CDS features with more than one exon. How can I avoid this? Do I really have to limit the db_xref information to one of the multiple CDS entries of one feature in the GFF file?
Thanks in advance
Best regards
Nadine
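One way to avoid repeated qualifier values when several GFF CDS lines are merged into one EMBL feature is to de-duplicate them while keeping their first-seen order. This is a generic sketch with a hypothetical function name, not the tool's implementation.

```python
def unique_in_order(values):
    """Drop repeated values while keeping the order of first appearance."""
    # dict preserves insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(values))
```

For example, applying it to the db_xref values collected from all CDS parts of one feature yields each cross-reference once.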
We have a user who ran the tool on a huge GFF annotation of 3.34 GB. It took ~24 h of computation and apparently used more than 50 GB of memory...
We should investigate how to optimise the speed and the memory usage.
Hi,
Thanks for this software, I'm looking forward to using it. I've just installed it via conda, but I am having some issues that relate to an update of biopython. Can you tell me the versions of biopython and bcbio-gff that you were able to run the software on?
The install I ran today is running into issues with Bio.Alphabet. See the error:
"Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
Here is my detailed conda environment information.
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=1_gnu
- bcbio-gff=0.6.6=pyh864c0ab_1
- biopython=1.78=py37h8f50634_0
- bx-python=0.8.9=py37h73d7ac5_2
- ca-certificates=2020.6.20=hecda079_0
- certifi=2020.6.20=py37hc8dfbb8_0
- emblmygff3=2=py_0
- ld_impl_linux-64=2.35=h769bd43_9
- libblas=3.8.0=17_openblas
- libcblas=3.8.0=17_openblas
- libffi=3.2.1=he1b5a44_1007
- libgcc-ng=9.3.0=h5dbcf3e_17
- libgfortran-ng=7.5.0=hae1eefd_17
- libgfortran4=7.5.0=hae1eefd_17
- libgomp=9.3.0=h5dbcf3e_17
- liblapack=3.8.0=17_openblas
- libopenblas=0.3.10=pthreads_hb3c22a3_4
- libstdcxx-ng=9.3.0=h2ae2ef3_17
- lzo=2.10=h516909a_1000
- ncurses=6.2=he1b5a44_1
- numpy=1.16.4=py37h95a1406_0
- openssl=1.1.1h=h516909a_0
- pip=20.2.3=py_0
- python=3.7.8=h6f2ec95_1_cpython
- python-lzo=1.12=py37h81344f2_1001
- python_abi=3.7=1_cp37m
- readline=8.0=he28a2e2_2
- setuptools=49.6.0=py37hc8dfbb8_1
- six=1.15.0=pyh9f0ad1d_0
- sqlite=3.33.0=h4cf870e_1
- tk=8.6.10=hed695b0_1
- wheel=0.35.1=pyh9f0ad1d_0
- xz=5.2.5=h516909a_1
- zlib=1.2.11=h516909a_1009
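Biopython removed Bio.Alphabet in release 1.78, which matches the biopython=1.78 build in the environment above, so an older Biopython (for example the 1.67 reported elsewhere on this page) should avoid the ImportError. A small sketch of a version guard follows; the helper name has_bio_alphabet is hypothetical and is not part of Biopython or EMBLmyGFF3.

```python
def has_bio_alphabet(version):
    """Return True if this Biopython version string predates the removal of Bio.Alphabet."""
    major, minor = (int(part) for part in version.split(".")[:2])
    # Bio.Alphabet was removed in Biopython 1.78.
    return (major, minor) < (1, 78)


for candidate in ("1.67", "1.77", "1.78"):
    print(candidate, has_bio_alphabet(candidate))
```

In a real environment the string would come from Bio.__version__.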
Hi there,
I was trying to validate the embl files from EMBLmyGFF3 on ENA's webin-cli but I got the below error message.
ERROR: The qualifier "isolation_source" must exist when qualifier "environmental_sample" exists within the same feature.
I used taxid: 77133 (uncultured bacterium) at the time of running EMBLmyGFF3. To give you more of a background, I'm trying to submit a genome annotation file of an uncultivated bacterium. Any help here would be very much appreciated. Thanks in advance for your excellent support (as always!).
Since we have wrapped the tool up as a Python module to ease installation and make sure that people use the correct versions of the dependencies, the JSON mapping files are less easily accessible.
We could require them to be present where the tool is launched. If they are missing, we copy the default JSON files locally and use them.
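The proposal above could be sketched as follows, assuming the defaults live in a directory shipped with the package. The function name ensure_local_mapping and the file name feature_mapping.json are placeholders for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_local_mapping(default_dir, work_dir, names):
    """Copy each default JSON mapping file into work_dir unless it already exists there."""
    copied = []
    for name in names:
        target = Path(work_dir) / name
        if not target.exists():
            shutil.copy(Path(default_dir) / name, target)
            copied.append(name)
    return copied


# Demonstration with temporary directories standing in for the package and the launch directory.
with tempfile.TemporaryDirectory() as defaults, tempfile.TemporaryDirectory() as work:
    (Path(defaults) / "feature_mapping.json").write_text("{}")
    first = ensure_local_mapping(defaults, work, ["feature_mapping.json"])
    second = ensure_local_mapping(defaults, work, ["feature_mapping.json"])
print(first, second)
```

Because an existing local file is never overwritten, a user's edited copy would take precedence over the packaged default on later runs.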
How should you input text that you want to be incorporated into the CC comments or notes line?
Hi,
Thanks for the script; it works well so far. During the process some of my scaffolds were discarded (because they are too short for EBI).
Do you have a reverse script to convert the EMBL to FASTA so that I can recompute some statistics (N50 etc.), or do you know a script that can compute statistics directly from the EMBL format?
Best,
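A rough standard-library sketch of the conversion asked about here, assuming a standard EMBL flat file where each record has an ID line, an SQ header, sequence lines and a closing //. The function names are hypothetical, and the record name is taken from the first field of the ID line, so it will be a placeholder if the file uses one there. Biopython's SeqIO.convert with the "embl" and "fasta" formats is an alternative.

```python
import io


def embl_sequences(handle):
    """Yield (name, sequence) for each record of an EMBL flat file."""
    name, chunks, in_seq = None, [], False
    for line in handle:
        if line.startswith("ID "):
            name = line[2:].split(";")[0].strip()
        elif line.startswith("SQ "):
            in_seq = True
        elif line.startswith("//"):
            yield name, "".join(chunks)
            name, chunks, in_seq = None, [], False
        elif in_seq:
            # Sequence lines hold blocks of bases followed by a running count.
            chunks.append("".join(ch for ch in line if ch.isalpha()))


def n50(lengths):
    """Return the length at which the sorted lengths reach half of the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


demo = io.StringIO(
    "ID   contig1; SV 1; linear; genomic DNA; STD; UNC; 20 BP.\n"
    "XX\n"
    "SQ   Sequence 20 BP; 5 A; 5 C; 5 G; 5 T; 0 other;\n"
    "     acgtacgtac gtacgtacgt                                    20\n"
    "//\n"
    "ID   contig2; SV 1; linear; genomic DNA; STD; UNC; 10 BP.\n"
    "XX\n"
    "SQ   Sequence 10 BP; 3 A; 3 C; 2 G; 2 T; 0 other;\n"
    "     acgtacgtac                                               10\n"
    "//\n"
)
records = list(embl_sequences(demo))
fasta = "\n".join(">%s\n%s" % (name, seq) for name, seq in records)
assembly_n50 = n50([len(seq) for _, seq in records])
print(fasta)
print("N50:", assembly_n50)
```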
Hi,
I am trying to use this tool to make an EMBL file for upload to IMG, but I am having issues with the output. First, is there a way to assign the contig name as the ID? IMG doesn't accept files with XXX as the ID.
Thanks
We should simply not report db_xref entries that are not accepted.
mRNAs with short introns (<10 bp) are not accepted for submission. It would be nice to catch those cases. They would be easy to find by looking at the list of coordinates of the mRNA features.
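A minimal sketch of such a check, assuming the exon coordinates of one mRNA are available as 1-based inclusive (start, end) pairs. The function name short_introns and the example coordinates are hypothetical.

```python
def short_introns(exons, min_len=10):
    """Return (start, end, length) for each intron shorter than min_len.

    exons is a list of 1-based inclusive (start, end) pairs of one mRNA.
    """
    ordered = sorted(exons)
    found = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # The intron is the gap between two consecutive exons.
        length = next_start - prev_end - 1
        if 0 < length < min_len:
            found.append((prev_end + 1, next_start - 1, length))
    return found


# Hypothetical mRNA: the first gap is 4 bp long, the second is 99 bp long.
flagged = short_introns([(100, 200), (205, 300), (400, 500)])
print(flagged)
```

Sorting by start makes the check independent of strand and of the order in which the exons appear in the GFF.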
I am getting the following errors when I try to validate my EMBLmyGFF3-generated flat file through Webin-CLI.
head genome/Pmacd_v0.10/validate/Pmacd_v0.10_ENAsubmit.embl.gz.report
ERROR: "exon" Features locations are duplicated - consider merging qualifiers. [ line: 6951 of Pmacd_v0.10_ENAsubmit.embl.gz, line: 6921 of Pmacd_v0.10_ENAsubmit.embl.gz]
ERROR: "exon" Features locations are duplicated - consider merging qualifiers. [ line: 7321 of Pmacd_v0.10_ENAsubmit.embl.gz, line: 7283 of Pmacd_v0.10_ENAsubmit.embl.gz]
ERROR: "exon" Features locations are duplicated - consider merging qualifiers. [ line: 13785 of Pmacd_v0.10_ENAsubmit.embl.gz, line: 13757 of Pmacd_v0.10_ENAsubmit.embl.gz]
My gff has the format:
Pmacd_v0.10_Sc0000027_pilon AUGUSTUS gene 14395 28338 . - . ID=PmacdG00000006135
Pmacd_v0.10_Sc0000027_pilon AUGUSTUS transcript 14395 28338 . - . ID=PmacdG00000006135.1;Parent=PmacdG00000006135
Pmacd_v0.10_Sc0000027_pilon AUGUSTUS exon 14395 14538 . - . ID=PmacdG00000006135.1-exon1;Parent=PmacdG00000006135.1
Pmacd_v0.10_Sc0000027_pilon AUGUSTUS exon 25250 25354 . - . ID=PmacdG00000006135.1-exon2;Parent=PmacdG00000006135.1
Pmacd_v0.10_Sc0000027_pilon AUGUSTUS exon 28297 28338 . - . ID=PmacdG00000006135.1-exon3;Parent=PmacdG00000006135.1
Pmacd_v0.10_Sc0000027_pilon AUGUSTUS CDS 14395 14538 . - 0 ID=PmacdG00000006135.1-cds1;Parent=PmacdG00000006135.1
Pmacd_v0.10_Sc0000027_pilon AUGUSTUS CDS 25250 25354 . - 0 ID=PmacdG00000006135.1-cds1;Parent=PmacdG00000006135.1
I ran EMBLmyGFF3 using the command:
/home/racste/.local/bin/EMBLmyGFF3 Pmacd_v0.10_braker_gffread_merge_mod_nogeneID.gff Pmacd_v0.10.fasta \
--topology linear \
--molecule_type 'genomic DNA' \
--transl_table 1 \
--species 'Pieris macdunnoughii' \
--project_id PRJEB42400 \
-o result.embl \
--locus_tag PMACD
Of course my first thought was to remove the duplicates with agat_sp_fix_features_locations_duplicated.pl, but there were no duplicates detected:
=> OmniscientI total time: 198 seconds
Pmacd_v0.10_braker_gffread_merge_mod_nogeneID.gff file parsed
We found 0 cases where isoforms have identical exon structures (we removed duplicates by keeping the one with longest CDS).
We found 0 cases where l2 from different gene identifier have identical exon but no CDS at all (we removed one duplicate).
We found 0 cases where l2 from different gene identifier have identical exon and CDS structures (we removed duplicates by keeping the one with longest CDS).
We found 0 cases where l2 from different gene identifier have identical exon structures (we reshaped UTRs to modify gene locations).
Whe removed 0 genes because no more l2 were linked to them.
We found 0 cases where 2 genes have same location while CDS are differents. In that case we modified the gene locations by clipping UTRs.
Here's an example of one of the overlapping features from all three files:
# webin-CLI report
ERROR: "exon" Features locations are duplicated - consider merging qualifiers. [ line: 6951 of Pmacd_v0.10_ENAsubmit.embl.gz, line: 6921 of Pmacd_v0.10_ENAsubmit.embl.gz]
# EMBLmyGFF3 output
#line: 6921-6924 of Pmacd_v0.10_ENAsubmit.embl.gz
FT exon complement(2756392..2756483)
FT /locus_tag="PMACD_LOCUS154"
FT /note="ID:PmacdG00000009802.2-exon5"
FT /note="source:AUGUSTUS"
#line: 6951-6954 of Pmacd_v0.10_ENAsubmit.embl.gz
FT exon complement(2756392..2756483)
FT /locus_tag="PMACD_LOCUS155"
FT /note="ID:PmacdG00000009803.1-exon2"
FT /note="source:GeneMark.hmm"
# gff3 input
# relevant exon is starred (**)
grep PmacdG00000009802 *nogeneID.gff
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm gene 2751271 2756483 . - . ID=PmacdG00000009802
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm transcript 2751271 2752952 . - . ID=PmacdG00000009802.1;Parent=PmacdG00000009802
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm exon 2751271 2751537 0 - . ID=PmacdG00000009802.1-exon1;Parent=PmacdG00000009802.1
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm exon 2752178 2752342 0 - . ID=PmacdG00000009802.1-exon2;Parent=PmacdG00000009802.1
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm exon 2752767 2752952 0 - . ID=PmacdG00000009802.1-exon3;Parent=PmacdG00000009802.1
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm CDS 2751271 2751537 . - 0 ID=PmacdG00000009802.1-cds1;Parent=PmacdG00000009802.1
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm CDS 2752178 2752342 . - 0 ID=PmacdG00000009802.1-cds1;Parent=PmacdG00000009802.1
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm CDS 2752767 2752952 . - 0 ID=PmacdG00000009802.1-cds1;Parent=PmacdG00000009802.1
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS transcript 2751271 2756483 . - . ID=PmacdG00000009802.2;Parent=PmacdG00000009802
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS exon 2751271 2751537 . - . ID=PmacdG00000009802.2-exon1;Parent=PmacdG00000009802.2
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS exon 2752178 2752342 . - . ID=PmacdG00000009802.2-exon2;Parent=PmacdG00000009802.2
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS exon 2752767 2752930 . - . ID=PmacdG00000009802.2-exon3;Parent=PmacdG00000009802.2
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS exon 2755781 2755980 . - . ID=PmacdG00000009802.2-exon4;Parent=PmacdG00000009802.2
** Pmacd_v0.10_Sc0000000_pilon AUGUSTUS exon 2756392 2756483 . - . ID=PmacdG00000009802.2-exon5;Parent=PmacdG00000009802.2
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS CDS 2751271 2751537 . - 0 ID=PmacdG00000009802.2-cds1;Parent=PmacdG00000009802.2
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS CDS 2752178 2752342 . - 0 ID=PmacdG00000009802.2-cds1;Parent=PmacdG00000009802.2
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS CDS 2752767 2752930 . - 2 ID=PmacdG00000009802.2-cds1;Parent=PmacdG00000009802.2
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS CDS 2755781 2755980 . - 1 ID=PmacdG00000009802.2-cds1;Parent=PmacdG00000009802.2
Pmacd_v0.10_Sc0000000_pilon AUGUSTUS CDS 2756392 2756483 . - 0 ID=PmacdG00000009802.2-cds1;Parent=PmacdG00000009802.2
# relevant exon is starred (**)
grep PmacdG00000009803 *nogeneID.gff
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm gene 2755740 2756483 . - . ID=PmacdG00000009803
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm transcript 2755740 2756483 . - . ID=PmacdG00000009803.1;Parent=PmacdG00000009803
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm exon 2755740 2755980 0 - . ID=PmacdG00000009803.1-exon1;Parent=PmacdG00000009803.1
** Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm exon 2756392 2756483 0 - . ID=PmacdG00000009803.1-exon2;Parent=PmacdG00000009803.1
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm CDS 2755740 2755980 . - 1 ID=PmacdG00000009803.1-cds1;Parent=PmacdG00000009803.1
Pmacd_v0.10_Sc0000000_pilon GeneMark.hmm CDS 2756392 2756483 . - 0 ID=PmacdG00000009803.1-cds1;Parent=PmacdG00000009803.1
Do you have any other suggestions of how to fix this error so I can validate and submit my flat file?
Thanks!
Rachel
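In the example above, the two starred exons belong to different genes (PmacdG00000009802 and PmacdG00000009803) but have the same coordinates and strand. A small sketch for listing every such case in a GFF3 file, assuming tab-separated columns; the function name duplicated_locations is hypothetical and this only reports the cases, it does not resolve them.

```python
from collections import defaultdict


def duplicated_locations(gff_lines, feature_type="exon"):
    """Map (seqid, start, end, strand) to feature IDs when a location is used more than once."""
    by_location = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip comments and anything that is not a nine-column feature line of the wanted type.
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair)
        key = (cols[0], int(cols[3]), int(cols[4]), cols[6])
        by_location[key].append(attrs.get("ID", "?"))
    return {loc: ids for loc, ids in by_location.items() if len(ids) > 1}


demo = [
    "Pmacd_v0.10_Sc0000000_pilon\tAUGUSTUS\texon\t2756392\t2756483\t.\t-\t.\tID=PmacdG00000009802.2-exon5;Parent=PmacdG00000009802.2",
    "Pmacd_v0.10_Sc0000000_pilon\tGeneMark.hmm\texon\t2755740\t2755980\t0\t-\t.\tID=PmacdG00000009803.1-exon1;Parent=PmacdG00000009803.1",
    "Pmacd_v0.10_Sc0000000_pilon\tGeneMark.hmm\texon\t2756392\t2756483\t0\t-\t.\tID=PmacdG00000009803.1-exon2;Parent=PmacdG00000009803.1",
]
duplicates = duplicated_locations(demo)
for location, ids in duplicates.items():
    print(location, ids)
```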