Comments (14)

Juke34 commented on June 21, 2024

Hello,
The tool gives this error message:

15:55:39 ERROR feature: >>match_part<< is not a valid EMBL feature type. You can ignore this message if you don't need it.
Otherwise tell me which EMBL feature it corresponds to by adding the information within the json mapping file.

As you only have match_part features (3rd column) in your file, and this feature type is not a valid EMBL feature type, all your features are skipped.

You must add an entry to the translation_gff_feature_to_embl_feature.json file, like this example:

"transcript": {
"target": "mRNA"
}

i.e. the transcript feature type from the GFF3 will be translated into the mRNA EMBL feature type.

So for you it will be:

"match_part": {
"target": "XXX"
}

where XXX will be the corresponding EMBL feature type. Have a look here https://www.ebi.ac.uk/ena/WebFeat/ to see if one of the EMBL features corresponds to what you want to describe.
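The edit described above can also be scripted. The following is only a minimal sketch: the helper name add_feature_mapping is hypothetical (it is not part of EMBLmyGFF3), and it assumes the json file is a flat object whose entries look like the "transcript" example.

```python
import json
from pathlib import Path


def add_feature_mapping(json_path, gff_feature_type, embl_feature_type):
    """Add or overwrite one GFF3 feature type -> EMBL feature type entry.

    Hypothetical helper, not part of EMBLmyGFF3. Assumes each entry has the
    shape {"target": "<EMBL feature type>"} as in the "transcript" example.
    """
    path = Path(json_path)
    mapping = json.loads(path.read_text(encoding="utf-8"))
    # Keep any other keys already stored for this feature type.
    mapping.setdefault(gff_feature_type, {})["target"] = embl_feature_type
    path.write_text(
        json.dumps(mapping, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return mapping
```

For example, add_feature_mapping("translation_gff_feature_to_embl_feature.json", "match_part", "XXX"), with XXX replaced by the EMBL feature type you chose.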

Best regards,

Jacques

from emblmygff3.

arsilan324 commented on June 21, 2024

Hi,

Thanks for the prompt reply. I would like to show you a detailed gff3 file which does include valid features (see attached file). But the problem remains as it is: there is no description of the annotation in the output. Both files are attached.
example-embl.txt
example-gff3.txt

Juke34 commented on June 21, 2024

It is the same thing as I described previously, this time for the features “protein_hmm_match”, “translated_nucleotide_match”, etc. They are all listed in the standard output (in your terminal). All the other ones (the valid features) are present in your EMBL output.

arsilan324 commented on June 21, 2024

I agree with that. But I have probably not explained my point well. I would like to include the annotation information such as the names of genes, i.e. Name=Pkinase, or Name=Stress-antifung, etc. Don't you think that this information is important?
Thanks

Juke34 commented on June 21, 2024

Ok, I understand what you mean: you are talking about the gff3 attributes (key=value), which are called qualifiers in the EMBL format. Yes, they are important, and yes, there is a way to include them. It works in the same way as for the features, this time using the translation_gff_attribute_to_embl_qualifier.json file.

All those that are skipped are listed in the standard output with messages like this one:

15:55:39 WARNING feature: Unknown qualifier 'hin' - skipped

Have a look at the paragraph "GFF3 Attribute to EMBL qualifier" in the Readme.
Let me know if I answered your question.
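To get an overview of everything that was skipped, the terminal output can be filtered. This is a rough sketch only: collect_skipped is a hypothetical helper, and the two patterns are modelled solely on the message wording quoted in this thread, so they may need adjusting if the tool words its messages differently.

```python
import re

# Patterns modelled on the two messages quoted in this thread (assumption).
FEATURE_RE = re.compile(r">>(?P<name>[^<]+)<< is not a valid EMBL feature type")
QUALIFIER_RE = re.compile(r"Unknown qualifier '(?P<name>[^']+)' - skipped")


def collect_skipped(log_text):
    """Return (skipped feature types, skipped qualifiers) found in a log."""
    features, qualifiers = set(), set()
    for line in log_text.splitlines():
        match = FEATURE_RE.search(line)
        if match:
            features.add(match.group("name"))
            continue
        match = QUALIFIER_RE.search(line)
        if match:
            qualifiers.add(match.group("name"))
    return sorted(features), sorted(qualifiers)
```

The first list holds the feature types that need an entry in translation_gff_feature_to_embl_feature.json, the second the attributes that need one in translation_gff_attribute_to_embl_qualifier.json.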

arsilan324 commented on June 21, 2024

Yes. This is exactly what I am missing... But I think I am missing all of them. For example, take this annotated line from the GFF3 file:

Transcript_100002 HMMER protein_hmm_match 1 123 6e-49 . . ID=homology:175096;Name=Pkinase_Tyr;Target=Pkinase_Tyr 4 259 +;Note=Protein tyrosine kinase;accuracy=0.84;env_coords=1 123;Dbxref="Pfam:PF07714.13"

and I can see that "Name" and "Target" are already in the .json file

 "Name": {
   "source description": "Display name for the feature. This is the name to be displayed to the user. Unlike IDs, there is no requirement that the Name be unique within the file.",
   "target": "standard_name",
   "dev comment": ""
 },


 "Target": {
   "source description": "Indicates the target of a nucleotide-to-nucleotide or protein-to-nucleotide alignment. The format of the value is \"target_id start end [strand]\", where strand is optional and may be \"+\" or \"-\". If the target_id contains spaces, they must be escaped as hex escape %20.",
   "target": "",
   "dev comment": "for now this is un-mapped in the final EMBL"
 },

So what is the error? Why is it not being picked up?
Thanks

arsilan324 commented on June 21, 2024

Okay, I got it... I have now modified the .json file as follows:

{
 "_comment":{"source description": "The type of the feature (previously called the \"method\"). This is constrained to be either a term from the Sequence Ontology or an SO accession number. The latter alternative is distinguished using the syntax SO:000000. In either case, it must be sequence_feature (SO:0000110) or an is_a child of it."},
 "five_prime_UTR": {
   "target": "5'UTR"
 },
 "three_prime_UTR": {
   "target": "3'UTR"
 },
 "exon": {

 },
  "protein_hmm_match": {
   "target": "standard_name"
 },
 "transcript": {
 	"target": "mRNA"
 }
}

But it is still giving the same error. I assume that I am not making any mistake... Would you please comment?

Juke34 commented on June 21, 2024

I understand, it's a bit confusing. You are mixing up the feature type and the attributes, which in GFF3 terms come from the 3rd column and the 9th column respectively. You have to be clear about those terms and the corresponding ones in the EMBL format.

Here is the layout of one feature (= 1 line) in gff3:

seqid(col1) source(col2) feature_type(col3) start(col4) end(col5) score(col6) strand(col7) phase(col8) attributes(col9)

Where attributes is a list of the form key_attribute1=value_attribute1;key_attribute2=value_attribute2.
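To make the column layout concrete, here is a small sketch that splits one GFF3 line into the feature type (column 3) and the attributes (column 9). parse_gff3_line is a hypothetical helper, not code from EMBLmyGFF3; it assumes tab-separated columns and does not decode percent-escapes. The example line is the protein_hmm_match line quoted earlier in the thread, shortened, with tabs restored.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into its 9 columns and its attributes."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    seqid, source, feature_type, start, end, score, strand, phase, raw = columns
    attributes = {}
    for pair in raw.split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip()
    return {
        "seqid": seqid,
        "source": source,
        "feature_type": feature_type,  # column 3: must map to an EMBL feature type
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,      # column 9: each key may map to an EMBL qualifier
    }


EXAMPLE_LINE = (
    "Transcript_100002\tHMMER\tprotein_hmm_match\t1\t123\t6e-49\t.\t.\t"
    "ID=homology:175096;Name=Pkinase_Tyr;Target=Pkinase_Tyr 4 259 +"
)
```

Calling parse_gff3_line(EXAMPLE_LINE) gives protein_hmm_match as the feature type, while Name and Target end up as keys of the attributes dictionary.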

In EMBL terms, the feature type and the attributes from gff3 are called feature type and qualifiers respectively, and will look like this in EMBL format:

> FT   feature_type            complement(1..2174)
> FT                           /key_attribute1=value_attribute1
> FT                           /key_attribute2=value_attribute2

If your modification (protein_hmm_match => standard_name) is taken into account by the tool, it will complain that standard_name is not a feature accepted by EMBL. Indeed, the feature type protein_hmm_match coming from your gff3 has to map to a feature type from EMBL. Look again here: https://www.ebi.ac.uk/ena/WebFeat/ and you will see that standard_name is a qualifier (which can be used to map an attribute from the gff3 file) and not a feature.

As the qualifiers are linked to their feature type in EMBL, it means when we skip a feature (because the feature_type doesn't exist in EMBL) from the gff3 file, we skip all its attributes too.
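A quick sanity check along these lines could catch a qualifier name used as a feature target before the tool is run. This is a sketch under stated assumptions: check_feature_targets is a hypothetical helper, and EMBL_FEATURE_TYPES_SUBSET is only a small illustrative subset of the EMBL feature types, not the full list from the WebFeat page.

```python
# Small illustrative subset only; the complete list is on the WebFeat page.
EMBL_FEATURE_TYPES_SUBSET = frozenset(
    {"CDS", "mRNA", "gene", "exon", "intron", "5'UTR", "3'UTR", "misc_feature"}
)


def check_feature_targets(mapping, known_feature_types=EMBL_FEATURE_TYPES_SUBSET):
    """Return {gff_feature_type: target} for targets not in known_feature_types."""
    problems = {}
    for gff_type, entry in mapping.items():
        if gff_type.startswith("_"):
            continue  # skip bookkeeping keys such as "_comment"
        target = entry.get("target")
        if not target:
            continue  # entries without a target (e.g. "exon": {}) are left alone here
        if target not in known_feature_types:
            problems[gff_type] = target
    return problems
```

With the mapping posted above, this reports protein_hmm_match -> standard_name, because standard_name is a qualifier and not a feature type. A target missing from the subset is not necessarily wrong; it only means it should be checked against the full list.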

One real problem I'm realising while talking with you: modifying the json files is not as easy as it used to be, since we packaged the tool to make it easy to install. For the tool to take a modification of a json file into account, you need to modify the file in your EMBLmyGFF3 github repository and relaunch python setup.py install. Alternatively, launch a provided example such as EMBLmyGFF3-maker-example, look at the installation path that is displayed for the example files, and modify the file directly there (in that case you do not need to rerun python setup.py install).

arsilan324 commented on June 21, 2024

I realised my mistake now! Thank you for the detailed reply.
But the next thing I do not understand is how to modify the .json file. I can see that it is in the "modules" folder inside "EMBLmyGFF3". I modified it, re-ran the python setup.py install command and ran my command again. But again, same error. I do not understand what "EMBLmyGFF3 github repository" means. Do you mean the directory? If yes, then I have already changed the file name there...

Also, I do not understand how to get the installation path by launching the example EMBLmyGFF3-maker-example. As far as I know, I can already see the .json files, so I know the path as well... I did modify the files but I am getting the same error.

Thanks

PS: I have now modified it as follows, to check whether at least the error for "protein_hmm_match" disappears, but it is still there:


  "protein_hmm_match": {
   "target": "CDS"
 },

Juke34 commented on June 21, 2024

Yes, the EMBLmyGFF3 github repository is the folder called "EMBLmyGFF3" created when you run the command git clone https://github.com/NBISweden/EMBLmyGFF3.git.

What you have done should work. Could you copy and paste the output log from the terminal to show me? Use the --output option to be sure that the EMBL output is written to a file, so that only the Warning/Error messages are displayed in the terminal.

Just for explanation:
Running python setup.py install applies the modification to the EMBLmyGFF3 "module", which lives somewhere on your computer. When you call the EMBLmyGFF3 command, it is this python "module" that is actually called. If you want to modify the module's files directly, you can locate them by launching an example like EMBLmyGFF3-maker-example, because the first thing the example does is show you the real command launched. Something like this:

Running the following command: EMBLmyGFF3 --rg REFERENCE_GROUP -i MY_LOCUS_TAG -p 17285 -m "genomic DNA" -r 1 -t linear -s "Drosophila melanogaster" -x INV -o EMBLmyGFF3-maker-example.embl /Users/UserName/Path/To/The/Python/Module/examples/maker.gff3 /Users/UserName/Path/To/The/Python/Module/examples/maker.fa

Where /Users/UserName/Path/To/The/Python/Module is the path to the real folder containing the code used by the EMBLmyGFF3 command. So the file will be /Users/UserName/Path/To/The/Python/Module/EMBLmyGFF3/modules/translation_gff_attribute_to_embl_qualifier.json.

This approach is useful when the installation was done using pip install and you don't have the "EMBLmyGFF3 github repository" on your computer.

The 2 easy ways to apply your modification:

  1. Make your modification in the EMBLmyGFF3 github repository, apply it with the command python setup.py install, then use the tool EMBLmyGFF3.
  2. Make your modification in the EMBLmyGFF3 github repository, then launch EMBLmyGFF3 from the EMBLmyGFF3 github repository folder (called EMBLmyGFF3) using this specific syntax: python -m EMBLmyGFF3.
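The installed location can also be asked from Python directly, without launching the example. This is a sketch only: mapping_dir is a hypothetical helper, and it assumes that the installed package is importable under the name EMBLmyGFF3 and keeps its json files in a modules subfolder, as the path above suggests.

```python
import importlib.util
from pathlib import Path


def mapping_dir(package_name="EMBLmyGFF3", subdir="modules"):
    """Guess the folder holding the json mapping files of an installed package.

    Returns None if the package cannot be found. The package name and the
    "modules" subfolder are assumptions taken from the path quoted above.
    """
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the package folder.
    return Path(spec.origin).parent / subdir
```

Printing mapping_dir() with the same python interpreter that runs the EMBLmyGFF3 command shows which copy of the json files that installation would be expected to read.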

arsilan324 commented on June 21, 2024

I have had no success with the first approach. I believe I am following it exactly. Here is what I have done.

  1. Modified .json file
  2. run the python setup.py install
  3. and run the command
    ubt80:EMBL arslan$ EMBLmyGFF3 for_embl.gff3 juncus-rp.fasta --topology linear --molecule_type 'genomic DNA' --transl_table 1 --species 'Juncus effusus' --locus_tag MY_LOCUS_TAG --project_id PRJXXXXXXX -o result.embl

The output I got still contains this:
11:58:46 ERROR feature: >>protein_hmm_match<< is not a valid EMBL feature type. You can ignore this message if you don't need it.
although I have added the info to the .json file as


{
 "_comment":{"source description": "The type of the feature (previously called the \"method\"). This is constrained to be either a term from the Sequence Ontology or an SO accession number. The latter alternative is distinguished using the syntax SO:000000. In either case, it must be sequence_feature (SO:0000110) or an is_a child of it."},
 "five_prime_UTR": {
   "target": "5'UTR"
 },
 "three_prime_UTR": {
   "target": "3'UTR"
 },
 "exon": {

 },
  "protein_hmm_match": {
   "target": "CDS"
 },
 "transcript": {
 	"target": "mRNA"
 }
}

I am not able to follow the procedure you mentioned for the example case at all :(

Juke34 commented on June 21, 2024

We will definitely make it easier to modify those json files in the near future. They are important because they are what makes the tool universal.

So, to come back to the problem:
Try to uninstall the tool with pip uninstall EMBLmyGFF3 (twice to be sure), then relaunch python setup.py install. Then launch the tool again to see if it takes your modification into account this time.

Otherwise, just launch your command with python -m in front, from the EMBLmyGFF3 folder. It should use the local files (so the modified one too).

arsilan324 commented on June 21, 2024

This uninstallation and re-installation worked perfectly! Thanks a lot once again!

Juke34 commented on June 21, 2024

You're welcome !
