middle_initial in submol.yaml (pgap issue, closed)

ncbi commented on August 19, 2024
middle_initial in submol.yaml

from pgap.

Comments (10)

azat-badretdin commented on August 19, 2024

Update:

> middle_initial in contact_info leads to a crash

We added middle_initial support to contact_info in the 2020-02-06.build4373 release.

TheBigFatTony commented on August 19, 2024

I accidentally published the issue before completing it. Do you have enough information now?

azat-badretdin commented on August 19, 2024

Yes. Thank you! I am adding internal ticket labels right now.

azat-badretdin commented on August 19, 2024

> Also, it would be great if pgap gave specific error messages

Understood. Sometimes it is easy to fix them, sometimes it is harder. Feel free to open a new issue for particular error messages that need work.

thibaudnis commented on August 19, 2024

The documentation regarding contact_info has been updated. Thank you for reporting the problem!
PS: we are working on the other issues you mentioned.

TheBigFatTony commented on August 19, 2024

thanks for the rapid response!

it might be useful to programmatically formulate the requirements in pgap.py and to check them before running the docker pipeline. this would lead to immediate and clear error messages when the user is at fault. (inspired by design by contract.)
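a minimal sketch of what such a pre-flight check could look like, assuming submol.yaml has already been parsed into a dict. the function name `check_contact_info` and the list of required fields are hypothetical and are not taken from pgap.py or from PGAP's actual schema:

```python
from typing import Dict, List

# Hypothetical list of required fields; PGAP's real requirements may differ.
REQUIRED_CONTACT_FIELDS = ("first_name", "last_name", "email", "organization")


def check_contact_info(submol: Dict) -> List[str]:
    """Return human-readable problems; an empty list means the check passed."""
    contact = submol.get("contact_info")
    if not isinstance(contact, dict):
        return ["contact_info: section is missing or is not a mapping"]

    problems = []
    for field in REQUIRED_CONTACT_FIELDS:
        value = contact.get(field)
        # Treat missing, non-string, and blank values as the same kind of error.
        if not isinstance(value, str) or not value.strip():
            problems.append(
                f"contact_info.{field}: required, must be a non-empty string"
            )
    return problems


if __name__ == "__main__":
    example = {"contact_info": {"first_name": "Jane", "last_name": ""}}
    for problem in check_contact_info(example):
        print(problem)
```

running checks like this before the docker pipeline starts would let the wrapper stop early and name the offending field, instead of failing deep inside a workflow step.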

azat-badretdin commented on August 19, 2024

> I also noticed that characters such as 'í' lead to crashes when used in names. (However, 'ä' in street seems to work.)

Could you please share two corresponding submol.yaml files for these cases?

azat-badretdin commented on August 19, 2024

Meanwhile, I tried adding i-acute and a-umlaut to submol.yaml, and both cases failed at the yaml2json.py step.

That script is suspect number one now.
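One common way a conversion script can trip over such characters is by decoding the file with an ASCII codec (for example, when the container's locale is not UTF-8). The sketch below only illustrates that general failure mode and a UTF-8-safe alternative; it is not yaml2json.py, and whether that script fails for this reason is not established here:

```python
import json

# Bytes as a UTF-8 encoded submol.yaml line would be stored on disk.
raw = "last_name: 'Schönbächler'".encode("utf-8")

# Decoding as ASCII fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding explicitly as UTF-8 works regardless of the locale.
text = raw.decode("utf-8")

# ensure_ascii=False keeps the characters as-is instead of \uXXXX escapes.
as_json = json.dumps({"line": text}, ensure_ascii=False)

print("ASCII decode succeeded:", ascii_ok)
print(as_json)
```

If this is the cause, opening the input with an explicit `encoding="utf-8"` rather than relying on the locale default would be the usual fix.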

TheBigFatTony commented on August 19, 2024

> Could you please share two corresponding submol.yaml files for these cases?

í in names -> failure:

the error message is:

(...)
    and not(contains(@code, "SEQ_PKG_ComponentMissingTitle")) 
    and not(contains(@code, "SEQ_DESCR_ChromosomeLocation")) 
    and not(contains(@code, "SEQ_DESCR_MissingLineage")) 
    and not(contains(@code, "SEQ_DESCR_NoTaxonID")) 
    and not(contains(@code, "SEQ_FEAT_ShortIntron")) 
]
'
Failer nodes:
<?xml version="1.0" encoding="UTF-8"?>
<message severity="ERROR" seq-id="lcl|L43967.2" code="SEQ_FEAT_BadCharInAuthorLastName">Bad characters in author Sch##nb##chler</message>

[2019-11-19 14:51:09] WARNING [job Prepare_Unannotated_Sequences_asnvalidate_evaluate] completed permanentFail
[2019-11-19 14:51:09] WARNING [step Prepare_Unannotated_Sequences_asnvalidate_evaluate] completed permanentFail
[2019-11-19 14:51:09] INFO [workflow standard_pgap] completed permanentFail
[2019-11-19 14:51:09] WARNING [step standard_pgap] completed permanentFail
[2019-11-19 14:51:09] INFO [workflow ] completed permanentFail
[2019-11-19 14:51:09] WARNING Final process status is permanentFail
{
    "gbk": null,
    "gff": null,
(...)

this is the corresponding submol.yaml:

topology: circular
comment: 'There is no really a biologist Arnold Schwarzenegger'
consortium: 'SkyNet consortium'
sra:
    - accession: 'ERR2193926'
tp_assembly: true
organism:
    genus_species: 'Mycoplasma genitalium' 
    strain: 'replaceme'
contact_info:
    last_name: 'Schönbächler'
    first_name: 'Jane'
    email: '[email protected]'
    organization: 'Institute of Klebsiella foobarensis research'
    department: 'Department of Using NCBI'
    phone: '301-555-0245'
    street: '1234 Main St'
    city: 'Docker'
    postal_code: '12345'
    country: 'Lappland'
    
authors:
    -     author:
            first_name: 'Arnold'
            last_name: 'Schwarzenegger'
    -     author:
            first_name: 'Linda'
            last_name: 'Hamilton'
bioproject: 'PRJNA9999999'
biosample: 'SAMN99999999'      
# -- Locus tag prefix - optional. Limited to 9 letters. Unless the locus tag prefix was officially assigned by NCBI, ENA, or DDBJ, it will be replaced upon submission of the annotation to NCBI and is therefore temporary and not to be used in publications. If not provided, pgaptmp will be used.
locus_tag_prefix: 'tmp'
publications:
    - publication:
        pmid: 16397293
        title: 'Discrete CHARMm of Klebsiella foobarensis. Journal of Improbable Results, vol. 34, issue 13, pages: 10001-100005, 2018'
        status: published  # this is enum: controlled vocabulary
        authors:
            - author:
                first_name: 'Arnold'
                last_name: 'Schwarzenegger'
            - author:
                  first_name: 'Linda'
                  last_name: 'Hamilton'

umlauts in street and author names -> success:

topology: circular
comment: 'There is no really a biologist Arnold Schwarzenegger'
consortium: 'SkyNet consortium'
sra:
    - accession: 'ERR2193926'
tp_assembly: true
organism:
    genus_species: 'Mycoplasma genitalium' 
    strain: 'replaceme'
contact_info:
    last_name: 'Hamilton'
    first_name: 'Jane'
    email: '[email protected]'
    organization: 'Institute of Klebsiella foobarensis research'
    department: 'Department of Using NCBI'
    phone: '301-555-0245'
    street: '1234 Mäín Ströüt'
    city: 'Docker'
    postal_code: '12345'
    country: 'Lappland'
    
authors:
    -     author:
            first_name: 'Arnold'
            last_name: 'Schwarzenegger'
    -     author:
            first_name: 'Linda'
            last_name: 'Hamilton'
bioproject: 'PRJNA9999999'
biosample: 'SAMN99999999'      
# -- Locus tag prefix - optional. Limited to 9 letters. Unless the locus tag prefix was officially assigned by NCBI, ENA, or DDBJ, it will be replaced upon submission of the annotation to NCBI and is therefore temporary and not to be used in publications. If not provided, pgaptmp will be used.
locus_tag_prefix: 'tmp'
publications:
    - publication:
        pmid: 16397293
        title: 'Discrete CHARMm of Klebsiella foobarensis. Journal of Improbable Results, vol. 34, issue 13, pages: 10001-100005, 2018'
        status: published  # this is enum: controlled vocabulary
        authors:
            - author:
                first_name: 'Arnold'
                last_name: 'Schönbächler'
            - author:
                  first_name: 'Linda'
                  last_name: 'Hamilton'

azat-badretdin commented on August 19, 2024

Thanks! That is very helpful. I see now that in your environment this case magically advances further than in my environment (where it breaks at yaml2json converter). I added European characters to our TeamCity case and we should be able to push this through soon.
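Until such characters are handled, a user-side scan of the parsed submol.yaml could flag name fields that may trigger SEQ_FEAT_BadCharInAuthorLastName. This is a rough sketch: the helper `find_non_ascii_names` and the `NAME_FIELDS` tuple are hypothetical, and the check is deliberately conservative, since in the files above the same last name failed in contact_info but passed in a publication author list.

```python
# Hypothetical set of keys treated as "name" fields.
NAME_FIELDS = ("first_name", "last_name", "middle_initial")


def find_non_ascii_names(node, path=""):
    """Recursively collect (path, value) for name fields with non-ASCII text."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if key in NAME_FIELDS and isinstance(value, str) and not value.isascii():
                hits.append((child, value))
            else:
                hits.extend(find_non_ascii_names(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_non_ascii_names(item, f"{path}[{index}]"))
    return hits


if __name__ == "__main__":
    submol = {
        "contact_info": {
            "last_name": "Schönbächler",
            "first_name": "Jane",
            "street": "1234 Mäín Ströüt",  # not a name field, so not flagged
        },
        "authors": [{"author": {"first_name": "Arnold", "last_name": "Schwarzenegger"}}],
    }
    for where, value in find_non_ascii_names(submol):
        print(f"{where}: {value!r} contains non-ASCII characters")
```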
