Comments (10)
Update:
middle_initial in contact_info leads to a crash
We added middle_initial support in contact_info in the 2020-02-06.build4373 release.
from pgap.
I accidentally published the issue before completing it. Do you have enough information now?
Yes. Thank you! I am adding internal ticket labels right now.
Also, it would be great if pgap gave specific error messages
Understood. Sometimes it is easy to fix them, sometimes it is harder. Feel free to open a new issue for particular error messages that need work.
The documentation regarding contact_info has been updated. Thank you for reporting the problem!
PS: we are working on the other issues you mentioned.
thanks for the rapid response!
It might be useful to formulate the requirements programmatically in pgap.py and to check them before running the Docker pipeline. This would lead to immediate and clear error messages when the user is at fault (inspired by design by contract).
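To illustrate the kind of pre-flight check suggested here, below is a minimal sketch. It is not part of pgap.py: the function name `check_submol` and the list of required keys are hypothetical, chosen from the example submol.yaml in this thread rather than from an official PGAP schema. The non-ASCII check is likewise an assumption, motivated only by the SEQ_FEAT_BadCharInAuthorLastName error reported in this thread. The input is assumed to be the already-parsed YAML as a Python dict.

```python
# Hypothetical pre-flight check for a parsed submol.yaml; not part of pgap.py.
# ASSUMPTION: the required keys are taken from the example file in this
# thread, not from an official PGAP schema.
REQUIRED_CONTACT_KEYS = ("last_name", "first_name", "email", "organization")


def check_submol(meta):
    """Return a list of human-readable problems found in a parsed submol dict."""
    contact = meta.get("contact_info")
    if not isinstance(contact, dict):
        return ["contact_info: section is missing or is not a mapping"]

    problems = []

    # Every required key must be present and hold a non-empty string.
    for key in REQUIRED_CONTACT_KEYS:
        value = contact.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"contact_info.{key}: missing or empty")

    # ASSUMPTION: non-ASCII characters in names are reported up front,
    # because a later validation step rejected them in this thread.
    for key in ("first_name", "last_name"):
        value = contact.get(key)
        if isinstance(value, str) and not value.isascii():
            bad = sorted({ch for ch in value if not ch.isascii()})
            problems.append(
                f"contact_info.{key}: non-ASCII characters {bad} in {value!r}"
            )

    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "contact_info": {
            "last_name": "Schönbächler",
            "first_name": "Jane",
            "organization": "Institute of Klebsiella foobarensis research",
        }
    }
    for problem in check_submol(example):
        print("ERROR:", problem)
```

Returning a list instead of raising on the first problem lets the user see and fix everything in one pass before the Docker pipeline starts.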
I also noticed that characters such as 'í' lead to crashes when used in names. (However, 'ä' in street seems to work.)
Could you please provide two corresponding submol.yaml files for these cases?
Meanwhile I tried adding i-acute and a-umlaut to submol.yaml, and both cases failed at the yaml2json.py step. That script is now suspect number one.
Could you please provide two corresponding submol.yaml files for these cases?
í in names -> failure:
the error message is:
(...)
and not(contains(@code, "SEQ_PKG_ComponentMissingTitle"))
and not(contains(@code, "SEQ_DESCR_ChromosomeLocation"))
and not(contains(@code, "SEQ_DESCR_MissingLineage"))
and not(contains(@code, "SEQ_DESCR_NoTaxonID"))
and not(contains(@code, "SEQ_FEAT_ShortIntron"))
]
'
Failer nodes:
<?xml version="1.0" encoding="UTF-8"?>
<message severity="ERROR" seq-id="lcl|L43967.2" code="SEQ_FEAT_BadCharInAuthorLastName">Bad characters in author Sch##nb##chler</message>
[2019-11-19 14:51:09] WARNING [job Prepare_Unannotated_Sequences_asnvalidate_evaluate] completed permanentFail
[2019-11-19 14:51:09] WARNING [step Prepare_Unannotated_Sequences_asnvalidate_evaluate] completed permanentFail
[2019-11-19 14:51:09] INFO [workflow standard_pgap] completed permanentFail
[2019-11-19 14:51:09] WARNING [step standard_pgap] completed permanentFail
[2019-11-19 14:51:09] INFO [workflow ] completed permanentFail
[2019-11-19 14:51:09] WARNING Final process status is permanentFail
{
"gbk": null,
"gff": null,
(...)
this is the corresponding submol.yaml:
topology: circular
comment: 'There is no really a biologist Arnold Schwarzenegger'
consortium: 'SkyNet consortium'
sra:
    - accession: 'ERR2193926'
tp_assembly: true
organism:
    genus_species: 'Mycoplasma genitalium'
    strain: 'replaceme'
contact_info:
    last_name: 'Schönbächler'
    first_name: 'Jane'
    email: '[email protected]'
    organization: 'Institute of Klebsiella foobarensis research'
    department: 'Department of Using NCBI'
    phone: '301-555-0245'
    street: '1234 Main St'
    city: 'Docker'
    postal_code: '12345'
    country: 'Lappland'
authors:
    - author:
        first_name: 'Arnold'
        last_name: 'Schwarzenegger'
    - author:
        first_name: 'Linda'
        last_name: 'Hamilton'
bioproject: 'PRJNA9999999'
biosample: 'SAMN99999999'
# -- Locus tag prefix - optional. Limited to 9 letters. Unless the locus tag prefix was officially assigned by NCBI, ENA, or DDBJ, it will be replaced upon submission of the annotation to NCBI and is therefore temporary and not to be used in publications. If not provided, pgaptmp will be used.
locus_tag_prefix: 'tmp'
publications:
    - publication:
        pmid: 16397293
        title: 'Discrete CHARMm of Klebsiella foobarensis. Journal of Improbable Results, vol. 34, issue 13, pages: 10001-100005, 2018'
        status: published # this is enum: controlled vocabulary
        authors:
            - author:
                first_name: 'Arnold'
                last_name: 'Schwarzenegger'
            - author:
                first_name: 'Linda'
                last_name: 'Hamilton'
umlauts in street and author names -> success:
topology: circular
comment: 'There is no really a biologist Arnold Schwarzenegger'
consortium: 'SkyNet consortium'
sra:
    - accession: 'ERR2193926'
tp_assembly: true
organism:
    genus_species: 'Mycoplasma genitalium'
    strain: 'replaceme'
contact_info:
    last_name: 'Hamilton'
    first_name: 'Jane'
    email: '[email protected]'
    organization: 'Institute of Klebsiella foobarensis research'
    department: 'Department of Using NCBI'
    phone: '301-555-0245'
    street: '1234 Mäín Ströüt'
    city: 'Docker'
    postal_code: '12345'
    country: 'Lappland'
authors:
    - author:
        first_name: 'Arnold'
        last_name: 'Schwarzenegger'
    - author:
        first_name: 'Linda'
        last_name: 'Hamilton'
bioproject: 'PRJNA9999999'
biosample: 'SAMN99999999'
# -- Locus tag prefix - optional. Limited to 9 letters. Unless the locus tag prefix was officially assigned by NCBI, ENA, or DDBJ, it will be replaced upon submission of the annotation to NCBI and is therefore temporary and not to be used in publications. If not provided, pgaptmp will be used.
locus_tag_prefix: 'tmp'
publications:
    - publication:
        pmid: 16397293
        title: 'Discrete CHARMm of Klebsiella foobarensis. Journal of Improbable Results, vol. 34, issue 13, pages: 10001-100005, 2018'
        status: published # this is enum: controlled vocabulary
        authors:
            - author:
                first_name: 'Arnold'
                last_name: 'Schönbächler'
            - author:
                first_name: 'Linda'
                last_name: 'Hamilton'
Thanks! That is very helpful. I see now that in your environment this case somehow advances further than in mine, where it breaks at the yaml2json converter. I added European characters to our TeamCity test case, and we should be able to push this through soon.