Comments (12)
This could be related to the Docker issue we discovered recently: Docker instances sometimes lose their connection to the Internet because IPv4 forwarding is not explicitly enabled. Do you see something like "WARNING: IPv4 forwarding is disabled. Networking will not work." in your output as well?
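On a Linux host, the kernel setting behind that warning can be checked directly. The following is a minimal sketch, assuming the standard procfs path /proc/sys/net/ipv4/ip_forward; the function name is made up for illustration and is not part of PGAP or Docker.

```python
from pathlib import Path

# Standard Linux procfs location of the IPv4 forwarding switch.
IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")


def ipv4_forwarding_enabled(path: Path = IP_FORWARD) -> bool:
    """Return True if the kernel reports IPv4 forwarding as enabled ("1")."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # File missing or unreadable (e.g. a non-Linux host): report not enabled.
        return False


if __name__ == "__main__":
    state = "enabled" if ipv4_forwarding_enabled() else "disabled or unknown"
    print(f"IPv4 forwarding: {state}")
```

If this reports the setting as disabled, running `sysctl -w net.ipv4.ip_forward=1` as root is the usual way to enable it until the next reboot.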
from pgap.
It works correctly on the test genome (MG37), but fails on our assemblies and on NCBI complete genomes.
from pgap.
We do not have the warning: WARNING: IPv4 forwarding is disabled. Networking will not work.
from pgap.
Could be something else then.
Regarding "It correctly works on test genome (MG37), but failed on our assemblies and NCBI complete genomes": is it reproducible, or has it happened only once for each genome so far?
from pgap.
It's happening regularly with all genomes we try to work with. But test genome file works fine.
from pgap.
I just got a chance to look at the cwltool.log you posted (thank you very much, very thoughtful of you).
It looks like in this particular case the error messages you are quoting are red herrings: the step(s) that contain them actually completed successfully.
The real problem here is that the step Prepare_Unannotated_Sequences_asnvalidate_evaluate discovered that our standard validation produced a fatal-level diagnostic:
<?xml version="1.0" encoding="UTF-8"?>
<message severity="ERROR" seq-id="lcl|Razi_Pm0001" code="SEQ_DESCR_DBLinkBadSRAaccession">Bad Sequence Read Archive format - not</message>
Does your input.yaml file by any chance have locus_tag_prefix set to Razi_Pm?
Meanwhile, I am going to investigate further what exactly might have caused this diagnostic.
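For readers who want to pull such a diagnostic apart programmatically, here is a minimal Python sketch using the standard library. It assumes the report consists of a single <message> element exactly as quoted above; the real validation output may wrap messages in a root element, and parse_message is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The diagnostic quoted above, copied verbatim.
REPORT = """<?xml version="1.0" encoding="UTF-8"?>
<message severity="ERROR" seq-id="lcl|Razi_Pm0001" code="SEQ_DESCR_DBLinkBadSRAaccession">Bad Sequence Read Archive format - not</message>"""


def parse_message(xml_text: str) -> dict:
    """Extract severity, seq-id, code and text from one <message> element."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    elem = ET.fromstring(xml_text.strip().encode("utf-8"))
    return {
        "severity": elem.get("severity"),
        "seq_id": elem.get("seq-id"),
        "code": elem.get("code"),
        "text": (elem.text or "").strip(),
    }


if __name__ == "__main__":
    for key, value in parse_message(REPORT).items():
        print(f"{key}: {value}")
```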
from pgap.
Internet search leads me to our internal Toolkit code:
https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/validerror__desc_8cpp_source.html
Line 689,
} else if (NStr::EqualNocase(label_str, "Sequence Read Archive")) {
    if (fld.IsSetData() && fld.GetData().IsStrs()) {
        const CUser_field::C_Data::TStrs& strs = fld.GetData().GetStrs();
        ITERATE(CUser_field::C_Data::TStrs, st_itr, strs) {
            const string& str = *st_itr;
            if (x_IsBadSRAFormat (str)) {
                PostErr(eDiag_Error, eErr_SEQ_DESCR_DBLinkBadSRAaccession,
                        "Bad Sequence Read Archive format - " + str, *m_Ctx, desc);
            }
        }
    }
which indicates that the error happens when the reader scans through the Sequence Read Archive user sequence descriptor.
Most likely, you have an invalid SRA accession in the sra: field of your submol.yaml input file.
I would presume that for testing purposes you would just use the one in the example submol.yaml (ERR2193926), which should not have caused problems. Please let me know the exact value of this field in your submol.yaml file.
We will also try to address the lack of information on how to set the sra: field in our Wiki documentation here:
https://github.com/ncbi/pgap/wiki/Input-Files
from pgap.
Wait, I just noticed the "not" in the "Bad Sequence Read Archive format - not" message.
It looks like you intended to use a dummy SRA number. Note that this field is optional and you do not have to include it in your input submol.yaml files.
Please let me know how this works for you. Thanks!
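A quick way to catch placeholder values before a run is to sanity-check them locally. The sketch below is only a rough heuristic: it assumes accessions look like a three-letter SRR/ERR/DRR-style prefix followed by digits (as in ERR2193926). It is not the validator's actual x_IsBadSRAFormat logic, and the function name is hypothetical.

```python
import re

# Assumption: an accession is [S|E|D] + "R" + one letter + at least six digits,
# e.g. ERR2193926. This only approximates what the real validator accepts.
SRA_LIKE = re.compile(r"[SED]R[A-Z]\d{6,}")


def looks_like_sra_accession(value: str) -> bool:
    """Return True if value matches the assumed SRA accession pattern."""
    return SRA_LIKE.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ("ERR2193926", "not"):
        verdict = "plausible" if looks_like_sra_accession(candidate) else "suspicious"
        print(f"{candidate!r}: {verdict}")
```

Since the field is optional, the simplest fix for a value that fails such a check is to remove the field rather than invent an accession.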
from pgap.
Azat, thank you so much, you are right: we put a random name in the bioproject and biosample fields (because we noticed that the test .yaml file did not have real numbers for bioproject and biosample). Do we need to leave these fields empty?
We did not find any SRA field in the submol.yaml file.
from pgap.
biosample and bioproject fields are indeed optional as indicated here:
https://github.com/ncbi/pgap/wiki/Input-Files
biosample - optional. BioSample ID (SAMXXX) for the sequenced sample, if available
There is still something strange: you said that you have not used an sra: field in your input submol.yaml files, but the diagnostic specifically mentions SRA: SEQ_DESCR_DBLinkBadSRAaccession
I would suggest double-checking that you do not have sra: fields in the input submol.yaml files. If omitting the biosample and bioproject fields still does not help and you get the same diagnostic in your cwltool.log, please run pgap.py with the --debug option. This will keep intermediate files for further analysis; in that case we would like to see what sequences.text.asn (find . -name sequences.text.asn) has for the BioSample, BioProject or Sequence Read Archive fields in user objects.
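The find step and the follow-up inspection can be combined in a few lines of Python. This is a sketch only: it assumes the field names appear as plain text in the ASN.1 file and does a simple substring match; find_dblink_lines is a hypothetical helper, not part of PGAP.

```python
from pathlib import Path

# Labels named in the comment above; matched as plain substrings.
LABELS = ("BioSample", "BioProject", "Sequence Read Archive")


def find_dblink_lines(root: str = ".", name: str = "sequences.text.asn"):
    """Yield (path, line_number, line) for lines mentioning one of LABELS."""
    for path in sorted(Path(root).rglob(name)):
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(label in line for label in LABELS):
                    yield path, number, line.rstrip()


if __name__ == "__main__":
    # Run from the directory that holds the intermediate files kept by --debug.
    for path, number, line in find_dblink_lines("."):
        print(f"{path}:{number}: {line}")
```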
from pgap.
Dear Azat,
Thank you very much! Everything is working just fine: we left the bioproject and biosample fields empty and it looks like it works.
from pgap.
You are welcome, Vlad! If you are not aware, now you can submit the results of your calculations to GenBank directly, and it will bypass internal PGAP calculation.
from pgap.