Dear All, I have an issue to run the : sudo ./pgap

Failed to run the script Error: (302.26) about pgap HOT 12 CLOSED

ncbi commented on August 19, 2024

Failed to run the script Error: (302.26)

from pgap.

Comments (12)

azat-badretdin commented on August 19, 2024

This could be related to the docker issue we discovered recently: sometimes docker instances lose connection to the Internet because ipv4 forwarding is not explicitly set. Do you have something like WARNING: IPv4 forwarding is disabled. Networking will not work. in your output as well?
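
On a Linux host, the setting behind that warning can also be checked directly. Here is a minimal sketch, assuming the usual /proc/sys/net/ipv4/ip_forward kernel flag file; the function name is only illustrative and is not part of PGAP or docker:

```python
from pathlib import Path

# Standard Linux location of the IPv4 forwarding switch ("1" means enabled).
IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")


def ipv4_forwarding_enabled(path: Path = IP_FORWARD) -> bool:
    """Return True if the flag file at `path` contains "1"."""
    return path.read_text().strip() == "1"


if __name__ == "__main__":
    if IP_FORWARD.exists():
        state = "enabled" if ipv4_forwarding_enabled() else "disabled"
        print(f"IPv4 forwarding is {state} on this host")
    else:
        print("No ip_forward flag file found (probably not a Linux host)")
```

If it reports "disabled", that would be consistent with the docker warning quoted above.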

Vladislav-Shevtsov commented on August 19, 2024

It works correctly on the test genome (MG37), but fails on our assemblies and on NCBI complete genomes.

Vladislav-Shevtsov commented on August 19, 2024

We do not have the warning: WARNING: IPv4 forwarding is disabled. Networking will not work.

azat-badretdin commented on August 19, 2024

Could be something else then.

"It works correctly on the test genome (MG37), but fails on our assemblies and on NCBI complete genomes."

Is it reproducible or happened only once for each genome so far?

Vladislav-Shevtsov commented on August 19, 2024

It happens consistently with every genome we try to work with, but the test genome file works fine.

azat-badretdin commented on August 19, 2024

I just got a chance to have a look at the cwltool.log you posted (thank you very much, very thoughtful of you).

It looks like in this particular case the Error messages you are quoting are red herrings: the step(s) that have them actually completed successfully.

The real problem here is that the step Prepare_Unannotated_Sequences_asnvalidate_evaluate found that our standard validation produced a fatal-level diagnostic:

<?xml version="1.0" encoding="UTF-8"?>
<message severity="ERROR" seq-id="lcl|Razi_Pm0001" code="SEQ_DESCR_DBLinkBadSRAaccession">Bad Sequence Read Archive format - not</message>

Does your input.yaml file have by any chance locus_tag_prefix set to Razi_Pm?

Meanwhile, I am going to investigate further what exactly might have caused this diagnostic.
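
For anyone who wants to pull such diagnostics out of a validation report without reading it by eye, a rough sketch follows. It assumes the report consists of one or more message elements with the severity, seq-id and code attributes seen in the excerpt above; the function name and the sample string are illustrative only:

```python
import xml.etree.ElementTree as ET

# Sample copied from the excerpt above; a real report may contain more messages.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<message severity="ERROR" seq-id="lcl|Razi_Pm0001" '
    'code="SEQ_DESCR_DBLinkBadSRAaccession">'
    'Bad Sequence Read Archive format - not</message>\n'
)


def messages_with_severity(xml_text: str, severity: str = "ERROR"):
    """Return (seq-id, code, text) for every <message> with the given severity."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # iter() also yields the root itself, so a lone <message> element is handled.
    return [
        (m.get("seq-id"), m.get("code"), (m.text or "").strip())
        for m in root.iter("message")
        if m.get("severity") == severity
    ]


if __name__ == "__main__":
    for seq_id, code, text in messages_with_severity(SAMPLE):
        print(f"{seq_id}: {code}: {text}")
```

The text after the code name is the part to read closely, since it carries the offending value.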

azat-badretdin commented on August 19, 2024

Internet search leads me to our internal Toolkit code:

https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/validerror__desc_8cpp_source.html

Line 689,

             } else if (NStr::EqualNocase(label_str, "Sequence Read Archive")) {
                 if (fld.IsSetData() && fld.GetData().IsStrs()) {
                     const CUser_field::C_Data::TStrs& strs = fld.GetData().GetStrs();
                     ITERATE(CUser_field::C_Data::TStrs, st_itr, strs) {
                         const string& str = *st_itr;
                         if (x_IsBadSRAFormat (str)) {
                             PostErr(eDiag_Error, eErr_SEQ_DESCR_DBLinkBadSRAaccession,
                                 "Bad Sequence Read Archive format - " + str, *m_Ctx, desc);
                         }
                     }
                 }

which indicates that the error is raised while the validator scans through the Sequence Read Archive user sequence descriptor.

Most likely, you have an invalid SRA accession in the sra: field of your submol.yaml input file.

I would presume that for testing purposes you would just use the one in the example submol.yaml (ERR2193926), which should not have caused problems. Please let me know the exact value of this field in your submol.yaml file.
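
As a quick sanity check before running the pipeline, a value can be compared against the usual shape of a run accession. The sketch below assumes that shape is SRR, ERR or DRR followed by at least six digits. This is only an approximation of the public accession style and is not the logic of the Toolkit's x_IsBadSRAFormat, which is not shown in this thread:

```python
import re

# Assumption: run accessions look like SRR/ERR/DRR plus six or more digits.
# This is NOT the Toolkit's x_IsBadSRAFormat, only a rough stand-in.
SRA_RUN = re.compile(r"[SED]RR\d{6,}")


def looks_like_sra_run(value: str) -> bool:
    """Return True if `value` has the assumed run-accession shape."""
    return SRA_RUN.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ("ERR2193926", "not"):
        verdict = "ok" if looks_like_sra_run(candidate) else "suspicious"
        print(f"{candidate}: {verdict}")
```

A value flagged here is worth a second look, but the validator itself remains the authority on what is accepted.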

We will also try to fill the gap in our Wiki documentation on how to set the sra: field:

https://github.com/ncbi/pgap/wiki/Input-Files

azat-badretdin commented on August 19, 2024

Wait, I just noticed the "not" in the "Bad Sequence Read Archive format - not" message.

It looks like you intended to use a dummy SRA number. Note that this field is optional and you do not have to include it in your input submol.yaml files.

Please let me know how this works for you. Thanks!

Vladislav-Shevtsov commented on August 19, 2024

Azat, thank you so much, you are right: we put a random name in the bioproject and biosample fields (because we noticed that the test .yaml file does not use real numbers for bioproject and biosample). Do we need to leave these fields empty?

We did not find any SRA field in the submol.yaml file.

azat-badretdin commented on August 19, 2024

biosample and bioproject fields are indeed optional as indicated here:

https://github.com/ncbi/pgap/wiki/Input-Files

biosample - optional. BioSample ID (SAMXXX) for the sequenced sample, if available
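
To see at a glance which of these optional fields a submol.yaml actually sets, a crude line scan is enough. The sketch below deliberately avoids a YAML parser and just matches the key names sra, biosample and bioproject at any indentation; the function name and the sample values are hypothetical:

```python
import re

# Matches lines such as "bioproject: 'PRJ...'" at any indentation level.
KEY_LINE = re.compile(
    r"^\s*(sra|biosample|bioproject)\s*:\s*(.*?)\s*$", re.IGNORECASE
)


def optional_fields(yaml_text: str):
    """Return (line number, key, value) for each optional field that is set."""
    found = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = KEY_LINE.match(line)
        if match:
            key = match.group(1).lower()
            value = match.group(2).strip("'\"")  # drop surrounding quotes
            found.append((number, key, value))
    return found


if __name__ == "__main__":
    # Hypothetical file content, for illustration only.
    sample = "topology: 'linear'\nbioproject: 'not'\nbiosample: not\n"
    for number, key, value in optional_fields(sample):
        print(f"line {number}: {key} = {value!r}")
```

Any placeholder value that shows up can simply be removed together with its key, since the fields are optional.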

There is still something strange: you said that you have not used an sra: field in your input submol.yaml files, but the diagnostic specifically mentions SRA:

SEQ_DESCR_DBLinkBadSRAaccession

I would suggest double-checking that there are no sra: fields in your input submol.yaml files. If omitting the biosample and bioproject fields still does not help and you get the same diagnostic in your cwltool.log, please run pgap.py with the --debug option. This will keep intermediate files for further analysis; in that case we would like to see what sequences.text.asn (find . -name sequences.text.asn) has for the BioSample, BioProject or Sequence Read Archive fields in user objects.
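
The same lookup can be scripted if there are many runs to inspect. This is a small sketch under the assumption that the intermediate files are plain text and that the labels appear literally as BioSample, BioProject and Sequence Read Archive; the function name is illustrative:

```python
from pathlib import Path

# Labels named in the comment above; assumed to appear literally in the file.
LABELS = ("BioSample", "BioProject", "Sequence Read Archive")


def dblink_lines(root: str = "."):
    """Return (file, line number, line) for every label hit under `root`."""
    hits = []
    for asn in sorted(Path(root).rglob("sequences.text.asn")):
        with asn.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(label in line for label in LABELS):
                    hits.append((str(asn), number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, number, line in dblink_lines("."):
        print(f"{path}:{number}: {line}")
```

The lines printed, plus a few lines of context around them, are the part worth posting back to the issue.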

Vladislav-Shevtsov commented on August 19, 2024

Dear Azat,
Thank you very much! Everything is working just fine: we left the bioproject and biosample fields empty, and it looks like it works.

azat-badretdin commented on August 19, 2024

You are welcome, Vlad! In case you are not aware, you can now submit the results of your calculations to GenBank directly, which bypasses the internal PGAP calculation.
