Comments (12)

azat-badretdin avatar azat-badretdin commented on August 19, 2024

This could be related to a Docker issue we discovered recently: Docker instances sometimes lose their connection to the Internet because IPv4 forwarding is not explicitly enabled. Do you also see something like "WARNING: IPv4 forwarding is disabled. Networking will not work." in your output?

from pgap.

Vladislav-Shevtsov avatar Vladislav-Shevtsov commented on August 19, 2024

It works correctly on the test genome (MG37), but fails on our assemblies and on NCBI complete genomes.

Vladislav-Shevtsov avatar Vladislav-Shevtsov commented on August 19, 2024

We do not see the warning "WARNING: IPv4 forwarding is disabled. Networking will not work." in our output.

azat-badretdin avatar azat-badretdin commented on August 19, 2024

Could be something else then.

You said that it works on the test genome (MG37) but fails on your assemblies and on NCBI complete genomes. Is the failure reproducible, or has it happened only once for each genome so far?

Vladislav-Shevtsov avatar Vladislav-Shevtsov commented on August 19, 2024

It happens consistently with every genome we try to work with. Only the test genome file works fine.

azat-badretdin avatar azat-badretdin commented on August 19, 2024

I just had a chance to look at the cwltool.log you posted (thank you very much, that was very thoughtful of you).

It looks like, in this particular case, the error messages you quoted are red herrings: the steps that emitted them actually completed successfully.

The real problem is that the step Prepare_Unannotated_Sequences_asnvalidate_evaluate found that our standard validation had produced a fatal-level diagnostic:

<?xml version="1.0" encoding="UTF-8"?>
<message severity="ERROR" seq-id="lcl|Razi_Pm0001" code="SEQ_DESCR_DBLinkBadSRAaccession">Bad Sequence Read Archive format - not</message>

Does your input.yaml file have by any chance locus_tag_prefix set to Razi_Pm?

Meanwhile, I am going to investigate further what exactly might have caused this diagnostic.
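
For anyone reading a long cwltool.log and wondering which messages matter, here is a minimal Python sketch that pulls out validation messages shaped like the one quoted above. The function name find_validation_messages is hypothetical, and the regular expression assumes the attributes always appear in the order severity, seq-id, code, as in that one example; it is not part of PGAP.

```python
import re

# Assumption: attributes appear in the order severity, seq-id, code,
# exactly as in the single <message> element quoted in this thread.
MESSAGE_RE = re.compile(
    r'<message\s+severity="(?P<severity>[^"]*)"\s+'
    r'seq-id="(?P<seq_id>[^"]*)"\s+'
    r'code="(?P<code>[^"]*)">(?P<text>.*?)</message>'
)


def find_validation_messages(log_text, severities=("ERROR",)):
    """Return one dict per <message> element whose severity is in `severities`."""
    found = []
    for match in MESSAGE_RE.finditer(log_text):
        if match.group("severity") in severities:
            found.append(match.groupdict())
    return found


# Demo on the message quoted in this thread.
example = (
    '<message severity="ERROR" seq-id="lcl|Razi_Pm0001" '
    'code="SEQ_DESCR_DBLinkBadSRAaccession">'
    'Bad Sequence Read Archive format - not</message>'
)
for msg in find_validation_messages(example):
    print(msg["severity"], msg["seq_id"], msg["code"], msg["text"], sep="\t")
```

To use it on a real run, read the whole cwltool.log into a string and pass it to the function in place of the demo string.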

azat-badretdin avatar azat-badretdin commented on August 19, 2024

An Internet search led me to our internal Toolkit code:

https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/validerror__desc_8cpp_source.html

Line 689,

             } else if (NStr::EqualNocase(label_str, "Sequence Read Archive")) {
                 if (fld.IsSetData() && fld.GetData().IsStrs()) {
                     const CUser_field::C_Data::TStrs& strs = fld.GetData().GetStrs();
                     ITERATE(CUser_field::C_Data::TStrs, st_itr, strs) {
                         const string& str = *st_itr;
                         if (x_IsBadSRAFormat (str)) {
                             PostErr(eDiag_Error, eErr_SEQ_DESCR_DBLinkBadSRAaccession,
                                 "Bad Sequence Read Archive format - " + str, *m_Ctx, desc);
                         }
                     }
                 }

which indicates that the error happens when the reader scans through Sequence Read Archive user sequence descriptor.

Most likely, you have an invalid SRA accession in the sra: field of your submol.yaml input file.

I would presume that for testing purposes you would just use the one in the example submol.yaml (ERR2193926), which should not have caused problems. Please let me know the exact value of this field in your submol.yaml file.

We will also try to add the missing information on how to set the sra: field to our Wiki documentation here:

https://github.com/ncbi/pgap/wiki/Input-Files

azat-badretdin avatar azat-badretdin commented on August 19, 2024

Wait, I just noticed the "not" in the message "Bad Sequence Read Archive format - not".

It looks like you intended to use a dummy SRA number. Note that this field is optional and you do not have to include it in your input submol.yaml files.
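
As a rough pre-check before running the pipeline, a placeholder such as "not" can be told apart from a run accession such as ERR2193926 with a short Python sketch. The pattern is an assumption based on the usual SRR/ERR/DRR run accession style. It is not the rule implemented by x_IsBadSRAFormat, which is not shown in this thread, and the name looks_like_sra_run is hypothetical.

```python
import re

# Assumption: run accessions are SRR, ERR or DRR followed by digits.
# This is NOT the Toolkit's x_IsBadSRAFormat rule, only a rough pre-check.
RUN_ACCESSION_RE = re.compile(r"[SED]RR\d+")


def looks_like_sra_run(value):
    """Return True if `value` has the assumed SRR/ERR/DRR + digits shape."""
    return RUN_ACCESSION_RE.fullmatch(value.strip()) is not None


# Compare the example accession from this thread with the dummy value.
for candidate in ("ERR2193926", "not"):
    print(candidate, looks_like_sra_run(candidate))
```

A value that fails this check is worth a second look, but a value that passes may still be rejected by the real validator.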

Please let me know how this works for you. Thanks!

Vladislav-Shevtsov avatar Vladislav-Shevtsov commented on August 19, 2024

Azat, thank you so much, you are right: we put a random name in the bioproject and biosample fields (because we noticed that the test .yaml file did not use real numbers for bioproject and biosample). Do we need to leave these fields empty?

We did not find any SRA field in the submol.yaml file.

azat-badretdin avatar azat-badretdin commented on August 19, 2024

biosample and bioproject fields are indeed optional as indicated here:

https://github.com/ncbi/pgap/wiki/Input-Files

biosample - optional. BioSample ID (SAMXXX) for the sequenced sample, if available
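
To see at a glance which of these optional fields a submol.yaml actually sets, a line-based scan like the following Python sketch may help. It deliberately avoids a YAML parser and only looks for the key names mentioned in this thread (biosample, bioproject, sra) at the start of a line, so it is an approximation. The function name list_optional_fields is hypothetical.

```python
import re

# Field names taken from this thread; the scan is line-based, not a YAML parser.
OPTIONAL_FIELDS = ("biosample", "bioproject", "sra")

# Matches "key: value" with optional indentation and an optional list dash.
FIELD_RE = re.compile(r"^\s*(?:-\s*)?(?P<key>[A-Za-z_]+)\s*:\s*(?P<value>.*)$")


def list_optional_fields(yaml_text):
    """Return (line_number, field, value) for each optional field found.

    The value is the rest of the line with surrounding quotes removed; it is
    an empty string when the field is present but left blank on that line.
    """
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = FIELD_RE.match(line)
        if match and match.group("key") in OPTIONAL_FIELDS:
            value = match.group("value").strip().strip("'\"")
            hits.append((number, match.group("key"), value))
    return hits
```

Reading the file with open("submol.yaml").read() and passing the text to list_optional_fields shows every place where one of these fields appears, including commented-out lines being skipped because they start with "#".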

There is still something strange: you said that you have not used an sra: field in your input submol.yaml files, but the diagnostic specifically mentions SRA:

SEQ_DESCR_DBLinkBadSRAaccession

I would suggest double-checking that there are no sra: fields in your input submol.yaml files. If omitting the biosample and bioproject fields still does not help and you get the same diagnostic in your cwltool.log, please run pgap.py with the --debug option. This keeps the intermediate files for further analysis, and in that case we would like to see what sequences.text.asn (find . -name sequences.text.asn) contains for the BioSample, BioProject or Sequence Read Archive fields in user objects.
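
The lookup described above can also be sketched in Python: walk the working directory for files named sequences.text.asn and print the lines that mention one of the three labels. The function name scan_asn_files is hypothetical, and the sketch matches plain substrings only; it makes no assumption about the ASN.1 text layout.

```python
from pathlib import Path

# Labels named in this thread; matching is a plain substring test.
LABELS = ("BioSample", "BioProject", "Sequence Read Archive")


def scan_asn_files(root, filename="sequences.text.asn"):
    """Yield (path, line_number, line) for every line that mentions a label."""
    for path in sorted(Path(root).rglob(filename)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(label in line for label in LABELS):
                    yield path, number, line.rstrip()
```

Calling scan_asn_files(".") from the directory where pgap.py was run with --debug, and printing each tuple, gives roughly the same information as the find command followed by a manual search.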

Vladislav-Shevtsov avatar Vladislav-Shevtsov commented on August 19, 2024

Dear Azat,
Thank you very much! Everything is working just fine: we left the bioproject and biosample fields empty, and it looks like it works.

azat-badretdin avatar azat-badretdin commented on August 19, 2024

You are welcome, Vlad! In case you are not aware, you can now submit the results of your calculations to GenBank directly, and it will bypass the internal PGAP calculation.
