Often I lose connection to NCBI server (for some mysterious reason related with my rou

In <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id=

Error handling when NCBI connection is lost about bionode-ncbi HOT 4 CLOSED

bionode commented on June 26, 2024

Error handling when NCBI connection is lost

from bionode-ncbi.

Comments (4)

thejmazz commented on June 26, 2024

I think a standard JSON schema for an "error chunk" could be beneficial, and we could use it across multiple bionode-* modules, maybe something like:

{
   module: 'bionode-ncbi',
   message: 'cannot download assembly Stenotrophomonas ginsengisoli because it was unable to establish a connection to NCBI link',
   sra_id: 12345,
   reason: 'network'
}

and it could be written to stderr while regular output is sent to stdout. This would help greatly when handling errors in a big pipeline, provided each tool emits errors with a standard object schema. Or maybe each "error type" could declare what its object schema is.

However, I'm not sure how this plays with the paradigm where stderr is for logging and stdout is for data. Perhaps errors can be emitted on the same stream as logging (stderr), with the error object carrying isError: true or something similar.

Not sure what exactly this schema should contain (perhaps a simple { tool, message } is enough); it might take some experimentation for us to see what is useful or required in a generic error logging utility. ndjson for errors makes perfect sense for reasons of consistency and parsability. Similar to the loading progress, there could be a "--pretty" option for transforming ndjson errors when running commands on the CLI (e.g. formatting and red text).
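As a rough illustration of the idea above, here is a minimal sketch of an "error chunk" serialized as NDJSON and written to stderr. The helper names `createErrorChunk` and `toNdjsonLine` are hypothetical, and the `isError` and `reason` fields are assumptions taken from this discussion, not an agreed schema.

```javascript
// Hypothetical helper: build a standard "error chunk".
// Field names follow the { module, message } idea from this thread;
// `isError` is an assumed marker, not part of any existing bionode API.
function createErrorChunk (moduleName, message, extra = {}) {
  return Object.assign({ isError: true, module: moduleName, message }, extra)
}

// NDJSON only requires one JSON object per line.
function toNdjsonLine (chunk) {
  return JSON.stringify(chunk) + '\n'
}

const chunk = createErrorChunk(
  'bionode-ncbi',
  'unable to establish a connection to NCBI',
  { sra_id: 12345, reason: 'network' }
)

// Errors go to stderr so that stdout stays reserved for data.
process.stderr.write(toNdjsonLine(chunk))
```

Keeping the shared fields small and letting each module pass its own identifiers through `extra` is one way to reconcile a global schema with module-specific details.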

A good place to test this out could be a pipeline that runs the same tasks on a bunch of SRA IDs, where some IDs are deliberately made incorrect (they won't resolve in search). A bunch of reads for the pipeline won't get downloaded, but we should handle that case gracefully and give the user as much information as possible.

There is also the issue of not "swallowing" the original errors from a module we imported (just something to remember and try to avoid), for example,

// `errorCb` is in scope
someModule.doStuff(function (err, data) {
  if (err) {
    // the true `err` is lost
    return errorCb(new Error(`Failed downloading ${sraId}`))
  }

  console.log(data)
})
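One possible way to avoid swallowing the original error is to attach it to the new one. The sketch below assumes a stand-in `someModule` that always fails, and a hypothetical `wrapError` helper; neither is part of bionode-ncbi.

```javascript
// Hypothetical stand-in for an imported module whose callback reports a failure.
const someModule = {
  doStuff (cb) {
    cb(new Error('ECONNRESET: socket hang up'))
  }
}

// Wrap a low-level error with context while keeping the original attached.
function wrapError (message, original) {
  const wrapped = new Error(message)
  // Newer runtimes also accept `new Error(message, { cause: original })`.
  wrapped.cause = original
  return wrapped
}

function download (sraId, errorCb) {
  someModule.doStuff(function (err, data) {
    if (err) {
      // The true `err` travels along as `cause` instead of being lost.
      return errorCb(wrapError(`Failed downloading ${sraId}`, err))
    }
    console.log(data)
  })
}

download(12345, function (err) {
  // Both the high-level context and the underlying cause are available.
  console.error(err.message + ' (caused by: ' + err.cause.message + ')')
})
```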


tiagofilipe12 commented on June 26, 2024

If we make this a general feature (error handling), I think the easiest implementation would be something like {module, message}, since not all modules will have the same identifiers to output as JSON. For this specific case, however, parsing the SRA ID might be important. For instance, one might be interested in storing

sra_id: 12345

in some text file for later checking. So maybe, in this module (bionode-ncbi), we could append to a standard NDJSON object (global to all of bionode) an identifier holding the sra_id (which could be the input provided by the user, e.g., a UID or species name) and the type of run that was being performed (download, fetch, and so on).
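To make the "store the IDs for later checking" idea concrete, here is a small sketch that pulls the failed inputs for one module out of an NDJSON error log. The `input` and `task` field names, the sample log lines, and the `failedInputs` helper are all assumptions made for illustration.

```javascript
// Hypothetical NDJSON error log; `input` and `task` are assumed field names.
const errorLog = [
  '{"module":"bionode-ncbi","message":"connection lost","input":"12345","task":"download"}',
  '{"module":"bionode-ncbi","message":"no search results","input":"Stenotrophomonas ginsengisoli","task":"search"}',
  '{"module":"another-module","message":"unrelated failure"}'
].join('\n')

// Collect the user-provided inputs that failed in a given module.
function failedInputs (ndjson, moduleName) {
  return ndjson
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line))
    .filter(chunk => chunk.module === moduleName && chunk.input !== undefined)
    .map(chunk => chunk.input)
}

// One input per line, ready to be redirected into a text file.
console.log(failedInputs(errorLog, 'bionode-ncbi').join('\n'))
```

Chunks from other modules simply lack the extra fields, so a consumer that only understands {module, message} is unaffected.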

Then a --pretty option could easily be implemented for those who do not care about parsing download error messages, as @thejmazz said.

However, I wonder whether we should move this to a general bionode thread, since it might be considered an issue for all of bionode, and then implement the specific error handling in each module.


bmpvieira commented on June 26, 2024

Bioinformatics tools fail a lot for many reasons. Sometimes it's just because of one badly formatted input file out of many that we left running for the weekend while expecting to see some results by Monday. Most pipeline tools behave in an "if-error-crash" way and then wait for a human to fix and rerun them. Because of this, we want our pipelines to be robust and keep running like a web server instead of throwing errors that cause a crash.

So I think that having errors output as NDJSON to stderr would make it a lot easier to deal with them. While keeping the pipeline running, we could internally do things like retry, change parameters on the fly, email a human, or feed an external tool such as a real-time GUI dashboard that reads those NDJSON objects.
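The "try again while reporting errors" behaviour could look roughly like the sketch below. The `retry` and `makeFlakyTask` functions are hypothetical, the flaky task only simulates a network failure, and the error chunk fields are assumptions rather than an agreed schema.

```javascript
// Hypothetical retry wrapper: `task` is any function taking a Node-style callback.
// Each failure is handed to `report` instead of being thrown.
function retry (task, attempts, report, done) {
  let tried = 0
  function attempt () {
    tried++
    task(function (err, result) {
      if (!err) return done(null, result)
      report({ module: 'bionode-ncbi', message: err.message, attempt: tried })
      if (tried < attempts) return attempt()
      done(err)
    })
  }
  attempt()
}

// Simulated task that fails `failures` times before succeeding.
function makeFlakyTask (failures) {
  let calls = 0
  return function (cb) {
    calls++
    if (calls <= failures) return cb(new Error('network unreachable'))
    cb(null, 'assembly data')
  }
}

retry(
  makeFlakyTask(2),
  5,
  // Failures are reported as NDJSON on stderr while the process keeps running.
  chunk => process.stderr.write(JSON.stringify(chunk) + '\n'),
  (err, result) => {
    if (err) return // every attempt failed; each failure was already reported
    process.stdout.write(JSON.stringify({ result }) + '\n')
  }
)
```

A real pipeline would probably add a delay between attempts; that is left out here to keep the sketch short.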

I think this reinforces that --pretty should in fact just be a bionode-pretty module that takes stderr and stdout NDJSON and prints them in a human-friendly format with colors. So bionode-pretty would be the only module that doesn't output NDJSON. We can still keep --pretty as a shortcut in each module's CLI by calling require('bionode-pretty') and piping to it internally if needed, but this means having bionode-pretty as an extra dependency.
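As a sketch of what the core of such a formatter might do, the function below turns one NDJSON line into human-friendly text. It is not the bionode-pretty API; the `prettify` name, the `isError` marker, and the output layout are assumptions.

```javascript
// ANSI escape codes for red text and for resetting the color.
const RED = '\u001b[31m'
const RESET = '\u001b[0m'

// Format a single NDJSON line for a terminal.
// Assumption: error chunks carry `isError: true` plus `module` and `message`.
function prettify (line) {
  const chunk = JSON.parse(line)
  if (chunk.isError) {
    return `${RED}[${chunk.module}] ${chunk.message}${RESET}`
  }
  // Regular data is simply indented.
  return JSON.stringify(chunk, null, 2)
}

console.log(prettify('{"isError":true,"module":"bionode-ncbi","message":"connection lost"}'))
console.log(prettify('{"uid":"12345"}'))
```

Because the function works line by line, the same logic could sit behind both a standalone module and a --pretty shortcut.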


bmpvieira commented on June 26, 2024

In #32 I've changed the code so that in this case we just emit one error and then give a more user-friendly message. Also, --pretty now indents the JSON output (similar to bionode-ncbi | json). However, I still think we need to greatly improve how errors are handled in general across all modules, so I think we should continue this discussion in bionode/bionode#41

