
Comments (4)

thejmazz commented on June 26, 2024

I think a standard JSON schema for an "error chunk" could be beneficial, and we could use it across multiple bionode-* modules, maybe something like:

{
  "module": "bionode-ncbi",
  "message": "cannot download assembly Stenotrophomonas ginsengisoli because it was unable to establish a connection to NCBI link",
  "sra_id": 12345,
  "reason": "network"
}

and it could be written to stderr while regular output is sent to stdout. If every tool emitted errors with a standard object schema, handling errors in a big pipeline would be much easier. Alternatively, each "error type" could declare what its object schema is.
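A minimal sketch of that split, assuming plain Node.js streams: data objects go to stdout and error chunks go to stderr, one JSON object per line (NDJSON). The helper names `makeErrorChunk`, `writeData` and `writeError`, and the `status` field in the usage lines, are hypothetical and not part of bionode-ncbi.

```javascript
// Build an error chunk; `extra` carries module-specific fields such as sra_id.
function makeErrorChunk (moduleName, message, extra = {}) {
  return { module: moduleName, message, ...extra }
}

// NDJSON only requires one JSON object per line.
function toNdjsonLine (obj) {
  return JSON.stringify(obj) + '\n'
}

// Regular output goes to stdout...
function writeData (obj, stream = process.stdout) {
  stream.write(toNdjsonLine(obj))
}

// ...while error chunks go to stderr, so the two can be piped separately.
function writeError (chunk, stream = process.stderr) {
  stream.write(toNdjsonLine(chunk))
}

// Illustrative usage (hypothetical values).
writeData({ sra_id: 12345, status: 'downloaded' })
writeError(makeErrorChunk('bionode-ncbi',
  'cannot download assembly: unable to connect to NCBI',
  { sra_id: 12345, reason: 'network' }))
```

Because each stream carries its own NDJSON, a pipeline can redirect stderr to a file (`2> errors.ndjson`) without polluting the data on stdout.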

However, I'm not sure how this plays with the paradigm where stderr is for logging and stdout is for data. Perhaps errors could be emitted on the same stream as logs (stderr), with the error object carrying isError: true or something similar.
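A sketch of the shared-stream variant, assuming errors are distinguished only by an `isError: true` flag as proposed above. `writeEntry` and `partitionStderr` are hypothetical names; the consumer side shows how a downstream tool could separate logs from errors in the stderr NDJSON.

```javascript
// Write one NDJSON entry to stderr, flagging it as an error if requested.
function writeEntry (entry, { isError = false } = {}, stream = process.stderr) {
  const obj = isError ? { ...entry, isError: true } : entry
  stream.write(JSON.stringify(obj) + '\n')
}

// Split NDJSON text read from stderr into plain log entries and errors.
function partitionStderr (ndjsonText) {
  const logs = []
  const errors = []
  for (const line of ndjsonText.split('\n')) {
    if (line.trim() === '') continue // e.g. the trailing newline
    const entry = JSON.parse(line)
    if (entry.isError === true) errors.push(entry)
    else logs.push(entry)
  }
  return { logs, errors }
}

writeEntry({ module: 'bionode-ncbi', message: 'searching for 12345' })
writeEntry({ module: 'bionode-ncbi', message: 'cannot resolve 12345' }, { isError: true })

const sample = '{"message":"searching"}\n{"message":"cannot resolve","isError":true}\n'
const { logs, errors } = partitionStderr(sample)
console.log(logs.length, errors.length) // 1 1
```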

I'm not sure what exactly this schema should contain (perhaps a simple { tool, message } is enough); it might take some experimentation to see what is useful or required in a generic error logging utility. NDJSON for errors makes perfect sense for consistency and parsability. Similar to the loading progress, there could be a "--pretty" option for transforming NDJSON errors when running commands on the CLI (e.g. formatting and red text).
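A sketch of what such a "--pretty" formatter might do to a single NDJSON error line, assuming errors carry at least `{ module, message }`. `prettyError` is a hypothetical name and the output layout is purely illustrative; red is produced with the standard ANSI escape codes.

```javascript
const RED = '\x1b[31m' // ANSI escape: red foreground
const RESET = '\x1b[0m' // ANSI escape: reset attributes

function prettyError (ndjsonLine, { color = true } = {}) {
  const { module: mod = 'unknown', message = '', ...rest } = JSON.parse(ndjsonLine)
  // Headline: module name in brackets, then the message.
  let text = `[${mod}] ${message}`
  // Any extra fields (sra_id, reason, ...) go on indented lines below.
  const extras = Object.entries(rest).map(([key, value]) => `  ${key}: ${value}`)
  if (extras.length > 0) text += '\n' + extras.join('\n')
  return color ? RED + text + RESET : text
}

const line = '{"module":"bionode-ncbi","message":"cannot resolve 12345","reason":"not found"}'
console.log(prettyError(line, { color: false }))
// [bionode-ncbi] cannot resolve 12345
//   reason: not found
```

A CLI would call this only when `--pretty` is passed, so the default output stays machine-readable.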

A good place to test this out could be a pipeline that runs the same tasks on a bunch of SRA IDs, where some IDs are deliberately made incorrect (so they won't resolve in search) and therefore some of the reads for the pipeline won't get downloaded. We should handle that case gracefully and give the user as much information as possible.
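A sketch of such a test pipeline, assuming each task returns a Promise. `downloadReads` is a hypothetical stand-in for the real bionode-ncbi call: it uses a toy accession check to reject IDs that "won't resolve". `Promise.allSettled` (Node.js 12.9+) lets every task finish, so one bad ID does not stop the others, and each failure becomes an NDJSON error chunk.

```javascript
// Hypothetical stand-in for a real download; rejects IDs that look invalid.
async function downloadReads (sraId) {
  if (!/^[SED]RR\d+$/.test(sraId)) {
    throw new Error(`search returned no results for ${sraId}`)
  }
  return { sra_id: sraId, status: 'downloaded' }
}

async function runPipeline (sraIds, task) {
  // allSettled waits for every task and never rejects as a whole.
  const results = await Promise.allSettled(sraIds.map(id => task(id)))
  const done = []
  const failed = []
  results.forEach((r, i) => {
    if (r.status === 'fulfilled') done.push(r.value)
    else failed.push({ module: 'bionode-ncbi', sra_id: sraIds[i], message: r.reason.message })
  })
  return { done, failed }
}

// Made-up IDs for illustration; the middle one is deliberately invalid.
runPipeline(['SRR000001', 'not-an-id', 'ERR000002'], downloadReads)
  .then(({ done, failed }) => {
    done.forEach(d => process.stdout.write(JSON.stringify(d) + '\n'))
    failed.forEach(f => process.stderr.write(JSON.stringify(f) + '\n'))
  })
```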

There is also the issue of "swallowing" the original errors from a module we imported (just something to remember and try to avoid). For example:

// `errorCb` and `sraId` are in scope

someModule.doStuff(function (err, data) {
  if (err) {
    // the true `err` is lost
    return errorCb(new Error(`Failed downloading ${sraId}`))
  }

  console.log(data)
})
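One way to avoid this is to wrap the upstream error instead of replacing it, using the standard `cause` option of `Error` (ES2022, Node.js 16.9+). In this sketch `someModule` is a stub that simply fails, and `download` is a hypothetical wrapper; `errorCb` and `sraId` mirror the snippet above.

```javascript
// Stub that simulates an imported module failing asynchronously.
const someModule = {
  doStuff (cb) {
    setImmediate(() => cb(new Error('ECONNRESET')))
  }
}

function download (sraId, errorCb) {
  someModule.doStuff(function (err, data) {
    if (err) {
      // Keep the original error reachable via `cause` instead of dropping it.
      return errorCb(new Error(`Failed downloading ${sraId}`, { cause: err }))
    }
    console.log(data)
  })
}

download('12345', function (err) {
  console.error(err.message) // Failed downloading 12345
  console.error(err.cause.message) // ECONNRESET
})
```

The friendly message still reaches the user, while the original error (message and stack) remains available for debugging or for inclusion in an NDJSON error chunk.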

from bionode-ncbi.

tiagofilipe12 commented on June 26, 2024

If we make this a general feature (error handling), I think the simplest implementation would be something like {module, message}, since not all modules will have the same identifiers to output to JSON. However, for this specific case, parsing the SRA ID might be important. For instance, one might be interested in storing

sra_id: 12345

in a text file for later checking. So maybe, in this module (bionode-ncbi), we could add fields to the standard NDJSON error object (shared by all of bionode): the sra_id (which could be the input provided by the user, e.g., a UID or species name) and the type of operation that was being performed (download, fetch, and so on).
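A sketch of that layering, assuming a generic `{ module, message }` base shared by all bionode modules and two bionode-ncbi-specific fields. The field name `operation` and the helpers `baseError`, `ncbiError` and `failedIds` are hypothetical. `failedIds` shows how the stored identifiers could be pulled back out of the error NDJSON for later checking.

```javascript
// Generic error object shared by all bionode modules.
function baseError (moduleName, message) {
  return { module: moduleName, message }
}

// bionode-ncbi adds the user's original input and the operation that failed.
function ncbiError (message, input, operation) {
  return { ...baseError('bionode-ncbi', message), sra_id: input, operation }
}

// Extract the identifiers of failed inputs from NDJSON error lines.
function failedIds (ndjsonText) {
  return ndjsonText
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line))
    .filter(e => e.sra_id !== undefined)
    .map(e => String(e.sra_id))
}

const stderrText = [
  JSON.stringify(ncbiError('no results', 12345, 'download')),
  JSON.stringify(ncbiError('no results', 'Stenotrophomonas ginsengisoli', 'fetch'))
].join('\n') + '\n'

// One ID per line, e.g. ready for fs.writeFileSync('failed.txt', ...).
console.log(failedIds(stderrText).join('\n'))
```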

Then a --pretty option could easily be implemented for those who don't care about parsing the download error messages, as @thejmazz said.

However, I wonder whether we should move this to a thread in the main bionode repository, since it could be considered a general issue for all of bionode, and then implement the specific error handling in each module.


bmpvieira commented on June 26, 2024

Bioinformatics tools fail often, for many reasons. Sometimes it's just because of one badly formatted input file, out of many, that we left running over the weekend while expecting to see some results by Monday. Most pipeline tools behave in an "if-error-crash" way and then wait for a human to fix and rerun them. Because of this, we want our pipelines to be robust and keep running like a web server instead of throwing errors that cause a crash.

So I think that having errors output as NDJSON to stderr would make them a lot easier to deal with. While keeping the pipeline running, we could internally do things like retry, change parameters on the fly, email a human, or use an external tool such as a realtime GUI dashboard that reads those NDJSON streams.
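A sketch of the "keep running" idea, assuming tasks return Promises: retry a failing task a few times, then report an NDJSON error instead of throwing, so the rest of the pipeline continues. `withRetry`, its options and the `attempts` field are hypothetical; the `report` hook is where an email notifier or dashboard feed could be plugged in.

```javascript
async function withRetry (task, { attempts = 3, moduleName = 'bionode-ncbi', report } = {}) {
  let lastErr
  for (let i = 1; i <= attempts; i++) {
    try {
      return await task()
    } catch (err) {
      lastErr = err // remember the latest failure and try again
    }
  }
  const chunk = { module: moduleName, message: lastErr.message, attempts }
  // Default reporter writes one NDJSON line to stderr.
  const write = report || (c => process.stderr.write(JSON.stringify(c) + '\n'))
  write(chunk)
  return null // signal "no result" without crashing the pipeline
}

// Usage: a flaky task that only succeeds on its third call.
let calls = 0
const flaky = async () => {
  calls += 1
  if (calls < 3) throw new Error('connection reset')
  return 'reads.sra'
}

withRetry(flaky).then(result => console.log(result, calls)) // reads.sra 3
```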

I think this reinforces that --pretty should in fact be a bionode-pretty module that takes stderr and stdout NDJSON and prints it in a human-friendly format with colors. bionode-pretty would then be the only module that doesn't output NDJSON. We can still keep --pretty as a shortcut in each module's CLI by just requiring bionode-pretty and piping to it internally when needed, but this means having bionode-pretty as an extra dependency.
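A sketch of how such a bionode-pretty could be built as a Node.js `Transform` stream, so a module CLI can pipe into it internally. The name `createPrettyStream` and the choice to color only objects flagged `isError` are assumptions; this is an illustration, not the bionode-pretty implementation. Input lines may arrive split across chunks, so incomplete lines are buffered until the next chunk.

```javascript
const { Transform, Readable } = require('stream')
const { StringDecoder } = require('string_decoder')

function createPrettyStream ({ color = false } = {}) {
  const decoder = new StringDecoder('utf8') // safe across split UTF-8 bytes
  let buffered = ''
  const format = obj => {
    const text = JSON.stringify(obj, null, 2)
    return color && obj.isError ? `\x1b[31m${text}\x1b[0m` : text
  }
  return new Transform({
    transform (chunk, encoding, callback) {
      buffered += decoder.write(chunk)
      const lines = buffered.split('\n')
      buffered = lines.pop() // the last piece may be an incomplete line
      try {
        for (const line of lines) {
          if (line.trim() !== '') this.push(format(JSON.parse(line)) + '\n')
        }
      } catch (err) {
        return callback(err) // malformed JSON: surface it, don't hide it
      }
      callback()
    },
    flush (callback) {
      buffered += decoder.end()
      try {
        if (buffered.trim() !== '') this.push(format(JSON.parse(buffered)) + '\n')
      } catch (err) {
        return callback(err)
      }
      callback()
    }
  })
}

// Usage: pretty-print a small NDJSON stream that is split mid-object on purpose.
Readable.from([
  '{"module":"bionode-ncbi","message":"ok"}\n{"module":"bionode-ncbi",',
  '"message":"fail","isError":true}\n'
])
  .pipe(createPrettyStream({ color: Boolean(process.stdout.isTTY) }))
  .pipe(process.stdout)
```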


bmpvieira commented on June 26, 2024

In #32 I've changed the code so that in this case we just emit one error and then give a more user-friendly message. Also, --pretty now indents the JSON output (similar to bionode-ncbi | json). However, I still think we need to greatly improve how errors are handled in all modules, so I think we should continue this discussion in bionode/bionode#41
