Comments (4)
I think a standard JSON schema for an "error chunk" could be beneficial, and we could use it across multiple bionode-* modules, maybe something like:
{
  module: 'bionode-ncbi',
  message: 'cannot download assembly Stenotrophomonas ginsengisoli because it was unable to establish a connection to NCBI link',
  sra_id: 12345,
  reason: 'network'
}
and it could be written to stderr while regular output is sent to stdout. This would help greatly when handling errors in a big pipeline, since each tool would emit errors with a standard object schema. Or maybe each "error type" could declare what its object schema is.
However, I'm not sure how this plays with the paradigm where stderr is logging and stdout is data. Perhaps errors can be emitted from the same stream as logs (the logging stream, stderr), with the error object carrying isError: true or something.
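A minimal sketch of what emitting such an error chunk could look like. The helper names makeErrorChunk and emitError are hypothetical (they are not part of any bionode module), and the field set simply mirrors the example object above plus the isError flag:

```javascript
// Hypothetical helpers, not an existing bionode API.
// Builds an error chunk; module-specific fields (sra_id, reason, ...) go in `extra`.
function makeErrorChunk (module, message, extra = {}) {
  // The standard fields are assigned last so `extra` cannot overwrite them.
  return Object.assign({}, extra, { isError: true, module, message })
}

// Writes the chunk as NDJSON: one JSON object per line, on stderr by default.
function emitError (chunk, stream = process.stderr) {
  stream.write(JSON.stringify(chunk) + '\n')
}

const chunk = makeErrorChunk(
  'bionode-ncbi',
  'unable to establish a connection to NCBI',
  { sra_id: 12345, reason: 'network' }
)
emitError(chunk)
```

With isError on the object, a consumer reading the logging stream can separate errors from ordinary log lines without needing a third stream.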
I'm not sure what exactly this schema should contain (perhaps a simple { tool, message } is enough); it might take some experimentation for us to see what is useful or required in a generic error logging utility. NDJSON for errors makes perfect sense for reasons of consistency and parsability. Similar to the loading progress, there could be a "--pretty" option for transforming NDJSON errors when running commands on the CLI (e.g. formatting and red text).
A good place to test this out could be a pipeline that runs the same tasks on a bunch of SRA IDs, where some IDs are deliberately made incorrect (they won't resolve in search). A bunch of reads for the pipeline then won't get downloaded, but we should handle that case gracefully and give as much information to the user as possible.
There is also the issue of not "swallowing" the original errors from a module we imported (just something to remember and try to avoid), for example,
// `errorCb` is in scope
someModule.doStuff(function (err, data) {
  if (err) {
    // the true `err` is lost
    return errorCb(new Error(`Failed downloading ${sraId}`))
  }
  console.log(data)
})
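One possible way to avoid losing the original error, sketched under assumptions: wrapError and download are hypothetical names, someModule stands in for any imported module with a Node-style callback, and attaching the original as a cause property is just one convention among several.

```javascript
// Hypothetical helper: adds context to an error while keeping the original.
function wrapError (message, original) {
  const wrapped = new Error(`${message}: ${original.message}`)
  // Keep the underlying error (and its stack) reachable for later inspection.
  wrapped.cause = original
  return wrapped
}

// `someModule` is any object exposing doStuff(callback); `errorCb` receives the wrapped error.
function download (sraId, someModule, errorCb) {
  someModule.doStuff(function (err, data) {
    if (err) {
      // The true `err` travels along as `cause` instead of being dropped.
      return errorCb(wrapError(`Failed downloading ${sraId}`, err))
    }
    console.log(data)
  })
}
```

A caller can then log the friendly message and still reach the underlying stack through the cause property.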
from bionode-ncbi.
If we make this a general feature (error handling), I think the easiest implementation would be something like {module, message}, since not all modules will have the same identifiers to output as JSON. For this specific case, however, parsing the SRA ID might be important. For instance, one might want to store sra_id: 12345 in a text file for later checking. So maybe this module (bionode-ncbi) could append to a standard NDJSON object (global to all of bionode) an identifier holding the sra_id (which could be the input provided by the user, e.g., a UID or species name) and the type of run that was being performed (download, fetch, and so on).
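As a rough sketch of that idea, assuming errors arrive as NDJSON with a common { module, message } base and optional module-specific fields: the failedIds helper, the action field, and the second module name below are all hypothetical.

```javascript
// Hypothetical helper: collects a module-specific identifier from NDJSON error lines.
// Lines without the field (errors from other modules) are skipped.
function failedIds (ndjson, field = 'sra_id') {
  return ndjson
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line))
    .filter(obj => obj[field] !== undefined)
    .map(obj => obj[field])
}

const sample = [
  // Base fields plus the bionode-ncbi specific ones discussed above.
  JSON.stringify({ module: 'bionode-ncbi', message: 'download failed', sra_id: 12345, action: 'download' }),
  // A chunk from some other (hypothetical) module carrying only the base fields.
  JSON.stringify({ module: 'some-other-module', message: 'bad input' })
].join('\n')

// One ID per line, ready to be redirected into a text file for later checking.
console.log(failedIds(sample).join('\n'))
```

Consumers that only know the base schema can still read every line, while those that care about SRA IDs pick up the extra field.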
Then a --pretty option could easily be implemented for those who do not care about parsing download error messages, like @thejmazz said.
However, I wonder whether we should move this to a thread in the bionode module, since it might be considered a general issue for all of bionode, and then implement the specific error handling in each module.
Bioinformatics tools fail a lot, for many reasons. Sometimes it's just because of one badly formatted input file out of many that we left running for the weekend while expecting to see some results by Monday. Most pipeline tools behave in an "if-error-crash" way and then wait for a human to fix and rerun them. Because of this, we want our pipelines to be robust and keep running like a web server instead of throwing errors that cause a crash.
So I think that having errors output as NDJSON to stderr would make it a lot easier to deal with them. While keeping the pipeline running, we could internally do things like try again, change parameters on the fly, email a human, or use an external tool like a realtime GUI dashboard that reads that NDJSON.
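A sketch of how a pipeline might react to those error lines without crashing. The policy here (retry network errors up to three times, otherwise notify someone) is an assumption for illustration, as are the handleErrorLine name and the sra_id and reason fields, which follow the example object earlier in the thread.

```javascript
// Hypothetical retry policy for NDJSON error lines read from a tool's stderr.
const MAX_RETRIES = 3

// `attempts` is a Map from identifier to the number of failures seen so far.
function handleErrorLine (line, attempts) {
  let chunk
  try {
    chunk = JSON.parse(line)
  } catch (e) {
    return { action: 'ignore' } // not JSON, e.g. a plain log line
  }
  const id = chunk.sra_id
  const tries = (attempts.get(id) || 0) + 1
  attempts.set(id, tries)
  if (chunk.reason === 'network' && tries <= MAX_RETRIES) {
    return { action: 'retry', id }
  }
  // Anything else (or too many retries) is handed to a human or a dashboard.
  return { action: 'notify', id, message: chunk.message }
}

const attempts = new Map()
const line = JSON.stringify({
  module: 'bionode-ncbi',
  message: 'connection failed',
  sra_id: 12345,
  reason: 'network'
})
for (let i = 0; i < 4; i++) {
  console.log(handleErrorLine(line, attempts).action) // retry, retry, retry, notify
}
```

In a real pipeline the lines would probably come from a child process's stderr split by line, but the decision logic stays the same.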
I think this reinforces that --pretty should in fact just be a bionode-pretty module that takes stderr and stdout NDJSON and prints them in a human-friendly format with colors. So bionode-pretty would be the only module that doesn't output NDJSON. We can still keep --pretty as a shortcut in each module's CLI by just calling require('bionode-pretty') and piping to it internally if needed, but this means having bionode-pretty as an extra dependency.
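To make the idea concrete, here is a minimal sketch of the formatting step such a module might perform. bionode-pretty does not exist in this form; the pretty function, the use of an isError flag, and the choice of two-space indentation with red ANSI text are all assumptions.

```javascript
// ANSI escape codes for red text and for resetting the color.
const RED = '\u001b[31m'
const RESET = '\u001b[0m'

// Hypothetical formatter: takes one NDJSON line and returns human-friendly text.
function pretty (line, useColor = true) {
  const obj = JSON.parse(line)
  // Indent the object so it reads well in a terminal.
  const text = JSON.stringify(obj, null, 2)
  // Errors (flagged with isError) are shown in red when color is enabled.
  if (obj.isError && useColor) return RED + text + RESET
  return text
}

console.log(pretty(JSON.stringify({ module: 'bionode-ncbi', message: 'all good' })))
console.log(pretty(JSON.stringify({ isError: true, module: 'bionode-ncbi', message: 'network error' })))
```

Each module's --pretty flag could then pipe its NDJSON lines through this one function, so the formatting logic lives in a single place.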
In #32 I've changed the code so that in this case we just emit one error and then give a more user-friendly message. Also, --pretty now indents the JSON output (similar to bionode-ncbi | json). However, I still think we need to improve a lot how errors are handled in general in all modules, so I think we should continue this discussion in bionode/bionode#41.