broadinstitute / cromshell

CLI for interacting with Cromwell servers

License: BSD 3-Clause "New" or "Revised" License

Shell 16.53% WDL 25.05% Python 58.42%
cromwell wdl cli bioinformatics workflow

cromshell's Introduction

                  __                                                            __
       .,-;-;-,. /'_\     +-----------------------------------------------+    /_'\.,-;-;-,.
     _/_/_/_|_\_\) /      |  CROMSHELL : run Cromwell jobs from the shell |     \ (/_/__|_\_\_
   '-<_><_><_><_>=/\      +-----------------------------------------------+     /\=<_><_><_><_>-'
     `/_/====/_/-'\_\                                                          /_/'-\_\====\_\'
      ""     ""    ""                                                          ""    ""     ""

Cromshell

Cromshell is a CLI for submitting workflows to a Cromwell server and monitoring/querying their results.

Examples:

         cromshell submit workflow.wdl inputs.json options.json dependencies.zip
         cromshell status
         cromshell -t 20 metadata
         cromshell logs -2

Supported Options:

  • --no_turtle or --I_hate_turtles
    • Hide turtle logo
  • --cromwell_url [TEXT]
    • Specifies Cromwell URL used.
    • TEXT Example: http://65.61.654.8:8000
    • Note: Depending on your Cromwell server configuration, you may not need to specify the port.
  • -t [TIMEOUT]
    • Specifies the server connection timeout in seconds.
    • Default is 5 sec.
    • TIMEOUT must be a positive integer.
  • --gcloud_token_email [TEXT]
    • Call gcloud auth print-access-token with this email and add the token as an auth header to requests.
  • --referer_header_url [TEXT]
    • For servers that require a referer, supply this URL in the Referer: header.
  • One of --machine_processable or --colorful_output
    • Override the automatically determined output coloring setting.
    • Otherwise the output will be colored if it detects that it's connected to an interactive terminal.

Supported Subcommands:

Start/Stop workflows

  • submit [-w] <wdl> <inputs_json> [options_json] [included_wdl_zip_file]
    • Automatically validates the WDL and JSON file.
    • Submit a new workflow to the Cromwell server.
    • -w [COMING SOON] Wait for the workflow to transition from 'Submitted' to some other status before cromshell exits.
    • included_wdl_zip_file Zip file containing any WDL files included in the input WDL
  • abort [workflow-id] [[workflow-id]...]
    • Abort a running workflow.

Workflow information:

  • alias <workflow-id> <alias_name>
    • Label the given workflow ID with the given alias_name. Aliases can be used in place of workflow IDs to reference jobs.
    • Remove an alias by passing empty double quotes as alias_name (e.g. alias <workflow-id> "")

Query workflow status:

  • status [workflow-id] [[workflow-id]...]
    • Check the status of a workflow.
  • metadata [workflow-id] [[workflow-id]...]
    • Get the full metadata of a workflow.
  • slim-metadata [workflow-id] [[workflow-id]...]
    • Get a subset of the metadata from a workflow.
  • counts [-j] [-x] [workflow-id] [[workflow-id]...]
    • Get the summarized status of all jobs in the workflow.
    • -j prints a JSON instead of a pretty summary of the execution status (compresses subworkflows)
    • -x compress sub-workflows for less detailed summarization
  • timing [workflow-id] [[workflow-id]...]
    • Open the timing diagram in a browser.

Logs

  • logs [workflow-id] [[workflow-id]...]
    • List the log files produced by a workflow.
  • [COMING SOON] fetch-logs [workflow-id] [[workflow-id]...]
    • Download all logs produced by a workflow.

Job Outputs

  • list-outputs [workflow-id] [[workflow-id]...]
    • List all output files produced by a workflow.
  • [COMING SOON] fetch-all [workflow-id] [[workflow-id]...]
    • Download all output files produced by a workflow.

Display a list of jobs submitted through cromshell

  • list [-c] [-u]
    • -c Color the output by completion status.
    • -u Check completion status of all unfinished jobs.

Clean up local cached list

  • [COMING SOON] cleanup [-s STATUS]
    • Remove completed jobs from local list. This command removes all jobs from the local list that are in a completed state, where a completed state is one of: Succeeded, Failed, Aborted
    • -s [STATUS] If provided, will only remove jobs with the given [STATUS] from the local list.

Update cromwell server

  • update-server
    • Change the cromwell server that new jobs will be submitted to.

Get cost for a workflow

  • cost [-c] [-d] [workflow-id] [[workflow-id]...]
    • Get the cost for a workflow.
    • Only works for workflows that completed more than 24 hours ago on GCS. See Google Cost Exporting Documentation
    • Billing export to BigQuery must be enabled for your GCP billing project. See Setup billing data export to BigQuery.
    • Requires the bq_cost_table key to exist in the cromshell configuration file and have a value equal to the BigQuery cost table for your GCP billing project.
      • For example, your ~/.cromshell/cromshell_config.json should contain:

        {
          "cromwell_server": "<cromwell_server_url>",
          "requests_timeout": 5,
          "bq_cost_table": "<table_name>"
        }

        where <table_name> can be found by navigating to BigQuery, selecting the appropriate google project, and locating the table containing cost information.

        Clicking on the table and opening the "DETAILS" tab, you'll find the exact path to the table in the "Table ID" section. Everything after the google project name (after the first .) should be included in <table_name>.

    • -c/--color Color outliers in task level cost results.
    • -d/--detailed Get the cost for a workflow at the task level.
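
The configuration requirement above can be checked before running cost. The following is a minimal sketch, not cromshell's own code: the helper names load_cost_table and table_name_from_table_id are hypothetical, and the table ID in the usage note is made up for illustration.

```python
import json
from pathlib import Path


def load_cost_table(config_path: Path) -> str:
    """Return the bq_cost_table value, or raise if the key is missing or empty."""
    config = json.loads(config_path.read_text())
    table = config.get("bq_cost_table", "")
    if not table:
        raise KeyError("bq_cost_table is not set in " + str(config_path))
    return table


def table_name_from_table_id(table_id: str) -> str:
    """Drop the Google project name: keep everything after the first '.'."""
    _project, _sep, rest = table_id.partition(".")
    return rest
```

For example, a hypothetical Table ID of my-project.billing_dataset.cost_table would give billing_dataset.cost_table as the value to store under bq_cost_table.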

Validate WDL

  • validate [wdl] [input json] --dependencies-zip [wdl_zip_file]
    • Validate a WDL file.
    • Runs both miniwdl and womtool validation by default, but can be configured to run only one or the other.
    • Womtool validation via the Cromwell server API does not support validation of imported files; miniwdl does.
    • --dependencies-zip MiniWDL option: ZIP file or directory containing workflow source files that are used to resolve local imports.

Features:

  • Running submit will create a new folder in the ~/.cromshell/${CROMWELL_URL}/ directory named with the cromwell job id of the newly submitted job.
    It will copy your wdl and json inputs into the folder for reproducibility.
  • It keeps track of your most recently submitted jobs by storing their IDs in ~/.cromshell/.
    You may omit the job ID of the last job submitted when running commands, or use negative numbers to reference previous jobs, e.g. "-1" will track the last job, "-2" will track the one before that, and so on.
  • You can override the default cromwell server by setting the argument --cromwell_url to the appropriate URL.
  • You can override the default cromshell configuration folder by setting the environmental variable CROMSHELL_CONFIG to the appropriate directory.
  • Most commands take multiple workflow IDs, which you can specify as relative or absolute values (e.g. cromshell status -- -1 -2 -3 c2db2989-2e09-4f2c-8a7f-c3733ae5ba7b).
  • Assign aliases to workflow IDs using the alias command (e.g. cromshell alias -- -1 myAliasName). Once an alias is attached to a workflow ID, the alias name can be used instead of the ID (e.g. cromshell status myAliasName).
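
The three ways of referring to a workflow (absolute ID, negative relative number, alias) can be illustrated with a small resolver. This is a sketch under stated assumptions, not cromshell's implementation: resolve_workflow_id is a hypothetical name, the submission history is assumed to be a list ordered oldest first, and aliases are assumed to be a plain dict.

```python
import re
from typing import Dict, List

# Matches lowercase workflow UUIDs such as c2db2989-2e09-4f2c-8a7f-c3733ae5ba7b.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def resolve_workflow_id(ref: str, submitted_ids: List[str], aliases: Dict[str, str]) -> str:
    """Turn an absolute ID, a relative reference like "-2", or an alias into a workflow ID."""
    if UUID_RE.match(ref):
        return ref
    if re.fullmatch(r"-[1-9]\d*", ref):
        offset = int(ref)  # negative: -1 is the most recent submission
        if -offset > len(submitted_ids):
            raise LookupError("no submission at position " + ref)
        return submitted_ids[offset]
    if ref in aliases:
        return aliases[ref]
    raise LookupError("unknown workflow reference: " + ref)
```

With a history of three submissions, "-1" resolves to the third entry and "-2" to the second, matching the description above.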

Installation

From brew

brew tap broadinstitute/dsp
brew install cromshell

From pypi

pip install cromshell

From source

git clone git@github.com:broadinstitute/cromshell.git
cd cromshell
pip install .

cromshell --help

Uninstallation

From brew

brew uninstall cromshell

From pypi/source

pip uninstall cromshell

Development

See the Developer Docs

Legacy Cromshell

The original Cromshell shell script is still available in the legacy_cromshell folder and in the cromshell1 branch of this repository. It is no longer maintained, but is still available for use. The original Cromshell contains some commands not yet available in Cromshell2, such as fetch-logs, fetch-all, notify, and cleanup. These commands will be added to Cromshell2 in the future.

cromshell's People

Contributors

aednichols, bshifaw, cjllanwarne, evantheb, eviewan, gileshall, huangzhibo, jamesemery, jonn-smith, kshakir, kvg, lbergelson, meganshand, mwalker174, shuang-broad, sjfleming, tedbrookings, tlangs

cromshell's Issues

Cromshell should check if there is a new release when run

When cromshell is run it should check github for the latest release. If that release is newer than the cromshell that is being run, a message should appear to the user to get the latest.
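
The comparison step of such a check could look like the sketch below. It assumes release tags are dotted integers with an optional leading "v"; parse_version and is_newer are hypothetical names, and fetching the latest tag from GitHub is deliberately left out.

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Convert a tag such as "v2.1.0" into a tuple of integers for comparison."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def is_newer(latest: str, current: str) -> bool:
    """True when the latest release is strictly newer than the running version."""
    return parse_version(latest) > parse_version(current)
```

Comparing tuples of integers rather than raw strings avoids ranking "2.9.0" above "2.10.0".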

This primarily applies to when it is installed via brew or bioconda, but could be generally useful.

cromshell notify isn't

Cromshell notify doesn't seem to be working.

ex:

$ cromshell notify -1 gsa5.broadinstitute.org [email protected]
Sub-Command: notify
                  __
       .,-;-;-,. /'_\
     _/_/_/_|_\_\) /
   '-<_><_><_><_>=/\
     `/_/====/_/-'\_\
      ""     ""    ""
Creating notification daemon on host gsa5.broadinstitute.org ...
ERROR: Could not copy cromshell to server gsa5.broadinstitute.org

alternate failure mode:

cromshell notify -1 [email protected] [email protected]
Sub-Command: notify
Spinning off notification to [email protected] thread for
    workflow:             5daec840-f4b2-4c26-9fae-da8122920919
    from Cromwell server: https://cromwell-v36.dsde-methods.broadinstitute.org
...
Spun off thread on PID 99253

which looks like a success but doesn't ever send an email

Keep track of which cromwell server a job was run on

Since people use multiple cromwell servers it would be good if we kept track of which one each job ran on. That way when you run a command like status -2 you don't have to manually switch the server environment variable if you're running in multiple different environments.

bioconda recipe

Thanks for cromshell. I added a bioconda recipe for it -- maybe mention in the docs it's installable through bioconda. @lbergelson

Bug Report: Cromshell is ignorant of "RetryableFailure"s in WILL_FAIL status

I have encountered an issue with cromshell where I am running a job with a scatter (and consequently a subworkflow being spawned) that according to a cromshell list -uc command is in the following state:

20191202_141144 <CROMWELL SERVER>  2d987529-7803-48af-9534-4960b85c1ef6                                                                                                                         evaluate.wdl                             WILL_FAIL

Upon investigation running cromshell execution-status-count 2d987529-7803-48af-9534-4960b85c1ef6 returns "Running": 1 as it is still hacking away at the subworkflow: 5e9f7dc2-972b-40dc-91d7-c4538eae7076. When I run the cromshell execution-status-count 5e9f7dc2-972b-40dc-91d7-c4538eae7076 command I find that the subworkflow has the following status:

    "Done": 42,
    "RetryableFailure": 14,
    "Running": 8

Upon inspection all of the failures in the metadata are attributed to PAPI error code 10, which is a known headache inducing error condition where the job is killed that can be circumvented by adding a maxRetries argument to your wdl tasks to force cromwell to rerun certain PAPI error codes. Unfortunately the WILL_FAIL status seems to be interpreting these failures that cromwell is able to disambiguate to "RetryableFailure" as a failure state making my cromshell list output confusing. Perhaps there is a way to detect "RetryableFailure" and exclude those from the WILL_FAIL logic?
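
One possible shape for the suggested logic, as a sketch only: it takes the counts printed by execution-status-count as a dict and treats only "Failed" as a failure, so "RetryableFailure" entries no longer trigger WILL_FAIL. The function name will_fail is hypothetical and this is not how cromshell currently computes the status.

```python
from typing import Dict


def will_fail(status_counts: Dict[str, int]) -> bool:
    """True only when a call has failed for good while other calls are still running."""
    failed = status_counts.get("Failed", 0)
    running = status_counts.get("Running", 0)
    # "RetryableFailure" is intentionally ignored: Cromwell will retry those calls.
    return failed > 0 and running > 0
```

With the counts from the subworkflow above (42 Done, 14 RetryableFailure, 8 Running), this returns False.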

Feature request: print/download FAILED logs only

cromshell really needs a command that can print and/or download logs for JUST the failed tasks. Currently we have to either download all logs or grep through the metadata for failures and log file links.

list-outputs doesn't list all outputs

List outputs currently compares with the local disk to see what files have yet to be downloaded.

If you partially download outputs then cancel, the displayed list will be only the files left to be downloaded.

This is bad.

add tests

This is totally untested. If we actually want to use this thing, we should make sure it has tests.

Replace more `echo` calls with `error` calls

Ideally, most of the output of this script should go to stderr, so you can script it out easily.

There are a couple of calls to echo embedded in the notify and related functions. These should be removed while making sure to preserve the reports back to the user of what process was spun off and what it is notifying.

Allow dependencies file without an options.json

Currently you can't submit a job that has a dependencies.zip without also including an options.json.

Currently you can work around this issue by submitting an options.json with no keys.

options.json

{}

It would be good if you could skip the options.json when you don't need it.

Extra column for 'list' specifying the input JSON

While developing new workflows, I often run one workflow per sample. Checking on the status of each sample is currently a bit painful because if a workflow fails, I need to dig through the metadata to figure out which sample it was.

For example, while trying to analyze four test samples and then running cromshell list -u -c, I get this:

DATE CROMWELL_SERVER RUN_ID WDL_NAME STATUS
20190929_231921 https://cromwell-v45.dsde-methods.broadinstitute.org eb5d37a7-b518-4cff-be8f-2ce8655881a4 NanoporeRNA.wdl Running
20190929_231946 https://cromwell-v45.dsde-methods.broadinstitute.org c35da98c-b8e9-40bf-9670-afab75a9bc5b NanoporeRNA.wdl Failed
20190929_231959 https://cromwell-v45.dsde-methods.broadinstitute.org e626c920-838b-4917-9582-3e42cffcbc03 NanoporeRNA.wdl Succeeded
20190929_232029 https://cromwell-v45.dsde-methods.broadinstitute.org 179fb6fd-71d8-4b16-8fbc-7252a8897969 NanoporeRNA.wdl Succeeded

My goal is to see at a glance which workflow specifically failed, but the current status list does not immediately tie a workflow to the input I ran it with. I currently run cromshell metadata on the particular RUN_ID and then work out from the metadata which sample I must be looking at.

Instead, I'd love to see another column here that tells me what I supplied as the input JSON (which I generally name things like SampleA.json, SampleB.json, etc.). That way, I'd know exactly which sample was completed, still running, or had failed without having to look anything else up.

Since the table is already pretty wide, adding a column may overflow the line, and that wouldn't look very nice. But I'd be happy to not see the CROMWELL_SERVER column anymore; I typically don't care about that column in my day-to-day work.

Tab completion for commands

We desperately need this. I'm lazy and typing the same commands over and over is hard on my poor fingers.

Allow for cromwell server configurations to contain non-default ports

Currently cromshell assumes that the default ports are the ones used for communications with the cromwell server.

The script should allow an alternate port to be specified in the form URL:port.

The primary updates will be required in assertCanCommunicateWithServer and other places that use ping. The curl calls will support this without issue.
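
For illustration, splitting a server URL into host and port can be done with Python's standard urllib.parse. This is a sketch, not the shell script's code: host_and_port and DEFAULT_PORTS are hypothetical names, and falling back to 80 or 443 by scheme is an assumption about the desired behavior.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed fallback ports when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_and_port(url: str) -> Tuple[str, Optional[int]]:
    """Return (hostname, port), using the scheme's default port if none is given."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host: " + url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port
```

A connectivity check could then target the returned host and port instead of assuming the default.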

Add support for cromwell behind authorization layer

Cromshell currently cannot be used against a Cromwell server that requires authentication. Adding the ability to use a bearer token, either by passing it in explicitly or through an environment variable, would allow us to use Cromshell against more of our servers.

conda installs outdated version of womtool

Installation of cromshell with conda causes all submissions of WDL 1.0 scripts to fail. This is because the distribution of cromwell installed as a dependency is outdated.

Timing result seemingly off for methods server 47

Example is

https://cromwell-v47.dsde-methods.broadinstitute.org/api/workflows/v1/2c14a767-176e-458a-a32b-43fe494ace17/timing

which shows computing time less than the actual computing time, e.g. here.

The UI is also not showing localization/delocalization time either.

Not sure if it is an issue of cromshell or something else.

running cromshell with no argument produces "invalid subcommand"

Running cromshell with no subcommand produces an error message instead of a simple usage. It should probably be switched back to its old behavior.

$ cromshell

cromshell : invalid sub-command:

Usage:    cromshell SUB-COMMAND [options]
Run and inspect workflows on a Cromwell server.
Try `cromshell -h' for more information.

old behavior:

Usage:    cromshell <subcommand> [options]
Run and inspect workflows on a Cromwell server.

Provide "Failing" status

When one job failed, but another is still running, the status of the workflow is "Running". This can be misleading and hide the fact that a task has failed and that the workflow will likely fail.

this jq filter

[.calls|.[]|map(.executionStatus)|.[-1]]|any(.=="Failed")

gets me part of the way there, but I don't know how to modify the printTaskStatus script so that it converts a "Running" status to a "Failing" status in that case....
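
For reference, here is the same check written in Python together with the status conversion being asked about. It is a sketch that assumes the metadata has the usual top-level "status" and "calls" keys, where each call maps to a list of attempts; any_call_failed and display_status are hypothetical names, not part of cromshell.

```python
def any_call_failed(metadata: dict) -> bool:
    """Mirror the jq filter: is the last attempt of any call marked "Failed"?"""
    calls = metadata.get("calls", {})
    return any(
        attempts[-1].get("executionStatus") == "Failed"
        for attempts in calls.values()
        if attempts  # skip calls with no recorded attempts
    )


def display_status(metadata: dict) -> str:
    """Report "Failing" when the workflow is still Running but a call has failed."""
    status = metadata.get("status", "Unknown")
    if status == "Running" and any_call_failed(metadata):
        return "Failing"
    return status
```

Only the last attempt of each call is inspected, so a call that failed once and then succeeded on retry does not flip the status.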

list-outputs functionality is broken

list-outputs seems to be broken:

Sub-Command: list-outputs
Using workflow-id == e9da649f-807e-4ddc-8e80-4b63192ae1d3
Using workflow server URL == https://cromwell-v39.dsde-methods.broadinstitute.org
                  __     
       .,-;-;-,. /'_\    
     _/_/_/_|_\_\) /     
   '-<_><_><_><_>=/\     
     `/_/====/_/-'\_\    
      ""     ""    ""    
                  __     
       .,-;-;-,. /'_\    
     _/_/_/_|_\_\) /     
   '-<_><_><_><_>=/\     
     `/_/====/_/-'\_\    
      ""     ""    ""    
Could not parse JSON output from cromwell server.
Output files from job https://cromwell-v39.dsde-methods.broadinstitute.org:e9da649f-807e-4ddc-8e80-4b63192ae1d3 : 

better error messages - empty json files

Currently if you try to submit a workflow with empty json files (because they are just being used as placeholders), you get "There was an internal server error." Making the jsons non-empty corrects this problem and it would be nice if the error message could reflect that.

Compatibility issues with newer version of curl

After some testing, it seems that the curl command for submit silently fails with the message (26) read function returned funny value for some recent versions of curl which causes the script to get back an empty workflowID. It turns out that when no workflowOptions file is submitted that removing -F workflowOptions=@ from the curl command seems to fix the issue. The script should be updated to avoid empty files if there are none specified.
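
The fix amounts to only sending form fields for files that were actually supplied. A minimal sketch of that selection step follows; the field names are those of Cromwell's workflow submission endpoint, while build_submit_fields is a hypothetical helper and not the script's actual code.

```python
from typing import Dict, Optional


def build_submit_fields(
    wdl: str,
    inputs_json: str,
    options_json: Optional[str] = None,
    dependencies_zip: Optional[str] = None,
) -> Dict[str, str]:
    """Map form field names to file paths, leaving out any file that was not given."""
    candidates = {
        "workflowSource": wdl,
        "workflowInputs": inputs_json,
        "workflowOptions": options_json,
        "workflowDependencies": dependencies_zip,
    }
    # Dropping empty entries avoids sending a field like workflowOptions=@ with no file.
    return {field: path for field, path in candidates.items() if path}
```

Because each optional file is handled independently, the same approach would also let a dependencies zip be submitted without an options file.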

add command to switch cromwell server

It would be useful to be able to switch cromwell servers with a command. cromshell set-server

It would be useful to be able to create a list of named aliases for servers.

ie.

cromshell add-server methods36 https://cromwell-v36.dsde-methods.broadinstitute.org
cromshell set-server methods36

Add an option to alias commands when running cromshell submit.

Right now if I submit a job as part of some test regime using cromwell or cromshell, I get back a UUID for the task I am running which gets saved for internal cromwell requests. If I am trying to submit many jobs at once it is incumbent on me to keep track of these UUIDs and remember which ones correspond to which trials. This is error prone, and trying to keep track of a list of IDs in the format XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXX becomes cumbersome quickly. I would like to be able to specify an argument on the command line (e.g. cromshell submit --alias "myTaskName" wdl.wdl json.json) with the idea being that I could then use the specified "myTaskName" to request that task in the future (e.g. cromshell logs "myTaskName" as opposed to cromshell logs XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXX).

Add the ability to clear out results from the list command

Currently the list command prints out all of the submitted tasks in your history. Unfortunately that is unbounded and as time goes on I suspect it will start to print out hundreds of forgotten job statuses that nobody wants anymore. A better solution would be to make the list default to returning the last n commands with an argument to print out more if necessary. An alternative would be to add a cleanup command to the history that removes old jobs. Alternatively still the list could be made scrollable though that sounds like a more serious change.
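
The first two options could be sketched as follows. This assumes the history is a list of dicts, oldest first, each with a "status" key; recent_jobs and cleanup are hypothetical names, and the default set of completed states (Succeeded, Failed, Aborted) follows the cleanup description in the README.

```python
from typing import Dict, Iterable, List

COMPLETED_STATES = ("Succeeded", "Failed", "Aborted")


def recent_jobs(rows: List[Dict[str, str]], limit: int = 10) -> List[Dict[str, str]]:
    """Return only the last `limit` submissions."""
    if limit <= 0:
        return []
    return rows[-limit:]


def cleanup(
    rows: List[Dict[str, str]], statuses: Iterable[str] = COMPLETED_STATES
) -> List[Dict[str, str]]:
    """Return the history with every job in one of the given states removed."""
    drop = set(statuses)
    return [row for row in rows if row.get("status") not in drop]
```

Both functions return new lists and leave the stored history untouched, so a caller can decide separately whether to write the result back.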

rename cromwell script to cromshell

The name Cromwell conflicts with the actual cromwell command line tools. Would people object to renaming it cromshell? You could always alias it back if you don't also use a local cromwell.

execution-status-count ignores -x if output isn't prettified

Execution status count should only check subworkflows if -x is enabled, but by default it downloads all the metadata for subworkflows.

Because of the bug in #107 it wasn't displaying subworkflow data in unpretty form with or without -x, but it always requests it. This could be expensive for very large workflows.
