broadinstitute / pipelines-tools

This project forked from googlegenomics/pipelines-tools

Tools for developing and running pipelines with the Genomics API

License: Apache License 2.0

Google Genomics Pipelines Tools

This repository contains various tools that are useful when running pipelines with the Google Genomics API.

Quick Start Using Cloud Shell

  1. Enable the Genomics API and the Compute Engine API in a new or existing Google Cloud project.

  2. Start a Cloud Shell inside your project.

  3. Inside the Cloud Shell, run the command

     go get github.com/googlegenomics/pipelines-tools/...
    

    This command downloads and installs the pipelines tools. To build these tools outside Cloud Shell, you need the Go toolchain.

  4. Create a bucket on GCS to store the pipeline's output:

     export BUCKET=gs://${GOOGLE_CLOUD_PROJECT}-pipelines
     gsutil mb ${BUCKET}
    
  5. Put some test data into the bucket:

     echo "Hello World" | gsutil cp - ${BUCKET}/input
    
  6. Create a pipeline script that computes the SHA-1 checksum of a file:

     echo 'sha1sum ${INPUT0} > ${OUTPUT0}' > sha1.script
    
  7. Run the script using the pipelines API:

     pipelines run --inputs=${BUCKET}/input --outputs=${BUCKET}/output sha1.script
    
  8. Check the generated output file:

     gsutil cat ${BUCKET}/output
    

That's it: you've run your first pipeline. For more information about the input formats supported by the pipelines tool, check out the source code. To learn more about the Pipelines API, consult the reference documentation.
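Before submitting a script to the API, it can help to dry-run it locally. The sketch below assumes, based on the ${INPUT0} and ${OUTPUT0} references in the quick-start script, that the pipelines tool hands the script the local paths of the localized input and output files in those variables; here they are set by hand to files in a scratch directory. It also assumes a Linux shell with GNU sha1sum, as in Cloud Shell. Nothing is sent to the Pipelines API.

```shell
#!/bin/sh
# Local dry run of the quick-start script. ASSUMPTION: the pipelines tool
# exposes local file paths to the script as INPUT0 and OUTPUT0.
set -eu

# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the script from step 6 and the test data from step 5, locally.
echo 'sha1sum ${INPUT0} > ${OUTPUT0}' > sha1.script
echo "Hello World" > input

# Stand in for the pipelines tool: set the variables by hand, then run the script.
INPUT0="$workdir/input" OUTPUT0="$workdir/output" sh sha1.script

# The output file holds "<sha1 digest>  <input path>".
cat "$workdir/output"
```

The digest printed here (though not the path) should match the one in ${BUCKET}/output, since both runs hash the same "Hello World" line. Remove the scratch directory when you are done.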

Usage

The pipelines tool

This tool provides support for running, cancelling and inspecting pipelines.

As a simple example, to run a pipeline that prints 'hello world':

$ cat <<EOF > hello.script
echo "hello world"
EOF
$ pipelines --project=my-project run hello.script --output=gs://my-bucket/logs

After the pipeline finishes, you can inspect the output using gsutil:

$ gsutil cat gs://my-bucket/logs/output

The script file format is described in the source code for the command.

Using gcsfuse with the pipelines tool

Use the --fuse flag to have the pipelines tool localize input files with gcsfuse instead of copying them one by one with gsutil.

Note: Because the entire bucket is mounted, files other than those named in the --inputs flag will also be available to the container.

SSH into the worker machine

The --ssh flag supported by the pipelines tool starts an SSH container in the background, so that you can log in to the worker using SSH and view logs in real time.

The migrate-pipeline tool

This tool takes a JSON encoded v1alpha2 run pipeline request and attempts to emit a v2alpha1 request that replicates the same behaviour.

For example, given a file v1.jsonpb that has a request containing a v1alpha2 ephemeral pipeline and arguments, running:

$ migrate-pipeline < v1.jsonpb

will write to standard output a v2alpha1 request that performs the same action.

Support

Please report problems using the issue tracker.

Contributors

anamanolache, gkelly, sbabitz
