
bpipe's Introduction

Welcome to Bpipe


Bpipe provides a platform for running data analysis workflows that consist of a series of processing stages, known as 'pipelines'. Bpipe has special features to help with specific challenges in bioinformatics and computational biology.

Bpipe has been published in Bioinformatics! If you use Bpipe, please cite:

Sadedin S, Pope B & Oshlack A, Bpipe: A Tool for Running and Managing Bioinformatics Pipelines, Bioinformatics

Example

 hello = {
    exec """
        echo "hello world" > $output.txt
    """
 }
 run { hello }

Why Bpipe?

Many people working in data science end up running jobs as custom shell (or similar) scripts. While this makes them easy to run, it has a lot of limitations. By turning your shell scripts into Bpipe scripts, here are some of the features you get:

  • Dependency Tracking - Like make and similar tools, Bpipe knows what you already did and won't do it again
  • Simple definition of tasks to run - Bpipe runs shell commands almost as-is: very low friction between what works on your command line and what you need to put into your script
  • Transactional management of tasks - commands that fail get their outputs cleaned up, log files saved, and the pipeline cleanly aborted. No runaway jobs going crazy.
  • Automatic Connection of Pipeline Stages - Bpipe manages the file names for input and output of each stage in a systematic way so that you don't need to think about it. Removing or adding new stages "just works" and never breaks the flow of data.
  • Job Management - know what jobs are running, start, stop, manage whole workflows with simple commands
  • Easy Parallelism - split jobs into many pieces and run them all in parallel whether on a cluster, cloud or locally. Separate configuration of parallelism from the definition of the tasks.
  • Audit Trail - keeps a journal of exactly which commands executed, when and what their inputs and outputs were.
  • Integration with Compute Providers - pure Bpipe scripts can run unchanged whether locally, on your server, or in cloud or traditional HPC back ends such as Torque, SLURM, GridEngine and others.
  • Deep Integration Options - Bpipe integrates well with other systems: receive alerts to tell you when your pipeline finishes or even as each stage completes, call REST APIs, send messages to queueing systems and easily use any type of integration available within the Java ecosystem.
  • See how Bpipe compares to similar tools
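For instance, the parallelism described above can be expressed directly in the pipeline definition. A minimal sketch (the stage name and command are hypothetical, not from the Bpipe docs), splitting the inputs so each .txt file is processed in its own parallel branch:

```groovy
// Hypothetical stage: compress whatever input it receives
compress = {
    exec "gzip -c $input.txt > $output.gz"
}

// "%.txt" asks Bpipe to split the matching inputs into one parallel
// branch per file; the same definition runs locally or on a cluster,
// depending only on configuration
run { "%.txt" * [ compress ] }
```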

Ready for More?

Take a look at the Overview to see Bpipe in action, work through the Basic Tutorial for simple first steps, see a step-by-step example of a realistic pipeline made using Bpipe, or take a look at the Reference to see all the documentation.

bpipe's People

Contributors

bw2, dad, david-ma, doctormo, drtconway, druvus, gdevenyi, lonsbio, pditommaso, sajp, slugger70, ssadedin, ssayols, tommyli, tucano


bpipe's Issues

Bpipe should support OAuth for Accessing Google Services

From [email protected] on 2012-07-17T09:32:06Z

Bpipe's ability to send email and instant messages via Google Talk and Gmail currently requires the user to enter their Google password into a bpipe.config file. This is insecure for many users. Since Google offers OAuth for these services, Bpipe should support it for users who do not want to put their passwords in plain text.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=46

Support for Child Directories for Outputs of Pipeline Stages

From [email protected] on 2012-07-12T10:04:15Z

Currently Bpipe relies on all the outputs generated by your pipeline appearing in the same directory as the pipeline script (it's not compulsory, but some Bpipe features don't work, or you can get odd results, if you put output files elsewhere).

This really starts to break down when you have a huge number of files in a project or job, especially intermediate files that are just computational byproducts.

Bpipe should fully support tasks creating outputs in child directories.
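A sketch of how such support might look in a pipeline stage, assuming a hypothetical output.dir property that redirects a stage's outputs into a child directory (stage name and command are illustrative):

```groovy
sort_data = {
    // Hypothetical: route this stage's outputs into a subdirectory
    // instead of the pipeline script's directory
    output.dir = "work/sorted"
    exec "sort $input.txt > $output.txt"
}

run { sort_data }
```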

Original issue: http://code.google.com/p/bpipe/issues/detail?id=43

Deadlock and Corrupted Command Log when running huge number of Parallel Stages (>100)

From [email protected] on 2012-02-08T12:49:12Z

What steps will reproduce the problem?
1. Create a pipeline with 200 input files feeding into a single parallel stage
2. Execute the pipeline

What is the expected output? What do you see instead?
All 200 files should be processed and the script should complete.

Actual behavior - sometimes Bpipe hangs, and the command log contains statements overwriting each other and even misses some commands.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=12

bpipe log executed as a different user from the one who started the pipeline incorrectly reports that it is finished

From [email protected] on 2012-07-17T09:37:52Z

What steps will reproduce the problem?
1. Start a long running pipeline
2. Log in as a different user
3. Go to the pipeline directory and execute "bpipe log"

What is the expected output? What do you see instead?
Expect to see the ongoing log. Instead, Bpipe says it finished.

Guesswork: Bpipe is querying the status of the process id from the OS and is confusing a permission denied error with the process not existing.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=47

Merged job logging of parallel commands.

From [email protected] on 2012-07-10T07:53:45Z

What steps will reproduce the problem?
1. Run a pipeline with parallel commands.
2. Look at the screen log output.

What is the expected output? What do you see instead?
I expect to see the output of each command in its own file somewhere in the .bpipe directory. Instead, I see only one log file for the entire run, with the output of all of the commands interleaved. This is fine for sequential pipelines, but for pipelines that have jobs running in parallel it is a problem. For example, in a pipeline that runs Cufflinks as well as OverdispersedRegionScanSeqs (USeq package) in parallel, I see this type of thing when the outputs intersect with each other:

[16:18:02] Modeling fragment count overdispersion.
[16:18:02] Calculating initial abundance estimates for bias correction.<----- [From Cufflinks]

Processed 16024 loci. [*************************] 100% chr17random chr14random chrY chrX<----- [From OverdispersedRegionScanSeqs]
[16:38:19] Learning bias parameters.<----- [From Cufflinks]
chr16 chr19 chr5_random chr18<----- [From OverdispersedRegionScanSeqs]

There are many situations where this is problematic. For example, if a particular application gives similar output to another application running in parallel, it is impossible to distinguish them. Or, if multiple commands of the same application are running, it looks something like this:
Calculating read count stats...
444621 Queried Treatment Observations
Sample_43_CNT_24hr.bam
313186 Queried Control Observations
Sample_58_Hep_cnt_72hr.bam

Calculating negative binomial p-values and FDRs in R using DESeq ( http://www-huber.embl.de/users/anders/DESeq/).. .
chr19 chr5_random chr18

Calculating read count stats...
396066 Queried Treatment Observations
Sample_43_CNT_24hr.bam
279005 Queried Control Observations
Sample_58_Hep_cnt_72hr.bam

Calculating negative binomial p-values and FDRs in R using DESeq ( http://www-huber.embl.de/users/anders/DESeq/).. .
chr5_random chr18 chrM

Calculating read count stats...
445143 Queried Treatment Observations
Sample_43_CNT_24hr.bam
308133 Queried Control Observations
Sample_60_3T3_cnt_72hr.bam

There is no way to tell which initial command produced which particular output stats (except that in this case, the application was kind enough to provide us with the input file names, but not all applications will do this). Likewise, if multiple commands are running in parallel and one reports an error, it is difficult to know which command caused the error.

Ideally, the output could be saved in separate files for each command that is issued; then errors, etc. can be associated with their originating commands.

What version of the product are you using? On what operating system?
bpipe-0.9.5, Linux

Please provide any additional information below.
Maybe one way to do this would be to have a "command id" in the commandlog.txt file, after each command. Then within the .bpipe directory, you could have a directory containing files named .something containing 1) the present working directory, 2) the command, as well as 3) the stderr and stdout for the command. Furthermore, you could use this command id as a job name for submissions to the cluster; then, when we get an email notification of an error (e.g., non-zero exit status), we can go directly to this command output file to see the full screen output/error message.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=42

Simple stage execution makes Bpipe crash when launched twice in a row

From paolo.ditommaso on 2012-07-13T23:03:15Z

What steps will reproduce the problem?
1. Use the simple stage attached
2. Execute it with: "bpipe testcase.bpipe"
3. Without deleting the produced file 'foo.txt', execute a second time: "bpipe testcase.bpipe"

Bpipe crashes with the following stack trace:

$ bpipe testcase.bpipe
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.tools.GroovyStarter.rootLoader(GroovyStarter.java:108)
at org.codehaus.groovy.tools.GroovyStarter.main(GroovyStarter.java:130)
Caused by: java.lang.NullPointerException: Cannot get property 'class' on null object
at org.codehaus.groovy.runtime.NullObject.getProperty(NullObject.java:56)
at org.codehaus.groovy.runtime.InvokerHelper.getProperty(InvokerHelper.java:156)
at org.codehaus.groovy.runtime.callsite.NullCallSite.getProperty(NullCallSite.java:44)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGetProperty(AbstractCallSite.java:227)
at bpipe.Utils$_isNewer_closure3.doCall(Utils.groovy:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:877)
at groovy.lang.Closure.call(Closure.java:412)
at groovy.lang.Closure.call(Closure.java:425)
at org.codehaus.groovy.runtime.DefaultGroovyMethods.every(DefaultGroovyMethods.java:1502)
at org.codehaus.groovy.runtime.dgm$215.doMethodInvoke(Unknown Source)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1047)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:877)
at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:44)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at bpipe.Utils.isNewer(Utils.groovy:70)
at bpipe.Utils$isNewer.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
at bpipe.PipelineContext.produce(PipelineContext.groovy:591)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1047)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:877)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:697)
at bpipe.PipelineContext.invokeMethod(PipelineContext.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1047)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:877)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:39)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:120)
at bpipe.PipelineDelegate.methodMissing(PipelineDelegate.groovy:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaClassImpl.invokeMissingMethod(MetaClassImpl.java:804)
at groovy.lang.MetaClassImpl.invokePropertyOrMissing(MetaClassImpl.java:1096)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1049)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:877)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:697)
at bpipe.PipelineDelegate.invokeMethod(PipelineDelegate.groovy)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeOnDelegationObjects(ClosureMetaClass.java:423)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:346)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:877)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.callCurrent(PogoMetaClassSite.java:66)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:46)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:133)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:145)
at script134217889892558917682$_run_closure1.doCall(script134217889892558917682.groovy:2)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:233)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:272)
at groovy.lang.MetaClassImpl.invokePropertyOrMissing(MetaClassImpl.java:1086)
at groovy.lang.MetaClassImpl.invokeMethod(...

Attachment: testcase.bpipe

Original issue: http://code.google.com/p/bpipe/issues/detail?id=44

Returning an Object from a Pipeline Stage Should not be Treated as an Output

From [email protected] on 2012-05-06T12:52:43Z

This behavior produces surprising results and forces confusing workarounds.

For example, this pipeline stage:

 hello = {
    exec "cp $input $output"
    x = "foo"
 }

Produces an error:

Pipeline failed!

Expected output file ./foo could not be found

This happens because the expression 'x = "foo"' evaluates to a String object, which, being the last expression in the pipeline stage, is treated by Groovy as a return value. Bpipe then sees it and considers it an output.

The "return value as output" behavior is in fact a relic from when Bpipe had no "produce", "transform" or "filter" constructs, and is now really needed only by internal parts of Bpipe. Removing it will make Bpipe clearer.

Note: there will still be occasions when the user needs to specify that a different output from the default one assumed by Bpipe should be forwarded to the next stage. This should be a separate construct, called "forward".
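A sketch of how the proposed "forward" construct might read (hypothetical usage based on the note above; the samtools command is illustrative):

```groovy
index_bam = {
    exec "samtools index $input.bam"
    // Instead of relying on the stage's return value, explicitly name
    // what the next stage should receive as its input
    forward input.bam
}
```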

Original issue: http://code.google.com/p/bpipe/issues/detail?id=22

Errors printed out when Bpipe run in Directory for First Time

From [email protected] on 2012-04-13T11:13:35Z

What steps will reproduce the problem?
1. Make a new directory
2. Create a dummy (empty) bpipe script
3. Run Bpipe using "bpipe run dummy.groovy"

What is the expected output? What do you see instead?
The script should run normally.

Instead, see errors before the normal output:

basename: missing operand
Try `basename --help' for more information.

These errors don't appear after the first run.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=20

Support for an easy way to parallelize based on region

From [email protected] on 2012-04-04T11:16:49Z

Currently Bpipe lets you easily parallelize if you split processing by sample or by splitting your input files up into pieces. However many tools can operate on regions independently without splitting files up. This is kind of tricky to do in Bpipe right now. It would be nice to have some support for a direct syntax to say "run this pipeline with every chromosome in parallel".

Syntax:

chr(1..22) * [ call_variants ]

Will create a $chr variable that the call_variants stage can use.
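Under that proposal a stage could then reference $chr directly; a hedged sketch (the variant caller invocation is hypothetical):

```groovy
call_variants = {
    // $chr would be injected by the chr(1..22) parallel split
    exec """
        java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper
            -R $REFERENCE -I $input.bam -L $chr -o $output.vcf
    """
}

run { chr(1..22) * [ call_variants ] }
```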

More detailed regions could be created using an organism specific database:

hg19.split(60) * [ calculate_coverage_depth ]

This latter would figure out how to split the human genome into 60 roughly even parts for you and pass variables $chr, $start, $end to the calculate_coverage_depth pipeline stage, making it really easy to parallelize data processing.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=18

Job id missing from commandlog.txt

From [email protected] on 2012-07-10T07:24:47Z

What steps will reproduce the problem?
1. Run the "helloworld" pipeline from the tutorial in the documentation.
2. Look in the .bpipe/logs directory.
3. Look in the commandlog.txt file.

What is the expected output? What do you see instead?
In the documentation (https://code.google.com/p/bpipe/wiki/Logging), it states: "If you wish to see the output log for an older job you need to find its Job Id, which you can see in the command log. Then you can find the log archived in directory called '.bpipe/logs/[Job Id].log'."

An example of a commandlog.txt file is shown at that website, with these lines:

Starting pipeline at Wed Feb 01 15:22:01 EST 2012 as Job 4225

Input files: test.txt

However, my commandlog.txt file looks like this:

Starting pipeline at Mon Jul 09 17:12:01 EDT 2012

Input files: []

Stage hello

echo Hello

Stage world

echo World

There is no "as Job " in the line beginning "# Starting pipeline ...".

What version of the product are you using? On what operating system?
bpipe-0.9.5.3

Please provide any additional information below.
Also, there is no single job id associated with this job; apparently, there are two job ids. There are two files in .bpipe/logs associated with the single job that I ran:

 ls .bpipe/logs/
 20038.log 20045.bpipe.log

I expected to see two files with the same id, e.g.,
20038.log, 20038.bpipe.log

Attachment: 20045.bpipe.log 20038.log commandlog.txt

Original issue: http://code.google.com/p/bpipe/issues/detail?id=41

Bpipe configuration defaults

From [email protected] on 2012-05-20T16:37:04Z

At the moment you can put a bpipe.config file in the local pipeline directory to control certain behaviors of Bpipe. Often, however, these settings are shared across pipelines, so it would be helpful to have Bpipe load defaults from a common location (e.g. the home directory) and then merge those with any bpipe.config found in the local directory.
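For example, shared defaults such as the executor could live in a common file (the ~/.bpipeconfig name below is an assumption, not an existing feature), with the local bpipe.config overriding individual values:

```groovy
// ~/.bpipeconfig (hypothetical shared defaults, loaded first)
executor = "torque"
queue    = "batch"

// ./bpipe.config (local file, merged over the defaults)
executor = "local"   // this pipeline's jobs run locally instead
```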

Original issue: http://code.google.com/p/bpipe/issues/detail?id=26

Error in Parallelizing same task with different inputs

From [email protected] on 2012-05-11T06:39:33Z

I am trying Bpipe with one of our in-house programs. The program runs fine when I run a single version of the "exec" command, but when I try to run the same command while changing one input parameter and parallelizing the two shell commands, I get an error:

org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
script_from_command_line: 4: expecting anything but ''\n''; got it anyway @ line 4, column 192.

1 error

Any help would be appreciated.

Thanks
Jess

Original issue: http://code.google.com/p/bpipe/issues/detail?id=24

RealPipelineTutorial should include Bpipe.run line in each example

From [email protected] on 2012-02-07T17:36:40Z

I naively copied the last version of the example pipeline from https://code.google.com/p/bpipe/wiki/RealPipelineTutorial. It does not include the Bpipe.run line (which is mentioned earlier on the same page).

Forgetting to put this in the script means it will not do anything.

So at the expense of extra redundancy it might be worth repeating the run line in each iteration of the running example.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=7

Transform and Filter should accept multiple arguments

From [email protected] on 2012-05-06T12:55:32Z

Currently you can specify that a section of a pipeline stage transforms an input like so:

 foo = {
    transform("csv") {
        ....
    }
 }

However if the code performs multiple transformations then you can't easily specify them both together. Bpipe should support syntax such as:

 foo = {
    transform("csv","xml") {
        ...
    }
 }

The same would go for filtering operations.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=23

Pipeline file can't have same name as pipeline stage

From [email protected] on 2012-02-07T17:42:09Z

I had a simple pipeline with a stage called "hello" and I saved the whole thing in a file called hello.pipeline.

This gave me an error about assigning to the variable called hello:

org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
/vlsci/VLSCI/bjpop/code/bpipe-0.2/test/hello.pipeline: 1: you tried to assign a value to the class 'hello'. Do you have a script with this name?
@ line 1, column 1.
hello = {
^

1 error

Original issue: http://code.google.com/p/bpipe/issues/detail?id=9

'bpipe log' shows output from runs even when Bpipe didn't successfully start

From [email protected] on 2012-02-08T10:46:47Z

What steps will reproduce the problem?
1. Execute 'bpipe run some_pipeline.groovy'
2. Execute 'bpipe log'
3. After the run finishes, execute 'bpipe' (no arguments)
4. Now execute 'bpipe log'

What is the expected output? What do you see instead?
Expect to see the output from the last run pipeline.

Instead see help output.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=11

Transitive Dependencies / Dealing with Large Intermediate Files

From [email protected] on 2012-01-30T20:55:59Z

Bpipe needs to have a way of allowing intermediate results to be deleted or dispensed with that won't trigger recomputing of them if the pipeline is re-executed.

Suppose you have 3 files in a dependency chain where A is used to compute B and B is used to compute C.

Bpipe knows that A produces B and that B produces C. However, it does not know that, at a higher level, A produces C. Thus if one deletes file B, Bpipe will recompute B and then C, even if C was newer than A to start with.

This problem manifests when dealing, for example, with SAM files, which are much larger than, but equivalent to, a BAM file. After you have the BAM file you really don't need the SAM file, but with Bpipe you have to keep it anyway, wasting disk space, because otherwise Bpipe will try to recompute it.

Workarounds:

  1. Don't allow such intermediate files to be recognised as outputs (that is, don't put them in produce, transform, or filter statements, etc.). The downside of this is that Bpipe is less likely to clean them up if the pipeline fails.
  2. After making the final file, remove the intermediate file (e.g. the SAM file) and create a 'dummy' intermediate file to trick Bpipe. Then touch the final file to make it "newer" than the intermediate file. This is a hack.
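Workaround 1 amounts to never declaring the intermediate file. A sketch (stage and commands are hypothetical): the SAM file is written to a fixed name that Bpipe never records as an output, so deleting it later does not invalidate the BAM:

```groovy
align = {
    // tmp.sam is deliberately NOT declared via produce/transform/filter,
    // so Bpipe tracks only the BAM and won't recompute when tmp.sam is
    // deleted. Downside: Bpipe won't clean tmp.sam up if the stage fails.
    exec """
        my_aligner ref.fa $input.fastq > tmp.sam &&
        samtools view -bS tmp.sam > $output.bam
    """
}
```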

Original issue: http://code.google.com/p/bpipe/issues/detail?id=2

Recompute from Arbitrary Stage

From [email protected] on 2012-01-30T21:04:59Z

At the moment if you want Bpipe to recompute all results from a particular stage forward when Bpipe thinks they are up to date you need to physically delete or touch a file that will make Bpipe think a dependency is out of date. This makes it hard to reliably ensure results are properly recomputed when you change something that Bpipe is unaware of that affects your pipeline.

This could be implemented in the form of a 'clean' command that allowed results to be purged from a specified stage forward, or it could simply be an explicit

 bpipe rerun <input files ...>

Original issue: http://code.google.com/p/bpipe/issues/detail?id=3

Support for Report of Run in HTML Form

From [email protected] on 2012-05-20T16:33:55Z

It would be very useful in multiple scenarios to be able to get a report out of Bpipe that is readable by normal humans (eg: that you can email to somebody, etc.).

Ideally this will be HTML form and will show all the stages that ran, whether they succeeded or failed, the timings, the inputs and outputs, and will also allow the user to add other documentation.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=25

Nohup Issue on Slackware < 13.0

From [email protected] on 2012-01-27T09:53:46Z

Information from user:


Bpipe depends on nohup. But the "nohup" command behaves differently on some Slackware versions (e.g. Slackware 12.1) than on other Linux distros. The "nohup" command is used to execute each section of the pipe script in the background. If you edit the bpipe script in the bin/ directory and search for "nohup" you'll see how it is used by bpipe.

The "nohup" command insists on having everything directed to a file (either specified by "> filename" or, by default, a file "nohup.out"), so nothing gets sent to the terminal. The funny thing is that this behavior is limited to Slackware versions < 13.0.

If you edit the bpipe script, add the following lines at line 184 (just before the nohup line) and comment out the nohup line:

 j=$$
 touch .bpipe/logs/$j.log
 java -classpath "$CP" -Dbpipe.pid=$j bpipe.Runner $TESTMODE $* 2>&1 &
 disown

then you'll get the output to the screen and it will still run in the background if you log out. I'm sure if we spend more time researching "nohup" we'll find a way to get it to display to stdout, but I'm not in a position to do that.

That being said, since you may be running complex, long jobs that won't finish after a few seconds, you can just use the original bpipe script with the "nohup" command as it is. The output will be appended to bin/.bpipe/logs/$$.log, which you can tail to see the progress output of your pipe script.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=1

Bpipe log command hangs requiring Ctrl-C when Job is Finished

From [email protected] on 2012-01-31T09:28:22Z

After Bpipe runs a pipeline, one can see the running log of outputs using the 'bpipe log' command, which is executed using 'tail -f'.

However, Bpipe still uses 'tail -f' even when a pipeline is finished, creating a confusing situation for the user, who may keep waiting for the command to finish.

It would be better to execute tail without '-f' when the pipeline has finished running.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=4

Does Not Work for a Symbolic Link to bpipe

From yanlinlin82 on 2012-06-08T01:03:36Z

What steps will reproduce the problem?
1. PATH is set to a symbolic link to the unzipped bpipe.
2. 'bpipe run some.pipe' does not work.

What is the expected output? What do you see instead?
It should work for a symbolic link.

What version of the product are you using? On what operating system?
I was using bpipe-0.9.5 on a Gentoo-Linux-3.2.12 system.

Please provide any additional information below.
Changing line 287 of the bpipe file to:

 BPIPE_HOME=$(dirname $(realpath $0))/..

should fix the problem.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=34

Execute multiline commands

From [email protected] on 2012-07-25T17:44:42Z

Hi Simon,

I have started using Bpipe.

When I initially attempted a multiline command with exec, e.g.

 exec "some
 command"

I got the error:

 org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
 script_from_command_line: 10: expecting anything but ''\n''; got it anyway @ line 10, column 90.

The problem was easily fixable by rewriting the command with triple double-quotes:

 exec """some
 command"""

I believe that the documentation only mentions triple double-quotes in the context of commands that contain quotes, so I am not sure whether this is a bug or not.

Cheers,

Florent

Original issue: http://code.google.com/p/bpipe/issues/detail?id=50

RealPipelineTutorial should make use of variables to define reference.fa and MarkDuplicates.jar

From [email protected] on 2012-02-07T17:39:04Z

https://code.google.com/p/bpipe/wiki/RealPipelineTutorial refers to reference.fa multiple times and uses an absolute path for MarkDuplicates.jar.

In both of these cases it might be valuable to assign them to variables at the top of the script.

It would create a single point of control and show the user how to use variables in their script.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=8

Output from commands like 'bpipe test' or printing help should not become the most recent log

From [email protected] on 2012-02-26T17:55:50Z

Say you have a long running pipeline, and you look at the output with

bpipe log

Then if you execute

bpipe

The help is printed out. But then after that

bpipe log

Shows the help as the log output from the previous command. This is very annoying because you can't see the actual log file from your long running command.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=15

More generalized form of Parallelization than Splitting by Chromosome

From [email protected] on 2012-04-20T23:49:11Z

Currently Bpipe supports splitting up work in a pipeline into parallel execution threads separated by chromosome. However, there are numerous other ways to split up work, so this would make sense as a generalised feature. For example:

    @Transform("sam")
    align_stampy = {
        exec """
            python $STAMPY_HOME/stampy.py
                --bwaoptions="-q10 $REFERENCE"
                -g $STAMPY_GENOME_INDEX
                -h $STAMPY_HASH_FILE
                -M $input1,$input2
                -o $output
                --readgroup=ID:$rg_id,LB:$rg_lb,PL:$rg_pl,PU:$rg_pu,SM:$rg_sm
                --processpart=$part
        """
    }

    Bpipe.run {
        part("1/3", "2/3", "3/3") * [ align_stampy ]
    }
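The generalised split proposed above can be sketched in language-agnostic terms. The following is an illustrative sketch only, not Bpipe syntax or its implementation; `align` and `run_split` are hypothetical stand-ins for the aligner stage and the split operator:

```python
# Illustrative sketch only (not Bpipe code): a generalised split fans one
# task out over arbitrary part specifiers and runs each branch in parallel.
from concurrent.futures import ThreadPoolExecutor

def align(part):
    # Hypothetical stand-in for the aligner invocation; in the proposal
    # above, $part would be substituted into --processpart per branch.
    return f"aligned part {part}"

def run_split(parts, task):
    # Each part specifier becomes an independent parallel branch,
    # analogous to part("1/3", "2/3", "3/3") * [align_stampy].
    with ThreadPoolExecutor() as pool:
        return list(pool.map(task, parts))

results = run_split(["1/3", "2/3", "3/3"], align)
```

The key point is that the part specifiers need not be chromosomes; any list of work descriptors can drive the fan-out.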

Original issue: http://code.google.com/p/bpipe/issues/detail?id=21

Pause Command

From [email protected] on 2012-07-20T11:27:16Z

Currently you can use 'bpipe stop' to stop a running pipeline, which interrupts all jobs, i.e. kills them, and cleans up their outputs.

Sometimes I want to "nicely" stop a pipeline without abandoning the tasks in progress. It should let all running commands continue to completion, but not launch anything new, and exit when the last command finishes.

This would allow me to adjust pipelines, interleave a different job I forgot on the same computer, etc. without losing lots of work every time I stop a Bpipe job.

Original issue: http://code.google.com/p/bpipe/issues/detail?id=49

Ability to Pass Command Line Options to Script

From [email protected] on 2012-07-05T20:58:51Z

Bpipe now supports parameters inside a pipeline, e.g.:

    foo = {
        exec """echo "$message" """
    }
    run { foo.using(message: "hello world") }

However this forces you to hard code the parameter in your pipeline. It would be nice to be able to pass that from the command line when you run your Bpipe script.

    message = "hello there"
    foo = {
        exec """echo "$message" """
    }
    run { foo }

Now "hello there" is a default, but (ideally) we could run Bpipe like this:

    bpipe run -p message="hello world" pipeline.groovy

And this would override the value of the message variable.
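The requested behaviour can be sketched generically: a default declared in the script that a `-p name=value` command-line option overrides. This is an illustrative sketch, not Bpipe's implementation; `resolve_params` is a hypothetical helper:

```python
# Illustrative sketch only (not Bpipe's implementation): script defaults
# that -p name=value options on the command line can override.
import argparse

def resolve_params(argv, defaults):
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", action="append", default=[], metavar="NAME=VALUE")
    args, _ = parser.parse_known_args(argv)
    params = dict(defaults)          # defaults declared in the script
    for pair in args.p:
        name, _, value = pair.partition("=")
        params[name] = value         # the command line wins
    return params

# message = "hello there" in the script is the default ...
params = resolve_params(["-p", "message=hello world"], {"message": "hello there"})
# ... but -p message="hello world" overrides it.
```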

Original issue: http://code.google.com/p/bpipe/issues/detail?id=38

Support for Detecting when Commands have Changed in Rerunning Pipeline

From [email protected] on 2012-02-08T12:55:25Z

Currently Bpipe just uses timestamps to detect whether it needs to re-run a command in a pipeline stage.

If, however, the command itself has changed, then the outputs will be invalid even though they are newer than the input files. Bpipe should therefore support a way to detect whether the command has changed since the outputs were created.
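One common way to implement this, sketched here as an assumption rather than Bpipe's actual approach, is to record a hash of each command next to its outputs and compare it on the next run, in addition to the timestamp check:

```python
# Illustrative sketch only (not Bpipe's implementation): store a hash of
# the command alongside its outputs and compare it on the next run.
import hashlib

def command_hash(command):
    return hashlib.sha256(command.encode()).hexdigest()

def needs_rerun(command, recorded_hash, inputs_newer_than_outputs):
    # Re-run when the timestamp check fires OR the command text differs
    # from the one that originally produced the outputs.
    return inputs_newer_than_outputs or command_hash(command) != recorded_hash

recorded = command_hash("bwa aln ref.fa reads.fq")
```

With this scheme, editing the command in the script (e.g. adding a flag) triggers a re-run even when all timestamps say the outputs are up to date.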

Original issue: http://code.google.com/p/bpipe/issues/detail?id=14
