The bioruby-pipengine from fstrozzi

Step dependencies (other step files)

Each step may require something before it can be executed.
Generally, step dependencies are files generated by previous steps.
The required files could be specified with a "require" step feature, and/or pipengine could infer them directly by parsing the command line.

Example of a step in the pipeline.yml file:

blasting:
  require: <previousstep/sample>.query.fasta
  run: <blastn> -query <previousstep/sample>.query.fasta -task megablast -db <referenceblastdb> -out blastoutput.txt -outfmt 6  -num_threads <cpu>
  cpu: 8
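
A minimal sketch of the "infer from the command line" idea, assuming that any token of the run line starting with a placeholder that contains a slash (such as <previousstep/sample>) points at a file from another step. The helper name infer_requirements is hypothetical and is not part of pipengine.

```ruby
# Hypothetical helper, not pipengine's actual implementation.
# Returns the tokens of a run line that begin with a "<step/something>"
# placeholder, on the assumption that these are files from other steps.
def infer_requirements(run_line)
  run_line.split.select { |tok| tok.match?(%r{\A<[^<>\s]+/[^<>\s]+>}) }.uniq
end

run_line = "<blastn> -query <previousstep/sample>.query.fasta -task megablast " \
           "-db <referenceblastdb> -out blastoutput.txt -outfmt 6 -num_threads <cpu>"

puts infer_requirements(run_line)
```

An explicit "require" entry would still be needed for dependencies that never appear on the command line.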

Add a --batch option to avoid opening an SSH connection to the PBS Server or Masternode for every job launched

DRMAA support to use different queue systems

NAP: Not a priority

Example data for testing?

The example data in test/data is incomplete.

Yannick@n56-215 ~/g/b/t/data> pwd
/Users/Yannick/gitStuff/bioruby-pipengine/test/data
Yannick@n56-215 ~/g/b/t/data> tree -hlCF 
.
├── [ 340]  mapping.yml
├── [1.2K]  pipeline-enh.yml
├── [1.5K]  pipeline.yml
├── [ 177]  samples.yml
└── [  68]  test/

1 directory, 4 files
Yannick@n56-215 ~/g/b/t/data> pipengine run -p pipeline.yml -f samples.yml -s mapping 
I, [2017-08-08T16:07:38.059687 #52908]  INFO -- : Directory ./test/sampleA/trim not found
E, [2017-08-08T16:07:38.059818 #52908] ERROR -- : Found an unsubstituted tag <flowcell>\tLB:sampleA\tPL:ILLUMINA\tPU:<flowcell> . Terminating the execution

Perhaps this could be completed, or an alternative could be provided (one that does depend on a samtools/bwa installation).

Using Environment modules with pipengine

Is there a way to integrate the use of environment modules with pipengine?
I guess we could retain the 'resource' keyword, but maybe with an additional parameter providing the name of the module to be loaded.
e.g.

resources:
  macs: macs2, macs-2.0.1

where 'macs2' is the name of the executable made available by the module calling system, while macs-2.0.1 is the name of the module itself.
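
A minimal sketch of how such an entry could be handled, assuming the value is a comma-separated "executable, module" pair and that the job script should start with "module load" lines. The helper names parse_resource and module_preamble are hypothetical.

```ruby
require "yaml"

# The proposed syntax: "<executable>, <module name>".
config = YAML.safe_load(<<~YML)
  resources:
    macs: macs2, macs-2.0.1
YML

# Hypothetical helper: split a resource value into executable and module.
# The module part is optional, so plain resources keep working.
def parse_resource(value)
  executable, mod = value.split(",", 2).map(&:strip)
  { executable: executable, module: mod }
end

# Hypothetical helper: "module load" lines to put at the top of the job script.
def module_preamble(resources)
  resources.values
           .map { |value| parse_resource(value)[:module] }
           .compact
           .map { |mod| "module load #{mod}" }
end

puts module_preamble(config["resources"])
```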

Print the job ID after submit

re-add 'pipengine jobs' support

Dear all,

Could you add the 'pipengine jobs' support back in the latest pipengine release?

Many thanks,

Paolo

Implement subcommands for pipengine command line to better handle pbs stats

Check subcommands at http://trollop.rubyforge.org/

Create a config file where scheduler type and pipelines folder are specified

Refactor the code to shift to a more OO style

Master script for pipeline exec

It would be great if we could define one 'master' step inside pipeline.yml that aggregates and executes other steps.
Maybe with a syntax like this (let's say we have already defined three steps, namely 'trimming', 'mapping' and 'quantification'):

  master_step:
    desc: Step that aggregates other sub-steps
    run:
      > trimming
      > mapping
      > quantification
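
A minimal sketch of how a master step could be expanded, assuming the pipeline is loaded as a hash and a master step's "run" is a list of step names rather than a command. The helper expand_step and the echo commands are placeholders for illustration only.

```ruby
# Hypothetical in-memory pipeline; the commands are placeholders.
steps = {
  "trimming"       => { "run" => "echo trimming" },
  "mapping"        => { "run" => "echo mapping" },
  "quantification" => { "run" => "echo quantification" },
  "master_step"    => {
    "desc" => "Step that aggregates other sub-steps",
    "run"  => %w[trimming mapping quantification]
  }
}

# Hypothetical helper: resolve a step name into the ordered list of
# concrete steps to execute, following nested master steps.
def expand_step(steps, name, seen = [])
  raise ArgumentError, "unknown step: #{name}" unless steps.key?(name)
  raise ArgumentError, "circular reference: #{name}" if seen.include?(name)

  run = steps[name]["run"]
  return [name] unless run.is_a?(Array)

  run.flat_map { |sub| expand_step(steps, sub, seen + [name]) }
end

puts expand_step(steps, "master_step")
```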

Change Job Name to use a combination of Sample Name and Step Name

Add the support for organizing samples into groups

Add logging feature to save information such as the pipeline used, the steps etc.

Idiotproofing

Running pipengine with nothing -> would be great if it pointed me to -h
(and maybe indicated version)
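
A minimal sketch of this behaviour using Ruby's standard optparse library, not the option parser pipengine itself uses. The banner text, the version string and the helper names are assumptions.

```ruby
require "optparse"

# Hypothetical parser; the version string is passed in by the caller.
def build_parser(version)
  OptionParser.new do |opts|
    opts.banner = "Usage: pipengine <command> [options] (version #{version})"
    opts.on("-h", "--help", "Show this help")
  end
end

# Returns a hint when no arguments were given, nil otherwise.
def handle_empty(argv, parser)
  return nil unless argv.empty?

  "#{parser.help}\nRun with -h for the full list of options."
end

hint = handle_empty([], build_parser("x.y.z"))
puts hint if hint
```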

Adding a 'description' field for the steps in the pipeline yaml

This can help in getting back some descriptions of the pipeline steps that could be displayed using a --doc parameter on the command line.

Add placeholder to specify only the path to a previous step folder

It would be useful to have a placeholder like <mapping> that will translate into <output>/<sample>/mapping at runtime or just "mapping" if multiple steps are run together.
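
A minimal sketch of that substitution, assuming the step names are known in advance and that a flag says whether several steps run together. The helpers step_dir and substitute_step_dirs and the example paths are hypothetical.

```ruby
# Hypothetical helper: directory a step writes to.
# When several steps run together, only the step name is used.
def step_dir(step, output:, sample:, multi: false)
  multi ? step : File.join(output, sample, step)
end

# Hypothetical helper: replace every "<stepname>" placeholder in a command.
def substitute_step_dirs(cmd, step_names, output:, sample:, multi: false)
  step_names.reduce(cmd) do |acc, step|
    acc.gsub("<#{step}>", step_dir(step, output: output, sample: sample, multi: multi))
  end
end

cmd = "ls <mapping>"
puts substitute_step_dirs(cmd, ["mapping"], output: "/data/out", sample: "sampleA")
puts substitute_step_dirs(cmd, ["mapping"], output: "/data/out", sample: "sampleA", multi: true)
```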

single step 'local resources' directly from pipengine command line

We have the 'global' pipeline resources, what about single step 'local' resources?
Each step could have 'local variable' placeholders for some software parameters (as is already implemented for the 'cpu' and 'mem' tags), and those variables could be overridden directly when launching pipengine. This would be very useful when exploring the effect of a range of values for a single parameter of the command line.

Example of a step in a pipeline.yml file:

blasting:
  run: <blastn> -query <previousstep/sample>.query.fasta -task megablast -db <referenceblastdb> -out blastoutput.txt -outfmt 6 -max_target_seqs <hitsnumber> -num_threads <cpu>
  cpu: 8
  hitsnumber: 10

Possible alternatives for the pipengine command line:

pipengine run -p pipeline.yml -s blasting -hitsnumber 5 
pipengine run -p pipeline.yml -s blasting -parameters [hitsnumber 5 , cpu 2]
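
A minimal sketch of the override logic, assuming the command-line values have already been parsed into a hash (the parsing itself is not shown). The helper name render is hypothetical; placeholders that are neither step-local nor overridden are left untouched for the global resources to fill in.

```ruby
# The step from the example above, loaded as a hash.
step = {
  "run" => "<blastn> -query <previousstep/sample>.query.fasta -task megablast " \
           "-db <referenceblastdb> -out blastoutput.txt -outfmt 6 " \
           "-max_target_seqs <hitsnumber> -num_threads <cpu>",
  "cpu" => 8,
  "hitsnumber" => 10
}

# Hypothetical helper: step-local defaults, overridden by command-line values,
# are substituted into the run line.
def render(step, overrides = {})
  params = step.reject { |key, _| key == "run" }.merge(overrides)
  params.reduce(step["run"]) { |cmd, (key, value)| cmd.gsub("<#{key}>", value.to_s) }
end

puts render(step)                      # uses hitsnumber 10 and cpu 8
puts render(step, "hitsnumber" => 5)   # overrides hitsnumber only
```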

Add check for .torque_rm file

Also provide the user with an interactive input to fill the required information for TORQUE RM

Process groups altogether

It would be great to be able to specify a -G flag in order to process all the sample groups reported in samples.yml (this would generate the tree structure /Group_name/Sample_name for every sample).

Change the multi-step behaviour

When launching multiple steps into one single job, each step should write into its own folder instead of a common folder for all the steps

Specific dependencies?

Hi,
gem install worked seamlessly for me.
It is probably worth mentioning which ruby version(s) is/are required or compatible with this.

Add option to enable PBS sending email to the user

Generate sample configuration by project or run directory

Generate a single samples.yml for all the projects under the same run folder

add the possibility to specify a temporary output directory for each step

Groups processing (Cuffdiff, Cuffnorm)

It would be useful to parse sample groups information directly from samples.yml instead of defining them by hand using -m, so that
pipengine -p pipeline.yml -m SampleA,SampleB SampleC,SampleB
could be turned into:
pipengine -p pipeline.yml -m config

Check if all resources have been provided

All pipelines have "pipeline resources" described in the pipeline.yml file,
but they expect to find some "sample resources" described in the samples.yml file too.
It would be nice if pipengine could check that all resources have been provided and all placeholders have been replaced.
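
A minimal sketch of such a check, assuming placeholders always look like <name> or <step/name> with no spaces inside. The helper name unsubstituted_tags is hypothetical, and the pattern would need care with commands that use shell redirection.

```ruby
# Hypothetical helper: list the placeholders still present in a command
# after substitution, so they can all be reported before anything is submitted.
def unsubstituted_tags(cmd)
  cmd.scan(/<[A-Za-z_][\w\/]*>/).uniq
end

# Fragment taken from the error message reported in the test data issue.
fragment = '<flowcell>\tLB:sampleA\tPL:ILLUMINA\tPU:<flowcell>'

missing = unsubstituted_tags(fragment)
puts "Missing resources: #{missing.join(', ')}" unless missing.empty?
```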

I am trying to imagine something like this:
pipengine --no-torque run -c /data/illumina

pipengine --no-torque run -p pipeline.yml -f samples.yml -s mapping --tmp /tmp

Add support for Qdel through TORQUE RM

Add command line options to be passed to the PBS/Torque scheduler

Add an option like --pbs to accept PBS/Torque parameters that will be written into the sh script header.
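
A minimal sketch of how the values given to such an option could end up in the script header, assuming each value is a complete PBS option string and that the job name combines sample and step names. The helper name pbs_header and the example values are hypothetical.

```ruby
# Hypothetical helper: build the header of the generated sh script.
# Each entry of pbs_opts becomes one "#PBS" directive line.
def pbs_header(job_name, pbs_opts = [])
  lines = ["#!/bin/bash", "#PBS -N #{job_name}"]
  lines.concat(pbs_opts.map { |opt| "#PBS #{opt}" })
  lines.join("\n")
end

# Example values: a walltime limit and mail on abort/end.
puts pbs_header("sampleA-mapping", ["-l walltime=02:00:00", "-m ae"])
```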
