bioruby-pipengine's People

Contributors

arfon, fstrozzi, rjpbonnal

bioruby-pipengine's Issues

Step dependencies (other step files)

Each step may need something to exist before it can be executed.
Generally, step dependencies are files generated by previous steps.
You could specify required files using a "require" step feature, and/or pipengine could infer the required files directly by parsing the command line.

Example of a step from a pipengine.yml file:

blasting:
  require: <previousstep/sample>.query.fasta
  run: <blastn> -query <previousstep/sample>.query.fasta -task megablast -db <referenceblastdb> -out blastoutput.txt -outfmt 6 -num_threads <cpu>
  cpu: 8
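
As a rough sketch of how the inference could work: treat any token in the run line that begins with a path-like placeholder (one containing a slash, such as <previousstep/sample>) as a required file, and merge those with whatever "require" lists. The method name required_files and the slash heuristic are assumptions made for illustration; nothing here is existing pipengine behaviour.

```ruby
# Hypothetical helper: collect the files a step depends on.
# Assumption: a token beginning with a placeholder that contains a slash
# (e.g. <previousstep/sample>.query.fasta) refers to a file from another step.
PATH_PLACEHOLDER = %r{<[^<>\s]+/[^<>\s]+>\S*}

def required_files(step)
  explicit = Array(step["require"])                 # nil, a string or a list
  inferred = step["run"].to_s.scan(PATH_PLACEHOLDER)
  (explicit + inferred).uniq
end

step = {
  "require" => "<previousstep/sample>.query.fasta",
  "run" => "<blastn> -query <previousstep/sample>.query.fasta -task megablast " \
           "-db <referenceblastdb> -out blastoutput.txt -outfmt 6 -num_threads <cpu>",
  "cpu" => 8
}

puts required_files(step)
```

Plain tags such as <blastn> or <cpu> are ignored by this heuristic, so an explicit "require" would still be needed for files that are not written with a path-like placeholder.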

Example data for testing?

The example data for testing is incomplete.

Yannick@n56-215 ~/g/b/t/data> pwd
/Users/Yannick/gitStuff/bioruby-pipengine/test/data
Yannick@n56-215 ~/g/b/t/data> tree -hlCF 
.
├── [ 340]  mapping.yml
├── [1.2K]  pipeline-enh.yml
├── [1.5K]  pipeline.yml
├── [ 177]  samples.yml
└── [  68]  test/

1 directory, 4 files
Yannick@n56-215 ~/g/b/t/data> pipengine run -p pipeline.yml -f samples.yml -s mapping 
I, [2017-08-08T16:07:38.059687 #52908]  INFO -- : Directory ./test/sampleA/trim not found
E, [2017-08-08T16:07:38.059818 #52908] ERROR -- : Found an unsubstituted tag <flowcell>\tLB:sampleA\tPL:ILLUMINA\tPU:<flowcell> . Terminating the execution
  • Perhaps the example data could be completed, or an alternative could be provided (one that does depend on a samtools/bwa installation).

Using Environment modules with pipengine

Is there a way to integrate the use of environment modules with pipengine?
I guess we could retain the 'resource' keyword, but maybe with an additional parameter providing the name of the module to be loaded.
e.g.

resources:
  macs: macs2, macs-2.0.1

Here 'macs2' is the name of the executable made available by the module system, while 'macs-2.0.1' is the name of the module itself.
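
To make the proposal concrete, here is a minimal sketch of how such a value could be split into the executable and a `module load` line for the generated script. The method name parse_resource and the returned hash layout are hypothetical; only the `module load` command itself comes from Environment Modules.

```ruby
# Hypothetical parser for the proposed "executable, module" resource syntax.
# Returns the executable to substitute and the shell line that loads the module.
def parse_resource(value)
  executable, mod = value.split(",", 2).map(&:strip)
  { executable: executable, load_line: mod ? "module load #{mod}" : nil }
end

resources = { "macs" => "macs2, macs-2.0.1" }
resources.each do |tag, value|
  parsed = parse_resource(value)
  puts parsed[:load_line] if parsed[:load_line]   # line to emit before the step command
  puts "<#{tag}> -> #{parsed[:executable]}"       # what the placeholder would expand to
end
```

A resource without a comma keeps working as a plain path or executable name, so existing pipeline.yml files would not need to change under this scheme.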

Master script for pipeline exec

It would be great if we could define one 'master' step inside pipeline.yml that aggregates and executes other steps.
Maybe with a syntax like this (let's say we have already defined three steps, namely 'trimming', 'mapping' and 'quantification'):

  master_step:
    desc: Step that aggregates other sub-steps
    run:
      > trimming
      > mapping
      > quantification
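
A possible way to resolve such a master step, sketched under the assumption that the pipeline is loaded as a hash and the master step lists sub-step names in order. The method expand_master and the placeholder echo commands are made up for illustration.

```ruby
# Hypothetical expansion of a "master" step into the commands of its sub-steps.
# Assumption: the master step lists sub-step names, in order, under "run".
def expand_master(pipeline, master_name)
  pipeline.fetch(master_name).fetch("run").map do |name|
    sub = pipeline.fetch(name) { raise KeyError, "unknown sub-step: #{name}" }
    sub.fetch("run")
  end
end

pipeline = {
  "trimming" => { "run" => "echo trimming" },
  "mapping" => { "run" => "echo mapping" },
  "quantification" => { "run" => "echo quantification" },
  "master_step" => {
    "desc" => "Step that aggregates other sub-steps",
    "run" => %w[trimming mapping quantification]
  }
}

puts expand_master(pipeline, "master_step")
```

Failing early on an unknown sub-step name seems preferable to generating a job script that silently skips part of the workflow.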

Idiotproofing

Running pipengine with no arguments: it would be great if it pointed me to -h
(and maybe showed the version).
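
Something along these lines would be enough, as a sketch. The constant PIPENGINE_VERSION_PLACEHOLDER and the method name are hypothetical, and the message wording is only a suggestion.

```ruby
# Hypothetical guard for an empty command line.
# The version string is a placeholder, not pipengine's real version constant.
PIPENGINE_VERSION_PLACEHOLDER = "x.y.z"

def empty_invocation_hint(argv, version = PIPENGINE_VERSION_PLACEHOLDER)
  return nil unless argv.empty?
  "pipengine #{version}: no command given, try 'pipengine -h'"
end

hint = empty_invocation_hint(ARGV)
puts hint if hint
```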

single step 'local resources' directly from pipengine command line

We have the 'global' pipeline resources, but what about 'local' resources for a single step?
Each step could have 'local variable' placeholders for some software parameters (as is already implemented for the 'cpu' and 'mem' tags), and those variables could be overridden directly when launching pipengine. This would be very useful when you need to explore the effect of a range of values for a single parameter from the command line.

Example of a step in a pipeline.yml file:

blasting:
  run: <blastn> -query <previousstep/sample>.query.fasta -task megablast -db <referenceblastdb> -out blastoutput.txt -outfmt 6 -max_target_seqs <hitsnumber> -num_threads <cpu>
  cpu: 8
  hitsnumber: 10

Possible alternatives for the pipengine command line:

pipengine run -p pipeline.yml -s blasting -hitsnumber 5 
pipengine run -p pipeline.yml -s blasting -parameters [hitsnumber 5 , cpu 2] 
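
Whichever command-line form is chosen, the substitution itself could look roughly like this. It is only a sketch: the method render_step is hypothetical, and it assumes every key of the step other than "run" is a local default that a command-line value may override.

```ruby
# Hypothetical substitution of step-local placeholders with command-line overrides.
# Assumption: every key of the step other than "run" is a local default.
def render_step(step, overrides = {})
  values = step.reject { |key, _| key == "run" }.merge(overrides)
  # Replace <name> only when a value is known; leave other tags untouched.
  step["run"].gsub(/<(\w+)>/) { values.key?($1) ? values[$1].to_s : $& }
end

step = {
  "run" => "<blastn> -max_target_seqs <hitsnumber> -num_threads <cpu>",
  "cpu" => 8,
  "hitsnumber" => 10
}

puts render_step(step)
puts render_step(step, { "hitsnumber" => 5, "cpu" => 2 })
```

Tags without a local value, such as <blastn>, are left as they are so the global pipeline resources can still fill them in.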

Process groups altogether

It would be great to be able to specify a -G flag in order to process all the sample groups reported in samples.yml (this would generate the tree structure /Group_name/Sample_name for every sample).
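
A small sketch of the directory layout this would produce. It assumes the groups from samples.yml are available as a hash of group name to sample names, which is a guess at the structure, and the method name group_sample_dirs is made up.

```ruby
# Hypothetical layout for a -G run: one directory per group, one per sample inside it.
# Assumption: samples are available as { group name => [sample names] }.
def group_sample_dirs(groups, root = ".")
  groups.flat_map do |group, samples|
    samples.map { |sample| File.join(root, group, sample) }
  end
end

groups = { "Group_name" => %w[SampleA SampleB], "Other_group" => %w[SampleC] }
puts group_sample_dirs(groups, "output")
```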

Change the multi-step behaviour

When launching multiple steps in a single job, each step should write into its own folder instead of a common folder shared by all the steps.

Specific dependencies?

Hi,
gem install worked seamlessly for me.
It is probably worth mentioning which ruby version(s) is/are required or compatible with this.

Groups processing (Cuffdiff, Cuffnorm)

It would be useful to parse sample group information directly from samples.yml instead of defining the groups by hand using -m, so that
pipengine -p pipeline.yml -m SampleA,SampleB SampleC,SampleB
could be turned into:
pipengine -p pipeline.yml -m config
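
Internally this could simply rebuild the hand-written argument from the groups. A sketch, again assuming the groups are available as a hash of group name to sample names; the method name multi_argument is hypothetical.

```ruby
# Hypothetical: build the -m argument from groups defined in samples.yml.
# Assumption: groups are available as { group name => [sample names] };
# samples within a group are comma-separated, groups are space-separated.
def multi_argument(groups)
  groups.values.map { |samples| samples.join(",") }.join(" ")
end

groups = { "group1" => %w[SampleA SampleB], "group2" => %w[SampleC SampleB] }
puts "pipengine -p pipeline.yml -m #{multi_argument(groups)}"
```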

Check if all resources have been provided

All pipelines have "pipeline resources" described in the pipeline.yml file,
but they expect to find some "samples resources" described in the samples.yml file too.
It would be nice if pipengine could check that all resources have been provided and all placeholders have been replaced.
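
Such a check could run before any job script is written. A minimal sketch, assuming pipeline and sample resources have already been merged into one lookup hash; the method name unsubstituted_tags and the example command line are illustrative only.

```ruby
# Hypothetical pre-flight check: report placeholders that no resource fills.
# Assumption: pipeline and sample resources are merged into one lookup hash.
def unsubstituted_tags(command, resources)
  command.scan(/<([^<>\s]+)>/).flatten.uniq.reject { |tag| resources.key?(tag) }
end

# Illustrative command line, not taken from a real pipeline.yml.
command = '<bwa> mem -R "@RG\tID:<flowcell>\tLB:<sample>" <index>'
resources = { "bwa" => "/usr/bin/bwa", "index" => "genome.fa", "sample" => "sampleA" }

missing = unsubstituted_tags(command, resources)
warn "Missing resources: #{missing.join(', ')}" unless missing.empty?
```

Reporting every missing tag at once, rather than stopping at the first one, would let the user fix pipeline.yml and samples.yml in a single pass.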

Generate scripts even if Torque/PBS is not specified

Is it possible to configure PipEngine to simply generate step definitions and bash scripts even if Torque/PBS is not installed? I find it such a convenient way of generating and documenting workflows.

So I am wondering whether it would also be nice to be able to generate scripts that run on a machine without a scheduling system. At the moment, unless I am missing something, a scheduling tool has to be installed beforehand.

I am imagining something like this:
pipengine --no-torque run -c /data/illumina

pipengine --no-torque run -p pipeline.yml -f samples.yml -s mapping --tmp /tmp
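
For what it is worth, the scheduler-free output could be as simple as the step commands wrapped in a plain bash script. A sketch, assuming the commands are already fully substituted; the method name bash_script is hypothetical, as is the --no-torque flag proposed above.

```ruby
# Hypothetical generator for a plain bash script, with no scheduler directives.
# Assumption: the step's commands are already fully substituted strings.
def bash_script(job_name, commands)
  lines = ["#!/usr/bin/env bash", "# job: #{job_name}", "set -euo pipefail"]
  (lines + commands).join("\n") + "\n"
end

script = bash_script("mapping-sampleA", ["echo mapping sampleA"])
puts script
```

The resulting text could be written to a file and run directly with bash, which keeps the documentation value of the generated scripts even where Torque/PBS is not available.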
