
pipeline_roesti's Issues

issue with Ruffus 2.8.1 and gevent library not present (ant-login)

Important note: I had to downgrade ruffus from 2.8.1 to 2.6.3 because version 2.8.1 mishandles the case where the gevent library is not installed. When running the pipeline on the cluster, I kept getting the following error:

Original exception:

    Exception #1
      'builtins.NameError(name 'gevent' is not defined)' raised in ...
       Task = def filter_alignments(...):
       Job  = [Task04_convert_sam_to_bam/WT_R2_16424_ACTTGA_sorted.bam -> [Task05_filter_alignments/WT_R2_16424_ACTTGA.filtered.bed, Task05_filter_alignments/WT_R2_16424_ACTTGA.filtered.bed.nreads], WT_R2_16424_ACTTGA, Task04_convert_sam_to_bam, Task05_filter_alignments, filter_alignments, <LoggingProxy>, <unlocked _thread.lock>]
    
    Traceback (most recent call last):
      File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions
        register_cleanup, touch_files_only)
      File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/task.py", line 544, in job_wrapper_io_files
        ret_val = user_defined_work_func(*params)
      File "/users/lserrano/mweber/bin/roesti/pipeline_roesti", line 1101, in filter_alignments
        job_other_options=job_other_options, retain_job_scripts=False)
      File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/drmaa_wrapper.py", line 581, in run_job
        verbose, resubmit, pipeline)
      File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/drmaa_wrapper.py", line 350, in run_job_using_drmaa
        cmd_str, drmaa_session, job_template, logger, pipeline)
      File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/drmaa_wrapper.py", line 239, in submit_drmaa_job
        gevent.sleep(GEVENT_TIMEOUT_STARTUP)
    NameError: name 'gevent' is not defined

I don't know why, but ruffus 2.8.1 seems to have added optional support for the gevent package. In drmaa_wrapper.py, the presence of the library is first tested:

try:
    import gevent
    HAVE_GEVENT = True
except ImportError:
    HAVE_GEVENT = False

but later, in the function submit_drmaa_job, gevent.sleep() is called without checking HAVE_GEVENT. The surrounding try statement only catches JobSignalledBreak, so the resulting NameError is not handled:

try:
        gevent.sleep(GEVENT_TIMEOUT_STARTUP)
except JobSignalledBreak:
        # ...
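A minimal sketch of how such a call could be guarded, assuming the same optional-import pattern shown above. The helper safe_sleep is hypothetical; it is not the actual Ruffus code or an upstream fix. It simply falls back to time.sleep when gevent is not installed. Installing gevent into the environment would be the other obvious workaround.

```python
import time

# Same optional-import pattern as in ruffus' drmaa_wrapper.py
try:
    import gevent
    HAVE_GEVENT = True
except ImportError:
    HAVE_GEVENT = False


def safe_sleep(seconds):
    """Hypothetical helper: sleep via gevent if installed, else via time.sleep.

    gevent.sleep() yields to other greenlets while waiting; time.sleep()
    blocks the current thread, which is an acceptable fallback when gevent
    is not available.
    """
    if HAVE_GEVENT:
        gevent.sleep(seconds)
    else:
        time.sleep(seconds)


# The failing line in submit_drmaa_job would then become, e.g.:
#     safe_sleep(GEVENT_TIMEOUT_STARTUP)
```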

mpn_CDS.bed file permissions

When running with default options, the pipeline uses the precomputed CDS.bed file. However, the file must be sorted for some parts of the script to run correctly, and this sorting is done in place in my mpn_annotation directory. As a result, there are permission conflicts when another user runs the pipeline and it tries to sort the file in my directory.
Solutions:

  1. Always treat the GenBank file as read-only, and have each pipeline run create its own annotation files in the output directory.
  2. Put the common MPN annotation file in a directory shared by the whole lab, ideally the annotation file folder of the DBSpipes web server on the cluster.
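Solution 1 could look roughly like the following sketch, assuming a tab-separated BED file whose first two columns are chromosome and start. The function name sorted_bed_copy and the `.sorted.bed` naming are hypothetical. The shared file is only read, and the sorted copy is written to the run's own output directory, so no user needs write access to the mpn_annotation directory.

```python
import os


def sorted_bed_copy(src_bed, out_dir):
    """Write a sorted copy of src_bed into out_dir and return its path.

    The source file is only read, never modified. Records are sorted by
    chromosome name, then by numeric start (like `sort -k1,1 -k2,2n`).
    """
    os.makedirs(out_dir, exist_ok=True)
    stem, ext = os.path.splitext(os.path.basename(src_bed))
    dest = os.path.join(out_dir, stem + ".sorted" + ext)

    header, records = [], []
    with open(src_bed) as fh:
        for line in fh:
            if not line.strip():
                continue
            # Make sure every line ends with a newline before reordering
            if not line.endswith("\n"):
                line += "\n"
            # Keep comment/track/browser lines at the top of the file
            if line.startswith(("#", "track", "browser")):
                header.append(line)
            else:
                records.append(line)

    def key(line):
        fields = line.split("\t")
        return fields[0], int(fields[1])

    records.sort(key=key)
    with open(dest, "w") as out:
        out.writelines(header + records)
    return dest
```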

migrate to conda environment to better manage dependencies

Right now, all dependencies are loaded by a bash script, load_dependencies.sh, which loads specific libraries and binaries installed in my home folder on the cluster, then activates the Python virtual env and sets the path for custom Python packages. Dependencies would be much easier to manage if everything were in a self-contained environment that could be activated from anywhere.

Conda can manage the binaries and include them in an environment. The following are available from the bioconda channel:

  • samtools v1.9
  • bedtools v2.27.1
  • skewer v0.2.2
  • ngs-bits v2018_11, a tool suite package that includes SeqPurge
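Once the environment is activated, a quick sanity check can confirm that these binaries are actually on the PATH. This is a sketch: the executable names (in particular SeqPurge from ngs-bits) are assumptions to verify against the installed packages, and the helper names are hypothetical.

```python
import os
import shutil

# Assumed executable names for the bioconda packages listed above
REQUIRED_TOOLS = ["samtools", "bedtools", "skewer", "SeqPurge"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def active_conda_env():
    """Return the root of the active conda env, or None if none is active.

    `conda activate` sets CONDA_PREFIX to the environment's directory.
    """
    return os.environ.get("CONDA_PREFIX")
```

For example, `missing_tools()` returning an empty list means every required binary was found in the activated environment.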

Conda also provides a convenient way to manage environment variables associated with a conda environment. These variables are set and unset when the environment is activated and deactivated.

Also, see issue #4 about ruffus version.

Dependencies error on simba

I had a problem with the older ruffus version 2.6.3 on simba:

>>> import ruffus
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mweber/.local/anaconda3/envs/roesti/lib/python3.7/site-packages/ruffus/__init__.py", line 28, in <module>
    from .task import Pipeline, Task
  File "/home/mweber/.local/anaconda3/envs/roesti/lib/python3.7/site-packages/ruffus/task.py", line 715, in <module>
    verbose=0)
TypeError: namedtuple() got an unexpected keyword argument 'verbose'

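Upgrading to version 2.8.1 solves the problem. (The verbose argument of collections.namedtuple was removed in Python 3.7, which is the version used in this environment.)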
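The root cause can be reproduced without Ruffus. This sketch (the helper name is hypothetical) checks whether the running interpreter still accepts the verbose keyword that ruffus 2.6.3 passes to namedtuple; it returns False on Python 3.7 and later.

```python
import sys
from collections import namedtuple

PY37_OR_LATER = sys.version_info >= (3, 7)


def namedtuple_accepts_verbose():
    """Return True if namedtuple(..., verbose=...) is still supported."""
    try:
        # ruffus 2.6.3 passes verbose=0 in task.py
        namedtuple("Job", ["name"], verbose=False)
    except TypeError:
        # Python >= 3.7: "got an unexpected keyword argument 'verbose'"
        return False
    return True
```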

However, there is a new problem on simba with the DRMAA library. The fix is simply to avoid loading the load_dependencies.sh script, which changes the path of the DRMAA library. So we need to set the option --host simba (or any other value) so that the script is not loaded at the beginning of each job.
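A sketch of how the --host option could control this, assuming a bash job prefix and a default host named after the cluster login node mentioned earlier (ant-login). DEFAULT_HOST, LOAD_DEPS, build_job_cmd and parse_args are hypothetical names, not the pipeline's current code.

```python
import argparse

DEFAULT_HOST = "ant-login"  # hypothetical default host
LOAD_DEPS = "source load_dependencies.sh"  # hypothetical job prefix


def build_job_cmd(cmd, host):
    """Prefix cmd with the dependency script only on the default host.

    On any other host (e.g. simba) the script is skipped, so it cannot
    override that host's DRMAA library path.
    """
    if host == DEFAULT_HOST:
        return LOAD_DEPS + " && " + cmd
    return cmd


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--host", default=DEFAULT_HOST,
        help="any non-default value skips load_dependencies.sh in each job")
    return parser.parse_args(argv)
```

With this, `--host simba` yields the bare command, so each job keeps simba's own DRMAA setup.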

conda environment still depends on the external package mwTools

The script index_genome_files_bowtie2.py imports functions from the mwTools package, which is not included in the conda environment. External users therefore cannot run the pipeline correctly because of this missing dependency.
Two solutions:

  1. For the moment, include the code of these functions directly in the index_genome_files_bowtie2.py script.
  2. Later, refactor mwTools into a full Python package and install it in the conda environment.

Delete intermediate files by default

We should implement an additional task that deletes the intermediate files, but only if the pipeline finished successfully. This will greatly reduce disk usage on Isilon. It should be a command-line option, enabled by default.
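A minimal sketch of the proposed behaviour, independent of Ruffus internals: the pipeline runs first, and intermediate files are removed only if it returns without raising. run_then_cleanup and the --keep-intermediate-files flag are hypothetical names; run_pipeline would be whatever function wraps the Ruffus pipeline_run call.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Deletion is the default; this hypothetical flag opts out of it.
    parser.add_argument(
        "--keep-intermediate-files", action="store_true",
        help="do not delete intermediate files after a successful run")
    return parser.parse_args(argv)


def run_then_cleanup(run_pipeline, intermediate_files, keep=False):
    """Run the pipeline, then delete intermediate files if it succeeded.

    If run_pipeline() raises, the exception propagates before any file is
    removed, so intermediate results remain available for debugging.
    Returns the list of paths that were deleted.
    """
    run_pipeline()
    if keep:
        return []
    removed = []
    for path in intermediate_files:
        if os.path.exists(path):
            os.remove(path)
            removed.append(path)
    return removed
```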
