crg-cnag / pipeline_roesti
Pipeline to analyze bacterial RNA-seq data from RNA-seq and ribo-seq (ribosome profiling) experiments.
Right now all the dependencies are loaded from a bash script, load_dependencies.sh, which loads specific libraries and binaries installed in my home folder on the cluster, then activates the Python virtual env and sets the path for custom Python packages. It would be much easier to manage if everything were in a self-contained environment that could be loaded from anywhere.
Conda can manage the binaries and include them in an environment.
available from bioconda channel:
Conda also has a nice way to manage environment variables associated with a conda environment. These variables are set when the conda env is activated and unset when it is deactivated.
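As a rough sketch of that mechanism: conda runs the shell scripts found in `etc/conda/activate.d` and `etc/conda/deactivate.d` under the environment prefix on activation and deactivation. The helper below writes such a pair of scripts. The function name `write_conda_env_hooks`, the file name `env_vars.sh`, and the library path in the usage note are assumptions made for illustration and are not part of the pipeline.

```python
import shlex
from pathlib import Path


def write_conda_env_hooks(prefix, variables):
    """Write activate.d/deactivate.d hooks that set and unset `variables`.

    `prefix` is the root directory of the conda environment.
    The hook file name `env_vars.sh` is an arbitrary choice.
    """
    prefix = Path(prefix)
    activate_dir = prefix / "etc" / "conda" / "activate.d"
    deactivate_dir = prefix / "etc" / "conda" / "deactivate.d"
    activate_dir.mkdir(parents=True, exist_ok=True)
    deactivate_dir.mkdir(parents=True, exist_ok=True)

    activate_lines = ["#!/bin/sh"]
    deactivate_lines = ["#!/bin/sh"]
    for name, value in variables.items():
        # Quote the value so spaces or special characters survive the shell.
        activate_lines.append(f"export {name}={shlex.quote(value)}")
        deactivate_lines.append(f"unset {name}")

    activate_script = activate_dir / "env_vars.sh"
    deactivate_script = deactivate_dir / "env_vars.sh"
    activate_script.write_text("\n".join(activate_lines) + "\n")
    deactivate_script.write_text("\n".join(deactivate_lines) + "\n")
    return activate_script, deactivate_script
```

For example, `write_conda_env_hooks(os.environ["CONDA_PREFIX"], {"DRMAA_LIBRARY_PATH": "/path/to/libdrmaa.so"})` would tie the DRMAA library location to the environment (the path here is a placeholder). Recent conda versions also provide `conda env config vars set` for the same purpose.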
Also, see issue #4 about ruffus version.
I had a problem with the older ruffus version 2.6.3 on simba:
>>> import ruffus
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mweber/.local/anaconda3/envs/roesti/lib/python3.7/site-packages/ruffus/__init__.py", line 28, in <module>
    from .task import Pipeline, Task
  File "/home/mweber/.local/anaconda3/envs/roesti/lib/python3.7/site-packages/ruffus/task.py", line 715, in <module>
    verbose=0)
TypeError: namedtuple() got an unexpected keyword argument 'verbose'
Upgrading to version 2.8.1 solves the problem: Python 3.7 removed the verbose argument of collections.namedtuple, which ruffus 2.6.3 still passes.
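To check quickly whether a given interpreter is affected, a small probe like the following can help. It is only a sketch: the function name `namedtuple_accepts_verbose` is made up here, and it tests the standard library behaviour, not ruffus itself.

```python
import sys
from collections import namedtuple


def namedtuple_accepts_verbose():
    """Return True if this interpreter still accepts namedtuple(..., verbose=...)."""
    try:
        # Same kind of call that fails inside ruffus 2.6.3 on Python 3.7.
        namedtuple("Point", ["x", "y"], verbose=False)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    version = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {version}: verbose accepted = {namedtuple_accepts_verbose()}")
```

If the probe returns False, an environment pinned to ruffus 2.6.3 will hit the TypeError above at import time.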
However, there is a new problem on simba with the DRMAA library. I just have to avoid loading the load_dependencies.sh script, which changes the path of the DRMAA library. So we need to set the option --host simba (or any other value) so that the script is no longer loaded at the beginning of each job.
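A minimal sketch of how such a switch could work, assuming the script is only sourced when the host is left at its default. The default value `"cluster"`, the helper `build_job_command`, and the way the script is prepended are all assumptions for illustration; the actual pipeline code may be wired differently.

```python
import argparse

DEFAULT_HOST = "cluster"  # hypothetical default value, not taken from the pipeline
DEPENDENCY_SCRIPT = "load_dependencies.sh"


def build_job_command(command, host=DEFAULT_HOST):
    """Prefix the job command with the dependency script only on the default host."""
    if host == DEFAULT_HOST:
        return f"source {DEPENDENCY_SCRIPT} && {command}"
    # Any other host (e.g. simba) relies on its own environment instead.
    return command


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of the --host switch.")
    parser.add_argument(
        "--host",
        default=DEFAULT_HOST,
        help="Host name; any non-default value skips load_dependencies.sh.",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(build_job_command("bowtie2 --version", args.host))
```

With `--host simba` the job command is left untouched, so the DRMAA library path set by the environment is not overwritten.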
When running with default options, the precomputed CDS.bed file is used. However, the file must be sorted for some parts of the script to run correctly, and the sorting is done in my mpn_annotation directory. This causes permission conflicts when another user runs the pipeline and the file gets sorted in my directory.
Solutions:
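One possible approach, given here only as an assumption and not necessarily one of the solutions originally listed, is to write the sorted copy into a directory owned by the user running the pipeline. The function name `write_sorted_bed`, the output file suffix, and the sort order (chromosome, then start, then end) are illustrative choices; the order the pipeline actually requires may differ.

```python
from pathlib import Path


def write_sorted_bed(source_bed, output_dir):
    """Write a sorted copy of `source_bed` into `output_dir` and return its path.

    The source file is only read, so no write permission is needed
    in the directory that holds it.
    """
    source_bed = Path(source_bed)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    records = []
    for line in source_bed.read_text().splitlines():
        # Skip blank lines and BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1]), int(fields[2]), line))

    # Sort by chromosome name, then numerically by start and end.
    records.sort(key=lambda record: record[:3])

    sorted_bed = output_dir / (source_bed.stem + ".sorted.bed")
    sorted_bed.write_text("".join(record[3] + "\n" for record in records))
    return sorted_bed
```

Each user then gets a private sorted file in their own output directory, and the shared annotation directory can stay read-only.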
Important Note: I had to downgrade from ruffus 2.8.1 to 2.6.3 because of bad handling of the gevent library. When running the pipeline in the cluster environment, I kept getting the following error:
Original exception:
Exception #1
'builtins.NameError(name 'gevent' is not defined)' raised in ...
Task = def filter_alignments(...):
Job = [Task04_convert_sam_to_bam/WT_R2_16424_ACTTGA_sorted.bam -> [Task05_filter_alignments/WT_R2_16424_ACTTGA.filtered.bed, Task05_filter_alignments/WT_R2_16424_ACTTGA.filtered.bed.nreads], WT_R2_16424_ACTTGA, Task04_convert_sam_to_bam, Task05_filter_alignments, filter_alignments, <LoggingProxy>, <unlocked _thread.lock>]
Traceback (most recent call last):
  File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/task.py", line 712, in run_pooled_job_without_exceptions
    register_cleanup, touch_files_only)
  File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/task.py", line 544, in job_wrapper_io_files
    ret_val = user_defined_work_func(*params)
  File "/users/lserrano/mweber/bin/roesti/pipeline_roesti", line 1101, in filter_alignments
    job_other_options=job_other_options, retain_job_scripts=False)
  File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/drmaa_wrapper.py", line 581, in run_job
    verbose, resubmit, pipeline)
  File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/drmaa_wrapper.py", line 350, in run_job_using_drmaa
    cmd_str, drmaa_session, job_template, logger, pipeline)
  File "/nfs/users/lserrano/mweber/Software/python/envs/.virtualenvs/python351/lib/python3.5/site-packages/ruffus/drmaa_wrapper.py", line 239, in submit_drmaa_job
    gevent.sleep(GEVENT_TIMEOUT_STARTUP)
NameError: name 'gevent' is not defined
I don't know why, but ruffus 2.8.1 seems to have added optional support for the gevent package. In the drmaa_wrapper.py file, the presence of the library is first tested:
try:
    import gevent
    HAVE_GEVENT = True
except ImportError:
    HAVE_GEVENT = False
but afterwards, in the function submit_drmaa_job, a simple try statement does not correctly catch the NameError exception:
try:
    gevent.sleep(GEVENT_TIMEOUT_STARTUP)
except JobSignalledBreak:
    # ...
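A possible workaround, sketched here as an assumption and not as what ruffus actually does, would be to check the HAVE_GEVENT flag before touching the gevent name, and fall back to the standard library otherwise. The helper name `startup_pause` and the timeout value are invented for this example.

```python
import time

try:
    import gevent
    HAVE_GEVENT = True
except ImportError:
    gevent = None
    HAVE_GEVENT = False

GEVENT_TIMEOUT_STARTUP = 0.01  # hypothetical value; the real constant may differ


def startup_pause(timeout=GEVENT_TIMEOUT_STARTUP):
    """Sleep with gevent when it is installed, else with time.sleep.

    Returns the name of the backend used, which makes the choice easy to log.
    """
    if HAVE_GEVENT:
        gevent.sleep(timeout)
        return "gevent"
    # Without this branch, referencing gevent would raise NameError
    # whenever the optional import failed.
    time.sleep(timeout)
    return "time"
```

The point of the guard is that the missing package is handled where the name is used, so no exception handler has to anticipate a NameError.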
We should implement an additional task that deletes the intermediate files, but only if the pipeline finished correctly. This would greatly reduce disk usage on isilon. It should be a command-line option, enabled by default.
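A minimal sketch of such a cleanup step, under the assumption that intermediate results live in per-task directories. The function `remove_intermediate_dirs`, the flag `--keep-intermediate-files`, and the choice of which directories count as intermediate are all hypothetical.

```python
import argparse
import shutil
from pathlib import Path


def remove_intermediate_dirs(intermediate_dirs, pipeline_succeeded, keep=False):
    """Delete intermediate directories, but only after a successful run.

    Returns the list of directories that were actually removed.
    """
    if keep or not pipeline_succeeded:
        return []
    removed = []
    for directory in map(Path, intermediate_dirs):
        if directory.is_dir():
            shutil.rmtree(directory)
            removed.append(directory)
    return removed


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of the cleanup option.")
    # Cleanup is on by default; this flag switches it off.
    parser.add_argument(
        "--keep-intermediate-files",
        action="store_true",
        help="Do not delete intermediate files after a successful run.",
    )
    return parser.parse_args(argv)
```

Calling `remove_intermediate_dirs(dirs, pipeline_succeeded=True, keep=args.keep_intermediate_files)` as the last step keeps the files whenever a task failed, so a failed run can still be inspected or resumed.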
The script index_genome_files_bowtie2.py imports functions from the mwTools package, which is not included in the conda environment. External users cannot run the pipeline correctly because of this missing dependency.
Two solutions
index_genome_files_bowtie2.py
script.
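Independently of which solution is chosen, a small check at startup could at least report the missing package clearly instead of failing midway. This is only a sketch; the function name `missing_packages` is made up, and it only handles top-level package names.

```python
import importlib.util


def missing_packages(package_names):
    """Return the top-level packages from `package_names` that cannot be found."""
    return [
        name
        for name in package_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(["mwTools"])
    if missing:
        raise SystemExit(
            "Missing Python packages: " + ", ".join(missing)
            + ". They are not part of the conda environment."
        )
```

Run before the genome indexing step, this would tell an external user exactly which dependency is absent.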