urmi-21 / pyrpipe

Reproducible bioinformatics pipelines in python. Import any Unix tool/command in python.

License: MIT License

Shell 4.64% Python 95.36%
rna-seq rna-seq-pipeline ncbi-sra bioinformatics bioinformatics-pipeline bioinformatics-analysis python rna-seq-workflows conda bioconda

pyrpipe's Introduction

pyrpipe: python rna-seq pipeliner

Introduction

pyrpipe (pronounced "pyre-pipe") is a Python package for developing bioinformatics, or any other computational, pipelines in pure Python. It provides an easy-to-use framework for importing any UNIX command into Python, and it comes with specialized classes and functions for coding RNA-Seq processing workflows. Pipelines in pyrpipe can be created and extended by integrating third-party tools, executable scripts, or Python libraries in an object-oriented manner.

Read the paper here

Read the docs here

NOTE: Due to changes in the API design, pyrpipe versions 0.0.5 and above are not compatible with earlier versions. All the tutorials and documentation have been updated to reflect v0.0.5.

What it does

Allows fast and easy development of bioinformatics pipelines in Python by providing

  • a high-level API to popular RNA-Seq processing tools -- downloading, trimming, alignment, quantification and assembly
  • optimization of program parameters based on the data
  • a general framework to execute any linux command from python
  • comprehensive logging features to log all the commands, output and their return status
  • report generating features for easy sharing, reproducing, benchmarking and debugging

Key Features (version 0.0.5)

  • Import any UNIX executable command/tool in python
  • Dry-run feature to check dependencies and commands before execution
  • Flexible and robust handling of options and arguments (both Linux and Java style options)
  • Auto load command options from .yaml files
  • Easily override threads and memory options using global values
  • Extensive logging for all the commands
  • Automatically verify the integrity of output targets
  • Resume feature to restart pipelines/jobs from where they were interrupted
  • Create reports, including MultiQC reports, for bioinformatics pipelines
  • Easily integrated into workflow managers like Snakemake and NextFlow (to schedule jobs, scale jobs, identify parallel steps in pipelines)

What it CAN NOT do by itself

  • Schedule jobs
  • Scale jobs on HPC/cloud
  • Identify parallel steps in pipelines

Prerequisites

  • python 3.6 or higher
  • OS: Linux, Mac

API to RNA-Seq tools include:

Tool                      Purpose
SRA Tools (v. 2.10.9)     SRA access
Trimgalore (v. 0.6.0)     Trimming
BBDuk (v. 38.76)          Trimming
Hisat2 (v. 2.2.1)         Alignment
STAR (v. 2.7.7a)          Alignment
Bowtie2 (v. 2.3.5.1)      Alignment
Kallisto (v. 0.46.2)      Quantification
Salmon (v. 0.14.1)        Quantification
Stringtie (v. 2.1.4)      Transcript Assembly
Cufflinks (v. 2.2.1)      Transcript Assembly
Samtools (v. 1.9)         Tools

Examples

Get started with the basic tutorial. Read the documentation here. Several examples are provided here

Download, trim and align RNA-Seq data

The following Python code downloads data from SRA, uses Trim Galore to trim the fastq files, and uses STAR to align the reads. More detailed examples are provided here

from pyrpipe.sra import SRA
from pyrpipe.qc import Trimgalore
from pyrpipe.mapping import Star
trimgalore = Trimgalore(threads=8)
star = Star(index='data/index',threads=4)
for srr in ['SRR976159','SRR978411','SRR971778']:
    SRA(srr).trim(trimgalore).align(star)

Import a Unix command

This simple example imports and runs the Unix grep command. See this for more examples.

>>> from pyrpipe.runnable import Runnable
>>> grep=Runnable(command='grep')
>>> grep.run('query1','file1.txt',verbose=True)
>>> grep.run('query2','file2.txt',verbose=True)
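
To illustrate the general idea of importing a Unix command, here is a minimal sketch built on the standard library. It is not pyrpipe's Runnable implementation: the names `build_command` and `run_command` are hypothetical, and treating an empty-string value as a bare flag is an assumption made for this example. The `dry_run` switch only prints the command, which loosely mirrors the dry-run feature listed above.

```python
import shlex
import subprocess


def build_command(command, *args, **options):
    """Return an argv list: command, then options, then positional arguments.

    Assumption for this sketch: an option whose value is an empty string
    is a bare flag, so {"-i": ""} becomes ["-i"].
    """
    argv = [command]
    for flag, value in options.items():
        argv.append(flag)
        if value != "":
            argv.append(str(value))
    argv.extend(str(arg) for arg in args)
    return argv


def run_command(command, *args, dry_run=False, **options):
    """Print the command line; execute it unless dry_run is True."""
    argv = build_command(command, *args, **options)
    print("$ " + " ".join(shlex.quote(part) for part in argv))
    if dry_run:
        return None
    return subprocess.run(argv, capture_output=True, text=True)


# Dry run: shows what would be executed without actually calling grep.
run_command("grep", "query1", "file1.txt", dry_run=True, **{"-i": "", "-m": "5"})
```

Passing options as a dictionary keeps flags such as `-i` usable even though they are not valid Python identifiers.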

Installation

Please follow these instructions:

To create a new Conda environment (recommended):

NOTE: You need to install the third-party tools to work with pyrpipe. We recommend installing these through bioconda where possible. An example of setting up the environment using conda is provided below. It is best to share your conda environment files with pyrpipe scripts to ensure reproducibility.

  1. Download and install Conda
  2. conda create -n pyrpipe python=3.8
  3. conda activate pyrpipe
  4. conda install -c bioconda pyrpipe star=2.7.7a sra-tools=2.10.9 stringtie=2.1.4 trim-galore=0.6.6

The above command will install pyrpipe and the required tools inside a conda environment. Alternatively, use the conda environment.yaml file provided in this repository and build the conda environment by running

conda env create -f pyrpipe_environment.yaml

Install latest stable version

Through conda

conda install -c bioconda pyrpipe 

Through PIP

pip install pyrpipe --upgrade

If the above command fails due to dependency issues, try:

  1. Download the requirements.txt
  2. pip install -r requirements.txt
  3. pip install pyrpipe

To run tests:

  1. Download the test set (direct link)
  2. pip install pytest
  3. Build the test environment (test_environment). Please READ THIS first
  4. From pyrpipe root directory, run pytest tests/test_*

Install dev version

git clone https://github.com/urmi-21/pyrpipe.git
pip install -r pyrpipe/requirements.txt
pip install -e path_to/pyrpipe

# Running tests: from the pyrpipe root directory, perform the following
# Build the test environment (this will download tools):
cd tests ; . ./build_test_env.sh
# In the same terminal:
py.test tests/test_*

Setting NCBI SRA-Tools

If you face problems downloading data from SRA, try configuring SRA-Tools. Use vdb-config -i to configure the SRA Toolkit. Make sure that:

  • Under the TOOLS tab, "prefetch downloads to" is set to "public user-repository"
  • Under the CACHE tab, "location of public user-repository" is not empty

Use the following pyrpipe_diagnostic command to test whether SRA-Tools is set up properly

pyrpipe_diagnostic test
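
Before running the diagnostic, it can help to confirm that the third-party executables are actually on the PATH of the active environment. The sketch below uses only the standard library; the function name `find_missing_tools` and the particular list of executables are assumptions chosen for illustration, not something pyrpipe provides.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Hypothetical selection of executables; adjust to the tools your pipeline uses.
REQUIRED_TOOLS = ["prefetch", "fasterq-dump", "trim_galore", "STAR"]

missing = find_missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All listed tools were found on PATH.")
```

A tool reported as missing here usually means the conda environment is not activated or the tool was not installed into it.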

Contributing

Please see CONTRIBUTING.md

Funding

This work is funded in part by the National Science Foundation award IOS 1546858, "Orphan Genes: An Untapped Genetic Reservoir of Novel Traits".

pyrpipe's People

Contributors

eve-syrkin-wurtele, lijing28101, urmi-21

pyrpipe's Issues

OSError: cannot load library 'pango-1.0-0'

While running the "Import a Unix command" example, I received this error message:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
/tmp/ipykernel_22298/2898335744.py in <module>
----> 1 from pyrpipe.runnable import Runnable
      2 grep=Runnable(command='grep')
      3 grep.run('v3.9.8:bb3fdcf','.',verbose=True)

~/.local/lib/python3.8/site-packages/pyrpipe/__init__.py in <module>
     26 import atexit
     27 from pyrpipe import pyrpipe_utils as pu
---> 28 from pyrpipe import reports
     29 import uuid
     30 

~/.local/lib/python3.8/site-packages/pyrpipe/reports.py in <module>
     12 from pyrpipe import pyrpipe_utils as pu
     13 from jinja2 import Environment, BaseLoader
---> 14 from weasyprint import HTML,CSS
     15 from html import escape
     16 import datetime as dt

~/.local/lib/python3.8/site-packages/weasyprint/__init__.py in <module>
    320 
    321 # Work around circular imports.
--> 322 from .css import preprocess_stylesheet  # noqa isort:skip
    323 from .html import (  # noqa isort:skip
    324     HTML5_UA_COUNTER_STYLE, HTML5_UA_STYLESHEET, HTML5_PH_STYLESHEET,

~/.local/lib/python3.8/site-packages/weasyprint/css/__init__.py in <module>
     25 from ..logger import LOGGER, PROGRESS_LOGGER
     26 from ..urls import URLFetchingError, get_url_attribute, url_join
---> 27 from . import computed_values, counters, media_queries
     28 from .properties import INHERITED, INITIAL_NOT_COMPUTED, INITIAL_VALUES
     29 from .utils import get_url, remove_whitespace

~/.local/lib/python3.8/site-packages/weasyprint/css/computed_values.py in <module>
     14 
     15 from ..logger import LOGGER
---> 16 from ..text.ffi import ffi, pango, units_to_double
     17 from ..text.line_break import Layout, first_line_metrics, line_size
     18 from ..urls import get_link_attribute

~/.local/lib/python3.8/site-packages/weasyprint/text/ffi.py in <module>
    381     ffi, 'gobject-2.0-0', 'gobject-2.0', 'libgobject-2.0-0',
    382     'libgobject-2.0.so.0', 'libgobject-2.0.dylib')
--> 383 pango = _dlopen(
    384     ffi, 'pango-1.0-0', 'pango-1.0', 'libpango-1.0-0', 'libpango-1.0.so.0',
    385     'libpango-1.0.dylib')

~/.local/lib/python3.8/site-packages/weasyprint/text/ffi.py in _dlopen(ffi, *names)
    375             pass
    376     # Re-raise the exception.
--> 377     return ffi.dlopen(names[0])  # pragma: no cover
    378 
    379 

~/.local/lib/python3.8/site-packages/cffi/api.py in dlopen(self, name, flags)
    148                             "or an already-opened 'void *' handle")
    149         with self._lock:
--> 150             lib, function_cache = _make_ffi_library(self, name, flags)
    151             self._function_caches.append(function_cache)
    152             self._libraries.append(lib)

~/.local/lib/python3.8/site-packages/cffi/api.py in _make_ffi_library(ffi, libname, flags)
    830 def _make_ffi_library(ffi, libname, flags):
    831     backend = ffi._backend
--> 832     backendlib = _load_backend_lib(backend, libname, flags)
    833     #
    834     def accessor_function(name):

~/.local/lib/python3.8/site-packages/cffi/api.py in _load_backend_lib(backend, name, flags)
    825         if first_error is not None:
    826             msg = "%s.  Additionally, %s" % (first_error, msg)
--> 827         raise OSError(msg)
    828     return backend.load_library(path, flags)
    829 

OSError: cannot load library 'pango-1.0-0': pango-1.0-0: cannot open shared object file: No such file or directory.  Additionally, ctypes.util.find_library() did not manage to locate a library called 'pango-1.0-0'

I used the pyrpipe_environment.yaml file from this repo and ran the provided installation command

conda env create -f pyrpipe_environment.yaml

I tried editing this line to use pango=1.0.0 and ran conda env update --file pyrpipe_environment.yaml --prune but got the error:

ResolvePackageNotFound: 
  - pango=1.0.0

How do I satisfy the missing dependency?

Other info

Command            Output
lsb_release -d     Description: Ubuntu 20.04.3 LTS
conda --version    conda 4.10.3
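
The error message above mentions `ctypes.util.find_library()`, so a quick way to see whether the system libraries are visible to Python is to call it directly. This is only a diagnostic sketch: the helper name `locate_libraries` is hypothetical, and it does not install anything. Note that Pango is a system library rather than a Python package, and the "1.0" in `pango-1.0-0` is part of the library name, not a package version. conda-forge distributes a `pango` package, but whether installing it resolves this particular environment is an assumption that was not tested here.

```python
import ctypes.util


def locate_libraries(names):
    """Map each library name to the file found by ctypes, or None if not found."""
    return {name: ctypes.util.find_library(name) for name in names}


# Library names taken from the traceback above (without the "lib" prefix).
for name, path in locate_libraries(["pango-1.0", "gobject-2.0"]).items():
    print(f"{name}: {path or 'NOT FOUND'}")
```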

Freeze after fasterq-dump

I tried to save my session and reload it to convert the SRA files to FASTQ files.
The program freezes after the first sample finishes, without any error message.

>>> for ob in sraObjects:
...     ob.runFasterQDump(deleteSRA=True,**{"-e":"8","-f":"","-t":workingDir})
...
$ fasterq-dump -e 8 -f -t /work/LAS/mash-lab/jing/testpyrpipe -O /work/LAS/mash-lab/jing/testpyrpipe/SRR976159 -o SRR976159.fastq /work/LAS/mash-lab/jing/testpyrpipe/SRR976159/SRR976159.sra
Time taken:0:00:05.402236

OSError 'Directory not empty' while running 'pyrpipe_diagnostic test'

Hi,

I just installed pyrpipe and I ran the testing commands 'pyrpipe_diagnostic test' but it gave me errors as follows:

Traceback (most recent call last):
  File "/home/sirius/anaconda3/envs/pyrpipe/bin/pyrpipe_diagnostic", line 10, in <module>
    sys.exit(main())
  File "/home/sirius/anaconda3/envs/pyrpipe/lib/python3.8/site-packages/pyrpipe/__diagnostic__.py", line 274, in main
    testsra()
  File "/home/sirius/anaconda3/envs/pyrpipe/lib/python3.8/site-packages/pyrpipe/__diagnostic__.py", line 232, in testsra
    test_sratools.runtest()
  File "/home/sirius/anaconda3/envs/pyrpipe/lib/python3.8/site-packages/pyrpipe/test_sratools.py", line 20, in runtest
    os.rmdir(sraob.directory)
OSError: [Errno 39] Directory not empty: './pyrpipe_sratest/ERR726985'

I also looked at the TOOLS tab of the configuration and set it to "user repository" (the word 'public' does not appear there), but that did not help.

Could you please help me solve the issue? I haven't run actual program yet but I guess something's off if there's an error in testing commands.

Best
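
The traceback in this issue ends in `os.rmdir`, which only removes empty directories and raises `OSError` otherwise; `shutil.rmtree` removes a directory together with its contents. The sketch below demonstrates that difference on a temporary directory. The helper `remove_directory` is hypothetical and is not a patch to pyrpipe; why files were left in the test directory is not established by the report.

```python
import os
import shutil
import tempfile
from pathlib import Path


def remove_directory(path):
    """Remove `path`; fall back to recursive removal if plain rmdir fails.

    Returns the name of the method that succeeded.
    """
    try:
        os.rmdir(path)  # works only when the directory is empty
        return "rmdir"
    except OSError:
        shutil.rmtree(path)  # removes the directory and everything inside it
        return "rmtree"


# Demonstration on a throwaway directory that contains one leftover file.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "leftover.tmp").write_text("partial download")
print(remove_directory(demo_dir), "removed", demo_dir)
```

Recursive removal is destructive, so it is worth checking that the path really is a scratch directory before calling it.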

'matplotlib' has no attribute 'get_data_path'

Hi,

I downloaded the pyrpipe package through git clone, but when I try to import it using from pyrpipe import sra,qc,mapping,assembly I get the following:

Traceback (most recent call last):

  Cell In[2], line 1
    from pyrpipe import sra,qc,mapping,assembly

  File c:\users\celin\pyrpipe\pyrpipe\__init__.py:28
    from pyrpipe import reports

  File c:\users\celin\pyrpipe\pyrpipe\reports.py:18
    from pyrpipe import benchmark as bm

  File c:\users\celin\pyrpipe\pyrpipe\benchmark.py:12
    import seaborn as sns

  File ~\anaconda3\Lib\site-packages\seaborn\__init__.py:2
    from .rcmod import * # noqa: F401,F403

  File ~\anaconda3\Lib\site-packages\seaborn\rcmod.py:3
    import matplotlib as mpl

  File ~\anaconda3\Lib\site-packages\matplotlib\__init__.py:964
    cbook._get_data_path("matplotlibrc"),

  File ~\anaconda3\Lib\site-packages\matplotlib\cbook.py:545 in _get_data_path
    return Path(matplotlib.get_data_path(), *args)

  File ~\anaconda3\Lib\site-packages\matplotlib\_api\__init__.py:217 in __getattr__
    raise AttributeError(

AttributeError: module 'matplotlib' has no attribute 'get_data_path'

What can I do to solve this?
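
In this traceback matplotlib's own `cbook` module calls `matplotlib.get_data_path`, which the same installation then reports as missing. That would be consistent with a mixed or partially upgraded installation, but this is a guess rather than a diagnosis. A first step could be to check which version is installed and where it is loaded from. The sketch below is an assumption-laden helper (the name `describe_package` is hypothetical) and needs Python 3.8 or newer for `importlib.metadata`.

```python
import importlib.util
from importlib import metadata


def describe_package(name):
    """Report the installed version and file location of a package, if any."""
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None  # no installed distribution with this name
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return {"name": name, "version": version, "location": location}


# Packages that appear in the traceback above.
for package in ("matplotlib", "seaborn"):
    print(describe_package(package))
```

If the reported location and version do not match what conda or pip lists, reinstalling the package into a fresh environment is one option to try.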

log failed commands

log commands as they start to capture any commands terminated before completion
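
A minimal sketch of what this request could look like, assuming a plain `subprocess` wrapper and the standard `logging` module. It is not pyrpipe's logging code, and `run_logged` is a hypothetical name. The START entry is written before execution, so it survives even when the process is terminated before any END entry can be logged.

```python
import logging
import subprocess
import sys


def run_logged(argv, logger):
    """Log a command before it starts and again when it finishes."""
    command_line = " ".join(argv)
    logger.info("START %s", command_line)
    try:
        completed = subprocess.run(argv, capture_output=True, text=True)
    except BaseException:
        # Reached on errors such as a missing executable or Ctrl-C.
        logger.error("ABORTED %s", command_line)
        raise
    logger.info("END %s (return code %d)", command_line, completed.returncode)
    return completed.returncode


logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
run_logged([sys.executable, "-c", "print('hello')"], logging.getLogger("pipeline"))
```

Commands that have a START entry but no matching END or ABORTED entry are the ones that were cut off.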

ERROR - STAR command not found

OSError: CommandNotFoundException encountered when running the first cookbook recipe for aligning and mapping the A. thaliana genome. (Screenshot Attached)

PackagesNotFoundError: The following packages are not available from current channels: - sra-tools=2.10.9

Hi, I'm unable to install the required tools by using the command: conda install -c bioconda pyrpipe star=2.7.7a sra-tools=2.10.9 stringtie=2.1.4 trim-galore=0.6.6 orfipy=0.0.3 salmon=1.4.0

I added the three conda channels in the right order as you mentioned, but this is what I get each time (on macOS):

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • sra-tools=2.10.9

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

Could you please help? Thanks!

cannot change the output dir for stringtie merge

After running stringtie for each sample, I want to merge the GTF files with stringtie merge. But the output is written to an SRR folder rather than the working folder, and the file name has the prefix SRR1573523_hisat2_sorted_stringtie, which I cannot change.

st.stringtie_merge(*tuple(gtfList),out_suffix="maizeMerged",**{"-p":"16"})

basictest.py: sam_to_bam() got an unexpected keyword argument 'threads'

Hi,

I am running basictest.py for pyrpipe, but I got the following error:

ERROR basictest.py - TypeError: sam_to_bam() got an unexpected keyword argument 'threads'

I tried both on my laptop and on the Slurm server, but neither worked.

Could you please help find out the possible issue?

Thank you!

Salmon quantification with several srr_object

Hi

First of all, congratulations on pyrpipe, it is a great tool!

I am trying to perform an RNA-seq analysis, but I am having problems with Salmon, which I am using as a pseudoaligner. After loading the different SRR files into an sraObject, I performed the trimming with Trimgalore (I don't use STAR as an aligner; I go directly to Salmon). When I run Salmon, I obtain a quant.sf file with the values of the first SRR, and apparently the following ones are overwritten in the same file. So I cannot perform the quantification for multiple files.

Any idea about how to solve it, please?

Thank you very much!

Best

Victor
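
Independently of pyrpipe, a common way to keep one quant.sf per sample is to give every Salmon run its own output directory. The sketch below only builds the command lines and does not call pyrpipe's Salmon class; the function name, file names, and directory layout are hypothetical. The options used (`-i`, `-l`, `-1`, `-2`, `-p`, `-o`) are standard `salmon quant` options. Whether the behavior reported here is caused by a shared output directory is not established.

```python
from pathlib import Path


def salmon_quant_command(sample_id, reads_1, reads_2, index, out_root, threads=4):
    """Build a `salmon quant` argv whose output directory is unique per sample."""
    out_dir = Path(out_root) / sample_id / "salmon_out"
    return [
        "salmon", "quant",
        "-i", str(index),    # Salmon index directory
        "-l", "A",           # let Salmon infer the library type
        "-1", str(reads_1),  # first mates
        "-2", str(reads_2),  # second mates
        "-p", str(threads),
        "-o", str(out_dir),  # per-sample directory, so quant.sf is not overwritten
    ]


# Hypothetical file names, for illustration only.
for srr in ["SRR976159", "SRR978411"]:
    command = salmon_quant_command(srr, f"{srr}_1.fastq", f"{srr}_2.fastq", "salmon_index", "results")
    print(" ".join(command))
```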

"ResolvePackageNotFound:" Trouble installing pyrpipe from environment.yaml into a virtual environment using conda

Hi, I'm trying to install the pyrpipe package using the conda env create -f pyripe_environment.yaml command (as suggested here: https://githubmemory.com/repo/urmi-21/pyrpipe) but I'm getting a couple of error messages: one says Solving environment: failed and the other says ResolvePackageNotFound: followed by about a hundred packages. The first dozen or so are:

  - boost-cpp==1.74.0=hc6e9bd1_2
  - orfipy==0.0.3=py39h7cff6ad_1
  - coloredlogs==15.0=py39hf3d152e_0
  - stringtie==2.1.4=h7e0af3c_0
  - libtool==2.4.6=h58526e2_1007
  - perl-xml-libxml==2.0132=pl526h7ec2d77_1
  - sqlite==3.35.3=h74cdb3f_0
  - wrapt==1.12.1=py39h3811e60_3
  - setuptools==52.0.0=py39h06a4308_0
  - googleapis-common-protos==1.53.0=py39hf3d152e_0
  - openblas==0.3.12=pthreads_h43bd3aa_1
  - chardet==4.0.0=py39hf3d152e_1
  - rich==10.0.1=py39hf3d152e_0
  - libpng==1.6.37=hed695b0_2
  - libstdcxx-ng==9.3.0=h6de172a_18

I'm using a Mac computer and my operating system is OSX 11.5.2.

My conda configuration (obtained using the conda info command) is:

    active env location : /Users/myname/miniconda3
            shell level : 1
       user config file : /Users/myname/.condarc
 populated config files : /Users/myname/.condarc
          conda version : 4.10.3
    conda-build version : not installed
         python version : 3.9.5.final.0
       virtual packages : __osx=10.16=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /Users/myname/miniconda3  (writable)
      conda av data dir : /Users/myname/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/osx-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/myname/miniconda3/pkgs
                          /Users/myname/.conda/pkgs
       envs directories : /Users/myname/miniconda3/envs
                          /Users/myname/.conda/envs
               platform : osx-64
             user-agent : conda/4.10.3 requests/2.25.1 CPython/3.9.5 Darwin/20.6.0 OSX/10.16
                UID:GID : 501:20
             netrc file : None
           offline mode : False

From what I've seen on this thread (Climate-Data-Science/Climate-Similarity-Metrics#13) it seems like one issue might be that there are Operating System-specific dependencies in the .yaml package, and that this could be resolved by adding --no-builds to the conda env export -f environment.yml [here?] script. I am relatively new to programming but from what I've heard, I think this is something that would be run on the developer's end. I'm writing to ask if this would resolve the error I'm having, and if it wouldn't, if you could suggest some ways for me to resolve it (possibly using a command analogous to --no-builds but for conda env create instead of for conda env export)?
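
The entries in the ResolvePackageNotFound list above end in build strings such as `hc6e9bd1_2`, which are platform specific. As a user-side illustration of what `--no-builds` would leave out, the sketch below strips the build string from a pinned spec. The helper name `strip_build` is hypothetical, and applying it to the dependency list of the .yaml file is an untested workaround: the remaining versions may still be unavailable on another platform.

```python
import re

# name, "=" or "==", version, and an optional "=build" suffix
_PINNED_SPEC = re.compile(
    r"^(?P<name>[^=\s]+)(?P<sep>==?)(?P<version>[^=\s]+)(?:=(?P<build>[^=\s]+))?$"
)


def strip_build(spec):
    """Drop the trailing build string from a conda spec, keeping name and version."""
    spec = spec.strip()
    match = _PINNED_SPEC.match(spec)
    if match is None:
        return spec  # not a pinned spec (for example a bare package name)
    return f"{match['name']}{match['sep']}{match['version']}"


# Specs copied from the error output above.
for pinned in ["boost-cpp==1.74.0=hc6e9bd1_2", "stringtie==2.1.4=h7e0af3c_0"]:
    print(pinned, "->", strip_build(pinned))
```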

Could you help me to install pyrpipe?

Hi.
I am trying to use pyrpipe for my RNA-seq data. I am relatively new to the Python world.
I followed the installation page (https://pyrpipe.readthedocs.io/en/latest/installation.html) and the setup page (https://pyrpipe.readthedocs.io/en/latest/tutorial/setup.html).
Then I tried to run 'pyrpipe_diagnostic build-tools' and got this error: "pyrpipe_diagnostic: command not found"

I am not sure whether pyrpipe is installed correctly or whether the PATH does not include the pyrpipe folder.
I am testing the system by installing it on an Ubuntu 20 OS in my virtual machine.
Could you help me?

Thank you.
