cmkobel / mspipeline1

A snakemake wrapper around Nesvilab's FragPipe CLI. In a perfect world, this pipeline would be based on Sage.

License: GNU General Public License v3.0

Topics: fragpipe, metaproteomics, proteomics, hpc, parallel-computing, snakemake-pipeline

mspipeline1's Introduction

mspipeline1

                                              _____________ 
                                             < mspipeline1 >
                                              ------------- 
                                                          \ 
                             ___......__             _     \
                         _.-'           ~-_       _.=a~~-_  
 --=====-.-.-_----------~   .--.       _   -.__.-~ ( ___===>
               '''--...__  (    \ \\\ { )       _.-~        
                         =_ ~_  \\-~~~//~~~~-=-~            
                          |-=-~_ \\   \\                    
                          |_/   =. )   ~}                   
                          |}      ||                        
                         //       ||                        
                       _//        {{                        
                    '='~'          \\_    =                 
                                    ~~'    

If you want to run FragPipe through its command line interface, then this is the tool for you.

This pipeline takes 1) a list of .d files and 2) a list of amino acid FASTA files, and outputs sane protein calls with abundances. It uses Philosopher's database command and FragPipe to do the job. The snakemake pipeline maintains a tidy output file tree.

Why you should use this pipeline

Because it makes sure that all outputs are updated when you change input parameters. It also yells at you if something fails, and hopefully makes the error a bit easier to find.

Installation

  1. Prerequisites:
  • Preferably an HPC system, or a beefy local workstation.
  • An anaconda or miniconda3 package manager on that system.

  2. Clone this repo on the HPC/workstation where you want to work.

    git clone https://github.com/cmkobel/mspipeline1.git && cd mspipeline1

  3. If you don't already have an environment with snakemake and mamba installed, use the following command to create a "snakemake" environment from the bundled environment file:

    conda env create -f environment.yaml -n mspipeline1

    This environment can then be activated by typing conda activate mspipeline1

  4. If needed, tweak the profiles/slurm/ configuration so that it matches your execution environment. There is a profile for local execution without a job management system (profiles/local/) as well as a few profiles for different HPC environments like PBS and SLURM.
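As an illustration only, a minimal SLURM profile could look like the sketch below. All values are placeholders and are not the settings shipped in this repository's profiles/slurm/ directory; use the bundled profile as your starting point.

```yaml
# Hypothetical profiles/slurm/config.yaml -- placeholder values, not the
# repository's actual profile.
jobs: 100                      # maximum number of concurrent cluster jobs
cluster: "sbatch --mem={resources.mem_mb} --cpus-per-task={threads} --time=24:00:00"
default-resources:
  - mem_mb=4000
latency-wait: 60               # seconds to tolerate shared-filesystem latency
```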

Usage

1) Update config.yaml

The file config_template.yaml contains all the parameters needed to run this pipeline. You should change the parameters to reflect your sample batch.

Because Nesvilab does not make its executables immediately publicly available, you need to tell the pipeline where to find them on your system. Update the paths for the keys philosopher_executable, msfragger_jar, ionquant_jar and fragpipe_executable; each tool can be downloaded from its respective Nesvilab project page.
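Before a long run it can pay off to verify that all four configured paths actually exist. The sketch below uses hypothetical install locations — substitute the paths you put in your config:

```shell
#!/usr/bin/env bash
# Sanity-check the four tool paths before running the pipeline.
# All paths below are placeholders -- substitute your own install locations.
set -u

tools=(
    "$HOME/tools/philosopher/philosopher"    # philosopher_executable
    "$HOME/tools/MSFragger/MSFragger.jar"    # msfragger_jar
    "$HOME/tools/IonQuant/IonQuant.jar"      # ionquant_jar
    "$HOME/tools/fragpipe/bin/fragpipe"      # fragpipe_executable
)

missing=0
for t in "${tools[@]}"; do
    if [ -e "$t" ]; then
        echo "found:   $t"
    else
        echo "missing: $t"
        missing=$((missing + 1))
    fi
done
echo "checked ${#tools[@]} paths, $missing missing"
```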

Currently the pipeline is only tested on input of .d files (Agilent/Bruker). Create an item in batch_parameters where you define the key d_base, which is the base directory where all .d files reside. Define the key database_glob, which is a path (or glob) to the amino acid FASTA files that you want to include in the target protein database.

Define items under the samples key which link sample names to the .d-files.

Lastly, set the batch key to point at the batch that you want to run.
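Put together, a batch definition might look like the sketch below. All names and paths here are made up for illustration — consult config_template.yaml for the authoritative key names and structure.

```yaml
# Hypothetical batch configuration -- names and paths are invented;
# see config_template.yaml for the real structure.
batch: my_batch_2023

batch_parameters:
  my_batch_2023:
    d_base: /data/raw/my_batch_2023           # directory containing the .d files
    database_glob: /data/fasta/*.faa          # FASTA files for the target database
    samples:
      sample_A: 20230101_A_Slot1-1_1_100.d    # sample name -> .d file
      sample_B: 20230101_B_Slot1-2_1_101.d
```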

2) Run

Finally, run the pipeline in your command line with:

$ snakemake --profile profiles/slurm/

Below is a visualization of the workflow graph:

[Workflow graph screenshot, 2023-02-23]

Future

This pipeline might eventually include an R-markdown report performing trivial QC, as well as a test data set that accelerates the development cycle.

mspipeline1's People

Contributors: cmkobel

mspipeline1's Issues

temporary directories

One major bottleneck of this pipeline is that it doesn't use temporary directories efficiently. Currently, the snakemake temporary directory is set to somewhere on the userwork partition on Saga. The problem is that several rules need access to the same temporary directory, which really goes against the definition of a temporary directory: it ends up being used for something that isn't temporary.
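One possible direction, sketched here purely as an illustration and not taken from the actual Snakefile, is to run all workspace-dependent steps inside a single rule and let snakemake manage an isolated scratch copy via the shadow directive, so the steps share one temporary directory that is created and removed with the job:

```python
# Illustrative Snakefile fragment (not the pipeline's actual rule).
# Running every philosopher workspace step inside one rule means the
# steps share a single scratch copy that snakemake sets up and tears down.
rule philosopher_database:
    output:
        "output/{config_batch}/msfragger/philosopher_database.fas"
    shadow: "minimal"          # execute in an isolated copy of the workdir
    shell: """
        philosopher workspace --init
        philosopher database --custom cat_database_sources.faa --contam
        philosopher workspace --clean
    """
```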

rule philosopher_database permissions

Config file config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Select jobs to execute...

[Tue Jul 19 10:21:30 2022]
rule philosopher_database:
    output: output/220506_DnnT6S_carl/msfragger/philosopher_database.fas
    jobid: 0
    benchmark: output/220506_DnnT6S_carl/benchmarks/philosopher_database.tsv
    reason: Forced execution
    wildcards: config_batch=220506_DnnT6S_carl
    resources: mem_mb=1000, disk_mb=1000, tmpdir=/scratch/3973804




        >&2 echo "Catting database files ..."
        # Cat all database source files into one.
        cat  > output/220506_DnnT6S_carl/msfragger/cat_database_sources.faa


        >&2 echo "Change dir ..."
        # As philosopher can't specify output files, we need to change dir.
        mkdir -p output/220506_DnnT6S_carl/msfragger
        cd output/220506_DnnT6S_carl/msfragger

        >&2 echo "Philosopher workspace clean ..."
        /faststorage/project/gBGC/bin/philosopher_v4.1.1_linux_amd64/philosopher workspace             --nocheck             --clean 

        >&2 echo "Philosopher workspace init ..."
        /faststorage/project/gBGC/bin/philosopher_v4.1.1_linux_amd64/philosopher workspace             --nocheck             --init 

        >&2 echo "Removing previous .fas ..."
        rm *.fas || echo "nothing to delete" # Remove all previous databases if any.

        >&2 echo "Philosopher database ..."
        /faststorage/project/gBGC/bin/philosopher_v4.1.1_linux_amd64/philosopher database             --custom cat_database_sources.faa             --contam 

        >&2 echo "Move output ..."
        # Manually rename the philosopher output so we can grab it later
        mv *-decoys-contam-cat_database_sources.faa.fas philosopher_database.fas

        >&2 echo "Clean up ..."
        # clean up 
        rm cat_database_sources.faa


        
Catting database files ...
Change dir ...
Philosopher workspace clean ...
time="10:21:32" level=info msg="Executing Workspace  v4.1.1"
time="10:21:32" level=info msg="Removing workspace"
time="10:21:32" level=warning msg="Cannot read file. open .meta/meta.bin: no such file or directory"
time="10:21:32" level=info msg=Done
Philosopher workspace init ...
time="10:21:32" level=info msg="Executing Workspace  v4.1.1"
time="10:21:32" level=info msg="Creating workspace"
time="10:21:32" level=warning msg="Cannot verify local directory path. getwd: no such file or directory"
time="10:21:32" level=fatal msg="Cannot create meta directory; check folder permissions. stat .meta: no such file or directory"
[Tue Jul 19 10:21:32 2022]
Error in rule philosopher_database:
    jobid: 0
    output: output/220506_DnnT6S_carl/msfragger/philosopher_database.fas
    shell:
        


        [shell script identical to the one printed above]
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Merge some rules together.

The temporary directories given to some of the rules, especially the ones that create workspaces, should be merged. The problem is that the temporary dirs (scratch) don't exist after the initial job ends, so really, the only way to use the scratch directory is to merge more rules together.

rule link_input fails sometimes


        
Touching output file output/220506_DnnT6S_carl/msfragger/link_input.done.
Traceback (most recent call last):
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/__init__.py", line 726, in snakemake
    success = workflow.execute(
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/workflow.py", line 1133, in execute
    raise e
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/workflow.py", line 1129, in execute
    success = self.scheduler.schedule()
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/scheduler.py", line 423, in schedule
    self._finish_jobs()
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/scheduler.py", line 526, in _finish_jobs
    self.get_executor(job).handle_job_success(job)
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 633, in handle_job_success
    super().handle_job_success(job)
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 232, in handle_job_success
    job.postprocess(
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/jobs.py", line 1076, in postprocess
    self.dag.check_and_touch_output(
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/dag.py", line 594, in check_and_touch_output
    f.touch()
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/io.py", line 707, in touch
    raise e
  File "/home/cmkobel/faststorage/miniconda3/envs/smk785/lib/python3.10/site-packages/snakemake/io.py", line 693, in touch
    with open(file, "w"):
PermissionError: [Errno 13] Permission denied: 'output/220506_DnnT6S_carl/msfragger/20220506_C9_Slot1-33_1_1971.d/.snakemake_timestamp'
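The traceback suggests snakemake cannot write its timestamp file inside the linked .d directory, most likely because that directory is read-only. The sketch below reproduces the failure with placeholder paths in a throwaway temp directory and shows the permission fix; whether restoring write permission is appropriate for your linked raw data is a separate judgment call.

```shell
#!/usr/bin/env bash
# Reproduce the PermissionError: snakemake touches a .snakemake_timestamp
# file inside the linked .d directory, which fails when that directory is
# read-only. Paths here are placeholders inside a throwaway temp dir.
set -euo pipefail

demo=$(mktemp -d)
mkdir "$demo/sample.d"
chmod a-w "$demo/sample.d"                   # simulate a read-only .d directory

# This is essentially what snakemake's f.touch() does -- and why it fails:
if ! touch "$demo/sample.d/.snakemake_timestamp" 2>/dev/null; then
    echo "touch failed on read-only directory (as in the traceback)"
fi

chmod u+w "$demo/sample.d"                   # restore owner write permission
touch "$demo/sample.d/.snakemake_timestamp"  # now succeeds
echo "timestamp written"
```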
