
jdmagasin / nifh-asv-workflow


Workflow stages and data for Morando, Magasin et al. 2024

License: Other

Languages: Makefile 16.64%, Shell 2.25%, R 53.84%, Python 27.27%
Topics: dada2, diazotrophs, marine-microbial-ecology, nitrogen-fixation


Workflow for processing nifH amplicon data sets

This repository contains all post-pipeline software stages and data deliverables described in Morando, Magasin et al. 2024. The workflow was used to process nearly all published nifH amplicon MiSeq data sets that existed at the time of publication. It also processed two new data sets produced by the Zehr Lab at UC Santa Cruz. The samples are shown in the map below. The map links to an interactive Google map that gives the study name, sample ID, and collection information for each sample.

[Figure: Map of studies used in Morando, Magasin et al. 2024]

Workflow overview

The following figure shows the workflow.

[Figure: Overview of DADA2 nifH workflow]

DADA2 ASVs (amplicon sequence variants) were created by our DADA2 nifH pipeline (green). Post-pipeline stages (lavender), each executed by a Makefile or Snakefile, were used for four tasks: gathering the ASVs from all studies, filtering them for quality, annotating them, and downloading environmental data co-located with the samples from the Simons Collaborative Marine Atlas Project (CMAP). The nifH ASV database generated by the workflow will support future research into N2-fixing marine microbes.

The published database and any updated versions are available in the WorkspaceStartup directory in two forms: the archive nifH_ASV_database.tgz and the R image workspace.RData. The published database is also available at https://doi.org/10.6084/m9.figshare.23795943.v1.
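The sketch below unpacks the database tarball into a separate directory, which leaves the copy in WorkspaceStartup untouched. The helper name unpack_db and the destination directory are hypothetical. The sketch only assumes a tar that understands gzip (-z), which both GNU tar and BSD tar do. If you only need the R image, you can load it directly in R with load("workspace.RData").

```shell
# Hypothetical helper (not part of the repository): list and unpack a copy of
# a gzipped tarball such as nifH_ASV_database.tgz into DEST_DIR.
# Usage: unpack_db TARBALL DEST_DIR
unpack_db() (
    # The body runs in a subshell, so 'exit' only leaves this function.
    tarball=$1
    dest=$2
    if [ ! -f "$tarball" ]; then
        echo "unpack_db: no such file: $tarball" >&2
        exit 1
    fi
    mkdir -p "$dest" || exit 1
    # -t lists members, -x extracts, -z handles gzip, -C sets the target directory.
    tar -tzf "$tarball" || exit 1
    tar -xzf "$tarball" -C "$dest"
)

# Example (the output directory name is a placeholder):
#   unpack_db WorkspaceStartup/nifH_ASV_database.tgz nifH_db_unpacked
```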

Running the workflow

The workflow requires the DADA2 nifH pipeline and its ancillary tools. Please see the Installation directory in the pipeline repository. You will also need GNU make to run the post-pipeline stages, each of which uses a Makefile. Please see the Installation instructions.
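Before running any stage, it can help to confirm that the required commands are on your PATH. The sketch below is a hypothetical helper; check_tools is not part of the repository. It relies only on the POSIX command -v builtin.

```shell
# Hypothetical helper (not part of the repository): report each named command
# that is missing from PATH. Exit status is 0 only if all were found.
# Usage: check_tools make conda
check_tools() (
    # The body runs in a subshell, so variables stay local to this call.
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
)

# Example:
#   check_tools make conda || echo "install the missing tools first"
```

To confirm that the make you found is GNU make, run make --version. The first line of its output begins with "GNU Make".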

The DADA2 nifH pipeline outputs for all studies are provided in the Data directory, so you do not need to run the pipeline. If you do wish to run it, the parameter files used for each study are also included in Data, and you are free to modify them.

Each of the post-pipeline stages 1 through 6 can be run, in order, by entering the stage's directory and running "make" from your shell's command line. For example, to run the GatherAsvs stage I would enter the following:

(base) [jmagasin@thalassa]$ conda activate nifH_ASV_workflow
(nifH_ASV_workflow) [jmagasin@thalassa]$ cd GatherAsvs
(nifH_ASV_workflow) [jmagasin@thalassa]$ make &> log.18July2023.txt &

Here I am using a Bash shell (recommended). First I activate the nifH_ASV_workflow conda environment. This is a critical step because it ensures that all tools and packages needed by the workflow are available. Note how activation changes the prompt to begin with "(nifH_ASV_workflow)" on line two. On the third line, I make the GatherAsvs stage and redirect both standard output and standard error (&>) to a log file. Most stages take at least a few minutes to complete, so I run them in the background (the trailing &).
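The stages must run in order. If you background each make separately, a later stage could start before an earlier one finishes. One hedged alternative is to run the stages one after another inside a single function and background the whole sequence. The function run_stages, its MAKE override, and the log naming are assumptions for illustration. They are not part of the repository's Makefiles. Define the function in the shell where nifH_ASV_workflow is already activated.

```shell
# Hypothetical helper (not part of the repository): run stage directories one
# after another, stopping at the first stage whose make fails.
# Each stage's messages go to log.TAG.txt inside that stage's directory.
# Set MAKE to override the command that is run (defaults to make).
# Usage: run_stages TAG DIR...
run_stages() (
    # The body runs in a subshell, so 'exit' only leaves this function.
    if [ "$#" -lt 2 ]; then
        echo "usage: run_stages TAG DIR..." >&2
        exit 2
    fi
    tag=$1
    shift
    for dir in "$@"; do
        echo "== running stage in $dir"
        # cd inside a nested subshell so each relative DIR resolves against
        # the directory the function was called from.
        if ! (cd "$dir" && ${MAKE:-make} > "log.$tag.txt" 2>&1); then
            echo "run_stages: stage in $dir failed; see $dir/log.$tag.txt" >&2
            exit 1
        fi
    done
)

# Example (STAGE2, STAGE3 are placeholders for the other stage directories):
#   run_stages 18July2023 GatherAsvs STAGE2 STAGE3 > run_stages.log 2>&1 &
```

Stopping at the first failure keeps a later stage from consuming incomplete outputs from the stage before it.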

Please see the documentation at the top of each Makefile for an overview of that stage.
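To read those overviews without opening an editor, a small awk filter can print just the leading comment block. makefile_doc is a hypothetical name. The sketch assumes the overview is written as "#" comment lines at the very top of the Makefile. A simpler alternative is head -n 40 GatherAsvs/Makefile.

```shell
# Hypothetical helper (not part of the repository): print the comment block at
# the top of a Makefile, dropping the leading "# ". Blank lines within the
# header are skipped; the first non-comment, non-blank line ends the output.
# Usage: makefile_doc [MAKEFILE]   (defaults to ./Makefile)
makefile_doc() {
    awk '
        /^#/       { sub(/^# ?/, ""); print; next }  # comment line in the header
        /^[ \t]*$/ { next }                          # blank line: keep scanning
                   { exit }                          # first real line: stop
    ' "${1:-Makefile}"
}

# Example:
#   makefile_doc GatherAsvs/Makefile
```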


Copyright (C) 2023 Michael B. Morando and Jonathan D. Magasin
