
shenshenwu5 / rproject-template


This is a template for creating reproducible R projects. The folder structure is opinionated and is designed to keep your project organized and reproducible.


rproject-template

This is a template for creating reproducible R projects using Apptainer (formerly Singularity) or Docker and GNU Make. The folder structure is opinionated and is designed to keep your project organized and reproducible.

Pre-requisites

Install docker on your machine: https://docs.docker.com/get-docker/

How to use this template

  1. Create a repository from this template according to these instructions: https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template.
    Make sure to select the option to include all branches. This will allow you to use either the apptainer branch for working on projects on HPC systems or the docker branch for working locally.

  2. Then clone your new repository locally:

    git clone https://github.com/{YourUserName}/{YourProject}.git

  3. Set your project name in the Makefile:

    ## Define the project name
    export PROJECT_NAME := {YourProject}

    and rename the .Rproj file:

    mv rproject-template.Rproj {YourProject}.Rproj

  4. Run make up in your terminal. This builds the Docker image and starts an RStudio session, which you can open at http://localhost:8787 in your browser. The default username is rstudio. The password is randomly generated, printed in the terminal, and stored in the .env file in the root directory of your project.

  5. Stop the project by running make down in your terminal.

Adding packages and other dependencies to your project

If you need to add additional R packages or other dependencies to your project, add them to the Dockerfile. This will ensure that the packages are installed in the docker image each time you run make up.

Run the example structure

You can run the example structure to see how the project is organized and how the Makefile works by running make or make all in the terminal.

Follow these steps to clear out the example structure for your own project:

  • Remove scripts: rm scripts/processing/* scripts/analysis/* scripts/utils/*
  • Clear out Makefile objects and rules.
  • Replace README.md with your project description.

What to put in README.md

The README contains information about your project. Here you can describe your analyses, processing steps, or approaches. Most importantly, the README should contain information about your raw data. Describe 1) how the raw data was generated, 2) where it was obtained, and 3) how others can access it. This is also the place to describe steps that were carried out on the command line or on a remote cluster.

Makefile

The Makefile is like a recipe book that describes the desired output files, the input files used to make them, and the instructions for transforming inputs into outputs. Every time you create an output file of any type, make sure to add it to the Makefile. This ensures that running make after changing your scripts produces up-to-date outputs.

In the terminal, navigate to the project directory and run make to build all objects and make clean to remove all objects. Alternatively, use the build tab in RStudio.
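
As a rough illustration of the recipe-book idea, a pair of rules might look like the sketch below. The object names follow the examples used elsewhere in this README, while data/raw/input.csv and the objects variable are hypothetical placeholders rather than part of the template; the template's actual Makefile may be organized differently.

```makefile
# Sketch only: file names are illustrative assumptions, not the template's real rules.
# Recipe lines (the commands under each rule) must begin with a tab character.

# All output files that make should build (hypothetical variable name).
objects := data/object1.rds plots/surveyPlot1.pdf

.PHONY: all clean

# Running "make" or "make all" builds every listed object.
all: $(objects)

# Rebuild the data object when its script or the raw input changes.
# data/raw/input.csv is an assumed placeholder for a raw data file.
data/object1.rds: scripts/processing/makeObject1.R data/raw/input.csv
	mkdir -p data
	Rscript scripts/processing/makeObject1.R

# Rebuild the plot when its script or the data object changes.
plots/surveyPlot1.pdf: scripts/analysis/surveyPlot1.R data/object1.rds
	mkdir -p plots
	Rscript scripts/analysis/surveyPlot1.R

# Running "make clean" removes all generated objects.
clean:
	rm -f $(objects)
```

With rules like these, editing scripts/processing/makeObject1.R and running make would rebuild data/object1.rds and then plots/surveyPlot1.pdf, because the plot lists the data object as a prerequisite.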

Resources for learning GNU make:

Directory organization

  • data
    • Contains all "processed" data for the project. Whether data is "processed" is subjective; however, we consider data to be processed if it was produced by an R script. Since the Makefile will generate the contents of this folder, files should not be tracked by git (add to .gitignore).
  • data/raw
    • Contains all input files needed for analysis that are not produced in this project. Examples include loop calls, .hic files, and other large data. Contents can be suborganized into folders as desired. Since these files are typically large, they are not tracked through git (add to .gitignore). However, these files should not be deleted since they are used to generate all processed data in the analysis.
  • scripts
    • Contains all R scripts and functions that are used to generate output data objects, plots, and tables. Everything in this folder should be tracked with git to ensure reproducibility. For convenience we suggest using the following subdirectories:

      • scripts/processing

        • Processing data to create data objects. Files typically begin with "make" and correspond to an object in data (e.g. scripts/processing/makeObject1.R produces data/object1.rds).
      • scripts/analysis

        • Any script that uses data objects to create a plot, table, or report belongs in this folder (e.g. scripts/analysis/surveyPlot1.R produces plots/surveyPlot1.pdf).
      • scripts/utils

        • Keep R functions that are used in more than one file here. Access them in other scripts by using the source() function.
  • plots
    • Output plots from scripts/analysis. Can be suborganized into folders as desired. Since the Makefile will generate the contents of this folder, files should not be tracked by git (add to .gitignore).
  • tables
    • Output tables from scripts/analysis. Can be suborganized into folders as desired. Since the Makefile will generate the contents of this folder, files should not be tracked by git (add to .gitignore).
