
dsci_552-heather_harjyot_taracyc_analysis's Introduction


Viral Voyager: Taracyc Ocean Virus Analysis


Authors

| Name | CWL |
| --- | --- |
| Harjyot Kaur | HarjyotKaur |
| Heather Van Tassel | heathervant |

Overview

One of the most promising places to sequester carbon is the ocean. The ocean plays a dominant role in oxygen production, weather patterns, climate and the global carbon cycle. Marine cyanobacteria take up carbon dioxide through photosynthesis, and when the bacteria die, part of this carbon sinks to the bottom of the ocean, removing it from the atmosphere. Viruses that infect these bacteria can alter their chance of survival, and so can influence how much carbon is sequestered this way.

Motivation for research

In 2009, the Tara Oceans expedition set out on a three-year voyage around the world to collect more information about our oceans. The project collected 300 water samples and involved over 150 scientists studying the biodiversity and distribution of microorganisms in the oceans. The Hallam lab at the University of British Columbia (UBC) took the viral and bacterial genetic sequences from these samples and developed an algorithm that classifies the DNA sequences into the biological pathways the genes may be involved in regulating. At UBC's hackseq 2018, a team of students and researchers used this dataset to build a Shiny app that lets the public interact with and explore the data. Many questions remain to be explored with this dataset: characterizing the genetic diversity of the ocean, and making inferences about how bacteria and viruses interact and how those interactions might change with a changing climate.

Research Questions

  1. Does the mean abundance of viral DNA sequences differ across biological pathways?

  2. Does the mean abundance of viral DNA sequences differ across ocean depth levels?

  3. Do the differences in mean abundance across biological pathways depend on ocean depth level (that is, is there an interaction between pathway and depth)?

Analysis Overview

The analysis uses a two-way ANOVA (factorial analysis) to test the main effects of biological pathway and ocean depth level, and the interaction between them, on the abundance of viral DNA sequences.
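A minimal sketch of the statistical model behind this design, in standard two-way ANOVA notation. The symbols are introduced here for illustration and are not taken from the project's scripts: y_ijk is the abundance (RPKM) of the k-th sequence in pathway i (LEVEL1, a levels) at depth level j (Depth, b levels).

```latex
\begin{aligned}
y_{ijk} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
  \qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2) \\
H_0^{(A)} &: \alpha_1 = \alpha_2 = \dots = \alpha_a = 0 \\
H_0^{(B)} &: \beta_1 = \beta_2 = \dots = \beta_b = 0 \\
H_0^{(AB)} &: (\alpha\beta)_{ij} = 0 \quad \text{for all } i, j
\end{aligned}
```

Here mu is the overall mean, alpha_i the pathway effect, beta_j the depth effect and (alpha beta)_ij their interaction; the three null hypotheses correspond, in order, to the three research questions. In R's formula notation this model would be written RPKM ~ LEVEL1 * Depth, which expands to both main effects plus the LEVEL1:Depth interaction.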

| Variable Name | Type | Description |
| --- | --- | --- |
| RPKM | Continuous | Reads per kilobase of transcript per million mapped reads (abundance measure) |
| LEVEL1 | Categorical | Biological pathway |
| Depth | Categorical | Ocean depth level |
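For reference, the conventional definition of RPKM is sketched below with a worked example using made-up numbers. This README does not say exactly how the TaraCyc values were computed, so treat it as the standard formula rather than the dataset's documented method. C_g is the number of reads mapped to gene g, L_g its length in base pairs, and N the total number of mapped reads.

```latex
\begin{aligned}
\mathrm{RPKM}_g &= \frac{10^9 \, C_g}{N \, L_g}
  = \frac{C_g}{(L_g / 10^3)\,(N / 10^6)} \\
% Illustrative numbers: C_g = 500 reads, L_g = 2000 bp, N = 10^7 mapped reads
\mathrm{RPKM}_g &= \frac{10^9 \times 500}{10^7 \times 2000}
  = \frac{5 \times 10^{11}}{2 \times 10^{10}} = 25
\end{aligned}
```

Dividing by gene length and by sequencing depth is what makes RPKM values roughly comparable across genes and samples.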

A detailed report of the analysis is available in doc/taracyc_report.Rmd; the rendered Markdown and HTML versions are saved in the doc/ folder.

Usage

There are three ways to run the entire analysis.

First, download or clone this GitHub repository: Taracyc_Ocean_Virus_Analysis

Method 1: Using Docker

  1. Install Docker

  2. Use the command line to navigate to the root of this project directory

  3. Run the following command in the terminal to download the Docker image:

docker pull hkaur112/taracyc_ocean_virus_analysis

  4. Run the following command in the terminal to run the analysis, replacing PATH_ON_YOUR_COMPUTER with the absolute path to the root of this project on your computer:

docker run --rm -e PASSWORD=test -v PATH_ON_YOUR_COMPUTER:/home/rstudio/taracyc_analysis hkaur112/taracyc_ocean_virus_analysis make -C /home/rstudio/taracyc_analysis all

  5. To clean the output of the analysis, run the following command (again replacing PATH_ON_YOUR_COMPUTER):

docker run --rm -e PASSWORD=test -v PATH_ON_YOUR_COMPUTER:/home/rstudio/taracyc_analysis hkaur112/taracyc_ocean_virus_analysis make -C /home/rstudio/taracyc_analysis clean
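Because step 2 already places you in the project root, a small wrapper can let the shell supply the absolute path through "$(pwd)" instead of typing PATH_ON_YOUR_COMPUTER by hand. This is a hypothetical helper (the name run_in_docker is made up for illustration): it mounts the project at /home/rstudio/taracyc_analysis, points make -C at that same absolute path, and takes the Makefile target as its argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the project: run a Makefile target of this
# project inside the hkaur112/taracyc_ocean_virus_analysis image.
# Assumes it is called from the project root, so "$(pwd)" replaces
# PATH_ON_YOUR_COMPUTER (quoted, so paths containing spaces still work).
run_in_docker() {
  local target="${1:-all}"   # Makefile target, e.g. "all" or "clean"
  docker run --rm -e PASSWORD=test \
    -v "$(pwd)":/home/rstudio/taracyc_analysis \
    hkaur112/taracyc_ocean_virus_analysis \
    make -C /home/rstudio/taracyc_analysis "$target"
}
```

Usage: run_in_docker all to run the analysis, or run_in_docker clean to remove its output.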

Link: Dockerfile

Method 2: Using Make

  1. Use the command line to navigate to the root of this project directory

  2. Run the following command in the terminal to run the analysis:

make all

  3. To clean the output of the analysis, run:

make clean

Link: Makefile

Dependency diagram of the Makefile

Method 3: Shell Script

  1. Use the command line to navigate to the root of this project directory

  2. Run the following in your command shell:

bash run_all.sh

Link: Shell Script run_all.sh
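The contents of run_all.sh are not shown in this README. As a rough, hypothetical sketch of what such a driver could look like, the function below runs the scripts from the Detailed Workflow section in order and then renders the report. It assumes each R script can be run without command-line arguments from the project root; the real script may pass input and output paths (such as the data URL) explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: the real run_all.sh is not reproduced in this README.
# Assumes every script runs from the project root without arguments.

run_pipeline() (
  # Subshell body: these options do not leak into a shell that sources this file,
  # and any failing step stops the rest of the pipeline.
  set -euo pipefail
  Rscript src/taracyc_data_load.R            # Step 1: download raw data
  Rscript src/taracyc_data_explore_clean.R   # Step 2: EDA plots and cleaned data
  Rscript src/taracyc_data_analysis.R        # Step 3: two-way ANOVA
  Rscript src/taracyc_results.R              # Step 4: results figure
  # Step 5: render every output format declared in the report's YAML header
  Rscript -e "rmarkdown::render('doc/taracyc_report.Rmd', output_format = 'all')"
)
```

Usage: source the file and call run_pipeline from the project root.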

Detailed Workflow

Step 1: Data Load

The first script, src/taracyc_data_load.R, downloads the data from a URL and saves it as the CSV file data/taracyc_data.csv.

Step 2: Data Wrangling and Exploratory Data Analysis

The second script, src/taracyc_data_explore_clean.R, takes the output of the first script, explores and cleans the data, and produces five plots, which are saved as .png files in results/figures. It also writes the cleaned data to the CSV file taracyc_data_cleaned.csv.

Step 3: Data Analysis

The third script, src/taracyc_data_analysis.R, takes the output of the second script, runs a two-way ANOVA on the cleaned data, and saves the results to the CSV file results/taracyc_results.csv.
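As a sketch of what the ANOVA table contains, consider the balanced case, with a pathways, b depth levels and the same number n of observations in every pathway and depth combination (standard notation, not taken from the script). Each effect is tested by an F statistic that compares its mean square to the residual mean square, and under the corresponding null hypothesis it follows the F distribution shown:

```latex
\begin{aligned}
F_{A} &= \frac{SS_A / (a-1)}{SS_E / \big(ab(n-1)\big)} \sim F_{a-1,\; ab(n-1)} \\
F_{B} &= \frac{SS_B / (b-1)}{SS_E / \big(ab(n-1)\big)} \sim F_{b-1,\; ab(n-1)} \\
F_{AB} &= \frac{SS_{AB} / \big((a-1)(b-1)\big)}{SS_E / \big(ab(n-1)\big)} \sim F_{(a-1)(b-1),\; ab(n-1)}
\end{aligned}
```

Observational data like this is rarely balanced. In that case the sums of squares depend on the type used: sequential (Type I) from base R's anova() on a fitted model, or Type II/III from car::Anova(), and car is among the listed dependencies. This README does not state which type the script uses.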

Step 4: Compiling Results

The fourth script, src/taracyc_results.R, takes the output of the second script, produces a visual representation of the two-way ANOVA results, and saves it as results/figures/fig7_results.png.

Step 5: Creating Report

The report, doc/taracyc_report.Rmd, is rendered to Markdown and HTML files, which are saved in the doc/ folder.
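After a full run, a quick check that each step left its output behind can save a trip through the logs. This is a hypothetical helper, not part of the project (check_outputs is a made-up name). The first three paths come from the workflow steps above; the two report file names assume rmarkdown's default of reusing the .Rmd base name. The cleaned CSV is left out because its directory is not stated.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check, not part of the project.
# Report names (taracyc_report.md / .html) assume rmarkdown's default naming.
check_outputs() {
  local root="${1:-.}"   # project root; defaults to the current directory
  local missing=0 f
  for f in \
    data/taracyc_data.csv \
    results/taracyc_results.csv \
    results/figures/fig7_results.png \
    doc/taracyc_report.md \
    doc/taracyc_report.html; do
    if [[ ! -f "$root/$f" ]]; then
      echo "missing: $f" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when every expected file exists
  [[ "$missing" -eq 0 ]]
}
```

Usage: check_outputs && echo "all expected outputs present", run from the project root (or pass the root as the first argument).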

Dependencies

R version 3.5.1 and the following R packages:

| Package | Version |
| --- | --- |
| tidyverse | 1.2.1 |
| ggplot2 | 3.0.0 |
| car | 3.0-2 |
| ggpubr | 0.2.999 |
| rmarkdown | 1.10 |
| knitr | 1.20 |
