Genomic changes in A. castellanii during infection of amoeba by L. pneumophila

Background

This repository contains the analysis of Acanthamoeba castellanii infection by Legionella pneumophila. We investigate how the host genome is remodelled during infection by this intracellular bacterium, using Hi-C to measure changes in 3D chromatin organisation and RNA-seq to measure changes in gene expression. The dataset comprises two biological replicates of uninfected A. castellanii (strain C3) and two infected replicates at 5 h post-infection.

A frozen copy of this repository and its output data are available for download at the corresponding Zenodo record.

Dependencies

The pipeline is written using snakemake and has the following dependencies:

python >= 3.7
conda >= 4.8
snakemake >= 5.5

Each rule is encapsulated in a conda environment where its dependencies are managed automatically. Fastq files containing the Hi-C and RNA-seq reads are also downloaded automatically from SRA. Input files (genomes, annotations, ...) are automatically downloaded from the corresponding Zenodo record.

Installation

You need a working conda installation on your machine, with snakemake (>=5.5) installed via pip or conda.
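Before launching the pipeline, it can help to confirm that the tools listed under Dependencies are present and recent enough. The following is a minimal sketch, not part of the repository: it assumes that conda and snakemake print a version number when called with --version, and the helper names (parse_version, meets_minimum, check_tool) are made up for this example.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the Dependencies section.
MINIMUMS = {"conda": (4, 8), "snakemake": (5, 5)}


def parse_version(text):
    """Return (major, minor) from the first 'X.Y' found in text, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def meets_minimum(version_text, minimum):
    """True if the version string is at least the (major, minor) minimum."""
    version = parse_version(version_text)
    return version is not None and version >= minimum


def check_tool(name, minimum):
    """Run '<name> --version' and report whether it satisfies the minimum."""
    if shutil.which(name) is None:
        return f"{name}: not found on PATH"
    result = subprocess.run([name, "--version"], capture_output=True, text=True)
    output = result.stdout or result.stderr
    status = "ok" if meets_minimum(output, minimum) else "too old or unreadable"
    return f"{name}: {output.strip()} ({status})"


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 7)
    print(f"python: {sys.version.split()[0]} ({'ok' if python_ok else 'too old'})")
    for tool, minimum in MINIMUMS.items():
        print(check_tool(tool, minimum))
```

Only the major and minor components are compared, which is enough for the thresholds listed above.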

Usage

You can then run the pipeline with:

snakemake -j6 --use-conda

The pipeline should fetch the required packages and data as it runs.

Configuration

Some metadata files are provided with the pipeline to help understand the design and modify parameters. The following files may be of interest:

samples.tsv: samples used in the analyses and associated information
units.tsv: sequencing libraries used in the pipeline, file paths for the reads and metadata
config.yaml: paths to key files and general parameters controlling the results of the pipeline
cluster_slurm.json: cluster resource requirements, for running the pipeline on an HPC with the SLURM scheduler. In that case, use the following command to run the pipeline instead:
- snakemake --rerun-incomplete --use-conda --cluster-config cluster_slurm.json --cluster "sbatch -n {cluster.ntasks} -c {cluster.ncpus} --mem {cluster.mem} --qos {cluster.queue}" --jobs 30
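To make the {cluster.*} placeholders in the command above more concrete, here is a rough sketch of how they could resolve for a given rule. It assumes cluster_slurm.json follows the usual Snakemake cluster-config layout, where a "__default__" entry supplies values that rule-specific entries override. The values and the rule name "align_hic" are invented for illustration and are not taken from the repository, and the function only mimics the substitution rather than calling Snakemake.

```python
import json
from types import SimpleNamespace

# Template copied from the --cluster argument of the command above.
SBATCH_TEMPLATE = (
    "sbatch -n {cluster.ntasks} -c {cluster.ncpus} "
    "--mem {cluster.mem} --qos {cluster.queue}"
)

# Hypothetical file contents: the keys match the placeholders in the command,
# but the values and the rule name "align_hic" are invented for illustration.
EXAMPLE_CLUSTER_CONFIG = json.loads("""
{
    "__default__": {"ntasks": 1, "ncpus": 1, "mem": "4G", "queue": "normal"},
    "align_hic": {"ncpus": 8, "mem": "16G"}
}
""")


def sbatch_command(rule_name, cluster_config, template=SBATCH_TEMPLATE):
    """Merge defaults with rule-specific overrides and fill in the template."""
    settings = dict(cluster_config.get("__default__", {}))
    settings.update(cluster_config.get(rule_name, {}))
    # str.format resolves {cluster.ntasks} as attribute access on the namespace.
    return template.format(cluster=SimpleNamespace(**settings))


if __name__ == "__main__":
    print(sbatch_command("align_hic", EXAMPLE_CLUSTER_CONFIG))
    print(sbatch_command("some_other_rule", EXAMPLE_CLUSTER_CONFIG))
```

With these made-up values, a rule without its own entry falls back entirely to the defaults, while "align_hic" overrides only the CPU count and memory.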

Pipeline

The pipeline is subdivided into submodules for the processing and downstream analysis of Hi-C and RNA-seq data. Starting from fastq files, it generates Hi-C matrices and differential expression results. It also computes statistics and runs pattern detection on the Hi-C contact maps, producing figures and tables that are then used by tailored analyses in jupyter notebooks.

Here is a visual summary of pipeline steps and their dependencies:

For a more detailed visual summary showing input/output files, see the filegraph

Analyses

Analyses are described in jupyter notebooks located in the docs/notebooks folder. These notebooks are numbered to reflect the logical order in which analyses should be done. They should be executed in that order as some will generate files for the next notebook.

Notebook: Statistical exploration of chromatin loop changes
Notebook: Visual exploration of global contact changes during infection
Notebook: Analysis of interchromosomal contact changes
Notebook: Detection and overview of chromatin insulation domains
Notebook: Analysis of the relationship between expression and contact changes during infection
Notebook: Analysis of gene coexpression versus contact changes using lifted-over expression data from Li et al. 2020
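
Since the notebooks depend on each other's output, one way to run them in order is to sort them by their numeric prefix and execute them one at a time. The sketch below is an assumption-laden example rather than part of the repository: it supposes that each file name in docs/notebooks starts with a number (for instance a hypothetical "01_loops.ipynb"), and it relies on jupyter nbconvert being installed.

```python
import re
import subprocess
from pathlib import Path


def numeric_prefix(path):
    """Leading number of a notebook file name, e.g. 2 for '02_contacts.ipynb'."""
    match = re.match(r"(\d+)", Path(path).name)
    # Notebooks without a numeric prefix sort last.
    return int(match.group(1)) if match else float("inf")


def ordered_notebooks(paths):
    """Sort notebook paths by numeric prefix, breaking ties by file name."""
    return sorted(paths, key=lambda p: (numeric_prefix(p), Path(p).name))


def run_all(folder="docs/notebooks"):
    """Execute each notebook in place, stopping at the first failure."""
    for notebook in ordered_notebooks(Path(folder).glob("*.ipynb")):
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute",
             "--inplace", str(notebook)],
            check=True,  # raise if a notebook fails, so later ones do not run
        )
```

Calling run_all() from the repository root, once the pipeline has finished, would execute the notebooks sequentially. Sorting on the parsed number rather than the raw file name keeps "10_..." after "2_..." even when the prefixes are not zero-padded.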
