eQTL-Detect is a Nextflow-based bioinformatics workflow that detects cis, trans and splicing eQTLs (expression quantitative trait loci) by testing associations between genotype and expression data.
This repository provides the Nextflow scripts and demo data to test the eQTL analyses and to run them on large datasets. The workflow is available both as a set of sub-workflows (five independent Nextflow scripts, numbered 00 to 04), so that it can be distributed across different project partners, and as a standalone script that runs the complete analysis with a single command on HPC machines. It was primarily developed to detect eQTLs in cattle (Bos taurus), but it can be adapted to any other species by providing the reference genome assembly and the transcriptome annotation (GTF) file of the species of interest.
To run these scripts, the following must be installed:
- Nextflow version >= 21.04
- Docker version >= 20.10.8
The analysis can be run either with a single script or with the modular scripts, depending on user preference.
For single-script analysis, use the following command and specify the read type and strandedness of the RNA-seq data:
- Parameters
- Read type: --pairedEnd_reads or --singleEnd_reads
- Strandedness: --firstStranded, --secondStranded or --unStranded
Script main: nextflow run [main.nf](https://github.com/BovReg/BovReg_eQTL/blob/main/main.nf) -params-file [main.json](https://github.com/BovReg/BovReg_eQTL/blob/main/main.json) --pairedEnd_reads --firstStranded
If aligned BAM files are already available, the alignment step can be skipped with the following command, in which only the strandedness needs to be specified:
- Parameters
- Strandedness: --firstStranded, --secondStranded or --unStranded
Script main with BAM files as input: nextflow run [main.nf](https://github.com/BovReg/BovReg_eQTL/blob/main/main.nf) -params-file [main.json](https://github.com/BovReg/BovReg_eQTL/blob/main/main.json) --bamFiles_input --firstStranded
The script can also be run directly on expression count matrices with the following command:
- Parameters
- --countMatrices_input
Script main with count matrices as input: nextflow run [main.nf](https://github.com/BovReg/BovReg_eQTL/blob/main/main.nf) -params-file [main.json](https://github.com/BovReg/BovReg_eQTL/blob/main/main.json) --countMatrices_input
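Each input mode of main.nf needs a different combination of flags, so it can help to assemble the command programmatically. The Python sketch below is a hypothetical helper, not part of eQTL-Detect: the function name build_main_command and the mode labels "fastq", "bam" and "counts" are illustrative. It only reproduces the flag combinations shown in the commands above and assumes main.json is in the working directory.

```python
"""Hypothetical helper (not part of eQTL-Detect) that assembles the main.nf
command line for the three input modes: FASTQ, BAM and count matrices."""
import shlex

# Flags taken from the parameter lists above; the dict keys are illustrative.
READ_TYPES = {"paired": "--pairedEnd_reads", "single": "--singleEnd_reads"}
STRANDEDNESS = {
    "first": "--firstStranded",
    "second": "--secondStranded",
    "un": "--unStranded",
}


def build_main_command(input_mode, read_type=None, strandedness=None,
                       params_file="main.json"):
    """Return the argument list for `nextflow run main.nf`.

    FASTQ input needs a read type and a strandedness flag, BAM input needs
    only a strandedness flag, and count matrices need neither.
    """
    cmd = ["nextflow", "run", "main.nf", "-params-file", params_file]
    if input_mode == "fastq":
        if read_type not in READ_TYPES or strandedness not in STRANDEDNESS:
            raise ValueError("FASTQ input needs a read type and a strandedness")
        cmd += [READ_TYPES[read_type], STRANDEDNESS[strandedness]]
    elif input_mode == "bam":
        if strandedness not in STRANDEDNESS:
            raise ValueError("BAM input needs a strandedness")
        cmd += ["--bamFiles_input", STRANDEDNESS[strandedness]]
    elif input_mode == "counts":
        cmd += ["--countMatrices_input"]
    else:
        raise ValueError(f"unknown input mode: {input_mode!r}")
    return cmd


if __name__ == "__main__":
    # Print the command for paired-end, first-stranded FASTQ input.
    print(shlex.join(build_main_command("fastq", "paired", "first")))
```

Passing the returned list to subprocess.run(cmd, check=True) would launch the run; printing it with shlex.join first is a cheap way to double-check the flags before starting a long job.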
For modular analysis, users can run the following scripts, numbered 00 to 04:
Script 00: Indexing the reference genome.
nextflow run 00_eQTLDetect.nf
Script 01: Alignment and quantification of expression data.
nextflow run 01_eQTLDetect.nf -params-file 01_eQTLDetect.json
Script 02: Extracting the genotype data of samples that have corresponding RNA-seq samples.
nextflow run 02_eQTLDetect.nf
Script 03: Merging the read and transcript counts generated by StringTie, clustering the introns found in junction files, and estimating covariates for splicing sites based on principal components (PCs).
nextflow run 03_eQTLDetect.nf -params-file 03_eQTLDetect.json
Script 04: Performing cis and trans QTL mapping using the RNA-seq data and the corresponding genotype data.
nextflow run 04_eQTLDetect.nf -params-file 04_eQTLDetect.json
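When all modular scripts are run on one machine, a small driver can chain them in numerical order and stop at the first failing step. The Python sketch below is hypothetical (the names modular_commands and run_all and the dry-run switch are illustrative). It assumes the scripts and their JSON files sit in the current directory and does not verify that each step's outputs are in place for the next one.

```python
"""Hypothetical driver that runs the modular eQTL-Detect scripts 00 to 04
one after another, stopping at the first failure."""
import subprocess

# Scripts 01, 03 and 04 take a params file; 00 and 02 are run without one,
# matching the commands listed above.
STEPS = [
    ("00_eQTLDetect.nf", None),
    ("01_eQTLDetect.nf", "01_eQTLDetect.json"),
    ("02_eQTLDetect.nf", None),
    ("03_eQTLDetect.nf", "03_eQTLDetect.json"),
    ("04_eQTLDetect.nf", "04_eQTLDetect.json"),
]


def modular_commands(steps=STEPS):
    """Build one `nextflow run` argument list per step, in the order given."""
    commands = []
    for script, params_file in steps:
        cmd = ["nextflow", "run", script]
        if params_file is not None:
            cmd += ["-params-file", params_file]
        commands.append(cmd)
    return commands


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    check=True makes subprocess.run raise CalledProcessError as soon as a
    step exits with a non-zero status, so later steps are not started.
    """
    for cmd in modular_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Keeping the dry run as the default means the sequence can be reviewed before anything is executed; when partners run individual sub-workflows on separate machines, each command is simply run on its own.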
Demo genotype data are available in the folder Demo_genotype_BovReg.
The phenotype data can be provided in the following formats:
1. Raw data (RNA-seq expression data in FASTQ format)
2. Aligned reads (RNA-seq expression data in BAM format)
3. Expression counts across samples (expression count matrices in text files)
These files can be downloaded from the open research data repository Zenodo: Zenodo (FASTQ) and Zenodo (expression counts and aligned BAM files).
Information on which genotype and phenotype samples correspond to each other is provided in the text file RNA_WGS_CorresID_BovReg.txt.
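Script 02 keeps genotype data only for samples that also have RNA-seq data. The Python sketch below illustrates that pairing idea; it is not how 02_eQTLDetect.nf implements it. It ASSUMES the correspondence file has two whitespace-separated columns (RNA-seq sample ID, then WGS sample ID), which is a guess from the file name that should be checked against the real RNA_WGS_CorresID_BovReg.txt. The function names and sample IDs are hypothetical.

```python
"""Hypothetical sketch of pairing RNA-seq and WGS sample IDs.

ASSUMPTION: the correspondence file has two whitespace-separated columns,
RNA-seq sample ID then WGS sample ID, one pair per line. Check the real
layout of RNA_WGS_CorresID_BovReg.txt before relying on this.
"""
import io


def read_correspondence(handle):
    """Return a dict mapping RNA-seq sample ID -> WGS sample ID."""
    pairs = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank, short or comment lines
        pairs[fields[0]] = fields[1]
    return pairs


def samples_with_both(pairs, genotyped_ids):
    """Keep only the pairs whose WGS sample is present among the genotypes."""
    genotyped = set(genotyped_ids)
    return {rna: wgs for rna, wgs in pairs.items() if wgs in genotyped}


if __name__ == "__main__":
    # Made-up IDs, only to show the shape of the data.
    demo = io.StringIO("RNA_1\tWGS_1\nRNA_2\tWGS_2\nRNA_3\tWGS_9\n")
    pairs = read_correspondence(demo)
    print(samples_with_both(pairs, ["WGS_1", "WGS_2", "WGS_3"]))
```

With a real file, open("RNA_WGS_CorresID_BovReg.txt") can be passed in place of the StringIO object; a header row, if present, would be read as one extra pair and then dropped by samples_with_both unless a genotype ID happens to match it.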
The reference genome and annotation file for the demo analysis can be downloaded here: reference genome (FASTA format) and reference annotation (GTF format).