Detects outliers and plots genomic clines from BGC output, and extends the plotting functionality of INTROGRESS to correlate genomic clines and hybrid indices with environmental variables.

License: GNU General Public License v3.0

ClineHelpR

Plot BGC and INTROGRESS genomic cline results and correlate INTROGRESS clines with environmental variables.

ClineHelpR allows you to plot BGC (Bayesian Genomic Cline) output. After we ran BGC, we realized it wasn't easy to plot the BGC results, so we put together this package in the process of figuring it out.

Our package provides tools for running BGC as well as pre- and post-processing its input and output files. The pipeline spans from input file conversion to plotting results, identifies outliers from the BGC output files, and allows you to make numerous highly customizable, publication-quality plots.

ClineHelpR also provides INTROGRESS tools that cover input file conversion, parsing, and plotting of the results. Finally, ClineHelpR provides tools for performing ecological niche modeling to correlate environmental variables with the INTROGRESS output.

The BGC and INTROGRESS software packages are described elsewhere (Gompert and Buerkle, 2010, 2011, 2012; Gompert et al., 2012a, 2012b).

Software Flow Diagram

Example Dataset

All example data are available from a Dryad Digital Repository (https://doi.org/10.5061/dryad.b2rbnzsc8), as the files are too large for GitHub. To run the example data, download the exampleData directory from DRYAD, then run the R scripts in the ClineHelpR/scripts directory.

NOTE: The tutorials use the example dataset. The current working directory in each tutorial is /home/user/app/notebooks, and the example data go in "../data/exampleData". The ../data and notebooks directories are volumes that are created automatically in the docker container; they let you exchange files between your host system and the container. Before you run docker, create a parent directory to hold the input and output, and create three subdirectories in it: data, notebooks, and results. Store the exampleData directory in data, and store your Jupyter Notebooks in notebooks.
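The project layout described in the note above can be created by hand, or with a few lines of Python. The following is only a minimal sketch: the function name make_project_dirs is hypothetical and not part of ClineHelpR, and the parent directory can be any path you like.

```python
from pathlib import Path


def make_project_dirs(project_root):
    """Create the data, notebooks, and results subdirectories under project_root.

    Existing directories are left untouched, so the function is safe to re-run.
    Returns the list of subdirectory paths.
    """
    root = Path(project_root)
    subdirs = []
    for name in ("data", "notebooks", "results"):
        subdir = root / name
        # parents=True also creates the parent directory if it is missing
        subdir.mkdir(parents=True, exist_ok=True)
        subdirs.append(subdir)
    return subdirs
```

Calling make_project_dirs("analysis") would produce a directory that can then be passed to run_docker.sh with -p ./analysis; the exampleData directory still has to be copied into analysis/data separately.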

Installation

There are two options for installing ClineHelpR and its dependencies:

  1. With docker
  2. Manual installation.

Docker

ClineHelpR can be run in a container from our pre-built image. The image has all the dependencies installed and is compatible with Python 3.6, BGC, and R. Additionally, the docker container can be run in a Jupyter notebook directly from your browser! However, if you would rather run it in a terminal, we still provide that as an option.

Docker Step 1: Pull the Docker Image

First, pull the docker image.

sudo docker pull btmartin721/clinehelpr:latest

Make sure the tag, :latest, is included.

Docker Step 2: Run the Image

Once you have the image, you can run it in a container. If you aren't familiar with docker, it runs a pre-built image in a "container", which behaves much like a virtual machine that is isolated from your operating system. In this case, the container runs Ubuntu 18.04 and has all the necessary dependencies and software pre-installed and in your path.

If you are using Windows, you can run the container in Windows Subsystem for Linux (fully tested with WSL version 2).

Once you generate the container, copy and paste the link that contains the IP address into your browser, and you can then run our scripts, BGC, INTROGRESS, ENMeval, and ClineHelpR from a Jupyter notebook!

We have provided a BASH script to run the docker image in a container. The script has command-line arguments that allow you to choose between running a shell environment or a Jupyter Notebook.

Let's run it to start the container. The script can be found in ClineHelpR/scripts/.

To run it, you can type run_docker.sh -h to pull up the help menu. It looks like this:

# run_docker.sh is in the ClineHelpR/scripts directory.
run_docker.sh -h

Run ClineHelpR Docker image

Usage: run_docker.sh <0 or 1> <PATH_TO_PROJECT_DIRECTORY>
Options:
-h     Print this Help.
-s     Shell to run docker in; 0 -> BASH shell, 1 -> Jupyter Notebook
-p     Path to project directory (optional); If not specified, uses current working directory

If you want to run the docker container in a terminal, you can specify:

run_docker.sh -s 0 -p ./analysis

If you want to run it in a Jupyter Notebook, run the command like this:

run_docker.sh -s 1 -p ./analysis

-s 0 tells the docker container to start in a terminal. -s 1 tells it to run the docker container in a Jupyter Notebook.

Either way, all the necessary dependencies and scripts will be in your path.

If docker asks whether the daemon is running, start it with sudo service docker start and then run the run_docker.sh script again.

If you choose to run the Jupyter Notebook, a link that you can copy and paste into your browser will print to the terminal.

Note: Make sure the link is the one with the IP address at the beginning (after the OR), or you'll get an error. Also make sure to include the characters after token=. See the below image for reference.

Running the Jupyter Notebook

If you are using Jupyter, you can freely switch between Python3 and R kernels. We will start with Python3 because it allows you to run bash with the magic command, %%bash. To switch between Python3 and R, click just to the left of the circle in the top right corner of the Jupyter Notebook and choose the kernel you want. See the following images.

Manual Installation

If you don't want to use a jupyter notebook with docker, e.g. if you are on a high-performance computing cluster, you can manually install the dependencies. Most of them can be installed directly with Anaconda. The only one that has to be installed manually is the "INTROGRESS" R package. Additionally, we include a conda environment file, environment.yml, that contains a blueprint for installing all the necessary conda dependencies.

Conda Environment File

You can use it by typing conda env create --file environment.yml into a terminal that has anaconda3 or miniconda3 installed. The environment.yml file is located in the root ClineHelpR GitHub directory.

Full Manual Installation with Conda

If you would rather install the dependencies manually, they are listed below. To install them, you can run the following commands in a terminal:

conda create -n clinehelpr python=3.6
conda activate clinehelpr
conda install -c conda-forge r-base r-dplyr r-bayestestr r-scales r-reshape2 r-ggplot2 r-forcats r-gtools r-rideogram r-gdata r-adegenet r-enmeval r-rjava r-raster r-sp r-dismo r-ggforce r-concaveman r-readr r-xml r-stringi r-devtools jupyterlab

To install the additional pyVCF dependency for the vcf2bgc.py script:

conda install -c bioconda pyvcf

In our experience, installing conda packages from conda-forge and bioconda works better with R packages than the default anaconda or r channels. Importantly, we have experienced compatibility issues when trying to install some packages from the r or default conda channels and others from bioconda or conda-forge. We highly recommend using conda-forge and bioconda, which play nicely together.

Install R Packages

If you are not using docker, then you also need to install the INTROGRESS R package. Type the following command into your R session:

install.packages("introgress", dependencies=TRUE, repos="https://cran.r-project.org/")

Installing ClineHelpR

Finally, if you are not using docker, you need to install ClineHelpR. ClineHelpR can be installed directly from GitHub using the devtools R package. Run the following command(s) from an R session:

# NOTE: devtools can be installed with conda (recommended)
# However, if you don't already have devtools installed, uncomment the next line
# install.packages("devtools")

# Install ClineHelpR
devtools::install_github("btmartin721/ClineHelpR")

Dependencies

ClineHelpR has multiple dependencies, most of which can be installed using Anaconda3. They are all listed below.

The bgcPlotter functions require:

  • data.table
  • dplyr
  • bayestestR
  • scales
  • reshape2
  • ggplot2
  • forcats
  • gtools
  • RIdeogram
  • gdata
  • adegenet
  • ggforce
  • concaveman
  • readr

The environmental functions require:

  • ENMeval
  • rJava
  • raster
  • sp
  • dismo

The INTROGRESS functions require:

  • introgress (not available from conda)
  • ggplot2
  • dplyr
  • scales

Other required dependencies

  • XML (R package)
  • stringi

The vcf2bgc.py script requires:

  • Python >= 3.4 and Python <= 3.6
  • pyVCF

Pipeline

There are R and Python scripts in the ClineHelpR/scripts directory that allow you to run our whole pipeline. All the pipeline steps can be run by modifying and using those scripts. We also demonstrate each step in our Jupyter Notebook tutorial in the ClineHelpR/tutorials directory.

Change Log

  • 06/08/2022 - Updated Docker image with:
    • Added tutorial jupyter notebooks to container
    • Fixed bug where run_bgc.sh would name the lnl file incorrectly
    • Added support for stacks VCF files with vcf2bgc.py
    • Changed docker tag to btmartin721/clinehelpr:latest

References

Gauthier J., de Silva D.L., Gompert Z., Whibley A., Houssin C., Le Poul Y., McClure, M., Lemaitre, C., Legeai, F., Mallet, J., Elias, M. 2020. Contrasting genomic and phenotypic outcomes of hybridization between pairs of mimetic butterfly taxa across a suture zone. Molecular Ecology, 29: 1328–1343.

Gompert Z., Buerkle C.A. 2009. A powerful regression-based method for admixture mapping of isolation across the genome of hybrids. Molecular Ecology, 18: 1207–1224.

Gompert, Z., Buerkle, C.A. 2010. INTROGRESS: A software package for mapping components of isolation in hybrids. Molecular Ecology Resources, 10(2): 378-384.

Gompert Z., Buerkle C.A. 2012. BGC: Software for Bayesian estimation of genomic clines. Molecular Ecology Resources, 12: 1168–1176.

Gompert Z., Buerkle C.A. 2011. Bayesian estimation of genomic clines. Molecular Ecology, 20: 2111–2127.

Gompert Z., Lucas L.K., Nice C.C., Fordyce J.A., Forister M.L., Buerkle C.A. 2012. Genomic regions with a history of divergent selection affect fitness of hybrids between two butterfly species. Evolution, 66: 2167–2181.

Gompert Z., Parchman T.L., Buerkle C.A. 2012. Genomics of isolation in hybrids. Philosophical Transactions of the Royal Society B Biological Sciences. 367: 439–450.

Gompert Z., Mandeville E.G., Buerkle C.A. 2017. Analysis of population genomic data from hybrid zones. Annual Review of Ecology, Evolution, and Systematics, 48: 207–229.

Hao, Z., Lv, D., Ge, Y., Shi, J., Weijers, D., Yu, G., Chen, J. 2020. RIdeogram: Drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Computer Science, 6: e251.

Kass, J. M., Muscarella, R., Galante, P. J., Bohl, C. L., Pinilla‐Buitrago, G. E., Boria, R. A., Soley-Guardia, M., Anderson, R. P. 2021. ENMeval 2.0: redesigned for customizable and reproducible modeling of species’ niches and distributions. Methods in Ecology and Evolution, 12: 1602-1608.

Li, H. 2018. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics, 34(18): 3094-3100.

Martin, B.T., Douglas, M.R., Chafin, T.K., Placyk, J.S. Jr., Birkhead, R.D., Phillips, C.A., Douglas, M.E. 2020. Contrasting signatures of introgression in North American box turtle (Terrapene spp.) contact zones. Molecular Ecology, 29(21): 4186-4202.

Phillips, S.J., Anderson, R.P., Schapire, R.E. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190: 231-259.

Phillips, S.J., Dudík, M., Schapire, R.E. 2004. A maximum entropy approach to species distribution modeling. In Proceedings of the Twenty-First International Conference on Machine Learning, pp. 655-662.

clinehelpr's People

Contributors

btmartin721, tkchafin

clinehelpr's Issues

run bgc error in docker

Dear Martin:
I used the docker image to run bgc:
./bgc -a ~/test_file/YJS_newhybrids_chr10_filter_p0in.txt -b ~/test_file/YJS_newhybrids_chr10_filter_p1in.txt -h ~/test_file/YJS_newhybrids_chr10_filter_admixedin.txt -M ~/test_file/YJS_newhybrids_chr10_filter_map.txt -O 0 -x 50000 -n 25000 -p 1 -q 1 -N 1 -m 1 -D 0.5 -t 5 -E 0.0001 -d 1 -s 1 -I 0 -u 0.04

But it fails with the following output:

Reading input files
Number of loci: 1090786
Number of admixed populations: 1
Number of individuals: 6
Using the linkage model for locus effects
Allowing for uncertainty in allele counts
Allocating memory
gsl: init_source.c:40: ERROR: failed to allocate space for block data
Default GSL error handler invoked.
Aborted (core dumped)

I don't know how to deal with it, any guidance would be fantastic. Thank you!

phiPlot visualization issue

Hello.
Thanks for this great package to visualize and explore bgc output.
I'm having a visualization issue with the phi plot that I haven't been able to figure out.
The image seems to be missing the portions of the curves extending from hybrid index = 0.95.
Can you help?
Regards
[Screenshot of the phi plot]

GSL error.

Hi Bradley,

thank you for your fantastic tool.

I am getting errors when running run_bgc.sh, and I have no idea where to look for the problem:

After:
Initializing MCMC chain
gsl: ../gsl/gsl_vector_int.h:182: ERROR: index out of range

I attached my input files and I'd be most grateful for any pointers.

Kind regards

Ludo

bgc_p1in.txt
bgc_p0in.txt
bgc_loci.txt
bgc_admixedin.txt
bgc_settings.txt

Error

I got it running with genind2bgc, but once I try to run bcg.genes I get an error, although I have the files in the indicated directory.

[Screenshot of the error message]

Any ideas on what is wrong?
Thank you very much!

R function: combine_bgc_output

This function assumes that the lnl output suffix includes "LnL", but with the docker image the output suffix has "lnl" (lowercase).
Workaround: rename the output files, changing lnl to LnL.

example:
bgc produces: test_stat_lnl_1
change it to: test_stat_LnL_1
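The renaming workaround can be scripted when there are many output files. The sketch below is an illustration only: rename_lnl_files is a hypothetical helper, not part of ClineHelpR, and it assumes the lowercase suffix always appears as "_lnl_" in the file name, as in the example above.

```python
from pathlib import Path


def rename_lnl_files(directory):
    """Rename files containing '_lnl_' so that they contain '_LnL_' instead.

    Returns the new file names, in sorted order of the original names.
    """
    renamed = []
    # sorted() materializes the listing before any file is renamed
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or "_lnl_" not in path.name:
            continue
        target = path.with_name(path.name.replace("_lnl_", "_LnL_"))
        # Never overwrite a different, already existing file. On
        # case-insensitive filesystems the target can be the same file.
        if target.exists() and not target.samefile(path):
            continue
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Run it on a copy of the BGC output directory first, since the function renames files in place.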

jinja2

Not sure if it is only me, but I had to change the jinja2 version in environment.yml to 2.11.3; otherwise conda could not resolve the environment.

pafscaff

Hello,

This R package helps me a lot!
I am trying to plot an ideogram with the output of bgc. The species I am working on has a draft genome, and I aligned it to the chromosome level before running bgc. Is it possible to plot the output using the FASTA file that we mapped the reads to, instead of the pafscaff output file? Alternatively, perhaps I can create a dummy .scaffold.tdt file like the one pafscaff produces. Can you give me details of the contents of the .scaffold.tdt file?

Thank you very much in advance!

Error with parsing alleles depth

Hi! I am trying to run vcf2bgc and I found this error:
$ vcf2bgc.py -v chr22_ldna.recode.vcf -m population_map.txt --p1 P1 --p2 P2 --admixed ADMIXED --outprefix clines_chr22

P1 population has 6 individuals...

P2 population has 8 individuals...

Admixed populalation has 67 individuals...

Processing 1563 records in VCF file...

Traceback (most recent call last):
  File "/home/user/app/src/scripts/vcf2bgc.py", line 240, in get_allele_depth
    alleles = call.data[2].split(",")
AttributeError: 'int' object has no attribute 'split'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/app/src/scripts/vcf2bgc.py", line 422, in <module>
    main()
  File "/home/user/app/src/scripts/vcf2bgc.py", line 172, in main
    write_output(record, popsamples, ref, alt, locus, args.outprefix, admix, p1, p2)
  File "/home/user/app/src/scripts/vcf2bgc.py", line 285, in write_output
    admix_output = get_allele_depth(record, "Admixed", ref, alt, sampledict)
  File "/home/user/app/src/scripts/vcf2bgc.py", line 254, in get_allele_depth
    raise AttributeError("Error with parsing allele depths!")
AttributeError: Error with parsing allele depths!

My vcf was generated with GATK 4. Any idea on what is going on?
Thank you so much!
Best wishes
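The traceback above shows that call.data[2] held an int, so calling .split(",") on it failed. The cause has not been verified here; one possibility is that the third FORMAT field in this VCF is not a comma-separated allele depth string. The sketch below shows one tolerant way to read an allele depth value whatever type it arrives as. The helper parse_allele_depths is hypothetical; it is not part of vcf2bgc.py and does not call PyVCF.

```python
def parse_allele_depths(value):
    """Return per-allele depths as a list of ints, or None if unavailable.

    Accepts a comma-separated string such as "3,5" or a list/tuple such
    as [3, 5]. A lone number or a missing value yields None, because it
    carries no per-allele information.
    """
    if value is None:
        return None
    if isinstance(value, str):
        try:
            return [int(part) for part in value.split(",")]
        except ValueError:
            # e.g. "." or ".,." used for missing data
            return None
    if isinstance(value, (list, tuple)):
        if any(item is None for item in value):
            return None
        return [int(item) for item in value]
    # A single int or float is not a per-allele depth list
    return None
```

Returning None instead of raising lets the caller decide whether to skip the sample, treat it as missing data, or stop with a message that names the offending record.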

vcf2bgc.py file

Hello,

I'm trying to convert my VCF file (from ipyrad) to BGC format using the vcf2bgc.py script. To test it, I'm using your example data files downloaded from DRYAD, but I get an error:

./vcf2bgc.py -v eatt.trans.finalfilt.recode.vcf -m eatt.bgc.popmap_final.txt --p1 PureEA --p2 PureTT --admixed EATT -o example_tutorial

P1 population has 8 individuals...

P2 population has 8 individuals...

Admixed populalation has 85 individuals...

Processing 233 records in VCF file...

Traceback (most recent call last):
  File "./vcf2bgc.py", line 378, in <module>
    main()
  File "./vcf2bgc.py", line 140, in main
    normalize_linkagemap(pos_list, pos_min, pos_max, chrom_number, linkage_fh, locus_list)
  File "./vcf2bgc.py", line 192, in normalize_linkagemap
    mylist[i] = (val - nmin) / (nmax - nmin)
ZeroDivisionError: integer division or modulo by zero

Does anybody have an idea what is wrong? Any help would be appreciated.
The same error occurs with my own data.

The package is really helpful, congrats!
Thank you!
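The failing line in the traceback above divides by (nmax - nmin), which is zero whenever the minimum and maximum positions are equal, for example when only one distinct position is being normalized. The wording of the message is also what Python 2 prints for / on integers, so it may be worth checking which interpreter ran the script; that is an inference from the message, not something the report confirms. Below is a minimal sketch of a guarded min-max normalization. The function normalize_positions is hypothetical, not the code in vcf2bgc.py, and returning 0.0 for the degenerate case is an assumption about what a sensible map position would be.

```python
def normalize_positions(positions):
    """Min-max scale positions to the range [0, 1].

    When all positions are identical the span is zero, so every value
    is mapped to 0.0 instead of dividing by zero.
    """
    if not positions:
        return []
    nmin = min(positions)
    nmax = max(positions)
    span = nmax - nmin
    if span == 0:
        # Degenerate case: a single locus or identical positions
        return [0.0 for _ in positions]
    # float() forces true division even under Python 2
    return [(val - nmin) / float(span) for val in positions]
```

For example, normalize_positions([10, 20, 30]) gives [0.0, 0.5, 1.0], while normalize_positions([5, 5]) gives [0.0, 0.0] rather than raising ZeroDivisionError.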

Hybrid Index

Hi! I would like to know whether there is a way to provide ClineHelpR with a hybrid index (admixture coefficient) calculated in a previous analysis, as in HIest: https://cran.r-project.org/web/packages/HIest/HIest.pdf

The idea is to avoid classifying individuals in discrete populations, but rather use a continuous measure of admixture coefficient.

Thank you so much!

ClineHelpR R package conda install?

Hi!

I'm in the process of creating my bgc input files. The genind2bgc function is taking a long time to run in RStudio, so I thought I would run the R script on our remote servers to free up my laptop.

I'm running the Rscript through the ClineHelpR conda environment, and I'm getting an error that the "genind2bgc" command is not recognized: Error in genind2bgc(gen = hybrid_0.8gmiss_0.3imiss_minDP5_srich_sp_remove, : could not find function "genind2bgc"

It looks like the genind2bgc is not included in any of the dependencies that were required for the ClineHelpR conda environment. Is there an R package conda install of ClineHelpR that I can install within the ClineHelpR environment?

Thanks!

get_bgc_outliers error

Hi there! Hope you are doing well. I'm fairly new to this and attempting to use the ClineHelpR functions on my bgc outputs, but I keep running into the same error when I get to the get_bgc_outliers step (see output below).

[Screenshot of the error output]

It fails my run each time at this step and never creates the gene.outliers object. I did make sure to include "loci.file=NULL" in my code and I'm only really interested in outputting the Phi Plots. So I'm not quite sure how to fix this issue. Any guidance would be fantastic. Thank you!
