This pipeline takes as input a directory of genomes in complete or draft format (one file = one genome) and outputs a phylogeny of those genomes based on single-copy ribosomal proteins. Ribosomal proteins are identified using hmmsearch and an alignment of ribosomal proteins from Yutin et al., 2012, then aligned using mafft and used to build a phylogeny with raxml. The pipeline is based on the method detailed in Hehemann et al., 2016. It uses snakemake for ease of use and reproducibility.
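To make the "one file = one genome" input convention concrete, here is a minimal Python sketch that lists the genome files in a directory and derives one genome name per file. The function name `list_genomes` and the default extension list are assumptions for illustration only; they are not part of the pipeline, so check `RiboTree_config.yml` for what the pipeline actually expects.

```python
from pathlib import Path


def list_genomes(genome_dir, extensions=(".fasta", ".fa", ".fna")):
    """Map genome name -> file path, treating each matching file as one genome.

    The extension list is a hypothetical default chosen for this sketch.
    """
    genomes = {}
    for path in sorted(Path(genome_dir).iterdir()):
        # Skip subdirectories and files that do not look like genome files.
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        name = path.stem  # file name without its extension
        if name in genomes:
            # Two files such as a.fa and a.fasta would collide on the same name.
            raise ValueError(f"two files map to genome name {name!r}")
        genomes[name] = path
    return genomes
```

A check like this can catch stray files or duplicate genome names before a long run starts.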
## How to run
Dependencies of this pipeline are handled by conda. You can download miniconda (for Python 3.5 or higher, please!) here.
After installing miniconda:
1a. Modify the `RiboTree_config.yml` file to reflect the appropriate filepaths, parameters, and filenames. All parameters are described in comments within the `RiboTree_config.yml` file.
1b. If running on a cluster that uses slurm as a job scheduler, modify the `RiboTree_cluster_params.yml` file to match your cluster setup.
2a. If running on a single machine, simply run the `RiboTree_run_on_single_machine.sh` shell script.
2b. If running on a cluster that uses slurm as a job scheduler, modify the `RiboTree_run_on_slurm.sbatch` jobscript as appropriate and submit it to the job scheduler.
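The choice between steps 2a and 2b can be sketched in Python as below. This is only an illustration: the helper `launch_command` is hypothetical, and treating the presence of `sbatch` on the `PATH` as a sign that slurm is in use is an assumption. The two script names come from the steps above.

```python
import shutil


def launch_command(use_slurm=None):
    """Return the command (as an argument list) that starts the pipeline."""
    if use_slurm is None:
        # Assumption: finding `sbatch` on PATH means slurm is the scheduler here.
        use_slurm = shutil.which("sbatch") is not None
    if use_slurm:
        # Step 2b: submit the jobscript to slurm.
        return ["sbatch", "RiboTree_run_on_slurm.sbatch"]
    # Step 2a: run the shell script directly on this machine.
    return ["bash", "RiboTree_run_on_single_machine.sh"]
```

The returned list can be passed to `subprocess.run` from the directory that contains the scripts, after the configuration files have been edited.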
## Notes
Not tested on Windows or OSX machines. I suspect that at present the code will break on Windows, because Windows filepaths use `\` as opposed to `/`. May work on OSX.
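The separator difference is easy to see with Python's `pathlib`, which can build the same path in both styles on any machine. The path components below are hypothetical and are not taken from the pipeline; this only illustrates the kind of mismatch suspected above, not where the code actually breaks.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical path components, for illustration only.
parts = ("results", "alignments", "ribosomal.aln")

posix = PurePosixPath(*parts)      # joined with "/"
windows = PureWindowsPath(*parts)  # joined with "\"

print(posix)               # results/alignments/ribosomal.aln
print(windows)             # results\alignments\ribosomal.aln
print(windows.as_posix())  # results/alignments/ribosomal.aln
```

Code that builds paths with `pathlib` or `os.path.join`, rather than joining strings with a hard-coded separator, is generally less sensitive to this difference.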