CRAVAT-P Galaxy Docker

A Docker image containing a fully-operational Galaxy instance with pre-installed demonstration material for CRAVAT-P.

Created as a demonstration for the following technical note for the Journal of Proteome Research:

Bridging the Chromosome-Centric and Biology and Disease Human Proteome Projects: Accessible and automated tools for interpreting biological and pathological impact of protein sequence variants detected via proteogenomics

Ray Sajulga, Subina Mehta, Praveen Kumar, James E. Johnson, Candace R. Guerrero, Michael C. Ryan, Rachel Karchin, Pratik D. Jagtap, and Timothy J. Griffin

Galaxy-P

Collaborators

Installation Guide ⤴

1.) Install Docker for Mac or PC. Open Docker.

2.) Open your terminal. Run the following command:

docker run -d -p 8080:80 galaxyp/cravatp

The image will now download from the public repository galaxyp/cravatp on Docker Hub. The download should take around 15 minutes.

In the meantime, feel free to look over the components of this Docker command, summarized below. You can also read the CRAVAT-P background information in the next section.

Component	Type	Description
docker	Base command	The base command for the Docker CLI (Command Line Interface)
run	Command	Run a command in a new container
-d, --detach	OPTION	Run container in background and print container ID
-p, --publish	OPTION	Publish a container's port(s) to the host; here, 8080:80 maps host port 8080 to container port 80
galaxyp/cravatp	IMAGE	galaxyp's cravatp image

More documentation can be found at Docker's documentation website.

3.) Once the command is finished, wait a few moments for the Docker image to initialize as a container. Open http://localhost:8080 and follow the CRAVAT-P tutorial to access the CRAVAT-P suite. If you do not see the Galaxy screen, wait a few seconds and then reload the page.
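Because Galaxy can take a little while to start inside the container, the "wait and reload" step can also be scripted. The following is a minimal Python sketch, not part of the CRAVAT-P image: the function name wait_for_galaxy and its timeout and interval defaults are illustrative assumptions, and it treats any HTTP 200 response from http://localhost:8080 as a sign that Galaxy is ready.

```python
import time
import urllib.error
import urllib.request


def wait_for_galaxy(url="http://localhost:8080", timeout=300.0, interval=5.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True if the page responded in time, False otherwise.
    The name and default values are hypothetical, chosen for illustration.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet, or the server is still starting up.
            pass
        time.sleep(interval)
    return False
```

For example, wait_for_galaxy(timeout=600) returns True as soon as the page responds and False if ten minutes pass first.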

Once you are finished using this container, you can clean up your workspace by simply exiting out of Docker.

Background ⤴

CRAVAT-P ⤴

(Cancer Related Analysis of VAriants Toolkit - Proteomics)

CRAVAT-P is a proteomic extension of CRAVAT (http://cravat.us) developed for the Galaxy-P (http://galaxyp.org) bioinformatics platform. CRAVAT-P exists as a downstream analysis suite for peptide variants. Current support is tailored towards workflows that generate peptide sequences mapped to genomic locations.

Galaxy Tool ⤴

The figure above shows the Galaxy tool developed for submitting jobs to the CRAVAT server. It extends an earlier version of In Silico Solutions' Galaxy tool (cravat_score_and_annotate). In our CRAVAT-P tool, we added support for additional parameters: CHASM classifiers (e.g., breast, brain-glioblastoma-multiforme) and the older GRCh37/hg19 human genome build. We also added proteomic support, highlighted by the red outlined box. Here, a proBED file can be provided for intersection with the genomic input file, a VCF (Variant Call Format) file. You can choose whether to output the intersected VCF file or to submit only the intersected variants.

Example input files

VCF (Variant Call Format)

ID	Chr.	Position	Strand	Ref. base	Alt. base
VAR527	chr12	6561055	+	T	C
VAR529	chr12	110339630	+	C	T
VAR532	chr14	102083954	+	C	T
VAR539	chr19	17205335	+	A	T
VAR541	chr19	17205973	+	T	C
VAR542	chr19	18856059	+	C	T

ProBED (Proteomic Browser Extensible Data)

Chr.	Start	End	Peptide	Strand
chr12	6561014	6561056	STGVILANDANAER	-
chr12	110339607	110339637	EWGSGSDILR	+
chr14	102083930	102083972	GVVDSENLPLNISR	-
chr19	17205327	17206022	GRMGEPGAEPGHFGVCVDSLTSDK	+
chr19	18856027	18856078	EAIDSPVSFLVLHNQIR	+
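
To illustrate the intersection described in the Galaxy Tool section, here is a minimal Python sketch that keeps the variants falling inside a peptide's genomic span. It is not the CRAVAT-P tool's implementation: the function name, the tuple layout, and the coordinate convention (1-based variant positions, 0-based half-open proBED intervals, as in standard BED) are assumptions, and strand and exon block structure are ignored.

```python
from collections import defaultdict

# Rows copied from the example tables above.
VARIANTS = [
    ("VAR527", "chr12", 6561055, "+", "T", "C"),
    ("VAR529", "chr12", 110339630, "+", "C", "T"),
    ("VAR532", "chr14", 102083954, "+", "C", "T"),
    ("VAR539", "chr19", 17205335, "+", "A", "T"),
    ("VAR541", "chr19", 17205973, "+", "T", "C"),
    ("VAR542", "chr19", 18856059, "+", "C", "T"),
]
PEPTIDES = [
    ("chr12", 6561014, 6561056, "STGVILANDANAER", "-"),
    ("chr12", 110339607, 110339637, "EWGSGSDILR", "+"),
    ("chr14", 102083930, 102083972, "GVVDSENLPLNISR", "-"),
    ("chr19", 17205327, 17206022, "GRMGEPGAEPGHFGVCVDSLTSDK", "+"),
    ("chr19", 18856027, 18856078, "EAIDSPVSFLVLHNQIR", "+"),
]


def intersect(variants, peptides):
    """Return (variant ID, peptide) pairs where the variant lies in the span.

    Hypothetical helper: assumes 1-based variant positions and 0-based,
    half-open peptide spans; ignores strand and exon blocks.
    """
    regions = defaultdict(list)
    for chrom, start, end, peptide, _strand in peptides:
        regions[chrom].append((start, end, peptide))

    hits = []
    for var_id, chrom, pos, *_rest in variants:
        for start, end, peptide in regions.get(chrom, ()):
            # A 1-based position p lies in a 0-based [start, end) span
            # exactly when start < p <= end.
            if start < pos <= end:
                hits.append((var_id, peptide))
    return hits
```

Under these assumptions, all six example variants fall inside one of the example peptide spans. Note that the chr19 peptide GRMGEPGAEPGHFGVCVDSLTSDK spans 695 bases for 24 residues, so it presumably crosses an intron; a span check like this one would also accept positions inside that gap.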

Galaxy Workflow ⤴

Galaxy workflows are tailored pipelines that promote reproducibility, ease of use, and preservation of complex analyses. Two workflows of differing complexity are shown above. The simple workflow (top left panel) was used for the paper and the Docker image to keep the focus on the downstream analysis, i.e., CRAVAT-P's outputs and viewer. A full-fledged workflow (bottom panel) is shown as an example of a highly complex workflow. The top right panel shows how workflows can automate parameter selection and offer additional options such as e-mail notification and output cleanup.

Galaxy Viewer Plugin ⤴

Galaxy uses JavaScript-based visualization plugins to let you explore your data interactively.

Panel A shows the actual viewer, with panels B - E as blown-up images for further detail.

(A-i) Sidebar for showing additional information, mainly column visibility toggles. CRAVAT's annotation produces many columns to sift through.

(A-ii) An embedded webpage from the CRAVAT server, which CRAVAT calls its "Single Variants Page" feature.

(B) Built on the DataTables library, this table can be sorted and filtered. By default, it is sorted by p-value (from the machine learning analysis, i.e., VEST or CHASM), from most impactful to least. The selected box shows a peptide column that highlights the variant amino acid within a peptide hit. Since some cells may hold large amounts of text, the full value is shown in the display box at the top.

(C) CRAVAT uses protein diagrams to show your protein variants as lollipop plots. You can also choose to display TCGA (The Cancer Genome Atlas) tissue mutations. Mouse over different parts of the diagram to see domains, binding sites, and other regions of interest.

(D) CRAVAT uses the Cytoscape.js library to display gene enrichment networks hosted on the NDEx (Network Data Exchange) infrastructure. You can move elements around and examine different pathways.

(E) CRAVAT uses MuPIT (Mutation Position Imaging Toolbox), another project from the same lab (Professor Rachel Karchin's lab at Johns Hopkins University), which shows the location of single nucleotide variants (SNVs) on interactive three-dimensional protein structures. You can click on individual residues and adjust the display options.
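
The peptide-column highlighting in panel B needs the position of the variant residue within the peptide. The sketch below shows one way this could be computed in Python for a peptide that maps to a single contiguous genomic block; it is not the viewer's actual code. The function names and the coordinate assumptions (1-based variant position, 0-based half-open peptide span, three bases per residue, reverse-strand peptides read from the end coordinate) are illustrative.

```python
def variant_residue_index(pos, start, end, strand, peptide):
    """Return the 0-based index of the residue whose codon covers `pos`.

    Assumes `pos` is 1-based and [start, end) is a 0-based, half-open span
    that encodes `peptide` contiguously (no introns), three bases per residue.
    Returns None when those assumptions do not hold for the inputs.
    """
    if end - start != 3 * len(peptide):
        return None  # spliced or otherwise non-contiguous mapping
    pos0 = pos - 1  # convert the variant position to 0-based
    if not start <= pos0 < end:
        return None  # variant lies outside the peptide span
    # Distance in bases from the peptide's first coding base.
    offset = pos0 - start if strand == "+" else (end - 1) - pos0
    return offset // 3


def highlight(peptide, index):
    """Wrap the residue at `index` in square brackets, e.g. 'AB[C]D'."""
    return f"{peptide[:index]}[{peptide[index]}]{peptide[index + 1:]}"
```

Applied to VAR529 and the chr12 peptide EWGSGSDILR from the example input files, these assumptions give index 7, which highlight renders as EWGSGSD[I]LR. For the chr19 peptide whose span is longer than three bases per residue, the function returns None instead of guessing.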

CRAVAT-P Tutorial ⤴

Overview

Import the input files → Run the workflow → Access the viewer

1.) Import the input files from the data library ⤴

click Shared Data > Data Libraries
open Training Data > Input files for CRAVAT-P Demo
check the checkbox in the header to select both input files
click to History
optional: name your new history (e.g., mcf7_cancer_proteogenomics)
click import
click on the green pop-up window to go back to the homepage to analyze these datasets.

2.) Log in and run the workflow ⤴

The CRAVAT-P workflow was placed into an administrative account through Docker. To access it, click Login or Register > Login and log in using the following credentials:
- Username: [email protected]
- Password: admin
click Workflow to show the list of workflows in this account. In this case, the only one is the CRAVAT Workflow.
click on the CRAVAT Workflow button and click Run from the resulting dropdown
click Run workflow. The analysis will start and will finish in a couple of minutes. This workflow was set to include proteogenomic input and automatically select the correct input file types (VCF and proBED) in the history.

3.) Access the viewer ⤴

Once the VCF output turns green (signifying completion), you can access the visualizer. Open the dataset collection, and click on any of the four datasets to expand it. The variant dataset is preferred, since it typically contains the most useful information, but the viewer gives you access to all of the datasets whichever one you open.
Click the "visualize" icon and select CRAVAT Viewer.
