A Docker image containing a fully operational Galaxy instance with pre-installed demonstration material for CRAVAT-P.
Created as a demonstration for the following technical note for the Journal of Proteome Research:
Ray Sajulga, Subina Mehta, Praveen Kumar, James E. Johnson, Candace R. Guerrero, Michael C. Ryan, Rachel Karchin, Pratik D. Jagtap, and Timothy J. Griffin
- Galaxy Instance (version 17.09)
- CRAVAT-P submit, intersect, annotate, and retrieve Galaxy tool
- CRAVAT-P Galaxy Viewer (Galaxy Visualization Plugin)
- Input files (i.e., VCF and proBED files)
- Basic CRAVAT-P Workflow
Installation Guide
1.) Install Docker for Mac or PC. Open Docker.
2.) Open your terminal. Run the following command:
docker run -d -p 8080:80 galaxyp/cravatp
Docker will now download the image from the public galaxyp/cravatp repository on Docker Hub. This should take around 15 minutes.
In the meantime, feel free to take some time to understand the different components of this Docker command. You can also read up on CRAVAT-P background information in the Background section.
Component | Type | Description |
---|---|---|
docker | Base command | The base command for the Docker CLI (Command Line Interface) |
run | Command | Run a command in a new container |
-d, --detach | Option | Run container in background and print container ID |
-p, --publish | Option | Publish a container's port(s) to the host; here, 8080:80 maps host port 8080 to container port 80 |
galaxyp/cravatp | Image | The cravatp image from the galaxyp repository on Docker Hub |
More documentation can be found at Docker's documentation website.
3.) Once the command is finished, wait a few moments for the Docker image to initialize as a container. Open http://localhost:8080 and follow the CRAVAT-P tutorial to access the CRAVAT-P suite. If you do not see the Galaxy screen, wait a few seconds and then reload the page.
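Instead of reloading the page by hand, the wait in step 3 can be automated. The following is a minimal sketch, not part of the CRAVAT-P image: the function names, the number of attempts, and the delay are illustrative assumptions, and it only checks that http://localhost:8080 answers with HTTP 200.

```python
import http.client
import time
import urllib.error
import urllib.request


def galaxy_is_up(url):
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeouts, and error statuses all mean "not ready yet".
        return False


def wait_for_galaxy(url="http://localhost:8080", attempts=30, delay=10.0,
                    probe=galaxy_is_up, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the number of attempts used, or None if Galaxy never answered.
    `attempts` and `delay` are arbitrary defaults chosen for illustration.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


# Example call (run it only after the `docker run` command has been issued):
# wait_for_galaxy()
```

If the function returns None, the container may still be initializing; waiting a little longer and reloading the page, as described above, remains the fallback.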
Once you are finished using this container, you can clean up your workspace by simply exiting out of Docker.
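If you would rather stop only this container and keep Docker running, one possible approach is sketched below. It is an assumption on our part, not a procedure from the technical note: the helper name is hypothetical, and it relies on the standard `docker ps --filter ancestor=...` and `docker stop` commands.

```python
import subprocess

IMAGE = "galaxyp/cravatp"


def stop_cravatp_containers(image=IMAGE, run=subprocess.run):
    """Stop every running container started from `image`; return their IDs."""
    # List the IDs of running containers whose image matches `image`.
    listing = run(
        ["docker", "ps", "--quiet", "--filter", f"ancestor={image}"],
        capture_output=True, text=True, check=True,
    )
    container_ids = listing.stdout.split()
    if container_ids:
        # `docker stop` accepts several container IDs at once.
        run(["docker", "stop", *container_ids], check=True)
    return container_ids


# Example call:
# stop_cravatp_containers()
```

A stopped container still exists on disk until it is removed with `docker rm`, so this sketch halts the Galaxy instance without deleting it.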
Background
CRAVAT-P
(Cancer Related Analysis of VAriants Toolkit - Proteomics)
CRAVAT-P is a proteomic extension of CRAVAT (http://cravat.us) developed for the Galaxy-P (http://galaxyp.org) bioinformatics platform. CRAVAT-P serves as a downstream analysis suite for peptide variants. Current support is tailored to workflows that generate peptide sequences mapped to genomic locations.
Galaxy Tool
The figure above shows the Galaxy tool developed for submitting jobs to the CRAVAT server. It extends an earlier version of In Silico Solutions' Galaxy tool (cravat_score_and_annotate). In our CRAVAT-P tool, we added support for additional parameters: CHASM classifiers (e.g., breast, brain-glioblastoma-multiforme) and the older GRCh37/hg19 human genome build. We also added proteomic support, as highlighted by the red outlined box. Here, a proBED file can be provided for intersection with the genomic input file, which is in VCF (Variant Call Format). You can specify whether you want to output the intersected VCF file or submit only the intersected variants.
VCF (Variant Call Format)
ID | Chr. | Position | Strand | Ref. base | Alt. base |
---|---|---|---|---|---|
VAR527 | chr12 | 6561055 | + | T | C |
VAR529 | chr12 | 110339630 | + | C | T |
VAR532 | chr14 | 102083954 | + | C | T |
VAR539 | chr19 | 17205335 | + | A | T |
VAR541 | chr19 | 17205973 | + | T | C |
VAR542 | chr19 | 18856059 | + | C | T |
ProBED (Proteomic Browser Extensible Data)
Chr. | Start | End | Peptide | Strand |
---|---|---|---|---|
chr12 | 6561014 | 6561056 | STGVILANDANAER | - |
chr12 | 110339607 | 110339637 | EWGSGSDILR | + |
chr14 | 102083930 | 102083972 | GVVDSENLPLNISR | - |
chr19 | 17205327 | 17206022 | GRMGEPGAEPGHFGVCVDSLTSDK | + |
chr19 | 18856027 | 18856078 | EAIDSPVSFLVLHNQIR | + |
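The intersection of the two example tables can be illustrated with a short sketch. This is not the CRAVAT-P tool's own code: the function and type names are hypothetical, and it assumes BED-style regions (0-based start, exclusive end) and 1-based variant positions. It also ignores BED block structure, since the table lists only a start and an end for each peptide.

```python
from collections import namedtuple

Variant = namedtuple("Variant", "id chrom pos strand ref alt")
Peptide = namedtuple("Peptide", "chrom start end sequence strand")

# Rows copied from the two example tables above.
VARIANTS = [
    Variant("VAR527", "chr12", 6561055, "+", "T", "C"),
    Variant("VAR529", "chr12", 110339630, "+", "C", "T"),
    Variant("VAR532", "chr14", 102083954, "+", "C", "T"),
    Variant("VAR539", "chr19", 17205335, "+", "A", "T"),
    Variant("VAR541", "chr19", 17205973, "+", "T", "C"),
    Variant("VAR542", "chr19", 18856059, "+", "C", "T"),
]
PEPTIDES = [
    Peptide("chr12", 6561014, 6561056, "STGVILANDANAER", "-"),
    Peptide("chr12", 110339607, 110339637, "EWGSGSDILR", "+"),
    Peptide("chr14", 102083930, 102083972, "GVVDSENLPLNISR", "-"),
    Peptide("chr19", 17205327, 17206022, "GRMGEPGAEPGHFGVCVDSLTSDK", "+"),
    Peptide("chr19", 18856027, 18856078, "EAIDSPVSFLVLHNQIR", "+"),
]


def intersect(variants, peptides):
    """Pair each variant with every peptide region covering its position.

    Assumption: regions are 0-based and half-open, positions are 1-based,
    so a position is covered when start < pos <= end.
    """
    by_chrom = {}
    for pep in peptides:
        by_chrom.setdefault(pep.chrom, []).append(pep)

    hits = []
    for var in variants:
        for pep in by_chrom.get(var.chrom, []):
            if pep.start < var.pos <= pep.end:
                hits.append((var, pep))
    return hits


for var, pep in intersect(VARIANTS, PEPTIDES):
    print(var.id, var.chrom, var.pos, pep.sequence)
```

Under these assumptions, each of the six example variants falls within one peptide region, with VAR539 and VAR541 both landing in the chr19 region for GRMGEPGAEPGHFGVCVDSLTSDK.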
Galaxy Workflow
Galaxy workflows are tailored pipelines that promote reproducibility, ease of use, and preservation of complex analyses. Two workflows of differing complexity are shown above. The simple workflow (top left panel) was used for the paper and Docker image to keep the focus on the downstream analysis, i.e., CRAVAT-P's outputs and viewer. A full-fledged workflow (bottom panel) is shown as an example of a highly complex workflow. The top right panel shows how workflows can automate parameter selection and offer additional options such as e-mail notification and output cleanup.
Galaxy Viewer Plugin
Galaxy uses JavaScript-based visualization plugins that let you explore your data interactively.
Panel A shows the actual viewer, with panels B-E as enlarged views for further detail.
(A-i) Sidebar for showing additional information, mainly column visibility toggling. There are many columns from CRAVAT's annotation to sift through.
(A-ii) An embedded webpage from the CRAVAT server termed their "Single Variants Page" feature.
(B) Built on the DataTable.js library, this table can be sorted and filtered. By default, it is sorted by p-value (from the machine learning analysis, i.e., VEST or CHASM) from most impactful to least. The selected box shows a peptide column that highlights the variant amino acid within a peptide hit. Since some cells may contain large amounts of text, the full content of a cell is shown in the display box at the top.
(C) CRAVAT uses Protein Diagrams to show lollipop mutations from your given protein variant. You can also choose TCGA (The Cancer Genome Atlas) tissue mutations. You can mouse over different parts to show domains, binding sites, and other regions of interest.
(D) CRAVAT uses the cytoscape.js library to display gene enrichment networks housed by the NDEx (Network Data Exchange) infrastructure. You can move elements around and examine different pathways.
(E) CRAVAT uses another project developed by the same lab (Professor Rachel Karchin's lab at Johns Hopkins University) called MuPIT (Mutation Position Imaging Toolbox), which is designed to show the location of single nucleotide variants (SNVs) on interactive three-dimensional protein structures. You can click on individual residues and adjust the display options.
CRAVAT-P Tutorial
Overview
Import the input files → Run the workflow → Access the viewer
1.) Import the input files from the data library
- click Shared Data > Data Libraries
- open Training Data > Input files for CRAVAT-P Demo
- check the checkbox in the header to select both input files
- click to History
- optional: name your new history (e.g., mcf7_cancer_proteogenomics)
- click import
- click on the green pop-up window to go back to the homepage to analyze these datasets.
2.) Log in and run the workflow
- The CRAVAT-P workflow was placed into an administrative account through Docker. To access it, click Login or Register > Login and log in using the following credentials:
- Username: [email protected]
- Password: admin
- click Workflow to show the list of workflows in this account. In this case, the only one listed is the CRAVAT Workflow.
- click on the CRAVAT Workflow button and click Run from the resulting dropdown
- click Run workflow. The analysis will start and will finish in a couple of minutes. This workflow was set to include proteogenomic input and automatically select the correct input file types (VCF and proBED) in the history.
3.) Access the viewer
- Once the VCF output turns green (signifying completion), you can access the viewer. Open the dataset collection, and click on any of the four datasets to expand it. The variant dataset is preferred, since it typically contains the most useful information, but all of the datasets are accessible from within the viewer whichever one you pick.
- Click the "visualize" icon and select CRAVAT Viewer.