
antigen.garnish 2

Ensemble tumor neoantigen prediction from SNVs and complex variants for human and mouse. Immunogenicity filtering based on the Tumor Neoantigen Selection Alliance (TESLA).

Citation

Richman LP, Vonderheide RH, and Rech AJ. Neoantigen dissimilarity to the self-proteome predicts immunogenicity and response to immune checkpoint blockade. Cell Systems. 2019.

Selected references

Duan, F., Duitama, J., Seesi, S.A., Ayres, C.M., Corcelli, S.A., Pawashe, A.P., Blanchard, T., McMahon, D., Sidney, J., Sette, A., et al. Genomic and bioinformatic profiling of mutational neoepitopes reveals new rules to predict anticancer immunogenicity. J Exp Med. 2014.

Luksza, M, Riaz, N, Makarov, V, Balachandran VP, et al. A neoepitope fitness model predicts tumour response to checkpoint blockade immunotherapy. Nature. 2017.

Rech AJ, Balli D, Mantero A, Ishwaran H, Nathanson KL, Stanger BZ, Vonderheide RH. Tumor immunity and survival as a function of alternative neopeptides in human cancer. Clinical Cancer Research, 2018.

Wells DK, van Buuren MM, Dang KK, Hubbard-Lucey VM, Sheehan KCF, Campbell KM, Lamb A, Ward JP, Sidney J, Blazquez AB, Rech AJ, Zaretsky JM, Comin-Anduix B, Ng AHC, Chour W, Yu TV, Rizvi1 H, Chen JM, Manning P, Steiner GM, Doan XC, The TESLA Consortium, Merghoub T, Guinney J, Kolom A, Selinsky C, Ribas A, Hellmann MD, Hacohen N, Sette A, Heath JR, Bhardwaj N, Ramsdell F, Schreiber RD, Schumacher TN, Kvistborg P, Defranoux N. Key Parameters of Tumor Epitope Immunogenicity Revealed Through a Consortium Approach Improve Neoantigen Prediction. Cell. 2020.

Installation

Two methods exist to run antigen.garnish:

  1. Docker
  2. Linux

Docker

docker pull andrewrech/antigen.garnish:2.3.1

cID=$(docker run -it -d andrewrech/antigen.garnish:2.3.1 /bin/bash)

Download netMHC binaries (academic license): NetMHC 4.0, NetMHCpan 4.1b, NetMHCII 2.3, NetMHCIIpan 4.0.

Copy netMHC tar.gz files to the container and run the installation script:

docker cp netMHC-4.0a.Linux.tar.gz $cID:/netMHC-4.0a.Linux.tar.gz
docker cp netMHCII-2.3.Linux.tar.gz $cID:/netMHCII-2.3.Linux.tar.gz
docker cp netMHCpan-4.1b.Linux.tar.gz $cID:/netMHCpan-4.1b.Linux.tar.gz
docker cp netMHCIIpan-4.0.Linux.tar.gz $cID:/netMHCIIpan-4.0.Linux.tar.gz

docker exec $cID config_netMHC.sh

Linux

Dependencies

Installation

Install the dependencies listed above. Then, download and extract antigen.garnish data:

# use $HOME rather than ~, which does not expand inside quotes
ANTIGEN_GARNISH_DIR="$HOME/antigen.garnish"

cd ~
curl -fsSL "https://s3.amazonaws.com/get.rech.io/antigen.garnish-2.3.0.tar.gz" | tar -xvz
chmod -R 700 "$ANTIGEN_GARNISH_DIR"

Install antigen.garnish:

# install.packages("remotes")
remotes::install_github("andrewrech/antigen.garnish")

Next, download netMHC binaries (academic license): NetMHC 4.0, NetMHCpan 4.1b, NetMHCII 2.3, NetMHCIIpan 4.0.

Move the binaries into the antigen.garnish data directory, first setting the NET_MHC_DIR and ANTIGEN_GARNISH_DIR environment variables:

NET_MHC_DIR=/path/to/folder/containing/netMHC/downloads

cd "$NET_MHC_DIR"
mkdir -p "$ANTIGEN_GARNISH_DIR/netMHC"

tar xvzf netMHC-4.0a.Linux.tar.gz -C "$ANTIGEN_GARNISH_DIR/netMHC"
tar xvzf netMHCII-2.3.Linux.tar.gz -C "$ANTIGEN_GARNISH_DIR/netMHC"
tar xvzf netMHCpan-4.1b.Linux.tar.gz -C "$ANTIGEN_GARNISH_DIR/netMHC"
tar xvzf netMHCIIpan-4.0.Linux.tar.gz -C "$ANTIGEN_GARNISH_DIR/netMHC"

chown "$USER" "$ANTIGEN_GARNISH_DIR/netMHC"
chmod -R 700 "$ANTIGEN_GARNISH_DIR/netMHC"

Usage

See the reference manual.

Docker

Interactive use

Copy GRCh38-annotated VCF files and/or metadata including HLA alleles onto the running container using the docker cp command. The container ID is still saved as $cID from the installation above. You will also need to use this command and container ID to copy saved output files from the docker container after you complete your analysis.

Copy any needed files onto the running container, for example:

docker cp myfile.txt $cID:/myfilecopy.txt

Now launch the interactive virtual machine with the container you started:

docker exec -it $cID bash
R
library(antigen.garnish)

Follow the instructions in the next section titled Linux to complete your interactive R analysis. When you complete your analysis, copy any desired output files off the container to your local machine with the docker cp command. Shut down and clean up your container like this:

docker cp $cID:/myoutput.txt ~/myagdockeroutput.txt

docker stop $cID

docker rm $cID

Linux

The number of parallel cores used can be set via the environment variable AG_THREADS (default: all available cores).
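For example, to limit antigen.garnish to four threads for the current shell session before starting R (the thread count here is illustrative):

```shell
# restrict parallel workers; child processes such as R inherit this
export AG_THREADS=4

# confirm the value that R will see
echo "AG_THREADS=${AG_THREADS}"
```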

Predict neoantigens from missense mutations, insertions, and deletions

library(magrittr)
library(data.table)
library(antigen.garnish)

# load an example VCF
dir <- system.file(package = "antigen.garnish") %>%
       file.path(., "extdata/testdata")

file <- file.path(dir, "TUMOR.vcf")

# extract variants
dt <- garnish_variants(file)

# add space separated MHC types
# see list_mhc() for nomenclature of supported alleles
# MHC may also be set to "all_human" or "all_mouse" to use all supported alleles

dt[, MHC := c("HLA-A*01:47 HLA-A*02:01 HLA-DRB1*14:67")]

# predict neoantigens
result <- dt %>% garnish_affinity(.)

result %>% str

Predict neoantigens from Microsoft Excel or other table input

Transcript ID level input table format:

# sample_id ensembl_transcript_id cDNA_change MHC
# sample_1  ENST00000311936       c.718T>A    HLA-A*02:01 HLA-A*03:01
# sample_1  ENST00000311936       c.718T>A    H-2-Kb H-2-Kb
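
The transcript-level table above can also be built directly in R and passed to garnish_affinity, as in the other examples (a minimal sketch; running it requires the netMHC tools and data installed as described under Installation):

```r
library(data.table)
library(antigen.garnish)

# construct a transcript ID level input table matching the format above
dt <- data.table::data.table(
  sample_id = "sample_1",
  ensembl_transcript_id = "ENST00000311936",
  cDNA_change = "c.718T>A",
  MHC = "HLA-A*02:01 HLA-A*03:01")

# predict neoantigens from the data table object
result <- garnish_affinity(dt)
```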

Protein level input (with optional WT paired input) table format:

# sample_id pep_mut           pep_wt            mutant_index MHC
# sample_1  MTEYKLVVVDADGVGK  MTEYKLVVVDAGGVGK  12           HLA-A*02:01
# sample_1  MTEYKLVVVDDDGVGK  MTEYKLVVVDAGGVGK  12 13        HLA-A*02:01
# sample_1  MTEYKLVVVDAGGAAA  MTEYKLVVVDAGGVGK  14 15 16     HLA-A*02:01
# sample_1  SIINFEKLMILKATFI  MTEYKLVVVDAGGVGK  all          HLA-A*02:01
library(magrittr)
library(data.table)
library(antigen.garnish)
library(rio) # package to import Excel and other tables

# load an example table
dir <- system.file(package = "antigen.garnish") %>%
       file.path(., "extdata/testdata")

file <- file.path(dir, "antigen.garnish_example_peptide_with_WT_input.txt")

# read in excel or other format file with rio::import and convert to data table
# or substitute the path to your file here
mytable <- rio::import(file) %>% data.table::as.data.table()

# only use first two rows of table for example
mytable <- mytable[1:2]

# predict neoantigens from data table object
result <- garnish_affinity(mytable)

result %>% str

Directly calculate foreignness score and dissimilarity for a list of sequences

library(magrittr)
library(data.table)
library(antigen.garnish)

# generate our character vector of sequences
v <- c("SIINFEKL", "ILAKFLHWL", "GILGFVFTL")

# calculate foreignness score
v %>% foreignness_score(db = "human") %>% print

# calculate dissimilarity
v %>% dissimilarity_score(db = "human") %>% print

How are peptides generated?

library(magrittr)
library(data.table)
library(antigen.garnish)

data.table::data.table(
   pep_base = "Y___*___THIS_IS_________*___A_PEPTIDE_TEST!______*__X",
   mutant_index = c(5, 25, 47, 50),
   pep_type = "test",
   var_uuid = c(
                "front_truncate",
                "middle",
                "back_truncate",
                "end")) %>%
   make_nmers %>% print

Acknowledgments

We thank the following individuals for contributions and helpful discussion:

License

Please see LICENSE.


antigen.garnish's Issues

password authentication during install on raw ubuntu AMI.

Is this a normal warning on devtools install from github on an open repo?

> devtools::install_github("andrewrech/antigen.garnish")
Downloading GitHub repo andrewrech/antigen.garnish@master
from URL https://api.github.com/repos/andrewrech/antigen.garnish/zipball/master
Installing antigen.garnish

-----------------------------------------------------------------------
ATTENTION!  Your password for authentication realm:

   <https://hedgehog.fhcrc.org:443> The bioconductor Subversion Repository

can only be stored to disk unencrypted!  You are advised to configure
your system so that Subversion can store passwords encrypted, if
possible.  See the documentation for details.

You can avoid future appearances of this warning by setting the value
of the 'store-plaintext-passwords' option to either 'yes' or 'no' in
'/home/ubuntu/.subversion/servers'.
-----------------------------------------------------------------------
Store password unencrypted (yes/no)?

error with README.md example

I am receiving an error running the README.md example that appears to be caused by missing parentheses:
Error in .::stats : unused argument (na.omit)

library(magrittr)
library(antigen.garnish)

  # download an example VCF
    dt <- "antigen.garnish_example.vcf" %T>%
    utils::download.file("http://get.rech.io/antigen.garnish_example.vcf", ., method = "libcurl") %>%

  # extract variants
    garnish_variants

full output:

trying URL 'http://get.rech.io/antigen.garnish_example.vcf'
Content type 'text/x-vcard; charset=utf-8' length 6690 bytes
==================================================
downloaded 6690 bytes

Loading VCFs
Scanning file to determine attributes.
File attributes:
  meta lines: 56
  header line: 57
  variant count: 3
  column count: 11
Meta line 56 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
  Character matrix gt rows: 3
  Character matrix gt cols: 11
  skip: 0
  nrows: 3
  row_num: 0
Processed variant: 3
All variants processed
Error in .::stats : unused argument (na.omit)

It looks like it is failing during variant caller look-up

vcf@meta %>%
  unlist %>%
  stringr::str_extract(stringr::regex("(Strelka)|(Mutect)|(VarScan)|(samtools mpileup)|(somaticsniper)|(freebayes)|(virmid)",
    ignore_case = TRUE)) %>%
  stats::na.omit %>%
  unlist %>%
  data.table::first

Calling both stats::na.omit() and data.table::first() with parentheses returns [1] "strelka". Is this an environment issue with my install?

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /home/ubuntu/.anaconda2/lib/R/lib/libRblas.so
LAPACK: /home/ubuntu/.anaconda2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.10.4-3 magrittr_1.5        vcfR_1.7.0
[4] RevoUtils_10.0.8

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16      lattice_0.20-35   ape_5.1           memuse_4.0-0
 [5] viridisLite_0.3.0 permute_0.9-4     MASS_7.3-48       grid_3.4.3
 [9] nlme_3.1-131      stringi_1.1.7     vegan_2.4-6       Matrix_1.2-12
[13] pinfsc50_1.1.0    tools_3.4.3       stringr_1.3.0     parallel_3.4.3
[17] compiler_3.4.3    cluster_2.0.6     mgcv_1.8-22

garnish_summary returns null

When passing from garnish_jaffa > garnish_predictions > garnish_summary, null data.table is returned.

Does this happen any time the input to garnish_predictions is not from a VCF passed to garnish_variants, i.e., when cDNA or direct peptide input is passed to garnish_predictions and no DAI is calculated?

note on required R version

Add the R version requirement as a declared dependency:

ERROR: this R is version 3.3.1, package 'antigen.garnish' requires R >= 3.4.1

edit: missed this info in DESCRIPTION file. Maybe add a line at top of README.md?

Make Docker image

  • Ubuntu 16.04
  • python
  • pip
  • R/Bioconductor
  • libcurl4-gnutls-dev
  • libssl-dev
  • libxml2-dev
  • subversion

Add BLASTp functionality

  • Add blast to install.sh and dependencies
  • write function to blast non-wt peptides and uuid for corresponding matches
  • write test for function
  • make sure tests work and pass appropriately
  • update garnish_predictions manpage to include DAI and Blast_A descriptions/link to garnish_summary
  • use blastp instead of mass grep to remove mutants that are coded elsewhere by wild-type?
  • test with fusion input via garnish_jaffa
  • test with real vcf input
  • update tests after get_metadata changes

Blast all neoepitopes against the human peptidome

It is straightforward to filter all predicted neoepitopes against the entire human peptidome. Perhaps this is only required for 8mers or 9mers, probabilistically. It seems that stringi is the right combination of fast and reliable.

The search can stop after any match and we don't care about a return value.
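
A minimal sketch of such an exact-match filter using stringi, with a toy two-sequence stand-in for the peptidome (the variable names are illustrative, not the package's implementation):

```r
library(stringi)

# toy stand-in for the human peptidome
self_peptides <- c("SIINFEKLTEWTS", "MTEYKLVVVGAGG")

# predicted neoepitopes to screen
neoepitopes <- c("SIINFEKL", "GILGFVFTL")

# TRUE if a neoepitope occurs anywhere in the self peptidome;
# any() short-circuits, so we only care whether a match exists,
# not about a return value per sequence
is_self <- vapply(
  neoepitopes,
  function(p) any(stringi::stri_detect_fixed(self_peptides, p)),
  logical(1))

# keep only neoepitopes absent from the self peptidome
neoepitopes[!is_self]
```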

Test VCFs

Function garnish_variants should accept any SnpEff-annotated VCF.

  • Mutect2, Strelka intersection
  • from source other than Mutect2, Strelka

detailed install instructions

Add detailed install instructions to README.md that explain the installation of all necessary dependencies on a raw Ubuntu AWS AMI.

How does SnpEff handle rare insertions / deletions?

Do edge cases exist where SnpEff annotates rare insertions / deletions as protein coding with nomenclature that will generate spurious peptides in antigen.garnish::get_cDNA?

These cases

  • c.183_186+48del
  • c.1149del
  • 1149+1del
  • c.4072-1234_5155-246del
  • c.19_21=/del

are handled properly by the `_` lookbehind:

    dt[, cDNA_locs := cDNA_change %>%
          stringr::str_extract("[0-9]+") %>%
          as.integer]
    dt[, cDNA_locl := cDNA_change %>%
          stringr::str_extract("(?<=_)[0-9]+") %>%
          as.integer]

However, these cases

  • c.(4071+1_4072-1)_(5154+1_5155-1)del
  • c.(?_-245)_(31+1_32-1)del
  • c.(?_-1)_(*1_?)del

will be filtered out, but they should be extremely rare (would SnpEff annotate as structural variants?), and it is not clear what, if any, protein would be generated.

test for this

Implement mhcnuggets

Implement prediction algorithms MHCnuggets-GRU and MHCnuggets-LSTM from mhcnuggets. Paper.

  • obtain and provision models
  • split allele lists by gru and lstm and keep only those that have existing models
  • adapt garnish.predictions to tolerate gru output, lstm output or both
  • add gru and lstm models
  • write command generation
  • parse results
  • add to merge
  • include in mean for consensus score summary

Mouse MHCII results not returning

No murine MHCII results are returned by garnish_predictions; warnings are thrown if MHCII alleles are listed before MHCI alleles in the space-separated character string of the MHC column.

Accept xlsx input in garnish predictions

It would be helpful to accept xlsx input in garnish_predictions so users do not need to construct a data table.

  • Provide example xlsx files
  • Allow garnish_predictions to take a path name from getwd() using rio::import and coerce to a data table. This would allow many input formats. Check class and proceed.

@davidballi @leeprichman thoughts?

Travis

Fix travis-ci integration to test on bioc-release and bioc-devel

manually check output

  • total number of peptides tested is ~= expected
  • peptides in DAI calculation differ by a single amino acid
  • pep_wt and pep_mut are correctly registered for the given transcript
  • nmers are contained in peptides
  • mutant_loc matches SnpEff protein change call
  • mutated amino acid in nmer matches SnpEff protein change call

ensemble consensus score reporting

Hi Andrew,

Very cool project. Some thoughts I had looking through your code.

  1. Do we know if the distributions of affinity scores between netmhc* tools and mhcflurry are similar enough that taking the mean of scores is appropriate? I have seen a general trend of mhcflurry reporting lower absolute affinity for very weak binder peptides compared to other tools (e.g. ic50 of 24000 vs 32000 for netmhcpan). If the distributions are different, perhaps taking the mean of percentile rank for a given peptide cohort may be better?

  2. Reporting 95% confidence intervals for the ensemble score. For example, the predicted affinity of SIINFEKL to HLA-A0201 varies considerably between tools:

  • mhcflurry 0.2.0: 10672.3
  • netmhcpan3.0: 18233.3
  • netmhc3.4: 12576.0

I know mhcflurry v1.0 is reporting upper and lower estimates for the ic50 values.

Write examples

  • garnish_predictions non-VCF entry
  • garnish_predictions output column interpretation
  • garnish_summary output column interpretation
  • document column output 982af16

Question about interpreting results

When I run the "Predict neoepitopes" example, I get this:

Meta line 56 read in.
All meta lines processed.
Character matrix gt created.
Character matrix gt rows: 3
Character matrix gt cols: 11
skip: 0
nrows: 3
row_num: 0

Processed variant: 3
All variants processed
sample_id priority_neos classic_neos classic_top_score alt_neos
1: normal_tumor.bam 0 0 0.01066477 5
alt_top_score mhc_binders variants transcripts predictions nmers
1: 35.80603 13 3 3 276 276

I've seen the documentation for garnish_summary, but I'm still not sure how to interpret these results.

Write analysis vignette

Write analysis vignette starting with

  1. a Microsoft Excel table of peptides
  2. a VCF file
  3. JAFFA output

including a sample-level summary.

Gene fusions and arbitrary cDNA entry

To enable:

  1. Neoepitope predictions from gene fusions
    • JAFFA: de novo assembly of raw RNA reads aligned to a reference transcriptome -> cDNA with identified breakpoint
  2. Other manual sequence entry without a transcript ID

  • accept JAFFA cDNA, breakpoint output to predict over novel sequence portion
  • add entry to garnish_neoepitopes with coding (cDNA), MHC only to predict over whole entered cDNA
