
Python-based framework for computational immunomics

Home Page: http://fred-2.github.io/

License: BSD 3-Clause "New" or "Revised" License



epytope - An Immunoinformatics Framework for Python


Copyright 2014 by Benjamin Schubert, Mathias Walzer, Philipp Brachvogel, Andras Szolek, Christopher Mohr, and Oliver Kohlbacher

epytope is a framework for T-cell epitope detection and vaccine design. It offers consistent, easy, and simultaneous access to well-established prediction methods in computational immunology. epytope can handle polymorphic proteins and offers analysis tools to select, assemble, and design linker sequences for string-of-beads epitope-based vaccines. It is implemented in Python in a modular way and can easily be extended with user-defined methods.

Copyright

epytope is released under the three-clause BSD license.

Installation

Use the following command:

pip install git+https://github.com/KohlbacherLab/epytope

Dependencies

Python Packages

  • pandas
  • pyomo>=4.0
  • svmlight
  • PyMySQL
  • biopython
  • pyVCF
  • h5py<=2.10.0

Third-Party Software (not installed through pip)

Please pay attention to the different licensing of third-party tools.

Framework summary

Currently, epytope provides implementations of several prediction methods as well as interfaces to external prediction tools.

Getting Started

Users and developers should start by reading our wiki and the IPython tutorials. Reference documentation is also available online.

How to Cite

Please cite

Schubert, B., Walzer, M., Brachvogel, H-P., Szolek, A., Mohr, C., and Kohlbacher, O. (2016). FRED 2 - An Immunoinformatics Framework for Python. Bioinformatics 2016; doi: 10.1093/bioinformatics/btw113

and the original publications of the used methods.

epytope's People

Contributors: antschum, b-schubert, christopher-mohr, e-dorigatti, jonasscheid, lkuchenb, skrakau, zethson


epytope's Issues

Support multiple epitope prediction scores per tool

Most tools report multiple scores for each prediction. Currently the framework interfaces are affinity-centric, which most tool developers actually discourage.

The epitope prediction method interface should support multiple scores, and all scores produced by each method should be extracted from its output.
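A minimal sketch of how a multi-metric result could be shaped and then flattened, mirroring the nested-dict layout seen in EpitopePredictionResult.from_dict later in this page (the allele, metric names, and scores here are illustrative, not epytope's actual output):

```python
# Hypothetical multi-score result shape: allele -> metric -> {peptide: score}.
raw = {
    "HLA-A*02:01": {
        "Score": {"SYFPEITHI": 0.70, "LLNGSLAEE": 0.12},
        "Rank":  {"SYFPEITHI": 1.20, "LLNGSLAEE": 45.0},
    },
}

def flatten_results(results):
    """Flatten nested results into (peptide, allele, metric, score) rows."""
    rows = []
    for allele, metrics in results.items():
        for metric, pep_scores in metrics.items():
            for pep, score in pep_scores.items():
                rows.append((pep, allele, metric, score))
    return rows
```

Keeping every metric in the nested dict, rather than collapsing to a single affinity value, would let downstream code pick the score each tool's developers recommend.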

Wrong mouse allele string in supported alleles of external tools

The current predictors in External.py that support prediction for MouseAlleles contain the wrong internal allele representation, with the consequence that these alleles are silently ignored. The mouse allele strings in __alleles of each predictor interface should contain the MouseAllele.name string, otherwise they will be ignored here.

Therefore, all mouse allele strings in __alleles of each interface need to be adjusted.
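A small consistency check along these lines could catch such mismatches in tests; canonical_mouse_name below is a toy stand-in for MouseAllele.name, not epytope's implementation:

```python
def find_mismatched_alleles(supported_alleles, canonical):
    """Return supported-allele strings that differ from their canonical name
    and would therefore be dropped when filtering by MouseAllele.name."""
    return sorted(a for a in supported_alleles if canonical(a) != a)

def canonical_mouse_name(s):
    # Toy canonicalization rule (assumption): internal names use "H2-",
    # not "H-2-".
    return s.replace("H-2-", "H2-")
```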

Add interface for mhcflurry v2.0.0

We should consider adding interfaces for more recent mhcflurry versions (2.0.0 and later). In particular, the percentile rank score introduced in 2.0.0 would be important.

Remove support of old netmhc version dependent on Python 2.x

I recently noticed that some NetMHC family tools (e.g. NetMHC <= 3.4) ship Python 2 scripts. Since epytope runs on Python 3, executing these external tools runs into Python 2 vs. Python 3 syntax conflicts. Should we remove support for all Python-2-based predictors?

Rename Package

  • Rename Fred2 to epytope in setup.py
  • Replace all module imports in the library itself
  • Replace all module imports in docs / Notebooks

Protein generation from transcript fails when (coding) transcript sequence is unavailable

The method generate_proteins_from_transcripts fails with

...
in _translate_str
      "Codon '{0}' is invalid".format(codon)
  Bio.Data.CodonTable.TranslationError: Codon 'SEQ' is invalid

at

prot_seq = str(t.translate(table=table, stop_symbol=stop_symbol, to_stop=to_stop, cds=cds))

This occurs when the coding sequence is not available in BioMart, which then returns the value Sequence unavailable that is set as the transcript sequence.

We should check for that in get_transcript_information (MartsAdapter) or in generate_transcripts_from_variants (Core/Generator).
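One possible guard, assuming the literal placeholder string reported in this issue (the function name is made up for illustration):

```python
def has_coding_sequence(seq):
    """Reject BioMart's literal 'Sequence unavailable' placeholder (and empty
    values) before attempting translation, so Bio.Seq.translate is never
    handed a non-sequence string."""
    return bool(seq) and seq.strip() != "Sequence unavailable"
```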

NetMHCpan: Incorrect peptide min/max boundaries

The internal boundaries of NetMHCpan regarding peptide length are 8-56. NetMHCpan 4.1 always picks the most relevant core of the peptide and runs the prediction on it. This is currently prevented by the supported length range inherited from the NetMHCpan 2.8 interface.

Regarding NetMHCIIpan, there seems to be only a lower limit of 9mers; I can go beyond 100 amino acids, which clearly does not make sense. So we could also restrict it to peptides with a maximum length of 56.
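The limits discussed above could be expressed as simple per-version ranges; the constant names are illustrative, not epytope's attributes:

```python
# Length limits taken from the numbers discussed in this issue.
NETMHCPAN_41_LENGTHS = range(8, 57)   # 8-56 inclusive
NETMHCIIPAN_LENGTHS = range(9, 57)    # lower bound 9, proposed cap of 56

def filter_by_length(peptides, supported):
    """Keep only peptides whose length the tool actually supports."""
    return [p for p in peptides if len(p) in supported]
```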

EpitopePrediction ANN Classes create new Allele objects on predict

The implementations in ANN.py in EpitopePrediction create new Allele objects during prediction. Therefore the stored allele objects are not the same as the ones used as input and do not carry the same information (e.g., the field prob is not set), which causes problems in downstream tasks.

epytope installation fails for python >3.7

Installation of epytope fails for Python versions >3.7 (according to my local tests) because of an issue with the PyVCF package.

This is a known problem, as can be seen from various issues. Unfortunately, the package no longer seems to be maintained. We should keep this in mind and look for an alternative.

Improve MartsAdapter and support different BioMart references

Currently, only certain releases of Ensembl BioMart are supported by MartsAdapter, since the database attribute names have changed multiple times.

We should check if the BioMart attributes are stable and if it's feasible to support multiple (and the most recent) versions of Ensembl BioMart (GRCh37 and 38).

Missing alleles in supportedAlleles set of syfpeithi

supportedAlleles of the syfpeithi predictor currently does not contain all alleles added in a previous PR (#42).
Although prediction works with all present alleles, it would be beneficial to also list them among the supported alleles.

Improve netmhc* error handling

The netmhc tools tend to return exit code 0 even on failure. E.g., if a tool fails to create its temporary files in the specified temp dir, it prints an error to stdout and still returns 0.
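A wrapper therefore has to inspect the tool's output as well as the exit code. A minimal sketch (the error markers checked here are assumptions; a real wrapper would match the tools' actual messages):

```python
import subprocess

def run_external_tool(cmd):
    """Run a netmhc-style tool and fail loudly even when it exits with 0."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    # Check the text output too, since a 0 exit code is not trustworthy here.
    if proc.returncode != 0 or "error" in output.lower():
        raise RuntimeError(f"tool failed: {output.strip()}")
    return proc.stdout
```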

Naming

  • Pymmune
  • ImmuPy
  • Pydaptive
  • ePYtope
  • scikit-immuno
  • PyTope

Predicting on mouse alleles fails due to erroneous nomenclature conversion

Predicting 8mers on mouse alleles with syfpeithi, e.g. H2-Kb, keeps resulting in

No model found for H-2-Kb with length 8

although the underlying matrix is available.

The issue here is that self.convert_alleles converts the given H2-Kb to the internal epytope representation K_b. This representation is then used to load the matrix K_b_8. However, it is not found, because the matrix file is called H2_Kb_8.py.

For HLA alleles the conversion works, so I would propose to rename the files and strip the H2 prefix.
We should also look into how the nomenclature is handled internally by the external predictors (the netmhc family).
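After such a rename, a converter could be as simple as this sketch; the function name and the regex rule for splitting the chain are assumptions about the naming scheme, not epytope's code:

```python
import re

def mouse_matrix_name(allele, length):
    """Hypothetical: map 'H2-Kb' to the matrix name 'K_b_8', matching the
    internal representation after the H2 prefix has been stripped."""
    chain = allele.split("-", 1)[1]                        # "H2-Kb" -> "Kb"
    internal = re.sub(r"([A-Z])([a-z])", r"\1_\2", chain)  # "Kb" -> "K_b"
    return f"{internal}_{length}"
```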

Let's also extend the tests of the epitope prediction with mouse alleles accordingly.

Current Class2 external tools don't predict CombinedAlleles

When I try to predict peptides with e.g. NetMHCIIpan 3.0 and a CombinedAllele like 'HLA-DPA1*01:03-HLA-DPB1*01:01', the output states that the allele is not supported. However, it is supported, as can be seen in supported_alleles. When I change 'HLA-DPA1*01:03-HLA-DPB1*01:01' to 'HLA-DPA1*01:03-DPB1*01:01' in supported_alleles, it works. So I guess all the listed CombinedAlleles have an obsolete HLA- string in front of the beta chain.

My input looked as follows in a Jupyter notebook:

peptides = [Peptide("SYFPEITHIFIASFS"),Peptide("FIASNGVKLSYFPEI")]
alleles = [Allele("HLA-DRB1*01:01"), Allele("HLA-DRB1*01:03"), CombinedAllele("HLA-DPA1*01:03-HLA-DPB1*01:01")]
predictor = EpitopePredictorFactory("netmhcIIpan", version="3.0")
results = predictor.predict(peptides, alleles=alleles, command='.path/to/netMHCIIpan-3.0/netMHCIIpan')
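The string fix described above could be sketched like this (the function name is hypothetical):

```python
def normalize_combined_allele(name):
    """Drop the obsolete second 'HLA-' prefix from a combined allele string.

    'HLA-DPA1*01:03-HLA-DPB1*01:01' -> 'HLA-DPA1*01:03-DPB1*01:01'
    Strings without the duplicated prefix are returned unchanged.
    """
    alpha, sep, beta = name.partition("-HLA-")
    return f"{alpha}-{beta}" if sep else name
```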

Failure to parse netMHCpan 4.0 output

Parsing predictions from netMHCpan 4.0 fails with the following trace:

  File "data_preparation.py", line 204, in get_binding_affinity_process
    res = predictor.predict(batch, alleles)
  File "/tmp/GeneralizedEvDesign/epytope/epytope/EpitopePrediction/External.py", line 191, in predict
    df_result = EpitopePredictionResult.from_dict(result, list(pep_seqs.values()), self.name)
  File "/tmp/GeneralizedEvDesign/epytope/epytope/Core/Result.py", line 160, in from_dict
    df[allele][method][metric][pep] = score
  File "/home/edo/miniconda3/envs/gnv/lib/python3.6/site-packages/pandas/core/series.py", line 1042, in __setitem__
    self._set_with(key, value)
  File "/home/edo/miniconda3/envs/gnv/lib/python3.6/site-packages/pandas/core/series.py", line 1098, in _set_with
    self._set_labels(key, value)
  File "/home/edo/miniconda3/envs/gnv/lib/python3.6/site-packages/pandas/core/series.py", line 1105, in _set_labels
    raise ValueError(f"{key[mask]} not contained in the index")
ValueError: ['L' 'L' 'N' 'G' 'S' 'L' 'A' 'E' 'E'] not contained in the index

Reproducing example (I am using the latest version on the main branch):

from epytope.EpitopePrediction.External import NetMHCpan_4_0
from epytope.Core import Peptide, Allele

predictor = NetMHCpan_4_0()
predictor.predict([Peptide('LLNGSLAEE'), Peptide('KDQQLLGIW')], [Allele('HLA-A*02:01'), Allele('HLA-B*40:01')])

The issue seems to be related to pandas indexing (with no response on StackOverflow to a similar problem) and is easy to fix:

diff --git a/epytope/Core/Result.py b/epytope/Core/Result.py
index eabb271..8bb5f32 100644
--- a/epytope/Core/Result.py
+++ b/epytope/Core/Result.py
@@ -154,9 +154,10 @@ class EpitopePredictionResult(AResult):
         # Fill DataFrame
         for allele, metrics in d.items():
             for metric, pep_scores in metrics.items():
+                df_slice = df[allele][method][metric]
                 for pep, score in pep_scores.items():
-                    df[allele][method][metric][pep] = score
+                    df_slice[df_slice.index == pep] = score
         return EpitopePredictionResult(df)

But maybe some other modules are affected as well?
