antibody_benchmark's Introduction

Antibody Docking and Affinity Benchmark

This repository contains the antibody-antigen test cases, with unbound and bound structures, for Docking Benchmark 5.5. This dataset is a major update of Docking Benchmark 5.0, which was released in 2015 (Vreven et al. "Updates to the Integrated Protein-Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2" J Mol Biol 427(19):3031-41). Users can download this set to test and benchmark their predictive algorithms.

This update contains antibody-antigen structures from 67 test cases, more than double the number in the previous benchmark. The citation for this benchmark is: Guest JD, Vreven T, Zhou J, Moal I, Jeliazkov JR, Gray JJ, Weng Z, Pierce BG. "An Expanded Benchmark for Antibody-Antigen Docking and Affinity Prediction Reveals Insights into Antibody Recognition Determinants", Under Review.

Nomenclature

Each test case is represented by four PDB files, named as follows:

'complex-pdb-code_r_u.pdb' - Unbound antibody structure

'complex-pdb-code_l_u.pdb' - Unbound antigen structure

'complex-pdb-code_r_b.pdb' - Bound antibody structure

'complex-pdb-code_l_b.pdb' - Bound antigen structure
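
As a minimal sketch, the four file names for a case can be generated from its complex PDB code. The helper name `case_files` and the `root` directory argument are hypothetical and not part of the repository.

```python
from pathlib import Path

# Suffixes follow the nomenclature above:
# r = antibody, l = antigen, u = unbound, b = bound.
SUFFIXES = {
    "unbound_antibody": "r_u",
    "unbound_antigen": "l_u",
    "bound_antibody": "r_b",
    "bound_antigen": "l_b",
}


def case_files(case: str, root: str = ".") -> dict[str, Path]:
    """Return the four expected PDB paths for one test case.

    `root` is an assumed location of the structure files; adjust it to
    wherever the PDB files live in your checkout.
    """
    return {
        role: Path(root) / f"{case}_{suffix}.pdb"
        for role, suffix in SUFFIXES.items()
    }


files = case_files("1AHW")
print(files["unbound_antibody"].name)  # 1AHW_r_u.pdb
```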

Bound structures originate from the same PDB entry as the case name, whereas unbound structures were taken from separate PDB entries that correspond to the components of the bound complex. For instance, 1AHW_r_b.pdb and 1AHW_l_b.pdb are from the antibody-antigen complex in 1AHW, but 1AHW_r_u.pdb is a structure from 1FGN and 1AHW_l_u.pdb is a structure from 1TFH. Bound-unbound pairs come pre-aligned for easy visualization of conformational changes. If testing a docking algorithm that is sensitive to the initial positioning of unbound structures, users may randomize unbound structure positions to avoid possible bias in docking results.
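
One way to randomize the position of an unbound structure is to apply a random rigid-body transform to its coordinates. The sketch below uses only the Python standard library and relies on the fixed-column layout of PDB ATOM/HETATM records. The function names, the `max_shift` default, and the choice to rotate about the centroid are assumptions made for illustration, not part of the benchmark pipeline.

```python
import math
import random


def random_rotation(rng):
    """Rotation matrix from a normalized Gaussian quaternion (uniform over rotations)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def randomize_pdb(lines, seed=None, max_shift=20.0):
    """Rotate all atoms about their centroid, then translate by a random shift.

    `lines` is a list of PDB lines; non-coordinate records are passed through.
    `max_shift` (in angstroms) is an arbitrary illustrative default.
    """
    rng = random.Random(seed)
    atom_idx = [i for i, ln in enumerate(lines) if ln.startswith(("ATOM", "HETATM"))]
    if not atom_idx:
        return list(lines)
    # x, y, z occupy columns 31-38, 39-46, 47-54 (1-based) in the PDB format.
    coords = [
        [float(lines[i][30:38]), float(lines[i][38:46]), float(lines[i][46:54])]
        for i in atom_idx
    ]
    center = [sum(c[k] for c in coords) / len(coords) for k in range(3)]
    rot = random_rotation(rng)
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    out = list(lines)
    for i, c in zip(atom_idx, coords):
        d = [c[k] - center[k] for k in range(3)]
        new = [
            sum(rot[r][k] * d[k] for k in range(3)) + center[r] + shift[r]
            for r in range(3)
        ]
        out[i] = lines[i][:30] + "".join(f"{v:8.3f}" for v in new) + lines[i][54:]
    return out
```

Because the same transform is applied to every atom, internal geometry is preserved and only the starting pose changes. Passing a fixed `seed` makes a run reproducible.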

Cases

Information on the cases, including docking difficulties, conformational changes, and binding affinities, is provided in these tab-delimited tables: antibody_antigen_cases.txt and antibody_antigen_affinities.txt.
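
The tables can be read with Python's standard `csv` module. This is a minimal sketch that assumes the first row of each file is a header row; the helper name `read_table` is hypothetical.

```python
import csv


def read_table(path):
    """Read a tab-delimited table into a list of dicts keyed by header name.

    Assumes the first line of the file holds the column names.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Example usage (paths assumed relative to the repository root):
# cases = read_table("antibody_antigen_cases.txt")
# print(len(cases), "rows; columns:", list(cases[0].keys()))
```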

Additional information on some columns in antibody_antigen_cases.txt is below:

Complex PDB/Antibody PDB/Antigen PDB: PDB code is followed by IDs for antibody and antigen chains. For complexes, antibody chains are listed first and separated from antigen chains by a colon.

Antibody: Trade names for therapeutic antibodies in test cases are shown in parentheses.

I-RMSD: Interface RMSD, which helped to assign docking difficulty level, was calculated by superposition of unbound antibody and antigen structures onto the bound complex structure using root-mean-square fit of interface residues.

ΔASA: Measured change in accessible surface area upon complex formation.
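
The PDB columns described above can be split into a code and chain lists. This sketch assumes that an underscore separates the PDB code from the chain IDs and that each chain ID is a single character (for example `1AHW_AB:C`). Both are assumptions about the file layout, and the example strings are illustrative rather than copied from the table; check them against the actual entries before relying on this.

```python
def parse_pdb_entry(entry: str):
    """Split an entry into (pdb_code, first_chains, second_chains).

    For a complex, first_chains are the antibody chains and second_chains
    the antigen chains (the groups on either side of the colon). For an
    entry without a colon, all chains go in first_chains and second_chains
    is empty.
    """
    code, _, chains = entry.strip().partition("_")
    first, _, second = chains.partition(":")
    return code, list(first), list(second)


print(parse_pdb_entry("1AHW_AB:C"))  # ('1AHW', ['A', 'B'], ['C'])
print(parse_pdb_entry("1FGN_LH"))    # ('1FGN', ['L', 'H'], [])
```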

Scripts

The "scripts" directory contains code and input files that were used in our pipeline to identify new cases. The README in that directory contains more details.

Preliminary cases

The "preliminary_cases" directory contains files in which each line holds a triplet (the bound complex PDB code and the PDB codes of its unbound counterparts) representing a potential case identified through automated searches of the PDB. These cases still require manual curation and inspection, but are made available for informational purposes.
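
A triplet file could be read along the following lines. The sketch assumes three whitespace-separated fields per line with the bound complex first. The actual delimiter and the order of the two unbound entries are not documented here, so the field names `unbound_1` and `unbound_2` are deliberately neutral and hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    complex_pdb: str  # bound complex PDB code (assumed to be the first field)
    unbound_1: str    # first unbound counterpart
    unbound_2: str    # second unbound counterpart


def parse_triplets(lines):
    """Parse lines of three whitespace-separated fields; skip anything else."""
    candidates = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            candidates.append(Candidate(*fields))
    return candidates


# The codes below are the 1AHW example from the Nomenclature section.
print(parse_triplets(["1AHW 1FGN 1TFH"])[0].complex_pdb)  # 1AHW
```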

This work is licensed under a Creative Commons Attribution 4.0 International License.

antibody_benchmark's Issues

Potential Mismatches between bound and unbound pdbs

I'm getting very large RMSDs between unbound/bound receptor chains for a few targets.

Upon further inspection, it almost appears that the bound and unbound chains have

I'm not sure, but e.g. 6B0S is listed as an easy target, yet the receptor bound/unbound RMSD is >10 Å.

This is not the only target exhibiting this issue; others include 5CX7 and 3JM9 (plus a few more). Is this normal?
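
A check of this kind can be sketched as a Cα RMSD computed directly on the pre-aligned files, with no further superposition. The sketch assumes that bound and unbound residues can be matched by chain ID, residue number, and insertion code, which may not hold when the unbound structure comes from a different PDB entry. The function names are hypothetical, and the returned match count helps reveal a failed matching.

```python
import math


def ca_coords(lines):
    """Map (chain, residue number, insertion code) -> CA coordinates."""
    coords = {}
    for ln in lines:
        if ln.startswith("ATOM") and ln[12:16].strip() == "CA":
            key = (ln[21], ln[22:26].strip(), ln[26])
            # setdefault keeps the first alternate location, if several exist.
            coords.setdefault(
                key, (float(ln[30:38]), float(ln[38:46]), float(ln[46:54]))
            )
    return coords


def ca_rmsd(bound_lines, unbound_lines):
    """Return (RMSD in angstroms, number of matched CA atoms), without refitting."""
    bound, unbound = ca_coords(bound_lines), ca_coords(unbound_lines)
    shared = sorted(set(bound) & set(unbound))
    if not shared:
        raise ValueError("no CA atoms could be matched between the two files")
    total = sum(math.dist(bound[k], unbound[k]) ** 2 for k in shared)
    return math.sqrt(total / len(shared)), len(shared)


# Example usage (file names follow the nomenclature above; paths are assumed):
# with open("6B0S_r_b.pdb") as b, open("6B0S_r_u.pdb") as u:
#     print(ca_rmsd(b.readlines(), u.readlines()))
```

A small match count relative to the chain length suggests that chain IDs or numbering differ between the two files, in which case the RMSD value is not meaningful.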

Questions about antibody structure data format

Hi! I have a few questions regarding the antibody structures in the dataset:

  1. As far as I understand, the bound structures are obtained from SAbDab, and the corresponding unbound structures are then searched for in the PDB. Are the antibody structures in either the bound or unbound conformations already renumbered using one of the numbering systems (e.g. SAbDab has Chothia-numbered structures)?
  2. In the summary file antibody_antigen_cases.txt, for the "Complex PDB" column, when there are two antibody chains, are these consistently listed as heavy chain ID followed by light chain ID?

Thank you for curating such a valuable dataset!
