ieconv_proteins's People

Contributors

luwei0917, phermosilla

ieconv_proteins's Issues

Comparison with previous work implementation

Could you please share the implementation of the downstream head (Enzyme task) used for the Bepler & Berger model? I have been trying to replicate the results, but I cannot reach the performance reported in the paper.

The Enzyme dataset contains a mix between train and test splits.

I have looked at the dataset in more detail and found many protein sequences that are identical. The paper mentions that the similarity is less than 100%; I am assuming this means less than or equal to 100%.

More seriously, some identical sequences appear in both the train and test splits, and others in both the train and validation splits.

Protein chains that are the same across splits ({id in pdb}|{original id}|{split}):


['5lf0_W|5lf0_W|train', '5m32_I|5m32_I|train', '5le5_I|5le5_I|train', '5lf1_I|5lf1_I|train', '5lf3_I|5lf3_I|train', '5gjq_q|5gjq_q|valid']

5lf0: Human 20S proteasome complex with Epoxomicin at 2.4 Angstrom
5m32: Human 26S proteasome in complex with Oprozomib
5le5: Native human 20S proteasome at 1.8 Angstrom
5lf1: Human 20S proteasome complex with Dihydroeponemycin at 2.0 Angstrom
5lf3: Human 20S proteasome complex with Bortezomib at 2.1 Angstrom
5gjq: Structure of the human 26S proteasome bound to USP14-UbAl

The chains used from these complexes all have the same sequence, pointing to:
Proteasome subunit beta type-3
UniProtKB accession: P49720

['3von_E|3von_E|train', '3von_b|3von_b|test', '3von_p|3von_p|test', '3von_i|3von_i|test']
3von: Crystalstructure of the ubiquitin protease

The chains used from this complex all have the same sequence, pointing to:
Ubiquitin-conjugating enzyme E2 N
UniProtKB accession: P61088


['3mg8_I|3mg8_I|train', '4qlq_W|4qlq_W|train', '6huv_I|6huv_I|train', '5fga_W|5fga_W|train', '4qby_W|4qby_W|train', '5mpa_j|5mpa_j|test', '5mp9_j|5mp9_j|test']

3mg8: Structure of yeast 20S open-gate proteasome with Compound 16
4qlq: yCP in complex with tripeptidic epoxyketone inhibitor 8
6huv: Yeast 20S proteasome with human beta2c (S171G) in complex with 39
5fga: Yeast 20S proteasome beta5-K33A mutant (propeptide expressed in trans)
4qby: yCP in complex with BOC-ALA-ALA-ALA-CHO
5mpa: 26S proteasome in presence of ATP (s2)
5mp9: 26S proteasome in presence of ATP (s1)

The chains used from these complexes all have the same sequence, pointing to:
Proteasome subunit beta type-3
UniProtKB accession: P25451


['4y84_X|4y84_X|train', '5l5e_X|5l5e_X|train', '6huu_J|6huu_J|train', '4qby_J|4qby_J|train', '4ya9_J|4ya9_J|train', '5mp9_k|5mp9_k|test', '5mpa_k|5mpa_k|test']

4y84: Yeast 20S proteasome in complex with N3-A(4,4-F2P)nLL-ep
5l5e: Yeast 20S proteasome with human beta5i (1-138) and human beta6 (97-111; 118-133) in complex with carfilzomib
6huu: Yeast 20S proteasome with human beta2c (S171G) in complex with 29
4qby: yCP in complex with BOC-ALA-ALA-ALA-CHO
4ya9: Yeast 20S proteasome beta2-H114D mutant in complex with Ac-LAD-ep
5mp9: 26S proteasome in presence of ATP (s1)
5mpa: 26S proteasome in presence of ATP (s2)

The chains used from these complexes all have the same sequence, pointing to:
Proteasome subunit beta type-4
UniProtKB accession: P22141


Train and test mix:
[('train', 190, '4y84_X'), ('train', 190, '5l5e_X'), ('train', 190, '6huu_J'), ('train', 190, '4qby_J'), ('train', 190, '4ya9_J'), ('test', 190, '5mp9_k'), ('test', 190, '5mpa_k')]


[('train', 155, '3von_E'), ('test', 155, '3von_b'), ('test', 155, '3von_p'), ('test', 155, '3von_i')]


[('train', 190, '6hed_4'), ('train', 190, '6hec_5'), ('train', 190, '6he8_4'), ('train', 190, '6he9_3'), ('train', 190, '6he7_6'), ('test', 190, '6he8_k'), ('test', 190, '6hed_h'), ('test', 190, '6hea_i'), ('test', 190, '6hea_h'), ('test', 190, '6he9_i')]


[('train', 190, '3mg8_I'), ('train', 190, '4qlq_W'), ('train', 190, '6huv_I'), ('train', 190, '5fga_W'), ('train', 190, '4qby_W'), ('test', 190, '5mpa_j'), ('test', 190, '5mp9_j')]


[('train', 190, '5lf1_b'), ('train', 190, '5lf1_B'), ('test', 190, '5gjq_j')]


[('train', 190, '1iru_R'), ('test', 190, '5gjq_k')]


Train and validation mix:
[('train', 190, '5lf0_W'), ('train', 190, '5m32_I'), ('train', 190, '5le5_I'), ('train', 190, '5lf1_I'), ('train', 190, '5lf3_I'), ('valid', 190, '5gjq_q')]


Test and validation mix:
[]
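
A report like the one above can be produced by grouping chains by their exact sequence and flagging every group that spans more than one split. The following is a minimal sketch under stated assumptions, not the script actually used for this issue: the function name `find_split_mixes`, the `(chain_id, split, sequence)` tuple layout, the short placeholder sequences, and the chain IDs `1abc_A` and `2xyz_B` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_split_mixes(chains):
    """Group chains by exact sequence and report groups spanning two splits.

    chains: iterable of (chain_id, split, sequence) tuples (assumed layout).
    Returns a dict mapping a sorted split pair, e.g. ("test", "train"),
    to a list of groups; each group is a list of (split, chain_id).
    """
    by_sequence = defaultdict(list)
    for chain_id, split, sequence in chains:
        by_sequence[sequence].append((split, chain_id))

    mixes = defaultdict(list)
    for members in by_sequence.values():
        splits = sorted({split for split, _ in members})
        # A sequence present in k splits is reported once per pair of splits.
        for pair in combinations(splits, 2):
            mixes[pair].append([m for m in members if m[0] in pair])
    return dict(mixes)


# Toy input: the sequences are placeholders, not real protein sequences.
toy_chains = [
    ("3von_E", "train", "MAGLPRR"),
    ("3von_b", "test", "MAGLPRR"),
    ("1abc_A", "train", "MKTAYIA"),
    ("2xyz_B", "valid", "GSHMLED"),
]
report = find_split_mixes(toy_chains)
for pair, groups in report.items():
    print(pair, groups)
```

Exact string equality only catches identical sequences; near-duplicates would need a sequence-identity clustering step, which this sketch does not attempt.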

PDB IDs to be removed because of the mix:
['4y84_X', '5l5e_X', '6huu_J', '4qby_J', '4ya9_J', '5mp9_k', '5mpa_k', '3von_E', '3von_b', '3von_p', '3von_i', '6hed_4',
'6hec_5', '6he8_4', '6he9_3', '6he7_6', '6he8_k', '6hed_h', '6hea_i', '6hea_h', '6he9_i', '3mg8_I', '4qlq_W', '6huv_I',
'5fga_W', '4qby_W', '5mpa_j', '5mp9_j', '5lf1_b', '5lf1_B', '5gjq_j', '1iru_R', '5gjq_k', '5lf0_W', '5m32_I', '5le5_I',
'5lf1_I', '5lf3_I', '5gjq_q']
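
One way to apply such a removal list is to filter every split against it before training. This is only a sketch: the helper name `drop_leaked_chains`, the dict-of-lists layout for the splits, and the chain IDs `1abc_A` and `2xyz_B` are assumptions made for illustration, and the leaked list is shortened to four of the IDs above.

```python
def drop_leaked_chains(split_to_ids, leaked_ids):
    """Return a copy of the splits with every leaked chain ID removed.

    split_to_ids: dict mapping split name to a list of chain IDs (assumed layout).
    leaked_ids: iterable of chain IDs to exclude from all splits.
    """
    leaked = set(leaked_ids)
    return {
        split: [chain_id for chain_id in ids if chain_id not in leaked]
        for split, ids in split_to_ids.items()
    }


# Toy splits; only the 3von chains come from the list above.
splits = {
    "train": ["3von_E", "1abc_A"],
    "test": ["3von_b", "3von_p"],
    "valid": ["2xyz_B"],
}
leaked = ["3von_E", "3von_b", "3von_p", "3von_i"]
cleaned = drop_leaked_chains(splits, leaked)
print(cleaned)
```

Dropping every copy is the most conservative choice. Keeping the train copies and removing only the test and validation copies would also break the overlap while discarding less training data.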


Length refers to the number of entries pointing to the same protein sequence

Total number of chains: 37428
Total number of unique chains: 15640
length 1 5845
length 2 4308
length 3 1307
length 4 1895
length 5 2264
length 6 8
length 7 8
length 8 4
length 9 0
length 10 1
length 11 0
length 12 0


Number of identical sequences pointing to different EC numbers: 1
[('train', 201, '6giq_e'), ('train', 152, '6giq_E'), ('train', 152, '6giq_P')]
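
A label conflict of this kind can be found by collecting the set of labels attached to each sequence and keeping the sequences with more than one. The sketch below assumes that the middle value in each tuple is an integer class label for the EC number, which is not stated explicitly above; the function name `find_label_conflicts`, the tuple layout, the placeholder sequences, and the chain ID `1abc_A` are hypothetical.

```python
from collections import defaultdict


def find_label_conflicts(chains):
    """Return groups of identical sequences that carry more than one label.

    chains: iterable of (chain_id, split, label, sequence) tuples (assumed layout).
    Each returned group is a list of (split, label, chain_id).
    """
    by_sequence = defaultdict(list)
    for chain_id, split, label, sequence in chains:
        by_sequence[sequence].append((split, label, chain_id))

    return [
        members
        for members in by_sequence.values()
        if len({label for _, label, _ in members}) > 1
    ]


# Toy input mirroring the reported case; the sequences are placeholders.
toy_chains = [
    ("6giq_e", "train", 201, "MAGLPRR"),
    ("6giq_E", "train", 152, "MAGLPRR"),
    ("6giq_P", "train", 152, "MAGLPRR"),
    ("1abc_A", "train", 7, "MKTAYIA"),
]
conflicts = find_label_conflicts(toy_chains)
print(len(conflicts), conflicts)
```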

On environment setup

Hello Pedro,

I am working on identifying enzyme functions (EC numbers) and want to compare our approach with yours. However, the server our lab uses does not support Docker or other container technologies.

Would it be possible for you to provide the environment setup file (Conda .yml or pip requirements) for running the code?

Thank you in advance~
