thunlp-mt / mean
This repo contains the code for our paper Conditional Antibody Design as 3D Equivariant Graph Translation.
Home Page: https://arxiv.org/abs/2208.06073
License: MIT License
It seems that the files mentioned in the title are not used in training.
Could you please show me where they are used?
Hello,
First - thanks for making this nice repository available for use.
I was looking over your network architecture and noticed that the initialisation procedure for the coordinates and sequence in the masked positions in your MCATTModel.init_mask() function appears to be deterministic.
Does this mean that you produce the same final output coordinates and sequence logits every time you generate from a reference antibody-antigen complex, or is randomness entering somewhere else that I can't see?
Thanks!
Hi. Thank you so much for sharing the code for reproducing the results in your paper. Usually, people want to run inference on a given PDB file, so it would be much more convenient to provide a minimal script that, given an input PDB (including heavy, light, and antigen chains), outputs the predicted structure. The current code is a little over-complicated for me...
Hi
I'm running the MEAN tutorial and I'm having an issue with the final command:
GPU=0 bash scripts/k_fold_eval.sh summaries 111 mean 0
I assume I've failed to generate data at the previous stage but could you suggest where I'm going wrong?
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00, 2.42it/s]
0%| | 0/299 [00:00<?, ?it/s]/bin/sh: 1: /home/matthewdavies/MEAN/evaluation/TMscore: not found
0%| | 0/299 [00:01<?, ?it/s]
/bin/sh: 1: /home/matthewdavies/MEAN/evaluation/TMscore: not found
/bin/sh: 1: /home/matthewdavies/MEAN/evaluation/TMscore: not found
/bin/sh: 1: /home/matthewdavies/MEAN/evaluation/TMscore: not found
/bin/sh: 1: /home/matthewdavies/MEAN/evaluation/TMscore: not found
/bin/sh: 1: /home/matthewdavies/MEAN/evaluation/TMscore: not found
/bin/sh: 1: /home/matthewdavies/MEAN/evaluation/TMscore: not found
/bin/sh: 1: /home/matthewdavies/MEAN/evaluation/TMscore: not found
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/concurrent/futures/process.py", line 239, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/opt/conda/lib/python3.7/concurrent/futures/process.py", line 198, in _process_chunk
return [fn(*args) for args in chunk]
File "/opt/conda/lib/python3.7/concurrent/futures/process.py", line 198, in <listcomp>
return [fn(*args) for args in chunk]
File "generate.py", line 74, in eval_one
summary['TMscore'] = tm_score(cplx.get_heavy_chain(), new_cplx.get_heavy_chain())
File "/home/matthewdavies/MEAN/evaluation/tm_score.py", line 31, in tm_score
score = float(res.group(1))
AttributeError: 'NoneType' object has no attribute 'group'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "generate.py", line 257, in <module>
main(args)
File "generate.py", line 214, in main
average_test(args, model, test_set, test_loader, device)
File "generate.py", line 174, in average_test
summaries = process_map(partial(eval_one, out_dir=out_dir, cdr=cdr_type), inputs, max_workers=args.num_workers, chunksize=10)
File "/opt/conda/lib/python3.7/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
File "/opt/conda/lib/python3.7/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
File "/opt/conda/lib/python3.7/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/opt/conda/lib/python3.7/concurrent/futures/process.py", line 483, in _chain_from_iterable_of_lists
for element in iterable:
File "/opt/conda/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
yield fs.pop().result()
File "/opt/conda/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/opt/conda/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
AttributeError: 'NoneType' object has no attribute 'group'
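The log above shows two separate symptoms: the shell cannot find the TMscore executable, and the parsing step then calls .group() on a failed regex match. A minimal sketch of how both could be reported more clearly is given below. It is not the repository's tm_score.py; the helper names and the output pattern (a line such as "TM-score    = 0.3410") are assumptions made for illustration.

```python
import os
import re

# Hypothetical pattern: assumes the tool prints a line such as "TM-score    = 0.3410".
TM_PATTERN = re.compile(r"TM-score\s*=\s*([0-9]*\.?[0-9]+)")


def check_executable(path):
    """Raise a clear error if `path` is not an existing, executable file."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"executable not found: {path}")
    if not os.access(path, os.X_OK):
        raise PermissionError(f"file is not executable: {path}")


def parse_tm_score(output):
    """Extract the score from the tool's text output, failing loudly if absent."""
    match = TM_PATTERN.search(output)
    if match is None:
        raise ValueError("no TM-score found in tool output (did the tool run?)")
    return float(match.group(1))
```

Calling check_executable on the expected binary path before starting the worker pool would surface the missing file once, instead of as an AttributeError inside every worker.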
Hi!
When I run the command "bash scripts/prepare_data_kfold.sh summaries/sabdab_summary_all.tsv all_structures/imgt", I get the following error:
####################################################################
Locate project at /root/autodl-tmp/MEAN
Summary file at summaries/sabdab_summary_all.tsv. PDB folder at all_structures/imgt. Data working directory at summaries
download sabdab from summary file summaries/sabdab_summary_all.tsv
using local PDB files: all_structures/imgt
PDB file already renumbered with scheme imgt
downloading raw files
16%|███████████████████████████▊ | 1266/7689 [00:05<00:34, 185.90it/s]scripts/prepare_data_kfold.sh: line 31: 839 Killed python -m data.download --summary ${SUMMARY} --pdb_dir ${PDB_DIR} --fout ${ALL} --type sabdab --numbering imgt --pre_numbered --n_cpu 4
Processing cdrh1
Traceback (most recent call last):
File "data/split.py", line 232, in <module>
main(parse())
File "data/split.py", line 72, in main
items = load_file(args.data)
File "data/split.py", line 37, in load_file
with open(fpath, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'summaries/sabdab_all.json'
2023-09-14 15:13:08::WARN::Faild to load file summaries/cdrh1/fold_0/test_processed/_metainfo, error: [Errno 2] No such file or directory: 'summaries/cdrh1/fold_0/test_processed/_metainfo'
Traceback (most recent call last):
File "data/dataset.py", line 303, in <module>
dataset = EquiAACDataset(args.dataset, args.save_dir, num_entry_per_file=-1)
File "data/dataset.py", line 63, in __init__
self.preprocess(file_path, save_dir, num_entry_per_file)
File "data/dataset.py", line 135, in preprocess
with open(file_path, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'summaries/cdrh1/fold_0/test.json'
2023-09-14 15:13:10::WARN::Faild to load file summaries/cdrh1/fold_0/valid_processed/_metainfo, error: [Errno 2] No such file or directory: 'summaries/cdrh1/fold_0/valid_processed/_metainfo'
Traceback (most recent call last):
File "data/dataset.py", line 303, in <module>
dataset = EquiAACDataset(args.dataset, args.save_dir, num_entry_per_file=-1)
File "data/dataset.py", line 63, in __init__
self.preprocess(file_path, save_dir, num_entry_per_file)
File "data/dataset.py", line 135, in preprocess
with open(file_path, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'summaries/cdrh1/fold_0/valid.json'
2023-09-14 15:13:12::WARN::Faild to load file summaries/cdrh1/fold_0/train_processed/_metainfo, error: [Errno 2] No such file or directory: 'summaries/cdrh1/fold_0/train_processed/_metainfo'
Traceback (most recent call last):
File "data/dataset.py", line 303, in <module>
dataset = EquiAACDataset(args.dataset, args.save_dir, num_entry_per_file=-1)
File "data/dataset.py", line 63, in __init__
self.preprocess(file_path, save_dir, num_entry_per_file)
File "data/dataset.py", line 135, in preprocess
with open(file_path, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'summaries/cdrh1/fold_0/train.json'
^CTraceback (most recent call last):
File "data/dataset.py", line 11, in <module>
import numpy as np
File "/root/miniconda3/envs/mean/lib/python3.8/site-packages/numpy/__init__.py", line 141, in <module>
from . import core
File "/root/miniconda3/envs/mean/lib/python3.8/site-packages/numpy/core/__init__.py", line 105, in <module>
from . import _internal
File "/root/miniconda3/envs/mean/lib/python3.8/site-packages/numpy/core/_internal.py", line 7, in <module>
import ast
File "/root/miniconda3/envs/mean/lib/python3.8/ast.py", line 27, in <module>
from _ast import *
KeyboardInterrupt
######################################################################################
Then I checked the code and suspect that the download function in download.py may be wrong: it opens "out_path" without creating it first. Details below:
def download(items, out_path, ncpu=8, pdb_dir=None, numbering='imgt', pre_numbered=False):
    if pdb_dir is None:
        map_func = download_one_item
    else:
        map_func = partial(download_one_item_local, pdb_dir)
    print('downloading raw files')
    valid_entries = thread_map(map_func, items, max_workers=ncpu)
    valid_entries = [item for item in valid_entries if item is not None]
    print(f'number of downloaded entries: {len(valid_entries)}')
    pdb_out_dir = os.path.join(os.path.split(out_path)[0], 'pdb')
    if os.path.exists(pdb_out_dir):
        print(f'WARNING: pdb file out directory {pdb_out_dir} exists!')
    else:
        os.makedirs(pdb_out_dir)
    print(f'writing PDB files to {pdb_out_dir}')
    for item in tqdm(valid_entries):
        pdb_fout = os.path.join(pdb_out_dir, item['pdb'] + '.pdb')
        with open(pdb_fout, 'w') as pfout:
            pfout.write(item['pdb_data'])
        item.pop('pdb_data')
        item['pdb_data_path'] = os.path.abspath(pdb_fout)
        item['numbering'] = numbering
        if 'pre_numbered' not in item:
            item['pre_numbered'] = pre_numbered
    print('post processing')
    valid_entries = process_map(post_process, valid_entries, max_workers=ncpu, chunksize=1)
    valid_entries = [item for item in valid_entries if item is not None]
    print(f'number of valid entries: {len(valid_entries)}')
    # This seems wrong: out_path has not been created, but the code below opens it directly
    fout = open(out_path, 'w')
    for item in valid_entries:
        item_str = json.dumps(item)
        fout.write(f'{item_str}\n')
    fout.close()
    return valid_entries
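For context, Python's open(path, 'w') creates the file if it is missing; it only fails when the parent directory does not exist. The log above also shows the download step being "Killed" at about 16% progress, so execution may never have reached the write at all. The sketch below is a defensive way to write one JSON object per line. The function name write_jsonl is hypothetical and not part of the repository.

```python
import json
import os


def write_jsonl(items, out_path):
    """Write one JSON object per line, creating the parent directory if needed."""
    parent = os.path.dirname(os.path.abspath(out_path))
    os.makedirs(parent, exist_ok=True)
    # The context manager closes the file even if serialization raises.
    with open(out_path, "w") as fout:
        for item in items:
            fout.write(json.dumps(item) + "\n")
    return out_path
```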
I would like to ask how the Amino Acid Recovery (AAR) metric is calculated. I could not find the corresponding code. I would really appreciate an answer!
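A common reading of AAR is the fraction of positions at which the generated sequence matches the reference sequence. The sketch below follows that reading for two sequences of equal length. It is an assumption about the definition, not the repository's implementation, and the function name is hypothetical.

```python
def amino_acid_recovery(predicted, reference):
    """Fraction of positions where predicted and reference residues are identical."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)
```

A dataset-level score would then typically be the mean of this value over all test complexes.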
Once all the .pdb files are downloaded and the post-processing step runs, it says "{file path} could not be parsed: Muscle could not be found in the path". Can you please help resolve this issue?
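The message suggests that an external alignment tool is expected on the PATH. A quick way to check this from Python is sketched below. The helper names are hypothetical, and the executable name "muscle" in the usage note is an assumption that may differ on a given system.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is not found."""
    return shutil.which(name)


def require_tool(name):
    """Like find_tool, but raise an explicit error when the tool is missing."""
    path = find_tool(name)
    if path is None:
        raise RuntimeError(f"'{name}' was not found on PATH; install it or extend PATH")
    return path
```

For example, require_tool("muscle") could be called once before post-processing starts, so that a missing tool is reported up front rather than once per file.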
Hello, I am a bit confused about the y-axis labeled "density" in Figure 5(B). What does density mean here?
Thanks for your good work.
You only predict four coordinates per residue, so how do you get the other coordinates in a residue?
Hi,
I have a question regarding coordinate initialization for the antibody CDR region. In the manuscript, it is written as,
we initialized its input feature with a mask vector and the coordinates according to the even distribution between the residue right before CDRs and the one right after CDRs
Is it similar to the linspace function, or is it sampled from a uniform distribution? I would be grateful if you could attach a link to the code in the repo; I wasn't able to find it.
Regards,
Yogesh
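One way to read "even distribution" in the quoted sentence is equal spacing along the straight line between the two flanking residues, which is what a linspace-style interpolation gives. The sketch below illustrates that reading only. Whether the repository does this is not confirmed here, and the function name is hypothetical.

```python
def init_between(left, right, n):
    """Place n points at equal spacing strictly between two 3D anchor points."""
    points = []
    for i in range(1, n + 1):
        t = i / (n + 1)  # fraction of the way from left to right
        points.append([l + t * (r - l) for l, r in zip(left, right)])
    return points
```

Under this reading the initialization is deterministic: the same pair of anchors always yields the same starting coordinates, unlike sampling from a uniform distribution.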
Hi. Your paper reports: "The total numbers of clusters for CDR-H1, CDR-H2, and CDR-H3 are 765, 1093, and 1659, respectively. Then we split all clusters into training, validation, and test sets with a ratio of 8:1:1." Do you split the clusters separately for each CDR type, or do you split all the clusters together, i.e., 765 + 1093 + 1659?
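To make the first option concrete, the sketch below splits the clusters of a single CDR type into three parts with an 8:1:1 ratio. It assumes a per-CDR split, a fixed random seed, and that any remainder goes to the test set. None of these choices is confirmed by the paper or the repository, and the function name is hypothetical.

```python
import random


def split_clusters(cluster_ids, ratios=(8, 1, 1), seed=0):
    """Shuffle cluster ids and cut them into train/valid/test by the given ratio."""
    ids = list(cluster_ids)
    random.Random(seed).shuffle(ids)
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_valid = len(ids) * ratios[1] // total
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]  # remainder goes to test
    return train, valid, test
```

Splitting by cluster rather than by individual complex keeps similar CDR sequences from appearing on both sides of the split.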
Hi,
I am trying to download the RAbD dataset via the summary file, and it created a JSON file. However, the entries of the JSON file look like this:
{'pdb': '5nuz', 'heavy_chain': 'A', 'light_chain': 'B', 'antigen_chains': ['C'], 'pdb_data_path': '5nuz.pdb', 'numbering': 'imgt', 'pre_numbered': False, 'heavy_chain_seq': 'EVQLQQSGTVLARPGASVKMSCKASGYTFTSYWMHWIKQRPGQGLEWIGAIYPGDSDTKYNQKFKGKAKLTAVTSTSTAYMELSSLTNEDSAVYYCTRRNTLTGDYFDYWGQGTTLTVSS', 'light_chain_seq': 'DIVLTQSPASLAVSLGQRATISCRASESVDDYGISFMNWFQQKPGQPPKLLIYTASSQGSGVPARFSGSGSGTDFSLNIHPMEEDDTAMYFCQQSKEVPYTFGGGTKLEIK', 'antigen_seqs': ['LPLLCTLNKSHLYIKGGNASFQISFDDIAVLLPQYDVIIQHPADMSWCSKSDDQIWLSQWFMNAVGHDWHLDPPFLCRNRTKTEGFIFQVNTSKTGVNENYAKKFKTGMHHLYREYPDSCLNGKLCLMKAQPTSWPLQCPLD'], 'cdrh1_pos': (25, 32), 'cdrh1_seq': 'GYTFTSYW', 'cdrh2_pos': (50, 57), 'cdrh2_seq': 'IYPGDSDT', 'cdrh3_pos': (96, 108), 'cdrh3_seq': 'TRRNTLTGDYFDY', 'cdrl1_pos': (26, 35), 'cdrl1_seq': 'ESVDDYGISF', 'cdrl2_pos': (53, 55), 'cdrl2_seq': 'TAS', 'cdrl3_pos': (92, 100), 'cdrl3_seq': 'QQSKEVPYT'},
However, there are no antigen spatial coordinates here. I think you used the antigen coordinates for the experiments in your paper. Can you let me know whether this is correct, and how one can get the antigen coordinates in this case?
Thanks!
Hi. I find that the way you calculate radial is different from other similar works, e.g., EGNN.
Your strategy: the radial is the dot product of the coordinate differences.
def coord2radial(edge_index, coord):
    row, col = edge_index
    coord_diff = coord[row] - coord[col]  # [n_edge, n_channel, d]
    radial = torch.bmm(coord_diff, coord_diff.transpose(-1, -2))  # [n_edge, n_channel, n_channel]
    # normalize radial
    radial = F.normalize(radial, dim=0)  # [n_edge, n_channel, n_channel]
    return radial, coord_diff
EGNN's strategy: the radial is the squared distance between two nodes.
def coord2radial(self, edge_index, coord):
    row, col = edge_index
    coord_diff = coord[row] - coord[col]
    radial = torch.sum(coord_diff**2, 1).unsqueeze(1)
    if self.normalize:
        norm = torch.sqrt(radial).detach() + self.epsilon
        coord_diff = coord_diff / norm
    return radial, coord_diff
I think your radial can represent the relative orientation of two multi-channel residues, whereas EGNN's radial represents only the distance. Is this reasonable? What do you think it represents? What was your motivation for defining it this way instead of following EGNN?
The way you normalize the radial is also quite interesting: you normalize it along the n_edge dimension (similar to a "batch dimension").
Why? Have you tried removing the normalization?
Best,
Zhangzhi
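The relation between the two definitions can be checked on a toy example. For one edge, the multi-channel radial is the matrix of inner products between the per-channel coordinate differences: its diagonal holds each channel's squared distance, and with a single channel it reduces to the EGNN scalar. The pure-Python sketch below illustrates this and checks that the matrix is unchanged by a rotation. It uses hypothetical helper names, omits the normalization step, and is not the repository's code.

```python
import math


def gram(diff):
    """diff: list of per-channel 3D difference vectors for one edge.
    Returns the n_channel x n_channel matrix of pairwise inner products."""
    return [[sum(a * b for a, b in zip(u, v)) for v in diff] for u in diff]


def rotate_z(vec, angle):
    """Rotate a 3D vector about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = vec
    return [c * x - s * y, s * x + c * y, z]


# One edge with two channels (e.g., two backbone atoms per residue).
diff = [[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]]
g = gram(diff)  # diagonal entries are the squared lengths of each channel's difference

# Rotating every difference vector by the same rotation leaves the matrix unchanged.
g_rot = gram([rotate_z(v, 0.7) for v in diff])
```

The off-diagonal entries carry the cross-channel inner products, which is the extra geometric information a single squared distance does not contain.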
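Hi, when I run inference on 1fl5.pdb from the PDB database, the first step is the IMGT data processing, but it reports "1fl5.pdb could not be parsed: Muscle could not be found in the path", so this PDB cannot be parsed. It is definitely an antibody, though. Is there a good solution? Thanks.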
Hi, will you open-source the model checkpoints so that we can benchmark the model on our test set?
Hello. I'm trying to generate the data splits for training the version of MEAN used in the RAbD test set evaluation.
I have installed the conda environment using the setup file. I have run up to line 31 in prepare_data_kfold.sh to generate the summaries/sabdab_all.json file. The PDBs have been moved to summaries/pdb and are IMGT formatted (downloaded from SAbDab).
I'm running the following line:
bash scripts/prepare_data_rabd.sh summaries/rabd_summary.jsonl summaries/pdb/ summaries/sabdab_all.json
I'm getting the following nested error message. Are there any dependencies that I'm missing that could be causing an earlier error?
It looks like there is an intermediate summaries/rabd_all.json file. Is this supposed to be generated by the script or separately?
Traceback (most recent call last):
File "data/download.py", line 13, in <module>
from .pdb_utils import Protein, AAComplex
ImportError: attempted relative import with no known parent package
Processing cdrh3
Valid entries after filtering with 111: 3127
Traceback (most recent call last):
File "data/split.py", line 232, in <module>
main(parse())
File "data/split.py", line 78, in main
rabd = load_file(args.rabd)
File "data/split.py", line 37, in load_file
with open(fpath, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'summaries/rabd_all.json'
2022-12-22 15:21:00::WARN::Faild to load file summaries/cdrh3/test_processed/_metainfo, error: [Errno 2] No such file or directory: 'summaries/cdrh3/test_processed/_metainfo'
Traceback (most recent call last):
File "data/dataset.py", line 303, in <module>
dataset = EquiAACDataset(args.dataset, args.save_dir, num_entry_per_file=-1)
File "data/dataset.py", line 63, in __init__
self.preprocess(file_path, save_dir, num_entry_per_file)
File "data/dataset.py", line 135, in preprocess
with open(file_path, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'summaries/cdrh3/test.json'
2022-12-22 15:21:02::WARN::Faild to load file summaries/cdrh3/valid_processed/_metainfo, error: [Errno 2] No such file or directory: 'summaries/cdrh3/valid_processed/_metainfo'
Traceback (most recent call last):
File "data/dataset.py", line 303, in <module>
dataset = EquiAACDataset(args.dataset, args.save_dir, num_entry_per_file=-1)
File "data/dataset.py", line 63, in __init__
self.preprocess(file_path, save_dir, num_entry_per_file)
File "data/dataset.py", line 135, in preprocess
with open(file_path, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'summaries/cdrh3/valid.json'
2022-12-22 15:21:04::WARN::Faild to load file summaries/cdrh3/train_processed/_metainfo, error: [Errno 2] No such file or directory: 'summaries/cdrh3/train_processed/_metainfo'
Traceback (most recent call last):
File "data/dataset.py", line 303, in <module>
dataset = EquiAACDataset(args.dataset, args.save_dir, num_entry_per_file=-1)
File "data/dataset.py", line 63, in __init__
self.preprocess(file_path, save_dir, num_entry_per_file)
File "data/dataset.py", line 135, in preprocess
with open(file_path, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'summaries/cdrh3/train.json'
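The first ImportError in this log is the message Python gives when a file containing a relative import is executed directly as a script rather than as a module of its package. The self-contained sketch below reproduces the difference with a throwaway package. The package and file names are hypothetical, and it does not show how prepare_data_rabd.sh actually invokes data/download.py.

```python
import os
import subprocess
import sys
import tempfile


def run_both_ways():
    """Run a module with a relative import as a script and with -m; return both results."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "pkg")  # hypothetical package name
        os.makedirs(pkg)
        files = {
            "__init__.py": "",
            "helper.py": "VALUE = 1\n",
            "main.py": "from .helper import VALUE\nprint(VALUE)\n",
        }
        for name, text in files.items():
            with open(os.path.join(pkg, name), "w") as f:
                f.write(text)
        # Direct execution: the file has no parent package, so the relative import fails.
        as_script = subprocess.run(
            [sys.executable, os.path.join("pkg", "main.py")],
            cwd=root, capture_output=True, text=True,
        )
        # Module execution from the project root: the package is known and the import works.
        as_module = subprocess.run(
            [sys.executable, "-m", "pkg.main"],
            cwd=root, capture_output=True, text=True,
        )
    return as_script, as_module
```

In the earlier k-fold log on this page, the download step is invoked as "python -m data.download", which is the module form.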
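Hello, for the baselines that follow Wengong Jin's setting, I noticed that the RMSD results in the paper are not consistent with those reported in the RefineGNN paper. Is this because the reproduced results differed even under the same dataset split and training settings?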