Comments (20)
In the meantime, it would be great if you could upload the scripts that generate the training features. Unfortunately, AFAICT they are missing. I'm especially interested in training the multimer variant. Thanks!
from uni-fold.
The multimer features are mostly the same as the monomer ones, except for the assembly of multiple chains.
You can refer to this script, https://github.com/dptech-corp/Uni-Fold/blob/main/scripts/get_pdb_assembly.py, to generate the "pdb_assembly.json" we used.
from uni-fold.
The dataset is very large, and we are looking for a data hosting solution. Last week we submitted a request through the "AWS Open Data Sponsorship Application", but we haven't received a response yet.
from uni-fold.
I am trying to download the "Full training dataset" using modelscope but the MsDataset.load() doesn't work for me because the connection gets broken by peer. The latest message I get is:
  File "/home/dmolodenskiy/.conda/envs/py38/lib/python3.8/site-packages/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
from uni-fold.
@DimaMolod Did it happen right at the beginning, or while the download was already in progress?
from uni-fold.
Hi @guolinke,
It happens after the call hangs for 10-20 minutes. It seems to be trying to connect during this time, and the error message finally appears once the connection times out.
The modelscope directory has been created with the following structure:
modelscope/
  hub/
    datasets/
      downloads/
        DPTech/
          Uni-Fold-Data/
            master/
              Uni-Fold-Data.json
              dataset_infos.json
Thanks for your help!
(I'll also copy the last few messages from python here just in case you find it useful)
>>> ds = MsDataset.load(dataset_name='Uni-Fold-Data', namespace='DPTech', split='train')
2022-11-18 09:49:40,975 - modelscope - WARNING - Reusing dataset Uni-Fold-Data's python file (modelscope/hub/datasets/downloads/DPTech/Uni-Fold-Data/master/Uni-Fold-Data.json)
2022-11-18 09:49:41,498 - modelscope - WARNING - Reusing dataset Uni-Fold-Data's python file (modelscope/hub/datasets/downloads/DPTech/Uni-Fold-Data/master/dataset_infos.json)
2022-11-18 09:49:41,499 - modelscope - INFO - No subset_name specified, defaulting to the default
from uni-fold.
I have the same issue. After retrying, I now get:
RequestError: {'status': -2, 'x-oss-request-id': '', 'details': "RequestError: HTTPSConnectionPool(host='dataset-hub.oss-cn-hangzhou.aliyuncs.com', port=443): Max retries exceeded with url: /public-unzip-dataset%2FDPTech%2FUni-Fold-Data%2Fmaster%2Fdatasets%2Fpdb_features%2F1e0z_A.feature.pkl.gz (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2b2f5d9bdd50>: Failed to establish a new connection: [Errno -2] Name or service not known'))"}
Does this include the training data for multimer?
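For anyone hitting the same message: "[Errno -2] Name or service not known" is a name-resolution failure, so it can help to check whether the host resolves at all before retrying the download. A minimal sketch, assuming plain Python with only the standard library; `can_resolve` and its `resolver` parameter are hypothetical names for illustration and are not part of modelscope:

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address.

    `resolver` is injectable only so the check can be exercised without
    network access; by default it is the standard socket.getaddrinfo.
    """
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Raised for DNS failures such as "Name or service not known".
        return False


if __name__ == "__main__":
    # Host name copied from the error message above.
    host = "dataset-hub.oss-cn-hangzhou.aliyuncs.com"
    print(host, "resolves" if can_resolve(host) else "does NOT resolve")
```

If this prints "does NOT resolve", the problem is on the DNS or network side rather than in the dataset itself.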
from uni-fold.
We will report the issues to the modelscope team. And yes, the multimer data is included.
from uni-fold.
The problem is due to the unstable network, as the data is hosted in China. The modelscope team promised they would fix it in the next 2 weeks.
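Until the hosting is fixed, one workaround for an unstable connection is to wrap the load call in a retry loop with exponential backoff. The following is only a sketch under stated assumptions: `load_with_retries` is a hypothetical helper, not a modelscope API; the default `retry_on=(OSError,)` is a guess that covers `ConnectionResetError` and the `requests` exceptions (which derive from `IOError`), but other error types seen in this thread may need to be added; and whether a retry resumes or restarts the download depends on modelscope's own caching.

```python
import random
import time


def load_with_retries(load_fn, max_attempts=5, base_delay_s=2.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call load_fn() and retry on transient errors with exponential backoff.

    load_fn      -- zero-argument callable that performs the download
    max_attempts -- total number of tries before giving up
    base_delay_s -- delay before the second try; doubles on each failure
    retry_on     -- exception types treated as transient (an assumption)
    sleep        -- injectable so the loop can be exercised without waiting
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return load_fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff plus a little jitter.
            delay = base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 1)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.1f}s")
            sleep(delay)


# Usage, with MsDataset imported as in the session quoted earlier:
# ds = load_with_retries(
#     lambda: MsDataset.load(dataset_name='Uni-Fold-Data',
#                            namespace='DPTech', split='train'))
```

This does not fix the underlying instability; it only saves restarting the call by hand after each dropped connection.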
from uni-fold.
Thanks! Maybe meanwhile you could provide a script to generate the training dataset directory from scratch (from the downloaded databases)? I couldn't find it in the scripts directory.
from uni-fold.
@DimaMolod The data generation code is almost the same as the one used in inference, except for the label extraction from mmcif. @ZiyaoLi maybe we can add a script for the mmcif processing.
BTW, our data generation code relies heavily on cloud services (mostly Ali-cloud), because it is impossible to generate the data on a single machine. In particular, it took us several months on hundreds of machines to generate these data. Therefore, we think it is not realistic to regenerate these data from scratch.
from uni-fold.
Any news?
from uni-fold.
@lhatsk We are waiting for the fix from the modelscope team and will post updates here.
from uni-fold.
I fixed the bug; please refer to modelscope/modelscope#51
@guolinke @lhatsk
from uni-fold.
Thanks! Unfortunately, it still doesn't work for me.
RequestError: {'status': -2, 'x-oss-request-id': '', 'details': "RequestError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))"}
from uni-fold.
@lhatsk @guolinke I succeeded yesterday but failed today. It seems the server is unstable. Is it possible to download the dataset from Baidu Drive? This issue has been open for too long.
from uni-fold.
Thank you, it would be very useful if you could upload a script for the label extraction from mmcif files.
from uni-fold.
@teslacool can you merge the code into this repo?
from uni-fold.
Hi,
I managed to resolve the 104 error shown above but then this ReadTimeoutError was reported. Could you maybe increase your default timeout from 60s to something longer?
Thanks a lot.
HTTPConnectionPool(host='www.modelscope.cn', port=80): Max retries exceeded with url: /api/v1/datasets/DPTech/Uni-Fold-Data/oss/tree/?MaxLimit=-1&Revision=master&Recursive=True&FilterDir=True (Caused by ReadTimeoutError("HTTPConnectionPool(host='www.modelscope.cn', port=80): Read timed out. (read timeout=60)"))
from uni-fold.
@henrywotton you can report the issue to https://github.com/modelscope/modelscope
from uni-fold.