Training Dataset about uni-fold HOT 20 OPEN

dptech-corp avatar dptech-corp commented on September 26, 2024
Training Dataset

from uni-fold.

Comments (20)

lhatsk avatar lhatsk commented on September 26, 2024 1

In the meantime, it would be great if you could upload the scripts that generate the training features. Unfortunately, AFAICT they are missing. I'm especially interested in training the multimer variant. Thanks!

guolinke avatar guolinke commented on September 26, 2024 1

The multimer features are mostly the same as the monomer ones, except for the assembly of multiple chains.
You can refer to this script, https://github.com/dptech-corp/Uni-Fold/blob/main/scripts/get_pdb_assembly.py, to generate the "pdb_assembly.json" we used.

guolinke avatar guolinke commented on September 26, 2024

The dataset is very large, and we are looking for a data hosting solution. Last week we submitted a request to the "AWS Open Data Sponsorship Application", but we have not received a response yet.

DimaMolod avatar DimaMolod commented on September 26, 2024

I am trying to download the "Full training dataset" using modelscope, but MsDataset.load() doesn't work for me because the connection gets reset by the peer. The last message I get is:

    File "/home/dmolodenskiy/.conda/envs/py38/lib/python3.8/site-packages/requests/models.py", line 818, in generate
        raise ChunkedEncodingError(e)
    requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

guolinke avatar guolinke commented on September 26, 2024

@DimaMolod Did it happen at the beginning, or when the download was already in progress?

DimaMolod avatar DimaMolod commented on September 26, 2024

Hi @guolinke,
it happens after 10-20 minutes of hanging. It seems to keep trying to connect during this time, and the error message finally appears once the connection times out.
The modelscope directory has been created with the following structure:

modelscope/
    hub/
        datasets/
            downloads/
                DPTech/
                    Uni-Fold-Data/
                        master/
                            Uni-Fold-Data.json
                            dataset_infos.json

Thanks for your help!

(I'll also copy the last few messages from Python here in case you find them useful.)

>>> ds = MsDataset.load(dataset_name='Uni-Fold-Data', namespace='DPTech', split='train')
2022-11-18 09:49:40,975 - modelscope - WARNING - Reusing dataset Uni-Fold-Data's python file (modelscope/hub/datasets/downloads/DPTech/Uni-Fold-Data/master/Uni-Fold-Data.json)
2022-11-18 09:49:41,498 - modelscope - WARNING - Reusing dataset Uni-Fold-Data's python file (modelscope/hub/datasets/downloads/DPTech/Uni-Fold-Data/master/dataset_infos.json)
2022-11-18 09:49:41,499 - modelscope - INFO - No subset_name specified, defaulting to the default
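
One generic way to cope with intermittent connection resets like the one reported here is to retry the call with exponential backoff. The following is a minimal sketch, not part of Uni-Fold or modelscope: `retry_with_backoff` and its parameters are hypothetical names introduced for illustration. It catches `OSError` by default, which covers `ConnectionResetError` and, as far as I know, the exceptions raised by `requests`. Whether a repeated `MsDataset.load()` call resumes a partial download is not confirmed in this thread.

```python
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=2.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call fn() until it succeeds or max_attempts is reached.

    The wait doubles after every failed attempt: base_delay, 2x, 4x, ...
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.0f}s")
            sleep(delay)


# Hypothetical usage with the call quoted above (requires modelscope):
# from modelscope.msdatasets import MsDataset
# ds = retry_with_backoff(lambda: MsDataset.load(
#     dataset_name='Uni-Fold-Data', namespace='DPTech', split='train'))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; pass a wider `retry_on` tuple if the library raises errors that do not derive from `OSError`.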

lhatsk avatar lhatsk commented on September 26, 2024

I have the same issue. After retrying, I now get:

RequestError: {'status': -2, 'x-oss-request-id': '', 'details': "RequestError: HTTPSConnectionPool(host='dataset-hub.oss-cn-hangzhou.aliyuncs.com', port=443): Max retries exceeded with url: /public-unzip-dataset%2FDPTech%2FUni-Fold-Data%2Fmaster%2Fdatasets%2Fpdb_features%2F1e0z_A.feature.pkl.gz (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2b2f5d9bdd50>: Failed to establish a new connection: [Errno -2] Name or service not known'))"}

Does this include the training data for multimer?
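
For readers trying to see which file the failed request points at, the percent-encoded URL path in the error can be decoded with the standard library. The sketch below also shows how a file with a `.feature.pkl.gz` suffix could be read, assuming (from the suffix alone, not confirmed in this thread) that it is a gzip-compressed pickle. Both function names are hypothetical.

```python
import gzip
import pickle
from urllib.parse import unquote


def object_key_from_url_path(path):
    """Decode a percent-encoded URL path into a readable object key."""
    return unquote(path).lstrip("/")


def load_feature_file(path):
    """Read a gzip-compressed pickle (assumed format of *.feature.pkl.gz).

    Unpickling can execute arbitrary code, so only open files you trust.
    """
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


key = object_key_from_url_path(
    "/public-unzip-dataset%2FDPTech%2FUni-Fold-Data%2Fmaster"
    "%2Fdatasets%2Fpdb_features%2F1e0z_A.feature.pkl.gz"
)
print(key)
# public-unzip-dataset/DPTech/Uni-Fold-Data/master/datasets/pdb_features/1e0z_A.feature.pkl.gz
```

The decoded key suggests that features are stored one file per PDB chain (here `1e0z`, chain `A`) under `pdb_features/`.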

guolinke avatar guolinke commented on September 26, 2024

We will report these issues to the modelscope team. And yes, the multimer data is included.

guolinke avatar guolinke commented on September 26, 2024

The problem is caused by an unstable network connection, as the data is hosted in China. The modelscope team promised to fix it within the next two weeks.

DimaMolod avatar DimaMolod commented on September 26, 2024

Thanks! In the meantime, could you provide a script to generate the training dataset directory from scratch (from the downloaded databases)? I couldn't find one in the scripts directory.

guolinke avatar guolinke commented on September 26, 2024

@DimaMolod The data generation code is almost the same as the code used in inference, except for the label extraction from mmCIF. @ZiyaoLi maybe we can add a script for the mmCIF processing.

BTW, our data generation code relies heavily on cloud services (mostly Ali-cloud), because it is impossible to generate the data on a single machine. In particular, it took us several months on hundreds of machines to generate this data. Therefore, we think it is not very realistic to generate the data from scratch.
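
To give a rough idea of what "label extraction from mmCIF" involves, here is a simplified, standard-library-only sketch that reads atom coordinates per chain from the `_atom_site` loop of an mmCIF file. This is not the Uni-Fold pipeline: `parse_atom_site` is a hypothetical helper, and it assumes unquoted, whitespace-separated values and a single model. Real files need a full mmCIF parser that handles quoting, multiple models, and alternate locations.

```python
def parse_atom_site(mmcif_text):
    """Return {chain_id: [(residue_number, atom_name, (x, y, z)), ...]}.

    Simplified: assumes whitespace-separated, unquoted values and one model.
    """
    lines = mmcif_text.splitlines()
    columns = []
    i = 0
    # Locate the loop_ block whose column names start with "_atom_site.".
    while i < len(lines):
        starts_loop = lines[i].strip() == "loop_"
        if starts_loop and i + 1 < len(lines) and lines[i + 1].startswith("_atom_site."):
            i += 1
            while i < len(lines) and lines[i].startswith("_atom_site."):
                columns.append(lines[i].strip().split(".", 1)[1])
                i += 1
            break
        i += 1

    chains = {}
    # Data rows run until a "#" separator, a new category, or a blank line.
    while i < len(lines):
        row = lines[i].split()
        if not row or row[0].startswith(("#", "_")) or row[0] == "loop_":
            break
        record = dict(zip(columns, row))
        if record.get("group_PDB") == "ATOM":  # skip HETATM (ligands, water)
            xyz = tuple(float(record[k]) for k in ("Cartn_x", "Cartn_y", "Cartn_z"))
            entry = (int(record["label_seq_id"]), record["label_atom_id"], xyz)
            chains.setdefault(record["label_asym_id"], []).append(entry)
        i += 1
    return chains


# Tiny made-up example, not taken from a real PDB entry.
SAMPLE = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N MET A 1 1.0 2.0 3.0
ATOM 2 CA MET A 1 1.5 2.5 3.5
ATOM 3 CA GLY B 1 9.0 8.0 7.0
HETATM 4 O HOH C . 0.0 0.0 0.0
#
"""

labels = parse_atom_site(SAMPLE)
print(sorted(labels))   # ['A', 'B']
print(labels["A"][1])   # (1, 'CA', (1.5, 2.5, 3.5))
```

Grouping by `label_asym_id` yields one coordinate list per chain, which is the kind of per-chain structure a multimer assembly step would then combine.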

lhatsk avatar lhatsk commented on September 26, 2024

Any news?

guolinke avatar guolinke commented on September 26, 2024

@lhatsk We are waiting for the fix from the modelscope team and will post updates here.

WeianMao avatar WeianMao commented on September 26, 2024

I fixed the bug; please refer to modelscope/modelscope#51.
@guolinke @lhatsk

lhatsk avatar lhatsk commented on September 26, 2024

Thanks! Unfortunately, it still doesn't work for me.

RequestError: {'status': -2, 'x-oss-request-id': '', 'details': "RequestError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))"}

WeianMao avatar WeianMao commented on September 26, 2024

@lhatsk @guolinke It succeeded for me yesterday but failed today. It seems the server is unstable. Is it possible to download the dataset from Baidu drive? This issue has been open for too long.

DimaMolod avatar DimaMolod commented on September 26, 2024

Thank you, it would be very useful if you could upload a script for the label extraction from mmcif files.

guolinke avatar guolinke commented on September 26, 2024

@teslacool can you merge the code into this repo?

dingquanyu avatar dingquanyu commented on September 26, 2024

Hi,

I managed to resolve the 104 error shown above but then this ReadTimeoutError was reported. Could you maybe increase your default timeout from 60s to something longer?

Thanks a lot.

HTTPConnectionPool(host='www.modelscope.cn', port=80): Max retries exceeded with url: /api/v1/datasets/DPTech/Uni-Fold-Data/oss/tree/?MaxLimit=-1&Revision=master&Recursive=True&FilterDir=True (Caused by ReadTimeoutError("HTTPConnectionPool(host='www.modelscope.cn', port=80): Read timed out. (read timeout=60)"))

guolinke avatar guolinke commented on September 26, 2024

@henrywotton you can report the issue to https://github.com/modelscope/modelscope
