Hi, Thank you for releasing the package! I wanted to check the procedure to ac

Access to .npy datasets about design-bench HOT 10 OPEN

preritt commented on June 3, 2024

Access to .npy datasets

from design-bench.

Comments (10)

brandontrabucco commented on June 3, 2024

Hello preritt,

Thanks for your interest in the benchmark. If you would like to download the entire benchmark at once to access the raw .npy files, they are available in the GCP bucket referenced here:

https://github.com/rail-berkeley/design-bench/blob/new-api/design_bench/disk_resource.py#L7

This post may be of interest if you are not familiar with gsutil:

https://stackoverflow.com/questions/58581873/how-to-download-an-entire-bucket-in-gcp

Generally speaking, the dataset files are downloaded from GCP as needed when design_bench.make is called. Could you share the full script that produces the error, along with the full stack trace?

Warm regards,
Brandon
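
For readers unfamiliar with gsutil, the bulk download described above can be sketched in Python by assembling the command and handing it to subprocess. This is only a rough sketch: the bucket name `example-bucket` and the destination directory are placeholders rather than the real values (the actual bucket is the one referenced in disk_resource.py), and both helper names are hypothetical.

```python
import subprocess


def build_gsutil_copy_command(bucket, destination):
    """Return the gsutil command that recursively copies a bucket.

    `-m` runs the copy in parallel and `-r` recurses into the bucket.
    """
    return ["gsutil", "-m", "cp", "-r", f"gs://{bucket}", destination]


def download_bucket(bucket, destination):
    """Run the copy; requires gsutil to be installed and on PATH."""
    subprocess.run(build_gsutil_copy_command(bucket, destination), check=True)


# Placeholder values for illustration only; nothing is downloaded here.
command = build_gsutil_copy_command("example-bucket", "./design_bench_data")
print(" ".join(command))
```

Calling `download_bucket` with the real bucket name would perform the actual copy; the snippet above only prints the command so it can be inspected first.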

preritt commented on June 3, 2024

Hi Brandon,

Sorry for the delayed response.
Thanks for the information!
Here is the code I used:

import design_bench

# task = design_bench.make('TGFP-Transformer-v0')
# task = design_bench.make('TFBind8-Exact-v0')
task = design_bench.make('ChEMBL-ResNet-v0')

This is the error:

Traceback (most recent call last):

  File "/BerkleyDesignBenchVer01/testBerkleyV1.py", line 12, in <module>
    task = design_bench.make('ChEMBL-ResNet-v0')

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 328, in make
    oracle_kwargs=oracle_kwargs, **kwargs)

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 157, in make
    oracle_kwargs=oracle_kwargs, **kwargs)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 111, in make
    oracle_kwargs=oracle_kwargs_final, **kwargs)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/task.py", line 245, in __init__
    dataset = import_name(dataset)(**kwargs)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/discrete/chembl_dataset.py", line 310, in __init__
    soft_interpolation=soft_interpolation, **kwargs)

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/discrete_dataset.py", line 279, in __init__
    super(DiscreteDataset, self).__init__(*args, **kwargs)

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 470, in __init__
    for i, y in enumerate(self.iterate_samples(return_x=False)):

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 865, in iterate_samples
    return_x=return_x, return_y=return_y):

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 762, in iterate_batches
    y_shard_data = self.get_shard_y(shard_id)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 566, in get_shard_y
    return np.load(self.y_shards[shard_id].disk_target)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/numpy/lib/npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))

FileNotFoundError: [Errno 2] No such file or directory: 'BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench_data/chembl-GI50-CHEMBL1964047/chembl-y-2.npy'

I'll try the GCP method and get back to you if I run into an error.

Thank you so much for your response!
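
The FileNotFoundError above names a single shard file that is absent from the local design_bench_data directory. One way to see which shards are missing before calling design_bench.make is sketched below. The `<prefix>-<index>.npy` naming is inferred from the one path in the traceback, and the shard count and the helper name are assumptions made for illustration.

```python
import os
import tempfile


def find_missing_shards(data_dir, prefix, num_shards):
    """List expected shard files that do not exist in data_dir."""
    missing = []
    for shard_id in range(num_shards):
        name = f"{prefix}-{shard_id}.npy"
        if not os.path.isfile(os.path.join(data_dir, name)):
            missing.append(name)
    return missing


# Demonstration on a throwaway directory that holds shards 0 and 1 only.
with tempfile.TemporaryDirectory() as demo_dir:
    for shard_id in (0, 1):
        open(os.path.join(demo_dir, f"chembl-y-{shard_id}.npy"), "wb").close()
    missing_shards = find_missing_shards(demo_dir, "chembl-y", 4)

print(missing_shards)
```

In a real check, `data_dir` would point at the dataset folder shown in the traceback instead of a temporary directory.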

brandontrabucco commented on June 3, 2024

Could you try calling design_bench.make on a ChEMBL task with the following format:

https://github.com/rail-berkeley/design-bench/blob/new-api/design_bench/__init__.py#L809

For example, design_bench.make("ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0")

preritt commented on June 3, 2024

I tried the following:
task = design_bench.make("ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0")
However, I now get the following error:

Traceback (most recent call last):

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 201, in spec
    return self.task_specs[task_name]

KeyError: 'ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "perspectaTestsVer2/perspectaV1/myCodesV9Della/BerkleyDesignBenchVer01/testBerkleyV1.py", line 13, in <module>
    task =design_bench.make("ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0")

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 328, in make
    oracle_kwargs=oracle_kwargs, **kwargs)

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 155, in make
    return self.spec(task_name).make(

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 232, in spec
    UNKNOWN_MESSAGE.format(task_name))

ValueError: No registered task with name: ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0

brandontrabucco commented on June 3, 2024

Could you check which version of the benchmark you have installed?

preritt commented on June 3, 2024

It is 2.0.12

design-bench 2.0.12 pypi_0 pypi

brandontrabucco commented on June 3, 2024

The latest version of the benchmark is 2.0.20. Could you try that version?
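
The comparison between an installed version and a newer release can also be done programmatically. The sketch below assumes plain dotted numeric version strings like the ones in this thread. `importlib.metadata` needs Python 3.8 or newer (on 3.7 the separately installed `importlib_metadata` backport provides the same function), and the helper names are hypothetical.

```python
def version_tuple(version):
    """Turn a dotted numeric version such as '2.0.12' into (2, 0, 12)."""
    return tuple(int(part) for part in version.split("."))


def is_outdated(installed, latest):
    """True when `installed` is strictly older than `latest`."""
    return version_tuple(installed) < version_tuple(latest)


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python 3.7: fall back to the backport
        from importlib_metadata import PackageNotFoundError, version
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


outdated = is_outdated("2.0.12", "2.0.20")
print(outdated)
print(installed_version("design-bench"))
```

Comparing integer tuples rather than raw strings matters for cases such as "2.0.9" versus "2.0.20", where a string comparison would give the wrong order.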

preritt commented on June 3, 2024

I have the correct version now:

design-bench 2.0.20 pypi_0 pypi

Not sure why, but now I get an import error when using:
import design_bench

runcell(0, '/BerkleyDesignBenchVer01/testBerkleyV1.py')
Traceback (most recent call last):

  File "/BerkleyDesignBenchVer01/testBerkleyV1.py", line 8, in <module>
    import design_bench

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/__init__.py", line 766, in <module>
    feature_extractor=MorganFingerprintFeatures(dtype=np.int32),

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/oracles/feature_extractors/morgan_fingerprint_features.py", line 74, in __init__
    os.path.join(DATA_DIR, 'smiles_vocab.txt'))

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/deepchem/feat/smiles_tokenizer.py", line 89, in __init__
    self.max_len_single_sentence = self.max_len - 2

AttributeError: 'SmilesTokenizer' object has no attribute 'max_len'

brandontrabucco commented on June 3, 2024

Ah, this can happen if an incompatible version of deepchem is installed. Can you try installing the version of deepchem listed here: https://github.com/brandontrabucco/design-baselines/blob/master/requirements.txt#L29

I'm not sure if that's the only package that may need an update, so perhaps check the whole requirements file.
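
Checking a whole requirements file against what is installed can be sketched as follows. This is an illustration only: it handles simple `name==version` pins and nothing else, the package names and versions in the example are made up (they are not the contents of the linked requirements file), and the helper names are hypothetical.

```python
def parse_pins(requirements_text):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip().lower()] = pinned.strip()
    return pins


def find_mismatches(pins, installed):
    """Return {name: (pinned, installed_or_None)} where the versions differ."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }


# Made-up requirements and installed versions, for illustration only.
example_requirements = """
# hypothetical contents
examplepkg==1.2.3
otherpkg==0.4.0  # inline comment
"""
example_installed = {"examplepkg": "1.2.3", "otherpkg": "0.5.1"}

mismatches = find_mismatches(parse_pins(example_requirements), example_installed)
print(mismatches)
```

In practice the `installed` mapping would be built from the active environment, for example from the output of `pip freeze`.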

preritt commented on June 3, 2024

Thanks a lot! I ran pip install on the requirements file, and that resolved the issue.
