Access to .npy datasets (design-bench, open)

preritt commented on June 3, 2024
Access to .npy datasets

from design-bench.

Comments (10)

brandontrabucco commented on June 3, 2024

Hello preritt,

Thanks for your interest in the benchmark. If you would like to download the entire benchmark at once to access the raw .npy files, they are available in the following GCP bucket:

https://github.com/rail-berkeley/design-bench/blob/new-api/design_bench/disk_resource.py#L7

This post may be of interest if you are not familiar with gsutil:

https://stackoverflow.com/questions/58581873/how-to-download-an-entire-bucket-in-gcp

Generally speaking, the dataset files are downloaded as needed from GCP when design_bench.make is called. Could you share the full script that produces the error, along with the full stack trace?

Warm regards,
Brandon
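For readers who would rather script the bulk download, here is a minimal sketch of building the gsutil copy command from Python. The bucket name is deliberately left as a placeholder, because the thread only links to the source line that defines it; `build_gsutil_copy_command`, `BUCKET_NAME`, and the destination path are hypothetical names chosen for illustration.

```python
import shlex


def build_gsutil_copy_command(bucket_name, destination):
    """Return the argument list for a recursive, parallel gsutil copy.

    bucket_name is a bare bucket name (no "gs://" prefix); it is a
    placeholder here and must be replaced with the real bucket.
    """
    if not bucket_name or "/" in bucket_name:
        raise ValueError("bucket_name should be a bare bucket name")
    source = "gs://{}".format(bucket_name)
    # -m runs transfers in parallel, cp -r copies the bucket recursively.
    return ["gsutil", "-m", "cp", "-r", source, destination]


# BUCKET_NAME is a placeholder, not the actual design-bench bucket.
command = build_gsutil_copy_command("BUCKET_NAME", "./design_bench_data")
print(" ".join(shlex.quote(part) for part in command))
```

The list can be passed to `subprocess.run(command, check=True)` once gsutil is installed and the placeholder has been replaced.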

preritt commented on June 3, 2024

Hi Brandon,

Sorry for the delayed response.
Thanks for the information!
Here is the code I used:

import design_bench

# task = design_bench.make('TGFP-Transformer-v0')
# task = design_bench.make('TFBind8-Exact-v0')
task = design_bench.make('ChEMBL-ResNet-v0')

This is the error:

Traceback (most recent call last):

  File "/BerkleyDesignBenchVer01/testBerkleyV1.py", line 12, in <module>
    task = design_bench.make('ChEMBL-ResNet-v0')

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 328, in make
    oracle_kwargs=oracle_kwargs, **kwargs)

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 157, in make
    oracle_kwargs=oracle_kwargs, **kwargs)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 111, in make
    oracle_kwargs=oracle_kwargs_final, **kwargs)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/task.py", line 245, in __init__
    dataset = import_name(dataset)(**kwargs)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/discrete/chembl_dataset.py", line 310, in __init__
    soft_interpolation=soft_interpolation, **kwargs)

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/discrete_dataset.py", line 279, in __init__
    super(DiscreteDataset, self).__init__(*args, **kwargs)

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 470, in __init__
    for i, y in enumerate(self.iterate_samples(return_x=False)):

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 865, in iterate_samples
    return_x=return_x, return_y=return_y):

  File "/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 762, in iterate_batches
    y_shard_data = self.get_shard_y(shard_id)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/datasets/dataset_builder.py", line 566, in get_shard_y
    return np.load(self.y_shards[shard_id].disk_target)

  File "BerkleyDesignBenchV1/lib/python3.7/site-packages/numpy/lib/npyio.py", line 416, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))

FileNotFoundError: [Errno 2] No such file or directory: 'BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench_data/chembl-GI50-CHEMBL1964047/chembl-y-2.npy'

I'll try the GCP method and get back in case of error.

Thank you so much for your response!
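Since the failure above is a single missing shard file, a quick way to see how incomplete a local dataset directory is could look like the sketch below. It assumes shards are named `<prefix>-x-<index>.npy` and `<prefix>-y-<index>.npy` with indices starting at 0; only `chembl-y-2.npy` appears in the traceback, so the `x` counterpart and the numbering scheme are assumptions, and `find_missing_shards` is a hypothetical helper rather than part of design-bench.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming scheme: <prefix>-x-<index>.npy and <prefix>-y-<index>.npy.
SHARD_PATTERN = re.compile(r"^(?P<prefix>.+)-(?P<kind>[xy])-(?P<index>\d+)\.npy$")


def find_missing_shards(data_dir):
    """List shard file names that are absent from data_dir.

    For every prefix found, indices 0..highest are expected to exist
    for both the x and the y shard.
    """
    present = set()
    for path in Path(data_dir).glob("*.npy"):
        match = SHARD_PATTERN.match(path.name)
        if match:
            present.add((match["prefix"], match["kind"], int(match["index"])))

    highest = {}
    for prefix, _, index in present:
        highest[prefix] = max(index, highest.get(prefix, 0))

    missing = []
    for prefix in sorted(highest):
        for index in range(highest[prefix] + 1):
            for kind in ("x", "y"):
                if (prefix, kind, index) not in present:
                    missing.append("{}-{}-{}.npy".format(prefix, kind, index))
    return missing


# Demo on a throwaway directory with one y shard deliberately left out.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("chembl-x-0.npy", "chembl-y-0.npy", "chembl-x-1.npy"):
        (Path(demo_dir) / name).touch()
    demo_missing = find_missing_shards(demo_dir)
print(demo_missing)
```

Pointing the helper at a dataset folder inside `design_bench_data` would show whether the download stopped partway, though it cannot detect shards beyond the highest index that is present.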

brandontrabucco commented on June 3, 2024

Could you try calling design_bench.make on a ChEMBL task with the following format:

https://github.com/rail-berkeley/design-bench/blob/new-api/design_bench/__init__.py#L809

For example, design_bench.make("ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0")

preritt commented on June 3, 2024

I tried the following:
task =design_bench.make("ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0")
However, I now get the following error:

Traceback (most recent call last):

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 201, in spec
    return self.task_specs[task_name]

KeyError: 'ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "perspectaTestsVer2/perspectaV1/myCodesV9Della/BerkleyDesignBenchVer01/testBerkleyV1.py", line 13, in <module>
    task =design_bench.make("ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0")

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 328, in make
    oracle_kwargs=oracle_kwargs, **kwargs)

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 155, in make
    return self.spec(task_name).make(

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/registration.py", line 232, in spec
    UNKNOWN_MESSAGE.format(task_name))

ValueError: No registered task with name: ChEMBL_MCHC_CHEMBL3885882_MorganFingerprint-RandomForest-v0
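When a task name is rejected like this, it can help to compare it against the names that are actually registered in the installed version. The sketch below does that with the standard library's `difflib`. The traceback shows the registry looking names up in `task_specs`, but how to reach that mapping from user code is not shown in this thread, so the hypothetical helper `suggest_task_names` simply takes a list of names supplied by the caller.

```python
import difflib


def suggest_task_names(requested, registered_names, limit=3):
    """Return up to `limit` registered names that closely resemble `requested`."""
    return difflib.get_close_matches(
        requested, list(registered_names), n=limit, cutoff=0.6)


# Illustrative list built from task names mentioned earlier in this thread;
# a real check would pass the names registered in the installed version.
known_names = [
    "TGFP-Transformer-v0",
    "TFBind8-Exact-v0",
    "ChEMBL-ResNet-v0",
]
suggestions = suggest_task_names("ChEMBL-ResNet-v1", known_names)
print(suggestions)
```

An empty result suggests the name is not a typo and that the installed release simply does not register that task, which is what the version question below gets at.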

brandontrabucco commented on June 3, 2024

Could you check which version number of the benchmark you have installed?

preritt commented on June 3, 2024

It is 2.0.12

design-bench 2.0.12 pypi_0 pypi

brandontrabucco commented on June 3, 2024

The latest version of the benchmark is 2.0.20. Could you try that version?
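A version check like this one can also be done from Python. The following is a minimal sketch that assumes plain dotted numeric versions such as 2.0.12 (it does not handle pre-release or local version tags); `parse_version`, `is_older`, and `installed_version` are hypothetical helper names. `importlib.metadata` is only available from Python 3.8, so on the Python 3.7 environment seen in the tracebacks `installed_version` returns None.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.0.12' into (2, 0, 12)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(installed, required):
    """Compare versions numerically; plain string comparison would misorder them."""
    return parse_version(installed) < parse_version(required)


def installed_version(distribution):
    """Return the installed version of a distribution, or None if unavailable."""
    try:
        from importlib import metadata  # standard library from Python 3.8
    except ImportError:
        return None
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# The two versions discussed in this thread.
print(is_older("2.0.12", "2.0.20"))
```

Comparing tuples of integers matters because as strings "2.0.9" sorts after "2.0.20".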

preritt commented on June 3, 2024

I have the correct version now:

design-bench 2.0.20 pypi_0 pypi

Not sure why, but now I get an import error when using:
import design_bench

runcell(0, '/BerkleyDesignBenchVer01/testBerkleyV1.py')
Traceback (most recent call last):

  File "/BerkleyDesignBenchVer01/testBerkleyV1.py", line 8, in <module>
    import design_bench

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/__init__.py", line 766, in <module>
    feature_extractor=MorganFingerprintFeatures(dtype=np.int32),

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/design_bench/oracles/feature_extractors/morgan_fingerprint_features.py", line 74, in __init__
    os.path.join(DATA_DIR, 'smiles_vocab.txt'))

  File "condaEnvs/BerkleyDesignBenchV1/lib/python3.7/site-packages/deepchem/feat/smiles_tokenizer.py", line 89, in __init__
    self.max_len_single_sentence = self.max_len - 2

AttributeError: 'SmilesTokenizer' object has no attribute 'max_len'

brandontrabucco commented on June 3, 2024

Ah, this can happen if an incompatible version of deepchem is installed. Can you try installing the version of deepchem listed here: https://github.com/brandontrabucco/design-baselines/blob/master/requirements.txt#L29

I'm not sure if that's the only package that may need an update, so perhaps check the whole requirements file.

preritt commented on June 3, 2024

Thanks a lot! I did a pip install on the requirements and it resolved the issue.
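For anyone who wants to see which pins differ before reinstalling, here is a rough sketch that reads `name==version` lines from requirements text and compares them with a mapping of installed versions. The package names and versions in the example are made up for illustration and are not taken from the linked requirements file; `parse_pins` and `find_mismatches` are hypothetical helpers.

```python
def parse_pins(requirements_text):
    """Collect exact pins (name==version) from requirements-style text."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and environment markers before parsing.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed):
    """Map each pinned name to (wanted, found) where the versions differ."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


# Made-up example data; not the contents of the real requirements file.
example_text = """
examplepkg==1.2.3
# a comment line
otherpkg>=2.0
ThirdPkg == 0.9  # trailing note
"""
example_installed = {"examplepkg": "1.2.3", "thirdpkg": "1.0"}
mismatches = find_mismatches(parse_pins(example_text), example_installed)
print(mismatches)
```

Only exact `==` pins are considered; range specifiers such as `>=` are skipped rather than evaluated.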
