mist's People

Contributors

adafede, connorcoley, samgoldman97

mist's Issues

05_run_models.sh fails without GPU

Note that cudatoolkit is already commented out in environment.yml as of 2c4c801.

Dockerfile:

FROM mambaorg/micromamba:latest

COPY --chown=$MAMBA_USER:$MAMBA_USER mist/environment.yml /tmp/env.yml
RUN micromamba install -y -n base -f /tmp/env.yml && \
    micromamba clean --all --yes

RUN micromamba install -y -n base -c conda-forge \
    git curl wget unzip

COPY --chown=$MAMBA_USER:$MAMBA_USER mist /mist
ARG MAMBA_DOCKERFILE_ACTIVATE=1
RUN pip install -q --exists-action i -r /mist/requirements.txt && \
    cd /mist && \
    wget -q https://bio.informatik.uni-jena.de/repository/dist-release-local/de/unijena/bioinf/ms/sirius/4.9.3/sirius-4.9.3-linux64-headless.zip && \
    unzip -q sirius-4.9.3-linux64-headless.zip && \
    rm sirius-4.9.3-linux64-headless.zip && \
    python setup.py develop

Building:

git clone git@github.com:samgoldman97/mist.git
docker build -t mist .
source quickstart/00_download_models.sh
# works
python quickstart/01_reformat_mgf.py
# works (except mkdir: cannot create directory ‘data/paired_spectra/quickstart’: File exists)
source quickstart/02_run_sirius.sh
# works
python quickstart/03_summarize_sirius.py
# works
python quickstart/04_create_lookup.py
# works
source quickstart/05_run_models.sh
# gives the following error:
mkdir: cannot create directory ‘quickstart/model_predictions’: File exists
Traceback (most recent call last):
  File "run_scripts/pred_fp.py", line 8, in <module>
    pred_fp.run_fp_pred()
  File "/mist/src/mist/pred_fp.py", line 70, in run_fp_pred
    pretrain_ckpt = torch.load(model_ckpt)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 857, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 846, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
Traceback (most recent call last):
  File "run_scripts/retrieval_contrastive.py", line 9, in <module>
    retrieval_contrast.run_contrastive_retrieval()
  File "/mist/src/mist/retrieval_contrast.py", line 123, in run_contrastive_retrieval
    pretrain_ckpt = torch.load(model_ckpt)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 857, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 846, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
100%|███████████████████████████████████████████| 7/7 [00:00<00:00, 2498.73it/s]
Traceback (most recent call last):
  File "run_scripts/embed_contrastive.py", line 9, in <module>
    embed_contrast.embed_specs()
  File "/mist/src/mist/embed_contrast.py", line 62, in embed_specs
    pretrain_ckpt = torch.load(model_ckpt)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 607, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 882, in _load
    result = unpickler.load()
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 857, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 846, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 151, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/opt/conda/lib/python3.8/site-packages/torch/serialization.py", line 135, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I have not yet tried adding --gpu false to the quickstart scripts; that parameter looks like it would affect pred_fp and retrieval_contrastive.
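All three tracebacks fail at a bare torch.load(model_ckpt) call, and the RuntimeError itself names the remedy: pass map_location to torch.load. A minimal sketch, assuming the call sites can be edited; load_checkpoint is a hypothetical helper, not part of MIST.

```python
import torch


def load_checkpoint(model_ckpt, device=None):
    """Load a checkpoint, remapping CUDA-saved tensors when no GPU is present.

    Hypothetical helper: it mirrors the bare torch.load(model_ckpt) calls in
    the tracebacks but adds the map_location the error message asks for.
    """
    if device is None:
        # Use the GPU if one is visible, otherwise fall back to the CPU
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # map_location moves every stored tensor onto `device` while unpickling
    return torch.load(model_ckpt, map_location=device)
```

At each failing call site, pretrain_ckpt = torch.load(model_ckpt) would become pretrain_ckpt = load_checkpoint(model_ckpt). Choosing the device at load time keeps GPU machines on CUDA while letting a CPU-only container fall back to the CPU.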

Exporting Molecular Fingerprints of Compounds

Hello, Sam. Your work is highly meaningful. I would like to ask whether it is possible to export the molecular fingerprints for the MS/MS spectra, as well as the molecular fingerprints of compounds with known structures. I would greatly appreciate any clarification.
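For compounds with known structures, one possible route is to compute the fingerprints directly with RDKit, which appears in the conda environment listed in the macOS issue below. A minimal sketch, assuming that morgan4096 means a 4096-bit Morgan fingerprint of radius 2; the radius is a guess, and morgan_bits and export_fingerprints are hypothetical helpers, not MIST functions. Exporting the fingerprints MIST predicts from MS/MS spectra is not covered here.

```python
import csv

from rdkit import Chem
from rdkit.Chem import AllChem


def morgan_bits(smiles, radius=2, n_bits=4096):
    """Return a Morgan fingerprint as a '0'/'1' string, or None if the SMILES fails to parse.

    radius=2 is an assumption; check MIST's own morgan4096 featurizer before
    comparing these bits with model predictions.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return fp.ToBitString()


def export_fingerprints(smiles_list, out_path):
    """Write a TSV with a header, then one row per parsable molecule: SMILES and bit string."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["smiles", "morgan4096"])
        for smi in smiles_list:
            bits = morgan_bits(smi)
            if bits is not None:  # silently skip unparsable SMILES
                writer.writerow([smi, bits])
```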

MIST fingerprint model on Mac OS

I am having trouble running the MIST fingerprint model on macOS. I have been able to run everything up to the FFN binned model, except for the MIST fingerprint model. I rebuilt the environment from scratch because conda env create -f environment.yml failed with UnsatisfiableError: The following specifications were found to be incompatible with each other: ... . I have attached the output of conda list below in case it helps. The error I am getting is TypeError: h5py objects cannot be pickled. I suspect it is an OS issue, because I have run the same procedure on Linux and python run_scripts/train_mist.py works there.

  • Error running MIST fingerprint model
(ms-gen) michaelvolk@M1-MV mist % mkdir results/model_train_demos
python run_scripts/train_mist.py --cache-featurizers --dataset-name 'canopus_train_public' --fp-names morgan4096 --num-workers 10 --seed 1 --gpus 0 --split-file 'data/paired_spectra/canopus_train_public/splits/canopus_hplus_100_0.csv' --splitter-name 'preset' --augment-data --augment-prob 0.5 --batch-size 128 --inten-prob 0.1 --remove-prob 0.5 --remove-weights 'exp' --iterative-preds 'growing' --iterative-loss-weight 0.4 --learning-rate 0.00077 --weight-decay 1e-07 --max-epochs 600 --min-lr 0.0001 --lr-decay-time 10000 --lr-decay-frac 0.95 --hidden-size 256 --num-heads 8 --pairwise-featurization --peak-attn-layers 2 --refine-layers 4 --set-pooling 'cls' --spectra-dropout 0.1 --single-form-encoder --recycle-form-encoder --use-cls --cls-type 'ms1' --loss-fn 'cosine' --magma-aux-loss --frag-fps-loss-lambda 8 --magma-modulo 512 --patience 30 --save-dir 'mist_fp_model' --save-dir results/model_train_demos/mist_fp_model
mkdir: results/model_train_demos: File exists
Global seed set to 1
2023-04-13 08:40:49,196 INFO: add_forward_specs: false
additive_attn: false
augment_data: true
augment_prob: 0.5
batch_size: 128
cache_featurizers: true
ckpt_file: null
cls_type: ms1
dataset_name: canopus_train_public
debug: false
forward_aug_folder: null
fp_names:
- morgan4096
frac_orig: 0.4
frag_fps_loss_lambda: 8.0
gpus: 0
gradient_clip_val: 5
hidden_size: 256
inten_prob: 0.1
iterative_loss_weight: 0.4
iterative_preds: growing
learning_rate: 0.00077
loss_fn: cosine
lr_decay_frac: 0.95
lr_decay_time: 10000
magma_aux_loss: true
magma_modulo: 512
max_epochs: 600
max_peaks: null
min_epochs: null
min_lr: 0.0001
num_heads: 8
num_workers: 10
optim_name: radam
pairwise_featurization: true
patience: 30
peak_attn_layers: 2
persistent_workers: false
recycle_form_encoder: true
refine_layers: 4
remove_prob: 0.5
remove_weights: exp
reshuffle_val: false
save_dir: results/model_train_demos/mist_fp_model
scheduler: false
seed: 1
set_pooling: cls
shuffle_train: false
single_form_encoder: true
spectra_dropout: 0.1
split_file: data/paired_spectra/canopus_train_public/splits/canopus_hplus_100_0.csv
split_sizes:
- 0.8
- 0.1
- 0.1
splitter_name: preset
top_layers: 1
use_cls: true
weight_decay: 1.0e-07
worst_k_weight: null

2023-04-13 08:40:49,584 INFO: Loading paired specs
2023-04-13 08:40:49,921 INFO: Converting paired samples into Spectra objects
10709it [00:00, 154101.73it/s]
10709it [00:01, 9556.15it/s]
10709it [00:00, 5535034.08it/s]
2023-04-13 08:40:51,128 INFO: Done creating spectra objects
2023-04-13 08:40:51,225 INFO: Len of train: 6141
2023-04-13 08:40:51,225 INFO: Len of val: 1070
2023-04-13 08:40:51,225 INFO: Len of test: 819
2023-04-13 08:40:51,247 INFO: Created a temporary directory at /var/folders/t3/hcfdx0qs0rsd9bm4230xv_zc0000gn/T/tmp3ont5vbm
2023-04-13 08:40:51,247 INFO: Writing /var/folders/t3/hcfdx0qs0rsd9bm4230xv_zc0000gn/T/tmp3ont5vbm/_remote_module_non_scriptable.py
2023-04-13 08:40:51,307 INFO: Starting fold: Fold_100_0
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:611: UserWarning: Checkpoint directory /Users/michaelvolk/Documents/projects/mist/results/model_train_demos/mist_fp_model/Fold_100_0 exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")

  | Name            | Type       | Params
-----------------------------------------------
0 | bce_loss        | BCELoss    | 0     
1 | spectra_encoder | ModuleList | 15.0 M
-----------------------------------------------
15.0 M    Trainable params
8.2 K     Non-trainable params
15.0 M    Total params
59.924    Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "run_scripts/train_mist.py", line 8, in <module>
    train_mist.run_training()
  File "/Users/michaelvolk/Documents/projects/mist/src/mist/train_mist.py", line 78, in run_training
    test_loss = model.train_model(
  File "/Users/michaelvolk/Documents/projects/mist/src/mist/models/base.py", line 320, in train_model
    trainer.fit(self, module)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 199, in run
    self.on_run_start(*args, **kwargs)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 88, in on_run_start
    self._data_fetcher = iter(data_fetcher)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/pytorch_lightning/utilities/fetching.py", line 178, in __iter__
    self.dataloader_iter = iter(self.dataloader)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1043, in __init__
    w.start()
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/Users/michaelvolk/opt/miniconda3/envs/ms-gen/lib/python3.8/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
  • Environment.
(ms-gen) michaelvolk@M1-MV mist % conda list
# packages in environment at /Users/michaelvolk/opt/miniconda3/envs/ms-gen:
#
# Name                    Version                   Build  Channel
absl-py                   1.4.0                    pypi_0    pypi
aiohttp                   3.8.4                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
alembic                   1.10.3                   pypi_0    pypi
appnope                   0.1.3                    pypi_0    pypi
asttokens                 2.2.1                    pypi_0    pypi
async-timeout             4.0.2                    pypi_0    pypi
attrs                     22.2.0                   pypi_0    pypi
backcall                  0.2.0                    pypi_0    pypi
bzip2                     1.0.8                h3422bc3_4    conda-forge
ca-certificates           2022.12.7            h4653dfc_0    conda-forge
cachetools                5.3.0                    pypi_0    pypi
cairocffi                 1.5.0                    pypi_0    pypi
cairosvg                  2.7.0                    pypi_0    pypi
certifi                   2022.12.7                pypi_0    pypi
cffi                      1.15.1                   pypi_0    pypi
charset-normalizer        3.1.0                    pypi_0    pypi
click                     8.1.3                    pypi_0    pypi
cloudpickle               2.2.1                    pypi_0    pypi
cmaes                     0.9.1                    pypi_0    pypi
colorlog                  6.7.0                    pypi_0    pypi
comm                      0.1.3                    pypi_0    pypi
contourpy                 1.0.7                    pypi_0    pypi
cssselect2                0.7.0                    pypi_0    pypi
cycler                    0.11.0                   pypi_0    pypi
debugpy                   1.6.7                    pypi_0    pypi
decorator                 5.1.1                    pypi_0    pypi
defusedxml                0.7.1                    pypi_0    pypi
dill                      0.3.6                    pypi_0    pypi
distlib                   0.3.6                    pypi_0    pypi
einops                    0.6.0                    pypi_0    pypi
executing                 1.2.0                    pypi_0    pypi
filelock                  3.11.0                   pypi_0    pypi
fonttools                 4.39.3                   pypi_0    pypi
frozenlist                1.3.3                    pypi_0    pypi
fsspec                    2023.4.0                 pypi_0    pypi
future                    0.18.3                   pypi_0    pypi
google-auth               2.17.3                   pypi_0    pypi
google-auth-oauthlib      1.0.0                    pypi_0    pypi
grpcio                    1.49.1                   pypi_0    pypi
h5py                      3.8.0                    pypi_0    pypi
hyperopt                  0.2.7                    pypi_0    pypi
idna                      3.4                      pypi_0    pypi
importlib-metadata        6.3.0              pyha770c72_0    conda-forge
importlib-resources       5.12.0                   pypi_0    pypi
importlib_metadata        6.3.0                hd8ed1ab_0    conda-forge
ipykernel                 6.22.0                   pypi_0    pypi
ipython                   8.12.0                   pypi_0    pypi
jedi                      0.18.2                   pypi_0    pypi
jinja2                    3.1.2                    pypi_0    pypi
joblib                    1.2.0                    pypi_0    pypi
jsonschema                4.17.3                   pypi_0    pypi
jupyter_client            8.1.0              pyhd8ed1ab_0    conda-forge
jupyter_core              5.3.0            py38h10201cd_0    conda-forge
kiwisolver                1.4.4                    pypi_0    pypi
libcxx                    16.0.1               h75e25f2_0    conda-forge
libffi                    3.4.2                h3422bc3_5    conda-forge
libsodium                 1.0.18               h27ca646_1    conda-forge
libsqlite                 3.40.0               h76d750c_0    conda-forge
libzlib                   1.2.13               h03a7124_4    conda-forge
llvmlite                  0.39.1                   pypi_0    pypi
mako                      1.2.4                    pypi_0    pypi
markdown                  3.4.3                    pypi_0    pypi
markupsafe                2.1.2                    pypi_0    pypi
matplotlib                3.7.1                    pypi_0    pypi
matplotlib-inline         0.1.6                    pypi_0    pypi
mist                      -version.number-.0.0.1           dev_0    <develop>
mpmath                    1.3.0                    pypi_0    pypi
msgpack                   1.0.5                    pypi_0    pypi
multidict                 6.0.4                    pypi_0    pypi
multiprocess              0.70.14                  pypi_0    pypi
nb_conda_kernels          2.3.1            py38h10201cd_2    conda-forge
ncurses                   6.3                  h07bb92c_1    conda-forge
nest-asyncio              1.5.6                    pypi_0    pypi
networkx                  3.1                      pypi_0    pypi
numba                     0.56.4                   pypi_0    pypi
numpy                     1.23.5                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
openssl                   3.1.0                h03a7124_0    conda-forge
optuna                    3.1.1                    pypi_0    pypi
packaging                 23.1                     pypi_0    pypi
pandas                    2.0.0                    pypi_0    pypi
parso                     0.8.3                    pypi_0    pypi
pathos                    0.3.0                    pypi_0    pypi
pexpect                   4.8.0                    pypi_0    pypi
pickleshare               0.7.5                    pypi_0    pypi
pillow                    9.5.0                    pypi_0    pypi
pip                       23.0.1             pyhd8ed1ab_0    conda-forge
pkgutil-resolve-name      1.3.10                   pypi_0    pypi
platformdirs              3.2.0              pyhd8ed1ab_0    conda-forge
pox                       0.3.2                    pypi_0    pypi
ppft                      1.7.6.6                  pypi_0    pypi
prompt-toolkit            3.0.38                   pypi_0    pypi
protobuf                  3.20.1                   pypi_0    pypi
psutil                    5.9.4                    pypi_0    pypi
ptyprocess                0.7.0                    pypi_0    pypi
pure-eval                 0.2.2                    pypi_0    pypi
py4j                      0.10.9.7                 pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pycparser                 2.21                     pypi_0    pypi
pydeprecate               0.3.2                    pypi_0    pypi
pygments                  2.15.0                   pypi_0    pypi
pynndescent               0.5.8                    pypi_0    pypi
pyparsing                 3.0.9                    pypi_0    pypi
pyrsistent                0.19.3                   pypi_0    pypi
python                    3.8.16          h3ba56d0_1_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.8                      3_cp38    conda-forge
pytorch-lightning         1.6.5                    pypi_0    pypi
pytz                      2023.3                   pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
pyzmq                     25.0.2           py38hb72be9f_0    conda-forge
radam                     0.0.1                    pypi_0    pypi
ray                       2.3.1                    pypi_0    pypi
ray-lightning             0.3.0                    pypi_0    pypi
rdkit                     2022.9.5                 pypi_0    pypi
readline                  8.2                  h92ec313_1    conda-forge
requests                  2.28.2                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
scikit-learn              1.2.2                    pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
seaborn                   0.12.2                   pypi_0    pypi
setuptools                59.5.0                   pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlalchemy                2.0.9                    pypi_0    pypi
stack-data                0.6.2                    pypi_0    pypi
sympy                     1.11.1                   pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tensorboard               2.12.1                   pypi_0    pypi
tensorboard-data-server   0.7.0                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorboardx              2.6                      pypi_0    pypi
threadpoolctl             3.1.0                    pypi_0    pypi
tinycss2                  1.2.1                    pypi_0    pypi
tk                        8.6.12               he1e0b03_0    conda-forge
torch                     2.0.0                    pypi_0    pypi
torchaudio                2.0.1                    pypi_0    pypi
torchmetrics              0.11.4                   pypi_0    pypi
torchvision               0.15.1                   pypi_0    pypi
tornado                   6.2              py38hb991d35_1    conda-forge
tqdm                      4.65.0                   pypi_0    pypi
traitlets                 5.9.0              pyhd8ed1ab_0    conda-forge
typing-extensions         4.5.0                hd8ed1ab_0    conda-forge
typing_extensions         4.5.0              pyha770c72_0    conda-forge
tzdata                    2023.3                   pypi_0    pypi
umap-learn                0.5.3                    pypi_0    pypi
urllib3                   1.26.15                  pypi_0    pypi
virtualenv                20.21.0                  pypi_0    pypi
wcwidth                   0.2.6                    pypi_0    pypi
webencodings              0.5.1                    pypi_0    pypi
werkzeug                  2.2.3                    pypi_0    pypi
wheel                     0.40.0                   pypi_0    pypi
xz                        5.2.6                h57fd34a_0    conda-forge
yarl                      1.8.2                    pypi_0    pypi
zeromq                    4.3.4                hbdafb3b_1    conda-forge
zipp                      3.15.0             pyhd8ed1ab_0    conda-forge
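The traceback ends inside popen_spawn_posix, i.e. the DataLoader is pickling the dataset to start its worker processes. Since Python 3.8 the default multiprocessing start method on macOS is spawn, while Linux uses fork, which would explain why the same command works on Linux. Two possible workarounds, offered as assumptions rather than a confirmed fix: pass --num-workers 0 (the flag already used in the command above), presumably so that no worker processes are spawned, or keep only the file path in the dataset and open the HDF5 file lazily inside each worker. A minimal sketch of the second pattern; LazyFileDataset is hypothetical, and it opens a plain binary file so it runs without h5py (in MIST one would swap open(...) for h5py.File(self.path, "r")).

```python
import os


class LazyFileDataset:
    """Hypothetical dataset that stores a path and opens the file on first use.

    An open handle (an h5py.File in MIST's case, a plain binary file here)
    cannot be pickled, so __getstate__ drops it and each spawned worker
    reopens the file the first time it reads a record.
    """

    def __init__(self, path, record_size):
        self.path = path
        self.record_size = record_size
        self._handle = None  # opened lazily, once per process

    def _file(self):
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __len__(self):
        return os.path.getsize(self.path) // self.record_size

    def __getitem__(self, idx):
        handle = self._file()
        handle.seek(idx * self.record_size)
        return handle.read(self.record_size)

    def __getstate__(self):
        # Copy every attribute except the unpicklable handle before spawn pickles us
        state = self.__dict__.copy()
        state["_handle"] = None
        return state
```

Reopening per worker also avoids sharing one HDF5 handle across processes, which is generally discouraged even where fork makes it possible.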

csi2022 Data

If we would like to rebuild the csi2022 dataset to do some benchmarking, what is the best way to go about this? After obtaining the necessary license for NIST20, I'd like to make sure our dataset will be as similar to csi2022 as possible. Thanks for any recommendations.

Reproducing CANOPUS benchmark

Dear Sam,

Very interesting work and it is great that you made the data and code publicly available. I am trying to reproduce the evaluation on the CANOPUS benchmark proposed in section 2.6 and have a few questions. I would appreciate your help in clarifying them:

  • When retrieving candidate molecules, do you consider only a set of isomers with unique fingerprints or all the PubChem formula isomers? I am asking because it significantly impacts the accuracy at k > 1.
  • Do you compute accuracy based on the first 14 characters of the InChIKey?
  • The folder canopus_train_public/retrieval_hdf seems to be empty. Do I understand it correctly that I should obtain candidate isomers by simply filtering cid_smiles.txt by the ground truth chemical formulas for each sample?
  • Could you please explain the following lines? They probably answer my first question, but I cannot understand their meaning. :) "For all ties, the optimistic lower rank of the tied options is chosen. Ties are broken by selecting the minimum rank."

Thank you in advance!

Roman
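The tie rule quoted in the last bullet can be read as: the true molecule's rank is one plus the number of candidates that score strictly higher, so candidates tied with it never push it down. A minimal sketch of that reading, combined with matching on the first 14 characters of the InChIKey. Collapsing candidates that share those 14 characters to their best score is an assumption corresponding to one possible answer to the first question, and the function names are hypothetical, not MIST's evaluation code.

```python
def inchikey_block(inchikey):
    """First (connectivity) block of an InChIKey: its first 14 characters."""
    return inchikey[:14]


def optimistic_rank(scores, true_key):
    """1-based rank of the true molecule, resolving ties optimistically.

    `scores` maps candidate InChIKeys to model similarity (higher is better).
    Candidates sharing a 14-character block are collapsed to their best score
    (an assumption); only strictly better candidates count toward the rank,
    so a tie with the true molecule yields the minimum rank.
    """
    best = {}
    for key, score in scores.items():
        block = inchikey_block(key)
        best[block] = max(score, best.get(block, float("-inf")))
    target = inchikey_block(true_key)
    if target not in best:
        return None  # true structure is missing from the candidate set
    true_score = best[target]
    return 1 + sum(1 for block, s in best.items() if block != target and s > true_score)


def top_k_accuracy(ranks, k):
    """Fraction of spectra whose true molecule is ranked within the top k (None counts as a miss)."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)
```

Dropping the collapsing step (scoring every PubChem formula isomer separately) would generally lower accuracy at k > 1, which is why the first question matters.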
