Implementation for our paper, submitted to NeurIPS 2021 (also check this high-level blog post).
This is a minimal working version of the code used for the paper, extracted from the internal repository of the Mila Molecule Discovery project. The original commit history is not preserved here, but the credit for this code goes to @bengioe, @MJ10 and @MKorablyov (see paper).
Requirements for base experiments:
torch numpy scipy tqdm
Additional requirements for active learning experiments:
botorch gpytorch
Additional requirements:
- pandas rdkit torch_geometric h5py
- a few biochemistry programs, see mols/Programs/README
For rdkit in particular, we found it easier to install through (mini)conda. torch_geometric has non-trivial installation instructions.
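Before running anything, it can help to check which of the packages listed above are actually importable in the current environment. The following is a minimal sketch, not part of the repository: the group names and the `missing_packages` helper are hypothetical, and it assumes each package's import name matches the name given in the requirements.

```python
from importlib.util import find_spec

# Package groups copied from the requirements lists in this README.
# The variable names are illustrative only and do not exist in the repo.
BASE = ["torch", "numpy", "scipy", "tqdm"]
ACTIVE_LEARNING = ["botorch", "gpytorch"]
ADDITIONAL = ["pandas", "rdkit", "torch_geometric", "h5py"]


def missing_packages(names):
    """Return the names that cannot be located on the import path."""
    # find_spec looks a top-level module up without importing it,
    # and returns None when nothing is found.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    groups = [
        ("base experiments", BASE),
        ("active learning experiments", ACTIVE_LEARNING),
        ("additional requirements", ADDITIONAL),
    ]
    for label, names in groups:
        missing = missing_packages(names)
        status = "ok" if not missing else "missing: " + " ".join(missing)
        print(f"{label}: {status}")
```

This only confirms that a package can be found. It does not check versions, and it does not cover the biochemistry programs described in mols/Programs/README.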
We compress the 300k molecule dataset for size. To uncompress it, run cd mols/data/; gunzip docked_mols.h5.gz.
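If gunzip is not available (for example on Windows), the same decompression can be done from Python with the standard library. This is a sketch under that assumption; the `gunzip_file` helper is hypothetical and not part of the repository. Unlike gunzip, it leaves the original .gz file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src, dest=None):
    """Decompress a .gz file; by default write next to it without the .gz suffix."""
    src = Path(src)
    # "docked_mols.h5.gz" -> "docked_mols.h5" when no destination is given.
    dest = Path(dest) if dest is not None else src.with_suffix("")
    # Stream in chunks so the whole dataset is never held in memory at once.
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest


# Path taken from the command above; run this from the repository root.
archive = Path("mols/data/docked_mols.h5.gz")
if archive.exists():
    print(gunzip_file(archive))
```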
We omit the docking routines since they are part of a separate contribution that is still to be submitted. They are available on request; please reach out to [email protected] or [email protected].