minkaixu / geoldm

Geometric Latent Diffusion Models for 3D Molecule Generation

License: MIT License

Python 100.00%
deep-generative-model diffusion-models drug-discovery geometric-deep-learning icml-2023 molecule molecule-generation

geoldm's Issues

Question about datasets_config.py script

Hello Minkai! My team and I have a question about the datasets_config.py script: what do the parameters 'distances' and 'radius_dic' mean? Thank you so much, we would really appreciate your response!

Question about autoencoder training stage

Hello, regarding the autoencoder phase, I have a question. Is the latent invariant feature dimension k mentioned in the paper referring to the dimension of the encoder's output μh, while the dimension of μx remains 3?

How to use with non-QM9 and non-drug datasets?

Hi Minkai Xu,

The paper looks very exciting. I have a small input dataset with SMILES and an associated experimental property. Is there simplified documentation on how to make your algorithm read such input and run training and testing with my choice of property-prediction metric, prediction evaluation, and structure generation?

As a non-expert, I am not sure how to try your algorithm on my own input dataset. I cannot use your QM9 and drug training datasets for my input. Thanks so much.

Any guidance/help/code is appreciated. Thanks,
JL

Joint Training

Hi Minkai,

Thanks for the amazing work! I am wondering if in the code the AE and the Diffusion are trained jointly by default (with --train_diffusion and --trainable_ae), instead of training separately?

question while loading pretrained models

Thanks a lot for your code, @MinkaiXu :) It's cool.
I ran into a problem loading the pretrained models. The error message is as follows:

Traceback (most recent call last):
  File "eval_analyze.py", line 198, in <module>
    main()
  File "eval_analyze.py", line 127, in main
    with open(join(eval_args.model_path, 'args.pickle'), 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'outputs/qm9_p/args.pickle'

I'm not sure how to solve it. I hope you can help!
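
The traceback shows only that args.pickle is not present under the given model path. A minimal sketch of a loader that reports what the directory actually contains may help narrow this down. The helper name load_saved_args is hypothetical and not part of the repository; the only thing taken from the traceback is that the script opens args.pickle inside the model path.

```python
import pickle
from pathlib import Path


def load_saved_args(model_path):
    """Load args.pickle from model_path (hypothetical helper, not repo code).

    Raises FileNotFoundError with a listing of the directory so it is
    easier to see whether the path or the download is the problem.
    """
    model_dir = Path(model_path)
    args_file = model_dir / "args.pickle"
    if not args_file.is_file():
        # List whatever is there; an empty list means the directory
        # is empty or does not exist at all.
        found = sorted(p.name for p in model_dir.iterdir()) if model_dir.is_dir() else []
        raise FileNotFoundError(
            f"{args_file} not found. Contents of {model_dir}: {found}"
        )
    with open(args_file, "rb") as f:
        return pickle.load(f)
```

If the listing comes back empty, the model path probably does not point at the folder where the pretrained files were unpacked.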

Other output format

Hey Minkai,

Thanks for sharing your promising work. Is there a way to convert the output to another format such as PDBQT/PDB, Chem.Mol, or a SMILES/SELFIES object? (I can only find the .txt files, which contain only the atom positions.)

Thanks,
Lennart Jaretzki
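
One possible route, sketched here under assumptions: if the element symbol and 3D position of each atom can be recovered from the output, they can be written as a standard XYZ file, which tools such as Open Babel can read and convert to other formats. The function to_xyz and its inputs are hypothetical; the exact layout of the repository's .txt output is not described in this thread, so parsing it is left out.

```python
def to_xyz(symbols, coords, comment=""):
    """Format atoms as XYZ text (hypothetical helper, not repo code).

    symbols: list of element symbols, e.g. ["O", "H", "H"]
    coords:  list of (x, y, z) tuples in angstroms, same length as symbols
    """
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")
    # XYZ layout: atom count, comment line, then one atom per line.
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, coords):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative coordinates only, not model output.
    print(to_xyz(["O", "H"], [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0)], "example"))
```

Note that XYZ carries no bond information, so any toolkit converting it to a molecule object or SMILES has to infer bonds from the geometry.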

z_h data formatting issue in EnLatentDiffusion model

Hi Minkai!

Thank you for sharing the code! Just one quick question regarding the format of h data throughout the training and sampling process.

At first, h is defined as {'categorical': one_hot, 'integer': charges}, and the data is concatenated with the categorical part before the integer part. However, at line 1310 of the EnLatentDiffusion model, z_h is formatted as z_h = {'categorical': torch.zeros(0).to(z_h), 'integer': z_h}, meaning that the charges part is placed before the categorical part.

Take sampling, for instance: there, z0[:, :, -1:] is used as charges, meaning z0 has a different format from z_h in the diffusion model.

Should z_h = {'categorical': torch.zeros(0).to(z_h), 'integer': z_h} be changed to z_h = {'categorical': z_h, 'integer': torch.zeros(0).to(z_h)} instead?

Thanks!
Tianyi
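
To make the layout in question concrete, here is a small sketch of the convention the post describes: categorical (one-hot) features first, the integer (charge) feature last, so that slicing the final column recovers the charges. It uses plain Python lists instead of tensors, and the helper names concat_features and split_features are hypothetical. It illustrates only the described ordering and does not settle which dictionary assignment the model should use.

```python
def concat_features(one_hot, charges):
    """Per-atom features laid out as [categorical..., integer]."""
    return [list(row) + [charge] for row, charge in zip(one_hot, charges)]


def split_features(h, num_classes):
    """Undo concat_features: first num_classes columns are categorical."""
    categorical = [row[:num_classes] for row in h]
    integer = [row[num_classes:] for row in h]
    return categorical, integer


if __name__ == "__main__":
    # Two atoms, three atom types; charges 6 and 1 are illustrative.
    h = concat_features([[1, 0, 0], [0, 1, 0]], [6, 1])
    categorical, integer = split_features(h, num_classes=3)
    # Slicing the last column is the list analogue of z0[:, :, -1:].
    last_column = [row[-1:] for row in h]
    print(h, categorical, integer, last_column)
```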

Autoencoder is identity function on atom coordinates? Equivalence to EDM

Hi Minkai,

Thank you for sharing this work! When analyzing the sampling results of GeoLDM, I found that the latent variable z_x is almost equal to the decoded atom positions. Below are molecules I reconstructed with the decoded atom positions and atom types (left) and with z_x and the decoded atom types (right). They are almost the same.

[Figures: "z_x + recon atom type" and "recon atom pos + recon atom type"]

Further analysis of the autoencoder's reconstruction results in GeoLDM indicates that both the encoder and the decoder are almost identity functions on atom coordinates. If so, can I consider GeoLDM to be effectively equivalent to diffusion in 3D space (i.e. EDM), since the number of latent variables equals the number of atoms and both encoder and decoder are identity functions on atom coordinates, except that there is an autoencoder part for atom types?

If this is correct, I'm also wondering how you trained the autoencoder in your published version. I can understand that training with the reconstruction loss alone would lead to identity functions, but you mention in the repo that the encoder remains untrained. If so, why is the encoder an identity function rather than a random mapping?

Thanks!

Drug data split

Hi,
I tried to run main_geom_drugs.py, but it fails with an error:

[image]

I also tried to fix it. The cause seems to be in build_geom_dataset.py: after line 101, data_list = [data_list[i] for i in perm], data_list contains subarrays of varying shapes, while np.split at line 107 needs arrays of the same shape.
How can I solve this problem?

Best regards,
Zhongyu
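
A possible workaround, sketched under assumptions: if the failing call only cuts a Python list of per-molecule arrays at two indices, plain list slicing does the same job without converting the ragged list to a NumPy array. The function split_dataset, its default proportions, and the val/test/train ordering are assumptions for illustration, not the repository's code.

```python
def split_dataset(data_list, val_proportion=0.1, test_proportion=0.1):
    """Split a list of variable-size items into train/val/test by slicing.

    Slicing a list never needs the items to share a shape, unlike
    building one array from them first.
    """
    num_mol = len(data_list)
    val_index = int(num_mol * val_proportion)
    test_index = val_index + int(num_mol * test_proportion)
    val_data = data_list[:val_index]
    test_data = data_list[val_index:test_index]
    train_data = data_list[test_index:]
    return train_data, val_data, test_data


if __name__ == "__main__":
    # Ten "molecules" with different atom counts (nested lists stand in
    # for arrays of shape (num_atoms, 4)).
    molecules = [[[0.0] * 4 for _ in range(n + 1)] for n in range(10)]
    train, val, test = split_dataset(molecules)
    print(len(train), len(val), len(test))
```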

Training time for QM9 dataset

How long does it take to train the model on QM9 for 3000 epochs with a batch size of 64? On my machine, it seems that even one epoch would take 5 hours with a batch size of 64. Are the hyperparameters I am using correct?
