
atomistic-machine-learning / schnetpack


SchNetPack - Deep Neural Networks for Atomistic Systems

License: Other

Python 98.26% C++ 1.48% Shell 0.26%
condensed-matter machine-learning molecular-dynamics neural-network quantum-chemistry

schnetpack's People

Contributors

bartolsthoorn, chgaul, dependabot[bot], divide-by-0, dom1l, dumkar, epens94, farnazh, giadefa, jan-janssen, jduerholt, jhrmnn, jnsls, khaledkah, ktschuett, maltimore, mgastegger, niklasgebauer, nzhan, p16i, pankessel, robertnf, rsaite, sgugler, sirmarcel, stefaanhess, vosatorp, wardlt, zyt0y


schnetpack's Issues

cleaning up the parser arguments in schnetpack_qm9.py

I was wondering, couldn't the call of schnetpack_qm9 be further simplified, at least for the evaluation? Some of the arguments, like property, are general arguments, not training arguments; however, if the model was trained for a certain property, it makes little sense to evaluate it on another, right?

Then, in lines 308 and 310, train_args is either set to args or loaded from the JSON file created when training the model. However, at some points it seems a bit arbitrary whether train_args or args is used: e.g. in line 314, when the QM9 dataset is loaded, train_args.property is used, but below, when the atomref is loaded, args.property is used instead.
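As an illustration of what I mean (the args.json file name and its keys are assumptions on my part), the evaluation branch could always fall back on the stored training arguments:

    import json
    import os
    from argparse import Namespace

    def load_train_args(modelpath):
        # Sketch: read back the arguments dumped at training time.
        with open(os.path.join(modelpath, "args.json")) as f:
            return Namespace(**json.load(f))

    train_args = load_train_args("path/to/model")
    prop = train_args.property  # used consistently for the dataset, atomref and metrics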

I also noticed that pool_mode has disappeared from the arguments, but I assume it is still relevant to set this to 'avg' instead of 'sum' for properties like LUMO, so could that be added back in (or set automatically depending on the property)?

WarmRestartHook wrong relational operator?

Hey,
in the WarmRestartHook, self.best_current is initialized with infinity.
In on_validation_end there is an if-condition that triggers when val_loss is bigger than best_current. This can never be reached; I think it should be val_loss smaller than best_current.
Then there is a condition that asks whether best_current is bigger than best_previous. I think this should be reversed as well.
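To make this concrete, the comparisons I would have expected look roughly like this (a sketch only, not the current implementation):

    # Self-contained sketch: a lower validation loss counts as an improvement.
    def update_best(val_loss, best_current, best_previous):
        if val_loss < best_current:
            best_current = val_loss
        if best_current < best_previous:
            best_previous = best_current
        return best_current, best_previous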

Maybe I misunderstood something; in that case, I would be very happy about a short explanation.

I have two more questions that I want to put in this ticket. I hope you can share your opinions.

  1. Does it make sense to combine warm restarts with ReduceLROnPlateauHook?
  2. Do you have experience with the large amount of memory that is necessary? I have small regions of proteins (10x10x10 Å) with a cutoff of 6 Å.
    SchNet has n_atom_basis 64, filters 64, n_interactions 10. A batch size bigger than 2 seems too big for an 8 GB 1080 GTX. Increasing the network to 128/128 with batch size 1 seems to be the maximum. Do you have experience with this and an idea of how to handle it? A batch size of 1 or 2 seems very small. Did I maybe assume too big a network or cutoff? Perhaps you can share your experience, which could help me with tuning.

"strict" argument in md.load_model()

model.load_state_dict(torch.load(os.path.join(modelpath, 'best_model')))

First of all, thank you for the excellent library (nice design patterns, easy-to-use, extendable, and so on...).

I think the above line should be
model.load_state_dict(torch.load(os.path.join(modelpath, 'best_model')), strict=False)

Otherwise, in my case, md.load_model() doesn't work...

max() arg is an empty sequence error

(schnetpack) victor@DEEPLEARN3:~/chemistry/Codes/schnetpack_reproduce$ python3 src/scripts/schnetpack_qm9.py train schnet data/seed_2000/energy_U0_cosine/data/qm9.db data/seed_2000/energy_U0_cosine/model --split 109000 1000 --cuda --property energy_U0 --cutoff_function cosine --seed 2000 --logger tensorboard
INFO:root:Random state initialized with seed 2000      
INFO:root:QM9 will be loaded...
INFO:schnetpack.data.atoms:The dataset has already been downloaded and stored at data/seed_2000/energy_U0_cosine/data/qm9.db
INFO:root:create splits...
INFO:root:load data...
INFO:root:calculate statistics...
INFO:root:cached statistics was loaded...
INFO:root:The model you built has: 1676033 parameters
INFO:root:training...
Traceback (most recent call last):
  File "src/scripts/schnetpack_qm9.py", line 606, in <module>
    train(args, model, train_loader, val_loader, device)
  File "src/scripts/schnetpack_qm9.py", line 328, in train
    args.modelpath, model, loss, optimizer, train_loader, val_loader, hooks=hooks
  File "/home/victor/anaconda3/envs/schnetpack/lib/python3.6/site-packages/schnetpack-0.2.1-py3.6.egg/schnetpack/train/trainer.py", line 60, in __init__
  File "/home/victor/anaconda3/envs/schnetpack/lib/python3.6/site-packages/schnetpack-0.2.1-py3.6.egg/schnetpack/train/trainer.py", line 119, in restore_checkpoint
ValueError: max() arg is an empty sequence

Hi, if I run src/scripts/schnetpack_qm9.py, it gives me the error "max() arg is an empty sequence". I have tried uninstalling schnetpack and installing it again from source.

qm9_*.py in example folder does not work...

python qm9_tutorial.py
INFO:root:get dataset
INFO:schnetpack.data.atoms:Starting download
INFO:root:Downloading GDB-9 data...
INFO:root:Done.
INFO:root:Extracting files...
INFO:root:Done.
INFO:root:Parse xyz files...
INFO:root:Parsed: 10000 / 133885
INFO:root:Parsed: 20000 / 133885
INFO:root:Parsed: 30000 / 133885
INFO:root:Parsed: 40000 / 133885
INFO:root:Parsed: 50000 / 133885
INFO:root:Parsed: 60000 / 133885
INFO:root:Parsed: 70000 / 133885
INFO:root:Parsed: 80000 / 133885
INFO:root:Parsed: 90000 / 133885
INFO:root:Parsed: 100000 / 133885
INFO:root:Parsed: 110000 / 133885
INFO:root:Parsed: 120000 / 133885
INFO:root:Parsed: 130000 / 133885
INFO:root:Write atoms to db...
INFO:root:Done.
INFO:root:Downloading GDB-9 atom references...
INFO:root:Done.
INFO:schnetpack.data.loader:statistics will be calculated...
INFO:root:build model
Traceback (most recent call last):
  File "qm9_tutorial.py", line 38, in <module>
    spk.Atomwise(
AttributeError: module 'schnetpack' has no attribute 'Atomwise'

and

python qm9_schnet.py
Traceback (most recent call last):
  File "qm9_schnet.py", line 20, in <module>
    output = schnetpack.atomistic.Atomwise()
TypeError: __init__() missing 1 required positional argument: 'n_in'

Help! I am using the latest code pulled from GitHub and Python 3.7; it seems the schnetpack version I use is wrong?

Eval mode of all scripts broken by torch.load

I think recently a change was made to save the model instead of the state as a checkpoint (which I agree is an improvement). Anyway, all scripts are still using load_state_dict instead of just model = torch.load(...), so the scripts now give an error when running eval like: AttributeError: 'DataParallel' object has no attribute 'copy'.
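A minimal sketch of the change I mean (assuming the checkpoint stored at modelpath/best_model is now the full serialized model object):

    import os
    import torch

    modelpath = "path/to/modeldir"  # placeholder
    # Load the whole serialized model instead of calling model.load_state_dict(...):
    model = torch.load(os.path.join(modelpath, "best_model"), map_location="cpu")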

LoggingHook for training data

Hi,

There is a problem in the LoggingHook for the training batches.

def on_batch_end(self, trainer, train_batch, result, loss):
        if self.log_train_loss:
            self._train_loss += float(loss.data)
            self._counter += 1

self._train_loss is never divided by self._counter, so when it is logged, the value depends on how many batches the training data is divided into. It is not as simple as dividing by the count, since the batch size might not be a factor of the total size, so I don't know what the best way to fix it is; one idea is sketched below. Could you fix that? Thanks!
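Just a sketch of the idea (a per-batch mean, so the caveat about the last, smaller batch still applies):

    # Sketch: average the accumulated training loss at logging time,
    # e.g. in on_epoch_end, instead of logging the raw sum.
    def mean_train_loss(train_loss_sum, counter):
        return train_loss_sum / counter if counter > 0 else float("nan")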

qm9 dataset has no attribute "properties"

The qm9 dataset has an attribute called "available_properties", but I think it should be "properties" to be consistent with the rest. The "schnetpack_qm9.py" script also refers to "properties", which currently doesn't exist.

hdnn.py distances not correct for periodically repeated bulk structure

Hi,

It seems that the distances calculated in hdnn.py are not correct.

 # Compute radial functions
        if self.RDF is not None:
            # Get atom type embeddings
            Z_rad = self.radial_Z(Z)
            # Get atom types of neighbors
            Z_ij = snn.neighbor_elements(Z_rad, neighbors)
            # Compute distances
            distances = snn.atom_distances(
                positions, neighbors, neighbor_mask=neighbor_mask
            )
            radial_sf = self.RDF(
                distances, elemental_weights=Z_ij, neighbor_mask=neighbor_mask
            )
        else:
            radial_sf = None

If I have a Cu bulk with a single Cu atom repeated in all 3 directions, the distances vector looks like [0, 0, 0, ..., 0]. I believe it should be [a, a, a, ..., 2a, ...] if a is the lattice constant. This is probably because the cell_offset is not used in the calculation (see the sketch below for what I mean). Could you have a look at it? Thanks!
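For illustration, this is roughly how I would expect the call to look, with the cell and cell offsets passed through (the Structure key names and the atom_distances signature are assumptions on my part):

    # Sketch: include periodic images when computing distances.
    cell = inputs[Structure.cell]
    cell_offset = inputs[Structure.cell_offset]
    distances = snn.atom_distances(
        positions, neighbors, cell, cell_offset, neighbor_mask=neighbor_mask
    )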

Extremely low GPU utilization

Hi, so I was playing around a bit with training SchNetPack. I noticed that when using --cuda on my Titan V it has very, very low GPU utilization. It uses around 10 GB of GPU memory but sits at 0% GPU utilization practically 90% of the time, with some very short spikes in between (of around a second) where it seems to compute. I tried it with the QM9 and ANI1 examples. QM9 has slightly better utilization than ANI1 but is still around 60-70% idle.

Have you also noticed such behavior?

Accidental letter in schnetpack_matproj.py

Thank you for providing such a nice package together with the source code. I noticed a very small issue in the script "schnetpack_matproj.py", where on line 210 there is an additional letter "f" on column 18. This prevents the script from running.

RMSE logging when training data contains molecules of varying sizes

I'm unsure whether or not I'm getting representative RMSE logging when training on a data set that contains molecules of varying sizes (from 55 atoms to 190 atoms).

If I'm not mistaken, the following piece of code is run (per batch?) when logging the RMSE of a target:

https://github.com/atomistic-machine-learning/schnetpack/blob/master/src/schnetpack/metrics.py#L68

    def add_batch(self, batch, result):
        y = batch[self.target]
        if self.model_output is None:
            yp = result
        else:
            yp = result[self.model_output]

        diff = self._get_diff(y, yp)
        self.l2loss += torch.sum(diff.view(-1)).detach().cpu().data.numpy()
        self.n_entries += np.prod(y.shape)

https://github.com/atomistic-machine-learning/schnetpack/blob/master/src/schnetpack/metrics.py#L132

    def aggregate(self):
        return np.sqrt(self.l2loss / self.n_entries)

As far as I understand, there's some zero padding happening in the background when the training batch contains molecules of varying sizes. Am I correct in assuming that this zero padding increases the number self.n_entries (when using np.prod(y.shape)) such that my mean (for the smaller molecules) becomes a lot smaller than it should be?
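To make the concern concrete, the kind of change I have in mind counts only real atoms via the atom mask instead of the padded tensor shape (a sketch for per-atom targets; the mask key is taken from other parts of the code):

    # Sketch: count real (non-padded) entries instead of np.prod(y.shape).
    mask = batch[Structure.atom_mask]  # 1 for real atoms, 0 for zero padding
    self.n_entries += torch.sum(mask).detach().cpu().numpy() * y.shape[-1]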

Evaluation.py _get_predicted(self, device) for clusters with different number of atoms

Hi,

I am trying to predict the forces of clusters using the best_model. However, it seems that if the clusters are of various sizes, the _get_predicted function does not work. The problem is in

for p in predicted.keys():
            predicted[p] = np.vstack(predicted[p])
ValueError: all the input array dimensions except for the concatenation axis must match exactly

Thanks!

sorry, but master branch still does not work...

python spk_run.py train schnet qm9 qm9_data qm9_model --split 1000 200
Traceback (most recent call last):
  File "spk_run.py", line 5, in <module>
    from schnetpack.utils.script_utils import settings
ImportError: cannot import name 'settings' from 'schnetpack.utils.script_utils'

Storing the symmetry functions calculated

Hi,

I am wondering whether there is an easy way to store the symmetry functions calculated for the data, so they don't need to be recalculated for every epoch. It is a very time-consuming step for my model. Thanks!

Best,
Mingjie

Possible incorrect variable used in schnetpack_matproj.py

The following part in "schnetpack_matproj.py" near line 257:

    split_path = os.path.join(args.modelpath, 'split.npz')
    if args.mode == 'train':
        if args.split_path is not None:
            copyfile(args.split_path, split_path)

    data_train, data_val, data_test = mp.create_splits(*train_args.split, split_file=split_path)

seems to create problems when the split is being specified as two integers in train_args.split instead of a split file in args.split_path. The mp.create_splits()-function will ignore the train_args.split parameters as the split_path-variable will always be a valid string. One way to correct the behaviour could be with:

    split_path = None
    if args.mode == 'train':
        if args.split_path is not None:
            split_path = os.path.join(args.modelpath, 'split.npz')
            copyfile(args.split_path, split_path)

    data_train, data_val, data_test = mp.create_splits(*train_args.split, split_file=split_path)

Using trained model as ASE calculator

I'm trying to use a trained model as a calculator object in ASE; however, I get an error.

Traceback (most recent call last):
  File "opti.py", line 30, in <module>
    dyn.run(fmax=0.01)
  File "/usr/local/lib/python3.6/dist-packages/ase/optimize/optimize.py", line 174, in run
    f = self.atoms.get_forces()
  File "/usr/local/lib/python3.6/dist-packages/ase/atoms.py", line 735, in get_forces
    forces = self._calc.get_forces(self)
  File "/usr/local/lib/python3.6/dist-packages/ase/calculators/calculator.py", line 460, in get_forces
    return self.get_property('forces', atoms)
  File "/usr/local/lib/python3.6/dist-packages/ase/calculators/calculator.py", line 493, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/usr/local/lib/python3.6/dist-packages/schnetpack/md.py", line 94, in calculate
    model_results = self.model(model_inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/schnetpack/atomistic.py", line 55, in forward
    inputs['representation'] = self.representation(inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/schnetpack/representation/schnet.py", line 199, in forward
    v = interaction(x, r_ij, neighbors, neighbor_mask, f_ij=f_ij)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/schnetpack/representation/schnet.py", line 52, in forward
    v = self.cfconv.forward(x, r_ij, neighbors, neighbor_mask, f_ij=f_ij)
  File "/usr/local/lib/python3.6/dist-packages/schnetpack/nn/cfconv.py", line 67, in forward
    y = torch.gather(y, 1, nbh)
RuntimeError: Invalid index in gather at /pytorch/aten/src/TH/generic/THTensorMath.cpp:620

I have been trying different ASE scripts (geometry optimization, phonon dispersion calculation, etc.), but I get similar errors. Attached are the files for a geometry optimization of a (6,0) SWCNT with the trained model.

Any tips on how to fix this?

6,0_opti.zip

Error when running schnetpack_qm9.py in wacsf mode

Hi,
when running the schnetpack_qm9.py script this way:
schnetpack_qm9.py train wacsf qm9.db model2/ --split 100 100

I got the output:

INFO:root:Random state initialized with seed 3298687774
INFO:root:QM9 will be loaded...
INFO:schnetpack.data.atoms:The dataset has already been downloaded and stored at qm9.db
INFO:root:create splits...
INFO:root:load data...
INFO:root:calculate statistics...
INFO:schnetpack.data.loader:statistics will be calculated...
Traceback (most recent call last):
  File "/home/oliviermt/miniconda3/bin/schnetpack_qm9.py", line 78, in <module>
    representation = get_representation(args, train_loader=train_loader)
  File "/home/oliviermt/miniconda3/lib/python3.6/site-packages/scripts/script_utils/model.py", line 12, in get_representation
    if args.cutoff_function == "hard":
AttributeError: 'Namespace' object has no attribute 'cutoff_function'

And there is no error when using the schnet mode.

In src/scripts/script_utils/model.py, the # build cutoff module block should be inside the if args.model == "schnet" block if it is not used for wacsf.

I corrected it on my side by commenting out the old block, and it seems to be working.
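Roughly what the change looks like (a sketch; the cutoff class names are assumed to live in schnetpack.nn.cutoff):

    from schnetpack.nn.cutoff import CosineCutoff, HardCutoff

    def build_cutoff_network(args):
        # Only called when args.model == "schnet", so the wacsf path never
        # touches args.cutoff_function.
        if args.cutoff_function == "hard":
            return HardCutoff
        elif args.cutoff_function == "cosine":
            return CosineCutoff
        raise ValueError("Unknown cutoff function: %s" % args.cutoff_function)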

Thanks a lot!

Force training with angular symmetry functions for clusters with different number of atoms

Hi,

It seems that the trainer does not work properly when I do force training with angular symmetry functions for clusters with different numbers of atoms (ElementalEnergy).

When I train on clusters with the same number of atoms, it is OK; judging by the log, the losses go down. However, if I use it for clusters with different numbers of atoms, the losses become NaN. This applies to both the "Behler" and "Weighted" modes. Changing the learning rate of the Adam optimizer does not help. I don't know exactly why this is the case; my guess is that it could be due to the padding applied to the representation when the clusters have different sizes. Could you have a look at that? Thanks!

import os
import pickle

from schnetpack.data import AtomsData
import torch
import torch.nn.functional as F
from torch.optim import Adam
from torch.optim import LBFGS
import schnetpack as spk
import schnetpack.atomistic as atm
import schnetpack.representation as rep
from schnetpack.datasets import *
import schnetpack.evaluation as eva
from schnetpack.metrics import MeanSquaredError

Name = 'debug'

if not os.path.exists(Name):
    os.makedirs(Name)


data = AtomsData('./db/master-jb.db', required_properties=['energy','forces'],collect_triples=True)

# 30 training data, 30 validation data, the rest are for testing
train, val, test = data.create_splits(30, 30)
train_loader = spk.data.AtomsLoader(train, batch_size=3, num_workers=0)
val_loader = spk.data.AtomsLoader(val)
test_loader = spk.data.AtomsLoader(test)
loader = [train_loader, val_loader, test_loader]
pickle.dump(loader, open('./'+Name+'/loader.sav','wb'))

reps = rep.BehlerSFBlock(n_radial=2, n_angular=2, elements=frozenset([46,79]), cutoff_radius=6.0, mode = 'weighted')
#print(reps.n_symfuncs)
output = ElementalEnergy(n_in=reps.n_symfuncs,n_layers=3,n_hidden=10, elements=frozenset([46,79]),return_force=True,create_graph=True)
model = atm.AtomisticModel(reps, output)

trainable_params = filter(lambda p: p.requires_grad, model.parameters())

metric_E = [MeanSquaredError(target = 'energy',model_output='y'),MeanSquaredError(target = 'forces',model_output='dydx')]
hook_E = spk.train.CSVHook("./"+Name, metric_E)

opt = Adam(trainable_params, lr=1e-4)

loss = lambda b, p: F.mse_loss(p["y"], b['energy'])+F.mse_loss(p["dydx"], b['forces'])
trainer = spk.train.Trainer(Name+"/", model, loss,
                      opt, train_loader, val_loader,hooks = [hook_E])

# start training
trainer.train(torch.device("cpu"))

Best,
Mingjie

ValueError: invalid filename or file not found

Hi all,
I have installed schnetpack on Windows.
The Python version is 3.7.3.
When I run pytest, it returns the following error:
ValueError: invalid filename or file not found "c:\users\fariba\appdata\local\programs\python\python37\lib\site-packages\schnetpack-0.2.1-py3.7.egg\schnetpack\sacred\calculator_ingredients.py"
Any comment is appreciated.

No inheritance of ASEEnvironmentProvider

class ASEEnvironmentProvider:

There are no functional issues, but I think ASEEnvironmentProvider should inherit BaseEnvironmentProvider like below.

class ASEEnvironmentProvider(BaseEnvironmentProvider):

And in my opinion, the name of the class should be changed to AseEnvironmentProvider, because in other modules (e.g., md) the word Ase is used for ASE-related classes (e.g., md.AseInterface). The naming convention should be applied consistently.

Restarting training in CUDA mode

Hi!
I am still working with the src/scripts/schnetpack_qm9.py script.
When the training is stopped, I suppose I should be able to restart it from the checkpoint-xx.pth.tar files.
However, if the training is done with CUDA, an error occurs the second time:

INFO:root:training...
Traceback (most recent call last):
  File "/home/olimt/projects/rrg-cotemich-ac/olimt/programs/schnetpack/src/scripts/schnetpack_qm9.py", line 134, in <module>
    train(args, model, train_loader, val_loader, device, metrics=metrics)
  File "/home/olimt/miniconda3/lib/python3.7/site-packages/scripts/script_utils/training.py", line 54, in train
    trainer.train(device, n_epochs=args.n_epochs)
  File "/home/olimt/miniconda3/lib/python3.7/site-packages/schnetpack/train/trainer.py", line 245, in train
    raise e
  File "/home/olimt/miniconda3/lib/python3.7/site-packages/schnetpack/train/trainer.py", line 175, in train
    self.optimizer.step()
  File "/home/olimt/miniconda3/lib/python3.7/site-packages/torch/optim/adam.py", line 93, in step
    exp_avg.mul_(beta1).add_(1 - beta1, grad)
RuntimeError: expected backend CPU and dtype Float but got backend CUDA and dtype Float

There's probably a missing .to('cuda') somewhere.
I'll try to look into it more, but I wanted to let you know.
Thanks!

GDML from the sgdml package

I talked to @stefanch, and maybe it makes more sense to have the GDML code in the sgdml package, and just import it in schnetpack and have sgdml as a dependency. What do you think about that? If you agree, should I make sgdml a hard or an optional "extras" dependency of schnetpack?
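For the optional "extras" variant, the declaration in setup.py could look roughly like this (the dependency lists are placeholders); users would then opt in via pip install schnetpack[sgdml]:

    from setuptools import setup, find_packages

    setup(
        name="schnetpack",
        packages=find_packages("src"),
        package_dir={"": "src"},
        install_requires=["torch", "numpy", "ase"],   # placeholder core dependencies
        extras_require={"sgdml": ["sgdml"]},          # GDML support becomes opt-in
    )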

Reproducing paper result

Hi,

Can you let me know which settings of parameters, optimizer, split, and seed reproduce the results of
https://arxiv.org/pdf/1712.06113.pdf?

For homo, I tried:
python3 src/scripts/schnetpack_qm9_new.py train schnet data/seed_2000/homo/data/qm9.db data/seed_2000/homo/model --split 109000 1000 --cuda --property homo --seed 2000 --parallel

I got train/val/test mae as 0.00497/0.04701/0.04877, but the test mae stated in the paper is 0.041. What should I do to get this result?

Should I change the split to "--split 110000 1000" and batch size to "--batch_size 32" following the arxiv paper?

Force training

I was wondering what the proper way to add forces to the loss function is.
Currently I have my loss function defined like this:

def loss(batch, result):
    N = torch.sum((batch['_atomic_numbers'] != 0).float(), 1, keepdim=True)  # Nr of atoms per image
    gamma = (result['y'] / N - batch['energy'] / N) ** 2
    return torch.sum(gamma)

which works fine.
However, if I try to add forces to the loss function like this:

def loss(batch, result):
    N = torch.sum((batch['_atomic_numbers'] != 0).float(), 1, keepdim=True)  # Nr of atoms per image
    gammaE = (result['y'] / N - batch['energy'] / N) ** 2
    gammaF = torch.sum(torch.sum((result['dydx'] - batch['forces']) ** 2, 2), 1, keepdim=True) / (3 * N)
    return torch.sum(gammaE + gammaF)

it stops working: I get a value of the loss function for the first image, then the function just returns NaN. I'm not sure what the problem is here. Is there a better way to include forces in the loss function, or is there a bug somewhere?
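For reference, this is the kind of combined loss I had in mind, with an explicit energy/force trade-off and the padded atoms masked out; the '_atom_mask' key is assumed from other snippets in this thread, so treat it as a sketch rather than the recommended way:

    import torch

    def loss(batch, result, rho=0.1):
        atom_mask = batch['_atom_mask']                   # (B, A), 1 for real atoms
        n_atoms = torch.sum(atom_mask, 1, keepdim=True)   # (B, 1)

        # per-atom energy error
        err_e = torch.mean(((result['y'] - batch['energy']) / n_atoms) ** 2)

        # force error, padded atoms zeroed out by the mask
        diff_f = (result['dydx'] - batch['forces']) * atom_mask.unsqueeze(-1)
        err_f = torch.sum(diff_f ** 2) / (3.0 * torch.sum(n_atoms))

        return rho * err_e + (1.0 - rho) * err_f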

md.py load_model error

When trying to load an existing model for MD calculations, I run into this error:

model = md.load_model("test_model")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/site-packages/schnetpack/md.py", line 377, in load_model
    model.load_state_dict(torch.load(os.path.join(modelpath, 'best_model')))
  File "/usr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 769, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for AtomisticModel:
    Unexpected key(s) in state_dict: "output_modules.atomref.weight".

Element wise MeanSquaredError

Hi,

I'm getting an error when I try to run RootMeanSquaredError('forces', 'dydx', element_wise=True) in my training script.

I fixed the problem by changing
self.n_entries += torch.sum(batch[Structure.atom_mask]) * y.shape[-1]
in line 135 in metrics.py to
self.n_entries += torch.sum(batch[Structure.atom_mask]).detach().cpu().data.numpy() * y.shape[-1]
Is this a good solution?

/Daniel

Benchmark results in docs

As suggested in Issue #77, we should have a table of benchmark results in the docs for our scripts.

It would be great to have a script for that, so we can run it before every release to update the results table. It would take a set of cmd arguments for the scripts, run the models on our cluster and create a file with the table that can directly be parsed by the docs (probably rst).

wrong split argument in evaluation when getting loaders

For spk_run the split argument for train is [n_train, n_val], while for eval it is ["train", "validation", "test"]. When getting the loaders for the train/val/test data (line 34), args is passed, which works in train mode; in eval mode, however, it is no longer the number of samples but the list with up to 3 fold names (with all three, that is also one argument too many passed to the train_test_split function). I'm not sure, though, whether it should be replaced by train_args instead, since then the batch_size is not what was specified.

Total energy (U0) reproduction (QM9)

Good day. I just found your tool and wanted to get acquainted with it by reproducing the results for the QM9 data. I set the available parameters according to the values given in the articles [J. Chem. Phys. 148, 241722 (2018) and Adv. Neural Inf. Process. Syst. 30 (2017), pp. 992-1002]. The parameter 'an exponential moving average over weights with decay rate 0.99' I set in schnetpack_qm9.py in the Adam optimizer (line 185) as weight_decay=0.99.
Thus, my command is:
python3 schnetpack_qm9.py train schnet qm9.db output --split 110000 1000 --batch_size 32 --lr 0.001 --lr_decay 0.96 --features 64 --cutoff 1000 --cuda
For the first run, after ~2500 epochs the minimum RMSE_U0 was 1.2 kcal/mol. For the second run the minimum was 2.6 kcal/mol at around the 100th epoch, after which the RMSE started to increase and turned to NaN at the end.
I would thus appreciate any guidance on parameter settings to obtain the published results of 0.3-0.4 kcal/mol.

KeyError: '_neighbor_pairs_j'

Hi, I was trying to run the hdnn model with my own ASE database, but it gives me an error.

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
in ()
32
33 # start training
---> 34 trainer.train(torch.device("cpu"))

~\Anaconda3\lib\site-packages\schnetpack\train\trainer.py in train(self, device)
214 h.on_train_failed(self)
215
--> 216 raise e

~\Anaconda3\lib\site-packages\schnetpack\train\trainer.py in train(self, device)
142 }
143
--> 144 result = self._model(train_batch)
145 loss = self.loss_fn(train_batch, result)
146

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)

~\Anaconda3\lib\site-packages\schnetpack\atomistic.py in forward(self, inputs)
53 if self.requires_dr:
54 inputs[Structure.R].requires_grad_()
---> 55 inputs['representation'] = self.representation(inputs)
56
57 if isinstance(self.output_modules, nn.ModuleList):

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)

~\Anaconda3\lib\site-packages\schnetpack\representation\hdnn.py in forward(self, inputs)
195 if self.ADF is not None:
196 # Get pair indices
--> 197 idx_j = inputs[Structure.neighbor_pairs_j]
198 idx_k = inputs[Structure.neighbor_pairs_k]
199 neighbor_pairs_mask = inputs[Structure.neighbor_pairs_mask]

KeyError: '_neighbor_pairs_j'

The database I use is an ASE db with 50 Au-Pd clusters of 13 atoms each.

import torch
import torch.nn.functional as F
from torch.optim import Adam

import schnetpack as spk
import schnetpack.atomistic as atm
import schnetpack.representation as rep
from schnetpack.datasets import *

data = AtomsData('./db/Icosahedron-2-large-unique-50-test.db', properties=['energy'])
# split in train and val
train, val, test = data.create_splits(10, 10)
loader = spk.data.AtomsLoader(train, batch_size=2, num_workers=1)
val_loader = spk.data.AtomsLoader(val)

# create model
reps = rep.BehlerSFBlock(n_radial=22, n_angular=5, elements=frozenset([46,79]))
output = atm.ElementalAtomwise(reps.n_symfuncs)
model = atm.AtomisticModel(reps, output)

# filter for trainable parameters (https://github.com/pytorch/pytorch/issues/679)
trainable_params = filter(lambda p: p.requires_grad, model.parameters())

# create trainer
opt = Adam(trainable_params, lr=1e-4)
loss = lambda b, p: F.mse_loss(p["y"], b['energy'])
trainer = spk.train.Trainer("wacsf/", model, loss,
                      opt, loader, val_loader)

# start training
trainer.train(torch.device("cpu"))

Did I do something wrong in the code? Thanks! 

Bug in Train Embeddings for Elemental References?

I think the train_embeddings value of Atomwise is used incorrectly. When setting the default references in the constructor of Atomwise, the value of freeze for the embeddings class is set equal to train_embeddings, which means that if train_embeddings == True, the embeddings are not trained.

Do I understand that correctly? And, if so, could I fix it by changing it to freeze=not train_embeddings?

One of the two relevant lines:

freeze=train_embeddings)
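A minimal sketch of the proposed fix, given that nn.Embedding.from_pretrained treats freeze=True as "do not train":

    import torch
    import torch.nn as nn

    train_embeddings = True
    atomref_weights = torch.zeros(100, 1)  # placeholder single-atom reference energies

    # freeze=True excludes the weights from training, so the flag has to be negated:
    atomref = nn.Embedding.from_pretrained(atomref_weights, freeze=not train_embeddings)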

CUDA memory issue during optimization

I have been training on water clusters of various sizes and trying to optimize a 256-water cluster using the trained model. The training process worked fine, but when I try to optimize, I always get the following CUDA memory error. When I run on the CPU, the optimization is pretty slow (5 min per step). I would like to ask: is this much memory a legitimate requirement, and what makes it so memory intensive? Or do you have any idea what might have gone wrong?
Thank you in advance for any help you could possibly provide!

Traceback (most recent call last):
  File "optimize_water_256_wacsf.py", line 15, in <module>
    print("forces:", atoms.get_forces())
  File "/home/xiaowei/.local/lib/python3.6/site-packages/ase/atoms.py", line 714, in get_forces
    forces = self._calc.get_forces(self)
  File "/home/xiaowei/.local/lib/python3.6/site-packages/ase/calculators/calculator.py", line 519, in get_forces
    return self.get_property('forces', atoms)
  File "/home/xiaowei/.local/lib/python3.6/site-packages/ase/calculators/calculator.py", line 552, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/home/xiaowei/miniconda3/lib/python3.6/site-packages/schnetpack-0.2.1-py3.6.egg/schnetpack/ase_interface.py", line 92, in calculate
  File "/home/xiaowei/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 494, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xiaowei/miniconda3/lib/python3.6/site-packages/schnetpack-0.2.1-py3.6.egg/schnetpack/atomistic.py", line 61, in forward
  File "/home/xiaowei/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 494, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xiaowei/miniconda3/lib/python3.6/site-packages/schnetpack-0.2.1-py3.6.egg/schnetpack/representation/hdnn.py", line 366, in forward
  File "/home/xiaowei/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 494, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xiaowei/miniconda3/lib/python3.6/site-packages/schnetpack-0.2.1-py3.6.egg/schnetpack/representation/hdnn.py", line 238, in forward
  File "/home/xiaowei/miniconda3/lib/python3.6/site-packages/schnetpack-0.2.1-py3.6.egg/schnetpack/nn/neighbors.py", line 204, in neighbor_elements
RuntimeError: CUDA out of memory. Tried to allocate 1.68 GiB (GPU 0; 7.77 GiB total capacity; 6.02 GiB already allocated; 648.62 MiB free; 17.94 MiB cached)

The following is the code I have been using to run optimization.

import torch
from schnetpack.ase_interface import SpkCalculator
from ase import Atoms
from ase.io import read
from ase.optimize import BFGS

path_to_model = "XX_water_wacsf/best_model"
model = torch.load(path_to_model)

atoms = read('water_256.xyz')
calc = SpkCalculator(model, device="cuda")
atoms.set_calculator(calc)
print("forces:", atoms.get_forces())
print("total_energy", atoms.get_potential_energy())
dyn = BFGS(atoms,trajectory='water_256_opt_BFGS_wacsf.traj',restart='water_256_opt_BFGS_wacsf.pckl')
dyn.run(fmax=0.05)

JCTC paper training example

Thanks a lot for the great code!

When I tried to run the training example from the paper (Chart 1), I got some errors. If the `import` section is to stay untouched, the following lines should be changed to:

15: loader = spk.data.AtomsLoader...
18: val_loader = spk.data.AtomsLoader...
28: trainer = spk.train.Trainer...

question about wACSFs

With the wACSF descriptors, is there still a separate neural network per element in the system as with the ACSFs?

Share trained model

Hi,

I am really interested in SchNet's performance on crystals. I wonder if you could share the best trained models, especially the one trained on the Materials Project data?

Thanks!

Weike

Evaluation on validation split fails

The following command produces an empty evaluation.txt:

spk_run.py eval modeldir --split validation

The reason is that the split name in code is expected to be "val". However, the util parsing script only allows the full word, i.e. validation. So I think the code where "val" is still used should be changed to "validation" as well.

Problem with schnetpack_qm9.py

Hello,

I would like to ask if schnetpack_qm9.py is still working. No matter what I try, I eventually run into RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:441.

I only create a new virtualenv, pip install schnetpack and then I run:
schnetpack_qm9.py train schnet QM9 TestRun --split 10000 10000 --cuda

Output is following (when I already downloaded QM9):

INFO:root:Random state initialized with seed 3679303364
INFO:root:QM9 will be loaded...
INFO:root:create splits...
INFO:root:load data...
INFO:root:calculate statistics...
INFO:root:cached statistics was loaded...
INFO:root:The model you built has: 1676133 parameters
INFO:root:training...
Traceback (most recent call last):
  File "/home/kubaw/.virtualenvs/schnet/bin/schnetpack_qm9.py", line 357, in <module>
    train(args, model, train_loader, val_loader, device)
  File "/home/kubaw/.virtualenvs/schnet/bin/schnetpack_qm9.py", line 177, in train
    trainer.train(device)
  File "/home/kubaw/.virtualenvs/schnet/lib/python3.7/site-packages/schnetpack/train/trainer.py", line 216, in train
    raise e
  File "/home/kubaw/.virtualenvs/schnet/lib/python3.7/site-packages/schnetpack/train/trainer.py", line 144, in train
    result = self._model(train_batch)
  File "/home/kubaw/.virtualenvs/schnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kubaw/.virtualenvs/schnet/lib/python3.7/site-packages/schnetpack/atomistic.py", line 55, in forward
    inputs['representation'] = self.representation(inputs)
  File "/home/kubaw/.virtualenvs/schnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kubaw/.virtualenvs/schnet/lib/python3.7/site-packages/schnetpack/representation/schnet.py", line 191, in forward
    r_ij = self.distances(positions, neighbors, cell, cell_offset)
  File "/home/kubaw/.virtualenvs/schnet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kubaw/.virtualenvs/schnet/lib/python3.7/site-packages/schnetpack/nn/neighbors.py", line 76, in forward
    return atom_distances(positions, neighbors, cell, cell_offsets, return_directions=self.return_directions)
  File "/home/kubaw/.virtualenvs/schnet/lib/python3.7/site-packages/schnetpack/nn/neighbors.py", line 36, in atom_distances
    offsets = cell_offsets.bmm(cell)
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:441

I am not sure where this error comes from, because a basic PyTorch NN works for me.
I have also noticed that the version installed with pip needs a folder as datadir, while the version installed with setup.py needs a .db file as datadir. It also seems that there is recent work on scripts using sacred, so it may also be that the old scripts are not working anymore. It would be great if you could confirm that the script is working, as then I could search for the origin of the error elsewhere; or perhaps you have an idea where this error comes from.

Scalability of SchNet

Hi guys,

I was playing around with the package and wanted to know what the limits of SchNet are. So I tried to feed a protein (trypsin, 1700 atoms) into the network using the default settings and ran into some CUDA out of memory errors (Titan V, 12 GB).
I tried to scale down the features, number of interaction blocks, etc. while using a batch size of 1 and still did not get it to work.
So what is your experience with the scalability of this network? Do you think it is due to the model itself (including distance matrices, features, RBF, etc.), the implementation (which could perhaps be optimized a bit more), or did I approach it wrong?

In case you want to reproduce it, I made a small script and a .db file including 100x trypsin with a dummy energy value.

Thanks for your help and this nice package.

import torch
import torch.nn.functional as F
from torch.optim import Adam

import schnetpack as spk
from schnetpack.data import AtomsData
import schnetpack.atomistic as atm
import schnetpack.representation as rep

data = AtomsData('3ptb.db', properties=['energy'])

# split in train and val
train, val, test = data.create_splits(80, 20)
loader = spk.data.AtomsLoader(train, batch_size=1, num_workers=1)
val_loader = spk.data.AtomsLoader(val)

# create model
reps = rep.SchNet(
    n_atom_basis=32,
    n_filters=128,
    n_interactions=1,
    cutoff=5.0,
    n_gaussians=25,
    normalize_filter=False,
    coupled_interactions=True,
    return_intermediate=False,
    max_z=100,
    trainable_gaussians=False,
    distance_expansion=None)
output = atm.Atomwise()
model = atm.AtomisticModel(reps, output).cuda()

opt = Adam(model.parameters(), lr=1e-4)
loss = lambda b, p: F.mse_loss(p["y"], b['energy'])
trainer = spk.train.Trainer("output/", model, loss, opt, loader, val_loader)

trainer.train(torch.device("cuda"))

3ptbdb.zip

Issue with materials project test script

Hi,

I am getting an error when trying to train the network using the included script for training on the Materials Project data. The model seems to train fine when I set --property to anything other than band_gap, but when I use band_gap I get a KeyError:

Traceback (most recent call last):
  File "matproj.py", line 273, in <module>
    mean, stddev = train_loader.get_statistics(train_args.property, False)
  File "/home/paperspace/anaconda3/envs/ml/lib/python3.6/site-packages/schnetpack/data.py", line 408, in get_statistics
    for row in self:
  File "/home/paperspace/anaconda3/envs/ml/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/home/paperspace/anaconda3/envs/ml/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
KeyError: 'Traceback (most recent call last):\n  File "/home/paperspace/anaconda3/envs/ml/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop\n    samples = collate_fn([dataset[i] for i in batch_indices])\n  File "/home/paperspace/anaconda3/envs/ml/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>\n    samples = collate_fn([dataset[i] for i in batch_indices])\n  File "/home/paperspace/anaconda3/envs/ml/lib/python3.6/site-packages/schnetpack/data.py", line 100, in __getitem__\n    prop = row.data[p]\nKeyError: \'band_gap\'\n'

Any help would be greatly appreciated. Thank you.

Dataloader stuck with AseEnvironmentProvider

Hi guys,

thanks for the suggestions with the AseEnvironmentProvider.
It worked well with my dummy test set of 100 structures, but I discovered that for a larger database the whole data loading procedure gets stuck. That is, when the data is initialized the first time (either during loader.get_statistics() or later in the trainer function), it doesn't finish. I also had an issue where it kept filling up my RAM and I needed to stop at 90 GB.
I set the number of workers in the dataloader to 0 to debug a bit, and it seems to be stuck in the ASE neighbour-list function.
I also tested this with small molecules, and it's reproducible with the QM9 dataset.

Have you experienced something similar?

I used the same script as before with the AseEnvironment addition. The issue already arises with a database size of 3000 molecules.

Thanks for your help!

import torch
import torch.nn.functional as F
from torch.optim import Adam

import schnetpack as spk
from schnetpack.data import AtomsData
import schnetpack.atomistic as atm
import schnetpack.representation as rep

cutoff = 5.  # Angstrom
environment_provider = spk.environment.AseEnvironmentProvider(cutoff)
data = AtomsData('trypsin.db', required_properties=['energy'], environment_provider=environment_provider)

# split in train and val
train, val, test = data.create_splits(2800, 100)
loader = spk.data.AtomsLoader(train, batch_size=10, num_workers=4)
val_loader = spk.data.AtomsLoader(val)

# create model
reps = rep.SchNet(
    n_atom_basis=128,
    n_filters=128,
    n_interactions=1,
    cutoff=5.0,
    n_gaussians=25,
    normalize_filter=False,
    coupled_interactions=False,
    return_intermediate=False,
    max_z=100,
    trainable_gaussians=False,
    distance_expansion=None)
output = atm.Atomwise()
model = atm.AtomisticModel(reps, output).cuda()

opt = Adam(model.parameters(), lr=1e-3)
loss = lambda b, p: F.mse_loss(p["y"], b['energy'])
out_dir = 'test'
trainer = spk.train.Trainer(out_dir, model, loss, opt, loader, val_loader,
                            hooks=[spk.train.CSVHook(f'{out_dir}/log',
                                                     [spk.metrics.MeanAbsoluteError('energy', "y"),
                                                      spk.metrics.RootMeanSquaredError('energy', "y")],
                                                     every_n_epochs=1)])
trainer.train(torch.device("cuda"))

Division by Zero

Hey folks,

I'm trying this out with the ROCm version of PyTorch, so chances are there's an issue there.

python3 src/scripts/schnetpack_qm9.py train schnet ./tests/data/test_qm9.db .  --split 2 2 --cuda
INFO:root:Random state initialized with seed 1534720078
INFO:root:QM9 will be loaded...
INFO:schnetpack.data.atoms:The dataset has already been downloaded and stored at ./tests/data/test_qm9.db
INFO:root:create splits...
INFO:root:load data...
INFO:root:calculate statistics...
INFO:root:cached statistics was loaded...
INFO:root:The model you built has: 1676033 parameters
INFO:root:training...
Traceback (most recent call last):
  File "src/scripts/schnetpack_qm9.py", line 443, in <module>
    train(args, model, train_loader, val_loader, device)
  File "src/scripts/schnetpack_qm9.py", line 221, in train
    trainer.train(device)
  File "/home/philix/.pyenv/versions/3.7.2/lib/python3.7/site-packages/schnetpack-0.2.1-py3.7.egg/schnetpack/train/trainer.py", line 239, in train
  File "/home/philix/.pyenv/versions/3.7.2/lib/python3.7/site-packages/schnetpack-0.2.1-py3.7.egg/schnetpack/train/trainer.py", line 217, in train
ZeroDivisionError: float division by zero
