Comments (7)
Here is the code I am running right now, for ~1500 structures in the training set with 108 atoms in each structure.
from pymatgen.io.ase import AseAtomsAdaptor
import numpy as np
from maml.utils import pool_from, convert_docs
from maml.base import SKLModel
from maml.describers import BispectrumCoefficients
from sklearn.linear_model import LinearRegression
from maml.apps.pes import SNAPotential
from ase.io import read

train_energies = []
train_forces = []
train_structures = []

ase_adap = AseAtomsAdaptor()
images = read('train2.traj', ':')
for atoms in images:
    train_energies.append(atoms.get_potential_energy())
    train_forces.append(atoms.get_forces())
    atoms.set_pbc([1, 1, 1])
    train_structures.append(ase_adap.get_structure(atoms))

train_pool = pool_from(train_structures, train_energies, train_forces)
_, df = convert_docs(train_pool, include_stress=False)

# Weight energy rows much more heavily than force rows
weights = np.ones(len(df['dtype']))
weights[df['dtype'] == 'force'] = 1
weights[df['dtype'] == 'energy'] = 100000

element_profile = {'Cu': {'r': 5, 'w': 1}, 'Zr': {'r': 5, 'w': 1},
                   'Al': {'r': 5, 'w': 1}, 'Nb': {'r': 5, 'w': 1}}
describer = BispectrumCoefficients(rcutfac=1, twojmax=10, element_profile=element_profile,
                                   quadratic=True, pot_fit=True, include_stress=False,
                                   n_jobs=8, verbose=True)
ml_model = LinearRegression()
skl_model = SKLModel(describer=describer, model=ml_model)
snap = SNAPotential(model=skl_model)
snap.train(train_structures, train_energies, train_forces,
           include_stress=False, sample_weight=weights)
snap.write_param()
I am mostly running into memory issues: after the descriptors have been calculated, the job just gets killed.
I see. From my understanding, quadratic=True and twojmax=10 will significantly increase the model complexity. There will be more than 1000 features in the descriptor (the exact number also depends on the number of atomic types in your system), much larger than the 31 in the case I discussed above. This explains the memory error.
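The growth in descriptor size with twojmax and the quadratic option can be checked with a few lines of Python. This is a rough sketch assuming the triple-counting convention used by LAMMPS' SNAP (doubled-integer angular momenta, one intercept excluded); maml's exact per-element bookkeeping may differ slightly:

```python
def n_bispectrum(twojmax):
    """Count the (j1, j2, j) bispectrum components for a given twojmax
    (doubled-integer convention, keeping j >= j1 to avoid duplicates)."""
    count = 0
    for a in range(twojmax + 1):                     # a = 2*j1
        for b in range(a + 1):                       # b = 2*j2 <= 2*j1
            for c in range(a - b, min(twojmax, a + b) + 1, 2):  # c = 2*j
                if c >= a:
                    count += 1
    return count

def n_features(twojmax, quadratic, n_elements):
    """Approximate feature dimension per model: linear terms, plus all
    distinct pairwise products when quadratic=True, per element type."""
    k = n_bispectrum(twojmax)
    per_element = k + k * (k + 1) // 2 if quadratic else k
    return per_element * n_elements

print(n_bispectrum(6))            # 30 linear terms (plus intercept -> the 31 above)
print(n_features(10, True, 4))    # twojmax=10, quadratic, 4 elements (Cu/Zr/Al/Nb)
```

With twojmax=10 and quadratic=True for four elements this comes to roughly 17,000 features per row, which is why the design matrix blows up compared with the 31-feature linear case.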
I won't be able to make a straightforward suggestion in this case, but from my knowledge SNAP potentials can already be accurate for many systems with quadratic=False and twojmax=6 or 8. You may consider decreasing your model complexity.
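For reference, a lower-complexity describer along these lines might look like the following. This is a configuration sketch based on the script above, not a tested recipe; the right twojmax (and whether n_jobs helps or hurts peak memory) is system- and machine-dependent:

```python
# Reduced-complexity settings suggested above: linear SNAP, twojmax=6.
describer = BispectrumCoefficients(
    rcutfac=1,
    twojmax=6,             # down from 10
    element_profile=element_profile,
    quadratic=False,       # linear SNAP instead of quadratic
    pot_fit=True,
    include_stress=False,
    n_jobs=1,              # fewer parallel workers -> lower peak memory
    verbose=True,
)
```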
By the way, you may close the issue if you feel the problem is already resolved. Good luck!
from maml.
@vsumaria were you able to solve the problem? Does the descriptor work fine using serial computation or on other machines?
I had to reduce the number of cores over which I was running the job and reduce the training set size significantly to be able to run without memory issues.
Memory can be an issue when the feature array is large and the allowed memory is insufficient. For bispectrum coefficients, the amount of training data, the twojmax parameter, and the choice of linear or quadratic form all affect the feature dimension. However, 7500 structures should be fine.
Have you tried running the simulation with n_jobs = 1, and did it still fail with 7500 structures?
Have you tried a machine with more allowed memory, and did that abate the issue?
Above all, this issue should go away once sufficient memory is allowed, and 7500 structures should not be a challenge. If you still cannot use the full 7500-structure set for your training, you may provide us with your training set as well as the script you used; we will look into it and see if we can reproduce the error.
I agree. I had to move on from this particular project, so I could not try running the calculation with n_jobs = 1, but I think it would have worked too.
Could there be a way to estimate the memory before generating the descriptors (which, with n_jobs = 1, would take quite some time), so that a training run can be planned accordingly?
From my knowledge, the largest memory consumption comes from the array of bispectrum coefficients for the structure list. For one structure with 60 atoms, we have (1 (energy) + 3 * 60 (forces) + 6 (stress)) = 187 rows of descriptors. The size of each descriptor depends on twojmax and on whether the SNAP is linear or quadratic. If we assume twojmax = 6 and a linear (non-quadratic) SNAP, each row has 31 features, so there are 187 * 31 = 5797 matrix entries per structure. 7500 structures (assuming all have 60 atoms) gives 7500 * 5797 = 43,477,500 entries. A float64 numpy array of this size consumes around 350 MB of memory. So I really don't think 7500 structures can cause a large memory consumption.
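This back-of-the-envelope estimate can be reproduced in a few lines of plain Python, assuming float64 storage and one design-matrix row per fitted target (energy, force component, or stress component):

```python
n_structures = 7500
atoms_per_structure = 60
n_features = 31  # twojmax = 6, linear SNAP: 30 coefficients + intercept

# One design-matrix row per target:
# 1 energy + 3 force components per atom + 6 stress components
rows_per_structure = 1 + 3 * atoms_per_structure + 6   # 187

total_entries = n_structures * rows_per_structure * n_features
mem_mib = total_entries * 8 / 1024**2                  # float64 = 8 bytes

print(rows_per_structure, total_entries, round(mem_mib))
```

Note this counts only the final feature array; intermediate copies made during descriptor generation (especially with several parallel workers) can multiply the peak usage.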
It could be that your bispectrum coefficients have a very high complexity, that the memory on your machine is very limited, or that uncleaned history is shrinking the available space. Again, if you still cannot use the full 7500-structure set for your training, you may provide us with your training set as well as the script you used; we will look into it and see if we can reproduce the error. 😁