Memory handling (maml, closed)

vsumaria commented on July 19, 2024
Memory handling

Comments (7)

JiQi535 commented on July 19, 2024

> Here is the code I am running right now, for ~1500 training structures with 108 atoms each.
>
> import numpy as np
> from ase.io import read
> from pymatgen.io.ase import AseAtomsAdaptor
> from sklearn.linear_model import LinearRegression
> from maml.utils import pool_from, convert_docs
> from maml.base import SKLModel
> from maml.describers import BispectrumCoefficients
> from maml.apps.pes import SNAPotential
>
> train_energies = []
> train_forces = []
> train_structures = []
> ase_adap = AseAtomsAdaptor()
>
> # Read every frame of the ASE trajectory and convert to pymatgen structures
> images = read('train2.traj', ':')
> for atoms in images:
>     train_energies.append(atoms.get_potential_energy())
>     train_forces.append(atoms.get_forces())
>     atoms.set_pbc([1, 1, 1])
>     train_structures.append(ase_adap.get_structure(atoms))
>
> # One row per target value (energy, force components); weight energies heavily
> train_pool = pool_from(train_structures, train_energies, train_forces)
> _, df = convert_docs(train_pool, include_stress=False)
> weights = np.ones(len(df['dtype']))
> weights[df['dtype'] == 'force'] = 1
> weights[df['dtype'] == 'energy'] = 100000
>
> element_profile = {'Cu': {'r': 5, 'w': 1}, 'Zr': {'r': 5, 'w': 1},
>                    'Al': {'r': 5, 'w': 1}, 'Nb': {'r': 5, 'w': 1}}
>
> describer = BispectrumCoefficients(rcutfac=1, twojmax=10,
>                                    element_profile=element_profile,
>                                    quadratic=True, pot_fit=True,
>                                    include_stress=False, n_jobs=8, verbose=True)
>
> ml_model = LinearRegression()
> skl_model = SKLModel(describer=describer, model=ml_model)
> snap = SNAPotential(model=skl_model)
>
> snap.train(train_structures, train_energies, train_forces,
>            include_stress=False, sample_weight=weights)
> snap.write_param()
>
> I am mostly running into memory issues: after the descriptors have been calculated, the job just gets killed.

I see. From my understanding, quadratic=True and twojmax=10 will significantly increase the model complexity. There will be more than 1000 coefficients in the descriptors (the exact number also depends on the number of atomic types in your system), far more than the 31 in the case I discussed above. This explains the memory error.
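To make this concrete, here is a rough sketch of the coefficient count (the helper functions are mine, not part of maml; the triple loop follows the LAMMPS convention for counting bispectrum components, and maml's exact bookkeeping may differ slightly, so treat the numbers as estimates):

def n_bispectrum(twojmax):
    # Count the (2j1, 2j2, 2j) triples kept in the LAMMPS diagonal convention
    count = 0
    for j1 in range(twojmax + 1):
        for j2 in range(j1 + 1):
            for j in range(j1 - j2, min(twojmax, j1 + j2) + 1, 2):
                if j >= j1:
                    count += 1
    return count

def n_coefficients(twojmax, n_elements=1, quadratic=False):
    # Per element: 1 constant term + K linear terms, plus all unique
    # pairwise products K*(K+1)/2 when the fit is quadratic
    k = n_bispectrum(twojmax)
    per_element = 1 + k
    if quadratic:
        per_element += k * (k + 1) // 2
    return per_element * n_elements

print(n_coefficients(6))                                 # 31 (the linear case above)
print(n_coefficients(10, n_elements=4, quadratic=True))  # 17112 for the Cu/Zr/Al/Nb script

So the quadratic twojmax=10, four-element setup works out to roughly 550 times the 31 features of the single-element linear twojmax=6 case.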

I won't be able to make any straightforward suggestion in this case, but from my knowledge, SNAP potentials can already be accurate for many systems with quadratic=False and twojmax=6 or 8. You may consider decreasing your model complexity, e.g. along the lines sketched just below.
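For example, a reduced-complexity describer would look like this (same BispectrumCoefficients API and element_profile as in the script above; the specific values are only illustrative):

describer = BispectrumCoefficients(rcutfac=1, twojmax=6,
                                   element_profile=element_profile,
                                   quadratic=False, pot_fit=True,
                                   include_stress=False,
                                   n_jobs=1,  # fewer workers also lowered peak memory in this thread
                                   verbose=True)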

By the way, you may close the issue if you feel the problem is already resolved. Good luck!

chc273 commented on July 19, 2024

@vsumaria were you able to solve the problem? Does the descriptor work fine using serial computation or on other machines?

vsumaria commented on July 19, 2024

I had to reduce the number of cores over which I was running the job and reduce the training set size significantly to be able to run without memory issues.

JiQi535 commented on July 19, 2024

> I had to reduce the number of cores over which I was running the job and reduce the training set size significantly to be able to run without memory issues.

Memory can be an issue when the feature matrix is large while the available memory is insufficient. For bispectrum coefficients, the amount of training data, the twojmax parameter, and the choice of linear or quadratic form all affect the feature dimension. However, 7500 structures should be fine.

Have you tried running the job with n_jobs = 1 and still failed with 7500 structures?

Have you tried a machine with more available memory, and did the issue abate?

Above all, this issue should resolve once sufficient memory is available, and 7500 structures should not be a challenge. If you still cannot use the full 7500-structure set for your training, you may provide us with your training set as well as the script you used; we will look into it and see if we can reproduce the error.

vsumaria commented on July 19, 2024

I agree. I had to move on from this particular project, so I could not try running the calculation with n_jobs = 1, but I think it would have worked too.

Could there be a way to estimate the memory before generating the descriptors (which, with n_jobs = 1, would take quite some time), so that a training run can be planned accordingly?

JiQi535 commented on July 19, 2024

> I agree. I had to move on from this particular project, so I could not try running the calculation with n_jobs = 1, but I think it would have worked too.
>
> Could there be a way to estimate the memory before generating the descriptors (which, with n_jobs = 1, would take quite some time), so that a training run can be planned accordingly?

From my knowledge, the largest memory consumption comes from the array of bispectrum coefficients for the structure list. For one structure with 60 atoms, we have 1 (energy) + 3 * 60 (forces) + 6 (stress) = 187 rows of descriptors, and the width of each row depends on twojmax and on whether the SNAP is linear or quadratic. If we assume twojmax = 6 and a linear (non-quadratic) SNAP, each row has 31 features, so one structure contributes 187 * 31 = 5797 values. 7500 structures (assuming all have 60 atoms) then give 7500 * 5797 = 43,477,500 values, and a float64 numpy array of that size consumes around 350 MB of memory. So I really don't think 7500 structures alone can cause a large memory consumption.
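For anyone who wants this estimate up front, here is the same arithmetic as a small helper (a sketch of my own, not a maml utility; it assumes float64 features):

import numpy as np

def estimate_feature_memory(n_structures, atoms_per_structure, n_features,
                            include_stress=True, dtype=np.float64):
    # One row per target value: 1 energy + 3N force components (+ 6 stress)
    rows_per_structure = 1 + 3 * atoms_per_structure
    if include_stress:
        rows_per_structure += 6
    n_values = n_structures * rows_per_structure * n_features
    return n_values * np.dtype(dtype).itemsize  # bytes

# The case above: 7500 structures of 60 atoms, twojmax=6 linear (31 features)
print(estimate_feature_memory(7500, 60, 31) / 1e6)    # ~348 MB

# The script in this thread: 1500 structures of 108 atoms, quadratic twojmax=10
# with four elements (~17112 features per row by the earlier count)
print(estimate_feature_memory(1500, 108, 17112, include_stress=False) / 1e9)  # ~67 GB

The second number would readily explain a killed job on a typical node.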

It can be that your bispectrum coefficients have a very high dimensionality, or the available memory on your machine is very limited, or leftover processes are holding memory. Again, if you still cannot use the full 7500-structure set for your training, you may provide us with your training set as well as the script you used; we will look into it and see if we can reproduce the error. 😁

vsumaria commented on July 19, 2024

Here is the code I am running right now, for ~1500 training structures with 108 atoms each.

import numpy as np
from ase.io import read
from pymatgen.io.ase import AseAtomsAdaptor
from sklearn.linear_model import LinearRegression
from maml.utils import pool_from, convert_docs
from maml.base import SKLModel
from maml.describers import BispectrumCoefficients
from maml.apps.pes import SNAPotential

train_energies = []
train_forces = []
train_structures = []
ase_adap = AseAtomsAdaptor()

# Read every frame of the ASE trajectory and convert to pymatgen structures
images = read('train2.traj', ':')
for atoms in images:
    train_energies.append(atoms.get_potential_energy())
    train_forces.append(atoms.get_forces())
    atoms.set_pbc([1, 1, 1])
    train_structures.append(ase_adap.get_structure(atoms))

# One row per target value (energy, force components); weight energies heavily
train_pool = pool_from(train_structures, train_energies, train_forces)
_, df = convert_docs(train_pool, include_stress=False)
weights = np.ones(len(df['dtype']))
weights[df['dtype'] == 'force'] = 1
weights[df['dtype'] == 'energy'] = 100000

element_profile = {'Cu': {'r': 5, 'w': 1}, 'Zr': {'r': 5, 'w': 1},
                   'Al': {'r': 5, 'w': 1}, 'Nb': {'r': 5, 'w': 1}}

describer = BispectrumCoefficients(rcutfac=1, twojmax=10,
                                   element_profile=element_profile,
                                   quadratic=True, pot_fit=True,
                                   include_stress=False, n_jobs=8, verbose=True)

ml_model = LinearRegression()
skl_model = SKLModel(describer=describer, model=ml_model)
snap = SNAPotential(model=skl_model)

snap.train(train_structures, train_energies, train_forces,
           include_stress=False, sample_weight=weights)
snap.write_param()

I am mostly running into memory issues: after the descriptors have been calculated, the job just gets killed.
