
Comments (5)

JiQi535 avatar JiQi535 commented on August 20, 2024

Hi Rana, I can offer two pieces of advice:

  1. Parallelize your grid search for the best combination of parameters.
    Since the parameter combinations are independent of one another, they can be evaluated in parallel and the best one selected afterwards. Several Python packages can help with this, for example the standard-library multiprocessing package. If you can divide your search into 24 parallel processes, the search is likely to be accelerated several-fold, possibly more than tenfold.
  2. Choose a reasonably sized search space for the optimal parameters.
    In your case, a 200x200x200 grid includes many combinations that are impractical or unnecessary. I won't suggest an exact range for your search, but you can decide the intervals and the total number of evaluations based on the computational resources you have available.
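The first suggestion can be sketched with the standard-library multiprocessing package. The objective function and the parameter names below (cutoff, weight, jmax) are placeholders for illustration, not maml's actual API:

```python
# Sketch of a parallel grid search with multiprocessing.Pool.
# Each parameter combination is independent, so the combinations
# can be evaluated concurrently and the best one picked at the end.
from itertools import product
from multiprocessing import Pool

def evaluate(params):
    """Placeholder: fit a model for one parameter combination and
    return (params, validation_error). Replace with real training."""
    cutoff, weight, jmax = params
    error = (cutoff - 4.5) ** 2 + (weight - 1.0) ** 2 + (jmax - 3) ** 2  # toy error
    return params, error

if __name__ == "__main__":
    # A coarse grid; refine around the best region in a second pass.
    grid = list(product([3.5, 4.0, 4.5, 5.0],   # cutoff candidates
                        [0.5, 1.0, 2.0],        # weight candidates
                        [2, 3, 4]))             # jmax candidates
    with Pool(processes=4) as pool:             # e.g. 24 on a 24-core node
        results = pool.map(evaluate, grid)
    best_params, best_error = min(results, key=lambda r: r[1])
    print(best_params, best_error)
```

The number of worker processes should match the cores actually available; oversubscribing a node usually hurts rather than helps.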

from maml.

Rana-Phy avatar Rana-Phy commented on August 20, 2024

Thanks for your suggestion. It is fast now!
Is there any technical reason behind 'divide your search into 24 parallel processes'?

from maml.

JiQi535 avatar JiQi535 commented on August 20, 2024

> Thanks for your suggestion. It is fast now! Is there any technical reason behind 'divide your search into 24 parallel processes'?

Happy to know that it helps! I used "24" just as an example, because there are 24 cores on each node of the computer cluster our group has access to. This value should be adjusted to the machine you run on to achieve the best efficiency.
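Rather than hard-coding the process count, the available core count can be queried from the standard library (a minimal sketch):

```python
# Query the number of CPU cores visible to this machine, and use it
# as the default worker count for a parallel grid search.
import os

n_workers = os.cpu_count()  # e.g. 24 on a 24-core cluster node
print(n_workers)
```

Note that on shared clusters the scheduler may allot fewer cores than the node physically has, so the allocation reported by your job script is the safer number to use.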

from maml.

Rana-Phy avatar Rana-Phy commented on August 20, 2024

Dear Ji Qi,

Ok, now I am seeing that multiprocessing is at least three times slower than scikit-learn's n_jobs=24 for the larger dataset.
Maybe this is because of our cluster setup or my script.
I am trying to understand the maml base model classes, and my understanding could be completely wrong. So, for

skl_model = SKLModel(describer=describer, model=LinearRegression())

is it correct that SKLModel is the model, that describer contains the hyperparameters (cut, wt, jmax), and that the parameters of model=LinearRegression() are learned during training/fitting?
Also, may I put element_profile into a hyperparameter optimization package like optuna, hyperopt, or anything you suggest instead of using a for loop? I haven't been able to make it work for some reason.
I will be waiting to hear from you.

Best regards,
Rana

from maml.

JiQi535 avatar JiQi535 commented on August 20, 2024

Rana-Phy

The describer here is the local-environment describer of the SNAP potential, which represents the material structures in a mathematical form. The LinearRegression is the model used in the ML training process to connect the local-environment describers (input) to the target properties, which are energies, forces and stresses.

For parameter tuning, I'm not aware of any existing automatic algorithm for SNAP training; please let me know if there is one, as I would be interested in it. In previous work from our group, we used the differential evolution implemented in scipy to tune the parameters of a SNAP for Mo (http://dx.doi.org/10.1103/PhysRevMaterials.1.043603), and we used a stepwise grid search for SNAPs for alloy systems (http://dx.doi.org/10.1038/s41524-020-0339-0). These may be good references for you.
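As a rough illustration of the stepwise grid-search idea (not the exact procedure used in those papers): optimize one parameter at a time over its grid while holding the others fixed, then repeat the sweep. The objective below is a hypothetical error surface standing in for a SNAP validation error:

```python
# Stepwise (coordinate-wise) grid search on a toy objective.
def objective(cutoff, jmax):
    """Hypothetical validation-error surface; replace with real training."""
    return (cutoff - 4.5) ** 2 + (jmax - 3) ** 2

def stepwise_search(grids, start, n_sweeps=2):
    """grids: {name: candidate values}; start: {name: initial value}.
    Each sweep scans one parameter's grid with the others held fixed."""
    params = dict(start)
    for _ in range(n_sweeps):
        for name, values in grids.items():
            best = min(values, key=lambda v: objective(**{**params, name: v}))
            params[name] = best
    return params

result = stepwise_search(
    grids={"cutoff": [3.5, 4.0, 4.5, 5.0], "jmax": [2, 3, 4]},
    start={"cutoff": 3.5, "jmax": 2},
)
print(result)  # → {'cutoff': 4.5, 'jmax': 3}
```

The stepwise approach evaluates far fewer combinations than a full grid (here 2 sweeps x 7 evaluations instead of 12 full-grid points; the savings grow quickly with more parameters), at the cost of possibly missing optima that require changing several parameters at once.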

from maml.
