Comments (2)
Hi @davidleocadio, which command did you run for the restart? It will not work if you run nequip-train with a restart config; you'll have to use nequip-restart for a restart. I also see you're running nequip 0.3.2. Can you try this with the latest version, 0.3.3?
Regarding restart vs. requeue: requeue is an option that automatically figures out whether this is your first run or a restarted run. If it's your first run, it trains a new model; if it's a restarted run, it resumes the existing one. Requeue is helpful on queueing systems where your job might get interrupted and then restarted, because in that case you don't need to take care of anything yourself.
I find it quite useful. Here's the command I usually use:
nequip-requeue config.yaml
and then config.yaml has the following lines (you can also find an example under configs/requeue.yaml):
root: example-root
run_name: example-run_name
workdir: example-workdir
requeue: true
append: true
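The first-run-versus-restart decision described above can be pictured with a small sketch. This is not NequIP's actual implementation: the function name choose_mode, the directory layout root/run_name, and the state file name trainer.pth are all assumptions made purely for illustration.

```python
from pathlib import Path
import tempfile


def choose_mode(root: str, run_name: str, state_file: str = "trainer.pth") -> str:
    """Return "restart" if a saved state exists for this run, else "fresh".

    Hypothetical layout (an assumption, not taken from NequIP):
    <root>/<run_name>/<state_file>
    """
    state_path = Path(root) / run_name / state_file
    return "restart" if state_path.is_file() else "fresh"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Nothing has been saved yet, so this counts as a first run.
        print(choose_mode(root, "example-run_name"))

        # Simulate an interrupted job that left a saved state behind.
        run_dir = Path(root) / "example-run_name"
        run_dir.mkdir()
        (run_dir / "trainer.pth").touch()
        print(choose_mode(root, "example-run_name"))
```

The point of the sketch is that the same command can be submitted every time: the presence or absence of saved state under the run directory decides what happens, which is why an interrupted job needs no manual intervention when the queue restarts it.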
Hi @simonbatzner. I was training my networks using the procedure and code outlined in the Developer's Tutorial, and I was under the impression that I could keep training the networks that way and simply add requeue: True and append: True to config.yaml.
But now I realize it's meant to work by training the networks the way you describe above.
Also, with nequip 0.3.3, it all seems to work now.
Thanks for your help.