Comments (5)
In GitLab by @RandomDefaultUser on Feb 9, 2021, 11:47
I discovered that Sandia was doing a little trick I wasn't aware of; by not shuffling the DataLoaders but instead shuffling the files they could always cache one file in RAM. This causes a MASSIVE speedup. Therefore, this issue is not as urgent anymore. It would improve performance, but not by that much.
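A minimal sketch of the idea as I understand it, not the actual Sandia or MALA implementation: shuffle the order of the files, then shuffle the points only within the file that is currently loaded, so a one-file cache never has to reload. The names `file_shuffled_order` and `count_file_loads` are hypothetical, and the file sizes are made up for illustration.

```python
import random


def file_shuffled_order(points_per_file, seed=0):
    """Return (file_index, local_index) pairs.

    Files are visited in random order; points are shuffled only
    within each file, never across files.
    """
    rng = random.Random(seed)
    file_ids = list(range(len(points_per_file)))
    rng.shuffle(file_ids)
    order = []
    for f in file_ids:
        local = list(range(points_per_file[f]))
        rng.shuffle(local)
        order.extend((f, i) for i in local)
    return order


def count_file_loads(order):
    """Count file loads assuming exactly one file is cached in RAM."""
    loads, cached = 0, None
    for f, _ in order:
        if f != cached:  # cache miss: a different file is needed
            loads += 1
            cached = f
    return loads


# Hypothetical layout: 4 files with 1500 data points each.
order = file_shuffled_order([1500, 1500, 1500, 1500])
loads = count_file_loads(order)
```

With this ordering every file is loaded once per epoch, whereas shuffling all data points globally would make almost every access a cache miss.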
By Fiedler, Lenz (FWU) - 146409 on 2021-02-09T11:47:20 (imported from GitLab)
from mala.
In GitLab by @RandomDefaultUser on Feb 9, 2021, 16:01
This is an important point. We should talk about this in more detail in the weekly meeting and/or in the bigger meeting with @schmer52 and @Kotik79.
By Cangi, Dr. Attila (FWU) - 139621 on 2021-02-09T16:01:47 (imported from GitLab)
In GitLab by @RandomDefaultUser on Feb 9, 2021, 14:12
To expand on that: If you use horovod with 4 nodes and have 6000 data points, this is what the nodes will get:
node 0: datapoints 0, 4, 8, ...
node 1: datapoints 1, 5, 9, ...
node 2: datapoints 2, 6, 10, ...
node 3: datapoints 3, 7, 11, ...
Using the caching algorithm I described earlier this is ok. Every node will do 4 I/O operations (just as a single node would), and the speedup will come by doing the batch processing itself. If I am not mistaken, we get a speedup for large files, which is good, because we have large files. Now, if we could make it so that:
node 0: datapoints 0, 1, 2, 3
node 1: datapoints 1499, 1500, 1501, 1502
node 2: datapoints 2999, 3000, 3001, 3002
node 3: datapoints 4499, 4500, 4501, 4502
we could gain some additional speed. Node 0 and node 2 would only do one file I/O, and node 1 and node 3 only two!
I think this would be the goal.
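To make the comparison concrete, here is a rough sketch that counts how many files each node has to touch under both schemes. It does not use the horovod API; `rank` and `world_size` are plain integers, the helper names are hypothetical, and the layout of 4 files with 1500 points each is an assumption for illustration.

```python
def strided_indices(rank, world_size, n_points):
    """Round-robin split: rank r gets points r, r + world_size, ..."""
    return range(rank, n_points, world_size)


def contiguous_indices(rank, world_size, n_points):
    """Block split: each rank gets one contiguous chunk of points."""
    per_rank = n_points // world_size
    start = rank * per_rank
    # The last rank also takes any remainder.
    end = n_points if rank == world_size - 1 else start + per_rank
    return range(start, end)


def files_touched(indices, points_per_file):
    """Set of file indices needed to serve the given data points."""
    return {i // points_per_file for i in indices}


n_points, world_size, points_per_file = 6000, 4, 1500  # assumed layout
strided = [len(files_touched(strided_indices(r, world_size, n_points),
                             points_per_file))
           for r in range(world_size)]
blocked = [len(files_touched(contiguous_indices(r, world_size, n_points),
                             points_per_file))
           for r in range(world_size)]
```

Under these assumptions every node touches all 4 files with the strided split, but only 1 file with the block split. If the block boundaries do not line up with the file boundaries, a node's block can straddle two neighbouring files, which matches the "one or two file I/Os" estimate above.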
By Fiedler, Lenz (FWU) - 146409 on 2021-02-09T16:01:48 (imported from GitLab)
In GitLab by @RandomDefaultUser on Feb 9, 2021, 23:08
Sounds good! I also talked to Austin about this issue. He told me he is already planning to extend the lazy loading functionality in the way I outlined above (in the Sandia code). Maybe this could be a first potential point of code collaboration, since this functionality is missing in both codes at the moment.
By Fiedler, Lenz (FWU) - 146409 on 2021-02-09T23:08:32 (imported from GitLab)
Closed because obsolete with Josh's changes.