Use horovod for lazy-loading-like functionality (mala, closed)

RandomDefaultUser commented on June 13, 2024

Use horovod for lazy-loading-like functionality

from mala.

Comments (5)

RandomDefaultUser commented on June 13, 2024

In GitLab by @RandomDefaultUser on Feb 9, 2021, 11:47

I discovered that Sandia was doing a little trick I wasn't aware of: by not shuffling the DataLoaders but instead shuffling the files, they can always keep one file cached in RAM. This causes a MASSIVE speedup. Therefore, this issue is not as urgent anymore. It would still improve performance, but not by that much.
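
A minimal sketch of the caching trick described above, not the actual Sandia or mala implementation. It assumes four snapshot files of 1500 data points each; `load_file` and `iterate_epoch` are hypothetical names, and samples inside a file are simply visited in order.

```python
import random

load_count = 0  # number of (simulated) file reads


def load_file(file_id, size):
    """Hypothetical stand-in for reading one snapshot file from disk."""
    global load_count
    load_count += 1
    return [(file_id, i) for i in range(size)]


def iterate_epoch(file_sizes, seed=None):
    """Yield every sample once, shuffling the file order only."""
    rng = random.Random(seed)
    order = list(range(len(file_sizes)))
    rng.shuffle(order)  # shuffle files, not individual samples
    for file_id in order:
        # Only this one file is held in RAM until all its samples are consumed.
        cached = load_file(file_id, file_sizes[file_id])
        for sample in cached:
            yield sample


samples = list(iterate_epoch([1500, 1500, 1500, 1500], seed=0))
print(len(samples), load_count)  # 6000 samples, 4 file reads
```

Shuffling individual samples across files instead would force a file read whenever two consecutive samples come from different files, which is where the speedup comes from.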

By Fiedler, Lenz (FWU) - 146409 on 2021-02-09T11:47:20 (imported from GitLab)

RandomDefaultUser commented on June 13, 2024

In GitLab by @RandomDefaultUser on Feb 9, 2021, 16:01

This is an important point. We should talk about this in more detail in the weekly meeting and/or in the bigger meeting with @schmer52 and @Kotik79.

By Cangi, Dr. Attila (FWU) - 139621 on 2021-02-09T16:01:47 (imported from GitLab)

RandomDefaultUser commented on June 13, 2024

In GitLab by @RandomDefaultUser on Feb 9, 2021, 14:12

To expand on that: If you use horovod with 4 nodes and have 6000 data points, this is what the nodes will get:
node 0: datapoints 0, 4, 8, ...
node 1: datapoints 1, 5, 9, ...
node 2: datapoints 2, 6, 10, ...
node 3: datapoints 3, 7, 11, ...
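
The assignment listed above can be reproduced with a plain strided split. This is only a sketch: it assumes the round-robin pattern comes from a non-shuffling distributed sampler (for example PyTorch's `DistributedSampler` with `shuffle=False`, given the horovod rank and size), and `strided_indices` is a hypothetical helper, not part of horovod or mala.

```python
def strided_indices(rank, world_size, n_samples):
    """Indices a node receives under a round-robin (strided) split."""
    return list(range(rank, n_samples, world_size))


for rank in range(4):
    # Show the first three data points each of the 4 nodes would get.
    print(f"node {rank}:", strided_indices(rank, 4, 6000)[:3])
```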

Using the caching algorithm I described earlier, this is OK. Every node will do 4 file I/O operations (just as a single node would), and the speedup comes from the batch processing itself. If I am not mistaken, we get a speedup for large files, which is good, because we have large files. Now, if we could make it so that:

node 0: datapoints 0, 1, 2, 3
node 1: datapoints 1499, 1500, 1501, 1502
node 2: datapoints 2999, 3000, 3001, 3002
node 3: datapoints 4499, 4500, 4501, 4502

we could gain some additional speed. Node 0 and node 2 would only do one file I/O, and node 1 and node 3 only two!
I think this would be the goal.
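
To make the comparison concrete, here is a rough sketch that counts file loads per node when each node keeps only the most recently used file in RAM. The layout (four files of 1500 data points each, with blocks aligned to file boundaries) is an assumption for illustration, and `count_file_loads` is a hypothetical helper.

```python
from bisect import bisect_right
from itertools import accumulate


def count_file_loads(indices, file_sizes):
    """Count file loads for one node that caches a single file at a time."""
    ends = list(accumulate(file_sizes))  # exclusive end index of each file
    loads, cached = 0, None
    for idx in indices:
        file_id = bisect_right(ends, idx)  # file that contains this index
        if file_id != cached:
            loads += 1
            cached = file_id
    return loads


file_sizes = [1500, 1500, 1500, 1500]  # assumed layout, 6000 data points
n_nodes, n_samples = 4, sum(file_sizes)
block = n_samples // n_nodes

# Strided split: every node walks through all four files.
strided = [count_file_loads(range(rank, n_samples, n_nodes), file_sizes)
           for rank in range(n_nodes)]
# Contiguous split: every node stays inside its own block of indices.
contiguous = [count_file_loads(range(rank * block, (rank + 1) * block), file_sizes)
              for rank in range(n_nodes)]

print("strided:   ", strided)     # [4, 4, 4, 4]
print("contiguous:", contiguous)  # [1, 1, 1, 1]
```

A block that straddles a file boundary costs one extra load, so the exact per-node counts depend on how the blocks line up with the files.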

By Fiedler, Lenz (FWU) - 146409 on 2021-02-09T16:01:48 (imported from GitLab)

RandomDefaultUser commented on June 13, 2024

In GitLab by @RandomDefaultUser on Feb 9, 2021, 23:08

Sounds good! I also talked to Austin concerning this issue. He told me he is already planning to extend the lazy loading functionality in the way I outlined above (in the Sandia code). Maybe this could be a first potential point of code collaboration, since this functionality is missing in both codes at the moment.

By Fiedler, Lenz (FWU) - 146409 on 2021-02-09T23:08:32 (imported from GitLab)

RandomDefaultUser commented on June 13, 2024

Closed because obsolete with Josh's changes.
