Comments (5)

RandomDefaultUser commented on June 13, 2024

In GitLab by @RandomDefaultUser on Feb 9, 2021, 11:47

I discovered that Sandia was using a little trick I wasn't aware of: by shuffling the files instead of shuffling the DataLoaders, they can always keep one file cached in RAM. This gives a MASSIVE speedup. Therefore, this issue is not as urgent anymore. Fixing it would still improve performance, but not by that much.
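To make the idea concrete, here is a minimal sketch of file-level shuffling with a one-file cache. It is not the Sandia or MALA implementation; `file_shuffled_order`, `OneFileCache`, and `demo_load` are hypothetical names made up for illustration, and the "files" are just small in-memory lists.

```python
import random


def file_shuffled_order(file_sizes, seed=0):
    """Yield (file_index, local_index) pairs.

    Files are visited in random order, and data points are shuffled only
    within the file that is currently being read.
    """
    rng = random.Random(seed)
    file_order = list(range(len(file_sizes)))
    rng.shuffle(file_order)
    for file_index in file_order:
        local = list(range(file_sizes[file_index]))
        rng.shuffle(local)
        for local_index in local:
            yield file_index, local_index


class OneFileCache:
    """Keep exactly one file in RAM and count how often a file is loaded."""

    def __init__(self, load_file):
        self._load_file = load_file
        self._cached_index = None
        self._cached_data = None
        self.loads = 0

    def get(self, file_index, local_index):
        # Only touch the disk when a different file is requested.
        if file_index != self._cached_index:
            self._cached_data = self._load_file(file_index)
            self._cached_index = file_index
            self.loads += 1
        return self._cached_data[local_index]


def demo_load(file_index):
    # Hypothetical loader: stands in for reading one snapshot file from disk.
    return [file_index * 1000 + i for i in range(5)]


cache = OneFileCache(demo_load)
samples = [cache.get(f, i) for f, i in file_shuffled_order([5, 5, 5], seed=1)]
print(cache.loads)  # one load per file
```

Because every point of a file is consumed before the next file is opened, the number of loads equals the number of files. A global shuffle over all data points would instead jump between files and evict the cache on almost every access.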

By Fiedler, Lenz (FWU) - 146409 on 2021-02-09T11:47:20 (imported from GitLab)

from mala.

RandomDefaultUser commented on June 13, 2024

In GitLab by @RandomDefaultUser on Feb 9, 2021, 16:01

This is an important point. We should talk about this in more detail in the weekly meeting and/or in the bigger meeting with @schmer52 and @Kotik79.

By Cangi, Dr. Attila (FWU) - 139621 on 2021-02-09T16:01:47 (imported from GitLab)


RandomDefaultUser commented on June 13, 2024

In GitLab by @RandomDefaultUser on Feb 9, 2021, 14:12

To expand on that: If you use horovod with 4 nodes and have 6000 data points, this is what the nodes will get:
node 0: datapoints 0, 4, 8, ...
node 1: datapoints 1, 5, 9, ...
node 2: datapoints 2, 6, 10, ...
node 3: datapoints 3, 7, 11, ...

Using the caching algorithm I described earlier, this is ok. Every node will do 4 I/O operations (just as a single node would), and the speedup comes from the batch processing itself. If I am not mistaken, we get a speedup for large files, which is good, because we have large files. Now, if we could make it so that:

node 0: datapoints 0, 1, 2, 3
node 1: datapoints 1499, 1500, 1501, 1502
node 2: datapoints 2999, 3000, 3001, 3002
node 3: datapoints 4499, 4500, 4501, 4502

we could gain some additional speed: node 0 and node 2 would only do one file I/O, and node 1 and node 3 only two!
I think this would be the goal.
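The difference between the two layouts can be checked with a small sketch. It assumes the 6000 data points are stored in four files of 1500 points each (which matches the "4 I/O operations" above but is not stated explicitly), and that each node reads its points file by file with a one-file cache. The function names are hypothetical and no horovod API is used.

```python
def strided_indices(rank, world_size, n_points):
    """Round-robin split: rank r gets r, r + world_size, r + 2 * world_size, ..."""
    return range(rank, n_points, world_size)


def contiguous_indices(rank, world_size, n_points):
    """Block split: rank r gets one contiguous chunk of the data.

    Assumes n_points is divisible by world_size.
    """
    chunk = n_points // world_size
    return range(rank * chunk, (rank + 1) * chunk)


def files_touched(indices, points_per_file):
    """Return the sorted list of file indices that contain the given points."""
    return sorted({i // points_per_file for i in indices})


# Assumed layout: 6000 points, 4 nodes, 4 files of 1500 points each.
N_POINTS, N_RANKS, POINTS_PER_FILE = 6000, 4, 1500

strided = [
    files_touched(strided_indices(r, N_RANKS, N_POINTS), POINTS_PER_FILE)
    for r in range(N_RANKS)
]
contiguous = [
    files_touched(contiguous_indices(r, N_RANKS, N_POINTS), POINTS_PER_FILE)
    for r in range(N_RANKS)
]

for r in range(N_RANKS):
    print(f"node {r}: strided -> files {strided[r]}, contiguous -> files {contiguous[r]}")
```

Under these assumptions, the strided split makes every node touch all four files, while a block split aligned with the file boundaries needs only one file per node. If the block boundaries do not line up with the file boundaries (for example, a block starting at data point 1499), a node touches two files instead of one.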

By Fiedler, Lenz (FWU) - 146409 on 2021-02-09T16:01:48 (imported from GitLab)


RandomDefaultUser commented on June 13, 2024

In GitLab by @RandomDefaultUser on Feb 9, 2021, 23:08

Sounds good! I also talked to Austin concerning this issue. He told me he is already planning on extending the lazy loading functionality in the way I outlined above (io the Sandia). Maybe this could be a first potential point of code collaboration, since this functionality is missing in both codes at the moment.

By Fiedler, Lenz (FWU) - 146409 on 2021-02-09T23:08:32 (imported from GitLab)


RandomDefaultUser commented on June 13, 2024

Closed because obsolete with Josh's changes.

