Giter Site home page Giter Site logo

Comments (4)

MasterIzumi avatar MasterIzumi commented on September 23, 2024

@zhaone Hi, have you found the reason?
I just tried to train the network using the default settings, and I also found the training is around twice as slow as the paper described. It cost 14 hours for 20 epochs (11.5 hours for 36 epochs in the paper).

Here's my environment: 4 * Titan RTX, batch size 128 (4*32), distributed training using Horovod.

Btw, one more thing I notice is that my log shows one epoch takes over 2440 while ~900 in the provided log file, and in #2 they report ~1200 (4 * RTX2080Ti). But the evaluation results are similar.

Here's my training log:

Epoch 20.000, lr 0.00100, time 2440.21 
loss 0.5037 0.2001 0.3036, ade1 1.6102, fde1 3.5928, ade 0.7662, fde 1.1754

Provided log file:

Epoch 20.000, lr 0.00100, time 872.52
loss 0.5018 0.2001 0.3016, ade1 1.5967, fde1 3.5560, ade 0.7638, fde 1.1651

from lanegcn.

zhaone avatar zhaone commented on September 23, 2024

No, I have not solved this problem yet, but your speed is not so ridiculously slow compared with mine (3 times slower than yours). Have you checked where the speed bottleneck is? for example IO?

from lanegcn.

chenyuntc avatar chenyuntc commented on September 23, 2024
  1. Make sure you use preprocessed data. Otherwise, io and preprocessing is a heavy load.
  2. watch nvidia-smi or watch gpustat to see the gpu utilization while running code. The utilization is usually above 80%.
  3. htop to see the cpu utilization, make sure you have sufficient cpu resource.

from lanegcn.

wuhaoran111 avatar wuhaoran111 commented on September 23, 2024

@MasterIzumi i have the same question. And when i use free -h, i see that the memory are exhausted. As i have 128G memory with 4 Titan XP GPU, i think it may use too much memory in the code ?

from lanegcn.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.