training speed about shufflenet-series HOT 3 CLOSED

commented on August 21, 2024
training speed

from shufflenet-series.

Comments (3)

nmaac commented on August 21, 2024

@acoder-fin We train ShuffleNetV2 on eight 2080 Ti GPUs with standard convolution operations. I have never seen a speed that low, so I would suggest testing your hardware environment.

commented on August 21, 2024

@nmaac Thank you very much! I still have no idea what was wrong.
I then switched to the example code from the DALI package, and the speed was quite good. With two V100 GPUs, DALI, mixed precision, and distributed training, 8 hours is enough to train ShuffleNetV2 on ImageNet for 90 epochs.

[2021-05-03 19:32:27] Epoch: [1][0/626]	Time 1.565 (1.565)	Speed 1308.544 (1308.544)	Loss 6.9308290482 (6.9308)	Prec@1 0.342 (0.342)	Prec@5 0.732 (0.732)
[2021-05-03 19:32:32] Epoch: [1][10/626]	Time 0.430 (0.998)	Speed 4763.635 (2053.110)	Loss 6.9179043770 (6.9244)	Prec@1 0.049 (0.195)	Prec@5 0.537 (0.635)
[2021-05-03 19:32:37] Epoch: [1][20/626]	Time 0.483 (0.826)	Speed 4237.726 (2479.119)	Loss 6.9008212090 (6.9165)	Prec@1 0.098 (0.163)	Prec@5 0.439 (0.570)
[2021-05-03 19:32:41] Epoch: [1][30/626]	Time 0.452 (0.732)	Speed 4535.367 (2796.036)	Loss 6.8893446922 (6.9097)	Prec@1 0.049 (0.134)	Prec@5 0.586 (0.574)
[2021-05-03 19:32:46] Epoch: [1][40/626]	Time 0.441 (0.674)	Speed 4640.996 (3037.542)	Loss 6.8476972580 (6.8973)	Prec@1 0.244 (0.156)	Prec@5 1.514 (0.762)
[2021-05-03 19:32:50] Epoch: [1][50/626]	Time 0.465 (0.639)	Speed 4404.104 (3203.197)	Loss 6.8406362534 (6.8879)	Prec@1 0.098 (0.146)	Prec@5 1.025 (0.806)
[2021-05-03 19:32:55] Epoch: [1][60/626]	Time 0.492 (0.618)	Speed 4163.065 (3312.298)	Loss 6.8032770157 (6.8758)	Prec@1 0.195 (0.153)	Prec@5 1.367 (0.886)
[2021-05-03 19:33:00] Epoch: [1][70/626]	Time 0.477 (0.601)	Speed 4290.095 (3409.433)	Loss 6.7874403000 (6.8647)	Prec@1 0.342 (0.177)	Prec@5 1.416 (0.952)
[2021-05-03 19:33:05] Epoch: [1][80/626]	Time 0.468 (0.586)	Speed 4376.600 (3495.255)	Loss 6.7290878296 (6.8497)	Prec@1 0.439 (0.206)	Prec@5 2.246 (1.096)
[2021-05-03 19:33:09] Epoch: [1][90/626]	Time 0.475 (0.575)	Speed 4314.122 (3562.883)	Loss 6.6708250046 (6.8318)	Prec@1 0.488 (0.234)	Prec@5 2.148 (1.201)
-----

[2021-05-04 03:11:07] Epoch: [90][580/626]	Time 0.469 (0.466)	Speed 4370.227 (4393.691)	Loss 1.5688558817 (1.5205)	Prec@1 63.477 (64.519)	Prec@5 83.545 (84.800)
[2021-05-04 03:11:12] Epoch: [90][590/626]	Time 0.486 (0.466)	Speed 4215.507 (4390.598)	Loss 1.4887752533 (1.5199)	Prec@1 65.820 (64.540)	Prec@5 85.693 (84.815)
[2021-05-04 03:11:17] Epoch: [90][600/626]	Time 0.477 (0.467)	Speed 4297.617 (4389.041)	Loss 1.4767193794 (1.5192)	Prec@1 65.234 (64.552)	Prec@5 86.230 (84.838)
[2021-05-04 03:11:22] Epoch: [90][610/626]	Time 0.451 (0.466)	Speed 4538.312 (4391.371)	Loss 1.5950200558 (1.5205)	Prec@1 63.867 (64.541)	Prec@5 83.252 (84.813)
[2021-05-04 03:11:26] Epoch: [90][620/626]	Time 0.456 (0.466)	Speed 4488.785 (4392.884)	Loss 1.4577445984 (1.5195)	Prec@1 65.723 (64.559)	Prec@5 85.938 (84.831)
[2021-05-04 03:11:28] Test: [0/24]	Time 0.143 (0.143)	Speed 14316.629 (14316.629)	Loss 1.2471 (1.2471)	Prec@1 69.775 (69.775)	Prec@5 87.842 (87.842)
[2021-05-04 03:11:32] Test: [10/24]	Time 0.426 (0.378)	Speed 4812.863 (5422.077)	Loss 1.3152 (1.3583)	Prec@1 68.262 (66.895)	Prec@5 88.184 (87.318)
[2021-05-04 03:11:36] Test: [20/24]	Time 0.386 (0.389)	Speed 5311.530 (5260.344)	Loss 1.5729 (1.3950)	Prec@1 62.549 (66.055)	Prec@5 84.961 (87.028)
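
As a rough sanity check on the "8 hours" figure, the log lines above can be parsed and the numbers multiplied out. This is only a sketch under assumptions the thread does not confirm: that `Time` is seconds per iteration, that `Speed` is images per second, and that the running average from the last epoch is representative of the whole run. The helper name `parse_line` is made up for illustration.

```python
import re
from datetime import datetime

# Leading fields of the first and last training lines from the log above
# (fields are tab-separated; the Loss/Prec columns are left out here).
LOG_LINES = [
    "[2021-05-03 19:32:27] Epoch: [1][0/626]\tTime 1.565 (1.565)\tSpeed 1308.544 (1308.544)",
    "[2021-05-04 03:11:26] Epoch: [90][620/626]\tTime 0.456 (0.466)\tSpeed 4488.785 (4392.884)",
]

LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] Epoch: \[(?P<epoch>\d+)\]\[(?P<step>\d+)/(?P<steps>\d+)\]\s+"
    r"Time [\d.]+ \((?P<avg_time>[\d.]+)\)\s+"
    r"Speed [\d.]+ \((?P<avg_speed>[\d.]+)\)"
)


def parse_line(line):
    """Extract timestamp, epoch, steps per epoch, and the running averages."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return {
        "timestamp": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
        "epoch": int(m["epoch"]),
        "steps": int(m["steps"]),
        "avg_time": float(m["avg_time"]),    # assumed: seconds per iteration
        "avg_speed": float(m["avg_speed"]),  # assumed: images per second
    }


first = parse_line(LOG_LINES[0])
last = parse_line(LOG_LINES[1])

# Pure training time: iterations per epoch * seconds per iteration * epochs.
estimated_hours = last["steps"] * last["avg_time"] * last["epoch"] / 3600

# Wall-clock time between the two log lines (also includes per-epoch testing).
wall_hours = (last["timestamp"] - first["timestamp"]).total_seconds() / 3600

# Implied global batch size: images per second * seconds per iteration.
batch_size = last["avg_speed"] * last["avg_time"]

print(f"estimated training time: {estimated_hours:.2f} h")
print(f"wall-clock time in log:  {wall_hours:.2f} h")
print(f"implied global batch:    {batch_size:.0f} images")
```

Under those assumptions the estimate comes out a little over 7 hours of pure training and the log itself spans a bit under 8 hours, which fits the claim. The implied global batch is close to 2048 images, and 626 iterations at that size is roughly one pass over the ImageNet training set.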

OpencvW commented on August 21, 2024

Hi, could you share your DALI training code? Thanks!
