
Comments (4)

Brummi commented on June 10, 2024

Hi,
Sorry, I thought you were talking about the "[W pthreadpool-cpp.cc:90] Warning: Leaking Caffe2 thread-pool after fork. (function pthreadpool)" line.

  • Yes, evaluation is based on iterations, not epochs. This is very useful when dealing with datasets of different sizes. For example, RealEstate10K has a huge number of datapoints, so evaluating only once per epoch would not give you a good overview of the training.
  • is_preprocessed speeds up training because you load the already resized images instead of the full-sized ones. (These files can be obtained from the script in the datasets/kitti-360 directory.)
  • I have not really used multi-GPU training during development, so this issue might not have appeared.
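
To illustrate the first point, here is a minimal sketch of how an iteration-based schedule leads to several evaluation runs inside one training epoch. The constants ITERS_PER_EPOCH, VALIDATE_EVERY and VISUALIZE_EVERY and both helper functions are hypothetical and chosen only for illustration; the real intervals are set in the experiment's configuration file. In PyTorch Ignite, a hook like this can be registered with an iteration-based event filter such as Events.ITERATION_COMPLETED(every=n); whether the repository does exactly that is an assumption.

```python
# Hypothetical values; the real ones live in the experiment's config file.
ITERS_PER_EPOCH = 10_000
VALIDATE_EVERY = 4_000    # full evaluation run ("Test metrics" in the logs)
VISUALIZE_EVERY = 1_000   # short visualisation run ("Vis metrics" in the logs)


def evaluation_schedule(num_epochs, iters_per_epoch, every):
    """Return (epoch, iteration) pairs at which an iteration-based hook fires."""
    schedule = []
    total_iters = num_epochs * iters_per_epoch
    for iteration in range(every, total_iters + 1, every):
        # Epochs are 1-indexed; the last iteration of an epoch belongs to it.
        epoch = (iteration - 1) // iters_per_epoch + 1
        schedule.append((epoch, iteration))
    return schedule


def runs_per_epoch(schedule):
    """Count how many scheduled runs fall into each training epoch."""
    counts = {}
    for epoch, _iteration in schedule:
        counts[epoch] = counts.get(epoch, 0) + 1
    return counts


val = runs_per_epoch(evaluation_schedule(2, ITERS_PER_EPOCH, VALIDATE_EVERY))
vis = runs_per_epoch(evaluation_schedule(2, ITERS_PER_EPOCH, VISUALIZE_EVERY))
print("validation runs per epoch:", val)
print("visualisation runs per epoch:", vis)
```

With these made-up numbers, every training epoch contains several evaluation runs, so seeing the same epoch number on many evaluation log lines would be expected rather than a sign of a problem.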

from behindthescenes.

Brummi commented on June 10, 2024

Hi!
I don't observe the same behaviour on any of our machines. I think this is an issue with your system setup.
Best,
Felix


Nicholas-Autio-Mitchell commented on June 10, 2024

I see the same kind of logs as described by @zsz-pro.

Is the reason simply that there are multiple validation/visualisation steps per epoch? These lines of the default configuration file suggest that this is the case.

Running the KITTI-360 experiment: The things I changed:

I am kind of surprised this is not an issue for the repo's given conda environment as it uses pytorch-cuda=11.6.

Below is a chunk of the training logs covering epochs 13, 14 and 15. As in all other epochs, some logs are printed multiple times.


Trained on a single A100 GPU

[2023-07-12 14:11:15,398][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 14:11:22,900][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:06.947
[2023-07-12 14:11:22,902][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:07.502
2023-07-12 14:11:23,019 kitti_360 INFO:
Epoch 13 - Evaluation time (seconds): 7.50 - Vis metrics:
        abs_rel: 0.08554093948269809
        sq_rel: 0.6244590130093236
        rmse: 3.4999701248984034
        rmse_log: 0.19446041207287681
        a1: 0.898856520652771
        a2: 0.94862300157547
        a3: 0.9731572866439819
2023-07-12 14:17:47,428 kitti_360 INFO: Epoch[13] Complete. Time taken: 05:34:09.799
[2023-07-12 14:37:25,148][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[2023-07-12 14:40:19,743][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:02:54.265
[2023-07-12 14:40:20,375][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:02:55.225
2023-07-12 14:40:20,487 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 175.22 - Test metrics:
        abs_rel: 0.11480931034032174
        sq_rel: 0.6637996911791815
        rmse: 3.7153490281445607
        rmse_log: 0.21579428924271787
        a1: 0.8733996527735144
        a2: 0.9506910068448633
        a3: 0.9723349793348461
[2023-07-12 14:40:20,488][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 14:40:25,339][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.576
[2023-07-12 14:40:25,340][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.852
2023-07-12 14:40:25,478 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 4.85 - Vis metrics:
        abs_rel: 0.08070596869247891
        sq_rel: 0.5057030975542547
        rmse: 3.585757716686993
        rmse_log: 0.20001286568907975
        a1: 0.8933269381523132
        a2: 0.949320912361145
        a3: 0.9724593758583069
[2023-07-12 15:06:29,778][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 15:06:37,338][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:07.219
[2023-07-12 15:06:37,339][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:07.559
2023-07-12 15:06:37,456 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 7.56 - Vis metrics:
        abs_rel: 0.08622214936775972
        sq_rel: 0.6045194685131523
        rmse: 3.762468358054172
        rmse_log: 0.19322613491407561
        a1: 0.8915016055107117
        a2: 0.952703058719635
        a3: 0.9756804704666138
[2023-07-12 15:33:00,013][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 15:33:06,200][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:05.874
[2023-07-12 15:33:06,201][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:06.187
2023-07-12 15:33:06,333 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 6.19 - Vis metrics:
        abs_rel: 0.08986934827280171
        sq_rel: 0.6382087551104668
        rmse: 3.652709459751095
        rmse_log: 0.2037586631711583
        a1: 0.8867235779762268
        a2: 0.946153461933136
        a3: 0.9722446203231812
[2023-07-12 15:59:45,414][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 15:59:53,692][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:07.959
[2023-07-12 15:59:53,693][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:08.278
2023-07-12 15:59:53,835 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 8.28 - Vis metrics:
        abs_rel: 0.08863173995398226
        sq_rel: 0.7067783246498143
        rmse: 3.700866907608836
        rmse_log: 0.2063800082754662
        a1: 0.9044398069381714
        a2: 0.9475492835044861
        a3: 0.9684866070747375
[2023-07-12 16:26:36,550][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[2023-07-12 16:29:40,259][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:03:03.395
[2023-07-12 16:29:40,260][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:03:03.709
2023-07-12 16:29:40,373 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 183.71 - Test metrics:
        abs_rel: 0.1219142335086286
        sq_rel: 0.8344372441241323
        rmse: 3.9208930653213243
        rmse_log: 0.2237279891813233
        a1: 0.874464736552909
        a2: 0.9490371746942401
        a3: 0.970153178088367
[2023-07-12 16:29:40,374][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 16:29:45,322][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.692
[2023-07-12 16:29:45,324][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.950
2023-07-12 16:29:45,456 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 4.95 - Vis metrics:
        abs_rel: 0.08858976934422431
        sq_rel: 0.7473974579866556
        rmse: 3.7746566867415696
        rmse_log: 0.20740206238851125
        a1: 0.8989638686180115
        a2: 0.9448649883270264
        a3: 0.9682719111442566
[2023-07-12 16:56:27,753][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 16:56:35,994][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:07.910
[2023-07-12 16:56:35,995][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:08.241
2023-07-12 16:56:36,136 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 8.24 - Vis metrics:
        abs_rel: 0.08209793065801733
        sq_rel: 0.5959427187126923
        rmse: 3.6324440924085453
        rmse_log: 0.21365165189088933
        a1: 0.8960111737251282
        a2: 0.944381833076477
        a3: 0.9666076302528381
[2023-07-12 17:23:36,441][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 17:23:45,494][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:08.728
[2023-07-12 17:23:45,495][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:09.054
2023-07-12 17:23:45,627 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 9.05 - Vis metrics:
        abs_rel: 0.0834830856404954
        sq_rel: 0.5985272677240563
        rmse: 3.5497722144365085
        rmse_log: 0.19657217609012884
        a1: 0.8934342861175537
        a2: 0.9491061568260193
        a3: 0.9759489297866821
[2023-07-12 17:50:43,280][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 17:50:53,359][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:09.749
[2023-07-12 17:50:53,361][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:10.080
2023-07-12 17:50:53,477 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 10.08 - Vis metrics:
        abs_rel: 0.08506939417545267
        sq_rel: 0.7149858369983374
        rmse: 3.7219778391891336
        rmse_log: 0.20664187296963754
        a1: 0.8980512619018555
        a2: 0.9488377571105957
        a3: 0.9705266952514648
[2023-07-12 18:17:51,587][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[2023-07-12 18:21:12,396][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:03:20.469
[2023-07-12 18:21:13,021][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:03:21.433
2023-07-12 18:21:13,181 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 201.43 - Test metrics:
        abs_rel: 0.11314249655844683
        sq_rel: 0.7012545761475909
        rmse: 3.7195900263479533
        rmse_log: 0.21453171459305867
        a1: 0.8770193115342408
        a2: 0.9501760615967214
        a3: 0.9723244274500757
[2023-07-12 18:21:13,182][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 18:21:18,254][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.809
[2023-07-12 18:21:18,256][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:05.074
2023-07-12 18:21:18,366 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 5.07 - Vis metrics:
        abs_rel: 0.07578395172581752
        sq_rel: 0.5024875081138518
        rmse: 3.4798857048593157
        rmse_log: 0.194624591324213
        a1: 0.8971922397613525
        a2: 0.9471734762191772
        a3: 0.9724593758583069
[2023-07-12 18:48:05,497][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 18:48:13,010][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:07.221
[2023-07-12 18:48:13,011][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:07.512
2023-07-12 18:48:13,150 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 7.51 - Vis metrics:
        abs_rel: 0.08523062198386379
        sq_rel: 0.6145007152112157
        rmse: 3.817461628217154
        rmse_log: 0.21513257298443078
        a1: 0.8903741836547852
        a2: 0.9435228705406189
        a3: 0.966446578502655
[2023-07-12 19:14:46,552][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 19:14:53,621][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:06.755
[2023-07-12 19:14:53,622][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:07.068
2023-07-12 19:14:53,732 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 7.07 - Vis metrics:
        abs_rel: 0.08337322146351114
        sq_rel: 0.6457452155974626
        rmse: 3.5514027400976587
        rmse_log: 0.2043717522969885
        a1: 0.9019165635108948
        a2: 0.9489988088607788
        a3: 0.9712782502174377
[2023-07-12 19:41:40,019][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 19:41:47,555][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:07.096
[2023-07-12 19:41:47,557][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:07.536
2023-07-12 19:41:47,695 kitti_360 INFO:
Epoch 14 - Evaluation time (seconds): 7.54 - Vis metrics:
        abs_rel: 0.08835147521652986
        sq_rel: 0.6003459391090917
        rmse: 3.6419516338590174
        rmse_log: 0.21035350637508463
        a1: 0.894454300403595
        a2: 0.9486766457557678
        a3: 0.9699361324310303
2023-07-12 19:55:05,461 kitti_360 INFO: Epoch[14] Complete. Time taken: 05:37:18.031
[2023-07-12 20:08:45,942][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
[2023-07-12 20:11:52,930][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:03:06.664
[2023-07-12 20:11:52,931][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:03:06.987
2023-07-12 20:11:53,049 kitti_360 INFO:
Epoch 15 - Evaluation time (seconds): 186.99 - Test metrics:
        abs_rel: 0.11556165418269916
        sq_rel: 0.7672341478655641
        rmse: 3.770929400000447
        rmse_log: 0.21133477194124622
        a1: 0.8862405351828784
        a2: 0.9514126554131508
        a3: 0.9731135093607008
[2023-07-12 20:11:53,050][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 20:11:57,990][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:04.633
[2023-07-12 20:11:57,991][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:04.941
2023-07-12 20:11:58,109 kitti_360 INFO:
Epoch 15 - Evaluation time (seconds): 4.94 - Vis metrics:
        abs_rel: 0.0882263961585384
        sq_rel: 0.6482227879655088
        rmse: 3.5954508476388582
        rmse_log: 0.19819850572473746
        a1: 0.8984270095825195
        a2: 0.9485155940055847
        a3: 0.9729425311088562
[2023-07-12 20:38:44,346][ignite.engine.engine.Engine][INFO] - Engine run starting with max_epochs=1.
Evaluation (val): [1/1] 100%|████████████████████████████████████████████████████████████████████████████ [00:00<?]Visualizing
[2023-07-12 20:38:50,976][ignite.engine.engine.Engine][INFO] - Epoch[1] Complete. Time taken: 00:00:06.318
[2023-07-12 20:38:50,977][ignite.engine.engine.Engine][INFO] - Engine run complete. Time taken: 00:00:06.630
2023-07-12 20:38:51,096 kitti_360 INFO:
Epoch 15 - Evaluation time (seconds): 6.63 - Vis metrics:
        abs_rel: 0.08941568872297122
        sq_rel: 0.7008410126920626
        rmse: 3.5520300447032693
        rmse_log: 0.20019262230258883
        a1: 0.9054061770439148
        a2: 0.9499651193618774
        a3: 0.9685403108596802

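I think we see Epoch[1] for every round of evaluation because each evaluation is its own ignite Engine run, started with max_epochs=1 (see the "Engine run starting with max_epochs=1." lines).

After epochs 13 and 14 finish (in my logs above), we do actually see a log line stating that the training epoch is finished: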

2023-07-12 19:55:05,461 kitti_360 INFO: Epoch[14] Complete. Time taken: 05:37:18.031
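
To check this on your own logs, here is a small sketch that counts how many evaluation summaries are printed per training epoch. The sample lines are copied from the logs above; the regular expression and the count_evaluations helper are my own and only assume that summary lines keep the "Epoch N - Evaluation time (seconds): T - Kind metrics:" shape.

```python
import re
from collections import Counter

# A few summary lines copied from the training logs above.
LOG = """\
Epoch 13 - Evaluation time (seconds): 7.50 - Vis metrics:
Epoch 14 - Evaluation time (seconds): 175.22 - Test metrics:
Epoch 14 - Evaluation time (seconds): 4.85 - Vis metrics:
Epoch 14 - Evaluation time (seconds): 7.56 - Vis metrics:
Epoch 15 - Evaluation time (seconds): 186.99 - Test metrics:
Epoch 15 - Evaluation time (seconds): 4.94 - Vis metrics:
"""

# Matches the summary line printed after each evaluation run.
SUMMARY = re.compile(
    r"^Epoch (\d+) - Evaluation time \(seconds\): ([\d.]+) - (\w+) metrics:"
)


def count_evaluations(log_text):
    """Count evaluation summaries per (training epoch, kind) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SUMMARY.match(line)
        if match:
            epoch, _seconds, kind = match.groups()
            counts[(int(epoch), kind)] += 1
    return counts


counts = count_evaluations(LOG)
for (epoch, kind), n in sorted(counts.items()):
    print(f"epoch {epoch}: {n} x {kind}")
```

Running it over a full log file instead of the sample string should show several Vis and Test entries for each training epoch, which matches evaluation being triggered by iteration count.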


zsz-pro commented on June 10, 2024

I got it! Thanks for your reply!

