
Comments (10)

prigoyal commented on August 30, 2024

thank you @iseessel for the debugging and for all the data points above. We expect the results to reproduce between the 8-gpu and 1-gpu runs once all the differences you spotted (SyncBN etc.) are accounted for.
With that said, I will take this task on from here, as it's important for our research to understand this further. Thank you for your efforts; all of the data points above are helpful.

spurra commented on August 30, 2024

Thanks for the response, @prigoyal! I'll let you know what numbers I get.

prigoyal commented on August 30, 2024

thank you for reporting @spurra, let me try to rerun the benchmark and get back to you. We should be able to repro exactly. I'll try both the original setting and your setting.

doulemint commented on August 30, 2024

@spurra Thank you for your detailed reports and procedure summary; thanks to them I was also able to reproduce this benchmark using one GPU.

iseessel commented on August 30, 2024

@prigoyal Sorry to cross wires here -- I've actually been looking into this -- see below and lmk if you have anything to add.

Hi @spurra + @doulemint,

I was able to reproduce your 1-gpu numbers, as well as our reported 8-gpu numbers. See below for the full results; these represent the best reported train/test accuracy for each layer. As a side note, I believe the lr appears to be zero because we round before logging to tensorboard: https://github.com/facebookresearch/vissl/blob/main/vissl/hooks/tensorboard_hook.py#L265
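
The rounding point is easy to see in isolation. A minimal sketch (not VISSL's actual hook code, and the rounding precision here is an assumption):

```python
# A small lr rounded before logging displays as 0.0 in TensorBoard,
# even though the optimizer still uses the true value.
lr = 3.2e-05           # e.g. lr near the end of a decaying schedule
logged = round(lr, 4)  # hypothetical rounding precision before logging
print(lr, logged)      # -> 3.2e-05 0.0
```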

1 GPU:
rn50_in1k_simclr_100ep_eval_resnet_8gpu_transfer_in1k_linear_eval_resnet_1gpu_transfer_in1k_linear_14_10_21
None [ rn50_in1k_simclr_100ep_eval_resnet_8gpu_transfer_in1k_linear_eval_resnet_1gpu_transfer_in1k_linear_14_10_21 ] :
 - train.top_1.res5 : 0.628983 (50)
 - train.top_5.res5 : 0.843128 (52)
 - test.top_1.res5 : 0.62368 (47)
 - test.top_5.res5 : 0.85202 (43)

8 GPU: 
rn50_in1k_simclr_100ep_eval_resnet_8gpu_transfer_in1k_linear_eval_resnet_8gpu_transfer_in1k_linear_14_10_21
None [ rn50_in1k_simclr_100ep_eval_resnet_8gpu_transfer_in1k_linear_eval_resnet_8gpu_transfer_in1k_linear_14_10_21 ] :
 - train.top_1.res5 : 0.652946 (52)
 - train.top_5.res5 : 0.8587659999999999 (48)
 - test.top_1.res5 : 0.6443799999999998 (35)
 - test.top_5.res5 : 0.86064 (47)

I don't believe the results are guaranteed to be the same across the 8-gpu and 1-gpu schemes here. Note that the lr scaling is based on the ImageNet-in-1-hour paper: https://arxiv.org/pdf/1706.02677.pdf. There are some differences between the paper's setup and these experiments: the paper tests global batch sizes of 256+, whereas here the global batch size is 32. Note also that the ImageNet-in-1-hour paper does not calculate BN statistics across all workers, whereas these transfer experiments do (by setting CONVERT_BN_TO_SYNC_BN: True). Since the training uses SYNC_BN, I think it would be interesting to increase your batch size, depending on your GPU memory constraints.
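
For reference, here is a minimal PyTorch sketch of what the SyncBN conversion amounts to; this uses plain torch.nn.SyncBatchNorm rather than VISSL's internals:

```python
import torch

# With BN converted to SyncBN, BN statistics are computed over the
# global batch across all workers: 8 gpus x 32 images = 256 samples
# per BN update, vs. only 32 samples on a single gpu.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.BatchNorm2d(8),
)
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
print(model)  # BatchNorm2d layers are now SyncBatchNorm
```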

So imo, reproducing the 8-gpu numbers in a 1-gpu scheme is an area of research. You could start by tuning some of the hyperparameters -- batch size, LR, and weight decay values could be a good place to start.

(@prigoyal lmk if you disagree / If I've mischaracterized something).

prigoyal commented on August 30, 2024

Hi @spurra, thank you for reaching out. Yes, I would expect you to reproduce the numbers. The important thing is to ensure the learning rate is scaled properly as the number of GPUs changes. For this, VISSL provides https://github.com/facebookresearch/vissl/blob/master/configs/config/benchmark/linear_image_classification/imagenet1k/eval_resnet_8gpu_transfer_in1k_linear.yaml#L103 to automatically adjust the LR.
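
Roughly, that option applies the linear scaling rule from the ImageNet-in-1-hour paper mentioned above. A sketch of the arithmetic (the base values here are assumptions for illustration; check the linked yaml for the actual ones):

```python
# Linear lr scaling: the configured lr is defined at a reference
# global batch size and rescaled to the current run's global batch.
BASE_LR = 0.01     # assumed reference lr from the config
BASE_BATCH = 256   # assumed reference global batch size

def scaled_lr(batch_per_gpu: int, num_gpus: int) -> float:
    global_batch = batch_per_gpu * num_gpus
    return BASE_LR * global_batch / BASE_BATCH

print(scaled_lr(32, 8))  # 8 gpus -> global batch 256 -> 0.01
print(scaled_lr(32, 1))  # 1 gpu  -> global batch  32 -> 0.00125
```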

Please let us know if the numbers don't repro :) and we will look into it.

spurra commented on August 30, 2024

I finished running the experiment. This is the output of the log hook:

INFO 2021-05-12 09:38:02,635 log_hooks.py: 446: Rank: 0, name: test_accuracy_list_meter, value: {'top_1': {'conv1': 14.628, 'res2': 28.27, 'res3': 39.088, 'res4': 56.391999999999996, 'res5': 62.246}, 'top_5': {'conv1': 29.92, 'res2': 48.455999999999996, 'res3': 61.07, 'res4': 78.32000000000001, 'res5': 85.15599999999999}}

I assume the top-1 accuracy of res5 is the relevant number; I'm not quite sure what the others indicate.

I achieve 62.246 vs. the 64.4 reported for the RN50 model trained for 100 epochs. For completeness, I'm uploading the train_config.yaml file produced by the code, which can be found here: https://gist.github.com/spurra/d5b89caccbd614522eb19e6bc3a9e2d9

Is this performance discrepancy within expected range?

EDIT: Also on a side note, it seems like the learning rate is 0 for the last few epochs. Is this desired?
[Screenshot: TensorBoard plot showing the learning rate at 0 for the final epochs]

EDIT2: I'm a little confused by your comment regarding adjusting the batch size as the number of GPUs changes. You state that https://github.com/facebookresearch/vissl/blob/master/configs/config/benchmark/linear_image_classification/imagenet1k/eval_resnet_8gpu_transfer_in1k_linear.yaml#L103 does this in VISSL. However, the note in the documentation here suggests that I have to do this manually. Also, the documentation on the configs found here suggests this just applies the linear scaling rule, which, if I understand correctly, scales the learning rate according to the batch size, not the number of GPUs. Could you please clarify whether I need to normalize the SimCLR loss by the total batch size, or is this already taken care of? Thanks!

prigoyal commented on August 30, 2024

> EDIT2: I'm a little confused by your comment regarding adjusting the batch size as the number of GPUs changes. You state that https://github.com/facebookresearch/vissl/blob/master/configs/config/benchmark/linear_image_classification/imagenet1k/eval_resnet_8gpu_transfer_in1k_linear.yaml#L103 does this in VISSL. However, the note in the documentation here suggests that I have to do this manually. Also, the documentation on the configs found here suggests this just applies the linear scaling rule, which, if I understand correctly, scales the learning rate according to the batch size, not the number of GPUs. Could you please clarify whether I need to normalize the SimCLR loss by the total batch size, or is this already taken care of? Thanks!

Correct. Only the learning rate will be scaled, not the batch size. You should keep the per-gpu batch size consistent, and the learning rate will then be auto-scaled.
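
Concretely (the config key name below is from VISSL's config schema; the value is an assumption):

```python
# Keep the per-replica batch size (DATA.TRAIN.BATCHSIZE_PER_REPLICA in
# the VISSL config) identical across runs; only the lr is re-scaled as
# the gpu count, and hence the global batch size, changes.
BATCHSIZE_PER_REPLICA = 32

def global_batch(num_gpus: int) -> int:
    return BATCHSIZE_PER_REPLICA * num_gpus

assert global_batch(8) == 256  # the 8-gpu reference run
assert global_batch(1) == 32   # the 1-gpu run; lr scales by 32/256
```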

prigoyal commented on August 30, 2024

Based on the responses above and the report from @doulemint, it looks like this benchmark reproduces the numbers. Please feel free to reopen the task if that's still not the case.

Also, as a follow-up, we will look into clarifying the learning-rate scaling feature in VISSL in our docs, code comments, etc. :)

spurra commented on August 30, 2024

@iseessel Thanks for testing this!
@prigoyal Please let me know once you have some updates on this, as I find it a very interesting area.
