
Problem in reproducing FID score · ml-gmpi (9 comments, closed)

apple commented on August 20, 2024

from ml-gmpi.

Comments (9)

Xiaoming-Zhao commented on August 20, 2024

I guess the ffhq256x256.zip file format is the same as the preprocessed one.

I am not sure about this one, as I have never tried the link you referred to.

I trained with this data and checked the training score.

This is weird. Do you mind sharing the hardware setup? E.g., the number of GPUs, etc. For GAN's training, the batch size matters a lot. If possible, please use the same batch size as specified in the paper.

I really want to reference your research, but the dataset problem is not easy for me...

I am sorry, and I definitely understand that this is a headache. BTW, I recently encountered a tool, rclone, that can interact with GDrive well (on a remote server, etc.). Please refer to the documentation for how to set it up. The only part that needs some effort is setting up a Google Cloud API, which is not hard if you follow its instructions.

Hope these help.


parkjh688 commented on August 20, 2024

Hi, @Xiaoming-Zhao!

I'm facing a similar issue. I trained my model with the following configuration: 256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002}, using the ffhq256x256 dataset from Kaggle. However, my FID score keeps increasing after the first 1000 steps, as shown in the graph below:

[figure: FID curve]

In the original paper, the model was trained with a batch size of 64, so I also tried that, but the FID score still increased. Previously, when I trained with a batch size of 8, the FID score was around 20 and remained stable until the end of 5000 steps. However, this time, even though I started with a lower FID score of 18, it kept increasing after around 1000 steps.

I'm wondering if using the ffhq256x256 data from Kaggle instead of real 256-sized data could be causing overfitting, since I believe the Kaggle set is slightly smaller. Are there any other possible reasons for this behavior?

Thanks!


Xiaoming-Zhao commented on August 20, 2024

Hi @parkjh688, I need some more information if possible.

In the original paper, the model was trained with a batch size of 64, so I also tried that

How many GPUs did you use to train GMPI? The reason I am asking is that the batch_size specified in the curriculums.py is for batch size per GPU. The batch size of 64 stated in the paper comes from 8 (per GPU) x 8 (#GPUs) = 64.
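The arithmetic above can be sketched in a few lines (the helper name here is illustrative, not an actual GMPI function):

```python
# Illustrative helper (not part of GMPI): the total batch size under
# data parallelism is the per-GPU batch size times the number of GPUs.
def effective_batch_size(batch_size_per_gpu: int, num_gpus: int) -> int:
    return batch_size_per_gpu * num_gpus

# The paper's setting: 8 images per GPU across 8 GPUs.
assert effective_batch_size(8, 8) == 64
# An equivalent alternative: 16 images per GPU across 4 GPUs.
assert effective_batch_size(16, 4) == 64
```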

Previously, when I trained with a batch size of 8, the FID score was around 20 and remained stable until the end of 5000 steps.

What is the dataset you used for this training? And how many GPUs were you using?

I'm wondering if using ffhq256x256 data from Kaggle instead of real 256-sized data could be causing overfitting.

One caveat I could see is that the Kaggle dataset uses a different resizing method from the one the pre-trained StyleGAN2 uses. Specifically:
a. The Kaggle one uses bicubic as specified on the webpage.
b. The StyleGAN2 is trained on images downscaled with lanczos (see here).

So maybe you want to process the dataset following the official instructions to double-check.
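The two filters differ in their interpolation kernels: bicubic uses a piecewise cubic polynomial, while Lanczos uses a windowed sinc. Below is a minimal sketch of the textbook Lanczos-3 kernel (this is the standard definition, not GMPI or StyleGAN2 code):

```python
import math

def lanczos_kernel(x: float, a: int = 3) -> float:
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

# The kernel is 1 at the sample point and ~0 at other integer offsets,
# so already-aligned pixels pass through (almost) unchanged.
assert abs(lanczos_kernel(0.0) - 1.0) < 1e-12
assert abs(lanczos_kernel(1.0)) < 1e-12
```

In Pillow, for example, `img.resize((256, 256), Image.LANCZOS)` versus `Image.BICUBIC` selects between these two kernel families, which is why the same source images can yield measurably different 256x256 sets.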

I trained my model with the following configuration: 256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002}

I noticed that you have batch_split of 16. This means that you will split the 64 images into 16 mini-batches and accumulate gradients 16 times. Theoretically, this should be fine. However, I would recommend using only one single forward pass if possible to avoid any hidden issues.
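The batch_split mechanics can be checked with a toy example: accumulating per-split gradients, weighted by each split's share of the batch, reproduces the full-batch gradient exactly. This mirrors what gradient accumulation does in spirit (plain Python, not GMPI code):

```python
# Toy check of what batch_split does: accumulate per-split gradients,
# weighted by each split's share of the batch, and compare with the
# gradient of one full-batch pass.

def grad_mse(w, xs, ys):
    """d/dw of mean((w*x - y)^2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

full_grad = grad_mse(w, xs, ys)

# Split the batch of 4 into 2 mini-batches (batch_split = 2).
splits = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
accum_grad = sum(
    grad_mse(w, sx, sy) * len(sx) / len(xs) for sx, sy in splits
)

assert abs(full_grad - accum_grad) < 1e-12  # identical up to float error
```

The equivalence is exact in theory; hidden issues in practice usually come from batch-dependent layers (e.g., batch-norm statistics computed per mini-batch) rather than from the accumulation itself.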

Hope these help.


parkjh688 commented on August 20, 2024

The reason I am asking is that the batch_size specified in the curriculums.py is for batch size per GPU. The batch size of 64 stated in the paper comes from 8 (per GPU) x 8 (#GPUs) = 64.

Oh, I didn't know that. I used 6 GPUs, and my configuration in curriculums.py was 256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002}. So if I want to train the model with a batch size of 64, I should train with 4 GPUs and set the configuration to 256: {'batch_size': 16, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002}, since 16 (per GPU) x 4 (#GPUs) = 64.

What is the dataset you used for this training? And how many GPUs were you using?

I used this Kaggle dataset.

Thanks!


Xiaoming-Zhao commented on August 20, 2024

Got it. So the Kaggle dataset is indeed able to reproduce the FID, based on your previous statement:

Previously, when I trained with a batch size of 8, the FID score was around 20 and remained stable until the end of 5000 steps.

Then I would recommend reducing batch_split to see whether it is the culprit for the odd FID curve you showed. As I mentioned, theoretically batch_split = 16 should be fine, but I am not sure whether there are hidden issues there.

Hope this helps.


howtowhy commented on August 20, 2024

Hello, thank you for your detailed help.
I downloaded the 1024 images and preprocessed them to 256 with the script.
I used the following options with 8 GPUs:

"res_dict": {
    256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002},
    512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
    1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
},

"res_dict_learnable_param": {
    256: {'batch_size': 64, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 16, 'gen_lr': 0.002, 'disc_lr': 0.002},
    512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
    1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
},

But the FID goes up, and the result looks like this.
Could you advise on this situation?

[figure: FID curve]


Xiaoming-Zhao commented on August 20, 2024

Do you mind trying the default configuration:

"res_dict": {
    256: {'batch_size': 8, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
    512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
    1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
},

See the discussion above in this issue. Essentially:

  1. We use batch_size of 8 as it is batch size per GPU.
    a. One caveat is that a larger batch size does not always mean better results. Though a larger batch size could contribute to the generator's learning, it could also provide the discriminator with more power to break the balance between the generator and the discriminator.
    b. I have not tried a batch size of 64 x 8 = 512, so I am not sure whether it would work.
  2. Maybe reduce batch_split from 16 to 1 (or 2) if your GPU memory allows it. Theoretically, batch_split = 16 should be fine, but I am not sure whether there are hidden issues there.

Hope these help.


howtowhy commented on August 20, 2024

Hello! Thank you for your kind help.
I ran the script with 8 GPUs and the batch size you suggested:

"res_dict": {
    256: {'batch_size': 8, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
    512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
    1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
},

"res_dict_learnable_param": {
    256: {'batch_size': 8, 'num_steps': 32, 'img_size': 256, 'tex_size': 256, 'batch_split': 1, 'gen_lr': 0.002, 'disc_lr': 0.002},
    512: {'batch_size': 4, 'num_steps': 32, 'img_size': 512, 'tex_size': 512, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
    1024: {'batch_size': 4, 'num_steps': 32, 'img_size': 1024, 'tex_size': 1024, 'batch_split': 2, 'gen_lr': 0.002, 'disc_lr': 0.002},
},

The FID score at 256 was 18.92-21.52 (with one peak perturbation).
But the paper reports an FID of 11.4.
Is there anything I missed?
Thank you for your quick help so much.

[figure: FID curve]


Xiaoming-Zhao commented on August 20, 2024

This curve looks reasonable to me. I am not sure about the peak, but I guess it may be due to some randomness.

Regarding the FID: the score depends heavily on the number of images used to compute it. The more images you use, the more likely you are to obtain a lower score.

However, FID with many images is costly to compute. Therefore, during training, we use a small number of images to get a sense of the FID trend:

During the full evaluation, we use 50k fake and real images, as stated in the paper; this follows the StyleGAN papers:

N_IMGS=50000
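To see why the sample count matters: FID is the Fréchet distance between Gaussians fitted to feature statistics, and statistics fitted on fewer samples are noisier, which tends to inflate the score. A 1-D toy sketch of this effect (illustrative only; real FID uses high-dimensional Inception features):

```python
import math
import random

def frechet_1d(mu1, s1, mu2, s2):
    """Frechet distance between 1-D Gaussians N(mu1, s1^2) and N(mu2, s2^2)."""
    return (mu1 - mu2) ** 2 + s1 ** 2 + s2 ** 2 - 2 * s1 * s2

def fitted_stats(samples):
    """Mean and (biased) standard deviation fitted to a sample."""
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    return mu, math.sqrt(var)

random.seed(0)
pop = [random.gauss(0.0, 1.0) for _ in range(50_000)]

# Both "real" and "fake" sets come from the same distribution, so the
# true distance is 0 -- yet small samples report a nonzero score purely
# from estimation noise in the fitted mean and standard deviation.
for n in (100, 1_000, 50_000):
    m1, s1 = fitted_stats(pop[:n])
    m2, s2 = fitted_stats(pop[-n:])
    print(n, frechet_1d(m1, s1, m2, s2))
```

The noise-driven gap typically shrinks toward 0 as n grows, which is why scores computed on a small monitoring set during training are not directly comparable with the 50k-image numbers in the paper.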

Hope this resolves your confusion.

