Hi, I tried to start training, but it aborted with the error below. I installed the requirements, and my parameters are as follows. How can I solve this error?
NAME: ffhq
DATASET:
  PART_COUNT: 1
  SIZE: 85965
  FFHQ_SOURCE: /workspace/data/train_tf/
  PATH: /workspace/data/train_tf/
  PART_COUNT_TEST: 1
  PATH_TEST: /workspace/data/test_tf/
  SAMPLES_PATH: /workspace/data/test/
  STYLE_MIX_PATH: ./style_mixing
  MAX_RESOLUTION_LEVEL: 8
MODEL:
  LATENT_SPACE_SIZE: 512
  LAYER_COUNT: 7
  MAX_CHANNEL_COUNT: 512
  START_CHANNEL_COUNT: 64
  DLATENT_AVG_BETA: 0.995
  MAPPING_LAYERS: 8
  BETA_KL: 0.2
  BETA_REC: 0.1
  BETA_NEG: [2048, 2048, 2048, 1024, 512, 512, 512, 512, 512]
  SCALE: 0.000005
OUTPUT_DIR: ./output
TRAIN:
  BASE_LEARNING_RATE: 0.002
  EPOCHS_PER_LOD: 2
  NUM_VAE: 1
  LEARNING_DECAY_RATE: 0.1
  LEARNING_DECAY_STEPS: []
  TRAIN_EPOCHS: 300
  #                  4    8    16   32  64  128 256
  LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32] # If GPU memory ~16GB reduce last number from 32 to 24
  LOD_2_BATCH_4GPU: [512, 256, 128, 64, 32, 32, 16, 8, 4]
  LOD_2_BATCH_2GPU: [128, 128, 128, 64, 32, 16, 8]
  LOD_2_BATCH_1GPU: [32, 32, 32, 32, 16, 8, 4]
  LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]
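For context, here is my understanding of how these arrays line up with the config. This is an assumption on my part, based on the usual progressive-growing convention that LOD `l` corresponds to resolution `2 ** (l + 2)`; none of the variable names below come from the repo itself:

```python
# Sanity check (my assumption: LOD l trains at resolution 2 ** (l + 2),
# so MAX_RESOLUTION_LEVEL: 8 tops out at 256x256 with 7 LODs in total).
MAX_RESOLUTION_LEVEL = 8
LAYER_COUNT = 7
LOD_2_BATCH_1GPU = [32, 32, 32, 32, 16, 8, 4]

num_lods = MAX_RESOLUTION_LEVEL - 1          # LODs 0..6
resolutions = [2 ** (lod + 2) for lod in range(num_lods)]
print(resolutions)                           # [4, 8, 16, 32, 64, 128, 256]

# Every LOD needs a batch-size entry, otherwise indexing fails later.
assert len(LOD_2_BATCH_1GPU) >= num_lods
assert LAYER_COUNT == num_lods
```

If that convention is right, the 7-entry 1-GPU and 2-GPU tables exactly cover the 7 LODs implied by MAX_RESOLUTION_LEVEL: 8 and LAYER_COUNT: 7.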
2021-01-16 14:11:11,875 logger INFO: Running with config:
DATASET:
  FFHQ_SOURCE: /workspace/data/train_tf/
  FLIP_IMAGES: False
  MAX_RESOLUTION_LEVEL: 8
  PART_COUNT: 1
  PART_COUNT_TEST: 1
  PATH: /workspace/data/train_tf/
  PATH_TEST: /workspace/data/test_tf/
  SAMPLES_PATH: /workspace/data/test/
  SIZE: 85965
  SIZE_TEST: 25000
  STYLE_MIX_PATH: ./style_mixing
MODEL:
  BETA_KL: 0.2
  BETA_NEG: [2048, 2048, 2048, 1024, 512, 512, 512, 512, 512]
  BETA_REC: 0.1
  CHANNELS: 3
  DLATENT_AVG_BETA: 0.995
  ENCODER: EncoderDefault
  GENERATOR: GeneratorDefault
  LATENT_SPACE_SIZE: 512
  LAYER_COUNT: 7
  MAPPING_FROM_LATENT: MappingFromLatent
  MAPPING_LAYERS: 8
  MAPPING_TO_LATENT: MappingToLatent
  MAX_CHANNEL_COUNT: 512
  SCALE: 5e-06
  START_CHANNEL_COUNT: 64
  STYLE_MIXING_PROB: 0.9
  TRUNCATIOM_CUTOFF: 8
  TRUNCATIOM_PSI: 0.7
  Z_REGRESSION: False
NAME: ffhq
OUTPUT_DIR: ./output
PPL_CELEBA_ADJUSTMENT: False
TRAIN:
  ADAM_BETA_0: 0.0
  ADAM_BETA_1: 0.99
  BASE_LEARNING_RATE: 0.002
  EPOCHS_PER_LOD: 2
  LEARNING_DECAY_RATE: 0.1
  LEARNING_DECAY_STEPS: []
  LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]
  LOD_2_BATCH_1GPU: [32, 32, 32, 32, 16, 8, 4]
  LOD_2_BATCH_2GPU: [128, 128, 128, 64, 32, 16, 8]
  LOD_2_BATCH_4GPU: [512, 256, 128, 64, 32, 32, 16, 8, 4]
  LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32]
  NUM_VAE: 1
  REPORT_FREQ: [100, 80, 60, 30, 20, 10, 10, 5, 5]
  SNAPSHOT_FREQ: [300, 300, 300, 100, 50, 30, 20, 20, 10]
  TRAIN_EPOCHS: 300
Running on Quadro RTX 8000
2021-01-16 14:11:17,697 logger INFO: Trainable parameters decoder:
23984981
2021-01-16 14:11:17,698 logger INFO: Trainable parameters encoder:
26789952
2021-01-16 14:11:17,700 logger INFO: No checkpoint found. Initializing model from scratch
2021-01-16 14:11:17,700 logger INFO: Starting from epoch: 0
2021-01-16 14:11:18,534 logger INFO: ################################################################################
2021-01-16 14:11:18,534 logger INFO: # Switching LOD to 0
2021-01-16 14:11:18,534 logger INFO: # Starting transition
2021-01-16 14:11:18,534 logger INFO: ################################################################################
2021-01-16 14:11:18,534 logger INFO: ################################################################################
2021-01-16 14:11:18,534 logger INFO: # Transition ended
2021-01-16 14:11:18,535 logger INFO: ################################################################################
beta negative changed to: 2048
2021-01-16 14:11:18,535 logger INFO: Batch size: 32, Batch size per GPU: 32, LOD: 0 - 4x4, blend: 1.000, dataset size: 85965
Aborted (core dumped)
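Since the crash happens right after the first "Batch size: ..." line, with no Python traceback, I suspect the native data reader aborting on the TFRecord shards. As a first step I wrote this small check (the paths are from my config above; the function name is my own, not from the repo) to confirm the shard files exist and are non-empty:

```python
# Check that the TFRecord directories the reader will open actually
# contain non-empty files; a C-level abort often leaves no traceback.
import os

def check_shards(root):
    """Return a list of problems found under a TFRecord directory."""
    if not os.path.isdir(root):
        return ["missing directory: " + root]
    problems = []
    files = sorted(os.listdir(root))
    if not files:
        problems.append("empty directory: " + root)
    for name in files:
        path = os.path.join(root, name)
        if os.path.getsize(path) == 0:
            problems.append("zero-byte file: " + path)
    return problems

for root in ("/workspace/data/train_tf/", "/workspace/data/test_tf/"):
    for problem in check_shards(root):
        print(problem)
```

This only rules out missing or truncated files; a shard written at the wrong resolution or with the wrong part count could presumably still crash the reader.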