chenwu98 / cycle-diffusion
[ICCV 2023] A latent space for stochastic diffusion models
License: Other
Hi, I trained “translate cat-to-dog”, and it generated a folder called “translate_afhqcat256_to_afhqdog256_ddim_eta0142” in the output folder, but after training for a long time, no images were saved in it. When will the images be saved? Another question: what does “self.output_interval” in “trainer.py” control?
Since we are training two diffusion models independently on two related domains, I've been wondering whether any specific training techniques are involved in the Cycle Diffusion process.
Is there any loss function or weight update in the Cycle Diffusion process?
If such techniques exist, could you please direct me to the section of your paper where they are explained?
Thanks!
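In case it helps other readers: my understanding from the paper is that Cycle Diffusion needs no extra loss or weight update. It reuses two pretrained DPMs and transfers the stochastic latent code (x_T plus the per-step noises, recovered by the DPM-Encoder) from one model to the other. A toy numpy sketch of that idea; the linear "models" `mu_src`/`mu_tgt` and all constants are illustrative stand-ins, not the repo's actual code:

```python
import numpy as np

# Toy sketch of the DPM-Encoder idea behind Cycle Diffusion: no loss, no
# weight update. Encode an image into (x_T, eps_T..eps_1) under a source
# model, then decode that latent with a target model. The linear "models"
# mu_src / mu_tgt and all constants below are illustrative, not the repo's.
rng = np.random.default_rng(0)
T, d, sigma = 5, 4, 0.1

def mu_src(x, t):            # stand-in for the source DPM's posterior mean
    return 0.9 * x

def mu_tgt(x, t):            # a slightly shifted "related domain" model
    return 0.9 * x + 0.05

def encode(x0, mu):
    """Sample a forward trajectory, then recover the noises eps_t that make
    the reverse recursion x_{t-1} = mu(x_t) + sigma * eps_t reproduce it."""
    xs = [x0]
    for _ in range(T):
        xs.append(0.9 * xs[-1] + sigma * rng.standard_normal(d))
    eps = [(xs[t - 1] - mu(xs[t], t)) / sigma for t in range(T, 0, -1)]
    return xs[T], eps

def decode(xT, eps, mu):
    x = xT
    for t, e in zip(range(T, 0, -1), eps):
        x = mu(x, t) + sigma * e
    return x

x0 = rng.standard_normal(d)
xT, eps = encode(x0, mu_src)
assert np.allclose(decode(xT, eps, mu_src), x0)  # same model: exact round trip
x0_translated = decode(xT, eps, mu_tgt)          # target model: a nearby image
```

Decoding the same latent with the source model reproduces the input exactly, while decoding with the related target model yields a nearby output, which is the "cycle" without any training.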
When I checked the data loading, I noticed that len(dataset_splits['dev']) == 0. It doesn't look like the data is being loaded. Why is that?
Where else should I set the path to load the data?
Duplicate of #22
When running the above model, the output folder (translate_afhqcat256_to_afhqdog256_ddim_eta0142) is created and the model starts to train, but no images (or anything else) are being saved inside that folder. I have been training for several hours, for 150+ iterations, using all the default parameters from the repo's README.
Hello, thanks for the great work.
I had some questions with plug-and-play guidance.
(1) In Section 3.4, the latent code is updated with Langevin dynamics, which includes a term for the score of the latent code z.
I am curious how to obtain the score for z.
(2) Also, it seems to me that plug-and-play guidance is exactly the same as classifier guidance if the energy term (in the CLIP or face recognition case) takes the target image plus some level of noise as input. Could you explain the difference between the two?
Thank you.
Hi thanks for the wonderful work.
Can we interpolate two image latents and feed the modified latent to the generator to obtain a third image, as in the DDIM model?
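Not the author, but with a deterministic DDIM decoder this generally works: interpolate the two encoded latents (spherically, so the mixture stays in the typical set of the Gaussian prior) and decode the result. A hedged sketch; the helper name and shapes are illustrative and not this repo's API:

```python
import numpy as np

def slerp(z1, z2, alpha):
    """Spherical interpolation between two Gaussian latents, as commonly
    used with DDIM latents (plain linear mixing shrinks the norm and leaves
    the typical set of the Gaussian prior)."""
    z1f, z2f = z1.ravel(), z2.ravel()
    cos = np.dot(z1f, z2f) / (np.linalg.norm(z1f) * np.linalg.norm(z2f))
    omega = np.arccos(np.clip(cos, -1.0, 1.0))
    so = np.sin(omega)
    if so < 1e-8:                      # nearly parallel: fall back to lerp
        return (1 - alpha) * z1 + alpha * z2
    return (np.sin((1 - alpha) * omega) / so) * z1 \
         + (np.sin(alpha * omega) / so) * z2

rng = np.random.default_rng(0)
zA, zB = rng.standard_normal(8), rng.standard_normal(8)
zMid = slerp(zA, zB, 0.5)              # feed this to the generator
```

At alpha = 0 and alpha = 1 the function returns the endpoints exactly, so a sweep over alpha gives a smooth path between the two images.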
Hi, Chen
I noticed that early stopping should not be used, according to C.2 in the supplementary material.
However, in your implementation, the early stop is set to 850.
Thanks for sharing your work.
I want to train my custom unpaired image-to-image task. I found this tutorial, but I have a question about how to write a custom yaml like afhq.yaml:
data:
  dataset: "AFHQ"
  category: "dog"
  image_size: 256
  channels: 3
  logit_transform: false
  uniform_dequantization: false
  gaussian_dequantization: false
  random_flip: true
  rescaled: true
  num_workers: 0
diffusion:
  beta_schedule: linear
  beta_start: 0.0001
  beta_end: 0.02
  num_diffusion_timesteps: 1000
Is it automatically generated, or can I customize it?
Looking forward to your reply. Thanks.
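For what it's worth, these yaml files appear to be hand-written rather than auto-generated. A quick way to sanity-check a custom one before training is to parse it with PyYAML and verify the fields the loader expects. A sketch, assuming PyYAML is installed; the key names are copied from the afhq.yaml snippet above:

```python
# Minimal sanity check for a hand-written config such as afhq.yaml.
# Parses the yaml with PyYAML and checks a few fields the data loader
# and diffusion schedule expect. Key names come from the snippet above.
import yaml

AFHQ_YAML = """
data:
  dataset: "AFHQ"
  category: "dog"
  image_size: 256
  channels: 3
  logit_transform: false
  uniform_dequantization: false
  gaussian_dequantization: false
  random_flip: true
  rescaled: true
  num_workers: 0
diffusion:
  beta_schedule: linear
  beta_start: 0.0001
  beta_end: 0.02
  num_diffusion_timesteps: 1000
"""

cfg = yaml.safe_load(AFHQ_YAML)
assert cfg["data"]["image_size"] == 256
assert cfg["diffusion"]["beta_schedule"] == "linear"
```

Swapping `AFHQ_YAML` for `open("my_config.yaml").read()` checks your own file; a typo in a key name then fails loudly here instead of deep inside the trainer.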
[Customized use for zero-shot image-to-image translation] I followed these steps to reproduce it,
and the following error occurred:
Traceback (most recent call last):
File "main.py", line 150, in
main()
File "main.py", line 124, in main
metrics = trainer.evaluate(
File "/home/zhangzhang/zz/cycle-diffusion-main/trainer/trainer.py", line 1048, in evaluate
metrics,num_samples = eval_loop(
File "/home/zhangzhang/zz/cycle-diffusion-main/trainer/trainer.py", line 867, in evaluation_loop
images, weighted_loss, losses = all_prediction_outputs
TypeError: cannot unpack non-iterable NoneType object
The printed return value of all_prediction_outputs is None.
Can you tell me how to solve it? Thank you very much!
How do I test with custom images?
C:\Github_Code\GAN\cycle-diffusion>conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
My Anaconda version is conda 23.7.4.
OS: Windows 11
GPU: RTX 2070
Thanks for sharing the great work!
How to train the unpaired image-to-image translation on one GPU?
export CUDA_VISIBLE_DEVICES=1
export RUN_NAME=translate_afhqcat256_to_afhqdog256_ddim_eta01
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1446 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
Hi! I am trying to run unpaired image-to-image translation on a custom dataset.
I am using Hugging Face Unconditional Image Generation Pipeline to train a DDPM model on both domains.
How can I use the saved checkpoints, which are stored as safetensors for the unpaired I2I translation?
I am interested in implementing super-resolution on images using Cycle Diffusion.
My low-resolution images are 64x64, while the high-resolution ones are 512x512.
One solution could be to resize the low-resolution images to match the high-resolution ones, but doing so would increase the model parameters for the low-resolution images.
Therefore, I am considering using the original image sizes and wondering whether that is possible.
Thanks!
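In case you do end up matching resolutions: a nearest-neighbour upsample is lossless in the sense that every low-resolution pixel survives unchanged, so nothing is invented at this stage. A toy numpy sketch (single channel, 8x factor; shapes are illustrative):

```python
import numpy as np

def upsample_nn(img, factor):
    """Nearest-neighbour upsampling: repeat each pixel factor x factor times
    via a Kronecker product with a block of ones."""
    return np.kron(img, np.ones((factor, factor), dtype=img.dtype))

lr = np.random.default_rng(0).random((64, 64)).astype(np.float32)
hr = upsample_nn(lr, 8)        # (512, 512); every LR pixel is preserved
```

Subsampling the result with stride 8 recovers the original 64x64 image exactly, which is one argument for resizing rather than training at mismatched resolutions.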
Hi Chen,
Sorry for bothering you again. You set "custom step=99" and "white_box_step=100" for Stable Diffusion. Could you please explain what "custom step" and "white_box_step" are? It would be very helpful for understanding the code.
Thanks!
Hello. I am running Cycle Diffusion on a custom dataset, but there seems to be a problem with specifying the data.
I modified "/data/translate-text.json" as you said; translate_text512.cfg in the "config" folder was also modified.
The code below is the modified "/config/tasks/translate_text512.cfg"
[raw_data]
data_program = ./raw_data/empty.py
data_cache_dir = ./data/dataset/
use_cache = True
[preprocess]
preprocess_program = translate_text512
expansion = 1
[evaluation]
evaluator_program = translate_text
Q1. It looks like I need to modify the data_program variable to suit my setup. Is that correct?
However, the following error occurs, presumably due to an incorrect modification.
Below is the text of the error that occurred
Q2. Please see the error below and let me know if you have a solution.
Rank 0 Trainer build successfully.
INFO:main:*** Evaluate ***
INFO:trainer.trainer:***** Running eval *****
INFO:trainer.trainer: Num examples = 0
INFO:trainer.trainer: Batch size = 1
0it [00:00, ?it/s] 0it [00:00, ?it/s]
Traceback (most recent call last):
File "/mnt/hdd0-4tb/home/cyclediffusion/cycle-diffusion-main/main.py", line 153, in
main()
File "/mnt/hdd0-4tb/home/cyclediffusion/cycle-diffusion-main/main.py", line 127, in main
metrics = trainer.evaluate(
File "/mnt/hdd0-4tb/home/cyclediffusion/cycle-diffusion-main/trainer/trainer.py", line 1047, in evaluate
metrics, num_samples = eval_loop(
File "/mnt/hdd0-4tb/home/cyclediffusion/cycle-diffusion-main/trainer/trainer.py", line 867, in evaluation_loop
images, weighted_loss, losses = all_prediction_outputs
TypeError: cannot unpack non-iterable NoneType object
Hi~
I used translate_afhqwild256_to_afhqdog256_ddim_eta01.cfg to execute unpaired image-to-image translation with diffusion models trained on two domains. I have some problems:
1. I use sample_type = ddim, but I cannot understand what "custom_steps = 1000, refine_steps = 100, es_steps = 850" stand for.
2. Is the number of denoising steps used when training the two models in the source and target domains equal to custom_steps here?
3. What should I change if I want to use fewer diffusion steps in this code, as in DDIM? I can't find a parameter similar to "--timestep_respacing ddim250" in this code.
Thanks a lot!
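On question 3: DDIM-style step reduction amounts to sampling on an evenly spaced subsequence of the training timesteps, which is what `--timestep_respacing ddim250` does in the guided-diffusion codebase. A minimal sketch of building such a subsequence; the function name is illustrative, and this repo may expose the knob differently:

```python
import numpy as np

def respace_timesteps(num_train_steps, num_sample_steps):
    """Pick an evenly spaced subsequence of timesteps 0..num_train_steps-1,
    as in DDIM / guided-diffusion's 'ddimN' respacing."""
    stride = num_train_steps // num_sample_steps
    return np.arange(0, num_train_steps, stride)[:num_sample_steps]

steps = respace_timesteps(1000, 250)   # 250 steps: 0, 4, 8, ..., 996
```

The sampler then only visits these 250 timesteps instead of all 1000, trading a small amount of quality for a 4x speedup.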
Hi @ChenWu98 ,
Thanks for sharing the awesome work.
Would it be possible to share a Jupyter notebook to reproduce the results in Fig. 2?
Hi ChenWu98,
Can you take a look at the errors about importing datasets and importing transformers when directly running main.py?
Hi, Chen, thanks for your great work.
I tried to run zero-shot image-to-image translation with Stable Diffusion v1-4 on a single A100 GPU, and it took 14.5 hours to finish the task. Is this expected? The time cost seems too long.
export CUDA_VISIBLE_DEVICES=0
export RUN_NAME=translate_text2img256_stable_diffusion_stochastic_1
export SEED=42
nohup python -m torch.distributed.launch --nproc_per_node 1 --master_port 1405 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 4 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true > $RUN_NAME$SEED.log 2>&1 &
I find that the training code seems incomplete. If we want to train a custom dataset with your code, what should be done? I also can't find your cycle-consistency code; is that not complete yet? Thanks for sharing your code, but please explain it better.
Thanks in advance
Dear @ChenWu98 ,
Given two stochastic DPMs G1 and G2 that model two distributions D1 and D2, several researchers and practitioners have found that sampling with the same “random seed” leads to similar images (Nichol et al., 2022)
For the above claim, would it be possible to point out the corresponding results in this paper? https://arxiv.org/pdf/2112.10741.pdf
It seems that all the compared models are trained on the same domain in the GLIDE paper.
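I'm not sure about the exact figure in GLIDE either, but the claim itself is easy to illustrate in a toy setting: models that are close in function space map a shared latent to nearby outputs, while independent latents give unrelated outputs. A hedged numpy sketch; the perturbed linear maps are illustrative stand-ins, not diffusion models:

```python
import numpy as np

# Toy illustration of the quoted claim: two nearby "generators" (perturbed
# linear maps standing in for DPMs trained on related domains) map the SAME
# latent noise to nearby outputs, while independent noises give unrelated
# outputs. All matrices and scales here are illustrative.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 16))
W2 = W1 + 0.05 * rng.standard_normal((16, 16))   # a "related domain" model

z_shared = np.random.default_rng(42).standard_normal(16)  # same "seed"
z_other = np.random.default_rng(7).standard_normal(16)    # different "seed"

d_same = np.linalg.norm(W1 @ z_shared - W2 @ z_shared)  # shared latent
d_diff = np.linalg.norm(W1 @ z_shared - W2 @ z_other)   # independent latents
assert d_same < d_diff
```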
Hi, thanks for your answer. I understand that you use pretrained diffusion models, but I'm still confused about why you call Algorithm 1 "cycle diffusion". Is it similar to the idea of CycleGAN? I can't find the cycle consistency inside. Do we need extra training to maintain the cycle besides the pretrained models?
Thanks again!
torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 613558) of binary: /home/*****/anaconda3/envs/joohoon_cd/bin/python
Traceback (most recent call last):
File "/home/*****/anaconda3/envs/generative_prompt/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/*****/anaconda3/envs/generative_prompt/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/*****/anaconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launch.py", line 196, in <module>
main()
File "/home/*****/anaconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launch.py", line 192, in main
launch(args)
File "/home/*****/anaconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launch.py", line 177, in launch
run(args)
File "/home/*****/anaconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/home/*****/anaconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/*****/anaconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
main.py FAILED
Thanks
Thanks for this amazing work.
I wonder where I can find a code example for Section 4.3 (plug-and-play guidance for diffusion models) with, e.g., Stable Diffusion? I believe it is not included in the cycle_diffusion pipeline.
Thanks!
An error occurred while decompressing:
(ldm) home/InST-main/cycle-diffusion-main/ckpts$ unzip ldm_models.zip
Archive: ldm_models.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of ldm_models.zip or
ldm_models.zip.zip, and cannot find ldm_models.zip.ZIP, period.
run
python -m torch.distributed.launch --nproc_per_node 1 --master_port 1498 main.py --seed $SEED --cfg experiments/$RUN_NAME.cfg --run_name $RUN_NAME$SEED --logging_strategy steps --logging_first_step true --logging_steps 4 --evaluation_strategy steps --eval_steps 50 --metric_for_best_model CLIPEnergy --greater_is_better false --save_strategy steps --save_steps 50 --save_total_limit 1 --load_best_model_at_end --gradient_accumulation_steps 4 --num_train_epochs 0 --adafactor false --learning_rate 1e-3 --do_eval --output_dir output/$RUN_NAME$SEED --overwrite_output_dir --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --eval_accumulation_steps 4 --ddp_find_unused_parameters true --verbose true
an error occurred:
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
This seems to be caused by the PyTorch version. Could you tell me which version of PyTorch you are using?
complete logs:
833 256823.1875
834 262594.46875
835 269648.78125
836 278420.9375
837 289375.3125
838 301356.34375
839 319525.6875
840 336861.6875
841 366365.5
842 400465.03125
843 446937.90625
844 522279.5
845 637578.5
846 847171.375
847 1310254.0
848 2956609.0
at tensor([[[[0.7883]]]], device='cuda:0')
1it [00:50, 50.43s/it]
1it [00:00, 4.20it/s]
100%|██████████| 1/1 [00:00<00:00, 1.02it/s]
Traceback (most recent call last):
File "/ssd/xiedong/home/InST-main/cycle-diffusion-main/main.py", line 160, in <module>
main()
File "/ssd/xiedong/home/InST-main/cycle-diffusion-main/main.py", line 128, in main
metrics = trainer.evaluate(
File "/ssd/xiedong/home/InST-main/cycle-diffusion-main/trainer/trainer.py", line 1047, in evaluate
metrics, num_samples = eval_loop(
File "/ssd/xiedong/home/InST-main/cycle-diffusion-main/trainer/trainer.py", line 872, in evaluation_loop
metrics = self.compute_metrics(images,
File "/ssd/xiedong/home/InST-main/cycle-diffusion-main/evaluation/multi_task.py", line 65, in evaluate
summary_tmp = evaluator.evaluate(**eval_kwargs, split=split)
File "/ssd/xiedong/home/InST-main/cycle-diffusion-main/evaluation/translate_to_dog.py", line 81, in evaluate
kid_score = fid.compute_kid(
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/cleanfid/fid.py", line 356, in compute_kid
feat_model = build_feature_extractor(mode, device)
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/cleanfid/features.py", line 42, in build_feature_extractor
feat_model = feature_extractor(name="torchscript_inception", resize_inside=False, device=device)
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/cleanfid/features.py", line 21, in feature_extractor
model = InceptionV3W(path, download=True, resize_inside=resize_inside).to(device)
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/cleanfid/inception_torchscript.py", line 35, in __init__
self.base = torch.jit.load(path).eval()
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/jit/_serialization.py", line 162, in load
cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb:
wandb: Synced translate_afhqwild256_to_afhqdog256_ddim_eta0142: https://wandb.ai/xxcc/graphql/runs/13afq6e4
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230807_152834-13afq6e4/logs
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2859870) of binary: /ssd/xiedong/miniconda3/envs/generative_prompt/bin/python
Traceback (most recent call last):
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launch.py", line 195, in <module>
main()
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launch.py", line 191, in main
launch(args)
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launch.py", line 176, in launch
run(args)
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/ssd/xiedong/miniconda3/envs/generative_prompt/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
main.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-08-07_15:29:47
host : gpu20
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2859870)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Sorry, I do not know where I can set my parameters.
Hello and thanks for sharing your code.
I want to use your implementation with LDM models. I have already trained two LDMs on two separate (but related) domains. How should I proceed to perform image translation between these two domains using my pretrained LDM models?
Thanks in advance.