janspiry / palette-image-to-image-diffusion-models

Unofficial implementation of Palette: Image-to-Image Diffusion Models in PyTorch

License: MIT License

Python 99.85% Shell 0.15%
ddp ddpm diffusion-model image-restoration implementation pytorch

palette-image-to-image-diffusion-models's Introduction

Hi there, Janspiry here 👋


I am a graduate student at Beihang University, pursuing a Master's in Computer Science.

I'm currently working on Computer Vision, including:

  • Image Restoration and Synthesis
  • Object Detection
  • Model Compression


Some of my GitHub public stats :octocat:

GitHub Stats

Find me around the web 🌐

CSDN · cnblogs

palette-image-to-image-diffusion-models's People

Contributors

bruce-willis, janspiry, puanysh


palette-image-to-image-diffusion-models's Issues

inference results with noise

Hi, thanks for your excellent project !

I am new to diffusion models. Recently, I trained a colorization model with this code on a dataset of 64 images for 1000 epochs (the MSE loss is quite small). The inference results from the trained model are still noisy. I am wondering whether this is because of the short training time, or because the small dataset cannot cover the space of random noise. Are there any tricks I can use to fix it?

Appreciate any help !

Validation results are fine but test results are wrong.

Hi

I have trained inpainting model on custom dataset.
Checking the validation results, I find that training is going just fine.
However, when I run that same checkpoint on the test dataset, it seems like nothing is generated from the random noise.

val result at epoch 356:
image

test result at epoch 356:
image

My test config file is as follows:

{
    "name": "inpainting_landmark", // experiments name
    "gpu_ids": [2,3], // gpu ids list, default is single 0
    "seed" : -1, // random seed, seed <0 represents randomization not used 
    "finetune_norm": false, // find the parameters to optimize

    "path": { //set every part file path
        "base_dir": "/mnt/storage1/jhkim/landmark/palette/base_dir/", // base path for all log except resume_state
        "code": "/mnt/storage1/jhkim/landmark/palette/code/", // code backup
        "tb_logger": "/mnt/storage1/jhkim/landmark/palette/tb_logger/", // path of tensorboard logger
        "results": "/mnt/storage1/jhkim/landmark/palette/test/",
        "checkpoint": "/mnt/storage1/jhkim/landmark/palette/checkpoint/",
        "resume_state": 355
        // "resume_state": null // ex: 100, loading .state  and .pth from given epoch and iteration
    },

    "datasets": { // train or test
        "train": { 
            "which_dataset": {  // import designated dataset using arguments 
                "name": ["data.dataset", "InpaintDataset"], // import Dataset() class / function(not recommend) from data.dataset.py (default is [data.dataset.py])
                "args":{ // arguments to initialize dataset
                    "data_root": "/mnt/storage1/jhkim/landmark/palette/flist/train.flist",
                    "data_len": -1,
                    "mask_config": {
                        "mask_mode": "onedirection"
                    }
                } 
            },
            "dataloader":{
                "validation_split": 2, // percent or number 
                "args":{ // arguments to initialize train_dataloader
                    "batch_size": 3, // batch size in each gpu
                    "num_workers": 4,
                    "shuffle": true,
                    "pin_memory": true,
                    "drop_last": true
                },
                "val_args":{ // arguments to initialize valid_dataloader, will overwrite the parameters in train_dataloader
                    "batch_size": 1, // batch size in each gpu
                    "num_workers": 4,
                    "shuffle": false,
                    "pin_memory": true,
                    "drop_last": false
                }
            }
        },
        "test": { 
            "which_dataset": {
                "name": "InpaintDataset", // import Dataset() class / function(not recommend) from default file
                "args":{
                    "data_root": "/mnt/storage1/jhkim/landmark/palette/flist/test.flist",
                    "mask_config": {
                        "mask_mode": "onedirection"
                    }
                }
            },
            "dataloader":{
                "args":{
                    "batch_size": 8,
                    "num_workers": 4,
                    "pin_memory": true
                }
            }
        }
    },

    "model": { // networks/metrics/losses/optimizers/lr_schedulers is a list and model is a dict
        "which_model": { // import designated  model(trainer) using arguments 
            "name": ["models.model", "Palette"], // import Model() class / function(not recommend) from models.model.py (default is [models.model.py])
            "args": {
                "sample_num": 8, // process of each image
                "task": "inpainting",
                "ema_scheduler": {
                    "ema_start": 1,
                    "ema_iter": 1,
                    "ema_decay": 0.9999
                },
                "optimizers": [
                    { "lr": 5e-5, "weight_decay": 0}
                ]
            }
        }, 
        "which_networks": [ // import designated list of networks using arguments
            {
                "name": ["models.network", "Network"], // import Network() class / function(not recommend) from default file (default is [models/network.py]) 
                "args": { // arguments to initialize network
                    "init_type": "kaiming", // method can be [normal | xavier| xavier_uniform | kaiming | orthogonal], default is kaiming
                    "module_name": "guided_diffusion", // sr3 | guided_diffusion
                    "unet": {
                        "in_channel": 6,
                        "out_channel": 3,
                        "inner_channel": 64,
                        "channel_mults": [
                            1,
                            2,
                            4,
                            8
                        ],
                        "attn_res": [
                            // 32,
                            16
                            // 8
                        ],
                        "num_head_channels": 32,
                        "res_blocks": 2,
                        "dropout": 0.2,
                        "image_size": 256
                    },
                    "beta_schedule": {
                        "train": {
                            "schedule": "linear",
                            "n_timestep": 2000,
                            // "n_timestep": 5, // debug
                            "linear_start": 1e-6,
                            "linear_end": 0.01
                        },
                        "test": {
                            "schedule": "linear",
                            "n_timestep": 1000,
                            "linear_start": 1e-4,
                            "linear_end": 0.09
                        }
                    }
                }
            }
        ],
        "which_losses": [ // import designated list of losses without arguments
            "mse_loss" // import mse_loss() function/class from default file (default is [models/losses.py]), equivalent to { "name": "mse_loss", "args":{}}
        ],
        "which_metrics": [ // import designated list of metrics without arguments
            "mae" // import mae() function/class from default file (default is [models/metrics.py]), equivalent to { "name": "mae", "args":{}}
        ]
    },

    "train": { // arguments for basic training
        "n_epoch": 1e8, // max epochs, not limited now
        "n_iter": 1e8, // max interations
        "val_epoch": 1, // valdation every specified number of epochs
        "save_checkpoint_epoch": 1,
        "log_iter": 1e4, // log every specified number of iterations
        "tensorboard" : true // tensorboardX enable
    },
    
    "debug": { // arguments in debug mode, which will replace arguments in train
        "val_epoch": 1,
        "save_checkpoint_epoch": 1,
        "log_iter": 10,
        "debug_split": 50 // percent or number, change the size of dataloder to debug_split.
    }
}

Could you help figure out the problem?

Thank you

`Caught IndexError in DataLoader worker process 0` using `pip` installations

Setup

Running on Windows Subsystem for Linux 2 (WSL2).

git clone https://github.com/Janspiry/Palette-Image-to-Image-Diffusion-Models.git
cd Palette-Image-to-Image-Diffusion-Models
conda create -n pip-palette python==3.9.*
conda activate pip-palette
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt

Config

Same as #21

Directory Structure

Same as #21

Terminal

(pip-palette) sgbaird@Dell-G7:~/GitHub/Palette-Image-to-Image-Diffusion-Models$  cd /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models ; /usr/bin/env /home/sgbaird/miniconda3/envs/palette/bin/python /home/sgbaird/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/launcher 36177 -- /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py -p train -c config/inpainting_celebahq_dummy.json --debug 
export CUDA_VISIBLE_DEVICES=0
/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py:28: UserWarning: You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True
  warnings.warn('You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True')
(pip-palette) sgbaird@Dell-G7:~/GitHub/Palette-Image-to-Image-Diffusion-Models$  cd /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models ; /usr/bin/env /home/sgbaird/miniconda3/envs/pip-palette/bin/python /home/sgbaird/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/launcher 41379 -- /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py -p train -c config/inpainting_celebahq_dummy.json --debug 
export CUDA_VISIBLE_DEVICES=0
/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py:28: UserWarning: You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True
  warnings.warn('You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True')
  0%|                                                     | 0/16 [00:00<?, ?it/s]
Close the Tensorboard SummaryWriter.

Error

Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataset.py", line 471, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/data/dataset.py", line 54, in __getitem__
    path = self.imgs[index]
IndexError: list index out of range
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/models/model.py", line 106, in train_step
    for train_data in tqdm.tqdm(self.phase_loader):
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/core/base_model.py", line 45, in train
    train_log = self.train_step()
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 58, in main_worker
    model.train()
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 92, in <module>
    main_worker(0, 1, opt)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,

typo in readme.md

In the readme, under Usage / Environment, it says `pip install -r requirement.txt` instead of `pip install -r requirements.txt`.

mask_img = img*(1 - mask) + mask ValueError: operands could not be broadcast together with shapes (3,256,256) (256,256,1)

In dataset.py, the get_mask function returns a mask of shape (h, w, 1).

The image is opened with PIL and passed through the transform self.tfs = transforms.Compose([...]). The ToTensor transform changes the image shape from (H, W, C) to (C, H, W).

When I try to calculate the mask image with mask_img = img*(1 - mask) + mask, I get the ValueError above. Should the image transform return the image in (C, H, W) form?

This is my code:

tfs = transforms.Compose([
                transforms.Resize((256, 256)),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5,0.5, 0.5])
        ])

from PIL import Image
img = Image.open("C:/Users/Documents/Obama_256x256.jpg").convert('RGB')
img = tfs(img)

mask = face_mask((256,256), landmark_list) #returns facemask of shape (h,w,1)
#mask = np.swapaxes(mask, -1, 0)
mask_img = img*(1 - mask) + mask
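One way to resolve the broadcast error, assuming the mask comes back as a NumPy array of shape (H, W, 1): convert it to a tensor and move the channel axis to the front so it broadcasts against the (C, H, W) image tensor. This is only a sketch of the shape fix, not the repository's own code:

import numpy as np
import torch

# mask: NumPy array of shape (H, W, 1) with values in {0, 1}
mask = np.random.randint(0, 2, size=(256, 256, 1)).astype(np.float32)

# Move the channel axis to the front -> (1, H, W) so it broadcasts
# against an image tensor of shape (C, H, W).
mask_t = torch.from_numpy(mask).permute(2, 0, 1)

img = torch.randn(3, 256, 256)            # stand-in for the transformed image
mask_img = img * (1.0 - mask_t) + mask_t  # shapes now broadcast cleanly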

Colorization

Hi @Janspiry :
I'd like to know if you can provide pre-trained model weights for testing colorization? Thanks!

Can't run the train script

Hi, Thanks for the amazing work!

I followed the instructions to run it, but I'm getting this error:

Close the Tensorboard SummaryWriter.
Traceback (most recent call last):
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\logger.py", line 112, in save_images
    Image.fromarray(outputs[i]).save(os.path.join(result_path, names[i]))
  File "C:\Users\Hasan Sayeed\anaconda3\lib\site-packages\PIL\Image.py", line 2169, in save
    fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: 'experiments\\train_inpainting_celebahq_220531_173900\\results\\val\\205\\GT_train.flist\\00037.jpg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 103, in <module>
    main_worker(0, 1, opt)
  File "run.py", line 69, in main_worker
    model.train()
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\base_model.py", line 58, in train
    val_log = self.val_step()
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\models\model.py", line 158, in val_step
    self.writer.save_images(self.save_current_results())
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\logger.py", line 114, in save_images
    raise NotImplementedError('You must specify the context of name and result in save_current_results functions of model.')
NotImplementedError: You must specify the context of name and result in save_current_results functions of model.

I might have missed something. Do you know what's wrong here?

Question about calculating metrics

When calculating metrics, why are you comparing the input (conditional) image with the generated image? Shouldn't we compare the output with the ground truth image?

for phase_data in tqdm.tqdm(self.phase_loader):
    self.set_input(phase_data)
    if self.opt['distributed']:
        if self.task in ['inpainting','uncropping']:
            self.output, self.visuals = self.netG.module.restoration(self.cond_image, y_t=self.cond_image,
                y_0=self.gt_image, mask=self.mask, sample_num=self.sample_num)
        else:
            self.output, self.visuals = self.netG.module.restoration(self.cond_image, sample_num=self.sample_num)
    else:
        if self.task in ['inpainting','uncropping']:
            self.output, self.visuals = self.netG.restoration(self.cond_image, y_t=self.cond_image,
                y_0=self.gt_image, mask=self.mask, sample_num=self.sample_num)
        else:
            self.output, self.visuals = self.netG.restoration(self.cond_image, sample_num=self.sample_num)
    self.iter += self.batch_size
    self.writer.set_iter(self.epoch, self.iter, phase='test')
    for met in self.metrics:
        key = met.__name__
        value = met(self.cond_image, self.output)
        self.val_metrics.update(key, value)
        self.writer.add_scalar(key, value)

In particular:

value = met(self.cond_image, self.output)
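If the intent is to measure fidelity against the ground truth rather than the conditional input, a minimal change (assuming self.gt_image holds the ground truth, as it does elsewhere in this snippet) would be:

value = met(self.gt_image, self.output)  # compare the output with the ground truth instead of the condition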

Multi-task learning pretrained models

Hey,
Thank you for the implementations!
Are there any plans on releasing a pretrained version on the multi-task learning objective (section 5.7 in the paper)?

Thanks,
Eliahu

What exactly does the loss function compute?

Hi,

I'm just a bit confused about how the loss is computed. From my understanding, for a given training loop we have a ground truth image denoted as GT. GT is passed through a series of 1000 timesteps t, and at each timestep a small amount of random Gaussian noise is added.

Let's say the network takes as input the noisy GT image from timestep 50. The network should predict the small amount of noise that was added at timestep 50, right? So when we compute the loss, it should be the noise the network predicted vs. the actual noise that was added at timestep 50? Or am I understanding it wrong?

In that case, when the loss is calculated, why is the actual noise computed as torch.randn_like(y_0) and not the noise at t=50?

noise = default(noise, lambda: torch.randn_like(y_0))
y_noisy = self.q_sample(
    y_0=y_0, sample_gammas=sample_gammas.view(-1, 1, 1, 1), noise=noise)

if mask is not None:
    noise_hat = self.denoise_fn(torch.cat([y_cond, y_noisy*mask+(1.-mask)*y_0], dim=1), sample_gammas)
    loss = self.loss_fn(mask*noise, mask*noise_hat)
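For context, DDPM training does not step through 50 individual noising operations: the closed-form forward process draws a single Gaussian noise tensor and jumps directly to the noisy image at the sampled noise level, and that same noise tensor is the prediction target. A minimal sketch of the closed form (illustrative, not the repository's exact code):

import torch

def q_sample(y_0, gamma, noise):
    # Closed-form forward diffusion: y_t = sqrt(gamma) * y_0 + sqrt(1 - gamma) * noise,
    # so the epsilon the network must predict is exactly the `noise` drawn here.
    return gamma.sqrt() * y_0 + (1.0 - gamma).sqrt() * noise

y_0 = torch.randn(1, 3, 64, 64)             # ground-truth image (toy data)
gamma = torch.tensor(0.7).view(1, 1, 1, 1)  # cumulative noise level at the sampled step
noise = torch.randn_like(y_0)               # the training target
y_noisy = q_sample(y_0, gamma, noise)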

How to configure the dataset?

Thanks for putting together this software. Can you provide some direction for how to create a dataset for train and test for image inpainting?

In the configuration file I see references to *.flist file. I assume this is a text list of images to use for testing. But I'm not sure how to create it and format it.

Thanks,
Jay
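For reference, the .flist files are plain text files with one image path per line. A hedged sketch for generating them from a folder of images (the folder layout and paths here are assumptions, not part of the repository):

import os

def write_flist(image_dir, out_path, exts=(".jpg", ".jpeg", ".png")):
    # Collect image paths and write one absolute path per line.
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    paths = sorted(
        os.path.abspath(os.path.join(image_dir, f))
        for f in os.listdir(image_dir)
        if f.lower().endswith(exts)
    )
    with open(out_path, "w") as fh:
        fh.write("\n".join(paths))

write_flist("datasets/my_inpainting/train", "datasets/my_inpainting/flist/train.flist")
write_flist("datasets/my_inpainting/test", "datasets/my_inpainting/flist/test.flist")

The resulting files are then referenced from "data_root" in the train and test dataset sections of the config.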

Unet input channels number for colorization

Hi and thanks for the repo!

I have a question concerning the input dims of the Unet for colorization. If I understand correctly, the Unet is fed a 6 channel image containing noise and a 3-channel version of the black and white image as conditioning.
Is there an advantage in doing this compared to using a 1-channel black and white image as conditioning, given that the network would require less memory?
Is the conditioning less effective if only 1 channel is given?
thanks!

Paper implementation

Hello,

Thank you very much for sharing your code.

When I studied the paper I came across two implementation details. I see that the settings you chose are the same as the hyperparameters for inpainting?
Screenshot from 2022-10-03 20-11-20

Is there an option in the code for the regular training setup, the one with a batch size of 1024? What is the difference between options 1 and 2 in the screenshot above?

Conv2d vs ConvTranspose2d in Unet Upsample

Hi there,

Thanks for putting together this repository. I have a question about your implementation of Unet - why are you using Conv2d instead of ConvTranspose2d in your Upsample blocks?

Thanks
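For reference, the common guided_diffusion-style Upsample block uses nearest-neighbor interpolation followed by a plain Conv2d, which avoids the checkerboard artifacts that ConvTranspose2d can introduce. This sketch shows the pattern, not necessarily this repository's exact implementation:

import torch.nn as nn
import torch.nn.functional as F

class Upsample(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Interpolate-then-convolve upsampling: the 3x3 conv smooths the
        # duplicated pixels and sidesteps transposed-conv checkerboard artifacts.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="nearest")
        return self.conv(x)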

Cannot resume training on multiple GPUs

I'm trying to train an inpainting model using multiple GPUs.
Initial training worked fine, progressed well and saved checkpoints to experiments/.../checkpoint folder.

However, when I try to resume the same training (by modifying "resume_state" in the config) I get this error:

Traceback (most recent call last):
  File "/.../Palette-Image-to-Image-Diffusion-Models/run.py", line 58, in main_worker
    model.train()
  File "/.../Palette-Image-to-Image-Diffusion-Models/core/base_model.py", line 45, in train
    train_log = self.train_step()
  File "/.../Palette-Image-to-Image-Diffusion-Models/models/model.py", line 111, in train_step
    self.optG.step()
  File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/optim/optimizer.py", line 109, in wrapper
    return func(*args, **kwargs)
  File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/optim/adam.py", line 157, in step
    adam(params_with_grad,
  File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/optim/adam.py", line 213, in adam
    func(params,
  File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/optim/adam.py", line 255, in _single_tensor_adam
    assert not step_t.is_cuda, "If capturable=False, state_steps should not be CUDA tensors."
AssertionError: If capturable=False, state_steps should not be CUDA tensors.

It seems that when resuming in multi-GPU mode, some tensors (parameters or the optimizer's internal variables) are not moved to the right device.
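A workaround that has been reported for this PyTorch 1.12 Adam assertion (stated as an assumption about the cause, not a confirmed fix for this repository) is to move the optimizer's per-parameter step counters back to the CPU after loading the checkpoint, or alternatively to set capturable=True on the restored parameter groups:

import torch

def fix_resumed_adam_state(optimizer):
    # Hypothetical workaround for "If capturable=False, state_steps should not be CUDA tensors":
    # move the per-parameter 'step' counters back to the CPU after loading a checkpoint.
    for state in optimizer.state.values():
        step = state.get("step")
        if torch.is_tensor(step) and step.is_cuda:
            state["step"] = step.cpu()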

Maximum resolution for inference

Is there any limitation on inference at higher resolutions?
If not, does the model perform well at higher resolutions such as 1024×1024 or more for inpainting?

Dockerfile example

Dockerfile that works for me if anyone's interested.

FROM nvidia/cuda:11.0.3-devel-ubuntu20.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install python3 -y && \
    apt-get install python3-pip -y && \
	apt-get install git ffmpeg libsm6 libxext6 -y

RUN cd ./home && \
	git clone https://github.com/Janspiry/Palette-Image-to-Image-Diffusion-Models
#I only clone this for the requirements.txt.  I run palette from mounted drive /root/data

RUN pip3 install \
    torch==1.7.0+cu110 \
    torchvision==0.8.1+cu110 \
    -f https://download.pytorch.org/whl/torch_stable.html

RUN pip3 install -r ./home/Palette-Image-to-Image-Diffusion-Models/requirements.txt
	
WORKDIR ./root/data

Run with something like

docker run --rm --gpus all -it -v <path_to_palette_repo>:/root/data -v palette

Question about the code of diffusion

Hi! Thank you for the awesome code. I have one question.
When calculating the sample_gamma for q_sample, what is the purpose of line 111 in the screenshot?
I think the DDPM formula would be sample_gamma = extract(self.gammas, t, x_shape=(1, 1)).
Thank you

image
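For reference, the line in question does not use the discrete gamma_t directly; it draws a continuous noise level uniformly between the two adjacent cumulative gammas, in the style of SR3/WaveGrad continuous-noise-level conditioning. A sketch of that step, under the assumption that gammas holds the cumulative products:

import torch

b, num_timesteps = 4, 2000
gammas = torch.cumprod(1.0 - torch.linspace(1e-6, 0.01, num_timesteps), dim=0)

t = torch.randint(1, num_timesteps, (b,))
gamma_t1 = gammas[t - 1].view(b, 1)  # cumulative gamma at step t-1
gamma_t2 = gammas[t].view(b, 1)      # cumulative gamma at step t
# Sample a continuous noise level uniformly in [gamma_t, gamma_{t-1}]
# instead of conditioning on the discrete gamma_t.
sample_gammas = (gamma_t2 - gamma_t1) * torch.rand(b, 1) + gamma_t1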

Recommended Learning Rate for Colorization

Hi!
Currently, I have trained a colorization model based on this repo and my own dataset, which contains 1,325 images. The training results were strange, so I want to improve my model.

My current training situation is as follows:

  • Training time: 40 hours
  • epoch: 1090
  • iters: 1439228
  • training_mse: 0.004237050597690872

According to the other issues, possible causes of poor results are an insufficient dataset or the learning rate. Regarding the dataset, I will use Places2 so that I have enough data.

The thing I want to know most is how to manage the learning rate. The "Palette" paper did not change any parameters, so I would like to know the specific parameters for colorization.

My tested result
Out_00002
Out_00004

Custom Training Issue

Hello!

I am trying to create a custom image-to-image model. I've set up a custom dataset that pulls gt_image from one folder and cond_image from a parallel folder.

img_path = f"{self.data_root}/train_A/{file_name}"
cond_image_path = f"{self.data_root}/train_B/{file_name}"

img = self.tfs(self.loader(img_path))
cond_image = self.tfs(self.loader(cond_image_path))

I've trained the model for 65 epochs and while the images are starting to converge nicely, they look like they are recreating the gt_image and not the conditional image.

I've done some debugging to ensure the image data is correct for both the gt and cond image in train_step(). Is there another place worth double-checking that the data flow is correct?

Is it normal for the outputs to match the original data set at first before beginning to approximate the conditional image? Should I just keep training and hope to see an improvement?

Thanks for this repo!

Input image
20382

Expected output
20382

Actual output
20382 (1)

How to correctly set up a custom dataset class to train the model?

Specifically,

I'm looking at line 58 in Palette-Image-to-Image-Diffusion-Models/data/dataset.py

I'm trying to set up my own custom mask function to process my dataset for a variation of the image-to-image cropping task, and this is what I have so far.

I have code that generates a custom mask like this for a given image. The mask has shape 256×256×3:
image

and a target image like this:
Obama_256x256

For the get_item method in data/dataset.py, I have some questions regarding the following lines:

img = self.tfs(self.loader(path))
mask = self.get_mask()
cond_image = img*(1. - mask) + mask*torch.randn_like(img)
mask_img = img*(1. - mask) + mask

  1. What does the tfs method do, and is it necessary to apply it to my ground truth image?
  2. For my own get_mask function, my mask has shape (h, w, 3). Does this need to be (h, w, 1) instead to account for the "hole" and valid regions? If so, how do I work around this so that I can still include information about the lip position in the mask, as in image 1?
  3. What does the cond_image calculation do, and why is it done?
  4. Why is the masked image calculated this way, and not by a bitwise-and multiplication between the img and the mask? If I try to do that with my own data, the result is completely messed up. (See the sketch below.)
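Regarding questions 3 and 4: for inpainting the mask is assumed to be 1 inside the hole and 0 on valid pixels, so cond_image keeps the ground truth outside the hole and fills the hole with Gaussian noise, while mask_img is just a visualization with the hole set to a constant value. A minimal sketch of that assumed semantics, not the author's exact code:

import torch

img = torch.randn(3, 256, 256)            # transformed ground-truth image in [-1, 1]
mask = torch.zeros(1, 256, 256)
mask[:, 96:160, 96:160] = 1.0             # 1 inside the hole, 0 on valid pixels

cond_image = img * (1.0 - mask) + mask * torch.randn_like(img)  # noise fills the hole
mask_img = img * (1.0 - mask) + mask                            # hole shown as a constant value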

Additionally, I have one extra question: because I want to go from a masked image with a drawing of the lips to the GT image, is this better suited to an image colorization task?

Thanks!

Training time

How long is the training given your computational resources?

Training Time + GPU

Just wondering for the models you've already trained using the reduced parameters, what are the specs of the machines you used to train them, and roughly how long did it take for the models to start converging?

GT_datasets in experiments folder missing

Hello,

So I am trying to train the model. It trains fine, but when it tries to validate it can't find the GT_datasets folder in the experiments folder. Any advice on how to fix this?

Thanks,
Rory

running the script

Hello,

I have tried to run your code following the guideline.
I downloaded the CelebA-HQ dataset from the Kaggle link in the readme file, put it in the data folder, and renamed it to celeba_hq, so the data is now located at data/celeba_hq. I downloaded the flist file and put it in celeba_hq/flist. Then, in the inpainting_celebahq.json file, I changed the train dataset like this:

"datasets": { // train or test
"train": {
"which_dataset": { // import designated dataset using arguments
"name": ["data.dataset", "InpaintDataset"], // import Dataset() class / function(not recommend) from data.dataset.py (default is [data.dataset.py])
"args":{ // arguments to initialize dataset
"data_root": "data/celebahq/flist/train.flist",
"data_len": -1,
"mask_config": {
"mask_mode": "hybrid"
}
}
},

but when I try to run the inpainting script I get the following error:

File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/run.py", line 92, in
main_worker(0, 1, opt)
File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/run.py", line 37, in main_worker
phase_loader, val_loader = define_dataloader(phase_logger, opt) # val_loader is None if phase is test.
File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/data/init.py", line 18, in define_dataloader
phase_dataset, val_dataset = define_dataset(logger, opt)
File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/data/init.py", line 40, in define_dataset
phase_dataset = init_obj(dataset_opt, logger, default_file_name='data.dataset', init_type='Dataset')
File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/core/praser.py", line 49, in init_obj
raise NotImplementedError('{} [{:s}() form {:s}] not recognized.'.format(init_type, class_name, file_name))
NotImplementedError: Dataset [InpaintDataset() form data.dataset] not recognized.

Is there any additional step I should do for a successful run? I would like to train from the beginning and not use the pretrained model.

Can you guide me, please?

Do you have a plan of using torch.amp?

Each epoch takes quite a long time, so I want to use torch.amp together with DDP, but it fails with the following error: 'RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same'

Here is my code in models/model.py -> train_step:

with torch.cuda.amp.autocast(enabled=self.amp):
    loss = self.netG(self.gt_image, self.cond_image, mask=self.mask)

I guess this error is caused by 'CheckpointFunction.backward', but I am not sure.

Could you add 'torch.amp'?
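AMP is not wired into the repository, but here is a hedged sketch of how a GradScaler is typically combined with the autocast call shown above (the argument names follow the snippet; the scaler itself is an addition, not existing code):

import torch

scaler = torch.cuda.amp.GradScaler()

def amp_train_step(netG, optG, gt_image, cond_image, mask):
    optG.zero_grad()
    with torch.cuda.amp.autocast():
        loss = netG(gt_image, cond_image, mask=mask)
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
    scaler.step(optG)              # unscales gradients, then steps the optimizer
    scaler.update()                # adjust the scale factor for the next iteration
    return loss

The HalfTensor/FloatTensor mismatch may additionally require keeping gradient checkpointing and custom buffers in float32, which this sketch does not address.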

max_period of gamma embeddings

Hi,

I see that you use the same embeddings for gammas that are used typically for time steps.
However time steps often go from 0 to 1,000.
Should the max_period be updated accordingly? I'm thinking it should be lowered to something following the new order of magnitude (maybe to 10).
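For context, the standard guided_diffusion-style sinusoidal embedding looks roughly like the sketch below; max_period sets the lowest frequency and is tuned for inputs on the order of 0-1000 timesteps, which is why lowering it is suggested when the input is a gamma in [0, 1]. This is an illustrative sketch, not the repository's exact code:

import math
import torch

def timestep_embedding(t, dim, max_period=10000):
    # Sinusoidal embedding: frequencies decay geometrically from 1 down to ~1/max_period.
    # dim is assumed to be even here.
    half = dim // 2
    freqs = torch.exp(-math.log(max_period) * torch.arange(half, dtype=torch.float32) / half)
    args = t[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)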

`Model [Palette() form models.model] not recognized`, installation inside `conda` environment

setup

Running on Windows Subsystem for Linux 2 (WSL2).

git clone https://github.com/Janspiry/Palette-Image-to-Image-Diffusion-Models.git
cd Palette-Image-to-Image-Diffusion-Models

conda installation per #20

command

python run.py -p train -c config/inpainting_celebahq_dummy.json --debug

inpainting_celebahq_dummy.json

{
    "name": "inpainting_celebahq", // experiments name
    "gpu_ids": [
        0
    ], // gpu ids list, default is single 0
    "seed": -1, // random seed, seed <0 represents randomization not used 
    "finetune_norm": false, // find the parameters to optimize
    "path": { //set every part file path
        "base_dir": "experiments", // base path for all log except resume_state
        "code": "code", // code backup
        "tb_logger": "tb_logger", // path of tensorboard logger
        "results": "results",
        "checkpoint": "checkpoint",
        "resume_state": "experiments/train_inpainting_celebahq_220426_233652/checkpoint/190"
        // "resume_state": null // ex: 100, loading .state  and .pth from given epoch and iteration
    },
    "datasets": { // train or test
        "train": {
            "which_dataset": { // import designated dataset using arguments 
                "name": [
                    "data.dataset",
                    "InpaintDataset"
                ], // import Dataset() class / function(not recommend) from data.dataset.py (default is [data.dataset.py])
                "args": { // arguments to initialize dataset
                    "data_root": "datasets/celebahq_dummy/flist/train.flist",
                    "data_len": -1,
                    "mask_config": {
                        "mask_mode": "hybrid"
                    }
                }
            },
            "dataloader": {
                "validation_split": 2, // percent or number 
                "args": { // arguments to initialize train_dataloader
                    "batch_size": 3, // batch size in each gpu
                    "num_workers": 4,
                    "shuffle": true,
                    "pin_memory": true,
                    "drop_last": true
                },
                "val_args": { // arguments to initialize valid_dataloader, will overwrite the parameters in train_dataloader
                    "batch_size": 1, // batch size in each gpu
                    "num_workers": 4,
                    "shuffle": false,
                    "pin_memory": true,
                    "drop_last": false
                }
            }
        },
        "test": {
            "which_dataset": {
                "name": "InpaintDataset", // import Dataset() class / function(not recommend) from default file
                "args": {
                    "data_root": "datasets/celebahq_dummy/flist/test.flist",
                    "mask_config": {
                        "mask_mode": "center"
                    }
                }
            },
            "dataloader": {
                "args": {
                    "batch_size": 8,
                    "num_workers": 4,
                    "pin_memory": true
                }
            }
        }
    },
    "model": { // networks/metrics/losses/optimizers/lr_schedulers is a list and model is a dict
        "which_model": { // import designated  model(trainer) using arguments 
            "name": [
                "models.model",
                "Palette"
            ], // import Model() class / function(not recommend) from models.model.py (default is [models.model.py])
            "args": {
                "sample_num": 8, // process of each image
                "task": "inpainting",
                "ema_scheduler": {
                    "ema_start": 1,
                    "ema_iter": 1,
                    "ema_decay": 0.9999
                },
                "optimizers": [
                    {
                        "lr": 5e-5,
                        "weight_decay": 0
                    }
                ]
            }
        },
        "which_networks": [ // import designated list of networks using arguments
            {
                "name": [
                    "models.network",
                    "Network"
                ], // import Network() class / function(not recommend) from default file (default is [models/network.py]) 
                "args": { // arguments to initialize network
                    "init_type": "kaiming", // method can be [normal | xavier| xavier_uniform | kaiming | orthogonal], default is kaiming
                    "module_name": "guided_diffusion", // sr3 | guided_diffusion
                    "unet": {
                        "in_channel": 6,
                        "out_channel": 3,
                        "inner_channel": 64,
                        "channel_mults": [
                            1,
                            2,
                            4,
                            8
                        ],
                        "attn_res": [
                            // 32,
                            16
                            // 8
                        ],
                        "num_head_channels": 32,
                        "res_blocks": 2,
                        "dropout": 0.2,
                        "image_size": 256
                    },
                    "beta_schedule": {
                        "train": {
                            "schedule": "linear",
                            "n_timestep": 2000,
                            // "n_timestep": 10, // debug
                            "linear_start": 1e-6,
                            "linear_end": 0.01
                        },
                        "test": {
                            "schedule": "linear",
                            "n_timestep": 1000,
                            "linear_start": 1e-4,
                            "linear_end": 0.09
                        }
                    }
                }
            }
        ],
        "which_losses": [ // import designated list of losses without arguments
            "mse_loss" // import mse_loss() function/class from default file (default is [models/losses.py]), equivalent to { "name": "mse_loss", "args":{}}
        ],
        "which_metrics": [ // import designated list of metrics without arguments
            "mae" // import mae() function/class from default file (default is [models/metrics.py]), equivalent to { "name": "mae", "args":{}}
        ]
    },
    "train": { // arguments for basic training
        "n_epoch": 1e8, // max epochs, not limited now
        "n_iter": 1e8, // max interations
        "val_epoch": 5, // valdation every specified number of epochs
        "save_checkpoint_epoch": 10,
        "log_iter": 1e3, // log every specified number of iterations
        "tensorboard": true // tensorboardX enable
    },
    "debug": { // arguments in debug mode, which will replace arguments in train
        "val_epoch": 1,
        "save_checkpoint_epoch": 1,
        "log_iter": 2,
        "debug_split": 50 // percent or number, change the size of dataloder to debug_split.
    }
}

Directory Structure

image

Error

Exception has occurred: NotImplementedError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Model [Palette() form models.model] not recognized.
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/core/praser.py", line 41, in init_obj
    ret = attr(*args, **kwargs)
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/models/model.py", line 49, in __init__
    self.netG.set_new_noise_schedule(phase=self.phase)
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/models/network.py", line 36, in set_new_noise_schedule
    self.register_buffer('gammas', to_torch(gammas))
  File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/site-packages/torch/cuda/__init__.py", line 166, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")

During handling of the above exception, another exception occurred:

  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/core/praser.py", line 49, in init_obj
    raise NotImplementedError('{} [{:s}() form {:s}] not recognized.'.format(init_type, class_name, file_name))
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/models/__init__.py", line 10, in create_model
    model = init_obj(model_opt, logger, default_file_name='models.model', init_type='Model')
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 44, in main_worker
    model = create_model(
  File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 92, in <module>
    main_worker(0, 1, opt)
  File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,

What is the exact role of strict?

Hi, I was trying out the test process with the provided pretrained models, but the load_networks function in model.py doesn't seem to work with strict=True.
I get the error message below
image

The model is loaded properly with strict=False and the inpainting also seems to work fine
image
Out_25532

By the way, you mentioned that a small dataset was enough for face inpainting but that a much larger dataset was required for more complex scene inpainting like Places2.
I'm currently working on an image-to-image conditional diffusion model for driving scenes, and a dataset of 10k samples doesn't seem to converge at all. Do you think this is due to the small dataset? I'm curious what portion of the 10 million Places2 samples was used to train your model.

Thanks a bunch.

Iterations used to train the pre-trained models

Hi,

Many thanks for this repo---it is of great help!
I'm wondering how many iterations were used to train the pre-trained models. It seems I can reach, e.g., the 660k iterations specified in train.log in less than a day, but my models do not perform nearly as well as the pretrained Places2 inpainting model.
It would be very helpful if there were a training log saved while training the pretrained model, as an early reference for assessing whether my training runs are likely to work at all.

Best regards,
Shengqu

About the linear schedule

Thanks for your nice code.

In the implementation details of the paper, the schedules of training / testing are different: (1e-6, 0.01) vs. (1e-4, 0.09).

Is there any reason for this setting? Intuitively we could achieve better and more consistent results with the same schedule.
I notice that (1e-4, 0.09) performs better than (1e-6, 0.01) in a number of cases.
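For reference, a linear schedule simply spaces the betas between linear_start and linear_end over n_timestep steps; the repository's exact parameterization may differ (some implementations space the square roots of beta instead), so treat this as a sketch of the two config settings rather than the definitive code:

import numpy as np

def linear_betas(n_timestep, linear_start, linear_end):
    # Betas increase uniformly from linear_start to linear_end.
    return np.linspace(linear_start, linear_end, n_timestep, dtype=np.float64)

train_betas = linear_betas(2000, 1e-6, 0.01)  # training setting from the config
test_betas = linear_betas(1000, 1e-4, 0.09)   # test setting from the config
train_gammas = np.cumprod(1.0 - train_betas)  # cumulative signal retained per step
test_gammas = np.cumprod(1.0 - test_betas)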

How to train colorization?

Hello, I'm considering conducting research on colorization, but I don't know how to train.
How can I train for colorization?
Can I train colorization by changing the config snippet you mentioned in the README?

"which_dataset": { // import designated dataset using arguments "name": ["data.dataset", "InpaintDataset"], // import Dataset() class "args":{ // arguments to initialize dataset "data_root": "your data path", "data_len": -1, "mask_mode": "hybrid" } },

How to train for removing photo blur

Thank you for the awesome work and detailed documentation.

How would I go about creating a model to remove photo blur?
I want to remove blur from macro photos of flowers. I have a dataset of sharp macro photos.
Could I use any of the existing configs, e.g. colorization, or would I have to create a completely new config for this task?

resume training

Hello,

I am trying to resume training on the CelebA-HQ dataset from your checkpoint. I changed inpainting_celebahq.json following the instructions:

  "path": { //set every part file path
        "base_dir": "experiments", // base path for all log except resume_state
        "code": "code", // code backup
        "tb_logger": "tb_logger", // path of tensorboard logger
        "results": "results",
        "checkpoint": "checkpoint",
        "resume_state": "experiments/train_inpainting_celebahq_221006_180531/checkpoint/200",
        "resume_state": "200"
        // "resume_state": null // ex: 100, loading .state  and .pth from given epoch and iteration
    },

I put 200.state and 200_Network.pth in the "200" folder, but after running the command

python run.py -p train -c config/inpainting_celebahq.json

The training doesn't start. I don't even get an error; I only get "Close the Tensorboard SummaryWriter." in the output.
What is the correct way to resume training?

Conditioning on y_cond

Hi, thanks for the awesome codes :)

One question for the inpainting task:

Looking at the following snippet from your code in networks.py, I cannot understand why you are conditioning your model on y_cond if you are already modifying your y_noisy based on the y_0 image using the expression "y_noisy*mask+(1.-mask)*y_0"?

Shouldn't concatenating with y_cond be redundant in this case? Your model is already seeing the ground truth parts of the image in the modified version of the y_noisy.

    def forward(self, y_0, y_cond=None, mask=None, noise=None):
        # sampling from p(gammas)
        b, *_ = y_0.shape
        t = torch.randint(1, self.num_timesteps, (b,), device=y_0.device).long()
        gamma_t1 = extract(self.gammas, t-1, x_shape=(1, 1))
        sqrt_gamma_t2 = extract(self.gammas, t, x_shape=(1, 1))
        sample_gammas = (sqrt_gamma_t2-gamma_t1) * torch.rand((b, 1), device=y_0.device) + gamma_t1
        sample_gammas = sample_gammas.view(b, -1)

        noise = default(noise, lambda: torch.randn_like(y_0))
        y_noisy = self.q_sample(
            y_0=y_0, sample_gammas=sample_gammas.view(-1, 1, 1, 1), noise=noise)

        if mask is not None:
            noise_hat = self.denoise_fn(torch.cat([y_cond, y_noisy*mask+(1.-mask)*y_0], dim=1), sample_gammas)
            loss = self.loss_fn(mask*noise, mask*noise_hat)
        else:
            noise_hat = self.denoise_fn(torch.cat([y_cond, y_noisy], dim=1), sample_gammas)
            loss = self.loss_fn(noise, noise_hat)
        return loss

The right way of inference

Hey guys,
Can you please direct me on how to properly run inference with the trained model?
I wrote a small script for it, but I'm not sure that I am doing everything right.
One of the conceptual questions for me is the robustness of the results: I trained the colorization model, and during inference it gives me different results for the same image. Is that because of randomness in the noise sampling, or is it something else?
Thanks in advance!
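On the robustness question: sampling starts from fresh Gaussian noise, so repeated runs on the same image will naturally differ unless the random seed is fixed. A hedged sketch of pinning the randomness before calling the restoration routine (the commented call follows the snippets elsewhere in these issues and is an assumption, not verified usage):

import random
import numpy as np
import torch

def set_seed(seed: int = 0):
    # Fix all RNGs so the initial y_t noise, and therefore the sample, is reproducible.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)
# output, visuals = netG.restoration(cond_image, sample_num=8)  # assumed call signature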

Cannot run the test script

Hi, thanks for the nice work. However, when I try to run the test script, it fails with:
NotImplementedError: Model [Palette() form models.model] not recognized.
I use the checkpoint and state file you provide.
Do you know what the problem is?
Thanks a lot.

My installation instructions for conda

conda

conda create -n palette python==3.9.*
conda activate palette
conda install pytorch::pytorch pytorch::torchvision pytorch::cudatoolkit=11.3 numpy pandas tqdm scipy tensorboardx

(though mamba install preferred, especially if you get a dependency conflict)

For pytorch installation commands, see https://pytorch.org/get-started/locally/

then install the leftover pip dependencies:

pip install opencv-python clean-fid

Note that opencv had a conflict due to blas or something, which showed up very quickly when trying to mamba install it. I couldn't find an Anaconda distribution of clean-fid.

Colorization training isn't working

I downloaded the Flickr25k dataset, preprocessed it, and trained a model with these modifications in the config file:

  • batch size of 256 among 4 GPUs (thus total batch size of 1024)
  • image resolution 64x64

The rest of the configurations remained as in the current config file.
Even after 1000 training epochs, the model still produces bad results.

Is there anything I'm missing? Thanks.
