Hi there, Janspiry here 👋
I am a Graduate Student at Beihang University, pursuing a Masters in Computer Science.
I'm currently working on Computer Vision, including:
- Image Restoration and Synthesis
- Object Detection
- Model Compression
Unofficial implementation of Palette: Image-to-Image Diffusion Models in PyTorch
License: MIT License
Is there any way to set the uncropping to happen on the right side of the pictures only?
I try to run it, but it outputs the error: AssertionError: Validation set size is configured to be larger than entire dataset
%cd Palette-Image-to-Image-Diffusion-Models/
!python run.py -p train -c config/inpainting_celebahq.json
Hi, thanks for your excellent project!
I am new to diffusion models. Recently, I trained a colorization model with this code on a dataset of 64 images for 1000 epochs (where the MSE loss is quite small). The inference results with the trained model are still noisy. I am wondering whether this is because of the short training time, or because such a small dataset can hardly cover the space of random noise? Can I leverage some tricks to fix it?
Appreciate any help!
Hi
I have trained inpainting model on custom dataset.
Checking validation results, I find that training is going just fine.
However, with the checkpoint that worked just fine during validation, I ran the model on the test dataset, and it seems like nothing is generated from the random noise.
My test config file is as follows:
{
"name": "inpainting_landmark", // experiments name
"gpu_ids": [2,3], // gpu ids list, default is single 0
"seed" : -1, // random seed, seed <0 represents randomization not used
"finetune_norm": false, // find the parameters to optimize
"path": { //set every part file path
"base_dir": "/mnt/storage1/jhkim/landmark/palette/base_dir/", // base path for all log except resume_state
"code": "/mnt/storage1/jhkim/landmark/palette/code/", // code backup
"tb_logger": "/mnt/storage1/jhkim/landmark/palette/tb_logger/", // path of tensorboard logger
"results": "/mnt/storage1/jhkim/landmark/palette/test/",
"checkpoint": "/mnt/storage1/jhkim/landmark/palette/checkpoint/",
"resume_state": 355
// "resume_state": null // ex: 100, loading .state and .pth from given epoch and iteration
},
"datasets": { // train or test
"train": {
"which_dataset": { // import designated dataset using arguments
"name": ["data.dataset", "InpaintDataset"], // import Dataset() class / function(not recommend) from data.dataset.py (default is [data.dataset.py])
"args":{ // arguments to initialize dataset
"data_root": "/mnt/storage1/jhkim/landmark/palette/flist/train.flist",
"data_len": -1,
"mask_config": {
"mask_mode": "onedirection"
}
}
},
"dataloader":{
"validation_split": 2, // percent or number
"args":{ // arguments to initialize train_dataloader
"batch_size": 3, // batch size in each gpu
"num_workers": 4,
"shuffle": true,
"pin_memory": true,
"drop_last": true
},
"val_args":{ // arguments to initialize valid_dataloader, will overwrite the parameters in train_dataloader
"batch_size": 1, // batch size in each gpu
"num_workers": 4,
"shuffle": false,
"pin_memory": true,
"drop_last": false
}
}
},
"test": {
"which_dataset": {
"name": "InpaintDataset", // import Dataset() class / function(not recommend) from default file
"args":{
"data_root": "/mnt/storage1/jhkim/landmark/palette/flist/test.flist",
"mask_config": {
"mask_mode": "onedirection"
}
}
},
"dataloader":{
"args":{
"batch_size": 8,
"num_workers": 4,
"pin_memory": true
}
}
}
},
"model": { // networks/metrics/losses/optimizers/lr_schedulers is a list and model is a dict
"which_model": { // import designated model(trainer) using arguments
"name": ["models.model", "Palette"], // import Model() class / function(not recommend) from models.model.py (default is [models.model.py])
"args": {
"sample_num": 8, // process of each image
"task": "inpainting",
"ema_scheduler": {
"ema_start": 1,
"ema_iter": 1,
"ema_decay": 0.9999
},
"optimizers": [
{ "lr": 5e-5, "weight_decay": 0}
]
}
},
"which_networks": [ // import designated list of networks using arguments
{
"name": ["models.network", "Network"], // import Network() class / function(not recommend) from default file (default is [models/network.py])
"args": { // arguments to initialize network
"init_type": "kaiming", // method can be [normal | xavier| xavier_uniform | kaiming | orthogonal], default is kaiming
"module_name": "guided_diffusion", // sr3 | guided_diffusion
"unet": {
"in_channel": 6,
"out_channel": 3,
"inner_channel": 64,
"channel_mults": [
1,
2,
4,
8
],
"attn_res": [
// 32,
16
// 8
],
"num_head_channels": 32,
"res_blocks": 2,
"dropout": 0.2,
"image_size": 256
},
"beta_schedule": {
"train": {
"schedule": "linear",
"n_timestep": 2000,
// "n_timestep": 5, // debug
"linear_start": 1e-6,
"linear_end": 0.01
},
"test": {
"schedule": "linear",
"n_timestep": 1000,
"linear_start": 1e-4,
"linear_end": 0.09
}
}
}
}
],
"which_losses": [ // import designated list of losses without arguments
"mse_loss" // import mse_loss() function/class from default file (default is [models/losses.py]), equivalent to { "name": "mse_loss", "args":{}}
],
"which_metrics": [ // import designated list of metrics without arguments
"mae" // import mae() function/class from default file (default is [models/metrics.py]), equivalent to { "name": "mae", "args":{}}
]
},
"train": { // arguments for basic training
"n_epoch": 1e8, // max epochs, not limited now
"n_iter": 1e8, // max interations
"val_epoch": 1, // valdation every specified number of epochs
"save_checkpoint_epoch": 1,
"log_iter": 1e4, // log every specified number of iterations
"tensorboard" : true // tensorboardX enable
},
"debug": { // arguments in debug mode, which will replace arguments in train
"val_epoch": 1,
"save_checkpoint_epoch": 1,
"log_iter": 10,
"debug_split": 50 // percent or number, change the size of dataloder to debug_split.
}
}
Could you help me figure out the problem?
Thank you
Hello,
Thank you for your code.
Can you help us to run it on our own dataset?
Running on Windows Subsystem for Linux 2 (WSL2).
git clone https://github.com/Janspiry/Palette-Image-to-Image-Diffusion-Models.git
cd Palette-Image-to-Image-Diffusion-Models
conda create -n pip-palette python==3.9.*
conda activate pip-palette
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
Same as #21
Same as #21
(pip-palette) sgbaird@Dell-G7:~/GitHub/Palette-Image-to-Image-Diffusion-Models$ cd /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models ; /usr/bin/env /home/sgbaird/miniconda3/envs/palette/bin/python /home/sgbaird/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/launcher 36177 -- /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py -p train -c config/inpainting_celebahq_dummy.json --debug
export CUDA_VISIBLE_DEVICES=0
/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py:28: UserWarning: You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True
warnings.warn('You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True')
(pip-palette) sgbaird@Dell-G7:~/GitHub/Palette-Image-to-Image-Diffusion-Models$ cd /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models ; /usr/bin/env /home/sgbaird/miniconda3/envs/pip-palette/bin/python /home/sgbaird/.vscode-server/extensions/ms-python.python-2022.8.0/pythonFiles/lib/python/debugpy/launcher 41379 -- /home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py -p train -c config/inpainting_celebahq_dummy.json --debug
export CUDA_VISIBLE_DEVICES=0
/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py:28: UserWarning: You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True
warnings.warn('You have chosen to use cudnn for accleration. torch.backends.cudnn.enabled=True')
0%| | 0/16 [00:00<?, ?it/s]
Close the Tensorboard SummaryWriter.
Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataset.py", line 471, in __getitem__
return self.dataset[self.indices[idx]]
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/data/dataset.py", line 54, in __getitem__
path = self.imgs[index]
IndexError: list index out of range
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
raise exception
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
data.reraise()
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
data = self._next_data()
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
for obj in iterable:
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/models/model.py", line 106, in train_step
for train_data in tqdm.tqdm(self.phase_loader):
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/core/base_model.py", line 45, in train
train_log = self.train_step()
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 58, in main_worker
model.train()
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 92, in <module>
main_worker(0, 1, opt)
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 268, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/sgbaird/miniconda3/envs/pip-palette/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
In the readme, under Usage / Environment, it says pip install -r requirement.txt
instead of pip install -r requirements.txt
In dataset.py, the get_mask function returns a mask of shape (h, w, 1).
When the img is opened by PIL and passed through the transform self.tfs = transforms.Compose([...]), the ToTensor transform changes the image shape from (H, W, C) to (C, H, W).
When I try to calculate the mask image by doing mask_img = img*(1 - mask) + mask, I get the value error above. Should the image transform be returning the image in the form (C, H, W)?
This is my code:
import numpy as np
import torch
from torchvision import transforms
from PIL import Image

tfs = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

img = Image.open("C:/Users/Documents/Obama_256x256.jpg").convert('RGB')
img = tfs(img)
mask = face_mask((256, 256), landmark_list)  # returns a face mask of shape (h, w, 1)
# mask = np.swapaxes(mask, -1, 0)
mask_img = img*(1 - mask) + mask
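For reference, a minimal sketch of one way to make the shapes line up, assuming mask is an (H, W, 1) NumPy array as described above (not the repo's code):

import numpy as np
import torch

# move the channel axis first so the mask broadcasts against the (C, H, W) image tensor
mask_t = torch.from_numpy(mask.astype(np.float32)).permute(2, 0, 1)  # shape (1, H, W)
mask_img = img * (1. - mask_t) + mask_t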
Hi @Janspiry :
I'd like to know if you can provide pre-trained model weights for testing colorization. Thanks!
Hi, thanks for the amazing work!
I followed the instructions to run it, but I'm getting this error:
Close the Tensorboard SummaryWriter.
Traceback (most recent call last):
File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\logger.py", line 112, in save_images
Image.fromarray(outputs[i]).save(os.path.join(result_path, names[i]))
File "C:\Users\Hasan Sayeed\anaconda3\lib\site-packages\PIL\Image.py", line 2169, in save
fp = builtins.open(filename, "w+b")
FileNotFoundError: [Errno 2] No such file or directory: 'experiments\\train_inpainting_celebahq_220531_173900\\results\\val\\205\\GT_train.flist\\00037.jpg'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 103, in <module>
main_worker(0, 1, opt)
File "run.py", line 69, in main_worker
model.train()
File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\base_model.py", line 58, in train
val_log = self.val_step()
File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\models\model.py", line 158, in val_step
self.writer.save_images(self.save_current_results())
File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\logger.py", line 114, in save_images
raise NotImplementedError('You must specify the context of name and result in save_current_results functions of model.')
NotImplementedError: You must specify the context of name and result in save_current_results functions of model.
I might have missed something. Do you know what's wrong here?
When calculating metrics, why are you comparing the input (conditional) image with the generated image? Shouldn't we compare the output with the ground truth image?
Palette-Image-to-Image-Diffusion-Models/models/model.py
Lines 166 to 187 in feca17c
In particular:
Hey,
Thank you for the implementations!
Are there any plans on releasing a pretrained version on the multi-task learning objective (section 5.7 in the paper)?
Thanks,
Eliahu
In particular, I'm interested in using a custom dataset for an application not involving images directly, but with a similar structure to images (square matrices with channels).
Related: #1
Hi,
I'm just a bit confused about how the loss is computed. From my understanding, for a given training loop we have a ground truth image denoted as GT. GT is passed through a series of 1000 timesteps t, and at each timestep a small amount of random Gaussian noise is added.
Let's say the network takes as input the noisy GT image from timestep 50. The network should predict the small amount of noise that was generated at timestep 50, right? So when we compute the loss, it should be the noise the network predicted for timestep 50 vs. the actual noise that was generated at timestep 50? Or am I understanding it wrong?
In that case, why is the value for the actual noise computed as torch.randn_like(y_0) when the loss is calculated, and not the noise at t=50?
noise = default(noise, lambda: torch.randn_like(y_0))
y_noisy = self.q_sample(
    y_0=y_0, sample_gammas=sample_gammas.view(-1, 1, 1, 1), noise=noise)

if mask is not None:
    noise_hat = self.denoise_fn(torch.cat([y_cond, y_noisy*mask+(1.-mask)*y_0], dim=1), sample_gammas)
    loss = self.loss_fn(mask*noise, mask*noise_hat)
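For what it's worth, the snippet relies on the standard closed-form forward process, so the single Gaussian sample drawn here is exactly the noise mixed into y_noisy at the sampled step. A sketch of that relation (illustrative names, not the repo's exact code):

import torch

def q_sample_sketch(y_0, gamma_t, noise):
    # y_t = sqrt(gamma_t) * y_0 + sqrt(1 - gamma_t) * noise
    # the same `noise` tensor is therefore the regression target for the network at step t
    return gamma_t.sqrt() * y_0 + (1. - gamma_t).sqrt() * noise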
Thanks for putting together this software. Can you provide some direction for how to create a dataset for train and test for image inpainting?
In the configuration file I see references to a *.flist file. I assume this is a text list of images to use for testing, but I'm not sure how to create and format it.
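For illustration, a minimal sketch of how such a file could be produced, assuming an .flist is simply a newline-separated list of image paths (the folder names here are hypothetical):

import glob
import os

image_paths = sorted(glob.glob(os.path.join("datasets/my_dataset/images", "*.jpg")))
with open("datasets/my_dataset/flist/train.flist", "w") as f:
    f.write("\n".join(image_paths))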
Thanks,
Jay
Hi and thanks for the repo!
I have a question concerning the input dims of the Unet for colorization. If I understand correctly, the Unet is fed a 6-channel input consisting of the noise and a 3-channel version of the black-and-white image as conditioning.
Is there an advantage in doing this compared to using a 1-channel black and white image as conditioning, given that the network would require less memory?
Is the conditioning less effective if only 1 channel is given?
thanks!
Hello,
Thank you very much for sharing your code.
When I studied the paper, I came across two implementation details. I see that the settings you chose are the same as the hyperparameters for inpainting?
Is there an option in the code for the regular training setup, the one with a batch size of 1024? What is the difference between options 1 and 2 in the screenshot above?
Hi there,
Thanks for putting together this repository. I have a question about your implementation of Unet - why are you using Conv2d instead of ConvTranspose2d in your Upsample blocks?
Thanks
Or is it just JPG?
As a side note, do you know if JPEG restoration has the same input and output sizes? Or is it more like the super-resolution task?
I'm trying to train an inpainting model using multiple GPUs.
Initial training worked fine, progressed well, and saved checkpoints to the experiments/.../checkpoint folder.
However, when I try to resume the same training (by modifying "resume_state" in the config) I get this error:
Traceback (most recent call last):
File "/.../Palette-Image-to-Image-Diffusion-Models/run.py", line 58, in main_worker
model.train()
File "/.../Palette-Image-to-Image-Diffusion-Models/core/base_model.py", line 45, in train
train_log = self.train_step()
File "/.../Palette-Image-to-Image-Diffusion-Models/models/model.py", line 111, in train_step
self.optG.step()
File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/optim/optimizer.py", line 109, in wrapper
return func(*args, **kwargs)
File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/optim/adam.py", line 157, in step
adam(params_with_grad,
File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/optim/adam.py", line 213, in adam
func(params,
File "/.../.conda/envs/.../lib/python3.10/site-packages/torch/optim/adam.py", line 255, in _single_tensor_adam
assert not step_t.is_cuda, "If capturable=False, state_steps should not be CUDA tensors."
AssertionError: If capturable=False, state_steps should not be CUDA tensors.
It seems like in multi-GPU when resuming some tensors (parameters or optimizer's internal variables) are not moved to the right device.
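For anyone hitting this, it matches a known PyTorch 1.12.0 Adam issue when resuming optimizer state that was saved on GPU; upgrading PyTorch is the cleaner fix, but a commonly suggested workaround is to mark the loaded optimizer as capturable (a sketch, assuming self.optG is the Adam optimizer restored from the state dict):

# after loading the optimizer state dict
for group in self.optG.param_groups:
    group["capturable"] = True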
Is there any limitation on inference at higher resolutions?
If the answer is no, does the model perform well at higher resolutions such as 1024×1024 or more for inpainting?
When would you be able to release the pretrained models for the colorization task?
Dockerfile that works for me if anyone's interested.
FROM nvidia/cuda:11.0.3-devel-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install python3 -y && \
apt-get install python3-pip -y && \
apt-get install git ffmpeg libsm6 libxext6 -y
RUN cd ./home && \
git clone https://github.com/Janspiry/Palette-Image-to-Image-Diffusion-Models
#I only clone this for the requirements.txt. I run palette from mounted drive /root/data
RUN pip3 install \
torch==1.7.0+cu110 \
torchvision==0.8.1+cu110 \
-f https://download.pytorch.org/whl/torch_stable.html
RUN pip3 install -r ./home/Palette-Image-to-Image-Diffusion-Models/requirements.txt
WORKDIR ./root/data
Run with something like
docker run --rm --gpus all -it -v <path_to_palette_repo>:/root/data palette
Hi!
Currently, I have trained a colorization model based on this repo and my own dataset, which contains 1,325 images. The training results were strange, so I want to improve my model.
My current training situation is as follows:
According to the other issues, the causes of poor results are a lack of data or the learning rate. For the dataset, I will use Places2 to have enough data.
What I most want to know about is managing the learning rate. The "Palette" paper did not change any parameters, so I would like to know the specific parameters for colorization.
Hello!
I am trying to create a custom image-to-image model. I've set up a custom dataset that pulls gt_image from one folder and cond_image from a parallel folder.
img_path = f"{self.data_root}/train_A/{file_name}"
cond_image_path = f"{self.data_root}/train_B/{file_name}"
img = self.tfs(self.loader(img_path))
cond_image = self.tfs(self.loader(cond_image_path))
I've trained the model for 65 epochs and while the images are starting to converge nicely, they look like they are recreating the gt_image and not the conditional image.
I've done some debugging to ensure the image data is correct for both the gt and cond images in train_step(). Is there another place worth double-checking that the data flow is correct?
Is it normal for the outputs to match the original data set at first before beginning to approximate the conditional image? Should I just keep training and hope to see an improvement?
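One extra place that may be worth checking is the batch actually handed to the model; a quick sanity-check sketch (the loader name is hypothetical, and the key names are assumed to match the dict returned by the custom dataset, e.g. 'gt_image' and 'cond_image'):

from torchvision.utils import save_image

batch = next(iter(train_loader))  # train_loader: your training DataLoader
print(batch['gt_image'].shape, batch['cond_image'].shape)
# undo the [-1, 1] normalization and save both, to confirm the two are not swapped
save_image(batch['gt_image'] * 0.5 + 0.5, 'gt_check.png')
save_image(batch['cond_image'] * 0.5 + 0.5, 'cond_check.png')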
Thanks for this repo!
Specifically,
I'm looking at line 58 in Palette-Image-to-Image-Diffusion-Models/data/dataset.py
I'm trying to set up my own custom mask function, to process my dataset, for a variation of the image2image cropping task, and this is what I have so far.
I have code that generates a custom mask like this for a given image. The mask has shape 256×256×3:
For the get_item method in data/dataset.py, I have some questions regarding the following lines:
img = self.tfs(self.loader(path))
mask = self.get_mask()
cond_image = img*(1. - mask) + mask*torch.randn_like(img)
mask_img = img*(1. - mask) + mask
Additionally, I have one extra question. Because I want to go from a masked image with a drawing of the lips to the ground-truth image, is this more suited to an image colorization task?
Thanks!
How long is the training given your computational resources?
Just wondering for the models you've already trained using the reduced parameters, what are the specs of the machines you used to train them, and roughly how long did it take for the models to start converging?
Hello,
So I am trying to train the model. It trains fine, but when it tries to validate, it can't find the GT_datasets folder in the experiments folder. Any advice on how to fix this?
Thanks,
Rory
Hello,
I have tried to run your code following the guideline.
I downloaded the CelebA-HQ dataset from Kaggle linked in the readme, put it in the data folder, and renamed it to celeba_hq, so the data is now located at data/celeba_hq. I downloaded the flist file and put it in celeba_hq/flist. Then, in the inpainting_celebahq.json file, I changed the train dataset like this:
"datasets": { // train or test
"train": {
"which_dataset": { // import designated dataset using arguments
"name": ["data.dataset", "InpaintDataset"], // import Dataset() class / function(not recommend) from data.dataset.py (default is [data.dataset.py])
"args":{ // arguments to initialize dataset
"data_root": "data/celebahq/flist/train.flist",
"data_len": -1,
"mask_config": {
"mask_mode": "hybrid"
}
}
},
but when I try to run the inpainting script I get the following error:
File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/run.py", line 92, in
main_worker(0, 1, opt)
File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/run.py", line 37, in main_worker
phase_loader, val_loader = define_dataloader(phase_logger, opt) # val_loader is None if phase is test.
File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/data/init.py", line 18, in define_dataloader
phase_dataset, val_dataset = define_dataset(logger, opt)
File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/data/init.py", line 40, in define_dataset
phase_dataset = init_obj(dataset_opt, logger, default_file_name='data.dataset', init_type='Dataset')
File "/home/sss/Desktop/Palette-Image-to-Image-Diffusion-Models-main/core/praser.py", line 49, in init_obj
raise NotImplementedError('{} [{:s}() form {:s}] not recognized.'.format(init_type, class_name, file_name))
NotImplementedError: Dataset [InpaintDataset() form data.dataset] not recognized.
Is there any other step I should take for a successful run? I would like to train from the beginning and not use the pretrained model.
can you guide me, Please?
It takes quite a long time per epoch.
So I want to use torch.amp with DDP, but it fails with the following error: 'RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same'
Here is my code in 'models/models.py', train_step:
with torch.cuda.amp.autocast(enabled=self.amp):
loss = self.netG(self.gt_image, self.cond_image, mask=self.mask)
I guess this error is caused by 'CheckpointFunction.backward', but I am not sure.
Could you add 'torch.amp' ?
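In case it helps, a minimal sketch of what an AMP-enabled training step generally looks like with a GradScaler (illustrative only, not the repo's train_step; how this interacts with the checkpointed attention layers would still need checking):

import torch

scaler = torch.cuda.amp.GradScaler(enabled=True)

def train_step_amp(netG, optG, gt_image, cond_image, mask):
    optG.zero_grad()
    with torch.cuda.amp.autocast(enabled=True):
        loss = netG(gt_image, cond_image, mask=mask)
    scaler.scale(loss).backward()   # scale the loss to avoid fp16 gradient underflow
    scaler.step(optG)               # unscales gradients, then runs the optimizer step
    scaler.update()
    return loss.item()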
Hi,
I see that you use the same embeddings for gammas that are used typically for time steps.
However, time steps often go from 0 to 1,000.
Should the max_period be updated accordingly? I'm thinking it should be lowered to something matching the new order of magnitude (maybe to 10).
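For context, a sketch of the standard sinusoidal embedding this refers to (guided-diffusion style; illustrative, not necessarily the repo's exact code):

import math
import torch

def timestep_embedding(t, dim, max_period=10000):
    # frequencies span [1, 1/max_period]; when t is a gamma in [0, 1] rather than an
    # integer step in [0, 1000], most of these frequencies barely vary over the input
    # range, which is the motivation for lowering max_period.
    half = dim // 2
    freqs = torch.exp(-math.log(max_period) * torch.arange(half, dtype=torch.float32) / half)
    args = t[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)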
Running on Windows Subsystem for Linux 2 (WSL2).
git clone https://github.com/Janspiry/Palette-Image-to-Image-Diffusion-Models.git
cd Palette-Image-to-Image-Diffusion-Models
conda installation per #20
python run.py -p train -c config/inpainting_celebahq_dummy.json --debug
inpainting_celebahq_dummy.json
{
"name": "inpainting_celebahq", // experiments name
"gpu_ids": [
0
], // gpu ids list, default is single 0
"seed": -1, // random seed, seed <0 represents randomization not used
"finetune_norm": false, // find the parameters to optimize
"path": { //set every part file path
"base_dir": "experiments", // base path for all log except resume_state
"code": "code", // code backup
"tb_logger": "tb_logger", // path of tensorboard logger
"results": "results",
"checkpoint": "checkpoint",
"resume_state": "experiments/train_inpainting_celebahq_220426_233652/checkpoint/190"
// "resume_state": null // ex: 100, loading .state and .pth from given epoch and iteration
},
"datasets": { // train or test
"train": {
"which_dataset": { // import designated dataset using arguments
"name": [
"data.dataset",
"InpaintDataset"
], // import Dataset() class / function(not recommend) from data.dataset.py (default is [data.dataset.py])
"args": { // arguments to initialize dataset
"data_root": "datasets/celebahq_dummy/flist/train.flist",
"data_len": -1,
"mask_config": {
"mask_mode": "hybrid"
}
}
},
"dataloader": {
"validation_split": 2, // percent or number
"args": { // arguments to initialize train_dataloader
"batch_size": 3, // batch size in each gpu
"num_workers": 4,
"shuffle": true,
"pin_memory": true,
"drop_last": true
},
"val_args": { // arguments to initialize valid_dataloader, will overwrite the parameters in train_dataloader
"batch_size": 1, // batch size in each gpu
"num_workers": 4,
"shuffle": false,
"pin_memory": true,
"drop_last": false
}
}
},
"test": {
"which_dataset": {
"name": "InpaintDataset", // import Dataset() class / function(not recommend) from default file
"args": {
"data_root": "datasets/celebahq_dummy/flist/test.flist",
"mask_config": {
"mask_mode": "center"
}
}
},
"dataloader": {
"args": {
"batch_size": 8,
"num_workers": 4,
"pin_memory": true
}
}
}
},
"model": { // networks/metrics/losses/optimizers/lr_schedulers is a list and model is a dict
"which_model": { // import designated model(trainer) using arguments
"name": [
"models.model",
"Palette"
], // import Model() class / function(not recommend) from models.model.py (default is [models.model.py])
"args": {
"sample_num": 8, // process of each image
"task": "inpainting",
"ema_scheduler": {
"ema_start": 1,
"ema_iter": 1,
"ema_decay": 0.9999
},
"optimizers": [
{
"lr": 5e-5,
"weight_decay": 0
}
]
}
},
"which_networks": [ // import designated list of networks using arguments
{
"name": [
"models.network",
"Network"
], // import Network() class / function(not recommend) from default file (default is [models/network.py])
"args": { // arguments to initialize network
"init_type": "kaiming", // method can be [normal | xavier| xavier_uniform | kaiming | orthogonal], default is kaiming
"module_name": "guided_diffusion", // sr3 | guided_diffusion
"unet": {
"in_channel": 6,
"out_channel": 3,
"inner_channel": 64,
"channel_mults": [
1,
2,
4,
8
],
"attn_res": [
// 32,
16
// 8
],
"num_head_channels": 32,
"res_blocks": 2,
"dropout": 0.2,
"image_size": 256
},
"beta_schedule": {
"train": {
"schedule": "linear",
"n_timestep": 2000,
// "n_timestep": 10, // debug
"linear_start": 1e-6,
"linear_end": 0.01
},
"test": {
"schedule": "linear",
"n_timestep": 1000,
"linear_start": 1e-4,
"linear_end": 0.09
}
}
}
}
],
"which_losses": [ // import designated list of losses without arguments
"mse_loss" // import mse_loss() function/class from default file (default is [models/losses.py]), equivalent to { "name": "mse_loss", "args":{}}
],
"which_metrics": [ // import designated list of metrics without arguments
"mae" // import mae() function/class from default file (default is [models/metrics.py]), equivalent to { "name": "mae", "args":{}}
]
},
"train": { // arguments for basic training
"n_epoch": 1e8, // max epochs, not limited now
"n_iter": 1e8, // max interations
"val_epoch": 5, // valdation every specified number of epochs
"save_checkpoint_epoch": 10,
"log_iter": 1e3, // log every specified number of iterations
"tensorboard": true // tensorboardX enable
},
"debug": { // arguments in debug mode, which will replace arguments in train
"val_epoch": 1,
"save_checkpoint_epoch": 1,
"log_iter": 2,
"debug_split": 50 // percent or number, change the size of dataloder to debug_split.
}
}
Exception has occurred: NotImplementedError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Model [Palette() form models.model] not recognized.
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/core/praser.py", line 41, in init_obj
ret = attr(*args, **kwargs)
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/models/model.py", line 49, in __init__
self.netG.set_new_noise_schedule(phase=self.phase)
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/models/network.py", line 36, in set_new_noise_schedule
self.register_buffer('gammas', to_torch(gammas))
File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/site-packages/torch/cuda/__init__.py", line 166, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
During handling of the above exception, another exception occurred:
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/core/praser.py", line 49, in init_obj
raise NotImplementedError('{} [{:s}() form {:s}] not recognized.'.format(init_type, class_name, file_name))
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/models/__init__.py", line 10, in create_model
model = init_obj(model_opt, logger, default_file_name='models.model', init_type='Model')
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 44, in main_worker
model = create_model(
File "/home/sgbaird/GitHub/Palette-Image-to-Image-Diffusion-Models/run.py", line 92, in <module>
main_worker(0, 1, opt)
File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 268, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/sgbaird/miniconda3/envs/palette/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
Hi, I was trying out the test process with the provided pretrained models, but the load_networks function in model.py doesn't seem to work with strict=True.
I get the error message below
The model is loaded properly with strict=False and the inpainting also seems to work fine
By the way, you mentioned that a small dataset was enough for face inpainting, but a much larger dataset was required for more complex scene inpainting like the Places2 dataset.
I'm currently working on an image-to-image conditional diffusion model for driving scenes, and a dataset of 10k samples doesn't seem to converge at all. Do you think this is due to the small dataset? I'm curious what portion of the 10 million Places2 samples was used to train your model.
Thanks a bunch.
Hi,
Many thanks for this repo---it is of great help!
I'm wondering what the iteration counts used to train the pre-trained models refer to. It seems I can reach, e.g., the 660k iterations specified in train.log in less than a day, but my models do not perform nearly as well as the pretrained Places2 inpainting model.
It would be very helpful if a training log saved while training the pretrained model were available, as an early reference for assessing whether my trainings are likely to work at all.
Best regards,
Shengqu
Thanks for your nice code.
In the implementation details of the paper, the schedules of training / testing are different: (1e-6, 0.01) vs. (1e-4, 0.09).
Is there any reason for this setting? Intuitively, we could achieve better and more consistent results with the same schedule.
I notice that (1e-4, 0.09) performs better than (1e-6, 0.01) in a number of cases.
Hello, I'm considering conducting research on colorization, but I don't know how to train.
How can I train for colorization?
Can I train colorization by changing the code you mentioned in the README?
"which_dataset": { // import designated dataset using arguments "name": ["data.dataset", "InpaintDataset"], // import Dataset() class "args":{ // arguments to initialize dataset "data_root": "your data path", "data_len": -1, "mask_mode": "hybrid" } },
Thank you for the awesome work and detailed documentation.
How would I go about creating a model to remove photo blur?
I want to remove blur from macro photos of flowers. I have a dataset of sharp macro photos.
Could I use any of the existing configs, e.g. colorization, or would I have to create a completely new config for this task?
Hello,
I am trying to resume training on the CelebA-HQ dataset from your checkpoint. I changed inpainting_celebahq.json according to the instructions.
"path": { //set every part file path
"base_dir": "experiments", // base path for all log except resume_state
"code": "code", // code backup
"tb_logger": "tb_logger", // path of tensorboard logger
"results": "results",
"checkpoint": "checkpoint",
"resume_state": "experiments/train_inpainting_celebahq_221006_180531/checkpoint/200",
"resume_state": "200"
// "resume_state": null // ex: 100, loading .state and .pth from given epoch and iteration
},
I inserted the 200.state and 200_Network.pth in the "200" folder, but after running the command
python run.py -p train -c config/inpainting_celebahq.json
The training doesn't start. I don't even get an error; I only get "Close the Tensorboard SummaryWriter." in the output.
What is the correct way of resuming the training?
Hi, thanks for the awesome codes :)
One question for the inpainting task:
Looking at the following snippet from your code in network.py, I cannot understand why you are conditioning your model on y_cond if you are already modifying y_noisy based on the y_0 image using the expression "y_noisy*mask+(1.-mask)*y_0".
Shouldn't concatenating with y_cond be redundant in this case? Your model is already seeing the ground-truth parts of the image in the modified version of y_noisy.
def forward(self, y_0, y_cond=None, mask=None, noise=None):
    # sampling from p(gammas)
    b, *_ = y_0.shape
    t = torch.randint(1, self.num_timesteps, (b,), device=y_0.device).long()
    gamma_t1 = extract(self.gammas, t-1, x_shape=(1, 1))
    sqrt_gamma_t2 = extract(self.gammas, t, x_shape=(1, 1))
    sample_gammas = (sqrt_gamma_t2-gamma_t1) * torch.rand((b, 1), device=y_0.device) + gamma_t1
    sample_gammas = sample_gammas.view(b, -1)

    noise = default(noise, lambda: torch.randn_like(y_0))
    y_noisy = self.q_sample(
        y_0=y_0, sample_gammas=sample_gammas.view(-1, 1, 1, 1), noise=noise)

    if mask is not None:
        noise_hat = self.denoise_fn(torch.cat([y_cond, y_noisy*mask+(1.-mask)*y_0], dim=1), sample_gammas)
        loss = self.loss_fn(mask*noise, mask*noise_hat)
    else:
        noise_hat = self.denoise_fn(torch.cat([y_cond, y_noisy], dim=1), sample_gammas)
        loss = self.loss_fn(noise, noise_hat)

    return loss
How do I train on a custom dataset, and how do I train a JPEG restoration model?
From what I can tell, it looks like the option is getting ignored and would result in an error.
Palette-Image-to-Image-Diffusion-Models/data/dataset.py
Lines 85 to 86 in d1b9b01
Hey guys,
can you please direct me on how to properly run inference with the trained model?
I wrote a small script for it, but I'm not sure that I am doing everything right.
One of the conceptual questions for me is the robustness of the results. I trained the colorization model, and during inference it gives me different results for the same image. Is that because of randomness in the noise scheduling? Or is it something else?
Thanks in advance!
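As a quick way to confirm whether the variation comes from the stochastic sampling itself, one can fix the random seed before each restoration (a sketch; restore_fn is a placeholder for your own inference call, not part of the repo):

import torch

def seeded_restoration(restore_fn, image, seed=0):
    torch.manual_seed(seed)                  # fixes y_T and the per-step noise draws
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # with the same seed, repeated runs on the same image should give identical outputs,
    # confirming that differences otherwise come from the stochastic reverse process
    return restore_fn(image)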
Hi, thanks for the nice work. However, when I try to run the test script, it fails with:
NotImplementedError: Model [Palette() form models.model] not recognized.
I use the checkpoints and state file you provide.
Do you know what the problem is?
Thanks a lot.
conda create -n palette python==3.9.*
conda activate palette
conda install pytorch::pytorch pytorch::torchvision pytorch::cudatoolkit=11.3 numpy pandas tqdm scipy tensorboardx
(though mamba install is preferred, especially if you get a dependency conflict)
For pytorch installation commands, see https://pytorch.org/get-started/locally/
then install the leftover pip dependencies:
pip install opencv-python clean-fid
Note that opencv had a conflict due to blas or something, which came up very quickly when trying to install it with mamba. I couldn't find an Anaconda distribution of clean-fid.
I downloaded the Flickr25k dataset, preprocessed it, and trained a model with these modifications in the config file:
The rest of the configurations remained as in the current config file.
Even after 1000 training epochs, the model still produces bad results.
Is there anything I'm missing? Thanks.
I tried changing module_name to sr3, and the test outputs the error: NotImplementedError: Network [Network() form models.network] not recognized.