filipandersson245 / cartoon-gan

License: MIT License

Jupyter Notebook 97.24% Python 2.74% HTML 0.02%

deep-learning gan pytorch

cartoon-gan's Introduction

Cartoon-GAN

Paper: https://arxiv.org/abs/2005.07702

Description

This project tackles the problem of transferring the style of cartoon images to real-life photographic images by implementing the earlier CartoonGAN approach. We trained a Generative Adversarial Network (GAN) on over 60,000 images from works by Hayao Miyazaki at Studio Ghibli.

To the people asking for the dataset: I'm sorry, but the material is copyright protected, so I cannot share it.

Dependencies

Install Anaconda from https://www.anaconda.com/
Install PyTorch from https://pytorch.org/get-started/locally/

Install dependencies:

python -m pip install tqdm pillow numpy matplotlib opencv-python

For predicting videos you will also need ffmpeg
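
As a quick sanity check after installing the dependencies above, the following sketch reports which modules cannot be found and whether an ffmpeg executable is on the PATH. It is not part of the repository. The import names (`PIL` for pillow, `cv2` for opencv-python) are the usual ones for those packages, and the helper names `missing_modules` and `has_ffmpeg` are made up for this illustration.

```python
import importlib.util
import shutil

# Import names for the packages listed above (assumption: pillow is
# imported as PIL and opencv-python as cv2, which is their usual naming).
REQUIRED_MODULES = ["torch", "tqdm", "PIL", "numpy", "matplotlib", "cv2"]


def missing_modules(names):
    """Return the module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_ffmpeg():
    """Return True if an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing Python modules:", ", ".join(missing) if missing else "none")
    print("ffmpeg found:", has_ffmpeg())
```

Running it before predict.py gives an early hint about a missing package instead of an import error halfway through a run.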

Weights

Weights for the presented models can be found here

Training

All training code can be found in experiment.ipynb

Predict

Predict by running predict.py.

Example:

python predict.py -i C:/folder/input_image.png -o ./output_folder/output_image.png

Predictions can be made on images, videos or a folder of images/videos.

Demonstration

Comparison of 20 example images (images not reproduced here). Columns: Image #, Original, CartoonGAN, GANILLA, Our implementation.

Citation

@misc{andersson2020generative,
      title={Generative Adversarial Networks for photo to Hayao Miyazaki style cartoons}, 
      author={Filip Andersson and Simon Arvidsson},
      year={2020},
      eprint={2005.07702},
      archivePrefix={arXiv},
      primaryClass={cs.GR}
}


cartoon-gan's Issues

Weights license

Hi. Thank you for sharing this work!

Is everything in the folder:
https://drive.google.com/drive/folders/1d_GsZncTGmMdYht0oUWG9pqvV4UqF_kM

licensed under the same MIT license as this repository?

I do know you don't own the dataset. I am referring to rights on the model other than those that exist because of the dataset (if any).

Request for pre-trained models

Is there any plan to release the pre-trained models?

Upload weights for Discriminator

Hey, can you upload the "trained_netD.pth" file? I am training on my own dataset, but I cannot continue without that file. It would be really helpful if you could share it with me. Thanks

Originally posted by @Yash619 in #4 (comment)

Is new code different from paper?

Hello, the new results published look really good - well done!
How is the new approach different? is it just a new dataset?

Pretrained models giving Out of memory error.

Is there a way to reduce the memory requirements to that of a 4 GB card?

RuntimeError: CUDA out of memory. Tried to allocate 5.72 GiB
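
As a rough guide to why smaller inputs help here, the memory of a single float32 activation tensor grows with batch × channels × height × width, so halving both image sides cuts it to a quarter. The sketch below is only that arithmetic plus a helper that shrinks an image size to a pixel budget. It does not measure the real model, `fit_within` is a hypothetical helper, and a workable `max_pixels` for a given card would have to be found by trial.

```python
import math

BYTES_PER_FLOAT32 = 4


def feature_map_bytes(batch, channels, height, width):
    """Bytes needed for one float32 tensor of shape (batch, channels, height, width)."""
    return batch * channels * height * width * BYTES_PER_FLOAT32


def fit_within(width, height, max_pixels):
    """Shrink (width, height) so that width * height <= max_pixels, keeping the aspect ratio."""
    pixels = width * height
    if pixels <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / pixels)
    # Round down so the result never exceeds the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


if __name__ == "__main__":
    full = feature_map_bytes(1, 64, 1024, 1024)
    half = feature_map_bytes(1, 64, 512, 512)
    print(f"64 channels at 1024x1024: {full / 2**20:.0f} MiB")
    print(f"64 channels at 512x512:   {half / 2**20:.0f} MiB")
    # Hypothetical budget: resize a 1920x1080 frame to roughly 512*512 pixels.
    print(fit_within(1920, 1080, 512 * 512))
```

Resizing the input to the returned size before prediction is one option. The channel count of 64 is only an example value, not a statement about the network in this repository.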

Hello, the checkpoint link has been removed. Could you share a new link?

How can I obtain the cartoon dataset?

What a wonderful job! And I am wondering how to obtain the cartoon dataset. Thank you.

The paging file is too small for this operation to complete

Hello, I tried to train from the checkpoint left in the repository but I got the "The paging file is too small for this operation to complete" Error.
I tried reducing the batch size or image size but it made no difference.
Command run: python train.py
I changed the following to be the path to my own dataset
real_dataloader = get_dataloader("E:/Proiecte_Dizertatie/Dataset_ghibli/trainA/", size = image_size, bs = batch_size)
cartoon_dataloader = get_dataloader("E:/Proiecte_Dizertatie/Dataset_ghibli/trainB/", size = image_size, bs = batch_size, trfs=get_pair_transforms(image_size))

I also downloaded the pth files with the pretrained models.

OS: Windows 11
PyTorch version 1.10
CUDA toolkit version CUDA 10.1
NVIDIA driver version
GPU RTX 3060

Logs:
Starting Training Loop...
training epoch 0
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "E:\Anaconda\envs\CartoonGAN\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "E:\Anaconda\envs\CartoonGAN\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "E:\Anaconda\envs\CartoonGAN\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "E:\Anaconda\envs\CartoonGAN\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "E:\Anaconda\envs\CartoonGAN\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "E:\Anaconda\envs\CartoonGAN\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "E:\Anaconda\envs\CartoonGAN\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "E:\Proiecte_Dizertatie\cartoon-gan\train.py", line 1, in <module>
import torch
File "E:\Anaconda\envs\CartoonGAN\lib\site-packages\torch\__init__.py", line 124, in <module>
raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "E:\Anaconda\envs\CartoonGAN\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
Traceback (most recent call last):
File "train.py", line 158, in <module>
train()
File "train.py", line 71, in train
for i, (cartoon_edge_data, real_data) in enumerate(zip(cartoon_dataloader, real_dataloader)):
File "E:\Anaconda\envs\CartoonGAN\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
return self._get_iterator()
File "E:\Anaconda\envs\CartoonGAN\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "E:\Anaconda\envs\CartoonGAN\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
w.start()
File "E:\Anaconda\envs\CartoonGAN\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "E:\Anaconda\envs\CartoonGAN\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "E:\Anaconda\envs\CartoonGAN\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "E:\Anaconda\envs\CartoonGAN\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "E:\Anaconda\envs\CartoonGAN\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Memory and padding error

Hello. Thanks for the other answer. I made that change and now I have a problem with memory. Could you help me further?

RuntimeError: CUDA out of memory. Tried to allocate 134.00 MiB (GPU 0; 6.00 GiB total capacity; 5.15 GiB already allocated; 0 bytes free; 5.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I saw that someone else had the problem and you suggested lowering the batch size. I also did that and got (I changed it to 16):
RuntimeError: 3D or 4D (batch mode) tensor expected for input, but got: [ torch.cuda.FloatTensor{16,3,256,0} ]

I lowered the image size but it still goes out of memory.

And when I do both it gives:
RuntimeError: Padding size should be less than the corresponding input dimension, but got: padding (1, 1) at dimension 3 of input [16, 64, 128, 1]

Target size must be the same as input size

Hi, amazing work here.
I am trying to run the training loop to optimize a model to apply it to another artist.
I have created the smooth dataset for the artist's images (with every image containing the original and an edge smoothed version side by side), and got 30k images from Flickr.

When I run the training loop I get the following error:

ValueError: Target size (torch.Size([9, 1, 56, 56])) must be the same as input size (torch.Size([9, 1, 64, 72]))

Is there any processing I need to do on the Flickr images that I missed?

Here is the complete log:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [4], line 49
     46 generated_pred = netD(generated_data)    #.view(-1)
     48 # Calculate discriminator loss on all batches.
---> 49 errD = adv_loss(cartoon_pred, generated_pred, edge_pred)
     51 # Calculate gradients for D in backward pass
     52 errD.backward()

File e:\sylvain-chomet-GAN\env\lib\site-packages\torch\nn\modules\module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File e:\sylvain-chomet-GAN\utils\loss.py:16, in AdversialLoss.forward(self, cartoon, generated_f, edge_f)
     14 def forward(self, cartoon, generated_f, edge_f):
     15     #print(cartoon.shape, self.cartoon_labels.shape)
---> 16     D_cartoon_loss = self.base_loss(cartoon, self.cartoon_labels)
     17     D_generated_fake_loss = self.base_loss(generated_f, self.fake_labels)
     18     D_edge_fake_loss = self.base_loss(edge_f, self.fake_labels)
...
   3147 if not (target.size() == input.size()):
-> 3148     raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
   3150 return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

ValueError: Target size (torch.Size([9, 1, 56, 56])) must be the same as input size (torch.Size([9, 1, 64, 72]))

Thank you for this amazing work.
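
One possible reading of the shapes in this error: both sizes are exactly one quarter of a plausible image size (56 = 224 / 4, while 64 = 256 / 4 and 72 = 288 / 4). That would fit a discriminator that halves the spatial size twice, with labels built for 224×224 inputs while the batch actually contains 256×288 images. The sketch below only checks that arithmetic. The two stride-2, 3×3, padding-1 convolutions are an assumption taken from the shapes, not read from this repository's code, and `patch_map_size` is a hypothetical helper.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Standard convolution output size: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_map_size(height, width, num_stride2_convs=2):
    """Spatial size of the discriminator output, assuming only the
    stride-2 convolutions change the resolution (an assumption)."""
    for _ in range(num_stride2_convs):
        height = conv_out(height, stride=2)
        width = conv_out(width, stride=2)
    return height, width


if __name__ == "__main__":
    print(patch_map_size(224, 224))  # size the labels appear to be built for
    print(patch_map_size(256, 288))  # size the discriminator actually produced
```

If that reading is right, resizing or cropping every input image to the same configured image size before batching should make the prediction and label shapes agree.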

Not able to replicate results

Hello, I want to use your work for further research, so I am using your repository to replicate your results on my own dataset. During training, however, I do not get cartoon-like images; the output looks nearly the same as the real image. The problem is that the discriminator is not training properly. It always gives near-zero outputs (real-like patch predictions) for cartoon images (around 0.3~0.5 mean) and also for real images (~0.01), so the generator cannot train properly to yield cartoon images.

I followed the procedure you described, which is also the one in the original CartoonGAN paper. I pretrained the generator to reproduce real images (6k images from the Flickr30k dataset) for 10 epochs. Then I pretrained the discriminator as a normal classifier (6k images from Flickr, 4.6k anime images from 3 Hayao Miyazaki movies: PoppingHill, Princess Mononoke, and Spirited Away, plus the 4.6k corresponding smoothed images) for 50 epochs, and still had the problem described above. After this I trained the combination for 50 epochs on the same data.

Can you suggest what the problem is? I have not changed your code except for writing new code to pretrain the generator and discriminator. Maybe the problem lies in the dataset, or am I doing something else wrong? Please help me out.

It would be great if you could share your dataset with us. Thanks.

load video

Hi, nice work

predict.py -i ./input -o ./output_folder
Processing files:   0%|                                                                                                                                   | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "predict.py", line 113, in <module>
    predict_file(file_path, output_file_path)
  File "predict.py", line 55, in predict_file
    if mimetypes.guess_type(input_path)[0].startswith("image"):
AttributeError: 'NoneType' object has no attribute 'startswith'

I tried predicting videos (.mpeg files), but got AttributeError: 'NoneType' object has no attribute 'startswith'.
Can you help, please?
Also, what do you mean by needing ffmpeg for predicting videos?
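
The traceback above fails because `mimetypes.guess_type` returns `(None, None)` when it cannot guess a type from the file name, and `.startswith` is then called on `None`. A minimal sketch of a guarded check follows. `media_kind` is a hypothetical helper, not a function in predict.py, and which file in the input folder triggered the error is not known from the log.

```python
import mimetypes


def media_kind(path):
    """Return "image", "video", or None if the type is unknown or something else."""
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is None:
        # The type could not be guessed from the file name; skip instead of crashing.
        return None
    kind = mime_type.split("/", 1)[0]
    return kind if kind in ("image", "video") else None


if __name__ == "__main__":
    for name in ["photo.png", "clip.mp4", "notes.unknownext123"]:
        print(name, "->", media_kind(name))
```

A caller could then skip any file for which `media_kind` returns `None`, so that one unrecognized file in a folder does not abort the whole batch.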

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

I get this error when running the Training Loop.

I loaded the state like so:
netG.load_state_dict(torch.load("./checkpoints/trained_netG_original.pth"))

And the error happens here:
errG = BCE_loss(generated_pred, cartoon_labels) + content_loss(generated_data, real_data)

Here is the full log

RuntimeError                              Traceback (most recent call last)
Cell In [7], line 79
     77 print(type(generated_pred))
     78 print(type(cartoon_labels))
---> 79 errG = BCE_loss(generated_pred, cartoon_labels.to(device)) + content_loss(generated_data, real_data)
     81 # Calculate gradients for G
     82 errG.backward()

File e:\sylvain-chomet-GAN\env\lib\site-packages\torch\nn\modules\module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File e:\sylvain-chomet-GAN\utils\loss.py:40, in ContentLoss.forward(self, x1, x2)
     39 def forward(self, x1, x2):
---> 40     x1 = self.perception(x1)
     41     x2 = self.perception(x2)
     43     return self.omega * self.base_loss(x1, x2)

File e:\sylvain-chomet-GAN\env\lib\site-packages\torch\nn\modules\module.py:1130, in Module._call_impl(self, *input, **kwargs)
...
    452                     _pair(0), self.dilation, self.groups)
--> 453 return F.conv2d(input, weight, bias, self.stride,
    454                 self.padding, self.dilation, self.groups)