Anime-Colorization (English)

Keras implementations of several algorithms for colorizing grayscale images of anime characters.

Data Used

Source

The dataset is available at https://www.kaggle.com/mylesoneill/tagged-anime-illustrations#danbooru-metadata.zip. Because the danbooru image dataset is too large, only the moeimouto-faces.zip dataset was used.

Preprocessing

To get better colorization results, I converted each RGB image to a LAB image, using the L channel as input and the A and B channels as output. The reason is that the L (lightness) channel retains as much of the image's general information as possible, whereas using only one of the RGB channels as input would discard the information in the other two. For more information on LAB images, see the links below:

(1) https://en.wikipedia.org/wiki/CIELAB_color_space

(2) https://www.aces.edu/dept/fisheries/education/pond_to_plate/documents/ExplanationoftheLABColorSpace.pdf
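The L/AB split described above can be sketched in plain NumPy. This is only an illustration, not the repository's code: libraries such as scikit-image provide `rgb2lab` for the conversion, and the helper names (`rgb_to_lab`, `split_for_training`) and the scaling constants (100 for L, 128 for A and B) are assumptions chosen for this example.

```python
import numpy as np

# Standard sRGB -> XYZ matrix and D65 reference white.
RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def rgb_to_lab(rgb):
    """Convert an (H, W, 3) sRGB image with values in [0, 1] to CIELAB."""
    # Undo the sRGB gamma curve to obtain linear light.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white.
    xyz = linear @ RGB_TO_XYZ.T / WHITE_D65
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def split_for_training(rgb):
    """Return (L input, AB target) for one image (hypothetical helper)."""
    lab = rgb_to_lab(rgb)
    l_input = lab[..., :1] / 100.0    # L lies in [0, 100]
    ab_target = lab[..., 1:] / 128.0  # assumed scaling; A and B lie roughly in [-128, 127]
    return l_input, ab_target


image = np.random.rand(64, 64, 3)     # stand-in for a loaded RGB face image
l_input, ab_target = split_for_training(image)
print(l_input.shape, ab_target.shape)
```

A gray pixel has equal R, G, and B values, so its A and B components come out as (almost exactly) zero. That is why the L channel alone already looks like the grayscale version of the image.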

The diagram below shows the data preprocessing steps I used:

Algorithms Used (with Reference)

Alpha Version algorithm by Emil Wallner

(1) https://blog.floydhub.com/colorizing-b-w-photos-with-neural-networks/

This is a simple CNN encoder-decoder that Emil Wallner created to colorize images. Roughly, the algorithm has the structure shown in the diagram below; the detailed Keras code is available at the link above.
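As a rough illustration of what such an encoder-decoder looks like in Keras, here is a minimal sketch. It is not Emil Wallner's exact network: the layer counts, filter sizes, the 64x64 input size, and the function name `build_encoder_decoder` are all assumptions made for this example.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_encoder_decoder(size=64):
    """Map a 1-channel L image to a 2-channel AB prediction of the same size."""
    l_input = layers.Input(shape=(size, size, 1))

    # Encoder: strided convolutions halve the resolution twice.
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(l_input)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)

    # Decoder: upsampling restores the original resolution.
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)

    # tanh keeps predictions in [-1, 1], which suits AB targets scaled to that range.
    ab_output = layers.Conv2D(2, 3, padding="same", activation="tanh")(x)
    return keras.Model(l_input, ab_output)


model = build_encoder_decoder()
model.compile(optimizer="adam", loss="mse")
model.summary()
```

The model is trained as a plain regression: the L channel goes in, and the mean squared error against the true AB channels is minimized.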

U-Net Implementation

(1) https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/

According to its authors, U-Net was originally intended as a convolutional network architecture for fast and precise image segmentation. While searching for colorization algorithms, however, I saw quite a few people using U-Net for colorization. Since U-Net also has an encoder-decoder structure suitable for colorization, I slightly edited the U-Net Keras code (https://github.com/zhixuhao/unet) so that it can be used for colorization.
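The sketch below shows, under stated assumptions, how a small U-Net can be pointed at colorization instead of segmentation. It is much shallower than the original architecture, and the specific choices (two pooling levels, a 2-channel `tanh` output, the names `conv_block` and `build_small_unet`) are illustrative guesses rather than the edits actually made in this repository.

```python
from tensorflow import keras
from tensorflow.keras import layers


def conv_block(x, filters):
    """Two 3x3 convolutions, as in each U-Net stage."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)


def build_small_unet(size=64):
    l_input = layers.Input(shape=(size, size, 1))

    # Contracting path: keep each stage's output for the skip connections.
    down1 = conv_block(l_input, 16)
    pool1 = layers.MaxPooling2D(2)(down1)
    down2 = conv_block(pool1, 32)
    pool2 = layers.MaxPooling2D(2)(down2)

    bottleneck = conv_block(pool2, 64)

    # Expanding path: upsample, then concatenate the matching encoder features.
    up2 = layers.UpSampling2D(2)(bottleneck)
    up2 = conv_block(layers.concatenate([up2, down2]), 32)
    up1 = layers.UpSampling2D(2)(up2)
    up1 = conv_block(layers.concatenate([up1, down1]), 16)

    # For colorization the head predicts 2 AB channels instead of a 1-channel mask.
    ab_output = layers.Conv2D(2, 1, activation="tanh")(up1)
    return keras.Model(l_input, ab_output)


unet = build_small_unet()
unet.compile(optimizer="adam", loss="mse")
```

The skip connections are what distinguish this from a plain encoder-decoder: fine details of the line art reach the decoder directly instead of having to pass through the bottleneck.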

The diagram below shows the architecture of U-Net:

Full Version by Emil Wallner (Alpha Version algorithm with Fusion Layer)

(1) http://iizuka.cs.tsukuba.ac.jp/projects/colorization/en//

In his post, Emil did not stop at the Alpha version; he further improved it by applying a concept called the 'fusion layer'. In my case of coloring anime faces, the fusion layer takes information about which anime character the input shows and adds it to the encoded vector before the algorithm begins coloring. Put simply, it rests on the assumption that classifying the input first helps the algorithm produce a better colorization. The diagram below briefly illustrates the concept of the fusion layer (for more details, visit the link above).

Note: Emil used the vector output of Inception-ResNet-v2 for the fusion layer, whereas I used the vector output of ResNet.
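The core of the fusion step can be shown with a few lines of NumPy. This is a sketch of the idea only: the shapes (an 8x8x256 encoded feature map and a 1000-dimensional classifier vector) and the function name `fuse` are hypothetical, and the real models do this with Keras layers inside the network.

```python
import numpy as np


def fuse(encoded, embedding):
    """Attach a classifier vector to every spatial position of a feature map.

    encoded:   (H, W, C) output of the encoder
    embedding: (D,) vector from a pretrained classifier
    returns:   (H, W, C + D) fused feature map
    """
    height, width, _ = encoded.shape
    # Repeat the same vector at each of the H x W positions.
    tiled = np.broadcast_to(embedding, (height, width, embedding.shape[0]))
    return np.concatenate([encoded, tiled], axis=-1)


encoded = np.random.rand(8, 8, 256).astype("float32")  # hypothetical encoder output
embedding = np.random.rand(1000).astype("float32")     # hypothetical classifier output
fused = fuse(encoded, embedding)
print(fused.shape)  # (8, 8, 1256)
```

After this concatenation, a 1x1 convolution is typically applied so that the decoder can mix the local features with the global "what is in this image" information at every position.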

DCGAN

(1) http://cs231n.stanford.edu/reports/2017/pdfs/302.pdf

(2) https://github.com/eriklindernoren/Keras-GAN/blob/master/dcgan/dcgan.py

Originally introduced by Ian Goodfellow in 2014, GAN is still a popular deep-learning algorithm used for various purposes. Recently, I read a paper (1) that used DCGAN (a GAN with a CNN architecture) for image colorization, so I decided to apply it to anime face colorization as well. The Python GAN code I wrote is based on Erik Linder-Norén's GitHub repository (2).

(Image from https://gluon.mxnet.io/chapter14_generative-adversarial-networks/dcgan.html)

WGAN

(1) https://arxiv.org/abs/1701.07875

(2) https://medium.com/@sunnerli/the-story-about-wgan-784be5acd84c

(3) https://www.slideshare.net/ssuser7e10e4/wasserstein-gan-i

(4) https://vincentherrmann.github.io/blog/wasserstein/

As GAN has grown more popular as a deep-learning algorithm, people have also paid more attention to its disadvantages, and many new versions of GAN have been invented to address them. WGAN (Wasserstein GAN), proposed by Arjovsky, Chintala, and Bottou (2017) (1), is one of them. It applies the Wasserstein distance (Earth Mover's distance): instead of the KL or JS divergence that underlies the original GAN loss, WGAN uses the Wasserstein distance as its loss function. In brief, the KL and JS divergences cannot give a meaningful loss value when the real and generated distributions barely overlap, whereas the Wasserstein distance still reflects how far apart they are.
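A small numerical example may help show why the JS divergence struggles here. The sketch below compares two point-mass distributions on a 1-D grid; the setup and the helper names (`js_divergence`, `wasserstein_1d`, `point_mass`) are made up for illustration and are not part of any WGAN implementation.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(p, q, positions):
    """1-D Wasserstein-1 distance: area between the two CDFs."""
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))
    return np.sum(cdf_gap[:-1] * np.diff(positions))


positions = np.arange(11, dtype=float)  # grid points 0, 1, ..., 10


def point_mass(index):
    dist = np.zeros(len(positions))
    dist[index] = 1.0
    return dist


p = point_mass(0)
for shift in (1, 5, 10):
    q = point_mass(shift)
    # JS is stuck at log(2) for every shift; Wasserstein grows with the shift.
    print(shift, js_divergence(p, q), wasserstein_1d(p, q, positions))
```

Because the JS divergence is the same constant no matter how far apart the two distributions are, it gives the generator no signal about which direction to move. The Wasserstein distance shrinks as the distributions get closer, so its gradient stays informative.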

For more details on the Wasserstein distance, see the articles and presentations listed above ((2), (3)). I referred to Vincent Herrmann's code (4) for the Keras WGAN implementation. Later, I plan to create another repo dedicated to explaining WGAN in as much detail as I can.

Acknowledgement

Special thanks to Neowiz Play Studio (http://neowizplaystudio.com/ko/) and Sungkyu Oh ([email protected]), who not only answered the questions I ran into but also provided up-to-date machines (including an RTX 2080 Ti!) used for the analysis.

Contact Information

facebook: https://www.facebook.com/dabin.moon.7

email: [email protected]

Anime-Colorization (Korean)

Keras algorithms that colorize black-and-white images of anime characters.

Data Used

Source

The data used to build the algorithms is available at this link: https://www.kaggle.com/mylesoneill/tagged-anime-illustrations#danbooru-metadata.zip. Because the danbooru image dataset is too large, only the moeimouto-faces.zip dataset was used.

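Preprocessing

To get a better colorization algorithm, I used LAB images instead of RGB images. The algorithm takes the L channel, which represents lightness, as input and produces the A and B channels, which carry the remaining color information, as output. For more detailed information on LAB images, see the two links below.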

(1) https://en.wikipedia.org/wiki/CIELAB_color_space

(2) https://www.aces.edu/dept/fisheries/education/pond_to_plate/documents/ExplanationoftheLABColorSpace.pdf

The diagram below briefly illustrates the data preprocessing steps I carried out: