Large Neural Style

This project combines the content of one image with the style of another, and can do so at very high resolutions, unlike most other neural style implementations.

The original neural style paper, A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge, is the foundation on which this project is based.

I have incorporated enhancements of my own design to overcome that algorithm's inherent limitations and generate high-resolution images. Because the neural style algorithm requires large amounts of memory (mainly to compute gradients) and GPUs have relatively little, the highest reasonable resolution most implementations can generate is about 1024x1024 with 12 GB of GPU RAM. If your budget is 4 GB, you're limited to about 512x512. This project uses two techniques to overcome the memory limitations:

  • Overlapping tiled generation with feathered blending. The content image is broken into overlapping tiles, and each tile is processed by the neural style algorithm. The tiles are then recombined using a feathered blend to prevent visible seams. The style image is also resized to the target resolution (adjusted by a user-defined scale parameter) and then cropped to the tile size; this keeps the style's scale invariant with respect to the output resolution.
  • Generation using an image pyramid. The process starts with small content and style images, near 224x224, because that is the resolution the VGG network was trained on. The actual starting dimension is the target resolution divided by a power of 2, so that the final generation step happens at the full output resolution. At the initial resolution, the optimization runs for quite a few steps; the result is then resized by 2x and the process repeats. Once the resolution is larger than GPU memory can handle, overlapping tiled generation takes over. Each larger level adds finer detail. (A sketch of how the two techniques compose follows this list.)
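The two techniques compose: the pyramid drives the overall schedule, and tiled generation takes over once a level no longer fits in GPU memory. A minimal sketch of that control flow, with `run_style_transfer` and `run_tiled_style_transfer` as placeholder names for the per-level optimizers (not this repository's actual API):

```python
import math
import skimage.transform

def generate(content, style, target_hw, max_gpu_pixels=1000 * 1000):
    """Coarse-to-fine generation: optimize at a small size first, then
    repeatedly double the resolution and re-optimize at each level."""
    # Pick the number of pyramid levels so the smallest level is near
    # 224x224 (the size VGG was trained on) and every level is the
    # target resolution divided by a power of 2.
    levels = max(0, int(round(math.log(max(target_hw) / 224.0, 2))))
    sizes = [(target_hw[0] // 2 ** k, target_hw[1] // 2 ** k)
             for k in range(levels, -1, -1)]

    generated = None
    for h, w in sizes:
        c = skimage.transform.resize(content, (h, w))
        # Start from the content image, then from the upsampled result
        # of the previous (smaller) level.
        generated = c if generated is None else \
            skimage.transform.resize(generated, (h, w))
        if h * w <= max_gpu_pixels:
            generated = run_style_transfer(generated, c, style)
        else:
            # Too large for GPU memory: overlapping tiled generation.
            generated = run_tiled_style_transfer(generated, c, style)
    return generated
```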

Setup

The code is written in Python and depends on Lasagne, Theano, matplotlib, numpy, scikit-image, and scipy. If you are using pip, you can install the dependencies by running: pip install -r requirements.txt

While you can run it on a CPU, it will be much faster on a GPU. You'll want to install CUDA and cuDNN from NVIDIA and set them up following the Theano instructions.
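Theano reads its settings from `~/.theanorc` (or the `THEANO_FLAGS` environment variable). A typical GPU configuration for Theano versions of this era might look like the following; the exact device name varies by Theano version, so treat this as an assumption and check the Theano docs:

```
# ~/.theanorc (assumed; the device may be "cuda" on newer Theano)
[global]
device = gpu
floatX = float32
```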

Usage

Contact

Please file issues using GitHub.

Acknowledgements

vgg19_normalized.pkl - https://s3.amazonaws.com/lasagne/recipes/pretrained/imagenet/vgg19_normalized.pkl - prepared from the VGG authors' pre-trained network weights by Eben Olson (https://github.com/ebenolson)

Inspiration from Lasagne Recipe for Neural Style Transfer - https://github.com/Lasagne/Recipes/blob/master/examples/styletransfer/Art%20Style%20Transfer.ipynb

Citation

If you find this code useful for your research, please cite:

@misc{Nuffer2016,
  author = {Nuffer, Daniel},
  title = {large-neural-style},
  year = {2016},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/dnuffer/large-neural-style}},
}

Backstory

Let me start by explaining why I wanted to do this project. Earlier this year I went to NVIDIA's GPU Technology Conference, where I saw a booth by deepart.io, a startup created by the inventors of the neural style algorithm. The art they had on display looked amazing and made me very curious about how it was created. At the time I submitted some images to the deepart.io website and had to wait days for the results because of the popularity and resulting processing backlog. The results were low resolution (500x500) and square; I wanted to generate higher-resolution images while preserving the aspect ratio.

While staying in downtown San Jose for the conference, I spent some time walking around and took photos of murals. This got me thinking about using textures of a photo itself as the style: a mural is a unique form of art that is embedded in a city, represents something else, and provides a concrete link between two different, yet related spaces. Using the textures of an image to re-style that same image essentially creates a recursive relationship between different scale levels of the image, which appeals to me as a computer scientist.

I also felt this project was doable because I had previous experience with computer vision, deep neural networks, and the related libraries. I competed in the Kaggle Diabetic Retinopathy challenge in 2015, placing 13th, and learned and used the Lasagne (deep learning) and Theano (GPU-accelerated linear algebra) libraries for that competition. Since that time, Google has released TensorFlow (a GPU-accelerated linear algebra library), which has gained significant traction in the deep learning community, and a neural network/deep learning library named Keras has matured and appears to be more popular than Lasagne.

I wanted to get some experience with TensorFlow and Keras, so I started building my solution with those libraries. Once I got the code running, I found that Theano was faster than TensorFlow, so I switched to Theano. I also found that the small amount of Keras code I had was harder to understand, because Keras has an abstraction layer so that either Theano or TensorFlow can be used as a backend, while Lasagne is more of a deep-learning function library for Theano. My code has to capture the network activations and optimize the input to the network, which are not things normally done in deep learning, so Keras and Lasagne basically only filled the role of constructing the convolutional and pooling layers. I switched to Lasagne because it is closer to the Theano API.

I first implemented the algorithm exactly as described in the paper. I then found that as I used different-sized images, the components of the loss function would scale differently, requiring me to manually adjust the weights to compensate. As I examined the loss, I realized it could be modified to be resolution independent: my modified content loss computes the mean squared difference instead of the sum, dividing by the number of feature-map elements (i * j). I also removed the 1/2 constant, because it is redundant with the loss component weights.
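Concretely, the paper's content loss is (1/2) Σ_ij (F_ij − P_ij)²; dividing by the number of elements and dropping the 1/2 turns it into a mean squared error whose scale doesn't depend on the feature-map size. A sketch of that modified loss in Theano (variable names are illustrative):

```python
import theano.tensor as T

def content_loss(F, P):
    """Mean (not sum) of squared differences between the generated
    image's feature maps F and the content image's feature maps P.
    Dividing by the element count (i * j) makes the loss resolution
    independent; the paper's 1/2 factor is folded into the
    user-supplied loss weight."""
    return T.mean(T.sqr(F - P))
```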

I also modified the style layer loss. As image size increases, the feature-map size M_l grows quadratically, causing the distribution of weight between layers to shift toward the upper layers (where the filter responses are larger before being pooled). I changed the normalization from N_l^2 M_l^2 to N_l M_l, and removed the 1/4 because it is redundant with the loss component weights. I also tried changing the sum of squared differences of the Gram matrices to the mean of squared differences, so that the loss would be independent of the number of filters, but this didn't improve things, and since it wasn't necessary for image-size independence, I kept the sum. Finally, I added an extra component to the loss function: the total variation (https://en.wikipedia.org/wiki/Total_variation#Total_variation_for_functions_of_one_real_variable) across the two dimensions of the generated image. Credit for the total variation loss idea goes to the jcjohnson/neural-style code: https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua. I use the mean square of the total variation so that it is image-size independent.
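A sketch of the modified per-layer style loss and the total variation term, assuming feature maps already flattened to (filters, positions) and images shaped (channels, height, width); names are illustrative:

```python
import theano.tensor as T

def gram(F):
    # F: feature maps flattened to (N_l filters, M_l positions)
    return T.dot(F, F.T)

def style_layer_loss(F, A_gram):
    """Sum of squared Gram-matrix differences, normalized by N_l * M_l
    rather than the paper's N_l^2 * M_l^2; the 1/4 factor is folded
    into the loss component weight."""
    N, M = F.shape[0], F.shape[1]
    return T.sum(T.sqr(gram(F) - A_gram)) / (N * M)

def tv_loss(x):
    """Mean squared total variation across both image dimensions,
    so the penalty is independent of image size."""
    return (T.mean(T.sqr(x[:, 1:, :] - x[:, :-1, :])) +
            T.mean(T.sqr(x[:, :, 1:] - x[:, :, :-1])))
```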

In the paper, the authors start the optimization from a random image. I tried that; I also tried initializing with the content image, and with the content image plus various amounts of Gaussian noise. I found that I prefer the images generated when initializing from the content image with no additional noise: the more noise (or a completely random initial image), the more the result resembled random parts of the style. Increasing the content weight could counteract that somewhat, but sometimes caused odd artifacts.

To keep the magnitudes of the Gram matrices of the style and content similar, I make the style image match the size of the generated image. Instead of stretching, I resize (taking the style scale option into account), then center-crop the style image if it is too large, and finally mirror-pad it if it is too small. This preserves the content and patterns of the style image. (A sketch of this preparation step follows below.)

Finally, I implemented an algorithm to generate large images in a constant amount of memory. The algorithm in the paper is restricted to about 500x500 resolution because it uses about 2 GB of GPU RAM; I have 12 GB of GPU RAM, but can still only do about 1000x1000 with the standard algorithm. My idea was to optimize a piece of the image at a time. My first implementation simply split the generated image into tiles, optimized each separately, and recombined them. This looked very bad: the seams were obvious and the content on either side wasn't blended. Next I tried generating overlapping tiles, but this also didn't work well, because optimizing a tile that was already half optimized would optimize the pre-optimized half much more than the other half.
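A sketch of the style-image preparation (resize by the style scale, center-crop any excess, mirror-pad any deficit), assuming an HxWx3 numpy array; names are illustrative:

```python
import numpy as np
import skimage.transform

def prepare_style(style, target_h, target_w, style_scale=1.0):
    """Match the style image to the generated image's size without
    stretching: scale preserving aspect ratio, then center-crop,
    then mirror-pad."""
    h, w = style.shape[:2]
    scale = style_scale * max(target_h / float(h), target_w / float(w))
    style = skimage.transform.resize(
        style, (int(round(h * scale)), int(round(w * scale))))

    # Center-crop any excess.
    h, w = style.shape[:2]
    top, left = max(0, (h - target_h) // 2), max(0, (w - target_w) // 2)
    style = style[top:top + target_h, left:left + target_w]

    # Mirror-pad any deficit.
    h, w = style.shape[:2]
    pad_h, pad_w = target_h - h, target_w - w
    if pad_h > 0 or pad_w > 0:
        style = np.pad(style,
                       ((pad_h // 2, pad_h - pad_h // 2),
                        (pad_w // 2, pad_w - pad_w // 2),
                        (0, 0)),
                       mode='reflect')
    return style
```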

The third approach is the one I found to work well. I first padded the image with reflection so that it had a border of at least half the tile size. I created 50%-overlapping tiles, optimized each independently, and then combined them using a blending technique based on feathering across the entire tile. The filter is a linear ramp from 1 at the center to 0 at the outer edges; when the tiles are combined, the filter weights sum to 1 everywhere the original image was present before the mirror padding was added. After combining, the padding area is removed, and the resulting image is used for the next step of optimization. I found that 30 steps of L-BFGS optimization in the inner loop on each tile and 3 steps of the outer loop produce a good result. This tiling process is about 5-6x slower than producing an image without it, because the tile overlap increases the processing 4x and the mirror padding adds another 1-2x of overhead, depending on the image. Splitting the image into tiles and recombining them is relatively fast and doesn't materially affect the runtime.
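A sketch of the feathered recombination, assuming 50%-overlapping tiles on a mirror-padded canvas; `tiles` holds the optimized tiles and `positions` their top-left corners (names are illustrative, not the repository's actual functions):

```python
import numpy as np

def feather_weights(tile_h, tile_w):
    """Linear ramp from 1 at the tile center to 0 at the outer edges."""
    ry = 1.0 - np.abs(np.linspace(-1.0, 1.0, tile_h))
    rx = 1.0 - np.abs(np.linspace(-1.0, 1.0, tile_w))
    return np.outer(ry, rx)

def blend_tiles(tiles, positions, out_h, out_w):
    """Weighted average of overlapping tiles; with 50% overlap the
    ramps sum to ~1 everywhere the original (unpadded) image was."""
    acc = np.zeros((out_h, out_w, 3))
    wacc = np.zeros((out_h, out_w, 1))
    for tile, (y, x) in zip(tiles, positions):
        th, tw = tile.shape[:2]
        w = feather_weights(th, tw)[:, :, None]
        acc[y:y + th, x:x + tw] += tile * w
        wacc[y:y + th, x:x + tw] += w
    return acc / np.maximum(wacc, 1e-8)
```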

After finishing the code, I generated a lot of images using multiple photos I took for this purpose, as well as public domain art and images. I needed to adjust the content and style weights a bit. The choice of images and the relative scale of the objects in them appear to be critical to producing good output, as not all combinations are appealing.

Future Directions

I would like to explore:

  • Normalizing gradients, as in https://github.com/jcjohnson/neural-style/blob/master/neural_style.lua
  • Normalizing Gram matrix values, as in https://github.com/andersbll/neural_artistic_style/blob/master/style_network.py
  • Using different network layers for the style and content, and different layer weights, to see the effect on the output
  • Using a different CNN (e.g. Inception or SqueezeNet) to see if performance can be improved
