Rendering the semantic content of an image in different styles is considered an arduous task in the field of image processing. A major roadblock for this technique was the lack of datasets that clearly separated an image's content from its style.
The algorithm used here allows us to synthesise new images that combine the content of an arbitrary photograph with the style of famous art pieces.
As described in the paper, I make use of the VGG-19 architecture. A visual representation of its layers is shown below:
The only difference in my implementation is the use of Average Pooling layers as opposed to Max Pooling ones (since it yields slightly more appealing results).
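
To make the pooling difference concrete, here is a minimal NumPy sketch (not the code from this repository) comparing 2x2 average pooling with 2x2 max pooling on a tiny feature map. The helper `pool2x2` and the sample values are hypothetical and exist only for illustration; in the real network the pooling layers are provided by the deep learning framework.

```python
import numpy as np

def pool2x2(feature_map, mode="avg"):
    """2x2 pooling with stride 2 over an (H, W) feature map; H and W must be even."""
    h, w = feature_map.shape
    # Group the map into non-overlapping 2x2 windows: shape (H/2, 2, W/2, 2).
    windows = feature_map.reshape(h // 2, 2, w // 2, 2)
    if mode == "avg":
        return windows.mean(axis=(1, 3))  # every activation in the window contributes
    return windows.max(axis=(1, 3))       # only the strongest activation survives

# Hypothetical 4x4 feature map, used only to show the difference.
fmap = np.array([[1.0, 2.0, 0.0, 0.0],
                 [3.0, 4.0, 0.0, 8.0],
                 [1.0, 1.0, 2.0, 2.0],
                 [1.0, 1.0, 2.0, 2.0]])

avg_out = pool2x2(fmap, "avg")
max_out = pool2x2(fmap, "max")
print(avg_out)
print(max_out)
```

Average pooling lets every activation in a window influence the output, whereas max pooling keeps only the largest one. This may be part of why the averaged version tends to produce smoother-looking images.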
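The optimisation was run for a total of 500 epochs (in this method the VGG-19 weights stay fixed and it is the pixels of the generated image that are updated). Generating an image takes about a minute when the code is accelerated by a GPU.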
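
The shape of that 500-step loop can be sketched in plain NumPy. This is a toy under loud assumptions, not the repository's implementation: the "features" are the pixel values themselves rather than VGG-19 activations, the images are random arrays, the optimiser is plain gradient descent, and `alpha`, `beta` and `lr` are illustrative values. Only the structure (a weighted content term plus a Gram-matrix style term, minimised with respect to the generated image) follows the paper.

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, positions) feature matrix."""
    return features @ features.T

def total_loss_and_grad(x, content, style_gram, alpha, beta):
    """Weighted content + style loss and its gradient with respect to x."""
    c, n = x.shape
    content_diff = x - content
    gram_diff = gram(x) - style_gram
    content_loss = 0.5 * np.sum(content_diff ** 2)
    style_loss = np.sum(gram_diff ** 2) / (4.0 * c**2 * n**2)
    # Analytic gradients of the two terms (the Gram difference is symmetric).
    grad = alpha * content_diff + beta * (gram_diff @ x) / (c**2 * n**2)
    return alpha * content_loss + beta * style_loss, grad

# Toy stand-ins: 3 channels, 16 positions each (assumption: random data).
rng = np.random.default_rng(0)
content = rng.random((3, 16))
style_gram = gram(rng.random((3, 16)))

alpha, beta, lr = 1.0, 100.0, 0.05  # illustrative weights and step size
x = content.copy()                  # start from the content image
initial_loss, _ = total_loss_and_grad(x, content, style_gram, alpha, beta)

for step in range(500):
    loss, grad = total_loss_and_grad(x, content, style_gram, alpha, beta)
    x -= lr * grad                  # update the image, not any network weights

final_loss, _ = total_loss_and_grad(x, content, style_gram, alpha, beta)
print(initial_loss, final_loss)
```

In the real pipeline the gradients come from backpropagating through the CNN rather than from closed-form expressions.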
- Starry night
The image placed above is the style image (the one whose stylistic features are to be transferred), and the one below is the content image (on which we wish to perform the style transfer).

After performing style transfer on the image, we have the following result:
- Golden gate
Similar to the above example, we have our initial images like so:

And the image synthesised after performing style transfer on it:
The main reference that I used for implementing the code was the paper “Image Style Transfer Using Convolutional Neural Networks” (L. A. Gatys et al.).
Apart from the paper, I relied on an additional resource for some portions of the mathematical description, particularly Gram matrices.
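
For reference, these are the Gram matrix and the losses built on it, written in the notation of the Gatys et al. paper. Here $F^l$ is the feature matrix of layer $l$ for the generated image, with $N_l$ feature maps of size $M_l$; $A^l$ is the Gram matrix of the style image at that layer; $w_l$ are per-layer weights; and $\alpha$, $\beta$ weight content against style. The specific layers and weights used in this implementation are not stated here, so none are assumed.

$$
G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}
$$

$$
E_l = \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^l_{ij} - A^l_{ij} \right)^2,
\qquad
\mathcal{L}_{\text{style}} = \sum_l w_l E_l
$$

$$
\mathcal{L}_{\text{total}} = \alpha \, \mathcal{L}_{\text{content}} + \beta \, \mathcal{L}_{\text{style}}
$$

Each entry $G^l_{ij}$ is the inner product between two vectorised feature maps. The Gram matrix therefore records which features tend to fire together while discarding where in the image they fire, which is what makes it a representation of style rather than content.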