In this repository I have implemented the original Neural Style Transfer paper, "Image Style Transfer Using Convolutional Neural Networks", and inspected how the result of transferring a style image onto a content image changes with different weight constants, learning rates, optimizers, etc.
Style Transfer is the task of composing the style of one image (the style image) over another image (the content image). Before neural networks were applied to this task, the major limiting factor was obtaining feature representations of the content and style images good enough for composition. The lack of such representations made it hard to understand the semantics of the two images and to separate them. With the success ✔️ of VGG networks on the ImageNet Challenge in Object Localization and Object Detection 🔍, researchers gave style transfer a neural approach.
The authors used feature representations from the VGG network to learn high- and low-level features of both the content and style images. Using this implicit information, they minimized the loss between the content representation and the generated image representation with MSELoss, and the loss between the style representation and the generated image representation with MSELoss on Gram matrices. Unlike supervised learning, Neural Style Transfer has no metric to compare the quality of the generated image(s). We are not training a model but updating the values of the image itself in every iteration using gradient descent so that it closely matches the content and style images.
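As a rough sketch of these two loss terms (the function names here are illustrative, not the repository's actual API), the content loss is an MSE between feature maps of the generated and content images, while the style loss is an MSE between Gram matrices of feature maps of the generated and style images:

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (batch, channels, height, width) activations from a VGG layer
    b, c, h, w = features.size()
    flat = features.view(b, c, h * w)
    # channel-wise correlations capture style; normalize by the map size
    return flat.bmm(flat.transpose(1, 2)) / (c * h * w)

def content_loss(gen_feat, content_feat):
    # plain MSE between canvas and content feature maps
    return F.mse_loss(gen_feat, content_feat)

def style_loss(gen_feat, style_feat):
    # MSE between Gram matrices of canvas and style feature maps
    return F.mse_loss(gram_matrix(gen_feat), gram_matrix(style_feat))
```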
I believe this brief overview of Neural Style Transfer is enough to get us started with experiments and notice some fascinating results.
Note: This is not a blog post on Neural Style Transfer. No explanation of the model type, training, etc. is provided.
For our experiments we will set the parameters to the following values unless explicitly stated otherwise.
iterations: 2500
fps: 30
size: 128
sav_freq: 10
alpha: 5.0
beta: 7000.0
gamma: 1.2
style_weights: [1e3/n**2 for n in [16.0,32.0,128.0,256.0,512.0]]
lr: 0.06
If paths to the content and style images are not provided, the default images inside NeuraltyleTransfer-App/src/data will be used.
For a detailed description of these parameters, run python3 main.py -h
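Reading the parameters above: alpha and beta presumably weight the content and style loss terms, and style_weights scales each style layer's contribution; gamma's exact role isn't spelled out here (it may be an extra regularization weight), so the sketch below omits it. A minimal sketch of how these weights could combine, reusing the loss helpers sketched earlier (main.py may combine them differently):

```python
def total_loss(gen_content_feats, content_feats,
               gen_style_feats, style_feats,
               alpha=5.0, beta=7000.0, style_weights=None):
    # Illustrative weighting of the loss terms; each argument is a list of
    # VGG feature maps taken from the chosen content / style layers.
    style_weights = style_weights or [1.0] * len(style_feats)
    c_loss = sum(content_loss(g, c) for g, c in zip(gen_content_feats, content_feats))
    s_loss = sum(w * style_loss(g, s)
                 for w, g, s in zip(style_weights, gen_style_feats, style_feats))
    return alpha * c_loss + beta * s_loss
```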
Neural Style Transfer is like painting an image over a canvas. This canvas has the same size as the content image, since the content is static and the only dynamic changes to compose over the canvas come from the style image. Although its size matches the content image, there are 3 - 4 ways we can initialize this canvas, and then use gradient descent 📉 to update the values of the canvas.
The following shell command generates a canvas by blending the style over the content image. This is the basic bash command for reconstructing the canvas; for more information about the arguments, go through python3 main.py --help
python3 main.py --reconstruct --content_layers <num> --style_layers 0 1 2 3 4
We can initialize the canvas with noise and then update its values to look similar to the content image with the style composed on it. The snippet below generates a noise canvas and sets requires_grad = True, which enables autograd to update the values of the canvas.
# random-noise canvas with the same shape as the content image
generated_image = torch.randn(content_image.size())
generated_image = generated_image.to(device, torch.float)  # .to() returns a new tensor, so reassign
generated_image.requires_grad = True  # let autograd update the canvas pixels
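With requires_grad enabled on the canvas, a minimal optimization loop looks like the sketch below (compute_total_loss is a hypothetical helper wrapping the VGG forward pass and the weighted loss terms above; the actual loop in main.py may differ):

```python
import torch.optim as optim

optimizer = optim.Adam([generated_image], lr=0.06)

for step in range(iterations):
    optimizer.zero_grad()
    loss = compute_total_loss(generated_image)  # hypothetical helper: VGG features + weighted losses
    loss.backward()
    optimizer.step()  # updates the canvas pixels; no network weights are trained
```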
Let's start with some experiments... 🔬
bash command, e.g.,
python3 main.py --reconstruct --style_layers 0 1 2 3 4 --content_layers 1 --optimizer "Adam"
parameters we are using
optimizer: "Adam"
init_image: "noise"
Content_Layer | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Generated Canvas | | | | | |
On an A4000 GPU it took 33s to run with the current configuration for one canvas generation.
Early layers composed the style over the canvas relatively well compared to higher layers, but the semantics of the content were lost in the terminal layers. Mid-level layers preserved the content while focusing less on style composition.
python3 main.py --reconstruct --style_layers 0 1 2 3 4 --content_layers 0 --iterations 2000
parameters we are using
optimizer: "LBFGS"
init_image: "noise"
Content_Layer | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Generated Canvas | | | | | |
On an A4000 GPU it took 120s to run with the current configuration for one canvas generation.
Again, early layers composed the style over the canvas relatively well compared to higher layers, but moving towards higher layers the canvas loses the content representation, maybe due to over-composition of style. The last layer has again lost semantics to quite some extent.
We can initialize the canvas with the content image itself and then update its values to look similar to the content image with the style composed on it. Using the line of code below we initialize the canvas with the content image.
generated_image = content_image.clone().requires_grad_(True)
Let's start with some experiments... 🔬
bash command, e.g.,
python3 main.py --reconstruct --style_layers 0 1 2 3 4 --content_layers 1 --optimizer "Adam" --init_image "content"
Content_Layers | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Adam | | | | | |
LBFGS | | | | | |
Adam | | | | | |
In the first two rows the only change is the optimizer, and clearly both optimizers produce comparatively similar canvases, except in the last layer. `Adam` needs more iterations to produce a canvas semantically similar to what `LBFGS` produces, but at the same time the former is quite fast to compute, since it is a first-order method and doesn't compute the curvature of the parameter space like the latter. So we used `Adam` once again on a different set of content and style images (last row) to generate the canvas and found that the last layer in all cases loses some content information and the style over-composes on the canvas. The first two layers give comparatively better results all the time.
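For reference, the two optimizers are driven differently in PyTorch: Adam takes one gradient step per call, while LBFGS expects a closure it may re-evaluate several times per step. A minimal sketch, assuming the same canvas tensor and the hypothetical compute_total_loss helper from earlier:

```python
import torch.optim as optim

# Adam: first-order, one gradient evaluation per step
adam = optim.Adam([generated_image], lr=0.06)
adam.zero_grad()
compute_total_loss(generated_image).backward()
adam.step()

# LBFGS: quasi-Newton, needs a closure it can call repeatedly per step
lbfgs = optim.LBFGS([generated_image])

def closure():
    lbfgs.zero_grad()
    loss = compute_total_loss(generated_image)
    loss.backward()
    return loss

lbfgs.step(closure)
```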
We can initialize the canvas with the style image itself and then update its values to look similar to the content image with the style composed on it. Using the line of code below we initialize the canvas with the style image.
generated_image = style_image.clone().requires_grad_(True)
Let's start with some experiments... 🔬
python3 main.py --reconstruct --style_layers 0 1 2 3 4 --content_layers 1 --optimizer "Adam" --init_image "style"
Content_Layers | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Adam | | | | | |
Composing the content representation over a style canvas doesn't seem like a great idea. The last layers over-composed the style with some noise, while `content_layer: 2` smoothed out the background, highlighting the content.
From the above experiments we can infer that with `content_layer: 4` the canvas has lost semantics to some extent, due to either over-composition of style or under-representation of the content. We can confirm this in Visualization by looking at what each layer contributes to the generated canvas. The same can be said for `content_layer: 3`, but with relatively less prominence. With `content_layer: 0` we can see that the style is well composed over the canvas while the content representation is also preserved; the same can be said for `content_layer: 1`, but with less prominence. So for further experiments let's use `content_layer: 0` and `Adam` for fast computation. So far we have seen all the canvases generated by `conv` layers; let's experiment with `relu` now.
Content_Layers | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
conv | | | | | |
relu | | | | | |
Looking at all the canvases from `conv` and `relu`, we can infer that the two do not output very different canvases, and it's safe to use either type of layer for reconstruction.
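For context, conv and relu activations sit next to each other in torchvision's VGG19 feature stack, so picking one or the other is just a matter of which index the features are read from. A sketch assuming torchvision's VGG19 layout (index 21 is conv4_2, 22 is relu4_2); this is not necessarily how this repository indexes its layers:

```python
import torch
from torchvision import models

# pretrained VGG19 feature extractor (use pretrained=True on older torchvision)
vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()

def features_at(x, idx):
    # forward through the feature stack and return the activation at `idx`
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i == idx:
            return x

x = torch.rand(1, 3, 128, 128)     # dummy image in [0, 1]
conv_feat = features_at(x, 21)     # conv4_2: pre-activation features
relu_feat = features_at(x, 22)     # relu4_2: the same features after ReLU
```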
Until now we have reconstructed canvases using all the style layers and any one content layer, but in this section we will visualize the individual and grouped contributions of the style and content layers. There are 3 ways to do so: visualizing only the content layer(s), only the style layer(s), or both.
shell command to visualize is
python3 main.py --visualize "content" --content_layers 1 2 --iterations 1500 --fps 30 --sav_freq 5
When `--visualize "content"` is set, we visualize only the content representation of a single layer or of a group of layers.
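Conceptually, visualizing only the content contribution means optimizing the canvas against the content term of the selected layers alone, with the style term switched off. A rough sketch (extract_features is a hypothetical helper returning feature maps for the given layers; the actual --visualize implementation may differ):

```python
content_layers = [1, 2]   # as passed via --content_layers
target_feats = [f.detach() for f in extract_features(content_image, content_layers)]

for step in range(iterations):
    optimizer.zero_grad()
    gen_feats = extract_features(generated_image, content_layers)
    # content-only objective: no style / Gram-matrix term
    loss = sum(content_loss(g, t) for g, t in zip(gen_feats, target_feats))
    loss.backward()
    optimizer.step()
```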
Content_Layers | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Canvas | | | | | |
The latter layers capture textures of the content image while not giving much weight to color and low-level feature details, although the `content_layer: 4` canvas seems to have under-represented the content, maybe due to an insufficient number of gradients flowing back to update the canvas. Earlier layers captured the shape, and to some extent the texture, really well.
What if we arbitrarily choose some content layers and look at their combined output on the canvas? Let's check.
python3 main.py --visualize "content" --content_layers 1 3 4 --iterations 700 --fps 2 --sav_freq 5
Content_Layers | 1 3 4 | 0 2 4 |
---|---|---|
Canvas | | |
When `--visualize "style"` is set, we visualize only the style representation of a single layer or of a group of layers.
Style_Layers | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Adam | | | | | |
Style layers, when visualized individually, don't seem to contribute any significant style to the canvas; in fact, while moving towards higher layers we see patterns of noise.
What if we arbitrarily choose some style layers and look at their combined output on the canvas? Let's check.
python3 main.py --visualize "style" --style_layers 1 3 4 --iterations 2000 --fps 25 --sav_freq 8 --optimizer "Adam"
Style_Layers | 0 1 4 | 1 2 3 | 0 1 |
---|---|---|---|
Adam | | | |
LBFGS | | | |
canvas output when all the style layers were used |
When we visualize the grouped contribution of layers, we can see some style over the canvas very clearly. `LBFGS` shows style in every canvas, even where `Adam` failed to in `style_layers: 1 2 3`. On looking further into the matter, we found that `Adam` took at least 4000 iterations to learn the representations and output a visually appealing style in comparison to the others. The reason may be that higher layers focus less on color and more on texture, and `Adam` finds it harder to extract the color feature information than `LBFGS`.
Lastly, we can visualize what all the style layers together contribute to the canvas; it looks quite similar to the style image itself.
For fun, we will use all the style and content layers to generate the canvas, although this configuration worked for the image below but not for many others.
The original image of the lion was grey.
You can play with other hyperparameters to generate canvases and enhance your understanding of Neural Style Transfer.