
richzhang / PerceptualSimilarity


LPIPS metric. pip install lpips

Home Page: https://richzhang.github.io/PerceptualSimilarity

License: BSD 2-Clause "Simplified" License

Python 94.68% Shell 3.04% Dockerfile 2.28%
deep-learning deep-neural-networks perceptual perceptual-metric perceptual-losses pytorch perceptual-similarity

perceptualsimilarity's People

Contributors

denadai2, dvschultz, richzhang, supershinyeyes, timothybrooks, yinan2


perceptualsimilarity's Issues

Compare patches with different spatial resolution

Would it also be possible to compare 2 images with different spatial resolution?
e.g.

img_1 = 256 X 270 X 3
img_2 = 180 X 245 X 3

One hacky way of doing it is to resize both images to the same size and compare them.
After all, the network gives feature maps of different sizes if the two images have different resolutions. But I'm just curious in case you have tried something.
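
A minimal sketch of that hacky route, assuming the pip-installed lpips package and inputs already scaled to [-1, 1] (the sizes below are just the ones from this issue):

import torch
import torch.nn.functional as F
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='alex')

# Hypothetical inputs with mismatched resolutions, as (1, 3, H, W) tensors in [-1, 1].
img_1 = torch.rand(1, 3, 256, 270) * 2 - 1
img_2 = torch.rand(1, 3, 180, 245) * 2 - 1

# Resize both to a common resolution before comparing. One possible convention is
# the smaller of the two sizes, so that no detail is invented by upsampling.
common_hw = (min(img_1.shape[2], img_2.shape[2]), min(img_1.shape[3], img_2.shape[3]))
img_1r = F.interpolate(img_1, size=common_hw, mode='bilinear', align_corners=False)
img_2r = F.interpolate(img_2, size=common_hw, mode='bilinear', align_corners=False)

print(loss_fn(img_1r, img_2r).item())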

How to interpret the distance?

Hi, how can we interpret the physical meaning of the similarity distance?
For example, in what range does the distance mean the images are very similar?
And in what range does it mean they are very different?

I understand that 0 means the two pictures are exactly the same. However, what if the value is around 0.5?
Any suggestions?

Thanks.
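
There is no universal threshold, but one hedged way to get a feel for the scale on your own data is to compare a reference against increasingly distorted copies of itself. A rough sketch using the pip lpips package, with random data standing in for a real image:

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')

# Stand-in for a real image: a (1, 3, 64, 64) tensor in [-1, 1].
ref = torch.rand(1, 3, 64, 64) * 2 - 1

# The distance is 0 for identical inputs and generally grows with the amount of
# added noise, which calibrates what "small" and "large" mean for your data.
for sigma in [0.0, 0.05, 0.1, 0.2, 0.5]:
    noisy = (ref + sigma * torch.randn_like(ref)).clamp(-1, 1)
    print(f"sigma={sigma:.2f}  lpips={loss_fn(ref, noisy).item():.3f}")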

upsample function leads to tensor size mismatch for certain input image sizes when spatial=True

Currently, the upsample function is as follows:

def upsample(in_tens, out_HW=(64,64)): # assumes scale factor is same for H and W
    in_H, in_W = in_tens.shape[2], in_tens.shape[3]
    scale_factor_H, scale_factor_W = 1.*out_HW[0]/in_H, 1.*out_HW[1]/in_W

    return nn.Upsample(scale_factor=(scale_factor_H, scale_factor_W), mode='bilinear', align_corners=False)(in_tens)

This ends up failing in the case where the input images being compared are of resolution 800x600. When this is the case, one of the layers passed in as in_tens has shape (1, 1, 149, 199). As a result, in_H * scale_factor_H = 600.0000000000001 and in_W * scale_factor_W = 799.9999999999999. The result of the Upsample is an output tensor of size (1,1,600,799), which leads to an exception when it is added to other tensors of size (1,1,600,800).

Instead of computing the scale_factor, a more robust solution is to just set the size parameter directly:

    return nn.Upsample(size=out_HW, mode='bilinear', align_corners=False)(in_tens)

This might also be the cause of this specific comment: #45 (comment)

Can't import models

Hello!

The following doesn't work:
import PerceptualSimilarity.models as psm
I have cloned the repository into my working directory and attempt to use the similarity function for my pictures.
The import fails with the following error:
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input> in <module>
----> 1 import PerceptualSimilarity.models as psm

~/HDD/works/Skoltech/CapsuleAD/src/PerceptualSimilarity/models/__init__.py in <module>
      9 from torch.autograd import Variable
     10
---> 11 from models import dist_model
     12

ModuleNotFoundError: No module named 'models'

What am I doing wrong?
Could you please look into it?
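
The traceback suggests the package's models/__init__.py uses an absolute import (from models import dist_model), which only resolves when the clone's root directory itself is on sys.path. A hedged workaround sketch — the path is a placeholder, and PerceptualLoss is the wrapper this repo's README describes:

import sys

# Hypothetical path: wherever the PerceptualSimilarity repo was cloned.
sys.path.insert(0, '/path/to/PerceptualSimilarity')

import models as psm  # now `from models import dist_model` inside the package resolves

model = psm.PerceptualLoss(model='net-lin', net='alex', use_gpu=False)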

dimension is not compatible with "net" model.

When using --model net to use an off-the-shelf network, train.py breaks here:

return self.model.forward(torch.cat((d0,d1,d0-d1,d0/(d1+eps),d1/(d0+eps)),dim=1))

because the dimensionalities don't match. With net-lin models the output has shape [50, 1, 1, 1], while this wrapping into arrays doesn't happen with net models.
Is there a reason for this wrapping under the net-lin model?

Contrary conclusions for the LPIPS metric

python test_network.py
Model [SSIM] initialized
Distances: (0.262, 0.344)
python test_network.py
Loading model from: /data/sunzhaomang/AdvFeat/PerceptualSimilarity/weights/v0.1/alex.pth
Model [net-lin [alex]] initialized
Distances: (0.034, 0.037)
python test_network.py
Loading model from: /data/sunzhaomang/AdvFeat/PerceptualSimilarity/weights/v0.1/alex.pth
Model [net-lin [alex]] initialized
Distances: (0.041, 0.047)
With both the SSIM and LPIPS metrics, the distance between ex_ref and ex_p0 is smaller than that between ex_ref and ex_p1, i.e. ex_p0.png is more similar to ex_ref.png than ex_p1.png, which seems contrary to the claim made in the paper. How can this be explained?

how to use lpips loss in tensorflow?

Thank you for your significant contribution!

It seems that the LPIPS loss function cannot be used directly in TensorFlow to train a neural network. What should I do if I want to use it there?

data normalization - possible bug

I noticed that the data normalization is

transforms.Normalize((0.5, 0.5, 0.5),(0.5, 0.5, 0.5))

(for example in twoafc_dataset.py).
The imagenet normalization coefficients are
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
This raises two questions:

  1. Did you possibly confuse mean with variance? (Normalize takes std as its second argument, not variance.)
  2. Is there a reason behind the design choice not to use the ImageNet normalization?
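
For what it's worth, the two conventions are closer than they look: the dataset transform maps [0, 1] images to [-1, 1], and the network then applies a per-channel shift/scale that is the ImageNet normalization re-expressed for [-1, 1] inputs. A small arithmetic sketch of that correspondence (the resulting values appear to match the constants in the repo's scaling layer, though that is worth double-checking):

# If x01 is an image in [0, 1] and x = 2*x01 - 1 is the same image in [-1, 1], then
#   (x01 - mean) / std  ==  (x - shift) / scale
# with shift = 2*mean - 1 and scale = 2*std.
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

shift = [2 * m - 1 for m in mean]  # ~ [-0.030, -0.088, -0.188]
scale = [2 * s for s in std]       # ~ [ 0.458,  0.448,  0.450]
print(shift, scale)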

small input size

I am trying to independently replicate the LPIPS metric in Keras, initially focusing on uncalibrated VGG. Following the README, I got test_network.py working, but I am a little confused by the three example images ex_ref.png, ex_p0.png, and ex_p1.png and how they are processed.

Each of these images is 64x64, and in test_network.py they are passed to the VGG network without scaling. But the native input size of VGG is 224x224, and the PyTorch models documentation clearly states that inputs are expected to be that size (or larger):

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.

Notably, when provided with 224x224 inputs, the layer sizes are:

  • (64, 224, 224)
  • (128, 112, 112)
  • (256, 56, 56)
  • (512, 28, 28)
  • (512, 14, 14)

However, when they are left at 64x64 without scaling, the layer sizes are smaller at each stage:

  • (64, 64, 64)
  • (128, 32, 32)
  • (256, 16, 16)
  • (512, 8, 8)
  • (512, 4, 4)

I'm not familiar with PyTorch internals, so it's not clear to me how to interpret this behaviour when porting to Keras. My questions are:

  • Are these smaller inputs in fact valid ways of using these pre-trained VGG weights?
  • Could the LPIPS metric alternatively be implemented by always scaling inputs to the expected WxH sizes?
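
On the first question: the 224x224 requirement comes from VGG's fully connected classifier, which LPIPS never uses; the convolutional trunk is size-agnostic as long as the input survives the five poolings. A quick sketch that reproduces the 64x64 layer sizes listed above with torchvision (the layer indices assume the standard torchvision vgg16 ordering):

import torch
from torchvision import models

vgg_features = models.vgg16().features.eval()  # conv trunk only; weights irrelevant for shapes

# Indices of the ReLU layers LPIPS taps: relu1_2, relu2_2, relu3_3, relu4_3, relu5_3.
taps = {3: 'relu1_2', 8: 'relu2_2', 15: 'relu3_3', 22: 'relu4_3', 29: 'relu5_3'}

x = torch.rand(1, 3, 64, 64)
with torch.no_grad():
    for i, layer in enumerate(vgg_features):
        x = layer(x)
        if i in taps:
            print(taps[i], tuple(x.shape[1:]))  # (64, 64, 64), (128, 32, 32), ...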

Why compute the feature-map L2 distance channel-wise?

Hi, LPIPS helps me a lot in my image translation task! There's one question I could not figure out by myself. For the feature-map distance, why does the paper compute the L2 distance channel-wise and then average spatially? Could we instead compute the L2 distance spatially (flatten the feature map for one channel and compute the L2 distance) and then average over channels?

Thanks!

suggested fix in weights loading code

I find that when I import PerceptualSimilarity as a package, the weights-loading line in dist_model.py fails, since '.' points to the current working directory of my calling code rather than the root directory of PerceptualSimilarity.

Here's a patch that fixes the issue:
fix_ps.patch.txt
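
The usual shape of such a fix (a hypothetical sketch, not the attached patch itself) is to resolve the weights path relative to the module file instead of the caller's working directory:

import os
import torch

def load_lpips_weights(net_module, net_name='alex', version='0.1'):
    # Hypothetical helper: build the path from this file's directory, so it works
    # no matter where the calling code is run from.
    pkg_dir = os.path.dirname(os.path.abspath(__file__))
    weights_path = os.path.join(pkg_dir, 'weights', 'v%s' % version, '%s.pth' % net_name)
    net_module.load_state_dict(torch.load(weights_path, map_location='cpu'))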

What about 'scratch' and 'tune' models?

Hey, I hope you are well.
In the model options for your loss, we can either go for 'net' (a vanilla pre-trained CNN) or 'net-lin' (which I assume is the one with the learned linear layer). I am interested in the 'tune' and 'scratch' variants of the CNNs for research purposes. Are they available, and how can I obtain them? Thank you for your time.

Using WGAN to Calculate PerceptualSimilarity

Thanks for publishing the code; I'd appreciate it if you could help me understand this.

I trained a WGAN on my own data. Now I am planning to use the generator network's features (weights) to calculate a perceptual similarity score, but I am not quite sure how to do this.

If I understood correctly, with either VGG or ResNet we pass the query images (image1, image2) through the network, and for each input image we collect the features and calculate the score from them.

But I am not sure how to do that for a WGAN, since the input to the generator network is noise and the output is the synthetically generated image. How do I pass query images to get those features?

'lpips' has no attribute 'PerceptualLoss'

When I install with sudo pip3 install lpips and then run

import lpips
percept = lpips.PerceptualLoss(model='net-lin', net='vgg', use_gpu=True)

I get the error: module 'lpips' has no attribute 'PerceptualLoss'.

Difference Paper - Implementation

Dear authors,

equation (1) in the paper states that you are taking the euclidean norm squared of the weighted differences.
Something like euclidean_norm(dot(w_l, (y - y_0)))²
However, in the implementation you are weighting the squared difference of the euclidean norms, something like dot(w_l, (euclidean_norm(y) - euclidean_norm(y_0))²), which as far as I am concerned is not the same thing. Or am I missing something here?

Thanks!

PerceptualLoss uses memory on GPU 0 when specified otherwise

When running PerceptualLoss on a machine with multiple GPUs, it always uses some memory of GPU 0, even when gpu_ids specifies another one.

Minimal example to reproduce:

import models
from time import sleep

model = models.PerceptualLoss(model='net-lin', net='vgg', gpu_ids=[2])
sleep(20)

[Screenshot, 2020-05-08 17:10:15]
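
A common cause of this pattern is a CUDA context being created on device 0 as a side effect (for example by a checkpoint load or a bare .cuda() call). One hedged way to rule that out is to hide every GPU except the target one before torch is imported:

import os

# Make only physical GPU 2 visible; no CUDA context can then touch GPU 0.
os.environ['CUDA_VISIBLE_DEVICES'] = '2'

import models  # PerceptualSimilarity

# Inside this restricted process, the remaining GPU is addressed as index 0.
model = models.PerceptualLoss(model='net-lin', net='vgg', gpu_ids=[0])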

unable to load weights

I get the following error when running test_network.py:

Traceback (most recent call last):
  File "test_network.py", line 11, in <module>
    model.initialize(model='net-lin',net='alex',use_gpu=False)
  File "/Users/faro/repositories/PerceptualSimilarity/models/dist_model.py", line 38, in initialize
    self.net.load_state_dict(torch.load('./weights/%s.pth'%net, map_location=lambda storage, loc: 'cpu'))
  File "/Users/faro/repositories/PerceptualSimilarity/.env/lib/python3.6/site-packages/torch/serialization.py", line 261, in load
    return _load(f, map_location, pickle_module)
  File "/Users/faro/repositories/PerceptualSimilarity/.env/lib/python3.6/site-packages/torch/serialization.py", line 409, in _load
    result = unpickler.load()
  File "/Users/faro/repositories/PerceptualSimilarity/.env/lib/python3.6/site-packages/torch/_utils.py", line 74, in _rebuild_tensor
    module = importlib.import_module(storage.__module__)
AttributeError: 'str' object has no attribute '__module__'

To get there I needed to change a few files and fix some import bugs.
One thing that would really help to reproduce the results is if you could specify the requirements (especially the PyTorch version). Maybe consider adding a requirements file like the one in my fork: faroit@4ccefee#diff-b4ef698db8ca845e5845c4618278f29a

Installation on Ubuntu 19.04

Thank you for such a nice setup! I managed to run the code with a tiny modification on Ubuntu 19.04. I commented out these two lines in your requirements.txt:

#numpy>=1.14.3
#opencv>=2.4.11

Then I installed numpy and opencv from Ubuntu's own repositories:

sudo apt-get install python-numpy python-opencv

Lastly, I verified it by comparing two sample images, as below:

XXXXX:PerceptualSimilarity$ python compute_dists.py -p0 imgs/ex_ref.png -p1 imgs/ex_p0.png --use_gpu
Setting up Perceptual loss...
Loading model from: /home/XXXXX/PerceptualSimilarity/models/weights/v0.1/alex.pth
...[net-lin [alex]] initialized
...Done
Distance: 0.722

Geometric distortion

Is it possible to use this to measure geometric distortion? As in the quality of a retargeting compared to a full reference?

Does lpips work on grayscale images?

Is it safe to use lpips for grayscale images? The code does not work for 1-channel images. The obvious hack would be to use 3 identical channels, yet I am not sure what the effect would be within a solution that was calibrated end-to-end on color images.
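
A minimal sketch of the 3-identical-channels hack, assuming the pip lpips package and grayscale tensors already in [-1, 1]. Whether the learned calibration transfers well to such pseudo-RGB inputs remains the open question raised above.

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')

# Hypothetical grayscale inputs: (1, 1, H, W) tensors in [-1, 1].
gray0 = torch.rand(1, 1, 64, 64) * 2 - 1
gray1 = torch.rand(1, 1, 64, 64) * 2 - 1

# Replicate the single channel into the 3 RGB channels the backbone expects.
d = loss_fn(gray0.repeat(1, 3, 1, 1), gray1.repeat(1, 3, 1, 1))
print(d.item())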

Image width and height are not equal

It seems that the code can't calculate the metric on images with unequal width and height. Can it be extended to handle images of various sizes?

ModuleNotFoundError caused from inside lpips

Any idea why importing lpips is causing this error? It seems to be something from the IPython import causing issues. The ipython (and prompt_toolkit) versions are both the latest release. It works fine within an IPython notebook, but importing lpips inside my training job causes this crash.

Traceback (most recent call last):
  File "run_train.py", line 8, in <module>
    import models
  File "/home/timbrooks/code/prototypes/models/__init__.py", line 2, in <module>
    from .model import *
  File "/home/timbrooks/code/prototypes/models/model.py", line 18, in <module>
    import lpips
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/lpips/__init__.py", line 11, in <module>
    from lpips.trainer import *
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/lpips/trainer.py", line 11, in <module>
    from IPython import embed
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/IPython/__init__.py", line 56, in <module>
    from .terminal.embed import embed
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/IPython/terminal/embed.py", line 16, in <module>
    from IPython.terminal.interactiveshell import TerminalInteractiveShell
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/IPython/terminal/interactiveshell.py", line 21, in <module>
    from prompt_toolkit.formatted_text import PygmentsTokens
ModuleNotFoundError: No module named 'prompt_toolkit.formatted_text'

PNetlin.forward image normalization

    def forward(self, in0, in1):
        in0_sc = (in0 - self.shift.expand_as(in0)) / self.scale.expand_as(in0)
        in1_sc = (in1 - self.shift.expand_as(in0)) / self.scale.expand_as(in0)

        if (self.pnet_tune):
            outs0 = self.net.forward(in0)
            outs1 = self.net.forward(in1)
        else:
            outs0 = self.net[0].forward(in0)
            outs1 = self.net[0].forward(in1)

Why don't you feed in0_sc and in1_sc to the net? Is it a bug or a feature?

Trained on my own dataset but the loss does not drop!

I want to use this model to measure the similarity of two people's handwritten signatures, so I made a dataset just like the 2AFC one.

Training on net-lin + alex:

(ep: 9, it: 20000, t: 0.003[s], ept: 0.16/0.55[h]) loss_total: 0.564, acc_r: 0.680
(ep: 9, it: 25000, t: 0.003[s], ept: 0.20/0.56[h]) loss_total: 0.556, acc_r: 0.700
(ep: 9, it: 30000, t: 0.003[s], ept: 0.25/0.58[h]) loss_total: 0.525, acc_r: 0.780
(ep: 9, it: 35000, t: 0.003[s], ept: 0.30/0.60[h]) loss_total: 0.570, acc_r: 0.720
(ep: 9, it: 40000, t: 0.003[s], ept: 0.35/0.62[h]) loss_total: 0.511, acc_r: 0.800
(ep: 9, it: 45000, t: 0.003[s], ept: 0.42/0.65[h]) loss_total: 0.674, acc_r: 0.660
(ep: 9, it: 50000, t: 0.003[s], ept: 0.48/0.67[h]) loss_total: 0.545, acc_r: 0.700
(ep: 9, it: 55000, t: 0.003[s], ept: 0.54/0.69[h]) loss_total: 0.548, acc_r: 0.720
(ep: 9, it: 60000, t: 0.003[s], ept: 0.62/0.72[h]) loss_total: 0.626, acc_r: 0.660
(ep: 9, it: 65000, t: 0.003[s], ept: 0.69/0.75[h]) loss_total: 0.606, acc_r: 0.640
(ep: 9, it: 70000, t: 0.003[s], ept: 0.77/0.77[h]) loss_total: 0.516, acc_r: 0.720
(ep: 10, it: 5000, t: 0.003[s], ept: 0.03/0.38[h]) loss_total: 0.447, acc_r: 0.800
(ep: 10, it: 10000, t: 0.003[s], ept: 0.05/0.38[h]) loss_total: 0.500, acc_r: 0.800
(ep: 10, it: 15000, t: 0.003[s], ept: 0.10/0.48[h]) loss_total: 0.484, acc_r: 0.840
(ep: 10, it: 20000, t: 0.003[s], ept: 0.15/0.52[h]) loss_total: 0.523, acc_r: 0.760
(ep: 10, it: 25000, t: 0.003[s], ept: 0.20/0.56[h]) loss_total: 0.579, acc_r: 0.700
(ep: 10, it: 30000, t: 0.003[s], ept: 0.25/0.59[h]) loss_total: 0.609, acc_r: 0.620
(ep: 10, it: 35000, t: 0.003[s], ept: 0.31/0.63[h]) loss_total: 0.544, acc_r: 0.760
(ep: 10, it: 40000, t: 0.003[s], ept: 0.38/0.67[h]) loss_total: 0.613, acc_r: 0.660
(ep: 10, it: 45000, t: 0.003[s], ept: 0.45/0.70[h]) loss_total: 0.569, acc_r: 0.700
(ep: 10, it: 50000, t: 0.003[s], ept: 0.52/0.74[h]) loss_total: 0.567, acc_r: 0.660
(ep: 10, it: 55000, t: 0.003[s], ept: 0.59/0.75[h]) loss_total: 0.651, acc_r: 0.600
(ep: 10, it: 60000, t: 0.003[s], ept: 0.66/0.77[h]) loss_total: 0.492, acc_r: 0.780
(ep: 10, it: 65000, t: 0.003[s], ept: 0.73/0.79[h]) loss_total: 0.547, acc_r: 0.720
(ep: 10, it: 70000, t: 0.003[s], ept: 0.81/0.81[h]) loss_total: 0.608, acc_r: 0.660

rsub() received an invalid combination of arguments

When I call lpips_vgg = loss_fn_vgg(a, b) I encounter this error, but I get a correct result in other code.


Traceback (most recent call last):
  File "test.py", line 125, in <module>
    lpips_vgg_y = loss_fn_vgg(cropped_sr_img_y * 255, cropped_gt_img_y * 255)
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/lpips/lpips.py", line 87, in forward
    in0_input, in1_input = (self.scaling_layer(in0), self.scaling_layer(in1)) if self.version=='0.1' else (in0, in1)
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/lpips/lpips.py", line 122, in forward
    return (inp - self.shift) / self.scale
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 396, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
TypeError: rsub() received an invalid combination of arguments - got (Tensor, numpy.ndarray), but expected one of:
 * (Tensor input, Tensor other, *, Number alpha)
 * (Tensor input, Number other, Number alpha)

So I checked the format and type of the input arguments a and b, and got:

type and shape of sr:torch.FloatTensor torch.Size([3, 472, 312])
type and shape of gt:torch.FloatTensor torch.Size([3, 472, 312])

And the format of the variables in the previous code, which returned the right result, is:

hr_img shape:torch.Size([3, 480, 320]), type:torch.FloatTensor
sr_img  shape:torch.Size([3, 480, 320]), type:torch.FloatTensor
lpips:tensor([[[[0.2569]]]])

So I don't know which part I should correct.
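
Two things stand out in the snippet: the inputs are 3-D (no batch dimension) and are multiplied by 255, whereas lpips by default expects (N, 3, H, W) float tensors in [-1, 1] (recent pip versions also appear to accept normalize=True in the forward call for [0, 1] inputs). A hedged input-preparation sketch with a hypothetical helper:

import torch
import lpips

loss_fn_vgg = lpips.LPIPS(net='vgg')

def to_lpips_input(img):
    # Hypothetical helper: accept an HxWxC numpy array or a CxHxW tensor in [0, 1]
    # and return a (1, 3, H, W) float tensor scaled to [-1, 1].
    if not torch.is_tensor(img):
        img = torch.from_numpy(img).permute(2, 0, 1).float()
    if img.dim() == 3:
        img = img.unsqueeze(0)  # add the batch dimension
    return img * 2 - 1          # [0, 1] -> [-1, 1]; do not multiply by 255

# d = loss_fn_vgg(to_lpips_input(cropped_sr_img_y), to_lpips_input(cropped_gt_img_y))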

Relation between scaling weights of paper and implementation

Hello @richzhang,

In the LPIPS paper, the 1x1 scaling convolution of the difference of the activations is performed before the squaring.

[Equation (1) from the paper: $d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \lVert w_l \odot (\hat{y}^l_{hw} - \hat{y}^l_{0\,hw}) \rVert_2^2$]

But in the implementation, the difference of the activations is first squared and then scaled:

diffs[kk] = (feats0[kk]-feats1[kk])**2
...
self.lin[kk](diffs[kk])

Is this a mistake? If yes, is it in the paper or in the implementation?
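
One way to reconcile the two, offered as an observation rather than an authoritative answer: if the per-channel weights are constrained to be non-negative (the training code appears to clamp the 1x1 convolution weights that way), then squaring first and scaling afterwards spans exactly the same family of distances, just with reparameterized weights. Writing $d = \hat{y}^l_{hw} - \hat{y}^l_{0\,hw}$, we have $\lVert w_l \odot d \rVert_2^2 = \sum_c w_{l,c}^2 \, d_c^2 = \sum_c w'_{l,c} \, d_c^2$ with $w'_{l,c} = w_{l,c}^2 \ge 0$.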

Using LPIPS metric for image retrieval

I understand that the model takes as input two images, by design. I would like to know if there is a smart way to use LPIPS metric for image retrieval, other than computing all the pairwise distances.

For information, my dataset of game banners contains about 30k images. In my previous experiments, I extracted image features once, and could then work with this processed data using standard tools for efficient similarity search based on cosine similarity, Minkowski distance, etc.

Thank you for your attention.
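
There may not be a shortcut around pairwise comparisons, since LPIPS is defined on image pairs, but the pairwise pass can at least be batched. A rough retrieval sketch, assuming all banners have been resized to a common resolution and scaled to [-1, 1] (the sizes and counts below are placeholders):

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex').eval()

# Hypothetical data: one query and a small gallery, already preprocessed.
query = torch.rand(1, 3, 64, 64) * 2 - 1
gallery = torch.rand(1000, 3, 64, 64) * 2 - 1  # in practice, stream from disk

dists = []
with torch.no_grad():
    for chunk in gallery.split(256):               # batch the pairwise calls
        q = query.expand(chunk.shape[0], -1, -1, -1)
        dists.append(loss_fn(q, chunk).flatten())
dists = torch.cat(dists)

print(dists.topk(10, largest=False).indices)       # 10 nearest neighbours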

Tensor Size Mismatch when running inference for own images

First of all - great paper!

I'm trying to run the single-image similarity script and am running into this error (for my own input images of size (224, 224, 3)):

RuntimeError: The size of tensor a (255) must match the size of tensor b (55) at non-singleton dimension 3

Any ideas why this could be happening?

Thanks,
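
Hard to say without the exact call, but one common cause of shape mismatches like this is the tensor layout: LPIPS expects (N, 3, H, W) float tensors in [-1, 1], not HxWxC arrays in [0, 255]. A hedged sketch using the helpers bundled with the pip package (the file names are placeholders):

import lpips

loss_fn = lpips.LPIPS(net='alex')

# load_image reads an RGB image; im2tensor converts it to (1, 3, H, W) in [-1, 1].
img0 = lpips.im2tensor(lpips.load_image('img0.png'))
img1 = lpips.im2tensor(lpips.load_image('img1.png'))

print(loss_fn(img0, img1).item())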

Bug when running the code

Hi,
When I run the code, I hit a bug when a function in networks_basic.py is called, namely in_tens.mean([2,3], keepdim=keepdim). I don't know how to tackle it, so I am taking the liberty of asking for help. After searching the Internet, I guess the reason may be that, in my PyTorch version, the first argument dim must be an integer rather than a list like [2,3]. The detailed error report is below.
Thanks

"Traceback (most recent call last):
File "compute_dists_pair.py", line 34, in
dist01 = model.forward(img0,img1).item()
File "PerceptualSimilarity-master/models/init.py", line 40, in forward
return self.model.forward(target, pred)
File "PerceptualSimilarity-master/models/dist_model.py", line 116, in forward
return self.net.forward(in0, in1, retPerLayer=retPerLayer)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "PerceptualSimilarity-master/models/networks_basic.py", line 79, in forward
res = [spatial_average(self.lins[kk].model(diffs[kk]), keepdim=True) for kk in range(self.L)]
File "PerceptualSimilarity-master/models/networks_basic.py", line 79, in
res = [spatial_average(self.lins[kk].model(diffs[kk]), keepdim=True) for kk in range(self.L)]
File "PerceptualSimilarity-master/models/networks_basic.py", line 18, in spatial_average
return in_tens.mean([2,3],keepdim=keepdim)
TypeError: mean() received an invalid combination of arguments - got (list, keepdim=bool), but expected one of:

  • ()
  • (torch.dtype dtype)
  • (int dim, torch.dtype dtype)
    didn't match because some of the keywords were incorrect: keepdim
  • (int dim, bool keepdim, torch.dtype dtype)
  • (int dim, bool keepdim)
    didn't match because some of the arguments have invalid types: (list, keepdim=bool)

"

Why does the distance result change when I run it several times?

As in the title: I wonder, is there a random process in the pipeline? Why does the distance change each time?
And does the forward pass take more time on GPU than CPU? I tested on a V100 GPU and got 402 ms for the first pair, versus about 50 ms on CPU. Looks strange! Any help would be appreciated! Thanks.


How to remove the notification when using lpips

I've tried to use lpips in my super-resolution project and it keeps printing:
"Loading model from: C:\Workspace\envs\workplace\lib\site-packages\lpips\weights\v0.1\alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]"

Is there any way to turn it off?
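
Recent versions of the pip package appear to expose a verbose flag on the constructor; if your installed version predates it, silencing stdout during construction is a crude fallback. A short sketch of both:

import contextlib
import io

import lpips

# Preferred: ask the constructor to stay quiet (flag available in recent releases).
loss_fn = lpips.LPIPS(net='alex', verbose=False)

# Fallback for older releases: swallow whatever gets printed during setup.
with contextlib.redirect_stdout(io.StringIO()):
    loss_fn = lpips.LPIPS(net='alex')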

RuntimeError: Function 'CudnnConvolutionBackward' returned nan values in its 1th output.

Hi there, I'm currently training an artifact-removal / super-resolution model (a multi-layer ESPCN), but I'm hitting this issue after a few iterations of training.

This is how I instantiate the loss:

lpips = lpips.LPIPS(net='vgg')

This is the code for the model:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ESPCNResBlock(nn.Module):
    def __init__(self, nf=64):
        super(ESPCNResBlock, self).__init__()
        self.conv1 = nn.Conv2d(nf, nf, kernel_size=3, padding=3 // 2)
        self.conv2 = nn.Conv2d(nf, nf, kernel_size=3, padding=3 // 2)

    def forward(self, input):
        x = self.conv1(input)
        x = F.hardtanh(x, min_val=-1, max_val=1.0)
        x = self.conv2(x)
        x = F.hardtanh(x, min_val=-1, max_val=1.0)
        return x + input

class ESPCN(nn.Module):
    def __init__(self, scale_factor=2, n_blocks=4, nf=64, in_channels=3, out_channels=3):
        super(ESPCN, self).__init__()
        self.scale_factor = scale_factor
        layers = [nn.Conv2d(in_channels, nf, kernel_size=5, padding=5 // 2),
                  nn.Hardtanh()]
        for _ in range(n_blocks//2):
            layers += [ESPCNResBlock(),
                       ]

        layers += [
            nn.Conv2d(nf, 32, kernel_size=3, padding=3 // 2),
            nn.Hardtanh(),
        ]
        self.first_part = nn.Sequential(*layers)
        self.last_part = nn.Sequential(
            nn.Conv2d(32, out_channels * (scale_factor ** 2), kernel_size=3, padding=3 // 2),
            nn.PixelShuffle(scale_factor) if scale_factor > 1 else nn.Identity(),
            nn.Tanh()
        )

        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m.in_channels == 32:
                    nn.init.normal_(m.weight.data, mean=0.0, std=0.001)
                    nn.init.zeros_(m.bias.data)
                else:
                    nn.init.normal_(m.weight.data, mean=0.0,
                                    std=math.sqrt(2 / (m.out_channels * m.weight.data[0][0].numel())))
                    nn.init.zeros_(m.bias.data)

    def forward(self, input):
        x = self.first_part(input)
        x = self.last_part(x)

        x = x + F.interpolate(input,
                              scale_factor=self.scale_factor,
                              mode='bilinear')

        x = torch.clamp(x, min=-1, max=1)
        return x

I've localized the error to the normalize function, but I'm still looking for a fix.
The model is trained with Adam on batches of 64x64 images.
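
NaNs in the VGG backward pass often trace back to the feature-normalization step dividing by a near-zero norm, or to the network output drifting outside the [-1, 1] range LPIPS expects, though neither is certain here. A hedged debugging sketch, with random tensors standing in for the ESPCN output and the ground truth:

import torch
import lpips

# Anomaly detection reports which op first produces NaN gradients.
torch.autograd.set_detect_anomaly(True)

lpips_fn = lpips.LPIPS(net='vgg')

sr = torch.rand(4, 3, 64, 64) * 2 - 1   # stand-in for the model output
sr.requires_grad_(True)
hr = torch.rand(4, 3, 64, 64) * 2 - 1   # stand-in for the ground truth

# Clamp both inputs into the expected range before the perceptual loss.
loss = lpips_fn(torch.clamp(sr, -1, 1), torch.clamp(hr, -1, 1)).mean()
loss.backward()
print(sr.grad.abs().max())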
