
richzhang / PerceptualSimilarity


LPIPS metric. pip install lpips

Home Page: https://richzhang.github.io/PerceptualSimilarity

License: BSD 2-Clause "Simplified" License

Python 94.68% Shell 3.04% Dockerfile 2.28%
deep-learning deep-neural-networks perceptual perceptual-metric perceptual-losses pytorch perceptual-similarity

perceptualsimilarity's People

Contributors

denadai2, dvschultz, richzhang, supershinyeyes, timothybrooks, yinan2


perceptualsimilarity's Issues

Compare patches with different spatial resolution

Would it also be possible to compare 2 images with different spatial resolution?
e.g.

img_1 = 256 X 270 X 3
img_2 = 180 X 245 X 3

One hacky way of doing it is to resize both images to the same size and compare them.
After all, the network gives feature maps of different sizes if the two images have different resolutions. But I'm just curious in case you have tried something.
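
A minimal sketch of that hacky route, assuming the pip-installed lpips package and inputs already scaled to [-1, 1] (the sizes below are just the ones from this issue):

import torch
import torch.nn.functional as F
import lpips  # pip install lpips

loss_fn = lpips.LPIPS(net='alex')

# Hypothetical inputs with mismatched resolutions, as (1, 3, H, W) tensors in [-1, 1].
img_1 = torch.rand(1, 3, 256, 270) * 2 - 1
img_2 = torch.rand(1, 3, 180, 245) * 2 - 1

# Resize both to a common resolution before comparing. One possible convention is
# the smaller of the two sizes, so that no detail is invented by upsampling.
common_hw = (min(img_1.shape[2], img_2.shape[2]), min(img_1.shape[3], img_2.shape[3]))
img_1r = F.interpolate(img_1, size=common_hw, mode='bilinear', align_corners=False)
img_2r = F.interpolate(img_2, size=common_hw, mode='bilinear', align_corners=False)

print(loss_fn(img_1r, img_2r).item())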

How to interpret the distance?

Hi, how can we interpret the physical meaning of the similarity distance?
For example, in what range does the distance mean the images are very similar?
And in what range does it mean they are very different?

I understand that 0 means the two pictures are exactly the same. However, what if the value is around 0.5?
Any suggestions?

Thanks.
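
There is no universal threshold, but one hedged way to get a feel for the scale on your own data is to compare a reference against increasingly distorted copies of itself. A rough sketch using the pip lpips package, with random data standing in for a real image:

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')

# Stand-in for a real image: a (1, 3, 64, 64) tensor in [-1, 1].
ref = torch.rand(1, 3, 64, 64) * 2 - 1

# The distance is 0 for identical inputs and generally grows with the amount of
# added noise, which calibrates what "small" and "large" mean for your data.
for sigma in [0.0, 0.05, 0.1, 0.2, 0.5]:
    noisy = (ref + sigma * torch.randn_like(ref)).clamp(-1, 1)
    print(f"sigma={sigma:.2f}  lpips={loss_fn(ref, noisy).item():.3f}")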

upsample function leads to tensor size mismatch for certain input image sizes when spatial=True

Currently, the upsample function is as follows:

def upsample(in_tens, out_HW=(64,64)): # assumes scale factor is same for H and W
    in_H, in_W = in_tens.shape[2], in_tens.shape[3]
    scale_factor_H, scale_factor_W = 1.*out_HW[0]/in_H, 1.*out_HW[1]/in_W

    return nn.Upsample(scale_factor=(scale_factor_H, scale_factor_W), mode='bilinear', align_corners=False)(in_tens)

This ends up failing in the case where the input images being compared are of resolution 800x600. When this is the case, one of the layers passed in as in_tens has shape (1, 1, 149, 199). As a result, in_H * scale_factor_H = 600.0000000000001 and in_W * scale_factor_W = 799.9999999999999. The result of the Upsample is an output tensor of size (1,1,600,799), which leads to an exception when it is added to other tensors of size (1,1,600,800).

Instead of computing the scale_factor, a more robust solution is to just set the size parameter directly:

    return nn.Upsample(size=out_HW, mode='bilinear', align_corners=False)(in_tens)

This might also be the cause of this specific comment: #45 (comment)

Can't import models

Hello!

The following doesn't work:
import PerceptualSimilarity.models as psm
I have cloned the repository into my working directory and attempt to use the similarity function for my pictures.
The import fails with the following error:
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input> in <module>
----> 1 import PerceptualSimilarity.models as psm

~/HDD/works/Skoltech/CapsuleAD/src/PerceptualSimilarity/models/__init__.py in <module>
      9 from torch.autograd import Variable
     10
---> 11 from models import dist_model
     12

ModuleNotFoundError: No module named 'models'

What am I doing wrong?
Could you please look into it?
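
The traceback suggests the package's models/__init__.py uses an absolute import (from models import dist_model), which only resolves when the clone's root directory itself is on sys.path. A hedged workaround sketch — the path is a placeholder, and PerceptualLoss is the wrapper this repo's README describes:

import sys

# Hypothetical path: wherever the PerceptualSimilarity repo was cloned.
sys.path.insert(0, '/path/to/PerceptualSimilarity')

import models as psm  # now `from models import dist_model` inside the package resolves

model = psm.PerceptualLoss(model='net-lin', net='alex', use_gpu=False)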

dimension is not compatible with "net" model.

When using --model net to use an off-the-shelf network, train.py breaks here:

return self.model.forward(torch.cat((d0,d1,d0-d1,d0/(d1+eps),d1/(d0+eps)),dim=1))

because the dimensionalities don't match. With net-lin models the output has shape [50, 1, 1, 1], while this wrapping into arrays doesn't happen with net models.
Is there a reason for this wrapping under the net-lin model?

Contrary conclusions for the LPIPS metric

python test_network.py
Model [SSIM] initialized
Distances: (0.262, 0.344)
python test_network.py
Loading model from: /data/sunzhaomang/AdvFeat/PerceptualSimilarity/weights/v0.1/alex.pth
Model [net-lin [alex]] initialized
Distances: (0.034, 0.037)
python test_network.py
Loading model from: /data/sunzhaomang/AdvFeat/PerceptualSimilarity/weights/v0.1/alex.pth
Model [net-lin [alex]] initialized
Distances: (0.041, 0.047)
With both the SSIM and LPIPS metrics, the distance between ex_ref and ex_p0 is smaller than that between ex_ref and ex_p1, i.e. ex_p0.png is more similar to ex_ref.png than ex_p1.png, which seems contrary to the claim made in the paper. How can this be explained?

how to use lpips loss in tensorflow?

Thank you for your significant contribution!

It seems that the LPIPS loss function cannot be used directly in TensorFlow to train a neural network. What should I do if I want to use it there?

data normalization - possible bug

I noticed that the data normalization is

transforms.Normalize((0.5, 0.5, 0.5),(0.5, 0.5, 0.5))

(for example in twoafc_dataset.py).
The imagenet normalization coefficients are
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
This raises two questions:

  1. Did you possibly confuse mean with variance? (Normalize takes std as its second argument, not variance.)
  2. Is there a reason behind the design choice not to use the ImageNet normalization?
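
For what it's worth, the two conventions are closer than they look: the dataset transform maps [0, 1] images to [-1, 1], and the network then applies a per-channel shift/scale that is the ImageNet normalization re-expressed for [-1, 1] inputs. A small arithmetic sketch of that correspondence (the resulting values appear to match the constants in the repo's scaling layer, though that is worth double-checking):

# If x01 is an image in [0, 1] and x = 2*x01 - 1 is the same image in [-1, 1], then
#   (x01 - mean) / std  ==  (x - shift) / scale
# with shift = 2*mean - 1 and scale = 2*std.
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

shift = [2 * m - 1 for m in mean]  # ~ [-0.030, -0.088, -0.188]
scale = [2 * s for s in std]       # ~ [ 0.458,  0.448,  0.450]
print(shift, scale)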

small input size

I am trying to independently replicate the LPIPS metric in Keras, initially focusing on uncalibrated VGG. Following the README, I got test_network.py working, but I am a little confused by the three example images ex_ref.png, ex_p0.png, and ex_p1.png and how they are processed.

Each of these images is 64x64, and in test_network.py they are passed to the VGG network without scaling. But the native input size of VGG is 224x224, and the PyTorch models documentation clearly states that inputs are expected to be that size (or larger):

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.

Notably, when provided with 224x224 inputs, the layer sizes are:

  • (64, 224, 224)
  • (128, 112, 112)
  • (256, 56, 56)
  • (512, 28, 28)
  • (512, 14, 14)

However, when they are left at 64x64 without scaling, the layer sizes are smaller at each stage:

  • (64, 64, 64)
  • (128, 32, 32)
  • (256, 16, 16)
  • (512, 8, 8)
  • (512, 4, 4)

I'm not familiar with PyTorch internals, so it's not clear to me how to interpret this behaviour when porting to Keras. My questions are:

  • Are these smaller inputs in fact valid ways of using these pre-trained VGG weights?
  • Could the LPIPS metric alternatively be implemented by always scaling inputs to the expected WxH sizes?
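
On the first question: the 224x224 requirement comes from VGG's fully connected classifier, which LPIPS never uses; the convolutional trunk is size-agnostic as long as the input survives the five poolings. A quick sketch that reproduces the 64x64 layer sizes listed above with torchvision (the layer indices assume the standard torchvision vgg16 ordering):

import torch
from torchvision import models

vgg_features = models.vgg16().features.eval()  # conv trunk only; weights irrelevant for shapes

# Indices of the ReLU layers LPIPS taps: relu1_2, relu2_2, relu3_3, relu4_3, relu5_3.
taps = {3: 'relu1_2', 8: 'relu2_2', 15: 'relu3_3', 22: 'relu4_3', 29: 'relu5_3'}

x = torch.rand(1, 3, 64, 64)
with torch.no_grad():
    for i, layer in enumerate(vgg_features):
        x = layer(x)
        if i in taps:
            print(taps[i], tuple(x.shape[1:]))  # (64, 64, 64), (128, 32, 32), ...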

Why compute the feature-map L2 distance channel-wise?

Hi, LPIPS helps me a lot in my image translation task! There's one question I could not figure out by myself. For the feature-map distance, why does the paper compute the L2 distance channel-wise and then average spatially? Could we instead compute the L2 distance spatially (flatten the feature map for one channel and compute the L2 distance) and then average over channels?

Thanks!

suggested fix in weights loading code

I find that when I import PerceptualSimilarity as a package, the weights-loading line in dist_model.py fails, since '.' points to the current working directory of my calling code rather than the root directory of PerceptualSimilarity.

Here's a patch that fixes the issue:
fix_ps.patch.txt
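
The usual shape of such a fix (a hypothetical sketch, not the attached patch itself) is to resolve the weights path relative to the module file instead of the caller's working directory:

import os
import torch

def load_lpips_weights(net_module, net_name='alex', version='0.1'):
    # Hypothetical helper: build the path from this file's directory, so it works
    # no matter where the calling code is run from.
    pkg_dir = os.path.dirname(os.path.abspath(__file__))
    weights_path = os.path.join(pkg_dir, 'weights', 'v%s' % version, '%s.pth' % net_name)
    net_module.load_state_dict(torch.load(weights_path, map_location='cpu'))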

What about 'scratch' and 'tune' models?

Hey, I hope you are well.
In the model options for your loss, we can either go for 'net' (a vanilla pre-trained CNN) or 'net-lin' (which I assume is the one with the learned linear layer). I am interested in the 'tune' and 'scratch' variants of the CNNs for research purposes. Are they available, and how can I obtain them? Thank you for your time.

Using WGAN to Calculate PerceptualSimilarity

Thanks for publishing the code; I'd appreciate it if you could help me understand this.

I trained a WGAN on my own data. Now I am planning to use the generator network's features (weights) to calculate a perceptual similarity score, but I am not quite sure how to do this.

If I understood correctly, with either VGG or ResNet we pass the query images (image1, image2) through the network, and for each input image we collect the features and calculate the score from them.

But I am not sure how to do that for a WGAN, since the input to the generator network is noise and the output is the synthetically generated image. How do I pass query images to get those features?

'lpips' has no attribute 'PerceptualLoss'

When I install with sudo pip3 install lpips and then run

import lpips
percept = lpips.PerceptualLoss(model='net-lin', net='vgg', use_gpu=True)

I get the error: module 'lpips' has no attribute 'PerceptualLoss'.

Difference Paper - Implementation

Dear authors,

equation (1) in the paper states that you are taking the euclidean norm squared of the weighted differences.
Something like euclidean_norm(dot(w_l, (y - y_0)))²
However, in the implementation you are weighting the squared difference of the euclidean norms, something like dot(w_l, (euclidean_norm(y) - euclidean_norm(y_0))²), which as far as I am concerned is not the same thing. Or am I missing something here?

Thanks!

PerceptualLoss uses memory on GPU 0 when specified otherwise

When running PerceptualLoss on a machine with multiple GPUs, it always uses some memory of GPU 0, even when gpu_ids specifies another one.

Minimal example to reproduce:

import models
from time import sleep

model = models.PerceptualLoss(model='net-lin', net='vgg', gpu_ids=[2])
sleep(20)

[Screenshot, 2020-05-08 17:10:15]
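
A common cause of this pattern is a CUDA context being created on device 0 as a side effect (for example by a checkpoint load or a bare .cuda() call). One hedged way to rule that out is to hide every GPU except the target one before torch is imported:

import os

# Make only physical GPU 2 visible; no CUDA context can then touch GPU 0.
os.environ['CUDA_VISIBLE_DEVICES'] = '2'

import models  # PerceptualSimilarity

# Inside this restricted process, the remaining GPU is addressed as index 0.
model = models.PerceptualLoss(model='net-lin', net='vgg', gpu_ids=[0])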

unable to load weights

I get the following error when running test_network.py:

Traceback (most recent call last):
  File "test_network.py", line 11, in <module>
    model.initialize(model='net-lin',net='alex',use_gpu=False)
  File "/Users/faro/repositories/PerceptualSimilarity/models/dist_model.py", line 38, in initialize
    self.net.load_state_dict(torch.load('./weights/%s.pth'%net, map_location=lambda storage, loc: 'cpu'))
  File "/Users/faro/repositories/PerceptualSimilarity/.env/lib/python3.6/site-packages/torch/serialization.py", line 261, in load
    return _load(f, map_location, pickle_module)
  File "/Users/faro/repositories/PerceptualSimilarity/.env/lib/python3.6/site-packages/torch/serialization.py", line 409, in _load
    result = unpickler.load()
  File "/Users/faro/repositories/PerceptualSimilarity/.env/lib/python3.6/site-packages/torch/_utils.py", line 74, in _rebuild_tensor
    module = importlib.import_module(storage.__module__)
AttributeError: 'str' object has no attribute '__module__'

To get there I needed to change a few files and fix some import bugs.
One thing that would really help to reproduce the results is if you could specify the requirements (especially the PyTorch version). Maybe consider adding a requirements file like the one in my fork: faroit@4ccefee#diff-b4ef698db8ca845e5845c4618278f29a

Installation on Ubuntu 19.04

Thank you for such a nice setup! I managed to run the code with a tiny modification on Ubuntu 19.04. I commented out these two lines in your requirements.txt:

#numpy>=1.14.3
#opencv>=2.4.11

Then I installed numpy and opencv from Ubuntu's own repositories:

sudo apt-get install python-numpy python-opencv

Lastly, I verified it by comparing two sample images, as below:

XXXXX:PerceptualSimilarity$ python compute_dists.py -p0 imgs/ex_ref.png -p1 imgs/ex_p0.png --use_gpu
Setting up Perceptual loss...
Loading model from: /home/XXXXX/PerceptualSimilarity/models/weights/v0.1/alex.pth
...[net-lin [alex]] initialized
...Done
Distance: 0.722

Geometric distortion

Is it possible to use this to measure geometric distortion? As in the quality of a retargeting compared to a full reference?

Does lpips work on grayscale images?

Is it safe to use lpips for grayscale images? The code does not work for 1-channel images. The obvious hack would be to use 3 identical channels, yet I am not sure what the effect would be within a solution that was calibrated end-to-end on color images.
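
A minimal sketch of the 3-identical-channels hack, assuming the pip lpips package and grayscale tensors already in [-1, 1]. Whether the learned calibration transfers well to such pseudo-RGB inputs remains the open question raised above.

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')

# Hypothetical grayscale inputs: (1, 1, H, W) tensors in [-1, 1].
gray0 = torch.rand(1, 1, 64, 64) * 2 - 1
gray1 = torch.rand(1, 1, 64, 64) * 2 - 1

# Replicate the single channel into the 3 RGB channels the backbone expects.
d = loss_fn(gray0.repeat(1, 3, 1, 1), gray1.repeat(1, 3, 1, 1))
print(d.item())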

Image width and height are not equal

It seems that the code can't calculate the metric on images with unequal width and height. Can it be extended to handle images of various sizes?

ModuleNotFoundError caused from inside lpips

Any idea why importing lpips is causing this error? It seems to be something from the IPython import causing issues. The ipython (and prompt_toolkit) versions are both the latest release. It works fine within an IPython notebook, but importing lpips inside my training job causes this crash.

Traceback (most recent call last):
  File "run_train.py", line 8, in <module>
    import models
  File "/home/timbrooks/code/prototypes/models/__init__.py", line 2, in <module>
    from .model import *
  File "/home/timbrooks/code/prototypes/models/model.py", line 18, in <module>
    import lpips
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/lpips/__init__.py", line 11, in <module>
    from lpips.trainer import *
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/lpips/trainer.py", line 11, in <module>
    from IPython import embed
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/IPython/__init__.py", line 56, in <module>
    from .terminal.embed import embed
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/IPython/terminal/embed.py", line 16, in <module>
    from IPython.terminal.interactiveshell import TerminalInteractiveShell
  File "/home/timbrooks/anaconda3/envs/prototypes/lib/python3.8/site-packages/IPython/terminal/interactiveshell.py", line 21, in <module>
    from prompt_toolkit.formatted_text import PygmentsTokens
ModuleNotFoundError: No module named 'prompt_toolkit.formatted_text'

PNetlin.forward image normalization

    def forward(self, in0, in1):
        in0_sc = (in0 - self.shift.expand_as(in0)) / self.scale.expand_as(in0)
        in1_sc = (in1 - self.shift.expand_as(in0)) / self.scale.expand_as(in0)

        if (self.pnet_tune):
            outs0 = self.net.forward(in0)
            outs1 = self.net.forward(in1)
        else:
            outs0 = self.net[0].forward(in0)
            outs1 = self.net[0].forward(in1)

Why don't you feed in0_sc and in1_sc to the net? Is it a bug or a feature?

Trained on my own dataset but the loss does not drop!

I want to use this model to measure the similarity of two people's handwritten signatures, so I made a dataset just like the 2AFC one.

Training on net-lin + alex:

(ep: 9, it: 20000, t: 0.003[s], ept: 0.16/0.55[h]) loss_total: 0.564, acc_r: 0.680
(ep: 9, it: 25000, t: 0.003[s], ept: 0.20/0.56[h]) loss_total: 0.556, acc_r: 0.700
(ep: 9, it: 30000, t: 0.003[s], ept: 0.25/0.58[h]) loss_total: 0.525, acc_r: 0.780
(ep: 9, it: 35000, t: 0.003[s], ept: 0.30/0.60[h]) loss_total: 0.570, acc_r: 0.720
(ep: 9, it: 40000, t: 0.003[s], ept: 0.35/0.62[h]) loss_total: 0.511, acc_r: 0.800
(ep: 9, it: 45000, t: 0.003[s], ept: 0.42/0.65[h]) loss_total: 0.674, acc_r: 0.660
(ep: 9, it: 50000, t: 0.003[s], ept: 0.48/0.67[h]) loss_total: 0.545, acc_r: 0.700
(ep: 9, it: 55000, t: 0.003[s], ept: 0.54/0.69[h]) loss_total: 0.548, acc_r: 0.720
(ep: 9, it: 60000, t: 0.003[s], ept: 0.62/0.72[h]) loss_total: 0.626, acc_r: 0.660
(ep: 9, it: 65000, t: 0.003[s], ept: 0.69/0.75[h]) loss_total: 0.606, acc_r: 0.640
(ep: 9, it: 70000, t: 0.003[s], ept: 0.77/0.77[h]) loss_total: 0.516, acc_r: 0.720
(ep: 10, it: 5000, t: 0.003[s], ept: 0.03/0.38[h]) loss_total: 0.447, acc_r: 0.800
(ep: 10, it: 10000, t: 0.003[s], ept: 0.05/0.38[h]) loss_total: 0.500, acc_r: 0.800
(ep: 10, it: 15000, t: 0.003[s], ept: 0.10/0.48[h]) loss_total: 0.484, acc_r: 0.840
(ep: 10, it: 20000, t: 0.003[s], ept: 0.15/0.52[h]) loss_total: 0.523, acc_r: 0.760
(ep: 10, it: 25000, t: 0.003[s], ept: 0.20/0.56[h]) loss_total: 0.579, acc_r: 0.700
(ep: 10, it: 30000, t: 0.003[s], ept: 0.25/0.59[h]) loss_total: 0.609, acc_r: 0.620
(ep: 10, it: 35000, t: 0.003[s], ept: 0.31/0.63[h]) loss_total: 0.544, acc_r: 0.760
(ep: 10, it: 40000, t: 0.003[s], ept: 0.38/0.67[h]) loss_total: 0.613, acc_r: 0.660
(ep: 10, it: 45000, t: 0.003[s], ept: 0.45/0.70[h]) loss_total: 0.569, acc_r: 0.700
(ep: 10, it: 50000, t: 0.003[s], ept: 0.52/0.74[h]) loss_total: 0.567, acc_r: 0.660
(ep: 10, it: 55000, t: 0.003[s], ept: 0.59/0.75[h]) loss_total: 0.651, acc_r: 0.600
(ep: 10, it: 60000, t: 0.003[s], ept: 0.66/0.77[h]) loss_total: 0.492, acc_r: 0.780
(ep: 10, it: 65000, t: 0.003[s], ept: 0.73/0.79[h]) loss_total: 0.547, acc_r: 0.720
(ep: 10, it: 70000, t: 0.003[s], ept: 0.81/0.81[h]) loss_total: 0.608, acc_r: 0.660

rsub() received an invalid combination of arguments

When I call lpips_vgg = loss_fn_vgg(a, b) I encounter this error, but I get a correct result in other code.


Traceback (most recent call last):
  File "test.py", line 125, in <module>
    lpips_vgg_y = loss_fn_vgg(cropped_sr_img_y * 255, cropped_gt_img_y * 255)
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/lpips/lpips.py", line 87, in forward
    in0_input, in1_input = (self.scaling_layer(in0), self.scaling_layer(in1)) if self.version=='0.1' else (in0, in1)
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/lpips/lpips.py", line 122, in forward
    return (inp - self.shift) / self.scale
  File "/mnt/data0/home/name/miniconda3/lib/python3.7/site-packages/torch/tensor.py", line 396, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
TypeError: rsub() received an invalid combination of arguments - got (Tensor, numpy.ndarray), but expected one of:
 * (Tensor input, Tensor other, *, Number alpha)
 * (Tensor input, Number other, Number alpha)

So I checked the format and type of the input arguments a and b, and got:

type and shape of sr:torch.FloatTensor torch.Size([3, 472, 312])
type and shape of gt:torch.FloatTensor torch.Size([3, 472, 312])

And the format of the variables in the previous code, which returned the right result, is:

hr_img shape:torch.Size([3, 480, 320]), type:torch.FloatTensor
sr_img  shape:torch.Size([3, 480, 320]), type:torch.FloatTensor
lpips:tensor([[[[0.2569]]]])

So I don't know which part I should correct.
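
Two things stand out in the snippet: the inputs are 3-D (no batch dimension) and are multiplied by 255, whereas lpips by default expects (N, 3, H, W) float tensors in [-1, 1] (recent pip versions also appear to accept normalize=True in the forward call for [0, 1] inputs). A hedged input-preparation sketch with a hypothetical helper:

import torch
import lpips

loss_fn_vgg = lpips.LPIPS(net='vgg')

def to_lpips_input(img):
    # Hypothetical helper: accept an HxWxC numpy array or a CxHxW tensor in [0, 1]
    # and return a (1, 3, H, W) float tensor scaled to [-1, 1].
    if not torch.is_tensor(img):
        img = torch.from_numpy(img).permute(2, 0, 1).float()
    if img.dim() == 3:
        img = img.unsqueeze(0)  # add the batch dimension
    return img * 2 - 1          # [0, 1] -> [-1, 1]; do not multiply by 255

# d = loss_fn_vgg(to_lpips_input(cropped_sr_img_y), to_lpips_input(cropped_gt_img_y))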

Relation between scaling weights of paper and implementation

Hello @richzhang,

In the LPIPS paper, the 1x1 scaling convolution of the difference of the activations is performed before the squaring.

[Equation (1) from the paper: $d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \lVert w_l \odot (\hat{y}^l_{hw} - \hat{y}^l_{0\,hw}) \rVert_2^2$]

But in the implementation, the difference of the activations is first squared and then scaled:

diffs[kk] = (feats0[kk]-feats1[kk])**2
...
self.lin[kk](diffs[kk])

Is this a mistake? If yes, is it in the paper or in the implementation?
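
One way to reconcile the two, offered as an observation rather than an authoritative answer: if the per-channel weights are constrained to be non-negative (the training code appears to clamp the 1x1 convolution weights that way), then squaring first and scaling afterwards spans exactly the same family of distances, just with reparameterized weights. Writing $d = \hat{y}^l_{hw} - \hat{y}^l_{0\,hw}$, we have $\lVert w_l \odot d \rVert_2^2 = \sum_c w_{l,c}^2 \, d_c^2 = \sum_c w'_{l,c} \, d_c^2$ with $w'_{l,c} = w_{l,c}^2 \ge 0$.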

Using LPIPS metric for image retrieval

I understand that the model takes as input two images, by design. I would like to know if there is a smart way to use LPIPS metric for image retrieval, other than computing all the pairwise distances.

For information, my dataset of game banners contains about 30k images. In my previous experiments, I extracted image features once, and could then work with this processed data using standard tools for efficient similarity search based on cosine similarity, Minkowski distance, etc.

Thank you for your attention.
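
There may not be a shortcut around pairwise comparisons, since LPIPS is defined on image pairs, but the pairwise pass can at least be batched. A rough retrieval sketch, assuming all banners have been resized to a common resolution and scaled to [-1, 1] (the sizes and counts below are placeholders):

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex').eval()

# Hypothetical data: one query and a small gallery, already preprocessed.
query = torch.rand(1, 3, 64, 64) * 2 - 1
gallery = torch.rand(1000, 3, 64, 64) * 2 - 1  # in practice, stream from disk

dists = []
with torch.no_grad():
    for chunk in gallery.split(256):               # batch the pairwise calls
        q = query.expand(chunk.shape[0], -1, -1, -1)
        dists.append(loss_fn(q, chunk).flatten())
dists = torch.cat(dists)

print(dists.topk(10, largest=False).indices)       # 10 nearest neighbours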

Tensor Size Mismatch when running inference for own images

First of all - great paper!

I'm trying to run the single-image similarity script and am running into this error (for my own input images of size (224, 224, 3)):

RuntimeError: The size of tensor a (255) must match the size of tensor b (55) at non-singleton dimension 3

Any ideas why this could be happening?

Thanks,
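
Hard to say without the exact call, but one common cause of shape mismatches like this is the tensor layout: LPIPS expects (N, 3, H, W) float tensors in [-1, 1], not HxWxC arrays in [0, 255]. A hedged sketch using the helpers bundled with the pip package (the file names are placeholders):

import lpips

loss_fn = lpips.LPIPS(net='alex')

# load_image reads an RGB image; im2tensor converts it to (1, 3, H, W) in [-1, 1].
img0 = lpips.im2tensor(lpips.load_image('img0.png'))
img1 = lpips.im2tensor(lpips.load_image('img1.png'))

print(loss_fn(img0, img1).item())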

Bug when running the code

Hi,
When I run the code, I hit a bug when a function in networks_basic.py is called, namely in_tens.mean([2,3], keepdim=keepdim). I don't know how to tackle it, so I am taking the liberty of asking for help. After searching the Internet, I guess the reason may be that, in my PyTorch version, the first argument dim must be an integer rather than a list like [2,3]. The detailed error report is below.
Thanks

"Traceback (most recent call last):
File "compute_dists_pair.py", line 34, in
dist01 = model.forward(img0,img1).item()
File "PerceptualSimilarity-master/models/init.py", line 40, in forward
return self.model.forward(target, pred)
File "PerceptualSimilarity-master/models/dist_model.py", line 116, in forward
return self.net.forward(in0, in1, retPerLayer=retPerLayer)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "PerceptualSimilarity-master/models/networks_basic.py", line 79, in forward
res = [spatial_average(self.lins[kk].model(diffs[kk]), keepdim=True) for kk in range(self.L)]
File "PerceptualSimilarity-master/models/networks_basic.py", line 79, in
res = [spatial_average(self.lins[kk].model(diffs[kk]), keepdim=True) for kk in range(self.L)]
File "PerceptualSimilarity-master/models/networks_basic.py", line 18, in spatial_average
return in_tens.mean([2,3],keepdim=keepdim)
TypeError: mean() received an invalid combination of arguments - got (list, keepdim=bool), but expected one of:

  • ()
  • (torch.dtype dtype)
  • (int dim, torch.dtype dtype)
    didn't match because some of the keywords were incorrect: keepdim
  • (int dim, bool keepdim, torch.dtype dtype)
  • (int dim, bool keepdim)
    didn't match because some of the arguments have invalid types: (list, keepdim=bool)

"

Why does the distance result change when I run it several times?

As in the title: I wonder, is there a random process in the pipeline? Why does the distance change each time?
And does the forward pass take more time on GPU than CPU? I tested on a V100 GPU and got 402 ms for the first pair, versus about 50 ms on CPU. Looks strange! Any help would be appreciated! Thanks.


How to remove the notification when using lpips

I've tried to use lpips in my super-resolution project and it keeps printing:
"Loading model from: C:\Workspace\envs\workplace\lib\site-packages\lpips\weights\v0.1\alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]"

Is there any way to turn it off?
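
Recent versions of the pip package appear to expose a verbose flag on the constructor; if your installed version predates it, silencing stdout during construction is a crude fallback. A short sketch of both:

import contextlib
import io

import lpips

# Preferred: ask the constructor to stay quiet (flag available in recent releases).
loss_fn = lpips.LPIPS(net='alex', verbose=False)

# Fallback for older releases: swallow whatever gets printed during setup.
with contextlib.redirect_stdout(io.StringIO()):
    loss_fn = lpips.LPIPS(net='alex')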

RuntimeError: Function 'CudnnConvolutionBackward' returned nan values in its 1th output.

Hi there, I'm currently training an artifact-removal / super-resolution model (a multi-layer ESPCN), but I'm hitting this issue after a few iterations of training.

This is how I instantiate the loss:

lpips = lpips.LPIPS(net='vgg')

This is the code for the model:

import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ESPCNResBlock(nn.Module):
    def __init__(self, nf=64):
        super(ESPCNResBlock, self).__init__()
        self.conv1 = nn.Conv2d(nf, nf, kernel_size=3, padding=3 // 2)
        self.conv2 = nn.Conv2d(nf, nf, kernel_size=3, padding=3 // 2)

    def forward(self, input):
        x = self.conv1(input)
        x = F.hardtanh(x, min_val=-1, max_val=1.0)
        x = self.conv2(x)
        x = F.hardtanh(x, min_val=-1, max_val=1.0)
        return x + input

class ESPCN(nn.Module):
    def __init__(self, scale_factor=2, n_blocks=4, nf=64, in_channels=3, out_channels=3):
        super(ESPCN, self).__init__()
        self.scale_factor = scale_factor
        layers = [nn.Conv2d(in_channels, nf, kernel_size=5, padding=5 // 2),
                  nn.Hardtanh()]
        for _ in range(n_blocks//2):
            layers += [ESPCNResBlock(),
                       ]

        layers += [
            nn.Conv2d(nf, 32, kernel_size=3, padding=3 // 2),
            nn.Hardtanh(),
        ]
        self.first_part = nn.Sequential(*layers)
        self.last_part = nn.Sequential(
            nn.Conv2d(32, out_channels * (scale_factor ** 2), kernel_size=3, padding=3 // 2),
            nn.PixelShuffle(scale_factor) if scale_factor > 1 else nn.Identity(),
            nn.Tanh()
        )

        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m.in_channels == 32:
                    nn.init.normal_(m.weight.data, mean=0.0, std=0.001)
                    nn.init.zeros_(m.bias.data)
                else:
                    nn.init.normal_(m.weight.data, mean=0.0,
                                    std=math.sqrt(2 / (m.out_channels * m.weight.data[0][0].numel())))
                    nn.init.zeros_(m.bias.data)

    def forward(self, input):
        x = self.first_part(input)
        x = self.last_part(x)

        x = x + F.interpolate(input,
                              scale_factor=self.scale_factor,
                              mode='bilinear')

        x = torch.clamp(x, min=-1, max=1)
        return x

I've localized the error to the normalize function, but I'm still looking for a fix.
The model is trained with Adam on batches of 64x64 images.
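
NaNs in the VGG backward pass often trace back to the feature-normalization step dividing by a near-zero norm, or to the network output drifting outside the [-1, 1] range LPIPS expects, though neither is certain here. A hedged debugging sketch, with random tensors standing in for the ESPCN output and the ground truth:

import torch
import lpips

# Anomaly detection reports which op first produces NaN gradients.
torch.autograd.set_detect_anomaly(True)

lpips_fn = lpips.LPIPS(net='vgg')

sr = torch.rand(4, 3, 64, 64) * 2 - 1   # stand-in for the model output
sr.requires_grad_(True)
hr = torch.rand(4, 3, 64, 64) * 2 - 1   # stand-in for the ground truth

# Clamp both inputs into the expected range before the perceptual loss.
loss = lpips_fn(torch.clamp(sr, -1, 1), torch.clamp(hr, -1, 1)).mean()
loss.backward()
print(sr.grad.abs().max())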
