haofanwang / score-cam

Official implementation of Score-CAM in PyTorch

License: MIT License

Python 100.00%

score-cam cam cnn-visualization-technique saliency cnn-visualization pytorch visual-explanations explainability class-activation-maps gradient-free

score-cam's Introduction

Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks

We develop a novel post-hoc visual explanation method called Score-CAM. It is the first gradient-free CAM-based visualization method, and it achieves state-of-the-art visual performance.

Paper: Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks, appeared at the IEEE CVPR 2020 Workshop on Fair, Data Efficient and Trusted Computer Vision. Our paper has been cited by 400 papers!

Demo: You can run an example via Colab

Update

2021.12.16: A great MATLAB implementation from Kenta Itakura.

2021.4.03: A PyTorch implementation in jacobgil/pytorch-grad-cam (3.8K Stars).

2020.8.18: A PaddlePaddle implementation from PaddlePaddle/InterpretDL.

2020.7.11: A TensorFlow implementation from keisen/tf-keras-vis.

2020.5.11: A PyTorch implementation from utkuozbulak/pytorch-cnn-visualizations (6.2K Stars).

2020.3.24: Merged into frgfm/torch-cam, a wonderful library that supports multiple CAM-based methods.

Citation

If you find this work helpful in your research, please cite our work:

@inproceedings{wang2020score,
  title={Score-CAM: Score-weighted visual explanations for convolutional neural networks},
  author={Wang, Haofan and Wang, Zifan and Du, Mengnan and Yang, Fan and Zhang, Zijian and Ding, Sirui and Mardziel, Piotr and Hu, Xia},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops},
  pages={24--25},
  year={2020}
}

Thanks

The utils are built on flashtorch; thanks to its authors for releasing this great work!

Contact

If you have any questions, feel free to open an issue or contact me directly at: [email protected].

score-cam's People


davidsirui trendingtechnology gtanisik dreadlord1984 joelorentz rezacsedu zhawhjw jennynanap youtang1993 keniuniu gt980103 alymostafa mathpopo mymuli xrosliang hdyen echo-wu151 herolin12 wangyuan249 joseph9303 fantasyzhai tjufan david-zzy tubbz-alt yuhenghuang42 umairjavaid jasonguo1 wjn0 xhqglorry11 kokuno-km xychen9459 finetooth aaaeeee ml-edu darrenzhang01 wukaiyeah lijing-coder scotter-qian vaynexie pengtaojiang jiangrn donglongzi ljiaqii hdsak xdotproduct siyukenny yjz1729 pondkann aoulalay bit0123 ruoyuchen10 duc-ng odysseyau ideafisher wicsp wanghailan jinho-genesislab ralphscheu sushmitha047 pattanaikay simeon340703 trudes808 lengocduc195khtn sidd462

score-cam's Issues

RuntimeError: Output 0 of UnbindBackward is a view and is being modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.

Traceback (most recent call last):
  File "D:/1.Study/PycharmProjects/Score-CAM-master/test.py", line 50, in
    basic_visualize(input_.cpu(), scorecam_map.type(torch.FloatTensor).cpu(), save_path='resnet.png')
  File "D:\1.Study\PycharmProjects\Score-CAM-master\utils\__init__.py", line 299, in basic_visualize
    input_ = format_for_plotting(denormalize(input_))
  File "D:\1.Study\PycharmProjects\Score-CAM-master\utils\__init__.py", line 173, in denormalize
    channel.mul_(std).add_(mean)

Grid Effects

I am only getting strong activations along a grid pattern. The attached image shows 1,000 averaged activation maps, which highlight the grid. Is there a known reason for this effect?

I am using a standard ResNet implementation with dilation set to 1 on all conv layers.

Which layer's feature should be used?

Hi, I want to visualize my dataset with ResNet50. Which layer's feature maps should I use? My data's resolution is 28x28.
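To see why the layer choice matters at 28x28, the sketch below applies the standard convolution output-size formula, floor((n + 2p - k) / s) + 1, to the layers that downsample in torchvision's ResNet-50 (7x7 stride-2 stem convolution, 3x3 stride-2 max pool, and a stride-2 3x3 convolution at the start of layer2, layer3 and layer4). The layer table is an assumption about the model in use; a modified stem would change the numbers.

```python
def conv_out(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for the layers that set the spatial size,
# assuming torchvision's ResNet-50 layout; layer1 keeps the size unchanged
RESNET50_DOWNSAMPLING = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer1", 1, 1, 0),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]


def feature_sizes(input_size):
    """Side length of the feature map after each stage for a square input."""
    sizes = {}
    n = input_size
    for name, kernel, stride, padding in RESNET50_DOWNSAMPLING:
        n = conv_out(n, kernel, stride, padding)
        sizes[name] = n
    return sizes


print(feature_sizes(28))
print(feature_sizes(224))
```

Under these assumptions a 28x28 input leaves layer4 at 1x1, so every upsampled mask would be constant, while layer1 (7x7) or layer2 (4x4) still keep some spatial layout; at 224x224 the same function gives the familiar 7x7 for layer4.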

Visualization Issue

Hello Haofan,
I have attached a code example, based on the Grad-CAM example given by Keras, that shows what I need, and I added comments at the crucial points.

Thanks in advance,

Halil
GradCAM1.zip

the speed of score-cam and grad-cam

I think Score-CAM is slower than Grad-CAM; is that right?
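A rough way to compare the two: the scorecam.py loop runs one extra forward pass per channel of the chosen layer, whereas Grad-CAM needs one forward and one backward pass. The sketch below counts Score-CAM forward passes; the batch_size parameter, which groups several masked inputs into one pass, is a hypothetical optimization and not part of the repository code.

```python
import math


def score_cam_forward_passes(num_channels, batch_size=1):
    """One pass on the raw input, then one pass per batch of masked inputs."""
    return 1 + math.ceil(num_channels / batch_size)


# 512 channels, e.g. the last convolutional layer of VGG16
print(score_cam_forward_passes(512))      # 513
print(score_cam_forward_passes(512, 32))  # 17
```

Under this count, Score-CAM needs hundreds of forward passes per image for a 512-channel layer (and more for ResNet-50's 2048-channel layer4), so it is expected to be considerably slower than Grad-CAM unless the masked inputs are batched.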

Implementation for Energy-based Point Game

I have received many emails asking me to release the code for the Energy-based Point Game (a modified evaluation metric proposed in our paper). To promote the reproducibility of the research, I will release the code soon (I'm working on the NeurIPS reviewing process now and will clean up the code afterwards). Thanks for your patience!
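Until the official code is available, here is a minimal sketch of the metric as the paper describes it: the proportion of the saliency map's total energy that falls inside the target object's bounding box. The function name, the list-of-lists map format and the inclusive (x0, y0, x1, y1) box convention are assumptions for illustration; this is not the authors' evaluation code.

```python
def energy_pointing_game(saliency, bbox):
    """Fraction of saliency energy inside bbox = (x0, y0, x1, y1), inclusive pixel coordinates."""
    x0, y0, x1, y1 = bbox
    inside = 0.0
    total = 0.0
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            total += value
            if x0 <= x <= x1 and y0 <= y <= y1:
                inside += value
    # inside / (inside + outside); inside + outside is the total energy
    return inside / total if total > 0 else 0.0


saliency = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.6, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
print(energy_pointing_game(saliency, (1, 1, 2, 2)))
```

Here 3.0 of the 3.4 units of energy fall in the box, giving a proportion of about 0.88.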

Point Game Function

Could you please share your Point Game code for evaluation? Thanks.

Could you please provide the generated dataset that includes feature importance?

I found that generating feature-importance explanations with Score-CAM is slow: about 2 seconds per image. Generating explanations for every image in the ImageNet dataset would take a very long time. Is there a suitable way to speed this up?
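To put 2 seconds per image in perspective, here is a back-of-the-envelope estimate for the ImageNet-1k (ILSVRC 2012) splits, assuming the per-image time stays constant and images are processed one at a time:

```python
SECONDS_PER_IMAGE = 2.0  # figure reported in this issue

# ILSVRC 2012 split sizes
SPLITS = {"val": 50_000, "train": 1_281_167}


def estimate_hours(num_images, seconds_per_image=SECONDS_PER_IMAGE):
    """Total processing time in hours at a fixed per-image cost."""
    return num_images * seconds_per_image / 3600


for split, count in SPLITS.items():
    hours = estimate_hours(count)
    print(f"{split}: {hours:.1f} h ({hours / 24:.1f} days)")
```

That is roughly 28 hours for the validation set and about a month for the training set, which is why batching or reducing the number of channels matters at this scale.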

Efficiency problem regarding the implementation of Score-CAM

Dear author:

First of all, thanks for your great work!

When I checked the code in scorecam.py, I noticed that it computes score_saliency_map for one instance at a time. This is inefficient when you want to compute Score-CAM for many instances.

I also read your paper and found that your algorithm can in fact be computed for a mini-batch (correct me if I am wrong), which can be more efficient than computing one instance at a time.

However, to do the mini-batch computation, some of the code needs to be modified:

In ScoreCAM, you call score.backward(retain_graph=retain_graph). But as I understand it, Score-CAM is gradient-free, so the backward computation is unnecessary. We need to remove the backward pass before we can compute on a mini-batch input.
In ScoreCAM, you skip the computation whenever saliency_map.max() == saliency_map.min(). This logic needs to be implemented for the mini-batch computation as well.

My code is below. I have tested it on a few instances and did not find any problems. On my server, this implementation takes around 41 seconds for 32 instances, while computing a single instance takes around 16 seconds, so it does improve efficiency.

As I am not sure whether the code is correct, please check it when you have time.

import torch
import torch.nn.functional as F

def forward(self, input, class_idx=None, retain_graph=False):
    # retain_graph is kept for API compatibility only: Score-CAM needs no backward pass
    b, c, h, w = input.size()
    device = input.device

    with torch.no_grad():
        # prediction on the raw input (the forward hook also stores the target-layer activations)
        logit = self.model_arch(input)

        if class_idx is None:
            predicted_class = logit.max(1)[-1]
        else:
            # assumes class_idx is a tensor holding one class index per instance
            predicted_class = class_idx.long().to(device)
        predicted_class = predicted_class.reshape(-1, 1)

        activations = self.activations['value'].to(device)
        b, k, u, v = activations.size()
        score_saliency_map = torch.zeros((b, 1, h, w), device=device)
        rows = torch.arange(b, device=device).unsqueeze(1)

        for i in range(k):
            # upsample the i-th activation map of every instance to the input size
            saliency_map = F.interpolate(activations[:, i:i + 1, :, :], size=(h, w),
                                         mode='bilinear', align_corners=False)

            # per-instance min-max normalization to [0, 1]
            flat = saliency_map.view(b, -1)
            saliency_max = flat.max(dim=1)[0].view(b, 1, 1, 1)
            saliency_min = flat.min(dim=1)[0].view(b, 1, 1, 1)
            norm_saliency_map = (saliency_map - saliency_min) / (saliency_max - saliency_min + 1e-7)

            # how much of the target score is kept when only the highlighted region is shown:
            # prediction on the masked input
            output = F.softmax(self.model_arch(input * norm_saliency_map), dim=-1)
            score = output[rows, predicted_class]  # shape (b, 1)

            # give a score of 0 to every instance whose map is constant (max == min),
            # mirroring the `continue` in the single-instance code
            score = torch.where(saliency_max.view(b, 1) > saliency_min.view(b, 1),
                                score, torch.zeros_like(score))

            # broadcasting replaces the explicit repeat(1, 1, h, w)
            score_saliency_map += score.view(b, 1, 1, 1) * saliency_map

    score_saliency_map = F.relu(score_saliency_map)
    flat = score_saliency_map.view(b, -1)
    score_saliency_map_min = flat.min(dim=1)[0].view(b, 1, 1, 1)
    score_saliency_map_max = flat.max(dim=1)[0].view(b, 1, 1, 1)

    # a constant map (e.g. all zeros after ReLU) cannot be min-max normalized
    if (score_saliency_map_max == score_saliency_map_min).any():
        raise ValueError('score_saliency_map is constant for at least one instance')

    return (score_saliency_map - score_saliency_map_min) / (score_saliency_map_max - score_saliency_map_min)

I have some questions about your Score-CAM paper; looking forward to your answer

In Section 4.2 (the experiment part) of the paper, there is a sentence saying "In this experiment, rather than do point-wise multiplication with the original generated saliency map, we slightly modify by limiting the number of positive pixels in the saliency map." Could you explain how you ran this experiment? Compared with Grad-CAM++, which parts did you modify?

I have some questions about the paper and your implementation.

Hello. I'm impressed with the idea of this paper and want to apply it to my project.
However, there is a part I cannot reconcile between your paper and the code.

In my understanding, CIC is the difference between the target score of the original input image and the target score of the input image multiplied by the mask. Did I get it wrong?

In CIC, doesn't $X_b$ denote the input image? In your implementation, you use the target score of the input image multiplied by the mask as the CIC; the target score of the original image is not subtracted. I don't quite understand this part and want to know what I misunderstood.

Thank you!
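For reference, here are the definitions this question turns on, as I read them from the paper (a hedged restatement, not a quote). $\mathrm{Up}(\cdot)$ upsamples an activation map to the input size, $s(\cdot)$ min-max normalizes it to $[0, 1]$, $\circ$ is point-wise multiplication, and $X_b$ is a baseline input, so the CIC compares the masked input with a baseline rather than with $X$ itself:

$$
H_l^k = s\big(\mathrm{Up}(A_l^k)\big), \qquad
\alpha_k^c = C^c(A_l^k) = f^c(X \circ H_l^k) - f^c(X_b), \qquad
L^c_{\text{Score-CAM}} = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A_l^k\Big)
$$

As noted above, the repository code computes only the first term, $f^c(X \circ H_l^k)$.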

Can we combine two Score-CAMs from different images of the same class?

Hi, thank you for your great work. I want to ask: is it possible to combine the CAMs of two images of the same class (multiple instances)?

Evaluation code for Average Drop/Increase and other metrics

Hello, could you please provide the evaluation code for Average Drop/Increase and the other metrics? T_T
I would be extremely grateful!!!!!!
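Pending the official evaluation code, here is a minimal sketch of the two metrics as they are commonly defined (following Grad-CAM++). With Y_i the target-class score on the full image i and O_i the score on the explanation-masked image, Average Drop = (100/N) * sum(max(0, Y_i - O_i) / Y_i) and Increase in Confidence = (100/N) * sum(1[Y_i < O_i]). How the masked image is built (for example, which pixels are muted) is left to the caller; the scores below are made up, and this is not the authors' code.

```python
def average_drop(full_scores, masked_scores):
    """Average Drop in %: mean relative drop of the class score, counting only drops."""
    drops = [max(0.0, y - o) / y for y, o in zip(full_scores, masked_scores)]
    return 100.0 * sum(drops) / len(drops)


def increase_in_confidence(full_scores, masked_scores):
    """Increase in Confidence in %: share of images whose score rises on the masked input."""
    increases = [1 if y < o else 0 for y, o in zip(full_scores, masked_scores)]
    return 100.0 * sum(increases) / len(increases)


# post-softmax target-class scores on the full images (Y) and the masked images (O)
Y = [0.8, 0.5, 0.9, 0.4]
O = [0.6, 0.7, 0.9, 0.1]
print(average_drop(Y, O), increase_in_confidence(Y, O))
```

Lower Average Drop and higher Increase in Confidence indicate that the masked regions preserve more of the evidence for the class.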

Softmax of scores across channels / Baseline image

I have two questions and I couldn't find lines implementing the following functionality in scorecam.py:

In Algorithm 1 in the paper you compute the score using a baseline image X_b. This is not done here and instead, we only have the first part of the equation.
In Algorithm 1 in the paper you apply softmax channel-wise for the importance scores. This is not done here and instead, we directly multiply with the score.

Am I missing something?
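To make the two steps concrete, the sketch below shows the weighting stage as this issue describes Algorithm 1: subtract the baseline score f^c(X_b) from each channel's masked score, apply a softmax across channels, and take the ReLU of the weighted sum of activation maps. It uses plain Python lists; the function names, the masked scores, the baseline score and the 2x2 maps are hypothetical. The repository code instead multiplies each map directly by its masked score, without either step.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_cam_map(activation_maps, masked_scores, baseline_score):
    """Weighted sum of activation maps with Algorithm-1-style weights."""
    # CIC per channel: masked score minus baseline score
    cic = [s - baseline_score for s in masked_scores]
    # softmax across channels, so the weights sum to 1
    weights = softmax(cic)
    rows, cols = len(activation_maps[0]), len(activation_maps[0][0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for w, amap in zip(weights, activation_maps):
        for i in range(rows):
            for j in range(cols):
                saliency[i][j] += w * amap[i][j]
    # ReLU keeps only positive evidence
    return [[max(0.0, v) for v in row] for row in saliency], weights


maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
saliency, weights = score_cam_map(maps, masked_scores=[0.9, 0.2], baseline_score=0.1)
print(weights)
print(saliency)
```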

Must the model end with two fully connected layers?

Dear author:

First of all, thanks for your great work!
In the paper, the FC layers appear to be outside the model, but in the code the two FC layers are inside the model. I want to visualize a pretrained YOLOv4, which does not have two FC layers at the end. If I add two FC layers that are not pretrained, they are randomly initialized, so the visualization map of each layer changes from run to run. I don't know whether I understand this correctly; do you have any good suggestions? Looking forward to your reply!

If all values of score_saliency_map are negative, how can I solve this?

Hi, I'm interested in Score-CAM and I'm using it.

I use Score-CAM with the PyTorch model vgg16_bn on ImageNet validation images. Most of the values in the activation maps are negative.

So score_saliency_map has all negative values, and F.relu returns all zeros.

My question is

(1) If score_saliency_map has all negative values, does that mean I cannot use Score-CAM?

(2) Is there any solution?
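One possible cause, stated as an assumption: in vgg16_bn every Conv2d in features is followed by BatchNorm2d and then ReLU, so if the hook is attached to a Conv2d or BatchNorm2d output rather than to the ReLU after it, the activation maps can be mostly negative. The repository weights each upsampled map by a post-softmax score, which is non-negative, so a sum of maps that is negative everywhere stays negative and the final ReLU zeroes it. Maps taken after the ReLU are non-negative, so their weighted sum cannot be all negative. A toy illustration with flat lists and made-up numbers:

```python
def weighted_sum(maps, weights):
    """Pixel-wise sum of weight * map over channels (maps as flat lists)."""
    return [sum(w * m[p] for w, m in zip(weights, maps)) for p in range(len(maps[0]))]


def relu(values):
    return [max(0.0, v) for v in values]


# hypothetical pre-ReLU activations (e.g. a BatchNorm2d output), mostly negative
pre_relu_maps = [[-0.5, 0.2, -0.3], [-0.8, -1.0, 0.1]]
# post-softmax scores used as weights are >= 0
weights = [0.7, 0.3]

print(relu(weighted_sum(pre_relu_maps, weights)))  # [0.0, 0.0, 0.0]

# the same maps after the network's ReLU are non-negative
post_relu_maps = [relu(m) for m in pre_relu_maps]
print(relu(weighted_sum(post_relu_maps, weights)))
```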

scorecam of batch size more than 1

How do you generate CAMs when batch_size is greater than one? Your implementation has only one loop, which iterates over the activation channels; there is no loop over the instances when batch_size is greater than one. It returns only one map, so I assume it does not return multiple CAMs when multiple images are given as input.

Score-CAM when the output function is sigmoid

Hello, I am trying to visualize the last convolutional layer of my model, whose final layer is a sigmoid function. In this case, when the masked input is fed to the model, should we apply the softmax function to the sigmoid output and multiply the result with the activation map (this calculation results in a vector of all ones)? Or should we just multiply the activation map by the sigmoid output?

Sincerely,
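The all-ones observation can be checked directly: a softmax over a single output is always 1, so every channel would get the same score and the map would reduce to an unweighted sum of activation maps. For a single-output sigmoid model, one option (an assumption, not something the authors have confirmed) is to use the sigmoid probability of the masked input itself as the channel score, since it already lies in [0, 1]. The logits below are made up:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# hypothetical logits of a single-output model on three masked inputs
masked_logits = [2.0, -1.0, 0.5]

# softmax over one output is always [1.0]: every channel gets the same score
print([softmax([z])[0] for z in masked_logits])  # [1.0, 1.0, 1.0]

# sigmoid probabilities keep the differences between channels
print([round(sigmoid(z), 3) for z in masked_logits])
```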

Is this acceleration reasonable?

I found that some of the activation maps are nearly all zeros, so I want to skip computing them.

# shortcut ratio: keep only the top 10% of channels
self.top_percent = 0.1
# (with 10% of the channels I get the same visual quality)

# use the activations as masks
activations = self.activations

# drop near-zero activation maps: rank channels by mean activation and keep the top ones as sub-masks
top_count = max(1, int(self.top_percent * activations.shape[1]))
channel_scores = activations.mean(dim=(2, 3))[0]  # shape (k,), assumes batch size 1
top_indices = channel_scores.argsort(descending=True)[:top_count]
sub_masks = activations[:, top_indices]  # only these channels are computed

Does it work for RNN models?

Does it work for an RNN+CNN model whose task is text classification?

How to visualize my own model?

Modifying images before evaluation using Average Drop/Increase in Confidence

Hi, I have some questions regarding the Average Drop/Increase in Confidence metric in section 4.2.

In this experiment, rather than do point-wise multiplication with the original generated saliency map, we slightly modify by limiting the number of positive pixels in the saliency map (50% of pixels of the image are muted in our experiment).

Do you select 50% of the pixels randomly for every image?
Is it 50% of the input image, or 50% of the overlaid image produced by the multiplication?
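One possible reading of the quoted sentence, stated as an assumption rather than the authors' procedure: keep the 50% of pixels with the highest saliency, set the saliency of the others to zero, and multiply the image by the resulting map. The sketch below does this for flat lists of pixel values; the helper name and the keep_ratio parameter are hypothetical.

```python
def mute_low_saliency(image, saliency, keep_ratio=0.5):
    """Zero the saliency of all but the top keep_ratio pixels, then mask the image."""
    num_keep = int(round(keep_ratio * len(saliency)))
    # indices of the pixels with the highest saliency values
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    keep = set(ranked[:num_keep])
    limited = [s if i in keep else 0.0 for i, s in enumerate(saliency)]
    # point-wise multiplication with the limited saliency map
    return [x * s for x, s in zip(image, limited)]


image = [0.2, 0.4, 0.6, 0.8]
saliency = [0.9, 0.1, 0.7, 0.3]
print(mute_low_saliency(image, saliency))
```

Under this reading the 50% refers to the pixels of the saliency map (and hence of the image), chosen by saliency rank rather than at random.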

Coefficients of activation maps

Hi, I was looking at the computation of coefficients for the activation maps:

# how much increase if keeping the highlighted region
# prediction on masked input
output = self.model_arch(input * norm_saliency_map)
output = F.softmax(output, dim=1)
score = output[0][predicted_class]

score_saliency_map += score * saliency_map

In the paper (and in the comment), you refer to the increase in confidence, so the score should be computed as the difference between the score of the raw input and the score of the masked input. However, this implementation suggests that the score is just the one predicted on the masked input. Am I missing something?

Thank you
Nicole

ScoreCAM paper Algorithm1 implementation question

Hello,

I have a question about the implementation of Algorithm 1 in the Score-CAM paper.
The code

# how much increase if keeping the highlighted region
# prediction on masked input
output = self.model_arch(input * norm_saliency_map)
output = F.softmax(output, dim=1)
score = output[0][predicted_class]

suggests that the score is simply the output of the original neural net on the masked image. However, the paper has an additional step:
$S^{c} = f^c(M) - f^c(X_b)$.

I am not sure exactly why this step is needed in the first place, but since it is in the paper, I am curious why it does not seem to be in the code?

Thank you.
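A short check that may help with the last question, assuming a single baseline image so that f^c(X_b) is the same constant for every channel: if the channel scores are then passed through a softmax, as in Algorithm 1, subtracting that constant does not change the weights, because softmax(s - b) = softmax(s). Without a channel-wise softmax, the baseline term would instead shift every weight by the same amount. The scores below are made up:

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


masked_scores = [0.62, 0.15, 0.33]  # hypothetical f^c(M) for three channels
baseline = 0.1                       # hypothetical f^c(X_b), shared by all channels

with_baseline = softmax([s - baseline for s in masked_scores])
without_baseline = softmax(masked_scores)

# identical up to floating-point rounding
print(all(abs(a - b) < 1e-12 for a, b in zip(with_baseline, without_baseline)))  # True
```

This shift invariance holds only once the weights are normalized across channels; the repository code as quoted uses the raw masked scores, where dropping the baseline term does change the map.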

Object detection and Segmentation module

@haofanwang @haofanw Hi, thanks for open-sourcing the code. Can we use ScoreCAM for models such as Mask R-CNN, DeepLab, Faster R-CNN, YOLOv2, and RetinaNet?

Normalization Operation

Hi haofanwang, the author of Score-CAM,

I'm a student learning about the interpretation of CNNs, and something about Score-CAM confuses me.
Here is a paragraph from the paper "Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks":

3.2. Normalization on Score
Each forward passing in neural network is independent, the score amplitude of each forward propagation is unpredictable and not fixed. The relative output value (post-softmax) after normalization is more reasonable to measure the relevance than absolute output value (pre-softmax).
Thus, in Score-CAM, we represent weight as post-softmax value, so that the score can be rescaled into a fixed range.
...
Normalization operation equips Score-CAM with good class discrimination ability.

What exactly is the normalization operation?
After reading it, I have two interpretations.

Normalization on logits. VGG16 in PyTorch outputs logits that can include negative elements. The probabilities (probs) are the logits after the softmax function, and the prob of class c is the normalized score.
This idea comes from replacing the score function: without normalization the score function outputs a logit, otherwise a prob.
Normalization on scores. The scores (CIC) of all channels are stored in the tensor scores and act as weights. As written in Algorithm 1, the scores are passed through softmax in place so that they sum to one.

Are both operations applied?
Which of them improves the discrimination power?

BTW, when both were applied, the results were worse.
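A small numeric sketch of the two normalizations, which may also hint at why applying both gives worse maps (an assumption, not an answer from the authors). Normalization on logits turns each masked forward pass's logits into probabilities, so every channel score lies in [0, 1]. If a softmax across channels is then applied to those probabilities, the weights can differ by at most a factor of e (about 2.72), because exp(a) / exp(b) <= e when 0 <= a, b <= 1, so the weights become nearly uniform and the map loses contrast. The logits and the target index are made up:

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


TARGET = 0  # hypothetical target class index

# hypothetical logits of three masked forward passes (one per channel), 3 classes each
masked_logits = [
    [6.0, 1.0, 0.5],   # channel keeps the object: high target logit
    [0.2, 2.5, 1.0],
    [-1.0, 0.3, 3.0],  # channel hides the object
]

# normalization on logits: class-wise softmax within each forward pass
probs = [softmax(logits)[TARGET] for logits in masked_logits]

# normalization on scores: softmax of those probabilities across channels
weights = softmax(probs)

print([round(p, 3) for p in probs])
print([round(w, 3) for w in weights])
print(max(weights) / min(weights) <= math.e)  # True
```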