
pris-cv / mutual-channel-loss

253 stars · 7 watchers · 43 forks · 378 KB

Code release for The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification (TIP 2020)

License: MIT License

Python 100.00%
mutual-channel-loss fine-grained-classification fine-grained fine-grained-recognition fine-grained-visual-categorization

mutual-channel-loss's People

Contributors

dongliangchang


mutual-channel-loss's Issues

Training on my own dataset

Hello, I am training my own model with your network. Since my dataset has only 10 classes, should cnum stay at 3?
I changed the number of channels to 30 with a Conv2d layer, but the results are not very good so far. Should I adjust the number of channels, or increase cnum?
Are there any other parameters that need tuning?
foo = [1] * 2 + [0] * 1              # per-class channel mask: keep 2 of the 3 channels (cnum = 3)
bar = []
for i in range(10):                  # 10 classes instead of 200
    random.shuffle(foo)
    bar += foo
bar = [bar for i in range(nb_batch)]
bar = np.array(bar).astype("float32")
bar = bar.reshape(nb_batch, 10 * channels, 1, 1)

Question about Ldis in code

Wonderful work! But I have a question about Ldis. In your paper, I see that specific channels belong to a class (e.g., channels 1 to 3 belong to class 1), but how is this realized in the code? You don't use the targets when calculating the mask.

Hi, it's me again; this time the question is about discriminative regions

[Screenshot from 2020-09-14 17-23-27]
After training the MC-Loss network from a pretrained ResNet-50 (using the reproduction link you recommended in the issues), I got a final model with an accuracy of 86.6, possibly because my batch size is only 32.
But that is not my question. As shown in the screenshot above, when I visualize the feature maps I find that nearly half of them put their attention on background information. My visualization code should be correct. Have you tried to solve this kind of problem? Or is something wrong with my training result, or with my code?

heat = he.data.cpu().numpy()    # heat: output feature maps of the last conv layer
print('heat', heat.shape)
heat = np.squeeze(heat, 0)
heat = heat[52 * 10 + 9, :]     # select a single feature-map channel (index 52*10+9)
# heat = heat[2:3, :]
heatmap = np.maximum(heat, 0)   # clamp negative responses to 0
# heatmap = np.mean(heatmap, axis=0)
heatmap /= np.max(heatmap)
# plt.matshow(heatmap)
# plt.show()

# Load the original image with cv2
img = cv2.imread('/home/jim/data/FGVC/CUB_200_2011/images/053.Western_Grebe/Western_Grebe_0007_36074.jpg')
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))  # resize the heatmap to the image size
heatmap = np.uint8(255 * heatmap)                            # convert the heatmap to uint8
print(type(heatmap))

heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_HSV)       # apply a color map to the heatmap
heat_img = cv2.addWeighted(img, 1, heatmap, 0.5, 0)          # overlay the heatmap on the original image
superimposed_img = heatmap * 0.3 + img * 0.7                 # 0.3 is the heatmap intensity factor
cv2.imwrite('./self/heat_9.jpg', heat_img)                   # save the overlay to disk
cv2.imshow('heat_11.jpg', heat_img)
cv2.waitKey()

Can this idea transfer to a recognition task with many categories?

Hi! I have reproduced your paper and think it is a wonderful idea to make the network learn diverse features. My question: fine-grained recognition usually has a small number of categories (e.g., 200 classes), but how should the method handle a task with a large number of categories (e.g., 5000 classes)? And does it still work if I make a single channel represent more than one category?

Interfacing with GradCAM?

Is it possible to update the documentation or provide some instructions on how we can reproduce Grad-CAM-like visualizations with the provided code?

about the cnum

Hello! In CUB-200-2011.py, line 198, you set cnum = 3, which is not equal to the number of categories and does not match what your paper says. Could you please tell me why?

dataset

How do I process the downloaded dataset to get the train and test splits?
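For reference, a minimal sketch of one way to build such splits, assuming the standard CUB-200-2011 archive layout (an images/ folder plus images.txt and train_test_split.txt); the root path and destination folder names are illustrative, not part of this repository's scripts:

import os
import shutil

root = '/path/to/CUB_200_2011'   # hypothetical path to the extracted archive

# images.txt maps image_id -> relative path; train_test_split.txt maps image_id -> 1 (train) / 0 (test).
with open(os.path.join(root, 'images.txt')) as f:
    id_to_path = dict(line.split() for line in f)
with open(os.path.join(root, 'train_test_split.txt')) as f:
    id_to_split = dict(line.split() for line in f)

# Copy every image into train/<class_folder>/ or test/<class_folder>/.
for img_id, rel_path in id_to_path.items():
    split = 'train' if id_to_split[img_id] == '1' else 'test'
    dst_dir = os.path.join(root, split, os.path.dirname(rel_path))
    os.makedirs(dst_dir, exist_ok=True)
    shutil.copy(os.path.join(root, 'images', rel_path), dst_dir)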

Question about setting ξ

Hello. In the paper I saw that when you use the pretrained VGG16 model, ξ is set to two different values in order to match the output channels of the last convolutional layer, while when training from scratch ξ is set to a single value for all channels.
My questions are:

  1. Have you tried using the pretrained model and fine-tuning the last convolutional layer together with the classification layer, for example setting the conv layer's output channels to 600 so that ξ = 3 for all classes? How does that work?
  2. In your code, VGG's _make_layers contains the line layers += [nn.AvgPool2d(kernel_size=1, stride=1)]. An average pooling with kernel size and stride both equal to 1 seems to have no effect; what is its purpose? (See the sketch below.)
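As an aside, a quick standalone check (not from the repository) supports the observation in question 2: an AvgPool2d with kernel size and stride both 1 averages each value over a 1×1 window, so it returns the input unchanged.

import torch
import torch.nn as nn

x = torch.randn(2, 512, 14, 14)
pool = nn.AvgPool2d(kernel_size=1, stride=1)
print(torch.allclose(pool(x), x))  # True: the layer acts as an identity on the values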

About Channel-Wise Attention(CWA) in the code.

Great work! But I have a little question about CWA. In the original paper, I see M_i = diag(Mask_i), where diag puts a vector on the principal diagonal of a diagonal matrix. But in the code below:

foo = [1] * 2 + [0] *  1
bar = []
for i in range(200):
    random.shuffle(foo)
    bar += foo
bar = [bar for i in range(nb_batch)]

I think bar is not a diagonal matrix. Please point out my problem if I misunderstood the operation here. Thanks a lot.
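For context, a small standalone check (an illustration, not repository code) of the relation being discussed: multiplying the channels element-wise by a 0/1 mask vector gives the same result as applying diag(mask) along the channel dimension, so a broadcasted product can realize M_i = diag(Mask_i) without materializing the diagonal matrix.

import torch

N, H, W = 6, 4, 4
x = torch.randn(N, H, W)                       # feature maps for one sample
mask = torch.tensor([1., 1., 0., 1., 0., 1.])  # a 0/1 channel mask

elementwise = x * mask.view(N, 1, 1)           # broadcasted product over H and W
diag_applied = (torch.diag(mask) @ x.reshape(N, H * W)).reshape(N, H, W)

print(torch.allclose(elementwise, diag_applied))  # True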

About validation set

Hi,

I recently did some research on fine-grained classification, and I found that you mentioned you did not split off a validation set on CUB_200_2011 in your paper. Does that mean that when I run experiments on CUB_200_2011, I can simply use the test set as the validation set and take the best validation performance as my final result?

Hope to get help from you, since I can't find descriptions about the validation set in other existing papers.

Thank you.

Kind regards.

train

When I train the model, there is a problem:
[screenshot of the error]
Please help me!

Does MC-Loss need more training epochs?

Hi,
I added MC-Loss to the MnasNet (https://arxiv.org/abs/1807.11626v3) network and trained it on a custom fine-grained dataset.
The total number of epochs is 15; the initial LR is 2e-2 and the final LR is 1e-5, using a cosine LR scheduler.
But the validation accuracy at epoch 15 is 0.62, while the original MnasNet implementation reaches 0.78 validation accuracy at epoch 15.

The paper trains for 300 epochs; is this the cause?
(My GPU is slow, so I want to experiment with fewer epochs to determine whether MC-Loss performs well on this dataset.)

Thanks for your great work!

Is it possible to use MaxPool1D instead of MaxPool2D

It's a very interesting paper, and many thanks!

In the code, in the supervisor function, in order to perform the CCMP for Ldiv, you:

  1. reshaped the branch from (batch, 512, 17*17) to (batch, 512, 17, 17)
  2. used maxpool2d with kernel & stride (1, 3) to find the max value along the "channel" dimension
  3. converted from (batch, 512, 17, 17) back to (batch, 512, 17*17)

and the code is:

branch = branch.reshape(branch.size(0),branch.size(1), x.size(2), x.size(2))
branch = my_MaxPool2d(kernel_size=(1,cnum), stride=(1,cnum))(branch)  
branch = branch.reshape(branch.size(0),branch.size(1), branch.size(2) * branch.size(3))

I am wondering if it is possible to use MaxPool1D instead of MaxPool2D, to avoid those two reshape steps (see the sketch after this list), like:

  1. use maxpool1d directly on (batch, 512, 17*17), that is,
    a. transpose(2,1)
    b. MaxPool1D (3)
    c. transpose(1,2)

In order to save some instructions during training :)
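For what it's worth, here is a standalone sketch of the MaxPool1d route proposed above (assuming the channel count is divisible by cnum; this is not the repository's my_MaxPool2d), compared against an explicit group-wise max as a reference:

import torch
import torch.nn as nn

batch, channels, hw, cnum = 2, 600, 17 * 17, 3
branch = torch.softmax(torch.randn(batch, channels, hw), dim=2)

# MaxPool1d route: move the channel axis last, pool over it, move it back.
pooled = nn.MaxPool1d(kernel_size=cnum, stride=cnum)(branch.transpose(2, 1)).transpose(2, 1)

# Reference: explicit max over every group of cnum consecutive channels.
reference = branch.reshape(batch, channels // cnum, cnum, hw).max(dim=2).values

print(torch.allclose(pooled, reference))  # True: both perform the same cross-channel max pooling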

Codes for training with ResNet18

Hi authors:
Could you release the code for training with ResNet18 (from scratch)? I find it hard to obtain the reported score (45.7 with the CE loss) when training with the hyperparameters provided in your paper. Maybe I am missing some critical detail.

About λ (beta_1)

Great work! But I have a little question about λ.
In your code:
loss = ce_loss + args["alpha_1"] * MC_loss[0] + args["beta_1"] * MC_loss[1]
And in your paper:
[screenshot of the loss formula from the paper]

So should I set beta_1 to -10 in the code?
Please point out my problem if I misunderstood the operation here. Thanks a lot.

Reproducing the results

Hello,

thanks again for your interesting work.
I however, struggle to reproduce your reported results on CUB200 with the Resnet50 backbone (87.3 %).
If I understand Kurumi233/Mutual-Channel-Loss#1 correctly, then you used entirely different hyperparameters than those mentioned in the paper?
I also saw that you deviated (as mentioned here: #21) from the formulas of the paper, which affects the hyperparameters.
Could you point me to the correct combination of hyperparameters and code / formulas to reproduce your results?
I appreciate your help.

Greetings

Small questions about parameter settings

def supervisor(x, targets, height, cnum):
    mask = Mask(x.size(0), cnum)
    branch = x
    branch = branch.reshape(branch.size(0), branch.size(1), branch.size(2) * branch.size(3))
    branch = F.softmax(branch, 2)
    branch = branch.reshape(branch.size(0), branch.size(1), x.size(2), x.size(2))
    branch = my_MaxPool2d(kernel_size=(1, cnum), stride=(1, cnum))(branch)
    branch = branch.reshape(branch.size(0), branch.size(1), branch.size(2) * branch.size(3))
    loss_2 = 1.0 - 1.0 * torch.mean(torch.sum(branch, 2)) / cnum  # set margin = 3.0

    branch_1 = x * mask

    branch_1 = my_MaxPool2d(kernel_size=(1, cnum), stride=(1, cnum))(branch_1)
    branch_1 = nn.AvgPool2d(kernel_size=(height, height))(branch_1)
    branch_1 = branch_1.view(branch_1.size(0), -1)

    loss_1 = criterion(branch_1, targets)

    return [loss_1, loss_2]

What does cnum mean? Does cnum mean the number of feature channels, or the number of classes in the dataset?
If I want to input data like [batch, channel, W, H] = [?, 32, 1, 40], how should I set this parameter cnum?
def Mask(nb_batch, channels):
    foo = [1] * 2 + [0] * 1
    bar = []
    for i in range(200):
        random.shuffle(foo)
        bar += foo
    bar = [bar for i in range(nb_batch)]
    bar = np.array(bar).astype("float32")
    bar = bar.reshape(nb_batch, 200 * channels, 1, 1)
    bar = torch.from_numpy(bar)
    bar = bar.cuda()
    bar = Variable(bar)
    return bar

Can you explain the meaning of the "200" in the above code?
Looking forward to your answer!
Thank you very much!

Question about Ldiv code

Brilliant idea!

I am trying to re-implement the code using TensorFlow, and I have some questions about the Ldiv part:

loss_2 = 1.0 - 1.0*torch.mean(torch.sum(branch,2))/cnum # set margin = 3.0

  1. What does the "margin" mean here?
  2. Since you are using torch.mean, why divide by cnum again?
  3. You are using 1 - Ldiv, which makes the loss function become Lmc = Ldis + lambda - lambda * Ldiv.
    This does not seem to match the paper's description Lmc = Ldis - lambda * Ldiv (see the check below).

Thank you very much for any response!
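As a side note on question 3, a standalone check (illustrative, not repository code) of the arithmetic involved: the extra "+ lambda" term is a constant with respect to the network, so it shifts the loss value but leaves the gradients, and hence the parameter updates, unchanged.

import torch

lam = 10.0
branch = torch.randn(4, 600, 17 * 17, requires_grad=True)
ldiv = branch.square().mean()   # any differentiable stand-in for the Ldiv term

grad_a = torch.autograd.grad(lam * (1.0 - ldiv), branch, retain_graph=True)[0]
grad_b = torch.autograd.grad(-lam * ldiv, branch)[0]
print(torch.allclose(grad_a, grad_b))  # True: the two forms differ only by a constant offset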

Confusion about the feature channels and the FC layer

Hello. I recently trained your network on the bird and car datasets. Without a pretrained model it reached accuracies of 67.33 (65.98 in the paper) and 90.34 (90.85 in the paper) respectively, which verifies the effectiveness of the model. But while looking at the model structure today, I got confused.
Among the feature channels, N = ξ * C. According to the paper, by indexing, the ξ feature maps in a group represent the features of one class. But when the feature channels pass through the fully connected layer, the channels are shuffled and fused together, so by the time classification happens the notion of groups has disappeared and the feature maps are no longer tied to specific classes. Could you tell me where my understanding goes wrong?

Soft Channel labels

Hi,

thanks for your very interesting work. I was wondering whether you could release your code for the soft channel labels.
Also, in #15 you mentioned that it is not very good, even though the reported accuracy was increased. Could you explain the drawbacks of that method?

Greetings
