
pris-cv / mutual-channel-loss

253 stars · 7 watchers · 43 forks · 378 KB

Code release for The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification (TIP 2020)

License: MIT License

Python 100.00%
mutual-channel-loss fine-grained-classification fine-grained fine-grained-recognition fine-grained-visual-categorization

mutual-channel-loss's People

Contributors

dongliangchang


mutual-channel-loss's Issues

Training on my own dataset

Hello, I am training my own model with your network. Since my dataset has only 10 classes, should cnum stay at 3?
I changed the number of channels to 30 with a Conv2d layer, but the results are not very good so far. Should I adjust the number of channels, or increase cnum?
Are there any other parameters that need tuning?
foo = [1] * 2 + [0] * 1              # per-class channel mask: keep 2 of the 3 channels (cnum = 3)
bar = []
for i in range(10):                  # 10 classes instead of 200
    random.shuffle(foo)
    bar += foo
bar = [bar for i in range(nb_batch)]
bar = np.array(bar).astype("float32")
bar = bar.reshape(nb_batch, 10 * channels, 1, 1)

Question about Ldis in code

Wonderful work! But I have a question about Ldis. In your paper, I see that specific channels belong to a class (e.g., channels 1 to 3 belong to class 1), but how is this realized in the code? You don't use the targets when calculating the mask.

Hi, it's me again; this time the question is about discriminative regions

[Screenshot from 2020-09-14 17-23-27]
After training the MC-Loss network from a pretrained ResNet-50 (using the reproduction link you recommended in the issues), I got a final model with an accuracy of 86.6, possibly because my batch size is only 32.
But that is not my question. As shown in the screenshot above, when I visualize the feature maps I find that nearly half of them put their attention on background information. My visualization code should be correct. Have you tried to solve this kind of problem? Or is something wrong with my training result, or with my code?

heat = he.data.cpu().numpy()    # heat: output feature maps of the last conv layer
print('heat', heat.shape)
heat = np.squeeze(heat, 0)
heat = heat[52 * 10 + 9, :]     # select a single feature-map channel (index 52*10+9)
# heat = heat[2:3, :]
heatmap = np.maximum(heat, 0)   # clamp negative responses to 0
# heatmap = np.mean(heatmap, axis=0)
heatmap /= np.max(heatmap)
# plt.matshow(heatmap)
# plt.show()

# Load the original image with cv2
img = cv2.imread('/home/jim/data/FGVC/CUB_200_2011/images/053.Western_Grebe/Western_Grebe_0007_36074.jpg')
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))  # resize the heatmap to the image size
heatmap = np.uint8(255 * heatmap)                            # convert the heatmap to uint8
print(type(heatmap))

heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_HSV)       # apply a color map to the heatmap
heat_img = cv2.addWeighted(img, 1, heatmap, 0.5, 0)          # overlay the heatmap on the original image
superimposed_img = heatmap * 0.3 + img * 0.7                 # 0.3 is the heatmap intensity factor
cv2.imwrite('./self/heat_9.jpg', heat_img)                   # save the overlay to disk
cv2.imshow('heat_11.jpg', heat_img)
cv2.waitKey()

Can this idea transfer to a recognition task with many categories?

Hi! I have reproduced your paper and think it is a wonderful idea to make the network learn diverse features. My question: fine-grained recognition usually has a small number of categories (e.g., 200 classes), but how should the method handle a task with a large number of categories (e.g., 5000 classes)? And does it still work if I make a single channel represent more than one category?

Interfacing with GradCAM?

Is it possible to update the documentation or provide some instructions on how we can reproduce Grad-CAM-like visualizations with the provided code?

about the cnum

Hello! In CUB-200-2011.py, line 198, you set cnum = 3, which is not equal to the number of categories and does not match what your paper says. Could you please tell me why?

dataset

How do I process the downloaded dataset to get the train and test splits?
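For reference, a minimal sketch of one way to build such splits, assuming the standard CUB-200-2011 archive layout (an images/ folder plus images.txt and train_test_split.txt); the root path and destination folder names are illustrative, not part of this repository's scripts:

import os
import shutil

root = '/path/to/CUB_200_2011'   # hypothetical path to the extracted archive

# images.txt maps image_id -> relative path; train_test_split.txt maps image_id -> 1 (train) / 0 (test).
with open(os.path.join(root, 'images.txt')) as f:
    id_to_path = dict(line.split() for line in f)
with open(os.path.join(root, 'train_test_split.txt')) as f:
    id_to_split = dict(line.split() for line in f)

# Copy every image into train/<class_folder>/ or test/<class_folder>/.
for img_id, rel_path in id_to_path.items():
    split = 'train' if id_to_split[img_id] == '1' else 'test'
    dst_dir = os.path.join(root, split, os.path.dirname(rel_path))
    os.makedirs(dst_dir, exist_ok=True)
    shutil.copy(os.path.join(root, 'images', rel_path), dst_dir)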

Question about setting ξ

Hello. In the paper I saw that when you use the pretrained VGG16 model, ξ is set to two different values in order to match the output channels of the last convolutional layer, while when training from scratch ξ is set to a single value for all channels.
My questions are:

  1. Have you tried using the pretrained model and fine-tuning the last convolutional layer together with the classification layer, for example setting the conv layer's output channels to 600 so that ξ = 3 for all classes? How does that work?
  2. In your code, VGG's _make_layers contains the line layers += [nn.AvgPool2d(kernel_size=1, stride=1)]. An average pooling with kernel size and stride both equal to 1 seems to have no effect; what is its purpose? (See the sketch below.)
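As an aside, a quick standalone check (not from the repository) supports the observation in question 2: an AvgPool2d with kernel size and stride both 1 averages each value over a 1×1 window, so it returns the input unchanged.

import torch
import torch.nn as nn

x = torch.randn(2, 512, 14, 14)
pool = nn.AvgPool2d(kernel_size=1, stride=1)
print(torch.allclose(pool(x), x))  # True: the layer acts as an identity on the values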

About Channel-Wise Attention(CWA) in the code.

Great work! But I have a little question about CWA. In the original paper, I see M_i = diag(Mask_i), where diag puts a vector on the principal diagonal of a diagonal matrix. But in the code below:

foo = [1] * 2 + [0] *  1
bar = []
for i in range(200):
    random.shuffle(foo)
    bar += foo
bar = [bar for i in range(nb_batch)]

I think bar is not a diagonal matrix. Please point out my problem if I misunderstood the operation here. Thanks a lot.
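For context, a small standalone check (an illustration, not repository code) of the relation being discussed: multiplying the channels element-wise by a 0/1 mask vector gives the same result as applying diag(mask) along the channel dimension, so a broadcasted product can realize M_i = diag(Mask_i) without materializing the diagonal matrix.

import torch

N, H, W = 6, 4, 4
x = torch.randn(N, H, W)                       # feature maps for one sample
mask = torch.tensor([1., 1., 0., 1., 0., 1.])  # a 0/1 channel mask

elementwise = x * mask.view(N, 1, 1)           # broadcasted product over H and W
diag_applied = (torch.diag(mask) @ x.reshape(N, H * W)).reshape(N, H, W)

print(torch.allclose(elementwise, diag_applied))  # True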

About validation set

Hi,

I recently did some research on fine-grained classification, and I found that you mentioned you did not split off a validation set on CUB_200_2011 in your paper. Does that mean that when I run experiments on CUB_200_2011, I can simply use the test set as the validation set and take the best validation performance as my final result?

Hope to get help from you, since I can't find descriptions about the validation set in other existing papers.

Thank you.

Kind regards.

train

When I train the model, there is a problem:
[screenshot of the error]
Please help me!

Does MC-Loss need more training epochs?

Hi,
I added MC-Loss to the MnasNet (https://arxiv.org/abs/1807.11626v3) network and trained it on a custom fine-grained dataset.
The total number of epochs is 15; the initial LR is 2e-2 and the final LR is 1e-5, using a cosine LR scheduler.
But the validation accuracy at epoch 15 is 0.62, while the original MnasNet implementation reaches 0.78 validation accuracy at epoch 15.

The paper trains for 300 epochs; is this the cause?
(My GPU is slow, so I want to experiment with fewer epochs to determine whether MC-Loss performs well on this dataset.)

Thanks for your great work!

Is it possible to use MaxPool1D instead of MaxPool2D

It's a very interesting paper, and many thanks!

In the code, in the supervisor function, in order to perform the CCMP for Ldiv, you:

  1. reshaped the branch from (batch, 512, 17*17) to (batch, 512, 17, 17)
  2. used maxpool2d with kernel & stride (1, 3) to find the max value along the "channel" dimension
  3. converted from (batch, 512, 17, 17) back to (batch, 512, 17*17)

and the code is:

branch = branch.reshape(branch.size(0),branch.size(1), x.size(2), x.size(2))
branch = my_MaxPool2d(kernel_size=(1,cnum), stride=(1,cnum))(branch)  
branch = branch.reshape(branch.size(0),branch.size(1), branch.size(2) * branch.size(3))

I am wondering if it is possible to use MaxPool1D instead of MaxPool2D, to avoid those two reshape steps (see the sketch after this list), like:

  1. use maxpool1d directly on (batch, 512, 17*17), that is,
    a. transpose(2,1)
    b. MaxPool1D (3)
    c. transpose(1,2)

In order to save some instructions during training :)
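For what it's worth, here is a standalone sketch of the MaxPool1d route proposed above (assuming the channel count is divisible by cnum; this is not the repository's my_MaxPool2d), compared against an explicit group-wise max as a reference:

import torch
import torch.nn as nn

batch, channels, hw, cnum = 2, 600, 17 * 17, 3
branch = torch.softmax(torch.randn(batch, channels, hw), dim=2)

# MaxPool1d route: move the channel axis last, pool over it, move it back.
pooled = nn.MaxPool1d(kernel_size=cnum, stride=cnum)(branch.transpose(2, 1)).transpose(2, 1)

# Reference: explicit max over every group of cnum consecutive channels.
reference = branch.reshape(batch, channels // cnum, cnum, hw).max(dim=2).values

print(torch.allclose(pooled, reference))  # True: both perform the same cross-channel max pooling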

Codes for training with ResNet18

Hi authors:
Could you release the code for training with ResNet18 (from scratch)? I find it hard to obtain the reported score (45.7 with the CE loss) when training with the hyperparameters provided in your paper. Maybe I am missing some critical detail.

About λ (beta_1)

Great work! But I have a little question about λ.
In your code:
loss = ce_loss + args["alpha_1"] * MC_loss[0] + args["beta_1"] * MC_loss[1]
And in your paper:
[screenshot of the loss formula from the paper]

So should I set beta_1 to -10 in the code?
Please point out my problem if I misunderstood the operation here. Thanks a lot.

Reproducing the results

Hello,

thanks again for your interesting work.
I however, struggle to reproduce your reported results on CUB200 with the Resnet50 backbone (87.3 %).
If I understand Kurumi233/Mutual-Channel-Loss#1 correctly, then you used entirely different hyperparameters than those mentioned in the paper?
I also saw that you deviated (as mentioned here: #21) from the formulas of the paper, which affects the hyperparameters.
Could you point me to the correct combination of hyperparameters and code / formulas to reproduce your results?
I appreciate your help.

Greetings

Small questions about parameter settings

def supervisor(x, targets, height, cnum):
    mask = Mask(x.size(0), cnum)
    branch = x
    branch = branch.reshape(branch.size(0), branch.size(1), branch.size(2) * branch.size(3))
    branch = F.softmax(branch, 2)
    branch = branch.reshape(branch.size(0), branch.size(1), x.size(2), x.size(2))
    branch = my_MaxPool2d(kernel_size=(1, cnum), stride=(1, cnum))(branch)
    branch = branch.reshape(branch.size(0), branch.size(1), branch.size(2) * branch.size(3))
    loss_2 = 1.0 - 1.0 * torch.mean(torch.sum(branch, 2)) / cnum  # set margin = 3.0

    branch_1 = x * mask

    branch_1 = my_MaxPool2d(kernel_size=(1, cnum), stride=(1, cnum))(branch_1)
    branch_1 = nn.AvgPool2d(kernel_size=(height, height))(branch_1)
    branch_1 = branch_1.view(branch_1.size(0), -1)

    loss_1 = criterion(branch_1, targets)

    return [loss_1, loss_2]

What does cnum mean? Does cnum mean the number of feature channels, or the number of classes in the dataset?
If I want to input data like [batch, channel, W, H] = [?, 32, 1, 40], how should I set this parameter cnum?
def Mask(nb_batch, channels):
    foo = [1] * 2 + [0] * 1
    bar = []
    for i in range(200):
        random.shuffle(foo)
        bar += foo
    bar = [bar for i in range(nb_batch)]
    bar = np.array(bar).astype("float32")
    bar = bar.reshape(nb_batch, 200 * channels, 1, 1)
    bar = torch.from_numpy(bar)
    bar = bar.cuda()
    bar = Variable(bar)
    return bar

Can you explain the meaning of the "200" in the above code?
Looking forward to your answer!
Thank you very much!

Question about Ldiv code

Brilliant idea!

I am trying to re-implement the code using TensorFlow, and I have some questions about the Ldiv part:

loss_2 = 1.0 - 1.0*torch.mean(torch.sum(branch,2))/cnum # set margin = 3.0

  1. What does the "margin" mean here?
  2. Since you are using torch.mean, why divide by cnum again?
  3. You are using 1 - Ldiv, which makes the loss function become Lmc = Ldis + lambda - lambda * Ldiv.
    This does not seem to match the paper's description Lmc = Ldis - lambda * Ldiv (see the check below).

Thank you very much for any response!
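As a side note on question 3, a standalone check (illustrative, not repository code) of the arithmetic involved: the extra "+ lambda" term is a constant with respect to the network, so it shifts the loss value but leaves the gradients, and hence the parameter updates, unchanged.

import torch

lam = 10.0
branch = torch.randn(4, 600, 17 * 17, requires_grad=True)
ldiv = branch.square().mean()   # any differentiable stand-in for the Ldiv term

grad_a = torch.autograd.grad(lam * (1.0 - ldiv), branch, retain_graph=True)[0]
grad_b = torch.autograd.grad(-lam * ldiv, branch)[0]
print(torch.allclose(grad_a, grad_b))  # True: the two forms differ only by a constant offset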

Confusion about the feature channels and the FC layer

Hello. I recently trained your network on the bird and car datasets. Without a pretrained model it reached accuracies of 67.33 (65.98 in the paper) and 90.34 (90.85 in the paper) respectively, which verifies the effectiveness of the model. But while looking at the model structure today, I got confused.
Among the feature channels, N = ξ * C. According to the paper, by indexing, the ξ feature maps in a group represent the features of one class. But when the feature channels pass through the fully connected layer, the channels are shuffled and fused together, so by the time classification happens the notion of groups has disappeared and the feature maps are no longer tied to specific classes. Could you tell me where my understanding goes wrong?

Soft Channel labels

Hi,

thanks for your very interesting work. I was wondering whether you could release your code for the soft channel labels.
Also, in #15 you mentioned that it is not very good, even though the reported accuracy was increased. Could you explain the drawbacks of that method?

Greetings
