yimiandai / open-aff
Code and trained models for "Attentional Feature Fusion"
xo = 2 * x * wei + 2 * residual * (1 - wei)
Hi, I was trying to modify the AFF() class to support a newer version of Keras, but I am struggling with this error.
The modified AFF class:
```python
import tensorflow as tf
from tensorflow.keras import Input, Model, layers


class AFF(layers.Layer):
    """Multi-feature fusion (AFF) of two NHWC feature maps with the same shape."""

    def __init__(self, channels=64, r=4, **kwargs):
        super().__init__(**kwargs)
        inter_channels = channels // r
        # Local branch: 1 x 1 (point-wise) convolutions keep the H x W resolution.
        self.local_att = tf.keras.Sequential([
            layers.Conv2D(inter_channels, kernel_size=1, strides=1, padding='same'),
            # BatchNormalization's first positional argument is the axis,
            # not the channel count, so it takes no argument here.
            layers.BatchNormalization(),
            layers.ReLU(),
            layers.Conv2D(channels, kernel_size=1, strides=1, padding='same'),
            layers.BatchNormalization(),
        ])
        # Global branch: pool to 1 x 1 first (keepdims requires TF >= 2.6).
        self.global_att = tf.keras.Sequential([
            layers.GlobalAveragePooling2D(keepdims=True),
            layers.Conv2D(inter_channels, kernel_size=1, strides=1, padding='same'),
            layers.BatchNormalization(),
            layers.ReLU(),
            layers.Conv2D(channels, kernel_size=1, strides=1, padding='same'),
            layers.BatchNormalization(),
        ])

    def call(self, inputs):
        # Keras uses call(), not forward(); the two tensors arrive as a list.
        x, residual = inputs
        xa = x + residual
        xl = self.local_att(xa)
        xg = self.global_att(xa)  # (N, 1, 1, C), broadcast over H x W
        xlg = xl + xg
        wei = tf.sigmoid(xlg)
        xo = 2 * x * wei + 2 * residual * (1 - wei)
        return xo
```
The create model function:
```python
def create_model():
    tf.keras.backend.clear_session()
    inputs = Input(shape=(256, 256, 3), name="input_layer")
    print("Input =", inputs.shape)
    conv_block = Convolutional_block()(inputs)
    print("Conv block =", conv_block.shape)
    ca_block = Channel_attention()(conv_block)
    sa_block = SpatialGate()(conv_block)
    # AFF block instead of concatenate: it fuses two feature maps,
    # which are assumed here to have identical shapes.
    fused = AFF(channels=ca_block.shape[-1])([ca_block, sa_block])
    model = Model(inputs=[inputs], outputs=[fused])
    return model


model = create_model()
model.summary()
```
The input is an image of size 256 x 256 x 3.
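Features in NLP mostly have the same dimensions. Would this feature fusion scheme work even better there?
1. Which part of the code corresponds to applying AFF in FPN, as described in your paper? I could not find it.
2. For the Global + Local variant, on which branch did you add the global pooling, or does either of the two branches work?
I would appreciate your help. Thank you.
Hello, after adding the module you provide to my network, I get the following error. How can I fix it?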
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
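Hello, your AFF feature fusion seems to target 2D detection only. How can it be applied to 3D detection?
Hello, I would like to test with the models you uploaded. Could you also upload the corresponding test code?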
As the title. Thanks!
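Hello, is there a pretrained ResNet-18 model with this attention mechanism? Also, can this attention mechanism better capture low-level detail, as traditional image-processing operators do? Finally, can it be used plug-and-play without increasing the number of network parameters? Thank you, I look forward to your reply.
Hello, thank you very much for your work! I have a simple question: how did you produce the network visualizations?
Did you extract the final feature map, draw a color heat map according to the values, adjust its transparency, and overlay it on the original image?
Or did you use a different approach? Thank you.
Hi, thanks for your outstanding work. I am trying to add your AFF module to my model's decoder, but unfortunately I get this error. I don't know which value I should pass as residual, or how to solve this issue.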
Hint....
self.AFFBlock4 = AFF(512)
self.AFFBlock3 = AFF(256)
self.AFFBlock2 = AFF(128)
self.AFFBlock1 = AFF(64)
add with decoder....
d4 = self.decoder4(e4) + e3
d4 = self.AFFBlock4(d4)
d3 = self.decoder3(d4) + e2
d3 = self.AFFBlock3(d3)
d2 = e1 + F.upsample(self.decoder2(d3), (e1.size(2), e1.size(3)), mode='bilinear')
#d2 = self.decoder2(d3) + e1
d2 = self.AFFBlock2(d2)
d1 = self.decoder1(d2) + x
d1 = self.AFFBlock1(d1)
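The fusion formula quoted at the top of this page, xo = 2 * x * wei + 2 * residual * (1 - wei), takes two inputs, so the skip connection would presumably be passed as `residual` rather than added beforehand. A minimal sketch of the calling convention with scalar stand-ins; `ToyAFF` and its fixed weight are hypothetical and are not the repository's module:

```python
class ToyAFF:
    """Scalar stand-in for AFF: fuses two inputs with a fixed attention weight."""

    def __init__(self, wei=0.5):
        self.wei = wei  # a real AFF computes this weight from x + residual

    def __call__(self, x, residual):
        return 2 * x * self.wei + 2 * residual * (1 - self.wei)


aff_block4 = ToyAFF(wei=0.75)
decoder_out, e3 = 4.0, 2.0

# Adding first and calling aff_block4(decoder_out + e3) fails with a TypeError,
# because the second argument (residual) is missing.
d4 = aff_block4(decoder_out, e3)
print(d4)
```

In the decoder above, this would read `d4 = self.AFFBlock4(self.decoder4(e4), e3)`, provided both tensors have the same shape.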
Hello, I have recently been reproducing this work in PyTorch, but I ran into a problem.
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 64, 1, 1])
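The cause is that after GlobalAvgPooling every feature map has shape C x 1 x 1, and BN raises this error on a 1 x 1 feature map.
You can reproduce my problem with the following code: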
import torch
a = torch.randn(1, 64, 1, 1)
bn = torch.nn.BatchNorm2d(64)
bn(a)
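A sketch of why this input is rejected, assuming that BatchNorm in training mode normalizes each channel with the mean and variance of the N x H x W values in the current batch. With a 1 x 64 x 1 x 1 input each channel has exactly one value, so the variance is 0 and the normalized output would be 0 whatever the input was. `batchnorm_train` is a hypothetical helper for a single channel, not PyTorch code:

```python
def batchnorm_train(values, eps=1e-5):
    """Normalize the values of one channel with their own batch statistics."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased variance
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


single = batchnorm_train([7.3])      # one value per channel: information is lost
pair = batchnorm_train([1.0, 3.0])   # two values: a meaningful normalization
print(single, pair)
```

Under that reading, the usual workarounds are a batch size greater than 1 during training, or calling `.eval()` on the model so that BatchNorm uses its running statistics instead of batch statistics.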
I am interested in AFF-ResNets, and I evaluated the performance of AFF-ResNets with both this implementation and my own implementation.
Through the experiments, I found a bug.
In train_cifar.py, the CIFAR-100 dataset is loaded as follows:
train_data = gluon.data.DataLoader(
    gluon.data.vision.CIFAR100(train=True).transform_first(transform_train),
    batch_size=batch_size, shuffle=True, last_batch='discard', num_workers=num_workers)
However, according to the Gluon API reference, CIFAR-100 with the default settings has only 20 classes.
The option fine_label=True is required to compare the performance with other models on the CIFAR-100 classification task.
Why multiply by 2? Is it to prevent vanishing gradients when the values are too small? And why is the factor 2 not used in iAFF?
Hi @YimianDai, thanks for sharing your work and code.
I just want to quickly check why you multiply by 2 at the end of the module block.
Does it help you train the model, or is it a normalization parameter?
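One observation that may be relevant here (a reading of the formula, not a statement from the authors): when the attention logits are zero, the sigmoid gives wei = 0.5, and the factor 2 then makes the output equal to the plain sum x + residual. A minimal sketch with scalars standing in for feature values; `aff_fuse` is a hypothetical helper:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def aff_fuse(x, residual, logit):
    """Scalar version of xo = 2 * x * wei + 2 * residual * (1 - wei)."""
    wei = sigmoid(logit)
    return 2 * x * wei + 2 * residual * (1 - wei)


x, residual = 3.0, 5.0
neutral = aff_fuse(x, residual, logit=0.0)     # wei = 0.5
saturated = aff_fuse(x, residual, logit=50.0)  # wei is approximately 1
print(neutral, x + residual, saturated)
```

With a neutral weight the module behaves like ordinary addition, and with a saturated weight the output approaches 2 * x, so the factor keeps the output on roughly the same scale as x + residual.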
What does z' in Eq. (2) mean?
Hi Yimian, I came here from your WACV 2021 presentation. This work looks pretty impressive.
As we discussed during your presentation, could you share the inference time data for different configurations (ResNet-50, -101, ResNeXt-50, etc.) as well as the numbers for baseline models and the hardware details? Thank you!
Hi,
First, I think your paper is very interesting. Excellent work!
I was wondering whether you have results for training from scratch available.
All the available results are based on pretrained models that already have high accuracy (specifically, I am referring to the CIFAR-100 experiment).
Hello, I do not quite understand the initial feature integration described in your paper. Why is attention applied twice?
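Hello, I would like to ask how AFF or iAFF can be applied to FPN. Taking AFF as an example, X and Y are feature maps from different stages, and the paper says that Y is the high-level semantic one. For the ResNet layer with a downsampling factor of 32 (Y) and the layer with a downsampling factor of 16 (X), how can X + Y be computed? The two feature maps differ in spatial size and also in the number of channels.
Sorry to bother you. I look forward to your reply. Thank you!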
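The ASKCResNetFPN code further down this page resizes the deeper feature map to the spatial size of the shallower one with BilinearResize2D before calling the fuse layer. A sketch of that order of operations in plain Python, using nearest-neighbour upsampling as a simplification of bilinear resizing; `upsample_nearest` is a hypothetical helper, and channel matching (for example with a 1 x 1 convolution) is assumed to have been done already:

```python
def upsample_nearest(fmap, factor):
    """Repeat every value `factor` times along both H and W."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


y = [[1.0, 2.0]]                # 1 x 2 map from the deeper (stride-32) stage
x = [[0.5, 0.5, 0.5, 0.5],
     [0.5, 0.5, 0.5, 0.5]]      # 2 x 4 map from the shallower (stride-16) stage

y_up = upsample_nearest(y, 2)   # now 2 x 4, the same size as x
xa = [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y_up)]
print(y_up, xa)
```

Once both maps have the same shape, the element-wise sum that feeds the attention branches is well defined.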
Hello, I would like to try adding the MS-CAM module to other network models. How well would it work if it were added at the end of the backbone?
from __future__ import division
import os
from mxnet.gluon.block import HybridBlock
from mxnet.gluon import nn
from mxnet.gluon.nn import BatchNorm
from gluoncv.model_zoo.fcn import _FCNHead
from mxnet import nd
from .askc import LCNASKCFuse
from model.atac.backbone import ATACBlockV1, conv1ATAC, DynamicCell
from model.atac.convolution import LearnedCell, ChaDyReFCell, SeqDyReFCell, SK_ChaDyReFCell, \
SK_1x1DepthDyReFCell, SK_MSSpaDyReFCell, SK_SpaDyReFCell, Direct_AddCell, SKCell, \
SK_SeqDyReFCell, Sub_MSSpaDyReFCell, SK_MSSeqDyReFCell, iAAMSSpaDyReFCell
from model.atac.convolution import \
LearnedConv, ChaDyReFConv, SeqDyReFConv, SK_ChaDyReFConv, \
SK_1x1DepthDyReFConv, SK_MSSpaDyReFConv, SK_SpaDyReFConv, Direct_AddConv, SKConv, \
SK_SeqDyReFConv
# , SK_MSSeqDyReFConv
from .activation import xUnit, SpaATAC, ChaATAC, SeqATAC, MSSeqATAC, MSSeqATACAdd, \
MSSeqATACConcat, MSSeqAttentionMap, xUnitAttentionMap
from model.atac.fusion import Direct_AddFuse_Reduce, SK_MSSpaFuse, SKFuse_Reduce, LocalChaFuse, \
GlobalChaFuse, \
LocalGlobalChaFuse_Reduce, LocalLocalChaFuse_Reduce, GlobalGlobalChaFuse_Reduce, \
AYforXplusYChaFuse_Reduce, XplusAYforYChaFuse_Reduce, IASKCChaFuse_Reduce,\
GAUChaFuse_Reduce, SpaFuse_Reduce, ConcatFuse_Reduce, AXYforXplusYChaFuse_Reduce,\
BiLocalChaFuse_Reduce, BiGlobalChaFuse_Reduce, LocalGAUChaFuse_Reduce, GlobalSpaFuse,\
AsymBiLocalChaFuse_Reduce, BiSpaChaFuse_Reduce, AsymBiSpaChaFuse_Reduce, LocalSpaFuse, \
BiGlobalLocalChaFuse_Reduce
# from gluoncv.model_zoo.resnetv1b import BasicBlockV1b
from gluoncv.model_zoo.cifarresnet import CIFARBasicBlockV1
class ASKCResNetFPN(HybridBlock):
    def __init__(self, layers, channels, fuse_mode, act_dilation, classes=1, tinyFlag=False,
                 norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
        super(ASKCResNetFPN, self).__init__(**kwargs)
        self.layer_num = len(layers)
        self.tinyFlag = tinyFlag
        with self.name_scope():
            stem_width = int(channels[0])
            self.stem = nn.HybridSequential(prefix='stem')
            self.stem.add(norm_layer(scale=False, center=False,
                                     **({} if norm_kwargs is None else norm_kwargs)))
            if tinyFlag:
                self.stem.add(nn.Conv2D(channels=stem_width*2, kernel_size=3, strides=1,
                                        padding=1, use_bias=False))
                self.stem.add(norm_layer(in_channels=stem_width*2))
                self.stem.add(nn.Activation('relu'))
            else:
                self.stem.add(nn.Conv2D(channels=stem_width, kernel_size=3, strides=2,
                                        padding=1, use_bias=False))
                self.stem.add(norm_layer(in_channels=stem_width))
                self.stem.add(nn.Activation('relu'))
                self.stem.add(nn.Conv2D(channels=stem_width, kernel_size=3, strides=1,
                                        padding=1, use_bias=False))
                self.stem.add(norm_layer(in_channels=stem_width))
                self.stem.add(nn.Activation('relu'))
                self.stem.add(nn.Conv2D(channels=stem_width*2, kernel_size=3, strides=1,
                                        padding=1, use_bias=False))
                self.stem.add(norm_layer(in_channels=stem_width*2))
                self.stem.add(nn.Activation('relu'))
                self.stem.add(nn.MaxPool2D(pool_size=3, strides=2, padding=1))
            # self.head1 = _FCNHead(in_channels=channels[1], channels=classes)
            # self.head2 = _FCNHead(in_channels=channels[2], channels=classes)
            # self.head3 = _FCNHead(in_channels=channels[3], channels=classes)
            # self.head4 = _FCNHead(in_channels=channels[4], channels=classes)
            self.head = _FCNHead(in_channels=channels[1], channels=classes)
            self.layer1 = self._make_layer(block=CIFARBasicBlockV1, layers=layers[0],
                                           channels=channels[1], stride=1, stage_index=1,
                                           in_channels=channels[1])
            self.layer2 = self._make_layer(block=CIFARBasicBlockV1, layers=layers[1],
                                           channels=channels[2], stride=2, stage_index=2,
                                           in_channels=channels[1])
            self.layer3 = self._make_layer(block=CIFARBasicBlockV1, layers=layers[2],
                                           channels=channels[3], stride=2, stage_index=3,
                                           in_channels=channels[2])
            if self.layer_num == 4:
                self.layer4 = self._make_layer(block=CIFARBasicBlockV1, layers=layers[3],
                                               channels=channels[4], stride=2, stage_index=4,
                                               in_channels=channels[3])
            if self.layer_num == 4:
                self.fuse34 = self._fuse_layer(fuse_mode, channels=channels[3],
                                               act_dilation=act_dilation)  # channels[4]
            self.fuse23 = self._fuse_layer(fuse_mode, channels=channels[2],
                                           act_dilation=act_dilation)  # 64
            self.fuse12 = self._fuse_layer(fuse_mode, channels=channels[1],
                                           act_dilation=act_dilation)  # 32
            # if fuse_order == 'reverse':
            #     self.fuse12 = self._fuse_layer(fuse_mode, channels=channels[2])  # channels[2]
            #     self.fuse23 = self._fuse_layer(fuse_mode, channels=channels[3])  # channels[3]
            #     self.fuse34 = self._fuse_layer(fuse_mode, channels=channels[4])  # channels[4]
            # elif fuse_order == 'normal':
            #     self.fuse34 = self._fuse_layer(fuse_mode, channels=channels[4])  # channels[4]
            #     self.fuse23 = self._fuse_layer(fuse_mode, channels=channels[4])  # channels[4]
            #     self.fuse12 = self._fuse_layer(fuse_mode, channels=channels[4])  # channels[4]
    def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0,
                    norm_layer=BatchNorm, norm_kwargs=None):
        layer = nn.HybridSequential(prefix='stage%d_'%stage_index)
        with layer.name_scope():
            downsample = (channels != in_channels) or (stride != 1)
            layer.add(block(channels, stride, downsample, in_channels=in_channels,
                            prefix='', norm_layer=norm_layer, norm_kwargs=norm_kwargs))
            for _ in range(layers-1):
                layer.add(block(channels, 1, False, in_channels=channels, prefix='',
                                norm_layer=norm_layer, norm_kwargs=norm_kwargs))
        return layer
    def _fuse_layer(self, fuse_mode, channels, act_dilation):
        if fuse_mode == 'Direct_Add':
            fuse_layer = Direct_AddFuse_Reduce(channels=channels)
        elif fuse_mode == 'Concat':
            fuse_layer = ConcatFuse_Reduce(channels=channels)
        elif fuse_mode == 'SK':
            fuse_layer = SKFuse_Reduce(channels=channels)
        # elif fuse_mode == 'LocalCha':
        #     fuse_layer = LocalChaFuse(channels=channels)
        # elif fuse_mode == 'GlobalCha':
        #     fuse_layer = GlobalChaFuse(channels=channels)
        elif fuse_mode == 'LocalGlobalCha':
            fuse_layer = LocalGlobalChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'LocalLocalCha':
            fuse_layer = LocalLocalChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'GlobalGlobalCha':
            fuse_layer = GlobalGlobalChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'IASKCChaFuse':
            fuse_layer = IASKCChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'AYforXplusY':
            fuse_layer = AYforXplusYChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'AXYforXplusY':
            fuse_layer = AXYforXplusYChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'XplusAYforY':
            fuse_layer = XplusAYforYChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'GAU':
            fuse_layer = GAUChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'LocalGAU':
            fuse_layer = LocalGAUChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'SpaFuse':
            fuse_layer = SpaFuse_Reduce(channels=channels, act_dialtion=act_dilation)
        elif fuse_mode == 'BiLocalCha':
            fuse_layer = BiLocalChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'BiGlobalLocalCha':
            fuse_layer = BiGlobalLocalChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'AsymBiLocalCha':
            fuse_layer = AsymBiLocalChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'BiGlobalCha':
            fuse_layer = BiGlobalChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'BiSpaCha':
            fuse_layer = BiSpaChaFuse_Reduce(channels=channels)
        elif fuse_mode == 'AsymBiSpaCha':
            fuse_layer = AsymBiSpaChaFuse_Reduce(channels=channels)
        # elif fuse_mode == 'LocalSpa':
        #     fuse_layer = LocalSpaFuse(channels=channels, act_dilation=act_dilation)
        # elif fuse_mode == 'GlobalSpa':
        #     fuse_layer = GlobalSpaFuse(channels=channels, act_dilation=act_dilation)
        # elif fuse_mode == 'SK_MSSpa':
        #     # fuse_layer.add(SK_MSSpaFuse(channels=channels, act_dilation=act_dilation))
        #     fuse_layer = SK_MSSpaFuse(channels=channels, act_dilation=act_dilation)
        else:
            raise ValueError('Unknown fuse_mode')
        return fuse_layer
    def hybrid_forward(self, F, x):
        _, _, hei, wid = x.shape
        x = self.stem(x)       # down 4, 32
        c1 = self.layer1(x)    # down 4, 32
        c2 = self.layer2(c1)   # down 8, 64
        out = self.layer3(c2)  # down 16, 128
        if self.layer_num == 4:
            c4 = self.layer4(out)  # down 32
            if self.tinyFlag:
                c4 = F.contrib.BilinearResize2D(c4, height=hei//4, width=wid//4)  # down 4
            else:
                c4 = F.contrib.BilinearResize2D(c4, height=hei//16, width=wid//16)  # down 16
            out = self.fuse34(c4, out)
        if self.tinyFlag:
            out = F.contrib.BilinearResize2D(out, height=hei//2, width=wid//2)  # down 2, 128
        else:
            out = F.contrib.BilinearResize2D(out, height=hei//8, width=wid//8)  # down 8, 128
        out = self.fuse23(out, c2)
        if self.tinyFlag:
            out = F.contrib.BilinearResize2D(out, height=hei, width=wid)  # down 1
        else:
            out = F.contrib.BilinearResize2D(out, height=hei//4, width=wid//4)  # down 4
        out = self.fuse12(out, c1)
        pred = self.head(out)
        if self.tinyFlag:
            out = pred
        else:
            out = F.contrib.BilinearResize2D(pred, height=hei, width=wid)  # down 1
        ######### reverse order ##########
        # up_c2 = F.contrib.BilinearResize2D(c2, height=hei//4, width=wid//4)  # down 4
        # fuse2 = self.fuse12(up_c2, c1)  # down 4, channels[2]
        #
        # up_c3 = F.contrib.BilinearResize2D(c3, height=hei//4, width=wid//4)  # down 4
        # fuse3 = self.fuse23(up_c3, fuse2)  # down 4, channels[3]
        #
        # up_c4 = F.contrib.BilinearResize2D(c4, height=hei//4, width=wid//4)  # down 4
        # fuse4 = self.fuse34(up_c4, fuse3)  # down 4, channels[4]
        #
        ######### normal order ##########
        # out = F.contrib.BilinearResize2D(c4, height=hei//16, width=wid//16)
        # out = self.fuse34(out, c3)
        # out = F.contrib.BilinearResize2D(out, height=hei//8, width=wid//8)
        # out = self.fuse23(out, c2)
        # out = F.contrib.BilinearResize2D(out, height=hei//4, width=wid//4)
        # out = self.fuse12(out, c1)
        # out = self.head(out)
        # out = F.contrib.BilinearResize2D(out, height=hei, width=wid)
        return out

    def evaluate(self, x):
        """evaluating network with inputs and targets"""
        return self.forward(x)
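Hello, in my experiments I wanted to use AFF in a decoder structure. I found that AFF is effective at the deepest scale, but adding it at other scales, or at all scales, makes the results worse. What could be the problem, and which parts need to be adjusted?
When I apply the iAFF module I run into the following situation: (1) the batch size of F is 1; (2) global average pooling reduces the feature map to 1*1. As a result, the BN layer raises ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1]). Is there a way to handle this?
In the forward function, xg2 = self.global_att(xi). It seems that self.global_att2 is never used. Was this written by mistake?
Hello, could you explain in detail why MS-CAM and AFF affect localization and small objects in detection?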
Or will you consider publishing the pytorch version of the code in the future?
open-aff/aff_pytorch/aff_net/fusion.py
Line 74 in 0bcfd8a
I found that the module "self.global_att2" is not used in your iAFF.
I wonder whether "xg2 = self.global_att(xi)" should be changed to "xg2 = self.global_att2(xi)".
Thanks for your contributions, Dai
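Hello! I have read your paper and code and learned a great deal from them. Thank you. The ResNet-50 pretrained model you provide is saved in MXNet's .params format. How can it be converted for use in PyTorch? Alternatively, could you provide a PyTorch version of the pretrained model? I look forward to your reply. Many thanks!
I am using your method for feature fusion: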
nn.AdaptiveAvgPool2d(1),
nn.Conv2d(channels, inter_channels, kernel_size=1, stride=1, padding=0),
nn.BatchNorm2d(inter_channels),
nn.ReLU(inplace=True),
nn.Conv2d(inter_channels, channels, kernel_size=1, stride=1, padding=0),
nn.BatchNorm2d(channels),
When the output size of the adaptive pooling in the first line is set to 1, as in the paper, it raises ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1]). Is there a solution?
Thanks for your work. Is there any way to extract the parameters and save them to a PyTorch .pth file?
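First:
Why does multi-scale information also exist in the channels?
Second:
What does the computed attention map H mean, specifically H<i,j,C>? Does it represent the dependencies between channels?
Reply:
Thank you for your message.
More precisely, in the paper we argue that channel attention should also have the notion/attribute of scale. What SENet / SKNet currently use is only the extreme case: channel attention at the largest scale, the global scale. The other branch used in the AFF paper is the opposite case: channel attention at the smallest, most local scale. AFF uses the simplest form of multi-scale, Local + Global, to aggregate multi-scale information.
The feature map has size C x H x W. Because we use local channel attention, the computed attention map also has size C x H x W, which gives a local / element-wise refinement. In SENet, by contrast, a channel's weight is applied to the entire H x W plane, so every element of the H x W feature map receives the same weight. The dependencies between channels are captured as usual: SENet uses fully connected layers, and AFF does the same with point-wise convolutions. In the global branch, a point-wise conv is exactly the same as a fully connected layer.
In short, the paper's assumption is that channel attention should also have a scale, and the variable that controls the scale is the pooling size. This is the same idea as classical methods such as SIFT, which build different scale spaces by controlling Gaussian filters of different sizes, except that AFF uses average pooling. Once you accept the notion that channel attention should also have a scale, the rest follows.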
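The shape argument in this reply can be sketched in plain Python, assuming a tiny C x H x W feature map stored as nested lists. The global branch pools each channel to a single value (C x 1 x 1), the local branch keeps one value per position (C x H x W), and adding the two broadcasts the global value over H x W. The point-wise convolutions are omitted, so this only illustrates the shapes, not the learned attention; both helper names are hypothetical:

```python
def global_context(feat):
    """Average each channel over H x W: one value per channel (C x 1 x 1)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def add_local_global(local, global_vals):
    """Broadcast each channel's global value over every (i, j) position."""
    return [[[v + g for v in row] for row in ch]
            for ch, g in zip(local, global_vals)]


feat = [
    [[1.0, 2.0], [3.0, 4.0]],      # channel 0
    [[10.0, 10.0], [10.0, 10.0]],  # channel 1
]
g = global_context(feat)         # one value per channel
xlg = add_local_global(feat, g)  # still C x H x W, one value per element
print(g, xlg)
```

Because the sum stays C x H x W, the sigmoid of it yields a separate weight for every element, which is the element-wise refinement described above.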
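Finally, one request: it would be best to discuss this together in the repository's Issues, https://github.com/YimianDai/open-aff/issues , so that everyone can see it.
Wishing you good health and success in your work.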
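Hello, after reading your paper I would like to ask: both AFF and iAFF seem to fuse two feature maps. If three feature maps need to be fused, how should the module be changed?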