
Comments (4)

mfigurnov commented on August 17, 2024

I explored the achievable speedups for a perforated convolutional layer (one that only computes the outputs at a subset of positions) in the PerforatedCNNs paper (NIPS 2016). Take a look at Table 5 on page 11: the empirical speedup is about 80% of the theoretical one. In that paper I used an im2col+gemm convolution implementation. I did not implement an efficient perforated convolutional layer for the SACT paper, because matching the speed of modern cuDNN versions, which are written in GPU assembly, is infeasible for me.
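To make the idea concrete, here is a minimal NumPy sketch of what a perforated im2col+gemm layer computes: patches are gathered only at the active output positions and a single GEMM is run over those rows. This is illustrative only; the function and variable names are mine, not from the PerforatedCNNs code.

```python
import numpy as np

def perforated_conv2d(x, w, active_mask):
    """Perforated convolution sketch: im2col + GEMM evaluated only at
    active output positions ('same' padding, square kernel assumed).

    x: (H, W, C_in) input
    w: (kh, kw, C_in, C_out) weights
    active_mask: (H, W) boolean; outputs elsewhere are left at zero
    """
    H, W, C_in = x.shape
    kh, kw, _, C_out = w.shape
    pad = kh // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    ys, xs = np.nonzero(active_mask)            # active output positions
    # im2col restricted to active positions: (n_active, kh*kw*C_in)
    cols = np.stack([xp[y:y + kh, x0:x0 + kw].ravel() for y, x0 in zip(ys, xs)])
    out = np.zeros((H, W, C_out), dtype=x.dtype)
    out[ys, xs] = cols @ w.reshape(-1, C_out)   # GEMM over active rows only
    return out
```

The GEMM shrinks in proportion to the number of active positions, which is where the theoretical speedup comes from; the ~20% gap to it in practice comes from the gather/scatter overhead around the GEMM.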

from sact.

mfigurnov commented on August 17, 2024

Also, I think that convolutions are compute-bound nowadays. For example, maxDNN, the first GPU convolution kernel written in assembly, claims that

maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures

Interestingly, you can go above 100% efficiency by using the Winograd algorithm; see Scott Gray's blog post. I am reasonably sure that one can implement a perforated convolutional layer using both the implicit GEMM and Winograd algorithms, although in the Winograd case the perforation mask would have to be tiled into k times k blocks.
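One way to see how Winograd can exceed 100% of FLOP-based "efficiency": the F(2×2, 3×3) variant computes each 2×2 output tile of a 3×3 convolution with an elementwise 4×4 product in the transformed domain, i.e. 16 multiplies instead of the 36 a direct convolution needs, so a kernel can deliver more effective convolution FLOPs than the hardware's raw multiply throughput. A quick back-of-the-envelope check (the 2.25× figure is from Lavin & Gray's fast-convolution work, not from the sact code):

```python
# Winograd F(2x2, 3x3): one 2x2 output tile of a 3x3 convolution.
direct_mults = 2 * 2 * 3 * 3      # 4 outputs x 9 taps each = 36
winograd_mults = 4 * 4            # one elementwise 4x4 product per tile = 16
print(direct_mults / winograd_mults)  # -> 2.25x arithmetic reduction
```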


warmspringwinds commented on August 17, 2024

Hi, @mfigurnov .
Great work!

I have a somewhat related question.

In your paper you write:

An alternative approach to using the perforated convolutional layer is to tile the halting scores map. Suppose that we share the values of the halting scores h^l within k × k tiles. For example, we can perform pooling of h^l with a kernel size k × k and stride k and then upscale the results by a factor of k. Then, all positions in a tile have the same active flag, and we can apply the residual unit densely to just the active tiles, reusing the commonly available convolution routines. k should be sufficiently high to mitigate the overhead of the additional kernel calls and the overlapping computations of the first 1×1 convolution. Therefore, tiling is advisable when the SACT is applied to high-resolution images.

I don't quite get it. Could you please elaborate when you have spare time, or point me to some material where a similar approach is used?

I understand the dilation and perforated-convolution parts, but this one is not obvious to me. Is it an approximation, and how do you choose k?

Thank you.


mfigurnov commented on August 17, 2024

Hi @warmspringwinds, thank you!

The idea is quite simple. In the basic version of SACT you have an H x W x C feature map and produce an H x W active mask, so you need perforated convolutions to get a speedup. Now, suppose instead you first produce an h x w active mask, h << H, w << W, and then upsample/tile it. In the corner case h = w = 1, you recover ACT, which is simple to implement: just skip the layer if the single value of the active mask is zero.

For larger h and w, you split the feature map into chunks of size (H/h + k) x (W/w + k) x C, where k is an overlap that ensures there are no "border effects" around the edges of a chunk (you might be able to ignore this overlap and still get okay results). Then you apply the standard residual unit only to the active chunks and stitch the outputs together to get the final output feature map. Someone may have done something similar for processing very high-resolution images with convnets, where the whole feature map would not fit into GPU memory, but I can't give you a reference off the top of my head.
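The chunk-and-stitch step described above might look like the following NumPy sketch. All names are illustrative, not from the sact code; `unit` stands in for the residual-unit body, and inactive chunks pass the input through unchanged (the residual skip connection).

```python
import numpy as np

def tiled_apply(feature_map, active_small, unit, overlap=1):
    """Apply `unit` densely to the active chunks only.

    feature_map: (H, W, C); active_small: (h, w) boolean mask, e.g. obtained
    by pooling the halting scores. H, W are assumed divisible by h, w.
    `overlap` is the halo of extra pixels per side, so chunk borders see
    the same context a dense application would.
    """
    H, W, C = feature_map.shape
    h, w = active_small.shape
    ch, cw = H // h, W // w                       # chunk size
    pad = overlap
    fp = np.pad(feature_map, ((pad, pad), (pad, pad), (0, 0)))
    out = feature_map.copy()                      # inactive chunks: identity
    for i in range(h):
        for j in range(w):
            if not active_small[i, j]:
                continue
            y0, x0 = i * ch, j * cw
            chunk = fp[y0:y0 + ch + 2 * pad, x0:x0 + cw + 2 * pad]
            res = unit(chunk)                     # dense unit on the chunk
            out[y0:y0 + ch, x0:x0 + cw] = res[pad:pad + ch, pad:pad + cw]
    return out
```

With k-pixel receptive-field overlap per side, the halo means each active chunk recomputes a small border region; that redundancy is the "overlapping computations" cost the quoted paragraph says larger tiles amortize.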

I did some preliminary experiments with this approach. It seemed to work fine. You can take a look at the code from an older version that I used:

`if resolution:` — the `resolution` variable there is the h = w from the text above.

