
Comments (4)

mfigurnov commented on August 17, 2024

I explored the achievable speedups for a perforated convolutional layer (one that only computes the outputs at a subset of positions) in the PerforatedCNNs paper (NIPS 2016). Take a look at Table 5 on page 11: the empirical speedup is about 80% of the theoretical one. In that paper I used an im2col+gemm convolution implementation. I did not implement an efficient perforated convolutional layer for the SACT paper, because matching the speed of modern cuDNN versions, which are written in GPU assembly, is infeasible for me.
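To make the idea concrete, here is a minimal NumPy sketch of what a perforated im2col+gemm layer computes: patches are gathered only at the active output positions and a single GEMM is run over those rows. This is illustrative only; the function and variable names are mine, not from the PerforatedCNNs code.

```python
import numpy as np

def perforated_conv2d(x, w, active_mask):
    """Perforated convolution sketch: im2col + GEMM evaluated only at
    active output positions ('same' padding, square kernel assumed).

    x: (H, W, C_in) input
    w: (kh, kw, C_in, C_out) weights
    active_mask: (H, W) boolean; outputs elsewhere are left at zero
    """
    H, W, C_in = x.shape
    kh, kw, _, C_out = w.shape
    pad = kh // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    ys, xs = np.nonzero(active_mask)            # active output positions
    # im2col restricted to active positions: (n_active, kh*kw*C_in)
    cols = np.stack([xp[y:y + kh, x0:x0 + kw].ravel() for y, x0 in zip(ys, xs)])
    out = np.zeros((H, W, C_out), dtype=x.dtype)
    out[ys, xs] = cols @ w.reshape(-1, C_out)   # GEMM over active rows only
    return out
```

The GEMM shrinks in proportion to the number of active positions, which is where the theoretical speedup comes from; the ~20% gap to it in practice comes from the gather/scatter overhead around the GEMM.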

from sact.

mfigurnov commented on August 17, 2024

Also, I think that convolutions are compute-bound nowadays. For example, maxDNN, the first GPU convolution kernel written in assembly, claims that

maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures

Interestingly, you can go above 100% efficiency by using the Winograd algorithm; see Scott Gray's blog post. I am reasonably sure that one can implement a perforated convolutional layer using both the implicit GEMM and Winograd algorithms, although in the Winograd case the perforation mask would have to be tiled into k times k blocks.
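One way to see how Winograd can exceed 100% of FLOP-based "efficiency": the F(2×2, 3×3) variant computes each 2×2 output tile of a 3×3 convolution with an elementwise 4×4 product in the transformed domain, i.e. 16 multiplies instead of the 36 a direct convolution needs, so a kernel can deliver more effective convolution FLOPs than the hardware's raw multiply throughput. A quick back-of-the-envelope check (the 2.25× figure is from Lavin & Gray's fast-convolution work, not from the sact code):

```python
# Winograd F(2x2, 3x3): one 2x2 output tile of a 3x3 convolution.
direct_mults = 2 * 2 * 3 * 3      # 4 outputs x 9 taps each = 36
winograd_mults = 4 * 4            # one elementwise 4x4 product per tile = 16
print(direct_mults / winograd_mults)  # -> 2.25x arithmetic reduction
```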


warmspringwinds commented on August 17, 2024

Hi, @mfigurnov .
Great work!

I have a somewhat related question.

In your paper you write:

An alternative approach to using the perforated convolutional layer is to tile the halting scores map. Suppose that we share the values of the halting scores h^l within k × k tiles. For example, we can perform pooling of h^l with a kernel size k × k and stride k and then upscale the results by a factor of k. Then, all positions in a tile have the same active flag, and we can apply the residual unit densely to just the active tiles, reusing the commonly available convolution routines. k should be sufficiently high to mitigate the overhead of the additional kernel calls and the overlapping computations of the first 1×1 convolution. Therefore, tiling is advisable when the SACT is applied to high-resolution images.

I don't quite get it. Could you please elaborate when you have spare time, or point me to some material where a similar approach is used?

I understand the dilation and perforated-convolution parts, but this one is not obvious to me. Is it an approximation, and how do you choose k?

Thank you.


mfigurnov commented on August 17, 2024

Hi @warmspringwinds, thank you!

The idea is quite simple. In the basic version of SACT you have an H x W x C feature map and produce an H x W active mask, so you need perforated convolutions to get a speedup. Now, suppose instead you first produce an h x w active mask, h << H, w << W, and then upsample/tile it. In the corner case h = w = 1, you recover ACT, which is simple to implement: just skip the layer if the single value of the active mask is zero.

For larger h and w, you split the feature map into chunks of size (H/h + k) x (W/w + k) x C, where k is an overlap that ensures there are no "border effects" around the edges of a chunk (you might be able to ignore this overlap and still get okay results). Then you apply the standard residual unit only to the active chunks and stitch the outputs together to get the final output feature map. Someone may have done something similar for processing very high-resolution images with convnets, where the whole feature map would not fit into GPU memory, but I can't give you a reference off the top of my head.
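The chunk-and-stitch step described above might look like the following NumPy sketch. All names are illustrative, not from the sact code; `unit` stands in for the residual-unit body, and inactive chunks pass the input through unchanged (the residual skip connection).

```python
import numpy as np

def tiled_apply(feature_map, active_small, unit, overlap=1):
    """Apply `unit` densely to the active chunks only.

    feature_map: (H, W, C); active_small: (h, w) boolean mask, e.g. obtained
    by pooling the halting scores. H, W are assumed divisible by h, w.
    `overlap` is the halo of extra pixels per side, so chunk borders see
    the same context a dense application would.
    """
    H, W, C = feature_map.shape
    h, w = active_small.shape
    ch, cw = H // h, W // w                       # chunk size
    pad = overlap
    fp = np.pad(feature_map, ((pad, pad), (pad, pad), (0, 0)))
    out = feature_map.copy()                      # inactive chunks: identity
    for i in range(h):
        for j in range(w):
            if not active_small[i, j]:
                continue
            y0, x0 = i * ch, j * cw
            chunk = fp[y0:y0 + ch + 2 * pad, x0:x0 + cw + 2 * pad]
            res = unit(chunk)                     # dense unit on the chunk
            out[y0:y0 + ch, x0:x0 + cw] = res[pad:pad + ch, pad:pad + cw]
    return out
```

With k-pixel receptive-field overlap per side, the halo means each active chunk recomputes a small border region; that redundancy is the "overlapping computations" cost the quoted paragraph says larger tiles amortize.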

I did some preliminary experiments with this approach. It seemed to work fine. You can take a look at the code from an older version that I used:

`if resolution:` — the `resolution` variable there is the h = w from the text above.

