
Comments (5)

Gabri95 commented on May 28, 2024

Hi @kinalmehta

Thanks for your question.

The behaviour is not entirely unexpected.
Unfortunately, dilation is not trivial to combine with steerable filters, since dilated filters are sparse and can therefore produce angular aliasing.

As a short answer, you can try passing the argument frequencies_cutoff=1. when you use dilation=2 to obtain a similar number of parameters.

Here is the long answer:

First of all, recall what a basis for the steerable filters looks like. See Figure 2 in https://arxiv.org/pdf/1711.07289.pdf

The equivariance property only constrains the angular part of the filters, not the radial one.
Therefore, we split the radial part into a number of independent rings.
In a normal (dense) filter, larger rings are sampled on a larger number of cells of the filter.
This allows one to also consider higher frequencies for the angular component of the largest rings.

The ideal trade-off for the number of frequencies to use in each ring is hard to estimate theoretically.
What we did, instead, was to manually search for combinations that contain sufficiently high frequencies while not introducing too much aliasing.

The default parameters in R2Conv use our manually tuned trade-off, which works quite well for dense filters but is not tuned for sparse filters like dilated ones.
This means that a 3x3 filter with dilation 2 covers the same extent as a 5x5 filter, and high frequencies are sampled as if that 5x5 filter were dense.

This is the reason why dilated filters have more parameters.
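To make the sparsity argument concrete, here is a minimal sketch in plain Python (not e2cnn code). It counts how many sampled cells lie close to the ring of radius 2 for a dense 5x5 filter and for a 3x3 filter with dilation 2. The helper `cells_near_ring` and the tolerance `band=0.5` are assumptions chosen for illustration only; they are not part of the library.

```python
import math

def cells_near_ring(kernel_size, dilation, radius, band=0.5):
    """Count sampled cells whose distance from the centre is within `band` of `radius`.

    `band` is an illustrative tolerance, not a parameter of e2cnn.
    """
    half = kernel_size // 2
    count = 0
    for i in range(-half, half + 1):
        for j in range(-half, half + 1):
            # Position of the sampled cell relative to the filter centre.
            r = math.hypot(i * dilation, j * dilation)
            if abs(r - radius) <= band:
                count += 1
    return count

dense = cells_near_ring(kernel_size=5, dilation=1, radius=2.0)
dilated = cells_near_ring(kernel_size=3, dilation=2, radius=2.0)
print(dense, dilated)  # 12 4
```

Both filters span the same 5x5 extent, but the dilated one has far fewer samples near the outer ring, so it supports fewer angular frequencies there.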

I would actually recommend trying to use a stronger frequency cut-off when using dilated filters.
You can tune this with the parameter frequencies_cutoff in R2Conv.

Have a look at this answer, where I gave a slightly more detailed explanation of the parameters you need to tune: #18 (comment)
The part relevant to you starts from the sentence "A steerable filter is in general split in multiple rings,.....".

In your case, when using dilation, the set of rings is computed as if the filter were not dilated, and the rings are then scaled by the dilation; see this line.
For instance, a 3x3 filter with dilation 2 has two rings at radii 0. and 2. (the center of the filter and a ring that passes through the cell in position (2, 2) of the 5x5 grid).
The default policy associates a maximum frequency of 0 at radius 0 and 3 at radius 2; see this line (where r is the radius and you can ignore the max_radius in this case).
The idea of this "policy" is that at radius r you can generally sample frequencies up to 2*r (with some correction for the largest rings, since they can partially fall outside the grid), but it assumes dense filters, such that larger rings are sampled on more cells.
I would recommend using at most frequency 2 for the outer ring of radius 2. This should also give you the same number of parameters as the dense 3x3 filter.
You can do so by passing the argument frequencies_cutoff=1., which is interpreted as allowing max frequency 1. * r = r at radius r.
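The arithmetic above can be sketched in plain Python. This is only an illustration of the description in this comment, not the library's implementation: `ring_radii` and `max_frequencies` are hypothetical helpers, the sketch covers odd kernel sizes only, and it ignores the correction the default policy applies to the largest rings.

```python
import math

def ring_radii(kernel_size, dilation=1):
    """Ring radii for an odd-sized filter: computed for the undilated
    filter, then scaled by the dilation (as described above)."""
    assert kernel_size % 2 == 1, "sketch only covers odd kernel sizes"
    return [float(i * dilation) for i in range(kernel_size // 2 + 1)]

def max_frequencies(radii, cutoff):
    """A float cutoff c means 'max frequency c * r at radius r';
    a callable cutoff is applied to each radius directly."""
    policy = cutoff if callable(cutoff) else (lambda r: cutoff * r)
    return [math.floor(policy(r)) for r in radii]

radii = ring_radii(kernel_size=3, dilation=2)
print(radii)                        # [0.0, 2.0]
print(max_frequencies(radii, 1.0))  # [0, 2]
```

With frequencies_cutoff=1. the outer ring at radius 2 is limited to frequency 2, which is the value recommended above.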

Does this make sense to you?

Gabriele

from e2cnn.

Gabri95 commented on May 28, 2024

Hi @purse1996

I think you may have an issue with the frequency cutoff.

If you use a 3x3 filter with dilation D, the outer pixels will have radius D.
Your frequency cutoff policy allows frequencies up to 3*D to be sampled there. However, such a dilated filter is very sparse.
In particular, the orbit of a pixel will be sampled on at most 4 locations, so I'd recommend not using frequencies higher than 2.
You could use frequencies_cutoff = lambda r: min(r, 2) so that:

  • in the central pixel you have max frequency = 0
  • on other pixels you have max frequency = 2
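As a quick check of the two bullets, the sketch below evaluates both cutoff policies on the two rings (radius 0 and radius D) for the ASPP dilations 12, 24 and 36 from the question. It is plain Python and only evaluates the policies; the names `cutoff_original`, `cutoff_capped` and `results` are illustrative, and the ring radii follow the description "the outer pixels will have radius D".

```python
def cutoff_original(r):
    # Policy from the question: frequencies up to 3 * r at radius r.
    return 3 * r

def cutoff_capped(r):
    # Suggested policy: never exceed frequency 2.
    return min(r, 2)

results = {}
for dilation in (12, 24, 36):
    radii = [0, dilation]  # central pixel and outer ring
    results[dilation] = (
        [cutoff_original(r) for r in radii],
        [cutoff_capped(r) for r in radii],
    )
    print(dilation, results[dilation])
# 12 ([0, 36], [0, 2])
# 24 ([0, 72], [0, 2])
# 36 ([0, 108], [0, 2])
```

The original policy requests frequencies as high as 108 on a ring that is sampled on only a handful of pixels, while the capped policy stays at 2 for every dilation.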

However, keep in mind that your filter is still very sparse, which also means that it will most likely not be very stable to continuous rotations (but it should still be equivariant to 90 deg ones).
Does this help?

Gabriele


kinalmehta commented on May 28, 2024

Hi @Gabri95 ,

Thanks for such a detailed answer.
The solution worked.

My understanding of steerable convolutions is a bit weak, but your answer gave me a decent overview of why the number of parameters differs between the two cases.

I am using dilated convolution during evaluation and training the model with the (max-pool + non-dilated) version.
Do you think this will adversely affect the predictions?

Thanks again
Kinal


Gabri95 commented on May 28, 2024

Hi @kinalmehta

I am happy it was useful :)

This is hard to tell a priori.
Using pooling (especially max-pooling) in general introduces aliasing issues which break equivariance (even translation equivariance).
Still, in deep learning we usually use deep networks with max-pooling and get great results, so I don't expect any significant additional adverse effect with respect to a conventional CNN.
Actually, the fact that the steerable filters are band-limited and rather smooth should help and make downsampling rather stable.
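The remark that max-pooling breaks even translation equivariance can be illustrated with a small 1D sketch in plain Python (this is a toy example, unrelated to e2cnn; `max_pool_1d` and `shift_right` are hypothetical helpers written for this illustration).

```python
def max_pool_1d(x, size=2):
    """Non-overlapping max-pooling with window and stride equal to `size`."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def shift_right(x, k):
    """Shift a signal right by k samples, padding with zeros on the left."""
    return [0] * k + x[:len(x) - k]

signal = [0, 9, 0, 0, 0, 0, 0, 0]
pooled = [max_pool_1d(shift_right(signal, k)) for k in range(3)]
for k, p in enumerate(pooled):
    print(k, p)
# 0 [9, 0, 0, 0]
# 1 [0, 9, 0, 0]
# 2 [0, 9, 0, 0]
```

Shifting the input by 1 or by 2 samples gives the same pooled output, so the output cannot follow one-sample shifts of the input consistently.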

However, you will probably observe some more noise when explicitly checking the rotation equivariance of the model.

In any case, you can always experiment with different band-limiting of the filters to find a better trade-off for the smoothness of the filters (smoother filters reduce the equivariance error).

If you find some interesting result, I'd be curious to hear about it so, please, let me know :)

Best,
Gabriele


purse1996 commented on May 28, 2024

I want to use R2Conv in atrous spatial pyramid pooling (ASPP), with dilations of 12, 24 and 36. But the results are very poor. The code is as follows. Could you give some suggestions?

from e2cnn import nn as enn

# FIELD_TYPE and gspace are defined elsewhere in the original code (not shown here).
def conv3x3(inplanes, out_planes, stride=1, padding=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    in_type = FIELD_TYPE['regular'](gspace, inplanes)
    out_type = FIELD_TYPE['regular'](gspace, out_planes)
    return enn.R2Conv(in_type, out_type, 3,
                      stride=stride,
                      padding=padding,
                      groups=groups,
                      bias=False,
                      dilation=dilation,
                      sigma=None,
                      frequencies_cutoff=lambda r: 3 * r,
                      initialize=False)

# Called once per ASPP branch, with r = 12, 24, 36:
conv3x3(in_dim, reduction_dim, dilation=r, padding=r)
