Comments (7)

ap229997 commented on May 27, 2024

The fusion could also be done at 64x64 resolution, but that would be too computationally expensive because a transformer is used: self-attention has quadratic complexity in the number of tokens. So at each resolution, I downsample the intermediate feature maps to 8x8 before fusing them.

from transfuser.
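
To make the cost argument concrete, here is a minimal NumPy sketch. It assumes (hypothetically) two 64-channel, 64x64 feature maps, one per sensor, and treats every spatial position of every map as one token. Block average pooling and nearest-neighbour upsampling are simple stand-ins and may differ from the pooling and interpolation the repository actually uses; `avg_pool_to`, `upsample_nearest`, and `attention_entries` are hypothetical helpers.

```python
import numpy as np

def avg_pool_to(x: np.ndarray, out_hw: int) -> np.ndarray:
    """Average-pool a (C, H, W) map to (C, out_hw, out_hw); H and W must be multiples of out_hw."""
    c, h, w = x.shape
    assert h % out_hw == 0 and w % out_hw == 0
    kh, kw = h // out_hw, w // out_hw
    # Split each spatial axis into (output cell, position inside the cell) and average the latter.
    return x.reshape(c, out_hw, kh, out_hw, kw).mean(axis=(2, 4))

def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, h, w) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def attention_entries(num_modalities: int, h: int, w: int) -> int:
    """Entries in the attention matrix when each spatial position of each modality is one token."""
    tokens = num_modalities * h * w
    return tokens * tokens

# Hypothetical 64-channel, 64x64 feature maps for two sensors (e.g. camera and LiDAR).
rng = np.random.default_rng(0)
image_feat = rng.standard_normal((64, 64, 64))
lidar_feat = rng.standard_normal((64, 64, 64))

# Reduce each map to an 8x8 grid before it would enter the transformer.
image_tokens = avg_pool_to(image_feat, 8)  # (64, 8, 8)
lidar_tokens = avg_pool_to(lidar_feat, 8)  # (64, 8, 8)

full = attention_entries(2, 64, 64)
pooled = attention_entries(2, 8, 8)
print(full, pooled, full // pooled)  # 67108864 16384 4096

# A fused 8x8 output would be brought back to 64x64; here we just upsample the pooled map to show shapes.
restored = upsample_nearest(image_tokens, 8)
print(restored.shape)  # (64, 64, 64)
```

Pooling cuts the token count by 64x, so the attention matrix shrinks by 64^2 = 4096x.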

roger-cv commented on May 27, 2024

Thanks for your quick reply. So, if I understand correctly, the input feature map to the transformer at each layer is downsampled to 8x8?

ap229997 commented on May 27, 2024

That's correct. There are now several transformer variants that address the quadratic complexity of attention (e.g. Linformer), so it may be possible to use the transformer without downsampling.
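
Linformer's core idea can be sketched in a few lines: keys and values are projected along the sequence axis from n tokens down to a fixed length k_len, so the score matrix is n x k_len instead of n x n. This minimal single-head NumPy sketch uses random projection matrices where Linformer would learn them; the function names and sizes are hypothetical, and this is not how TransFuser itself computes attention.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def full_attention(q, k, v):
    """Standard scaled dot-product attention; the score matrix is (n, n)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v, scores.shape

def linformer_attention(q, k, v, e_proj, f_proj):
    """Linformer-style attention; e_proj and f_proj have shape (k_len, n)."""
    k_low = e_proj @ k                           # (k_len, d): compressed keys
    v_low = f_proj @ v                           # (k_len, d): compressed values
    scores = q @ k_low.T / np.sqrt(q.shape[-1])  # (n, k_len) instead of (n, n)
    return softmax(scores) @ v_low, scores.shape

# Hypothetical sizes: two 64x64 maps flattened into n tokens, head dim 32, k_len 256.
n, d, k_len = 2 * 64 * 64, 32, 256
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
# Random projections stand in for the learned ones.
e_proj = rng.standard_normal((k_len, n)) / np.sqrt(n)
f_proj = rng.standard_normal((k_len, n)) / np.sqrt(n)

out, score_shape = linformer_attention(q, k, v, e_proj, f_proj)
print(out.shape, score_shape)  # (8192, 32) (8192, 256)
```

With identity projections (k_len = n), `linformer_attention` reduces to `full_attention`, which makes a quick sanity check.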

roger-cv commented on May 27, 2024

> That's correct. There are now several transformer variants that address the quadratic complexity of attention (e.g. Linformer), so it may be possible to use the transformer without downsampling.

OK. Another interesting question: could this transformer-based fusion be replaced with other transformers, such as Swin or PVT? I ask because I noticed that this transformer is based on GPT, which was designed for NLP.
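
Swin and PVT avoid the quadratic cost in different ways: Swin computes self-attention inside non-overlapping local windows (shifting the windows between successive blocks), while PVT keeps global attention but spatially downsamples keys and values by a reduction ratio. Below is a rough NumPy sketch for a single 64x64 grid, assuming a window size of 8 and a reduction ratio of 4. It shows only Swin's window partitioning and counts attention-matrix entries, omitting the window shift, relative position bias, and everything else in either architecture. All function names are hypothetical.

```python
import numpy as np

def window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """Split an (H, W, C) grid into non-overlapping m x m windows -> (num_windows, m*m, C)."""
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0
    x = x.reshape(h // m, m, w // m, m, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (h/m, w/m, m, m, c): group rows/cols by window
    return x.reshape(-1, m * m, c)

def global_cost(h: int, w: int) -> int:
    # Every token attends to every token.
    n = h * w
    return n * n

def window_cost(h: int, w: int, m: int) -> int:
    # Each of the (h*w)/(m*m) windows has its own (m*m) x (m*m) attention matrix.
    return (h * w // (m * m)) * (m * m) ** 2

def spatial_reduction_cost(h: int, w: int, r: int) -> int:
    # Queries keep full resolution; keys/values are downsampled by r along each side.
    return (h * w) * ((h // r) * (w // r))

h = w = 64
print(global_cost(h, w))                # 16777216
print(window_cost(h, w, 8))             # 262144
print(spatial_reduction_cost(h, w, 4))  # 1048576
windows = window_partition(np.zeros((h, w, 32)), 8)
print(windows.shape)                    # (64, 64, 32)
```

Window attention makes the cost linear in the number of tokens for a fixed window size, at the price of purely local attention within each block. That matters for fusion, where tokens from different sensors may need to attend to distant positions.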

ap229997 commented on May 27, 2024

I agree, architecture design can be improved quite a bit.

roger-cv commented on May 27, 2024

OK, nice work. Thanks for your reply.

Kin-Zhang commented on May 27, 2024

But it may require more resources to train...

> I agree, architecture design can be improved quite a bit.
