Attention Mask about ict HOT 4 CLOSED

raywzy commented on June 15, 2024
Attention Mask

from ict.

Comments (4)

raywzy commented on June 15, 2024

Thanks for your interest!

  1. This option should be enabled all the time :)
  2. To make the transformer aware of the mask positions, we use special tokens to denote the missing elements of the input sequence, following the original BERT model in NLP. You could also try adding such constraints in the attention computation to see what happens.
  3. Great question! I think it relies on how you sample the exact token from the current distribution. Selecting the token with the top probability will lead to good pixel quality but low diversity, and vice versa. In the paper we just adopt the most naive way of sampling. Considering the properties of the bi-directional transformer, I believe more robust and efficient sampling strategies could be explored :)
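To make points 2 and 3 concrete, here is a minimal sketch in plain Python. It is not the ICT code: `MASK_ID`, `apply_mask_token`, and `pick_token` are hypothetical names chosen for illustration, and the token id 512 is an assumption. It shows a BERT-style mask token replacing the missing elements, and how a `top_k` setting trades quality for diversity when picking one token from a predicted distribution.

```python
import random

MASK_ID = 512  # hypothetical id reserved for the special mask token


def apply_mask_token(tokens, mask, mask_id=MASK_ID):
    """Replace every masked position with the special mask token id."""
    return [mask_id if m else t for t, m in zip(tokens, mask)]


def pick_token(probs, top_k=None, rng=None):
    """Pick one token id from a list of token probabilities.

    top_k=1 is greedy (always the most likely token, low diversity).
    A larger top_k, or None for the full distribution, samples among
    the candidates in proportion to their probability.
    """
    rng = rng or random.Random()
    # Token ids ranked from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    weights = [probs[i] for i in ranked]
    # random.choices accepts relative weights, so no renormalisation is needed.
    return rng.choices(ranked, weights=weights, k=1)[0]


masked_sequence = apply_mask_token([3, 7, 9], [False, True, False])
greedy_choice = pick_token([0.1, 0.7, 0.2], top_k=1)
```

With `top_k=1` the choice is deterministic; raising `top_k` lets less likely tokens through, which is where the extra diversity (and the risk of lower pixel quality) comes from.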

Janspiry commented on June 15, 2024

Thanks for your reply. I completely understand 1 and 2.
About question 3, I want to confirm my understanding with you. Why can we select the pixel at the first masked position at each sampling step, as the code does in the sample_mask function (Line 98, util.py)? The model generates all masked pixels at once, and training treats all positions in the mask region the same way. Would it be more reasonable to select, at each sampling step, the position with the highest (or Top-K) probability among the remaining masked positions?

raywzy commented on June 15, 2024

Absolutely. You could try that sampling strategy to see whether the performance improves. Currently I just sample the tokens in sequential order.
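For anyone who wants to experiment, the two orderings discussed here can be sketched as follows. This is a toy illustration under stated assumptions, not the repository's `sample_mask`: `fill_masked` and `toy_predict` are hypothetical names, `toy_predict` stands in for the transformer with a fixed table, and tokens are picked greedily to keep the example deterministic.

```python
def fill_masked(tokens, mask, predict, order="sequential"):
    """Fill masked positions one at a time and record the visiting order.

    predict(tokens, mask) is a hypothetical model call that returns, for
    every position, a list of token probabilities.
    """
    tokens, mask = list(tokens), list(mask)
    visited = []
    while any(mask):
        probs = predict(tokens, mask)  # re-run the model after each fill
        remaining = [i for i, m in enumerate(mask) if m]
        if order == "sequential":
            pos = remaining[0]  # first masked position
        elif order == "confidence":
            # Masked position whose best token has the highest probability.
            pos = max(remaining, key=lambda i: max(probs[i]))
        else:
            raise ValueError(f"unknown order: {order}")
        # Greedy token choice; a sampling step could be swapped in here.
        tokens[pos] = max(range(len(probs[pos])), key=lambda t: probs[pos][t])
        mask[pos] = False
        visited.append(pos)
    return tokens, visited


def toy_predict(tokens, mask):
    """Stand-in for the transformer: fixed distributions over 3 tokens."""
    return [[0.4, 0.3, 0.3], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]


seq_tokens, seq_order = fill_masked([0, 0, 0], [True, True, True], toy_predict)
conf_tokens, conf_order = fill_masked(
    [0, 0, 0], [True, True, True], toy_predict, order="confidence"
)
```

In this toy setting both orders produce the same tokens because the predictions never change; with a real model each filled position changes the context for the next prediction, so the visiting order can affect the result.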

Janspiry commented on June 15, 2024

I see, thanks!
