Hi, thanks for your nice works. There are some details that bothered me. I would


Attention Mask about ict HOT 4 CLOSED

raywzy commented on August 25, 2024

Attention Mask

from ict.

Comments (4)

raywzy commented on August 25, 2024

Thanks for your interest!

1. This option should be enabled all the time :)
2. To make the transformer aware of the mask positions, we use special tokens to denote the missing elements of the input sequence, following the original BERT model in NLP. You could also try adding such constraints to the attention itself to see what happens.
3. Great question! I think it depends on how you sample the token from the current distribution. Selecting the token with the highest probability leads to good pixel quality but low diversity, and vice versa. In the paper we simply adopt the most naive way of sampling. Given the properties of the bidirectional transformer, I believe more robust and efficient sampling strategies could be explored :)
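To make the quality/diversity trade-off in point 3 concrete, here is a minimal sketch of picking one token from a predicted distribution. It is not the code from the ICT repository; the names `softmax`, `pick_token`, `top_k`, and `temperature` are assumptions made for illustration. With `top_k=1` the choice is greedy (highest probability, low diversity); a larger `top_k` or a higher `temperature` gives more varied results.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, top_k=None, temperature=1.0, rng=random):
    """Pick one token index from `logits` (hypothetical helper).

    top_k=1 is greedy; top_k=None samples from the full distribution.
    """
    probs = softmax(logits, temperature)
    # Candidate token indices, most probable first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    weights = [probs[i] for i in order]
    # random.choices renormalizes the weights over the kept candidates.
    return rng.choices(order, weights=weights, k=1)[0]
```

Calling `pick_token(logits, top_k=1)` always returns the most probable token, while `pick_token(logits, top_k=50)` can return any of the 50 most probable ones.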


Janspiry commented on August 25, 2024

Thanks for your reply. I completely understand 1 and 2.
About question 3, I'd like to confirm one thing. Why can we select the pixel at the first masked position in each sampling step, as the sample_mask function does (line 98, util.py)? The model generates all masked pixels at once, and during training there is no difference between positions within the mask region. Would it be more reasonable to select, at each step, the position with the highest (or top-K) probability among the remaining masked positions?
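The position-selection idea proposed here could look roughly like the sketch below. It is only an illustration of the suggestion, not code from the repository: `most_confident_position` and its input format (a dict from masked position to a list of token probabilities) are assumptions.

```python
def most_confident_position(probs_by_position):
    """Return (position, token) with the highest single-token probability.

    probs_by_position: dict mapping each remaining masked position to the
    list of token probabilities predicted for it (hypothetical format).
    """
    # The position whose best token has the largest probability.
    best_pos = max(probs_by_position, key=lambda p: max(probs_by_position[p]))
    probs = probs_by_position[best_pos]
    # The most probable token at that position.
    best_tok = max(range(len(probs)), key=lambda t: probs[t])
    return best_pos, best_tok
```

For example, given predictions for masked positions 3, 7, and 9, the function picks whichever position the model is most certain about instead of always taking position 3 first.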


raywzy commented on August 25, 2024

> Thanks for your reply. I completely understand 1 and 2.
> About question 3, I'd like to confirm one thing. Why can we select the pixel at the first masked position in each sampling step, as the sample_mask function does (line 98, util.py)? The model generates all masked pixels at once, and during training there is no difference between positions within the mask region. Would it be more reasonable to select, at each step, the position with the highest (or top-K) probability among the remaining masked positions?

Absolutely. You could try the sampling strategy you mentioned to see whether performance improves. Currently I just sample the tokens in sequential order.
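For reference, sequential-order filling can be sketched as the loop below. This is an assumption-laden illustration, not the repository's `sample_mask`: the `MASK` id, the `fill_sequential` name, and the `predict` callable (which returns token probabilities for every masked position) are all hypothetical, and the token is chosen greedily here only to keep the example deterministic.

```python
MASK = -1  # hypothetical id marking a missing element


def fill_sequential(tokens, predict):
    """Fill masked slots one at a time, always taking the first masked slot.

    predict(tokens) must return a dict mapping each masked position to a
    list of token probabilities (a stand-in for the bidirectional model).
    """
    tokens = list(tokens)  # do not modify the caller's sequence
    while MASK in tokens:
        pos = tokens.index(MASK)  # first remaining masked position
        probs = predict(tokens)[pos]  # re-query with the tokens filled so far
        tokens[pos] = max(range(len(probs)), key=lambda t: probs[t])
    return tokens
```

Swapping the `tokens.index(MASK)` line for a confidence-based choice of position is the only change needed to try the alternative ordering discussed above.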


Janspiry commented on August 25, 2024

I see, thanks!

