Do I need a positional embedding before CrossAttentionLayer? (multimodal-action-recognition, closed)

akashe commented on September 22, 2024

Do I need a positional embedding before CrossAttentionLayer?

from multimodal-action-recognition.

Comments (2)

akashe commented on September 22, 2024

I didn't completely understand the question. If you mean whether you should add positional embeddings to your video embeddings, then yes, you should. Adding a positional embedding to your video embeddings should work better, because video embeddings carry no notion of sequential order by default. With positional information, the attention mechanism can extract features that change over time. Hope that helps.
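
To make this concrete, here is a minimal sketch of one common choice, the fixed sinusoidal positional encoding from the original Transformer paper, added element-wise to per-frame features. It is plain Python so it runs without dependencies; the function names and the toy frame values are hypothetical, and the repo may use a different (for example, learned) embedding.

```python
import math


def sinusoidal_positional_encoding(num_positions, dim):
    """Fixed sine/cosine table: even dims use sin, odd dims use cos."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            # Each sin/cos pair (2k, 2k+1) shares the frequency 1 / 10000^(2k/dim).
            freq = 1.0 / (10000 ** ((i - i % 2) / dim))
            angle = pos * freq
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positional_encoding(frame_features):
    """Add position information to a num_frames x dim list of frame features."""
    num_frames = len(frame_features)
    dim = len(frame_features[0])
    pe = sinusoidal_positional_encoding(num_frames, dim)
    return [
        [f + p for f, p in zip(frame, pe_row)]
        for frame, pe_row in zip(frame_features, pe)
    ]


# Three identical (hypothetical) frame features: on their own they are indistinguishable.
frames = [[0.5, -0.2, 0.1, 0.3]] * 3
encoded = add_positional_encoding(frames)
print(encoded[0] != encoded[1])  # True: same content, different positions
```

After the encoding is added, identical frames at different time steps become different vectors, which is what lets attention pick up changes over time.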

Breeze-Zero commented on September 22, 2024

Thanks. I use EfficientNet to extract the video frame features and BERT to extract the text features. I then want to use your CrossAttentionLayer for modality fusion, which is why I was unsure whether a positional embedding is required.
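
For an EfficientNet + BERT setup like this, the sketch below illustrates why the frame side needs position information. It uses a generic single-head scaled dot-product cross-attention in which text tokens act as queries and video frames act as keys/values. It is not the repo's CrossAttentionLayer: the learned Q/K/V projections are omitted (identity) as a simplifying assumption, and all helper names and feature values are hypothetical. BERT's outputs already carry position information from its own learned position embeddings, but per-frame EfficientNet features do not, so in this arrangement the fused output cannot distinguish one frame order from another unless an encoding is added.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention with identity Q/K/V projections.

    queries:     text token features, num_tokens x dim (e.g. BERT outputs)
    keys_values: video frame features, num_frames x dim (e.g. EfficientNet outputs)
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys_values]
        weights = softmax(scores)
        # Attention-weighted sum of frame features: it does not depend on frame order.
        outputs.append([sum(w * kv[j] for w, kv in zip(weights, keys_values))
                        for j in range(dim)])
    return outputs


def positional_value(pos, i, dim):
    """Sinusoidal encoding entry: sin on even dims, cos on odd dims."""
    angle = pos / 10000 ** ((i - i % 2) / dim)
    return math.sin(angle) if i % 2 == 0 else math.cos(angle)


def add_positions(frames):
    dim = len(frames[0])
    return [[x + positional_value(p, i, dim) for i, x in enumerate(frame)]
            for p, frame in enumerate(frames)]


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical toy features (dim = 4). Real BERT and EfficientNet features are larger
# and would typically need projecting to a shared dimension first.
text = [[0.2, 0.1, -0.3, 0.4], [0.0, 0.5, 0.2, -0.1]]
frames = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, -0.1, 0.3], [0.3, 0.3, 0.9, 0.0]]
reversed_frames = frames[::-1]

# Without positions, reversing the clip gives the same fused output.
print(close(cross_attention(text, frames), cross_attention(text, reversed_frames)))  # True
# With positions added to the frames, the two orders are distinguishable.
print(close(cross_attention(text, add_positions(frames)),
            cross_attention(text, add_positions(reversed_frames))))  # False
```

Reversing the frames leaves the output unchanged because the attention-weighted sum over keys ignores their order; once positions are added before attention, the two orders produce different outputs. A learned positional embedding (one trainable vector per frame index) is a common alternative to the fixed sinusoidal table.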
