Comments (2)
Thanks for the clarification! This addressed all my questions.
I will give this a try!
I agree, this is quite a useful feature of AdapterFusion! For instance, we leverage recent_attention in our AdapterDrop paper for pruning AdapterFusion (§4.2). Hence, in #84, we will clean this up and add documentation on how to read out the fusion weights.
To answer your questions:
Should I understand this to be the attention displayed in the above figure? But how do I get something of shape [num_adapters, num_adapters]?
That is correct. If you consider only one downstream task, you could obtain a tensor of shape [n_layers, n_adapters, seq_len]. Averaging over the last dimension gives you [n_layers, n_adapters]. In the AdapterFusion paper, I believe we also averaged over n_layers, resulting in [n_adapters]. If you now repeat this for several tasks, you get [n_tasks, n_adapters], where n_tasks == n_adapters in the special case you are referring to.
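The averaging described above can be sketched with a dummy attention tensor (the shapes and random values here are stand-ins; in practice the tensor would come from the fusion layer after a forward pass):

```python
import numpy as np

# Hypothetical fusion attention for one downstream task:
# shape [n_layers, n_adapters, seq_len]
n_layers, n_adapters, seq_len = 12, 3, 128
attn = np.random.rand(n_layers, n_adapters, seq_len)

# Average over the sequence dimension -> [n_layers, n_adapters]
per_layer = attn.mean(axis=-1)

# Additionally average over layers -> [n_adapters]
per_adapter = per_layer.mean(axis=0)

print(per_layer.shape)    # (12, 3)
print(per_adapter.shape)  # (3,)
```

Repeating the last step once per task and stacking the results would give the [n_tasks, n_adapters] matrix mentioned above.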
Accessing the stored attention tensors [...] How should I do this?
One way could be:
model.roberta.encoder.layer[layer_i].output.adapter_fusion_layer['<name of the fusion layer>'].recent_attention
We will add a cleaned-up variant of this soon.
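A sketch of collecting these per-layer tensors into one [n_layers, n_adapters] matrix, following the attribute path above. The fusion-layer name and the tensor shapes are stand-ins, and the model's attribute hierarchy is mocked with plain namespaces here so the snippet is self-contained; with a real model you would use `model.roberta.encoder.layer` directly after a forward pass:

```python
import numpy as np
from types import SimpleNamespace

# Mock of model.roberta.encoder.layer[i].output
#   .adapter_fusion_layer[...].recent_attention.
# 'my_fusion' is a hypothetical fusion-layer name.
n_layers, n_adapters, seq_len = 12, 3, 128
layers = [
    SimpleNamespace(output=SimpleNamespace(adapter_fusion_layer={
        "my_fusion": SimpleNamespace(
            recent_attention=np.random.rand(n_adapters, seq_len))
    }))
    for _ in range(n_layers)
]

# Read recent_attention from every layer, average over the
# sequence dimension, and stack -> [n_layers, n_adapters]
fusion_attn = np.stack([
    layer.output.adapter_fusion_layer["my_fusion"]
         .recent_attention.mean(axis=-1)
    for layer in layers
])
print(fusion_attn.shape)  # (12, 3)
```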
Would that address your issue?