aradha/agop_feature_learning (MIT License)
Hi, I just wanted some clarification on the logic used when verifying the NFM ansatz for transformers, as implemented in `check_ansatz` in `/ansatz_verification/transformers/language_models/verify_ansatz.py`.
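(For reference, my reading of the ansatz being tested, so you can correct me if I've misstated it: for a layer with weight matrix $W$ and inputs $z_i$, the neural feature matrix should match the average gradient outer product up to scale,

$$W^\top W \propto \frac{1}{n} \sum_{i=1}^{n} \nabla_z f(z_i)\, \nabla_z f(z_i)^\top,$$

where $f$ maps the layer's input to the network output.)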
You seem to be applying dropout to the input of our original transformer before running it through all the layers. Why is this done?

```python
with torch.no_grad():
    n, t = x.shape
    pos = torch.arange(0, t, dtype=torch.long, device='cuda')
    z = Q.drop(Q.wpe(pos) + Q.wte(x))
    for i in range(layer_idx):
        z = Q.h[i](z)
```
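(Or is `Q` already in eval mode at this point? A quick sanity check along these lines would tell; this is a sketch assuming a nanoGPT-style layout where `Q.drop` is an `nn.Dropout`, with `pos` and `x` the same tensors as in the snippet above:)

```python
# Sanity-check sketch: in eval mode nn.Dropout is the identity,
# so Q.drop would be a no-op and the dropout concern disappears.
Q.eval()
with torch.no_grad():
    emb = Q.wpe(pos) + Q.wte(x)
    assert torch.equal(Q.drop(emb), emb)  # passes iff dropout is inactive
```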
`z` is the output after running the input tensor through the first `layer_idx` full blocks of our transformer, but we then seem to arbitrarily apply the layernorm for the `(layer_idx+1)`-th block, the `ln_1` corresponding to its `CausalSelfAttention` module. Why is this done? From my understanding, `z` should already be normalized (and in particular, has already passed through `ln_2`).
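For reference, here is the block structure I am assuming; a minimal sketch in nanoGPT style, where `attn` and `mlp` are stand-ins for the repo's `CausalSelfAttention` and MLP modules:

```python
import torch.nn as nn

# Minimal sketch of a pre-LN transformer block (nanoGPT style).
class Block(nn.Module):
    def __init__(self, n_embd, attn, mlp):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = attn
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = mlp

    def forward(self, x):
        x = x + self.attn(self.ln_1(x))  # ln_1 normalizes the attention input
        x = x + self.mlp(self.ln_2(x))   # ln_2 normalizes only the MLP input
        return x                         # output is the raw residual sum
```

If the blocks look like this, I may be wrong that `z` comes out normalized, which would explain the extra `ln_1`; please correct me if the structure differs.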
```python
from copy import deepcopy

with torch.no_grad():
    # renormalize z with block layer_idx's ln_1, adding a batch dimension
    ln_z1 = Q.h[layer_idx].ln_1(z).unsqueeze(0)
    ln_z2 = deepcopy(ln_z1)
    ln_z3 = deepcopy(ln_z1)
    ln_z = deepcopy(ln_z1)
```
Also, since the goal seems to be evaluating the NFM for attention, why are we taking the output of each block after it has already passed through that block's MLP layers? Shouldn't we hook in and get it right after it passes through our attention block, as in the sketch below?
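Concretely, I had something like the following in mind; a minimal sketch using a standard PyTorch forward hook, where the attribute path `Q.h[layer_idx].attn` is my assumption about where the `CausalSelfAttention` module lives:

```python
# Capture the attention sub-layer's output directly, before the MLP.
captured = {}

def save_attn_output(module, inputs, output):
    captured['attn_out'] = output.detach()

handle = Q.h[layer_idx].attn.register_forward_hook(save_attn_output)
with torch.no_grad():
    z = Q.drop(Q.wpe(pos) + Q.wte(x))
    for i in range(layer_idx + 1):  # include block layer_idx itself
        z = Q.h[i](z)
handle.remove()
attn_out = captured['attn_out']  # attention output of block layer_idx
```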
When we actually seek to get the Jacobian of the model, we run the renormalized `ln_z1, ..., ln_z` back through the network using the `newGPT` object. Why are we passing the outputs of the network from the block at `layer_idx` back through the same network? Shouldn't we be passing in the tokens corresponding to the text we care about (i.e. `x`) and then extracting the gradients for the attention matrices we care about?
```python
J = get_jacobian(newGPT, [z.unsqueeze(0), ln_z1, ln_z2, ln_z3, ln_z],
                 BATCH_SIZE=BATCH_SIZE,
                 IDX=IDX,
                 OUT_SIZE=OUT_SIZE,
                 OUT_IDX=OUT_IDX)
```
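For comparison, this is roughly what I expected; a minimal sketch using `torch.autograd.functional.jacobian`, where treating `Q.h[layer_idx].attn` as the function of interest (and the input shapes) are my assumptions rather than the repo's API:

```python
from torch.autograd.functional import jacobian

# Sketch: differentiate the attention sub-layer's output with respect to
# its normalized input, for the tokens x we actually care about.
attn_block = Q.h[layer_idx].attn
ln_z = Q.h[layer_idx].ln_1(z)  # same normalization as in the snippet above
J = jacobian(lambda inp: attn_block(inp), ln_z)
```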
It's possible that I'm misunderstanding the implementation somewhat, but I'd love to apply this work to some other transformer analysis, so I'd like to know how these quantities are actually computed in your paper. Thank you!