Comments (7)
I'm interested in this as well, but I haven't had time to work on it. The original paper "retrofitted" T5 by adding cross-attention layers between the pretrained model and the knowledge-base (KB) retrieval/chunk system. The authors claimed it took only a small number of training steps to teach the revised model to use the new cross-attention layers. I'm assuming this involved training all of the model's weights on a masking objective, the same way as in the original pretraining.
It shouldn't be too difficult to hack the Hugging Face model code to add the cross-attention layers and then use the retrieval components from this repo. I'll probably try this sometime in the next few months. I'm more interested in BART than T5, so I was planning to work on that. Let me know if you or someone else gets to it first.
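As a rough illustration of "adding the cross-attentions", here is a toy, dependency-free sketch (all names are hypothetical, not this repo's or Hugging Face's API) of wrapping a pretrained layer with a new cross-attention sublayer over retrieved encodings, joined by a residual connection. It is deliberately simplified: a single head, no learned query/key/value projections, and no chunking or causal shift, so it is not RETRO's chunked cross-attention. The zero-initialized `gate` is an assumption borrowed from other retrofitting work (e.g. Flamingo's tanh-gated cross-attention), not something described in the RETRO paper; its purpose is that the retrofitted layer initially reproduces the pretrained layer's output exactly.

```python
import math
from typing import Callable, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Toy single-head scaled dot-product attention from `queries` to `memory`.

    `memory` is used directly as both keys and values (no learned projections),
    so queries and memory vectors must have the same dimension.
    """
    dim = len(memory[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in memory]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, memory)) for c in range(dim)])
    return outputs


def retrofitted_layer(
    hidden: List[Vector],
    retrieved: List[Vector],
    pretrained_layer: Callable[[List[Vector]], List[Vector]],
    gate: float = 0.0,
) -> List[Vector]:
    """Run the (hypothetically frozen) pretrained layer, then add a new
    cross-attention sublayer over `retrieved` through a residual connection
    scaled by `gate`. With gate=0.0 the output equals the pretrained output."""
    h = pretrained_layer(hidden)
    if not retrieved:
        return h  # nothing retrieved: behave exactly like the original layer
    attended = cross_attention(h, retrieved)
    return [[x + gate * a for x, a in zip(row, att)] for row, att in zip(h, attended)]


if __name__ == "__main__":
    def identity(rows: List[Vector]) -> List[Vector]:
        # Stand-in for a pretrained layer.
        return [list(r) for r in rows]

    hidden = [[1.0, 0.0], [0.0, 1.0]]
    print(retrofitted_layer(hidden, [[0.5, 0.5]], identity, gate=0.0))  # [[1.0, 0.0], [0.0, 1.0]]
    print(retrofitted_layer(hidden, [[0.5, 0.5]], identity, gate=1.0))  # [[1.5, 0.5], [0.5, 1.5]]
```

In a real model the new projections and the gate would be trainable parameters, while the pretrained layer would be frozen or fine-tuned depending on which retrofitting recipe you follow.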
from retro-pytorch.
Thank you for your implementation!
I'm interested in how you would add chunked cross-attention (CCA) to BART: in the encoder or in the decoder? If in the encoder, CCA is causal, so how would you recommend handling that? If in the decoder, retrieval needs at least 64 tokens, so if the generated text is shorter than 64 tokens, retrieval would never be used.
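To make the 64-token point concrete, here is a minimal sketch (my reading of the RETRO paper's causal chunking, not code from this repo) of which retrieved chunk each token position can attend to. It assumes the paper's convention that hidden states are shifted by chunk_size - 1, so the last token of chunk c and the first chunk_size - 1 tokens of chunk c + 1 attend to the neighbours retrieved for chunk c. The function names `cca_source_chunk` and `cca_schedule` are hypothetical.

```python
from typing import List, Optional


def cca_source_chunk(position: int, chunk_size: int = 64) -> Optional[int]:
    """Return the index of the chunk whose retrieved neighbours the token at
    `position` (0-indexed) may attend to, or None if none are available yet.

    Assumes RETRO-style causal chunking with a shift of chunk_size - 1.
    """
    if position < 0 or chunk_size < 1:
        raise ValueError("position must be >= 0 and chunk_size >= 1")
    chunk = (position + 1) // chunk_size - 1
    return chunk if chunk >= 0 else None


def cca_schedule(seq_len: int, chunk_size: int = 64) -> List[Optional[int]]:
    """Source chunk (or None) for every position in a sequence of length seq_len."""
    return [cca_source_chunk(p, chunk_size) for p in range(seq_len)]


if __name__ == "__main__":
    print(cca_schedule(9, chunk_size=4))  # [None, None, None, 0, 0, 0, 0, 1, 1]
    print(cca_source_chunk(62), cca_source_chunk(63))  # None 0
```

With the default chunk_size=64, positions 0 through 62 get no retrieved neighbours and position 63 is the first that does. Because positions count prompt tokens as well as generated ones, a prompt of 64 or more tokens would, under these assumptions, make retrieval available from the first generated token onward.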
Thanks!
Has any of you worked on the retrofitting part yet?
I haven't had the time, and although I'm still somewhat interested, realistically I probably won't get to this.
It might be worth emailing the authors of the original paper to see if they'd be willing to post the code or share more information on the retrofitting process. As I recall, the paper devotes only a paragraph or so to it, and there are a number of details it would help to have.
Yup, I'll email them; hopefully they respond.
I just got a "no" response from them.
Hey there, has anyone had time to work on this?
Related Issues (20)
- Double [CLS] token in the first doc chunk
- Clarification on Architecture
- Scann vs faiss
- 'NoneType' object is not callable
- Is there any pre-trained RETRO model released yet?
- Huggingface model
- I am revising the model to solve QA task..
- How to give Prompt to trained RETRO Model?
- Why are there so many position embeddings?
- Causal mask in Chunked Cross Attention
- Error # could not open .tmp/.index/knn.index for reading: No such file or directory
- Question-Answer Dataset Format ?
- AttributeError: module 'faiss' has no attribute 'GpuParameterSpace'
- Question: residual connect after `ChunkedCrossAttention`?
- Convert embedded tokens to English
- how to deal with the problem ,
- Use my own dataset to train/finetune RETRO and evaluate
- No embeddings found in folder .tmp/embeddings
- Clarification about the code.