Comments (3)
@jinmang2 Rotary embeddings are the best type of relative positional encoding if you don't need to extrapolate to sequence lengths longer than those seen during training. They were proven out in PaLM! Enough said! :)
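A minimal sketch of why rotary embeddings count as a relative encoding, assuming a single 2D feature pair treated as a complex number and an arbitrary frequency `theta = 0.1` (both illustrative choices; `rotate` and `score` are hypothetical helper names). Rotating the query by its position and the key by its position leaves a dot product that depends only on the offset between them.

```python
import cmath


def rotate(v: complex, pos: int, theta: float) -> complex:
    # Treat a 2D feature pair (x, y) as x + iy and rotate it by pos * theta.
    return v * cmath.exp(1j * pos * theta)


def score(q: complex, k: complex, m: int, n: int, theta: float = 0.1) -> float:
    # 2D dot product of the query rotated to position m and the key rotated to
    # position n: Re(a * conj(b)) == a.x * b.x + a.y * b.y
    return (rotate(q, m, theta) * rotate(k, n, theta).conjugate()).real


q, k = complex(0.3, -1.2), complex(0.7, 0.5)
print(score(q, k, m=5, n=2))
print(score(q, k, m=105, n=102))  # same offset m - n = 3, so same score up to float error
```

In a real model each head dimension is split into many such pairs, each with its own frequency, so the same argument applies pair by pair.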
from retro-pytorch.
@jinmang2 Oh hey! Yes, sorry, the logic wasn't quite correct; fixed in f2d2815.
So this was actually a discovery made by EleutherAI while building GPT-J (Wang & Komatsuzaki). They found that rotating only part of the head dimension (and leaving the rest unrotated) leads to even better performance. I believe DeepMind picked up this practice in their latest language model (Gopher or something else).
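A minimal sketch of partial rotary embeddings in pure Python, assuming the standard RoPE frequencies `base ** (-2i / rot_dim)` with `base = 10000` and adjacent-pair rotation. `apply_partial_rotary` and `rot_dim` are hypothetical names here (Hugging Face's `GPTJConfig` exposes the analogous setting as `rotary_dim`), and rotating half the head is just an example value, not the GPT-J setting.

```python
import math


def apply_partial_rotary(x: list[float], pos: int, rot_dim: int,
                         base: float = 10000.0) -> list[float]:
    """Rotate the first `rot_dim` features of one head vector; leave the rest untouched."""
    assert rot_dim % 2 == 0 and rot_dim <= len(x), "rot_dim must be even and <= head_dim"
    out = list(x)  # features at index >= rot_dim are copied through unrotated
    for i in range(rot_dim // 2):
        theta = base ** (-2 * i / rot_dim)  # one frequency per 2D pair
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = x[2 * i], x[2 * i + 1]  # adjacent pair (x[2i], x[2i+1])
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


head_dim, rot_dim = 8, 4  # rotate half of the head, purely for illustration
q = [0.1 * (i + 1) for i in range(head_dim)]
k = [0.2 * (head_dim - i) for i in range(head_dim)]
q5 = apply_partial_rotary(q, pos=5, rot_dim=rot_dim)
print(q5[rot_dim:] == q[rot_dim:])  # True: the unrotated tail passes through unchanged
print(dot(q5, apply_partial_rotary(k, pos=3, rot_dim=rot_dim)))
```

The rotated slice still gives relative-position behavior, while the unrotated slice contributes a position-independent term to the attention score.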
And yes, you are correct that the rotary dimensions can be split two ways! But as long as they match up later, it doesn't matter (and I believe I have it matched up correctly, but do submit a PR if you find otherwise).
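Assuming the "two ways" are the two common pairing layouts (adjacent features `(x[0], x[1]), (x[2], x[3]), ...` versus feature `i` paired with feature `i + d/2`, the layout often implemented with a `rotate_half` helper), here is a minimal sketch showing that they are the same rotation up to a fixed permutation of the features. All function names are hypothetical.

```python
import math


def _angles(dim: int, pos: int, base: float = 10000.0) -> list[tuple[float, float]]:
    # (cos, sin) of pos * theta_i for each of the dim // 2 frequency pairs
    thetas = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [(math.cos(pos * t), math.sin(pos * t)) for t in thetas]


def rope_interleaved(x: list[float], pos: int) -> list[float]:
    """Pair adjacent features: (x[0], x[1]), (x[2], x[3]), ..."""
    out = list(x)
    for i, (c, s) in enumerate(_angles(len(x), pos)):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i], out[2 * i + 1] = a * c - b * s, a * s + b * c
    return out


def rope_half_split(x: list[float], pos: int) -> list[float]:
    """Pair feature i with feature i + d/2."""
    half = len(x) // 2
    out = list(x)
    for i, (c, s) in enumerate(_angles(len(x), pos)):
        a, b = x[i], x[i + half]
        out[i], out[i + half] = a * c - b * s, a * s + b * c
    return out


def to_half_layout(x: list[float]) -> list[float]:
    # Fixed permutation: even-indexed features first, then odd-indexed ones.
    return x[0::2] + x[1::2]


x = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5]
lhs = rope_half_split(to_half_layout(x), pos=7)
rhs = to_half_layout(rope_interleaved(x, pos=7))
print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # 0.0: same rotation, permuted features
```

Because the permutation is orthogonal, query-key dot products are identical under either layout as long as queries and keys use the same one. What does break things is mixing layouts between q and k, or loading weights trained under one layout into code that uses the other.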
I thought RETRO simply used Shaw et al.'s relative position encoding (RPE); I didn't know there was such an insight!
Thanks for the feedback :) Closing the issue.
Related Issues (20)
- RuntimeError: Error in void faiss::gpu::GpuIndexIVFPQ::verifySettings_()
- Double [CLS] token in the first doc chunk
- Retro-fitting a pretrained model
- Clarification on Architecture
- Scann vs faiss
- 'NoneType' object is not callable
- Is there any pre-trained RETRO model released yet?
- Huggingface model
- I am revising the model to solve QA task..
- How to give Prompt to trained RETRO Model?
- Why are there so many position embeddings?
- Causal mask in Chunked Cross Attention
- Error # could not open .tmp/.index/knn.index for reading: No such file or directory
- Question-Answer Dataset Format ?
- AttributeError: module 'faiss' has no attribute 'GpuParameterSpace'
- Question: residual connect after `ChunkedCrossAttention`?
- Convert embedded tokens to English
- how to deal with the problem ,
- Use my own dataset to train/finetune RETRO and evaluate
- No embeddings found in folder .tmp/embeddings