Comments (3)

lucidrains avatar lucidrains commented on July 17, 2024 2

@jinmang2 rotary embeddings are the best type of relative positional encoding, as long as you do not need to extrapolate to sequence lengths longer than what the model was trained on. it was proven out in PaLM! enough said! :)

from retro-pytorch.

lucidrains avatar lucidrains commented on July 17, 2024 1

@jinmang2 oh hey! yes, sorry the logic wasn't quite correct, fixed in f2d2815

so this was actually a discovery by EleutherAI while building GPT-J (Wang & Komatsuzaki et al.). They found that rotating only part of the head dimension (and leaving the rest unrotated) leads to even better performance. I believe DeepMind picked up this practice in their latest language model (Gopher or something else)
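To make the partial rotation concrete, here is a minimal pure-Python sketch. It is not the code from retro-pytorch or GPT-J; the names `rotate_pairs`, `partial_rotary`, and `rot_dim`, the base of 10000, and the sample vectors are all assumptions chosen for illustration. It rotates only the first `rot_dim` features of a query and a key, and checks that the attention score still depends only on the relative offset between positions.

```python
import math

def rotate_pairs(x, pos, base=10000.0):
    """Apply a rotary embedding to an even-length vector, pairing (x[2i], x[2i+1])."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # one frequency per pair
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

def partial_rotary(x, pos, rot_dim):
    """Rotate only the first rot_dim features; pass the rest through unchanged."""
    return rotate_pairs(x[:rot_dim], pos) + list(x[rot_dim:])

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Hypothetical query/key vectors with head dimension 8, rotating only 4 features.
q = [0.3, -1.2, 0.7, 0.5, 1.1, -0.4, 0.9, 0.2]
k = [1.0, 0.4, -0.6, 0.8, -0.3, 0.5, 0.1, -0.7]

# Same relative offset (3) at two different absolute positions.
s1 = dot(partial_rotary(q, 5, 4), partial_rotary(k, 2, 4))
s2 = dot(partial_rotary(q, 105, 4), partial_rotary(k, 102, 4))
```

Here `s1` and `s2` agree up to floating-point error: the rotated features contribute a term that depends only on the offset, and the unrotated features contribute a position-independent dot product.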

And yes, you are correct that rotary can be split two ways! But as long as they match up later, it doesn't matter (and I believe I have it matched up correctly, but do submit a PR if you find otherwise)
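As a rough illustration, assuming "split two ways" refers to how features are paired for rotation (adjacent features versus first half against second half), the sketch below implements both pairings in plain Python. The function names and sample vectors are hypothetical and not taken from the repository. Each convention, applied consistently to queries and keys, gives scores that depend only on the relative offset.

```python
import math

def rotary_interleaved(x, pos, base=10000.0):
    """Pair adjacent features: (x[0], x[1]), (x[2], x[3]), ..."""
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def rotary_half_split(x, pos, base=10000.0):
    """Pair feature i with feature i + d/2: (x[0], x[d/2]), (x[1], x[d/2 + 1]), ..."""
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out

def relative_score(rotate, q, k, m, n):
    """Dot product of q rotated to position m and k rotated to position n."""
    return sum(u * v for u, v in zip(rotate(q, m), rotate(k, n)))

# Hypothetical query/key vectors.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.8]

# For each convention, compare the same offset (3) at two absolute positions.
gaps = {
    fn.__name__: abs(relative_score(fn, q, k, 5, 2) - relative_score(fn, q, k, 105, 102))
    for fn in (rotary_interleaved, rotary_half_split)
}
```

Both entries in `gaps` are zero up to floating-point error. The two conventions produce different rotated vectors, so the pairing used for the rotation has to agree with the layout of the frequencies, and queries and keys have to share the same convention.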

jinmang2 avatar jinmang2 commented on July 17, 2024 1

I thought that Shaw's RPE was simply used in RETRO, but I didn't know there was such an insight!
Thanks for the feedback :) Closing the issue.
