
roformer's People

Contributors

zhuiyitechnology

roformer's Issues

Reference to prior work

Hey everybody,

First of all, congrats on the paper. It's really interesting, and I am looking forward to seeing its impact on the field!

I wanted to point out that a few colleagues and I followed a similar approach to introduce translational equivariance in kernelizable attention (as implemented in the Performer) for image classification tasks, and we posted it on arXiv at the beginning of February: https://arxiv.org/abs/2102.07680.

While the approach proposed in your work is more generic, we would greatly appreciate it if you could also cite our prior work in the publication.

Best,
Max

Changing bases and sampling rate

Hi,

I am a hobbyist in AI, and I would be interested to hear your thoughts on further research into developing RoPE.

Recent work on rotary position embedding (RoPE), with a focus on extending the context length by interpolation, has merged with NTK (neural tangent kernel) theory. That theory requires a higher-order non-linear basis for encoding the data, so that the network can learn both low- and high-frequency features instead of converging early to the low-frequency ones.
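For reference, the frequency structure of the original RoPE basis can be sketched in pure Python. This is a minimal sketch assuming the RoFormer formulation, where consecutive feature pairs are rotated by position × θ_i with θ_i = 10000^(−2i/d); the helper names (`rope_frequencies`, `apply_rope`, `dot`) are hypothetical and this is not the repository's code. Low-index pairs rotate quickly (high frequency), high-index pairs rotate slowly (low frequency), and the query/key score depends only on the offset between their positions.

```python
import math

# Hypothetical helpers illustrating RoPE; not taken from the RoFormer repository.

def rope_frequencies(dim: int, base: float = 10000.0) -> list[float]:
    """Angular frequency theta_i = base**(-2i/dim) for each feature pair i = 0 .. dim/2 - 1."""
    if dim % 2 != 0:
        raise ValueError("RoPE rotates features in pairs, so dim must be even")
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def apply_rope(x: list[float], position: float, base: float = 10000.0) -> list[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle position * theta_i."""
    out: list[float] = []
    for i, theta in enumerate(rope_frequencies(len(x), base)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(position * theta), math.sin(position * theta)
        out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


if __name__ == "__main__":
    # For dim = 4: pair 0 has frequency 1.0 (fast), pair 1 has 10000**-0.5 = 0.01 (slow).
    print(rope_frequencies(4))
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.0]
    # Both scores use the same offset m - n = 2, so they agree up to rounding.
    print(dot(apply_rope(q, 5), apply_rope(k, 3)))
    print(dot(apply_rope(q, 12), apply_rope(k, 10)))
```

Because each pair is a 2D rotation, RoPE preserves vector norms, and the rotation of the query by m and the key by n combine into a single rotation by (n − m) inside the dot product.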

The discussion below shows that extending the context range with a simple non-linearly interpolated basis, without fine-tuning, can achieve results comparable to a model fine-tuned on longer sequences with a linearly interpolated basis:
https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/
In effect, this is zero-shot extension of the context length.

The following Colab notebook from the discussion above demonstrates that a non-linear power-series basis achieves good results on 8k-token inputs (cf. the Meta paper on RoPE interpolation, https://arxiv.org/abs/2306.15595):
https://colab.research.google.com/drive/1VI2nhlyKvd5cw4-zHvAIk00cAVj2lCCC#scrollTo=b80b3f37
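The contrast between linear interpolation and NTK-aware scaling can be sketched in a few lines. This is a minimal illustration, assuming the linear Position Interpolation of the Meta paper (divide every position by the scale factor) and the NTK-aware rule as I understand it from the linked Reddit post (multiply the RoPE base by scale**(dim / (dim - 2))). The function names are hypothetical, and the sketch does not reproduce the power-series basis used in the Colab notebook.

```python
import math

# Hypothetical helpers contrasting two RoPE context-extension schemes.

def rope_frequencies(dim: int, base: float = 10000.0) -> list[float]:
    """theta_i = base**(-2i/dim) for i = 0 .. dim/2 - 1."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def linear_interpolation_angles(position: float, dim: int, scale: float,
                                base: float = 10000.0) -> list[float]:
    """Position Interpolation: divide the position by `scale`, slowing every pair equally."""
    return [(position / scale) * theta for theta in rope_frequencies(dim, base)]


def ntk_aware_angles(position: float, dim: int, scale: float,
                     base: float = 10000.0) -> list[float]:
    """NTK-aware scaling: keep positions, enlarge the base by scale**(dim / (dim - 2))."""
    ntk_base = base * scale ** (dim / (dim - 2))
    return [position * theta for theta in rope_frequencies(dim, ntk_base)]


if __name__ == "__main__":
    # Illustrative setting: head dimension 128, context stretched 4x (e.g. 2k -> 8k tokens).
    lin = linear_interpolation_angles(8000, dim=128, scale=4)
    ntk = ntk_aware_angles(8000, dim=128, scale=4)
    print(lin[0], ntk[0])  # highest-frequency pair: 2000.0 8000.0
    print(math.isclose(lin[-1], ntk[-1]))  # lowest-frequency pair matches: True
```

Under NTK-aware scaling, pair i is slowed by scale**(-2i / (dim - 2)): the highest-frequency pair keeps its original angle (extrapolation), the lowest-frequency pair is slowed by exactly 1/scale (matching linear interpolation), and the pairs in between are interpolated geometrically. Linear interpolation, by contrast, compresses every frequency by the same factor, which is one intuition for why it tends to need fine-tuning.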

Some typos in your paper

Hello @ZhuiyiTechnology,

In the Introduction section on the first page:

The sequential order of words is of great value to natural language understanding. "Recurrent neural networks (RRNs)" based models encode tokens’ order by recursively computing a hidden state along the time dimension.

It should be RNNs, right?

I found two places on the first page that use RRN instead of RNN.

It doesn't get in the way of reading this great work; I just wanted to mention it.