Comments (3)
Maybe we can extend Gemma's context length to unlimited size (depending on the compression rate) this way, with a limited kv-cache length (256 or a little more?), at linear complexity.
from gemma_pytorch.
2. Trim the kv-cache in GemmaAttention to max_position_embeddings (256).
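A minimal sketch of the trimming step described above, assuming kv-cache tensors shaped `[batch, num_heads, seq_len, head_dim]`; the function name `trim_kv_cache` and the shapes are illustrative, not gemma_pytorch's actual API:

```python
import torch

def trim_kv_cache(k_cache: torch.Tensor,
                  v_cache: torch.Tensor,
                  window: int = 256):
    """Keep only the most recent `window` positions along the sequence axis.

    Both caches are assumed to be shaped [batch, num_heads, seq_len, head_dim].
    """
    if k_cache.size(2) <= window:
        return k_cache, v_cache
    return k_cache[:, :, -window:, :], v_cache[:, :, -window:, :]

# Toy usage: a cache holding 300 past positions is trimmed to the last 256,
# so attention only ever sees a fixed-size window during generation.
k = torch.randn(1, 8, 300, 64)
v = torch.randn(1, 8, 300, 64)
k, v = trim_kv_cache(k, v, window=256)
print(tuple(k.shape))  # (1, 8, 256, 64)
```

Note that trimming like this simply discards everything older than the window, which is consistent with the failure described later in the thread: a name mentioned more than 256 tokens ago is no longer in the cache at all.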
Do you mean using a sliding window of size 256 as you generate the output tokens?
I think this is an interesting observation. I believe there is some related work in the literature that tries to use a sliding window to extrapolate the context. It sounds like you are doing something similar.
Yeah. I tried limiting max_position_embeddings to 256 and generating an answer longer than 400 tokens, and it seemed to work well. I was hoping the model could compress context information from well beyond the 256-token window, so I tested it: I told the model my name first, followed by about 300 tokens of other information, and then asked Gemma "Do you know what my name is?" Gemma couldn't give the right answer. So Gemma has no sliding-window memory; this only works within the 256-token attention scope. Hmm, a bit of a letdown :).