Comments (3)
Splitting the KV cache into a separate key cache and value cache is also important (https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/gemma/gemma_attention.py#L166).
from keras-nlp.
@lingzhi98 thanks! We are planning some generation improvements, so we will definitely check this out. Agreed that we can let performance be our guide, most likely JAX compiled performance in particular.
Were you thinking of a specific backend/compiled with XLA/not compiled? What's motivating the suggestion?
I use JAX as the Keras backend. I have seen the concatenation become the main overhead as the batch size increases. Because the KV cache is kept as one tensor, we need to slice it to get the corresponding key and value caches, compute the attention output, and then update the cache. This slice op also blocks dynamic-update-slice fusion (https://github.com/openxla/xla/blob/main/xla/service/gpu/ir_emission_utils.cc#L472), which hurts performance further.
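To make the difference concrete, here is a minimal JAX sketch of the two layouts. It is not the keras-nlp implementation: the shapes, the `[batch, 2, max_len, heads, dim]` combined layout, and the function names `update_combined` and `update_split` are assumptions chosen for illustration. The combined version has to slice before the update and stack afterwards, while the split version only does the two in-place style updates.

```python
import jax
import jax.numpy as jnp
from jax import lax

# Hypothetical shapes, chosen only for illustration.
BATCH, MAX_LEN, HEADS, DIM = 2, 8, 4, 16


def update_combined(cache, key_update, value_update, index):
    """Update a single cache tensor of shape [batch, 2, max_len, heads, dim].

    Assumed layout: key cache at position 0 and value cache at position 1
    of axis 1.
    """
    # Slice the combined tensor to recover the two caches.
    key_cache = cache[:, 0, ...]
    value_cache = cache[:, 1, ...]
    start = (0, index, 0, 0)
    key_cache = lax.dynamic_update_slice(key_cache, key_update, start)
    value_cache = lax.dynamic_update_slice(value_cache, value_update, start)
    # Re-stack into one tensor; this is the extra copy discussed above.
    return jnp.stack((key_cache, value_cache), axis=1)


def update_split(key_cache, value_cache, key_update, value_update, index):
    """Update separate key and value caches, each [batch, max_len, heads, dim]."""
    start = (0, index, 0, 0)
    key_cache = lax.dynamic_update_slice(key_cache, key_update, start)
    value_cache = lax.dynamic_update_slice(value_cache, value_update, start)
    # No slice or stack is needed around the updates.
    return key_cache, value_cache


# Example inputs: an empty cache and one new token per sequence.
cache = jnp.zeros((BATCH, 2, MAX_LEN, HEADS, DIM))
key_update = jnp.ones((BATCH, 1, HEADS, DIM))
value_update = 2.0 * jnp.ones((BATCH, 1, HEADS, DIM))

combined = jax.jit(update_combined)(cache, key_update, value_update, 3)
new_keys, new_values = jax.jit(update_split)(
    cache[:, 0], cache[:, 1], key_update, value_update, 3
)
```

Both functions produce the same cache contents, so any difference would be in how XLA compiles them. Whether the split version is actually faster depends on the backend and batch size and would need to be measured.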