Comments (4)
@lingzhi98 thanks for filing, and good timing. We actually just hit this: it was causing some numerical differences at half precision for Llama 2. I do wonder if we could flip this safely; the best default behavior would be to follow the default dtype strategy, whatever that is. cc @tirthasheshpatel, who was working on this for Llama.
I'll try to check whether this is a safe change to make (particularly with Gemma). As long as we make sure to cast to float32 before any output softmax during sampling, it seems like we should be OK.
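To illustrate the point about casting to float32 before the output softmax, here is a minimal NumPy sketch. It is not the KerasNLP implementation: the helper names `softmax` and `sampling_probs`, the random logits, and the vocabulary size of 32000 are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Numerically stable softmax, computed in whatever dtype the input has.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def sampling_probs(logits):
    # Hypothetical helper: upcast half-precision logits to float32
    # before the softmax, so the probabilities are computed at full precision.
    return softmax(logits.astype(np.float32))


# Illustrative half-precision logits (random values, assumed vocab size).
rng = np.random.default_rng(0)
logits_fp16 = rng.normal(size=(32000,)).astype(np.float16)

probs_half = softmax(logits_fp16)         # softmax stays in float16
probs_full = sampling_probs(logits_fp16)  # softmax runs in float32

print("float16 softmax sum:", float(probs_half.astype(np.float64).sum()))
print("float32 softmax sum:", float(probs_full.astype(np.float64).sum()))
```

Comparing the two printed sums gives a rough sense of how much rounding the half-precision path introduces; the float32 path should sum to 1 up to float32 rounding error.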
from keras-nlp.
Hi @lingzhi98,
The documentation states the reason for this, as quoted below.
keras-nlp/keras_nlp/layers/modeling/reversible_embedding.py
Lines 52 to 54 in 1889369
Thanks. My understanding is that float32 is used here to ensure training stability. Is it also necessary for inference?
Here's the fix; I'll push it out next week: #1548