Comments (5)
Loading the model should be fast compared to the download, since it does not compile anything, and it helps validate that you have downloaded the right artifact. I also believe safetensors, which we want to make the default, loads parameters lazily, so it should be even less work.
from bumblebee.
That may be the right idea, but it makes things rough if you just want to load/upload the model on a machine that isn't set up for inference. Or at least it does with the times I'm seeing.
Example: https://huggingface.co/google-bert/bert-base-cased
Download, clocked by counting out loud while the progress bars were going: ~14 seconds
Total load_model execution time: 76 seconds
Tested with:
:timer.tc(fn -> Bumblebee.load_model({:hf, "google-bert/bert-base-cased"}) end) |> elem(0) |> then(& &1 / 1000) |> IO.inspect(label: "ms")
No configuration done at all.
Ideally I'd love to stream the download from Hugging Face to an S3-compatible store, but that is well outside the scope of what Bumblebee is about.
You can take the files from the HF repository and put them in S3 or wherever; then, once you have downloaded them onto the local machine, use {:local, path_to_repo_dir} (just make sure you don't copy parameter files in multiple formats, as that would be unnecessary).
In the future we may have our own serialisation format for things, but I don't think we should be exposing the download of hf/transformers files.
Regarding the 76 seconds: you'd need to use EXLA.Backend, because there are some transformations that are going to be slow otherwise.
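A minimal sketch of that setup, for anyone timing this themselves. It assumes a standalone script, package versions that may not match your project, a hypothetical local directory holding the files copied from the HF repository, and the {:local, directory} repository form for loading from disk. Treat it as an illustration rather than the recommended configuration.

```elixir
# Assumed versions; adjust to whatever your project already uses.
Mix.install([
  {:bumblebee, "~> 0.5"},
  {:exla, "~> 0.7"}
])

# Make EXLA the default backend so the tensor work done while loading
# parameters does not fall back to the slower pure-Elixir backend.
Nx.global_default_backend(EXLA.Backend)

# Hypothetical path: a directory with the files copied from the HF repository.
repo = {:local, "/path/to/bert-base-cased"}

# :timer.tc/1 returns {elapsed_microseconds, result}.
{time_us, {:ok, _model_info}} = :timer.tc(fn -> Bumblebee.load_model(repo) end)

IO.puts("load_model took #{div(time_us, 1000)} ms")
```

Swapping the repo tuple for {:hf, "google-bert/bert-base-cased"} should reproduce the original measurement, with the download time included on the first run.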
That was a lot faster. I can make do.